Re: Is 8.8.x going to be stabilized and finalized?

2021-02-22 Thread S G
Hey Subhajit,

Can you share briefly what issues are being seen with 8.7+ versions?
We are planning to move a big workload from 7.6 to 8.7 version.

We created a small load-testing tool to sanity-check new Solr versions, and
it showed query throughput degrading much more on 8.7 than on Solr 7.6 as we
loaded more and more data into both versions.
So we are a bit concerned if we should make this move or not.
If 8.7 has some grave blockers (features or performance) known already,
then we will probably hold off on making the move.

Regards
SG

On Wed, Feb 17, 2021 at 11:58 AM Subhajit Das 
wrote:

> Hi Shawn,
>
> Nice to know that Solr will become a top-level project at Apache.
>
> I asked based on the pattern of the earlier three major versions. Just hoping that 8.8 would
> be a long-term stable line-up, kind of like 7.7.x.
>
> Thanks for the clarification.
>
> Regards,
> Subhajit
>
> From: Shawn Heisey
> Sent: 17 February 2021 09:33 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is 8.8.x going to be stabilized and finalized?
>
> On 2/16/2021 7:57 PM, Subhajit Das wrote:
> > I am planning to use 8.8 line-up for production use.
> >
> > But recently, a lot of people have been complaining about 8.7 and 8.8. Also,
> there is a clearly known issue on 8.8 as well.
> >
> > Following the trend of earlier versions (5.x, 6.x and 7.x), will 8.8
> also be finalized?
> > For 5.x, 5.5.x was the last version. For 6.x, 6.6.x was the last version. For
> 7.x, 7.7.x was the last version. It would match the pattern, it seems.
> > And 9.x is already planned and under development.
> > And it seems, we require some stability.
>
> All released versions are considered stable.  Sometimes problems are
> uncovered after release.  Sometimes BIG problems.  We try our very best
> to avoid bugs, but achieving that kind of perfection is nearly
> impossible for any software project.
>
> 8.8.0 is the most current release.  The 8.8.1 release is underway, but
> there's no way I can give you a concrete date.  The announcement MIGHT
> come in the next few days, but it's always possible it could get pushed
> back.  At this time, the changelog for 8.8.1 has five bugfixes
> mentioned.  It should be more stable than 8.8.0, but it's impossible for
> me to tell you whether you will have any problems with it.
>
> On the dev list, the project is discussing the start of work on the 9.0
> release, but that work has not yet begun.  Even if it started tomorrow,
> it would be several weeks, maybe even a few months, before 9.0 is
> actually released.  On top of the "normal" headaches involved in any new
> major version release, there are some other things going on that might
> further delay 9.0 and future 8.x versions:
>
> * Solr is being promoted from a subproject of Lucene to its own
> top-level project at Apache.  This involves a LOT of work.  Much of that
> work is administrative in nature, which is going to occupy us and take
> away from time that we might spend working on the code and new releases.
> * The build system for the master branch, which is currently versioned
> as 9.0.0-SNAPSHOT, was recently switched from Ant+Ivy to Gradle.  It's
> going to take some time to figure out all the fallout from that migration.
> * Some of the devs have been involved in an effort to greatly simplify
> and rewrite how SolrCloud does internal management of a cluster.  The
> intent is much better stability and better performance.  You might have
> seen public messages referring to a "reference implementation."  At this
> time, it is unclear how much of that work will make it into 9.0 and how
> much will be revealed in later releases.  We would like very much to
> include at least the first phase in 9.0 if we can.
>
>  From what I have seen over the last several years as one of the
> developers on this project, it is likely that 8.9 and possibly even 8.10
> and 8.11 will be released before we see 9.0.  Releases are NOT made on a
> specific schedule, so I cannot tell you which versions you will see or
> when they might happen.
>
> I am fully aware that despite typing quite a lot of text here, I
> provided almost nothing in the way of concrete information that you can
> use.  Sorry about that.
>
> Thanks,
> Shawn
>
>


Re: impressive improvement in documentation

2020-12-28 Thread S G
One good way to achieve this is to use an alias like 'current', so that
'current' replaces the explicit version string (like 7.1 or 8.6) in the
URL.
Google will then most likely rank the current version on top, since it
will see the page changing frequently and treat it as something more
important to keep track of.


On Fri, Dec 11, 2020 at 2:42 AM Arturas Mazeika  wrote:

> Hi Solr fans,
>
> I am impressed to see that the Solr documentation improves so nicely over
> time. If one compares the 7.1 version of the JSON API page with the current (8.7),
> one sees that additional fields are documented:
>
> https://lucene.apache.org/solr/guide/8_7/json-request-api.html
> Query parameter        JSON field equivalent
> q                      query
> fq                     filter
> start                  offset
> rows                   limit
> fl                     fields
> sort                   sort
> json.facet             facet
> json.<param_name>      <param_name>
> json.queries.<name>    queries
>
> https://lucene.apache.org/solr/guide/7_1/json-request-api.html
> Query parameter        JSON field equivalent
> q                      query
> fq                     filter
> start                  offset
> rows                   limit
> sort                   sort
> json.facet             facet
>
> Now, the issue is that if one searches in Google for the JSON API, one is
> often redirected to older versions of the documentation (7.1 as of the time of
> writing this email). This is not anything new. If I search for SQL Server or
> Postgres documentation, I get redirected to an older, more frequently browsed
> version; however, Postgres immediately gives a possibility to jump to an older
> or newer version of the description from the same page:
>
> https://www.postgresql.org/docs/9.1/tutorial-window.html
>
> with the following shown at the top of the page:
> Documentation → PostgreSQL 9.1
> Supported Versions: Current (13) / 12 / 11 / 10 / 9.6 / 9.5
> Development Versions: devel
> Unsupported versions: 9.4 / 9.3 / 9.2 / 9.1 / 9.0 / 8.4
>
> I wonder how hard it would be to adjust the content management system, so one
> can search with Google, get redirected to a Solr page, and then be one
> click away from the latest version of the documentation to learn, and jump
> back to the installed version of Solr, to see if the function maybe is not
> yet available?
>
> Cheers,
> Arturas
>


Recommended version of zookeeper for latest 8.7 or upcoming 8.8 version

2020-12-28 Thread S G
Hello,

What version of ZooKeeper is recommended for the latest 8.7 or the upcoming 8.8
version?
Can we use the 3.6.2 version of ZooKeeper?

Thanks!


Getting rid of zookeeper

2020-06-09 Thread S G
Hello,

I recently stumbled across KIP-500: Replace ZooKeeper with a Self-Managed
Metadata Quorum.

Elasticsearch does this too.
And so do many other systems.

Is there some work going on in this direction?
It would be nice to get rid of another totally disparate system.
Hardware savings would be nice to have too.

Best,
SG


Re: Solr Ref Guide Redesign coming in 8.6

2020-05-04 Thread S G
I really like the docs and version selection provided by Graylog
https://docs.graylog.org/en/3.2/

It says this at the bottom:
Built with Sphinx, using a theme provided by Read the Docs.

I do not have any experience with Sphinx or Read the Docs.

Here is the version selector and download menu:
[image: Screen Shot 2020-05-04 at 8.21.52 AM.png]



On Wed, Apr 29, 2020 at 11:30 AM Cassandra Targett 
wrote:

> > This design still has a minor annoyance that I have noted in the past:
> > in the table of contents pane it is easy to open a subtree, but the
> > only way to close it is to open another one. Obviously not a big
> > deal.
>
> Thanks for pointing that out, it helped me find a big problem which was
> that I used the wrong build of JQuery to support using the caret to
> open/close the subtree. It should work now to open a subtree independently
> of clicking the heading, and should also close the tree.
> > I'll probably spend too much time researching how to widen the
> > razor-thin scrollbar in the TOC panel, since it seems to be
> > independent of the way I spent too much time fixing the browser's own
> > inadequate scrollbar width. :-) Also, the thumb's color is so close to
> > the surrounding color that it's really hard to see. And for some
> > reason when I use the mouse wheel to scroll the TOC, when it gets to
> > the top or the bottom the content pane starts scrolling instead, which
> > is surprising and mildly inconvenient. Final picky point: the
> > scrolling is *very* insensitive -- takes a lot of wheel motion to move
> > the panel just a bit.
>
> I’m not totally following all of this, but if I assume you mean the left
> sidebar navigation (and not an in-page TOC) then my answer to at least part
> of it is to pare down the list of top-level topics so you don’t have to
> scroll it at all and then the only scrolling you need to do is for the
> content itself. That’s what I want to do in Phase 2, so there are several
> things in the behavior of the sidebar I’m purposely ignoring for now. Some
> will go away with a new organization and new things will be introduced that
> will need to be fixed, so to save myself some time I’m waiting to fix all
> of it at once.
>


Is Banana deprecated?

2020-04-16 Thread S G
Hello,

I still see releases happening on it:
https://github.com/lucidworks/banana/pull/355

So is it something recommended to be used for production?

Regards,
SG


Re: Solrcloud 7.6 OOM due to unable to create native threads

2020-03-31 Thread S G
One approach could be to buffer the messages in Kafka before pushing to
Solr.
And then use "Kafka mirror" to replicate the messages to the other DC.
Both DCs' Kafka pipelines are then kept in sync by the mirror, and you can run
Storm/Spark/Flink etc. jobs to consume the local Kafka and publish to the
local Solr clusters.
This moves the responsibility of DR sync to something designed specifically
for this purpose: Kafka mirror.
However, do not use a version of Kafka more than a year old, as older
versions had a lot of issues with mirroring.
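
A minimal sketch of the per-DC consumer half of that pipeline, assuming a
Kafka 2.x client and SolrJ 7.x; the hosts, topic, collection and field names
here are illustrative only:

import java.time.Duration;
import java.util.Arrays;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class LocalDcIndexer {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "kafka-local:9092"); // this DC's Kafka (placeholder host)
    props.put("group.id", "solr-indexer");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
         CloudSolrClient solr = new CloudSolrClient.Builder(
             Arrays.asList("zk-local:2181"), Optional.empty()).build()) {
      consumer.subscribe(Arrays.asList("documents"));
      while (true) {
        // Each DC runs this same loop against its own Kafka and its own Solr;
        // the mirror keeps the two Kafka topics in sync.
        for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", rec.key());
          doc.addField("body_t", rec.value());
          solr.add("my_collection", doc);
        }
        consumer.commitSync(); // commit offsets only after the batch reached Solr
      }
    }
  }
}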


On Mon, Mar 30, 2020 at 11:43 PM Raji N  wrote:

> Hi Eric,
>
> What are your recommendations for SolrCloud DR strategy?
>
> Thanks,
> Raji
>
> On Sun, Mar 29, 2020 at 6:25 PM Erick Erickson 
> wrote:
>
> > I don’t recommend CDCR at this point, I think there are better approaches.
> >
> > The root problem is that CDCR uses tlog files as a queueing mechanism.
> > If the connection between the DCs is broken for any reason, the tlogs
> grow
> > without limit. This could probably be fixed, but a better alternative is
> to
> > use something designed to ensure messages (updates) are delivered to
> > separate DCs rather than try to have CDCR re-invent that wheel.
> >
> > Best,
> > Erick
> >
> > > On Mar 29, 2020, at 6:47 PM, S G  wrote:
> > >
> > > Is CDCR even recommended to be used in production?
> > > Or was it abandoned before it could become production-ready?
> > >
> > > Thanks
> > > SG
> > >
> > >
> > > On Sun, Mar 29, 2020 at 5:18 AM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> What that error usually means is that there are a zillion threads
> > running.
> > >>
> > >> Try taking a thread dump. It’s _probable_ that it’s CDCR, but
> > >> take a look at the thread dump to see if you have lots of
> > >> threads that are running. And by “lots” here, I mean 100s of threads
> > >> that reference the same component, in this case that have cdcr in
> > >> the stack trace.
> > >>
> > >> CDCR is not getting active work at this point, you might want to
> > >> consider another replication strategy if you’re not willing to fix
> > >> the code.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>> On Mar 29, 2020, at 4:17 AM, Raji N  wrote:
> > >>>
> > >>> Hi All,
> > >>>
> > >>> We are running Solrcloud 7.6 (with the patch
> > >>> https://issues.apache.org/jira/secure/attachment/12969150/SOLR-11724.patch)
> > >>> on production on 7 hosts in containers. The container memory is 48GB,
> > >>> heap is 24GB.
> > >>> ulimit -v
> > >>>
> > >>> unlimited
> > >>>
> > >>> ulimit -m
> > >>>
> > >>> unlimited
> > >>> We don't have any custom code in solr. We have set up  bidirectional
> > CDCR
> > >>> between primary and secondary Datacenter. Our secondary DC is very
> > >> unstable
> > >>> and many times many instances are down.
> > >>>
> > >>> We get below exception quite often. Is this because the CDCR
> connection
> > >> is
> > >>> broken?
> > >>>
> > >>> WARN  (cdcr-update-log-synchronizer-80-thread-1) [   ]
> > >>> o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
> > >>>
> > >>> java.lang.OutOfMemoryError: unable to create new native thread
> > >>>
> > >>>  at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
> > >>>
> > >>>  at java.lang.Thread.start(Thread.java:717)
> ~[?:1.8.0_211]
> > >>>
> > >>>  at
> > >>>
> > >>
> >
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> > >>> ~[httpclient-4.5.3.jar:4.5.3]
> > >>>
> > >>>  at
> > >>>
> > >>
> >
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> > >>> ~[httpclient-4.5.3.jar:4.5.3]
> > >>>
> > >>>  at
> > >>>
> > >>
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
>

Re: Cross DC CloudSolr Client

2020-03-29 Thread S G
Is there any good way of having a load-balancer across two SolrCloud
clusters (version 8.x or 7.x) that are in different regions (like Azure
East and Azure West)?
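
(A hedged sketch of the "one CloudSolrClient per DC" approach Erick suggests
in the quoted reply below, with client-side failover between the two regions;
the ZK hosts are placeholders:)

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoDcFailover {
  // One client per cluster; never mix two ZK ensembles in one connect string.
  private final CloudSolrClient east = new CloudSolrClient.Builder(
      Arrays.asList("zk-east:2181"), Optional.empty()).build();
  private final CloudSolrClient west = new CloudSolrClient.Builder(
      Arrays.asList("zk-west:2181"), Optional.empty()).build();

  public QueryResponse query(String collection, SolrQuery q) throws Exception {
    try {
      return east.query(collection, q);  // prefer the local region
    } catch (Exception e) {
      return west.query(collection, q);  // fall back to the other region
    }
  }
}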

Thanks
SG

On Thu, Mar 26, 2020 at 4:53 AM Erick Erickson 
wrote:

> I’ve never even heard of someone trying to put
> different ensembles in the same connection
> string for a single client.
>
> Create N CloudSolrClients, one for each DC.
>
> And why do you want to try to contact individual nodes?
> CloudSolrClient will do that for you.
>
> Best,
> Erick
>
> > On Mar 26, 2020, at 2:38 AM, Lucky Sharma  wrote:
> >
> > Hi all,
> > Just wish to confirm the cross-DC connection situation with the
> > CloudSolrClient.
> > Scenario:
> > We have multiple DCs with the same collection data. Can we add the
> zookeeper
> > connect string of the DC's to the cloud SolrClient.
> >
> > Will it work like this:
> > The client will utilise this connection string to fetch the Solr config
> > from ZK.
> > Reading of the connection string will happen in sequence, i.e. if the first
> > node is available, then that will be used to fetch the
> > ClusterState;
> > if not available, the next node will be used.
> >
> > If we put two ZK clusters in one connection string, how will it behave with
> > two/multiple leaders, since the ZK clients are embedded inside SolrClient?
> > --
> > Warm Regards,
> >
> > Lucky Sharma
> > Contact No :+91 9821559918
>
>


Re: Solrcloud 7.6 OOM due to unable to create native threads

2020-03-29 Thread S G
Is CDCR even recommended to be used in production?
Or was it abandoned before it could become production-ready?

Thanks
SG


On Sun, Mar 29, 2020 at 5:18 AM Erick Erickson 
wrote:

> What that error usually means is that there are a zillion threads running.
>
> Try taking a thread dump. It’s _probable_ that it’s CDCR, but
> take a look at the thread dump to see if you have lots of
> threads that are running. And by “lots” here, I mean 100s of threads
> that reference the same component, in this case that have cdcr in
> the stack trace.
>
> CDCR is not getting active work at this point, you might want to
> consider another replication strategy if you’re not willing to fix
> the code.
>
> Best,
> Erick
>
> > On Mar 29, 2020, at 4:17 AM, Raji N  wrote:
> >
> > Hi All,
> >
> > We are running Solrcloud 7.6 (with the patch
> > https://issues.apache.org/jira/secure/attachment/12969150/SOLR-11724.patch)
> > on production on 7 hosts in containers. The container memory is 48GB, heap
> > is 24GB.
> > ulimit -v
> >
> > unlimited
> >
> > ulimit -m
> >
> > unlimited
> > We don't have any custom code in solr. We have set up  bidirectional CDCR
> > between primary and secondary Datacenter. Our secondary DC is very
> unstable
> > and many times many instances are down.
> >
> > We get below exception quite often. Is this because the CDCR connection
> is
> > broken?
> >
> > WARN  (cdcr-update-log-synchronizer-80-thread-1) [   ]
> > o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
> >
> > java.lang.OutOfMemoryError: unable to create new native thread
> >
> >   at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
> >
> >   at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
> >
> >   at
> >
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> > ~[httpclient-4.5.3.jar:4.5.3]
> >
> >   at
> >
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> > ~[httpclient-4.5.3.jar:4.5.3]
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient.(HttpSolrClient.java:200)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
> > ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> > - nknize - 2018-12-07 14:47:53]
> >
> >   at
> >
> org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
> > [solr-core-7.6.0.jar:7.6.0-SNAPSHOT
> > 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30
> > 14:02:46]
> >
> >   at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > [?:1.8.0_211]
> >
> >   at
> > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > [?:1.8.0_211]
> >
> >   at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > [?:1.8.0_211]
> >
> >   at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > [?:1.8.0_211]
> >
> >   at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > [?:1.8.0_211]
> >
> >   at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > [?:1.8.0_211]
> >
> >   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
> >
> > Thanks,
> > Raji
>
>


Can Solr-Zookeeper traffic be encrypted in 8.x?

2020-03-20 Thread S G
Hi,

Documentation says that this is not supported
https://lucene.apache.org/solr/guide/8_4/enabling-ssl.html#ssl-with-solrcloud

But most of the ZK issues mentioned there are resolved, or are duplicates of
resolved issues.

And ZooKeeper has a page on how to use SSL, with no mention of any issues.
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide

So is it possible to encrypt Solr-ZK communication in 8.4.1 version?
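
(For reference, the ZooKeeper guide above does it all through client-side JVM
system properties; a sketch of what those look like for the 3.5+ Netty client,
with placeholder paths and passwords. Whether the ZK client embedded in Solr
8.4.1 honors them end-to-end is exactly the open question here:)

public class ZkSslClientProps {
  public static void apply() {
    // Client-side SSL settings from the ZooKeeper SSL User Guide (3.5+ Netty client).
    System.setProperty("zookeeper.client.secure", "true");
    System.setProperty("zookeeper.clientCnxnSocket",
        "org.apache.zookeeper.ClientCnxnSocketNetty");
    System.setProperty("zookeeper.ssl.keyStore.location", "/path/to/client-keystore.jks");
    System.setProperty("zookeeper.ssl.keyStore.password", "changeit");
    System.setProperty("zookeeper.ssl.trustStore.location", "/path/to/truststore.jks");
    System.setProperty("zookeeper.ssl.trustStore.password", "changeit");
  }
}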

Thanks
SG


Re: Why does Solr sort on _docid_ with rows=0 ?

2020-03-05 Thread S G
Thanks Hoss. Yes, that jira seems like a good one to fix.
And the variable name definitely does not explain why it will not cause any
sort operation.

-SG

On Mon, Mar 2, 2020 at 10:06 AM Chris Hostetter 
wrote:

> : docid is the natural order of the posting lists, so there is no sorting
> effort.
> : I expect that means “don’t sort”.
>
> basically yes, as documented in the comment right above the lines of code
> linked to.
>
> : > So no one knows this then?
> : > It seems like a good opportunity to get some performance!
>
> The variable name is really stupid, but the 'solrQuery' variable you see
> in the code is *only* ever used for 'checkAZombieServer()' ... which
> should only be called when a server hasn't been responding to other
> (user-initiated) requests
>
> : >> I see a lot of such queries in my Solr 7.6.0 logs:
>
> If you are seeing a lot of those queries, then there are other problems in
> your cluster you should investigate -- that's when/why LBSolrClient does
> this query -- to see if the server is responding.
>
> : >> *path=/select
> : >>
> params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2}
> : >> hits=287128180 status=0 QTime=7173*
>
> that is an abnormally large number of documents to have in a single shard.
>
> : >> If you want to check a zombie server, shouldn't there be a much less
> : >> expensive way to do a health-check instead?
>
> Probably yes -- i've opened SOLR-14298...
>
> https://issues.apache.org/jira/browse/SOLR-14298
>
>
>
> -Hoss
> http://www.lucidworks.com/


Re: Why does Solr sort on _docid_ with rows=0 ?

2020-02-28 Thread S G
So no one knows this then?
It seems like a good opportunity to get some performance!

On Tue, Feb 25, 2020 at 2:01 PM S G  wrote:

> Hi,
>
> I see a lot of such queries in my Solr 7.6.0 logs:
>
>
> *path=/select
> params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2}
> hits=287128180 status=0 QTime=7173*
> On some searching, this seems to be the code that fires the above:
>
> https://github.com/apache/lucene-solr/blob/f80e8e11672d31c6e12069d2bd12a28b92e5a336/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBSolrClient.java#L89-L101
>
> Can someone explain why Solr is doing this?
> Note that "hits" is a very large value and is something which could be
> impacting performance?
>
> If you want to check a zombie server, shouldn't there be a much less
> expensive way to do a health-check instead?
>
> Thanks
> SG
>
>
>
>


Why does Solr sort on _docid_ with rows=0 ?

2020-02-25 Thread S G
Hi,

I see a lot of such queries in my Solr 7.6.0 logs:


*path=/select
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2}
hits=287128180 status=0 QTime=7173*
On some searching, this seems to be the code that fires the above:
https://github.com/apache/lucene-solr/blob/f80e8e11672d31c6e12069d2bd12a28b92e5a336/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBSolrClient.java#L89-L101

Can someone explain why Solr is doing this?
Note that "hits" is a very large value and is something which could be
impacting performance?

If you want to check a zombie server, shouldn't there be a much less
expensive way to do a health-check instead?
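
(For comparison, SolrJ already ships a ping request against the cheap
/admin/ping handler; a rough sketch of what a lighter zombie check could look
like -- this is not what LBSolrClient does today, and the URL is a placeholder:)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class CheapHealthCheck {
  // Hypothetical lighter check: /admin/ping instead of the q=*:*&rows=0 query above.
  static boolean isAlive(String coreUrl) {
    try (SolrClient client = new HttpSolrClient.Builder(coreUrl).build()) {
      SolrPingResponse ping = client.ping();
      return ping.getStatus() == 0;
    } catch (Exception e) {
      return false; // unreachable or erroring server counts as a zombie
    }
  }
}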

Thanks
SG


Does Solr delete child documents with parents?

2019-09-23 Thread S G
Hi,

Last section of
https://lucene.apache.org/solr/guide/8_0/indexing-nested-documents.html is
a little bit confusing:

To delete a nested document, you can delete it by the ID of the root
document. If you try to use an ID of a child document, nothing will happen
since only root document IDs are considered. If you use Solr’s
delete-by-query APIs, you *have to be careful* to ensure that no children
remain of any documents that are being deleted. *Doing otherwise will
violate integrity assumptions that Solr expects.*

So does this mean if we fire a delete-query matching only parent documents,
then that query will automatically delete the child documents?
1) If no, then why do we have to be careful to ensure no children remain of
any documents that are being deleted?
2) If yes, then this statement does not make sense: "If you try to use an
ID of a child document, nothing will happen since only root document IDs
are considered"

What is missing?
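
(One common pattern, assuming the default _root_ field is indexed: every
member of a block carries the root document's ID in _root_, so a
delete-by-query on _root_ removes the parent and all of its children
together, side-stepping the orphaned-children problem. A sketch, with
placeholder URL and ID:)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class BlockDelete {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/my_collection").build()) {
      // _root_ holds the root document's id for the root AND all its children,
      // so this one query deletes the whole block together.
      client.deleteByQuery("_root_:\"parent-42\"");
      client.commit();
    }
  }
}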

Thanks
SG


Re: Discuss: virtual nodes in Solr

2019-07-07 Thread S G
It could be a matter of perspective, but the benefit of going from N shards
to N+k shards is just one, and that benefit is a huge one IMO.

You need not double your hardware when you have to expand your cluster
"without doing a full re-ingestion".
When you have several terabytes of data on a performance-saturated cluster
and you want to scale the cluster for the next 1 TB of data, it is quite
costly to:
1. Go from N shards to 2N shards OR
2. Go from N shards to N+k shards with full re-ingestion of data that can
take more than a week.

Data sources like Cassandra have solved this problem very nicely by
allowing incremental addition of hardware.
- You are neither forced to double your hardware.
- Nor are you forced to reload all your data.
(I know the theory that Solr is not a primary datasource and the user should
be ready to reload etc., but it is time that we begin to add some clauses to
that theory and stop applying it to all contexts, since reloading TBs
of data is long and very painful.)

So the only benefit of this feature is that it will save both money (on
hardware) and time (by avoiding reloading).
And the user will be able to scale for every TB of data by just adding a few
shards, which is very economical.

Hosting more than 1 shard on "some" nodes is not good either, because then
those nodes will not perform very well.
(Note that the problem we had was to scale a performance-saturated cluster
for the next unit of data, like a TB.)

Another great and similar benefit is that it becomes easy to scale for a
burst in data.
Let us say there is a July-sales or Black-Friday event and we expect that
in those two months, the data volume will be much higher.
So an ability to scale shards up and down during and after such events
would again save a lot of money on the hardware and time.

Cheers,
SG


On Sat, Jun 29, 2019 at 11:30 AM Erick Erickson 
wrote:

> Offhand I suspect this would be an enormous effort, not worth the work.
>
> I agree that double-or-nothing is not terribly convenient, but that said
> since multiple replicas can be hosted on the same node and moved to other
> hardware as needed (oversharding, even for existing collections) there are
> ways to deal with this currently.
>
> There would have to be extraordinary benefits to interest me. And the
> stated benefit so far of being able to expand gradually rather than
> doubling shards isn’t an extraordinary benefit. That effort would come at
> the expense of a lot of other work.
>
> Another way of saying it is that the burden of proof for the benefits is
> on you ;).
>
> Best,
> Erick
>
> > On Jun 28, 2019, at 8:51 PM, Will Martin  wrote:
> >
> > From: S G <sg.online.em...@gmail.com>
> > Subject: Discuss: virtual nodes in Solr
> > Date: June 28, 2019 at 8:04:44 PM EDT
> > To: solr-user@lucene.apache.org
> > Reply-To: solr-user@lucene.apache.org
> >
> >
> > Hi,
> >
> > Has Solr tried to use vnodes concept like Cassandra:
> > https://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
> >
> >
> > If this can be implemented carefully, we need not live with just
> > shard-splitting alone that can only double the number of shards.
> > With vnodes, shards can be increased incrementally as the need arises.
> > What's more, shards can be decreased too when the doc-count/traffic
> > decreases.
> >
> > -SG
> >
> > +1
> >
> > Carefully? Deliberate would be a better word with this community; imho.
> How about an incubation epic story PMC?
> >
> >
> >
>
>


Discuss: virtual nodes in Solr

2019-06-28 Thread S G
Hi,

Has Solr tried to use vnodes concept like Cassandra:
https://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

If this can be implemented carefully, we need not live with just
shard-splitting alone that can only double the number of shards.
With vnodes, shards can be increased incrementally as the need arises.
What's more, shards can be decreased too when the doc-count/traffic
decreases.

-SG


Re: ZooKeeper for Solr 7.6

2018-12-18 Thread S G
Why don't you try 3.4.13 instead? That's a version newer than 3.4.12.

On Tue, Dec 18, 2018 at 12:37 AM Yasufumi Mizoguchi 
wrote:

> Thank you Jan.
>
> I will try it.
>
> Thanks,
> Yasufumi.
>
> On Tue, 18 Dec 2018 at 17:21, Jan Høydahl wrote:
>
> > That is no problem, doing it myself.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > On 18 Dec 2018 at 04:34, Yasufumi Mizoguchi <
> yasufumi0...@gmail.com
> > >> wrote:
> > >
> > > Hi
> > >
> > > I am trying Solr 7.6 in SolrCloud mode.
> > > But I found that ZooKeeper 3.4.11 has a critical issue about handling
> > > data/log directories.
> > > (https://issues.apache.org/jira/browse/ZOOKEEPER-2960)
> > >
> > > So, I want to know if using ZooKeeper 3.4.12 with Solr 7.6 is safe.
> > >
> > > Does anyone know this?
> > >
> > > Thanks,
> > > Yasufumi.
> >
> >
>


Can high RF slow down updates?

2018-10-29 Thread S G
Hi,

To support higher select-query-rates, we are planning to increase the
replication factor from 15 to 24.
Will this put too much load on the leader nodes, since each update now has
to be propagated to 24 replica nodes?
Each node is on a different IP but in the same availability region within a
data-center.

So if 100 update requests are coming per second, will it result in RF*100 =
2400 http requests originating per second from the leader? Is there any
async IO happening for all these requests, or is a separate thread launched
for each replica update?

Thanks
SG


KeeperErrorCode = NoNode for /collections/my-valid-collection/state.json

2018-07-27 Thread S G
Hi,


The following error is very commonly seen in Solr.

Does anybody know why that is so?

And is it asking the user to do something about it?


org.apache.solr.common.SolrException:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
= NoNode for /collections/my-valid-collection/state.json
at 
org.apache.solr.handler.admin.ZookeeperInfoHandler$ZKPrinter.writeError(ZookeeperInfoHandler.java:544)
at 
org.apache.solr.handler.admin.ZookeeperInfoHandler$ZKPrinter.printZnode(ZookeeperInfoHandler.java:812)
at 
org.apache.solr.handler.admin.ZookeeperInfoHandler$ZKPrinter.print(ZookeeperInfoHandler.java:526)
at 
org.apache.solr.handler.admin.ZookeeperInfoHandler.handleRequestBody(ZookeeperInfoHandler.java:414)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:735)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:716)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:497)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)



Thanks

SG


Re: Remove schema.xml in favor of managed-schema

2018-06-22 Thread S G
"And that managed-schema will reorder the entries and delete the comments
on first API modification." - This is something very irritating when
comparing files with the default version of Solr to see what has changed.
When upgrading schemas/configs for a new version of Solr, such automatically
removed comments are a giant pain to work with.
This does not mean that managed-schema is less useful, but Solr should try
to preserve the comments and formatting etc. when adding content through
the schema APIs.



On Wed, Jun 20, 2018 at 4:35 PM Walter Underwood 
wrote:

> I strongly prefer the classic config files approach. Our config files are
> checked into
> version control. We update on the fly by uploading new files to Zookeeper,
> then
> reloading the collection. No restart needed.
>
> Pushing changes to prod is straightforward. Check out the tested files,
> load them
> into the prod cluster, reload the collection.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jun 19, 2018, at 9:06 AM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
> >
> > I actually prefer the classic config-files approach over managed schemas.
> > Having done both Elasticsearch (where everything is configed through an
> > API), managed and non-managed Solr, I prefer the legacy non-managed Solr
> > way of doing things when its possible
> >
> > - With 'managed' approaches, the config code often turns into spaghetti
> > throughout the client application, and harder to maintain
> > - The client application is often done in any number of programming
> > languages, client APIs, etc which makes it harder to ramp up new Solr
> devs
> > on how the search engine works
> > - The file-based config can be versioned and deployed as an artifact that
> > only contains config bits relevant to the search engine
> >
> > I know there's a lot of 'it depends'. For example, if I am
> programatically
> > changing config in real-time without wanting to restart the search
> engine,
> > then I can see the benefit to the managed config. Especially a large,
> > complex deployment. But most Solr instances I see are not in the giant,
> > complex to config variety and the config file approach is simplest for
> most
> > teams.
> >
> > At least that's my 2 cents :)
> > -Doug
> >
> >
> > On Tue, Jun 19, 2018 at 11:58 AM Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> And that managed-schema will reorder the entries and delete the
> comments on
> >> first API modification.
> >>
> >> Regards,
> >>Alex
> >>
> >> On Tue, Jun 19, 2018, 11:47 AM Shawn Heisey, 
> wrote:
> >>
> >>> On 6/17/2018 6:48 PM, S G wrote:
> >>>> I only wanted to know if schema.xml offer anything that managed-schema
> >>> does
> >>>> not.
> >>>
> >>> The only difference between the two is that there is a different
> >>> filename and the managed version can be modified by API calls.  The
> >>> schema format and what you can do within that format is identical
> either
> >>> way.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>
> > --
> > CTO, OpenSource Connections
> > Author, Relevant Search
> > http://o19s.com/doug
>
>


Re: Remove schema.xml in favor of managed-schema

2018-06-17 Thread S G
I think my query got misinterpreted.

I only wanted to know if schema.xml offer anything that managed-schema does
not.

Best,
SG


On Sat, Jun 16, 2018 at 6:45 PM Erick Erickson 
wrote:

> Currently, there are no restrictions on hand-editing config files,
> mutable or not.
>
> The rub is that any of the APIs that modify configs operate on their
> in-memory copy and write that out (both Cloud and stand-alone modes).
>
> So if I start Solr, the nodes have image of the configs at time T.
> Now I hand-edit the file(s) and push them to ZooKeeper, say at time T1
> Now I use the API to update them at T2
> At this point, my changes pushed at T1 are lost since the T2 changes
> used the in-memory copies read at time T as a basis for mods.
>
> If I change this even slightly by:
> Start Solr at T
> hand-edit and push at T1 _and reload my collection_
> use the API at T2
> At this point my configs have all the changes T1 and T2 in them since
> the reload re-read the configs.
>
> Ditto if I restart all my Solr instances after T1 but before T2.
>
> That said, how this will change in the future I have no idea. I
> usually hand-edit them but that's a personal problem.
>
> IIRC, at one point, there was one restriction: A mutable schema could
> _not_ be named schema.xml. But whether that's an accurate memory and
> if so whether it's still current I'm not sure about.
>
> And all of _that_ said, hand-editing mutable configs does, indeed,
> violate all sorts of contracts and support may change in the future,
> it's "at your own risk and you better know what you're doing". The
> same could be said for hand-editing the configs in the first place
> though I suppose ;)
>
> Best,
> Erick
>
> On Sat, Jun 16, 2018 at 1:34 PM, Doug Turnbull
>  wrote:
> > I'm not sure changing something from mutable -> immutable means it
> suddenly
> > becomes hand-editable.
> >
> > I don't know the details here, but I can imagine a case where immutable
> > implies some level of consistency, where the file is hashed, and later
> > might be confirmed to still be the same 'immutable' state. Hand editing
> > would violate that contract.
> >
> > One might also imagine a future where 'managed-schema' isn't a config
> file,
> > and is just an API you use to configure a Solr. In this case 'mutable'
> > doesn't imply anything about files, just the state of the Solr config.
> >
> > -Doug
> >
> > On Sat, Jun 16, 2018 at 12:24 AM S G  wrote:
> >
> >> Hi,
> >>
> >> As per
> >>
> >>
> https://lucene.apache.org/solr/guide/7_2/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-Classicschema.xml
> >> ,
> >> the only difference between schema.xml and managed-schema is that one
> >> accepts schema-changes through an API while the other one does not.
> >>
> >> However, there is a flag "mutable" which can be used with managed-schema
> >> too to turn dynamic-changes ON or OFF
> >>
> >> If that is true, then it means schema.xml does not offer anything which
> >> managed-schema does not.
> >>
> >> Is that a valid statement to make?
> >>
> >> Infact, I see that schema.xml is not shipped anymore with Solr ?
> >>
> >> Thanks
> >> SG
> >>
> > --
> > CTO, OpenSource Connections
> > Author, Relevant Search
> > http://o19s.com/doug
>


Remove schema.xml in favor of managed-schema

2018-06-15 Thread S G
Hi,

As per
https://lucene.apache.org/solr/guide/7_2/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-Classicschema.xml,
the only difference between schema.xml and managed-schema is that one
accepts schema-changes through an API while the other one does not.

However, there is a flag "mutable" which can be used with managed-schema
too to turn dynamic-changes ON or OFF

If that is true, then it means schema.xml does not offer anything which
managed-schema does not.

Is that a valid statement to make?

In fact, I see that schema.xml is not shipped anymore with Solr?
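
(For concreteness, the API half of that difference; a hedged SolrJ sketch of
adding a field through the Schema API, which only succeeds when the managed
schema factory is in use with mutable=true. URL and field name are placeholders:)

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class AddFieldExample {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/my_collection").build()) {
      Map<String, Object> field = new LinkedHashMap<>();
      field.put("name", "price");   // illustrative field
      field.put("type", "pfloat");
      field.put("stored", true);
      // Works against a mutable managed-schema; fails with ClassicIndexSchemaFactory.
      SchemaResponse.UpdateResponse rsp = new SchemaRequest.AddField(field).process(client);
      System.out.println("status: " + rsp.getStatus());
    }
  }
}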

Thanks
SG


Re: UUIDUpdateProcessorFactory can cause duplicate documents?

2018-06-09 Thread S G
We do not want to generate the "id" ourselves and hence were looking for
something that would generate the "id" automatically.

UUIDUpdateProcessorFactory documentation says nothing about the
automatic "id" generation process identifying if the document received is
same as an existing document or not.

That means if I send {"color":"red", "size":"L"} once,
UUIDUpdateProcessorFactory will generate an "id" X, and if I send the same
document {"color":"red", "size":"L"} again, UUIDUpdateProcessorFactory will
not know that it's the same document and will generate an "id" Y.

That way, I will end up with two documents:
{"id": X, "color":"red", "size":"L"}
{"id": Y, "color":"red", "size":"L"}

And that situation can only be avoided if I use the
https://wiki.apache.org/solr/Deduplication technique of generating an "id"
based on the signature of some other fields. That will avoid duplication and
auto-generate the "id" field too.

Is that a correct understanding?
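
(A client-side sketch of that signature idea, roughly what the Deduplication
page's SignatureUpdateProcessorFactory does server-side: derive the "id" from
a hash of the content fields, so re-sending the same document overwrites
instead of duplicating. All names here are illustrative:)

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import org.apache.solr.common.SolrInputDocument;

public class ContentHashId {
  static String contentId(String... fieldValues) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    for (String v : fieldValues) {
      md.update(v.getBytes(StandardCharsets.UTF_8));
      md.update((byte) 0); // separator so ("ab","c") and ("a","bc") differ
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) hex.append(String.format("%02x", b));
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("color", "red");
    doc.addField("size", "L");
    // Same content -> same id -> the second index overwrites, no duplicate.
    doc.addField("id", contentId("red", "L"));
  }
}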

Thanks
SG


On Mon, Jun 4, 2018 at 8:44 PM Erick Erickson 
wrote:

> First, your assumption is correct. It would be A Bad Thing if two
> identical UUIDs were generated
>
> Is this SolrCloud? If so, then the deduplication idea won't work. The
> problem is that the uuid is used for routing and there is a decent (1
> - 1/numShards) chance that the two "identical" docs would land on
> different shards, deduplication at the hash level is local to the
> replica.
>
> But why not make the hash of the doc's content the "id" field? Your
> ETL process would generate the hash and stuff it into the "id" field.
> Then in both SolrCloud or stand-alone it would "just work".
>
> Best,
> Erick
>
> On Mon, Jun 4, 2018 at 11:33 AM, Aman Tandon 
> wrote:
> > Hi,
> >
> > Suppose id field is the UUID linked field in the configuration and if
> this
> > is missing in the document coming to index then it will generate a UUID
> and
> > set it in id field. However if id field is present with some value then
> it
> > shouldn't.
> >
> > Kindly refer
> >
> http://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
> >
> >
> > On Mon, Jun 4, 2018, 23:52 S G  wrote:
> >
> >> Hi,
> >>
> >> Is it correct to assume that UUIDUpdateProcessorFactory will produce 2
> >> documents even if the same document is indexed twice without the "id"
> field
> >> ?
> >>
> >> And to avoid such a thing, we can use the technique mentioned in
> >> https://wiki.apache.org/solr/Deduplication ?
> >>
> >> Thanks
> >> SG
> >>
>


UUIDUpdateProcessorFactory can cause duplicate documents?

2018-06-04 Thread S G
Hi,

Is it correct to assume that UUIDUpdateProcessorFactory will produce 2
documents even if the same document is indexed twice without the "id" field?

And to avoid such a thing, we can use the technique mentioned in
https://wiki.apache.org/solr/Deduplication ?

Thanks
SG


Solr 7 or 6 - stability and performance

2018-03-25 Thread S G
Hi,

Solr 7 has been out for about 6 months now. (Sep-2017 to Mar-2018)
We are planning some major upgrades from 6.2 and 6.4 versions of Solr and I
wanted to see how is Solr 7 looking in terms of stability and performance.
(Have seen http://lucene.apache.org/solr/news.html but some real users'
experience would be nice)

1) Has anyone encountered major stability issues that made them move back
to 6.x version?

2) Did anyone see more than 10% change in performance (good or bad)? I know
about https://issues.apache.org/jira/browse/SOLR-11078 and wish trie fields
were still kept in the schema until point fields completely got over the
performance issue.

Thanks
SG


Expose a metric for percentage-recovered during full recoveries

2018-03-14 Thread S G
Hi,

Solr does full recoveries very frequently - sometimes, even for seemingly
simple cases like adding a field to the schema, a couple of nodes go into
recovery.
It would be nice if it did not do such full recoveries so frequently, but
since that may require a lot of fixing, can we have a metric that reports
how much a core has recovered already?

Example:

$ cd data
$ du -h . | grep  my_collection | grep -w index
77G   ./my_collection_shard3_replica2/data/index.20180314184942993
145G ./my_collection_shard3_replica2/data/index.20180112001943687

This shows that the shard3-replica2 core is doing a full recovery and has
only copied 77G out of 145G.
That is about 50% recovery done.


It would be very nice if we could have this as a JMX metric and could then
plot it somewhere instead of having to keep running the same command in a
loop and guessing how much is left to be copied.

A metric like the following would be great:
{
  "my_collection_shard3_replica2": {
    "recovery": {
      "currentSize": "77 gb",
      "expectedSize": "145 gb",
      "percentRecovered": "50",
      "startTimeEpoch": "361273126317"
    }
  }
}

If it looks useful, I will open a JIRA for the same.
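
(Solr's metrics are Dropwizard registries under the hood, so the metric could
plausibly be a few per-core gauges; a pure-illustration sketch of the shape --
nothing like this exists in Solr today, and the metric names are made up:)

import java.util.function.LongSupplier;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class RecoveryGauges {
  // Illustration only: register gauges on a core's Dropwizard MetricRegistry.
  public static void register(MetricRegistry registry,
                              LongSupplier copiedBytes, long expectedBytes) {
    registry.register("recovery.currentSizeBytes",
        (Gauge<Long>) copiedBytes::getAsLong);
    registry.register("recovery.expectedSizeBytes",
        (Gauge<Long>) () -> expectedBytes);
    registry.register("recovery.percentRecovered",
        (Gauge<Double>) () -> 100.0 * copiedBytes.getAsLong() / expectedBytes);
  }
}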

Thanks
SG


Re: Why are cursor mark queries recommended over regular start, rows combination?

2018-03-14 Thread S G
Thanks everybody. This is a lot of good information.
And we should try to update this in the documentation too, to help users
make the right choice.
I can take a stab at this if someone can point me to how to update the
documentation.
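
For reference, the deep-paging pattern I would document for the cursor side
is roughly this SolrJ loop (per the ref guide; the crucial detail is a
deterministic sort with a tiebreak on the uniqueKey):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPaging {
  public static void pageAll(SolrClient client) throws Exception {
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(500);
    q.setSort(SortClause.asc("id")); // uniqueKey tiebreak keeps the order total
    String cursorMark = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
      QueryResponse rsp = client.query(q);
      for (SolrDocument doc : rsp.getResults()) {
        // process doc ...
      }
      String next = rsp.getNextCursorMark();
      if (cursorMark.equals(next)) break; // unchanged mark means we are done
      cursorMark = next;
    }
  }
}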

Thanks
SG


On Tue, Mar 13, 2018 at 2:04 PM, Chris Hostetter 
wrote:

>
> : > 3) Lastly, it is not clear the role of export handler. It seems that
> the
> : > export handler would also have to do exactly the same kind of thing as
> : > start=0 and rows=1000,000. And that again means bad performance.
>
> : <3> First, streaming requests can only return docValues="true"
> : fields. Second, most streaming operations require sorting on something
> : besides score. Within those constraints, streaming will be _much_
> : faster and more efficient than cursorMark. Without tuning I saw 200K
> : rows/second returned for streaming; the bottleneck will be the speed
> : at which the client can read from the network. First of all you only
> : execute one query rather than one query per N rows. Second, in the
> : cursorMark case, to return a document you have to read its stored
> : fields, assuming that any field you return is docValues=false
>
> Just to clarify, there is a big difference between the /export handler
> and "streaming expressions"
>
> Unless something has changed drasticly in the past few releases, the
> /export handler does *NOT* support exporting a full *collection* in solr
> cloud -- it only operates on an individual core (aka: shard/replica).
>
> Streaming expressions is a feature that does work in Cloud mode, and can
> make calls to the /export handler on a replica of each shard in order to
> process the data of an entire collection -- but when doing so it has to
> aggregate *ALL* the results from every shard in memory on the
> coordinating node -- meaning that (in addition to the docvalues caveat)
> streaming expressions requires you to "spend" a lot of RAM usage on one
> node as a trade-off for spending more time & multiple requests to get the
> same data from cursorMark...
>
> https://lucene.apache.org/solr/guide/exporting-result-sets.html
> https://lucene.apache.org/solr/guide/streaming-expressions.html
>
> An additional perk of cursorMark that may be relevant to the OP is that
> you can "stop" tailing a cursor at any time (ie: if you're post-processing
> the results client side and decide you have "enough" results) but a similar
> feature isn't available (AFAICT) from streaming expressions...
>
> https://lucene.apache.org/solr/guide/pagination-of-results.html#tailing-a-cursor
>
>
> -Hoss
> http://www.lucidworks.com/
>


Why are cursor mark queries recommended over regular start, rows combination?

2018-03-12 Thread S G
Hi,

We have use-cases where some queries will return about 100k to 500k records.
As per https://lucene.apache.org/solr/guide/7_2/pagination-of-results.html,
it seems that using start=x, rows=y is a bad combination performance-wise.

1) However, it is not clear to me why the alternative, the "cursor query", is
cheaper or recommended. It would have to run the same kind of workload as
the normal start=x, rows=y combination, no?

2) Also, it is not clear if the cursor query runs on a single shard or
uses the same scatter-gather as regular queries to read from all the shards.

3) Lastly, the role of the export handler is not clear. It seems that the
export handler would also have to do exactly the same kind of thing as
start=0 and rows=1,000,000. And that again means bad performance.

What is the difference between all of the 3?


Thanks
SG


Move the lang-configurations from managed-schema to its own xml file

2018-02-05 Thread S G
Hi,

I think it would be good to move the lang-configurations from
managed-schema to its own xml file as discussed in
https://issues.apache.org/jira/browse/SOLR-11948

What do other people think?

Thanks
SG


Re: 9000+ CLOSE_WAIT connections in solr v6.2.2 causing it to "die"

2018-02-03 Thread S G
Hi Arcadius,

Most of the clients use SolrJ to interact with Solr.
Does it not automatically handle the connection pools?
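
(My understanding, for context: SolrJ pools connections through Apache
HttpClient, but only within a single client instance, so a client created
per request defeats the pool and strands sockets in CLOSE_WAIT. A minimal
sketch of what I assume is the intended usage, with a placeholder URL:)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SolrClientHolder {
  // One long-lived, thread-safe client shared by the whole application.
  private static final SolrClient CLIENT =
      new HttpSolrClient.Builder("http://solr-host:8983/solr/my_collection").build();

  public static SolrClient get() {
    return CLIENT;
  }

  // Call once on application shutdown to release pooled connections.
  public static void shutdown() throws java.io.IOException {
    CLIENT.close();
  }
}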

SG


On Fri, Feb 2, 2018 at 4:47 PM, Arcadius Ahouansou 
wrote:

> I have seen a lot of CLOSE_WAIT in the past.
> In many cases, it was that the client application was not releasing/closing
> or pooling connections properly.
>
> I would suggest you double check the client code first.
>
> Arcadius.
>
> On 2 February 2018 at 23:52, mmb1234  wrote:
>
> > > You said that you're running Solr 6.2.2, but there is no 6.2.2 version.
> > > but the JVM argument list includes "-Xmx512m" which is a 512MB heap
> >
> > My typos. They're 6.6.2 and -Xmx30g respectively.
> >
> > > many open connections causes is a large number of open file handles,
> >
> > solr [ /opt/solr/server/logs ]$ sysctl -a | grep vm.max_map_count
> > vm.max_map_count = 262144
> >
> > The only thing I notice right before the solr shutdown messages in
> > solr.log is that the /update QTime goes from ~500ms to ~25.
> >
> > There is an automated health check that issues a kill on the 
> due
> > to http connection timeout.
> >
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Applied Knowledge Is Power
> Office : +441444702101
> Mobile: +447908761999
> Menelic Ltd: menelic.com
> Visitor Management System: menelicvisitor.com
> ---
>


Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread S G
Our 3.4.6 ZK nodes were unable to join the cluster unless their quorum got
broken.
So if there was a 5-node ZooKeeper ensemble and it lost 2 nodes, they would
not rejoin because ZK still had its quorum.
To make them join, you had to break the quorum by restarting a node in the
quorum.
Only when the quorum broke did ZK realize that something was wrong and it
recognized the other two nodes trying to rejoin.
Also this problem happened only when ZK had been running for a long time,
like several weeks (perhaps DNS caching or something, not sure really).


On Fri, Feb 2, 2018 at 11:32 AM, Tomas Fernandez Lobbe 
wrote:

> Hi Markus,
> If the same code that runs OK in 7.1 breaks in 7.2.1, it is clear to me that
> there is some bug in Solr introduced between those releases (maybe an
> increase in memory utilization? or maybe some decrease in query throughput
> making threads pile up?). I’d hate to have this issue lost in the users
> list, could you create a Jira? Maybe next time you have this issue you can
> post thread/heap dumps, that would be useful.
>
> Tomás
>
> > On Feb 2, 2018, at 9:38 AM, Walter Underwood 
> wrote:
> >
> > Zookeeper 3.4.6 is not good? That was the version recommended by Solr
> docs when I installed 6.2.0.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Feb 2, 2018, at 9:30 AM, Markus Jelsma 
> wrote:
> >>
> >> Hello S.G.
> >>
> >> We have relied on Trie* fields ever since they became available; I
> don't think reverting to the old fieldTypes will do us any good, we have a
> very recent problem.
> >>
> >> Regarding our heap, the cluster ran fine for years with just 1.5 GB, we
> only recently increased it because our data keeps on growing. Heap rarely
> goes higher than 50 %, except when this specific problem occurs. The nodes
> have no problem processing a few hundred QPS continuously and can go on for
> days, sometimes even a few weeks.
> >>
> >> I will keep my eye open for other clues when the problem strikes again!
> >>
> >> Thanks,
> >> Markus
> >>
> >> -Original message-
> >>> From:S G 
> >>> Sent: Friday 2nd February 2018 18:20
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: 7.2.1 cluster dies within minutes after restart
> >>>
> >>> Yeah, definitely check the zookeeper version.
> >>> 3.4.6 is not a good one I know and you can say the same for all the
> >>> versions below it too.
> >>> We have used 3.4.9 with no issues.
> >>> While Solr 7.x uses 3.4.10
> >>>
> >>> Another dimension could be the use or (dis-use) of p-fields like pint,
> >>> plong etc.
> >>> If you are using them, try to revert back to tint, tlong etc
> >>> And if you are not using them, try to use them (Although doing this
> means a
> >>> change from your older config and less likely to help).
> >>>
> >>> Lastly, did I read 2 GB for JVM heap?
> >>> That seems really too less to me for any version of Solr
> >>> We run with 10-16 gb of heap with G1GC collector and new-gen capped at
> 3-4gb
> >>>
> >>>
> >>> On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> >>> wrote:
> >>>
>  Hello Ere,
> 
>  It appears that my initial e-mail [1] got lost in the thread. We don't
>  have GC issues, the cluster that dies occasionally runs, in general,
> smooth
>  and quick with just 2 GB allocated.
> 
>  Thanks,
>  Markus
> 
>  [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
>  within-minutes-after-restart-td4372615.html
> 
>  -Original message-
> > From:Ere Maijala 
> > Sent: Friday 2nd February 2018 8:49
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Markus,
> >
> > I may be stating the obvious here, but I didn't notice garbage
> > collection mentioned in any of the previous messages, so here goes.
> In
> > our experience almost all of the Zookeeper timeouts etc. have been
> > caused by too long garbage collection pauses. I've summed up my
> > observations here:
> >  msg135857.html
> >
> >
> > So, in my experience it's relatively easy to cause heavy memory usage
> > with SolrCloud with seemingly innocent queries, and GC can become a
> > problem really quickly even if everything seems to be running
> smoothly
> > otherwise.
> >
> > Regards,
> > Ere
> >
> > On 31.1.2018 at 23:56, Markus Jelsma wrote:
> >> Hello S.G.
> >>
> >> We do not complain about speed improvements at all, it is clear 7.x
> is
>  faster than its predecessor. The problem is stability and not
> recovering
>  from weird circumstances. In general, it is our high load cluster
>  containing user interaction logs that suffers the most. Our main text
>  search cluster - receiving much fewer queries - seems mostly
> unaffected,
>  except last Sunday. After very

Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread S G
Yeah, definitely check the zookeeper version.
3.4.6 is not a good one, I know, and you can say the same for all the
versions below it too.
We have used 3.4.9 with no issues.
While Solr 7.x uses 3.4.10

Another dimension could be the use or (dis-use) of p-fields like pint,
plong etc.
If you are using them, try to revert back to tint, tlong etc
And if you are not using them, try to use them (although doing this means a
change from your older config and is less likely to help).

Lastly, did I read 2 GB for JVM heap?
That seems really too little to me for any version of Solr.
We run with 10-16 GB of heap with the G1GC collector and new-gen capped at 3-4 GB.


On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma 
wrote:

> Hello Ere,
>
> It appears that my initial e-mail [1] got lost in the thread. We don't
> have GC issues, the cluster that dies occasionally runs, in general, smooth
> and quick with just 2 GB allocated.
>
> Thanks,
> Markus
>
> [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
> within-minutes-after-restart-td4372615.html
>
> -Original message-
> > From:Ere Maijala 
> > Sent: Friday 2nd February 2018 8:49
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Markus,
> >
> > I may be stating the obvious here, but I didn't notice garbage
> > collection mentioned in any of the previous messages, so here goes. In
> > our experience almost all of the Zookeeper timeouts etc. have been
> > caused by too long garbage collection pauses. I've summed up my
> > observations here:
> >  >
> >
> > So, in my experience it's relatively easy to cause heavy memory usage
> > with SolrCloud with seemingly innocent queries, and GC can become a
> > problem really quickly even if everything seems to be running smoothly
> > otherwise.
> >
> > Regards,
> > Ere
> >
> > Markus Jelsma wrote on 31.1.2018 at 23.56:
> > > Hello S.G.
> > >
> > > We do not complain about speed improvements at all, it is clear 7.x is
> faster than its predecessor. The problem is stability and not recovering
> from weird circumstances. In general, it is our high load cluster
> containing user interaction logs that suffers the most. Our main text
> search cluster - receiving much fewer queries - seems mostly unaffected,
> except last Sunday. After a very short but high burst of queries it entered
> the same catatonic state the logs cluster usually dies from.
> > >
> > > The query burst immediately caused ZK timeouts and high heap
> consumption (not sure which came first of the latter two). The query burst
> lasted for 30 minutes, the excessive heap consumption continued for more
> than 8 hours, before Solr finally realized it could relax. Most remarkable
> was that Solr recovered on its own, ZK timeouts stopped, heap went back to
> normal.
> > >
> > > There seems to be a causality between high load and this state.
> > >
> > > We really want to get this fixed for ourselves and everyone else that
> may encounter this problem, but i don't know how, so i need much more
> feedback and hints from those who have a deep understanding of the inner
> workings of SolrCloud and the changes since 6.x.
> > >
> > > To be clear, we don't have the problem of 15 second ZK timeout, we use
> 30. Is 30 too low still? Is it even remotely related to this problem? What
> does load have to do with it?
> > >
> > > We are not able to reproduce it in lab environments. It can take
> minutes after cluster startup for it to occur, but also days.
> > >
> > > I've been slightly annoyed by problems that can occur in a broad time
> > > span; it is always bad luck for reproduction.
> > >
> > > Any help getting further is much appreciated.
> > >
> > > Many thanks,
> > > Markus
> > >
> > > -Original message-
> > >> From:S G 
> > >> Sent: Wednesday 31st January 2018 21:48
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: 7.2.1 cluster dies within minutes after restart
> > >>
> > >> We did some basic load testing on our 7.1.0 and 7.2.1 clusters.
> > >> And that came out all right.
> > >> We saw a performance increase of about 30% in read latencies between
> 6.6.0
> > >> and 7.1.0
> > >> And then we saw a performance degradation of about 10% between 7.1.0
> and
> > >> 7.2.1 in many metrics.
> > >> But overall, it still seems better than 6.6.0.
> > >>
> > >> I will check for the errors too in the logs but the nodes were
> responsive
> > >> for all the 23+ hours we did the load test.
> > >>
> > >> Disclaimer: We do not test facets and pivots or block-joins. And will
> add
> > >> those features to our load-testing tool sometime this year.
> > >>
> > >> Thanks
> > >> SG
> > >>
> > >>
> > >> On Wed, Jan 31, 2018 at 3:12 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> > >> wrote:
> > >>
> > >>> Ah thanks, i just submitted a patch fixing it.
> > >>>
> > >>> Anyway, in the end it appears this is not the problem we are seeing
> as our
> > >>> timeouts were already a

Re: 7.2.1 cluster dies within minutes after restart

2018-02-01 Thread S G
ok, good to know that 7.x shows good performance for you too.

1) Regarding the zookeeper problem, do you know for sure that it does not
occur in 6.x ?
 I would suggest to write a small load-test that can send a similar
kind of load to 6.x and 7.x clusters and see which one breaks.
 I know that these kinds of problems can take days to occur, but without
a reproducible pattern, it may be hard to fix.

2) Another thing is the zookeeper version.
7.x uses 3.4.10 version of zookeeper (See
https://github.com/apache/lucene-solr/blob/branch_7_2/lucene/ivy-versions.properties#L192
)
If you are using 3.4.10, try using 3.4.9 or vice versa.
Do not use zookeeper versions lower than 3.4.9 - they have some nasty
bugs.

3) Do take a look at the zookeeper cluster too.
ZK has 4-letter commands like ruok, srvr etc. that reveal a lot of its
internal activity (examples just after this list).

4) Hopefully, you are not doing anything cross-DC as that could cause
network delays and cause such problems.

5) As far as I can remember, we have seen some zookeeper issues, but they
were generally related to the 3.4.6 version or to
VMs getting replaced in a cloud environment and the IPs not getting
refreshed in the ZK configs.
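
To make point 3 concrete, the 4-letter words can be sent to each ZK node
with nc; hostname and port here are hypothetical:

echo ruok | nc zk1.example.com 2181   # answers "imok" if the node is alive
echo srvr | nc zk1.example.com 2181   # version, latency stats, connections
echo mntr | nc zk1.example.com 2181   # key=value dump, easy to graph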

That's all I could think of from a user's perspective  --\_(0.0)_/--

Thanks
SG



On Wed, Jan 31, 2018 at 1:56 PM, Markus Jelsma 
wrote:

> Hello S.G.
>
> We do not complain about speed improvements at all, it is clear 7.x is
> faster than its predecessor. The problem is stability and not recovering
> from weird circumstances. In general, it is our high load cluster
> containing user interaction logs that suffers the most. Our main text
> search cluster - receiving much fewer queries - seems mostly unaffected,
> except last Sunday. After a very short but high burst of queries it entered
> the same catatonic state the logs cluster usually dies from.
>
> The query burst immediately caused ZK timeouts and high heap consumption
> (not sure which came first of the latter two). The query burst lasted for
> 30 minutes, the excessive heap consumption continued for more than 8 hours,
> before Solr finally realized it could relax. Most remarkable was that Solr
> recovered on its own, ZK timeouts stopped, heap went back to normal.
>
> There seems to be a causality between high load and this state.
>
> We really want to get this fixed for ourselves and everyone else that may
> encounter this problem, but i don't know how, so i need much more feedback
> and hints from those who have deep understanding of inner working of
> Solrcloud and changes since 6.x.
>
> To be clear, we don't have the problem of 15 second ZK timeout, we use 30.
> Is 30 too low still? Is it even remotely related to this problem? What does
> load have to do with it?
>
> We are not able to reproduce it in lab environments. It can take minutes
> after cluster startup for it to occur, but also days.
>
> I've been slightly annoyed by problems that can occur in a broad time
> span; it is always bad luck for reproduction.
>
> Any help getting further is much appreciated.
>
> Many thanks,
> Markus
>
> -Original message-
> > From:S G 
> > Sent: Wednesday 31st January 2018 21:48
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > We did some basic load testing on our 7.1.0 and 7.2.1 clusters.
> > And that came out all right.
> > We saw a performance increase of about 30% in read latencies between
> 6.6.0
> > and 7.1.0
> > And then we saw a performance degradation of about 10% between 7.1.0 and
> > 7.2.1 in many metrics.
> > But overall, it still seems better than 6.6.0.
> >
> > I will check for the errors too in the logs but the nodes were responsive
> > for all the 23+ hours we did the load test.
> >
> > Disclaimer: We do not test facets and pivots or block-joins. And will add
> > those features to our load-testing tool sometime this year.
> >
> > Thanks
> > SG
> >
> >
> > On Wed, Jan 31, 2018 at 3:12 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Ah thanks, i just submitted a patch fixing it.
> > >
> > > Anyway, in the end it appears this is not the problem we are seeing as
> our
> > > timeouts were already at 30 seconds.
> > >
> > > All i know is that at some point nodes start to lose ZK connections
> due to
> > > timeouts (logs say so, but all within 30 seconds), the logs are flooded
> > > with those messages:
> > > o.a.z.ClientCnxn Client session timed out, have not heard from server
> in
> > > 10359ms for sessionid 0x160f9e723c12122
> > > o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
> > > 0x60f9e7234f05bb has expired
> > >
> > > Then there is a doubling in heap usage and nodes become unresponsive,
> die
> > > etc.
> > >
> > > We also see those messages in other collections, but not so frequently
> and
> > > they don't cause failure in those less loaded clusters.
> > >
> > > Ideas?
> > >
> > > Thanks,
> > > Markus
> > >
> > > -Original message-
> > > > From:Mic

Re: Long GC Pauses

2018-01-31 Thread S G
Hey Maulin,

I hope you are using some tools to look at your gc.log file (there are a
couple available online) or grepping for pauses.
Do you mind sharing your G1GC settings and some screenshots from your
gc.log analyzer's output?
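
If you are grepping, something like this is a quick way to pull out the
worst stop-the-world pauses, assuming -XX:+PrintGCApplicationStoppedTime
is enabled (log path hypothetical):

grep -o 'stopped: [0-9.]* seconds' gc.log | sort -k2 -rn | head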

-SG


On Wed, Jan 31, 2018 at 9:16 AM, Erick Erickson 
wrote:

> Just to double check, when you say you're seeing 60-200 sec GC pauses
> are you looking at the GC logs (or using some kind of monitor) or is
> that the time it takes the query to respond to the client? Because a
> single GC pause that long on 40G is unusual no matter what. Another
> take on Jason's question is
> For all the JVMs you're running, how much _total_ heap is allocated?
> And how much physical memory is on the box? I generally start with _at
> least_ half the memory left to the OS
>
> These are fairly horrible, what generates such queries?
> AND * AND *
>
> Best,
> Erick
>
>
>
> On Wed, Jan 31, 2018 at 7:28 AM, Jason Gerlowski 
> wrote:
> > Hi Maulin,
> >
> > To clarify, when you said "...allocated 40 GB RAM to each shard." above,
> > I'm going to assume you meant "to each node" instead.  If you actually
> did
> > mean "to each shard" above, please correct me and anyone who chimes in
> > afterward.
> >
> > Firstly, it's really hard to even take guesses about potential causes or
> > remediations without more details about your load characteristics
> > (average/peak QPS, index size, average document size, etc.).  If no one
> > gives any satisfactory advice, please consider uploading additional
> details
> > to help us help you.
> >
> > Secondly, I don't know anything about the load characteristics you're
> > putting on your Solr cluster, but I'm curious whether you've experimented
> > with lower RAM settings.  Generally speaking, the more RAM you have, the
> > longer your GC pauses are likely to be (even with the tuning that various
> > GC settings provide).  If you can get away with giving the Solr process
> > less RAM, you should see your GC pauses shrink.  Was 40GB chosen after
> some
> > trial-and-error experimentation, or is it something you could
> investigate?
> >
> > For a bit more overview on this, see this slightly outdated (but still
> > useful) wiki page: https://wiki.apache.org/solr/
> SolrPerformanceProblems#RAM
> >
> > Hope that helps, even if just to disqualify some potential
> causes/solutions
> > to close in on a real fix.
> >
> > Best,
> >
> > Jason
> >
> > On Wed, Jan 31, 2018 at 8:17 AM, Maulin Rathod 
> wrote:
> >
> >> Hi,
> >>
> >> We are using solr cloud 6.1. We have around 20 collection on 4 nodes (We
> >> have 2 shards and each shard have 2 replicas). We have allocated 40 GB
> RAM
> >> to each shard.
> >>
> >> Intermittently we found long GC pauses (60 sec to 200 sec) due to which
> >> solr stops responding and hence collections goes in recovering mode. It
> >> takes minimum 5-10 minutes (sometime it takes more and we have to
> restart
> >> the solr node) for recovering all collections. We are using default GC
> >> setting (CMS) as per solr.cmd.
> >>
> >> We tried different G1 GC settings to see if they help, but still we see
> >> long GC pauses (60 sec to 200 sec) and also found that memory usage is
> >> higher in the case of G1 GC.
> >>
> >> What could be the reason for long GC pauses and how can we fix it?
> >> Insufficient memory, a problem with the GC settings, or something else?
> >> Any suggestion would be greatly appreciated.
> >>
> >> In our analysis, we also found some inefficient queries (which use * many
> >> times in the query) in the Solr logs. Could that be a reason for high
> >> memory usage?
> >>
> >> Slow Query
> >> --
> >>
> >> INFO  (qtp1239731077-498778) [c:documents s:shard1 r:core_node1
> >> x:documents] o.a.s.c.S.Request [documents]  webapp=/solr path=/select
> >> params={df=summary&distrib=false&fl=id&shards.purpose=4&
> >> start=0&fsv=true&sort=description+asc,id+desc&fq=&shard.url=
> >> s1.asite.com:8983/solr/documents|s1r1.asite.com:
> >> 8983/solr/documents&rows=250&version=2&q=((id:(
> >> REV78364_24705418+REV78364_24471492+REV78364_24471429+
> >> REV78364_24470771+REV78364_24470271+))+OR+summary:((HPC*+
> >> AND+*+AND+*+AND+OH1150*+AND+*+AND+*+AND+U0*+AND+*+AND+*+AND+
> >> HGS*+AND+*+AND+*+AND+MDL*+AND+*+AND+*+AND+100067*+AND+*+AND+
> >> -*+AND+Reinforcement*+AND+*+AND+Mode*)+))++AND++(title:((*
> >> HPC\+\-\+OH1150\+\-\+U0\+\-\+HGS\+\-\+MDL\+\-\+100067\+-\+
> >> Reinforcement\+Mode*)+))+AND+project_id:(-2+78243+78365+
> >> 78364)+AND+is_active:true+AND+((isLatest:(true)+AND+
> >> isFolderActive:true+AND+isXref:false+AND+-document_
> >> type_id:(3+7)+AND+((is_public:true+OR+distribution_list:
> >> 4858120+OR+folderadmin_list:4858120+OR+author_user_id:
> >> 4858120)+AND+((defaultAccess:(true)+OR+allowedUsers:(
> >> 4858120)+OR+allowedRoles:(6342201+172408+6336860)+OR+
> >> combinationUsers:(4858120))+AND+-blockedUsers:(4858120
> >> +OR+(isLatestRevPrivate:(true)+AND+allowedUsersForPvtRev:(
> >> 4858120)+AND+-folderadmin_list:(4858120)))&shards.tolerant=true&NOW=
> >> 1516786

Re: 7.2.1 cluster dies within minutes after restart

2018-01-31 Thread S G
We did some basic load testing on our 7.1.0 and 7.2.1 clusters.
And that came out all right.
We saw a performance increase of about 30% in read latencies between 6.6.0
and 7.1.0
And then we saw a performance degradation of about 10% between 7.1.0 and
7.2.1 in many metrics.
But overall, it still seems better than 6.6.0.
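
For anyone who wants a crude version of this check without our tool, a
simple curl loop gives a first-order read on query latencies (URL and
collection name are hypothetical):

for i in $(seq 1 100); do
  curl -s -o /dev/null -w '%{time_total}\n' \
    'http://localhost:8983/solr/mycoll/select?q=*:*&rows=10'
done | sort -rn | head   # worst response times, in seconds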

I will check for the errors too in the logs but the nodes were responsive
for all the 23+ hours we did the load test.

Disclaimer: We do not test facets and pivots or block-joins. And will add
those features to our load-testing tool sometime this year.

Thanks
SG


On Wed, Jan 31, 2018 at 3:12 AM, Markus Jelsma 
wrote:

> Ah thanks, i just submitted a patch fixing it.
>
> Anyway, in the end it appears this is not the problem we are seeing as our
> timeouts were already at 30 seconds.
>
> All i know is that at some point nodes start to lose ZK connections due to
> timeouts (logs say so, but all within 30 seconds), the logs are flooded
> with those messages:
> o.a.z.ClientCnxn Client session timed out, have not heard from server in
> 10359ms for sessionid 0x160f9e723c12122
> o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
> 0x60f9e7234f05bb has expired
>
> Then there is a doubling in heap usage and nodes become unresponsive, die
> etc.
>
> We also see those messages in other collections, but not so frequently and
> they don't cause failure in those less loaded clusters.
>
> Ideas?
>
> Thanks,
> Markus
>
> -Original message-
> > From:Michael Braun 
> > Sent: Monday 29th January 2018 21:09
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Believe this is reported in https://issues.apache.org/
> jira/browse/SOLR-10471
> >
> >
> > On Mon, Jan 29, 2018 at 2:55 PM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello SG,
> > >
> > > The default in solr.in.sh is commented so it defaults to the value
> set in
> > > bin/solr, which is fifteen seconds. Just uncomment the setting in
> > > solr.in.sh and your timeout will be thirty seconds.
> > >
> > > For Solr itself to really default to thirty seconds, Solr's bin/solr
> needs
> > > to be patched to use the correct value.
> > >
> > > Regards,
> > > Markus
> > >
> > > -Original message-
> > > > From:S G 
> > > > Sent: Monday 29th January 2018 20:15
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > >
> > > > Hi Markus,
> > > >
> > > > We are in the process of upgrading our clusters to 7.2.1 and I am not
> > > sure
> > > > I quite follow the conversation here.
> > > > Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher
> > > value
> > > > in the config (and it's just a default value being wrong/overridden
> > > > somewhere)?
> > > > Or is it more severe in the sense that any config set for
> > > ZK_CLIENT_TIMEOUT
> > > > by the user is just ignored completely by Solr in 7.2.1 ?
> > > >
> > > > Thanks
> > > > SG
> > > >
> > > >
> > > > On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma <
> > > markus.jel...@openindex.io>
> > > > wrote:
> > > >
> > > > > Ok, i applied the patch and it is clear the timeout is 15000. Solr.xml
> > > > > says 30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
> > > > > solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> > > > > still 15000, not 30000.
> > > > >
> > > > > But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> > > > > To be sure, i applied your patch to a production machine, all our
> > > > > collections run with 30000. So how would that explain this log line?
> > > > >
> > > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> server
> > > in
> > > > > 22130ms
> > > > >
> > > > > We also see these with smaller values, seven seconds. And, is this
> > > > > actually an indicator of the problems we have?
> > > > >
> > > > > Any ideas?
> > > > >
> > > > > Many thanks,
> > > > > Markus
> > > > >
> > > > >
> > > > > -Original message-
> > > > > > From:Markus Jelsma 
> > > > > > Sent: Saturday 27th January 2018 10:03
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: RE: 7.2.1 cluster dies within minutes after restart
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I grepped for it yesterday and found nothing but 30000 in the
> > > > > > settings,
> > > > > but judging from the weird time out value, you may be right. Let me
> > > apply
> > > > > your patch early next week and check for spurious warnings.
> > > > > >
> > > > > > Another noteworthy observation for those working on cloud stability
> > > > > > and recovery: whenever this happens, some nodes are also absolutely
> > > > > > sure to run OOM. The leaders usually live longest, the replicas
> > > > > > don't; their heap usage peaks every time, consistently.
> > > > > >
> > > > > > Thanks,
> > > > > > Markus
> > > > > >
> > > > > > -Original

Re: 7.2.1 cluster dies within minutes after restart

2018-01-29 Thread S G
Hi Markus,

We are in the process of upgrading our clusters to 7.2.1 and I am not sure
I quite follow the conversation here.
Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher value
in the config (and it's just a default value being wrong/overridden
somewhere)?
Or is it more severe in the sense that any config set for ZK_CLIENT_TIMEOUT
by the user is just ignored completely by Solr in 7.2.1 ?
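
In other words, is uncommenting/overriding the variable in solr.in.sh, as
below, supposed to be enough (value hypothetical, in milliseconds)?

ZK_CLIENT_TIMEOUT="30000"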

Thanks
SG


On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma 
wrote:

> Ok, i applied the patch and it is clear the timeout is 15000. Solr.xml
> says 30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
> solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> still 15000, not 30000.
>
> But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> To be sure, i applied your patch to a production machine, all our
> collections run with 30000. So how would that explain this log line?
>
> o.a.z.ClientCnxn Client session timed out, have not heard from server in
> 22130ms
>
> We also see these with smaller values, seven seconds. And, is this
> actually an indicator of the problems we have?
>
> Any ideas?
>
> Many thanks,
> Markus
>
>
> -Original message-
> > From:Markus Jelsma 
> > Sent: Saturday 27th January 2018 10:03
> > To: solr-user@lucene.apache.org
> > Subject: RE: 7.2.1 cluster dies within minutes after restart
> >
> > Hello,
> >
> > I grepped for it yesterday and found nothing but 30000 in the settings,
> but judging from the weird time out value, you may be right. Let me apply
> your patch early next week and check for spurious warnings.
> >
> > Another noteworthy observation for those working on cloud stability and
> recovery: whenever this happens, some nodes are also absolutely sure to run
> OOM. The leaders usually live longest, the replicas don't; their heap
> usage peaks every time, consistently.
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> > > From:Shawn Heisey 
> > > Sent: Saturday 27th January 2018 0:49
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > >
> > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> server in 22130ms (although zkClientTimeOut is 30000).
> > >
> > > Are you absolutely certain that there is a setting for zkClientTimeout
> > > that is actually getting applied?  The default value in Solr's example
> > > configs is 30 seconds, but the internal default in the code (when no
> > > configuration is found) is still 15.  I have confirmed this in the
> code.
> > >
> > > Looks like SolrCloud doesn't log the values it's using for things like
> > > zkClientTimeout.  I think it should.
> > >
> > > https://issues.apache.org/jira/browse/SOLR-11915
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Re: SOLR Data Backup

2018-01-19 Thread S G
Another option is to have CDCR enabled for Solr and replicate your data to
another Solr cluster continuously.

BTW, why do we not recommend having Solr as a source of truth?

On Thu, Jan 18, 2018 at 4:08 AM, Florian Gleixner  wrote:

> Am 18.01.2018 um 10:21 schrieb Wael Kader:
> > Hello,
> >
> > Whats the best way to do a backup of the SOLR data.
> > I have a single node solr server and I want to always keep a copy of the
> > data I have.
> >
> > Is replication an option for what I want ?
> >
> > I would like to get some tutorials and papers if possible on the method
> > that should be used in case its backup or replication or anything else.
> >
>
> The reference manual will help you:
>
>
> https://lucene.apache.org/solr/guide/6_6/making-and-
> restoring-backups.html#standalone-mode-backups
>
>
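
For a single node, the standalone backup that link describes boils down to
one call against the replication handler; core name and location below are
hypothetical:

curl 'http://localhost:8983/solr/mycore/replication?command=backup&location=/backups&name=nightly'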


Re: Adding a child doc incrementally

2018-01-19 Thread S G
Restriction to a single shard seems like a big limitation for us.
Also, I was hoping that this was something Solr provided out of the box.
(Like
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
)

Something like:

{
 "id":"parents-id",
 "price":{"set":99},
 "popularity":{"inc":20},
 "children": {"add": {child document(s)}}
}

or something like:
{
 "id":"child-id",
 "parentId": "parents-id"
 ... normal fields of the child ...
 "operationType": "add | delete"
}

In both cases, Solr could just look at the parent's ID, route the
document to the correct shard and add the child to the parent to create the
full nested document (as in block join); that would be ideal.
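
As a trivial one-hop sketch of the co-location idea Gus describes below
(field names hypothetical), a parent and its direct children can already
be pulled back together in one query; the graph parser generalizes this
to arbitrary depth:

curl 'http://localhost:8983/solr/mycoll/select' \
  --data-urlencode 'q=id:parents-id OR parentId:parents-id'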

Thanks
SG





On Wed, Jan 17, 2018 at 9:58 PM, Gus Heck  wrote:

> If the document routing can be arranged such that the children and the
> parent are always co-located in the same shard, and share an identifier,
> the graph query can pull back the parent plus any arbitrary number of
> "children" that have been added at any time in any order. In this scheme
> "children" are just things that match your graph query... (
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-
> GraphQueryParser)
> However, if your query has to cross shards, that won't work (yet...
> https://issues.apache.org/jira/browse/SOLR-11384).
>
> More info here:
> https://www.slideshare.net/lucidworks/solr-graph-query-
> presented-by-kevin-watters-kmw-technology
>
> On Mon, Jan 15, 2018 at 2:09 PM, S G  wrote:
>
> > Hi,
> >
> > We have a use-case where a single document can contain thousands of child
> > documents.
> > However, I could not find any way to do it incrementally.
> > > The only way is to read the full document from Solr, add the new child
> > > document to it and then re-index the full document with all of its child
> > > documents again.
> > > This causes a lot of reads from Solr just to form the document with one
> > > extra document.
> > > Ideally, I would have liked to send only the parent-ID and the
> > > child-document as part of an "incremental update" command to Solr.
> >
> > Is there a way to incrementally add a child document to a parent
> document?
> >
> > Thanks
> > SG
> >
>
>
>
> --
> http://www.the111shift.com
>


Adding a child doc incrementally

2018-01-15 Thread S G
Hi,

We have a use-case where a single document can contain thousands of child
documents.
However, I could not find any way to do it incrementally.
The only way is to read the full document from Solr, add the new child document
to it and then re-index the full document with all of its child documents
again.
This causes a lot of reads from Solr just to form the document with one extra
document.
Ideally, I would have liked to send only the parent-ID and the
child-document as part of an "incremental update" command to Solr.

Is there a way to incrementally add a child document to a parent document?

Thanks
SG


Re: regarding exposing merge metrics

2018-01-10 Thread S G
Last comment by Shawn on SOLR-10130 is:
Metrics was just a theory, sounds like that's not it.

It would be very interesting to know what really caused the slowdown and
whether we really need the config or not.

Thanks
SG



On Tue, Jan 9, 2018 at 12:00 PM, suresh pendap 
wrote:

> Thanks Shalin for sharing the link. However if I follow the thread then it
> seems like there was no conclusive evidence found that the performance
> degradation was due to the merge or index related metrics.
> If that is the case then can we just get rid of the config and publish
> these metrics by default?
>
>
> Regards
> suresh
>
>
>
> On Mon, Jan 8, 2018 at 10:25 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > The merge metrics were enabled by default in 6.4 but they were turned
> > off in 6.4.2 because of large performance degradations. For more
> > details, see https://issues.apache.org/jira/browse/SOLR-10130
> >
> > On Tue, Jan 9, 2018 at 9:11 AM, S G  wrote:
> > > Yes, this is actually confusing and the documentation (
> > > https://lucene.apache.org/solr/guide/7_2/metrics-reporting.html) does
> > not
> > > help either:
> > >
> > > *Index Merge Metrics* : These metrics are collected in respective
> > > registries for each core (e.g., solr.core.collection1...), under the
> INDEX
> > > category.
> > > Basic metrics are always collected - collection of additional metrics
> can
> > > be turned on using boolean parameters in the
> /config/indexConfig/metrics.
> > >
> > > However, we do not see the merge-metrics being collected if the above
> > > config is absent. So what basic metrics are always collected for merge?
> > > And why do the merge metrics require an additional config while most of
> > the
> > > others are reported directly?
> > >
> > > Thanks
> > > SG
> > >
> > >
> > >
> > > On Mon, Jan 8, 2018 at 2:02 PM, suresh pendap  >
> > > wrote:
> > >
> > >> Hi,
> > >> I am following the instructions from
> > >> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
> > >>  in order to expose the Index merge related metrics.
> > >>
> > >> The document says that we have to add the below snippet in order to
> > expose
> > >> the merge metrics
> > >>
> > >> <config>
> > >>   ...
> > >>   <indexConfig>
> > >>     <metrics>
> > >>       <majorMergeDocs>524288</majorMergeDocs>
> > >>       <bool name="mergeDetails">true</bool>
> > >>     </metrics>
> > >>     ...
> > >>   </indexConfig>
> > >> ...
> > >>
> > >>
> > >>
> > >> I would like to know why this metric is not exposed by default just like
> > >> all the other metrics?
> > >>
> > >> Is there any performance overhead that we should be concerned about?
> > >>
> > >> If there was no particular reason, then can we expose it by default?
> > >>
> > >>
> > >>
> > >> Regards
> > >> Suresh
> > >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>


Re: regarding exposing merge metrics

2018-01-08 Thread S G
Yes, this is actually confusing and the documentation (
https://lucene.apache.org/solr/guide/7_2/metrics-reporting.html) does not
help either:

*Index Merge Metrics* : These metrics are collected in respective
registries for each core (e.g., solr.core.collection1...), under the INDEX
category.
Basic metrics are always collected - collection of additional metrics can
be turned on using boolean parameters in the /config/indexConfig/metrics.

However, we do not see the merge-metrics being collected if the above
config is absent. So what basic metrics are always collected for merge?
And why do the merge metrics require an additional config while most of the
others are reported directly?
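
For the record, we are checking with the metrics API along these lines
(collection name hypothetical), and the merge metrics do not show up
without the extra config:

curl 'http://localhost:8983/solr/admin/metrics?group=core&prefix=INDEX'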

Thanks
SG



On Mon, Jan 8, 2018 at 2:02 PM, suresh pendap 
wrote:

> Hi,
> I am following the instructions from
> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
>  in order to expose the Index merge related metrics.
>
> The document says that we have to add the below snippet in order to expose
> the merge metrics
>
> <config>
>   ...
>   <indexConfig>
>     <metrics>
>       <majorMergeDocs>524288</majorMergeDocs>
>       <bool name="mergeDetails">true</bool>
>     </metrics>
>     ...
>   </indexConfig>
> ...
>
>
>
> I would like to know why this metric is not exposed by default just like
> all the other metrics?
>
> Is there any performance overhead that we should be concerned about?
>
> If there was no particular reason, then can we expose it by default?
>
>
>
> Regards
> Suresh
>


Re: 7.1.0 weird messages bad core before recovery

2018-01-05 Thread S G
>
> Never seen it before, bug? Already fixed?


I have seen it many times before in almost all Solr versions.
I do not remember the exact stack trace though.
Generally a restart fixes the problem (like with almost all software :)



On Fri, Jan 5, 2018 at 6:57 AM, Markus Jelsma 
wrote:

> Any on this?
>
> Thanks,
> Markus
>
>
>
> -Original message-
> > From:Markus Jelsma 
> > Sent: Wednesday 27th December 2017 11:11
> > To: Solr-user 
> > Subject: 7.1.0 weird messages bad core before recovery
> >
> > Hello,
> >
> > I just had a bad core that needed recovery after restart, first it told
> me this:
> >
> > org.apache.solr.common.SolrException: Unable to locate core
> logs_shard1_replica1
> >   at org.apache.solr.handler.admin.CoreAdminOperation.lambda$
> static$5(CoreAdminOperation.java:150)
> >   at org.apache.solr.handler.admin.CoreAdminOperation.execute(
> CoreAdminOperation.java:384)
> >   at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.
> call(CoreAdminHandler
> > 
> >
> > then:
> >
> > Error while trying to recover. core=logs_shard1_replica1:
> java.lang.NullPointerException
> >   at org.apache.solr.update.PeerSync.alreadyInSync(
> PeerSync.java:391)
> >   at org.apache.solr.update.PeerSync.sync(PeerSync.java:253)
> >   
> >
> > and:
> >
> > Recovery failed - trying again... (0)
> >
> > Before finally starting recovery which was successful. Never seen it
> before, bug? Already fixed?
> >
> > Thanks,
> > Markus
> >
>


Re: New replica types

2018-01-02 Thread S G
AFAIK, the tlog file is truncated with a hard-commit.
So if the TLOG replica is only pulling the tlog-file, it would become out
of date if it does not pull the full index too.
That means that the TLOG replica would do a full copy every time there is a
commit on the leader.

A PULL replica, by definition, copies index files only, and so it would do
full recoveries often too.


How intelligent are the two replica types in determining that they need to
do a full recovery vs partial recovery?
Does full recovery happen every hard-commit on the leader?
Or does it happen with segment merges on the leader? (because index files
will look much different after a segment-merge)



NRT replicas will typically have very different files in their on-disk
> indexes even though they contain the same documents.


This is something which has caused full recoveries many times in my
clusters and I wish there was a solution to this one.
Do you think it would make sense for all replicas of a shard to agree upon
the segment where a document should go to?
Coupling this with an agreed cadence on segment merges, Solr would never do
full recovery. (It's a very high-level view of course and will need a lot of
refinements if implemented).

Getting a cadence on segment merges could possibly be implemented by a
time-based merging strategy where documents arriving within a particular
time-range only will form a particular segment.
So documents arriving between 1pm-2pm go to segment 1, those between
2pm-3pm go to segment 2 and so on.
That way, replicas will only copy the last N segments (with N being 1
generally) instead of doing a full recovery.
Even if merging happens on the leader, the last N segments should not be
cleared to avoid full recoveries on the replicas.
(I know something like this happens today, but not very sure about the
internal details and it's nowhere documented clearly).

Currently, I see my replicas go into full-recovery even when I dynamically
add a field to a collection or a replica missed updates for a few seconds.
(I do have high values for catchup rather than the default 100)


Thanks
SG






On Tue, Jan 2, 2018 at 8:58 PM, Shawn Heisey  wrote:

> On 1/2/2018 8:02 PM, S G wrote:
>
>> If the above is incorrect, can someone please point that out?
>>
>
> Assuming I have a correct understanding of how the different replica types
> work, I have some small clarifications.  If my understanding is incorrect,
> I hope somebody will point out my errors.
>
> TLOG is leader eligible because it keeps transaction logs from ongoing
> indexing, although it does not perform that indexing on its own index
> unless it becomes the leader.  Transaction logs are necessary for operation
> as leader.
>
> PULL does not keep transaction logs, which is why it is not leader
> eligible.  It only copies the index data.
>
> Either TLOG or PULL would do a full index copy if the local index is
> suddenly very different from the leader.  This could happen in situations
> where you have NRT replicas and the leader changes -- NRT replicas will
> typically have very different files in their on-disk indexes even though
> they contain the same documents.  When the leader changes to a different
> NRT replica, TLOG/PULL replicas will suddenly find that they have a very
> different list of index files, so they will fetch the entire index.
>
> Thanks,
> Shawn
>


New replica types

2018-01-02 Thread S G
Hi,

I was excited to see some good work in having more replica types for Solr.

However, Solr documentation left me with a few questions.
https://lucene.apache.org/solr/guide/7_2/shards-and-indexing-data-in-solrcloud.html#types-of-replicas


This is what I could come up with:
(Note that each point compares with corresponding point in each
replica-type, so it's easy to compare)


NRT
  1) Indexes locally
  2) Remains in-sync with leader, hence leader eligible
  3) Queries return latest data
  4) Replicates tlog or full-index depending on how far behind it is from
the leader
  5) Recommended when no query should return stale data ever !
  6) Penalizes the leader for full-index copy only when the replica is
missing a lot of updates (configurable though).


TLOG
  1) Does not index locally
  2) Remains in-sync with leader, hence leader eligible
  3) Queries generally return stale data as it does not index locally
  4) Replicates tlog to remain in sync but also does periodic full-index
copy from leader to get indexed data
  5) Recommended for very high throughputs at the cost of getting stale
results.
  6) Can penalize the leader for large full-index copies


PULL
  0) Same as TLOG Replica but copies only the indexed data periodically
  1) Does not index locally
  2) Does not remain in-sync with leader, hence not eligible for leader
election
  3) Queries generally return stale data as it does not index locally
  4) Only does a periodic full-index copy from leader to get indexed data
  5) Recommended for use with NRT or TLOG replicas only to increase read
throughput at the cost of getting stale results.
  6) Can penalize the leader for large full-index copies


If the above is incorrect, can someone please point that out?

Thanks
SG


"add-field" completes in minutes and sends replicas into full recovery

2017-12-26 Thread S G
Hi,

I have a Solr 6.5.1 cluster with a collection spanning 30 VMs.

I see that commands like the below "add-field" command complete in minutes
and send replicas into full recovery:

curl -X POST -H 'Content-type:application/json' --data-binary
'{"add-field":{"name":"some_new_field","type":"string","indexed":true,"stored":true,"required":false
}}' http://my-solr-host:8983/solr/my_collection/schema

{
  "responseHeader":{
"status":0,
"QTime":239578}
}


Any idea why that might be so?

We are not stopping the read/write traffic to Solr when adding fields like
the above.

Thanks
SG


Re: Confusing DocValues documentation

2017-12-21 Thread S G
Thank you Erick.

I guess the biggest piece I was missing was the sort on a field other than
the search field.
Once you have filtered a list of documents and then you want to sort, the
inverted index cannot be used for lookup.
You just have doc-IDs, which are values in the inverted index, not the keys.
Hence they cannot be "looked" up - the only option is to loop through all the
entries of that field's inverted index.

DocValues come to rescue by reducing that looping operation to a lookup
again.
Because in docValues, the key (i.e. array-index) is the document-index and
gives an O(1) lookup for any doc-ID.


But that does not seem to be helping with sorting or faceting of any kind.
This seems to be like a good way to speed up a stored field's retrieval.

DocValues in the current example are:
FieldA
doc1 = 1
doc2 = 2
doc3 =

FieldB
doc1 = 2
doc2 = 4
doc3 = 5

FieldC
doc1 = 5
doc2 =
doc3 = 5

So if I have to run a query:
fieldA=*&sort=fieldB asc
I will get all the documents due to filter and then I will lookup the
values of field-B from the docValues lookup.
That will give me 2,4,5
This is sorted in this case, but assume that this was not sorted.
(The docValues array is indexed by Lucene's doc-ID not the field-value
after all, right?)

Then does Lucene/Solr still sort them like a regular array of values?
That does not seem very efficient.
And it does not seem to helping with faceting, pivoting too.
What did I miss?

Thanks
SG






On Thu, Dec 21, 2017 at 5:31 PM, Erick Erickson 
wrote:

> Here's where you're going off the rails: "I can just look at the
> map-for-field-A"
>
> As I said before, you're totally right, all the information you need
> is there. But
> you're thinking of this as though speed weren't a premium when you say.
> "I can just look". Consider that there are single replicas out there with
> 300M
> (or more) docs in them. "Just looking" in a list 300M items long 300M times
> (q=*:*&sort=whatever) is simply not going to be performant compared to
> 300M indexing operations which is what DV does.
>
> Faceting is much worse.
>
> Plus space is also at a premium. Java takes 40+ bytes to store the first
> character. So any Java structure you use is going to be enormous. 300M ints
> is bad enough. And if you spoof this by using ordinals as Lucene does,
> you're
> well on your way to reinventing docValues.
>
> Maybe this will help. Imagine you have a phone book in your hands. It
> consists of documents like this:
>
> id: something
> phone: phone number
> name: person's name
>
> For simplicity, they're both string types 'cause they sort.
>
> Let's search by phone number but sort by name, i.e.
>
> q=phone:1234*&sort=name asc
>
> I'm searching and find two docs that match. How do I know how they
> sort wrt each other?
>
> I'm searching in the phone field but I need the value for each doc
> associated with the name field. In your example I'm searching in
> map-for-fieldA but sorting in map-for-field-B
>
> To get the name value for these two docs I have to enumerate
> map-for-field-B until I find each doc and then I can get the proper
> value and know how they sort. Sure, I could do some ordering and do a
> binary search but that's still vastly slower than having a structure
> that's a simple index operation to get the value in its field.
>
> The DV structure is actually more like what's below. These structures
> are simply an array indexed by the _internal_ Lucene document id,
> which is a simple zero-based integer that contains the value
> associated with that doc for that field (I'm simplifying a bit, but
> that's conceptually the deal).
> FieldA
> doc1 = 1
> doc2 = 2
> doc3 =
>
> FieldB
> doc1 = 2
> doc2 = 4
> doc3 = 5
>
> FieldC
> doc1 = 5
> doc2 =
> doc3 = 5
>
> Best,
> Erick
>
> On Thu, Dec 21, 2017 at 4:05 PM, S G  wrote:
> > Thanks a lot Erick and Emir.
> >
> > I am still a bit confused and an example will help me a lot.
> > Here is a little bit modified version of the same to illustrate my point
> > more clearly.
> >
> > Let us consider 3 documents - doc1, doc2 and doc3
> > Each contains up to 3 fields - A, B and C.
> > And the values for these fields are random.
> > For example:
> > doc1 = {A:1, B:2, C:5}
> > doc2 = {A:2, B:4}
> > doc3 = {B:5, C:5}
> >
> >
> > Inverted Index for the same should be a map of:
> > Key: 
> > Value: 
> > i.e.
> > {
> >map-for-field-A: {1: doc1, 2: doc2}
> >map-for-field-B: {2: doc1, 4: doc2, 5:doc3}
> >map-for-field-C: {5: [doc1, doc3]}
> > }
> 

Re: Confusing DocValues documentation

2017-12-21 Thread S G
Thanks a lot Erick and Emir.

I am still a bit confused and an example will help me a lot.
Here is a little bit modified version of the same to illustrate my point
more clearly.

Let us consider 3 documents - doc1, doc2 and doc3
Each contains up to 3 fields - A, B and C.
And the values for these fields are random.
For example:
doc1 = {A:1, B:2, C:5}
doc2 = {A:2, B:4}
doc3 = {B:5, C:5}


Inverted Index for the same should be a map of:
Key: 
Value: 
i.e.
{
   map-for-field-A: {1: doc1, 2: doc2}
   map-for-field-B: {2: doc1, 4: doc2, 5:doc3}
   map-for-field-C: {5: [doc1, doc3]}
}

For sorting on field A, I can just look at the map-for-field-A and sort the
keys (and perhaps keep it sorted too, saving the sort each time). For facets
on field A, I can again just look at the map-for-field-A and get counts for
each value. So I will get facets(Field-A) = {1:1, 2:1} because the count for
each value is 1.
Similarly facets(Field-C) = {5:2}

Why is this not performant? All it did was to bring one data structure into
memory, and if the current implementation was changed to use the OS cache for
the same, the pressure on the JVM would be reduced as well.

So the point I am trying to make here is: how does the data structure of
docValues differ from the inverted index I showed above? And how does that
structure help it become more performant? I do not want to factor in the
OS-cache perspective here for the time being, because that could have been
added to the regular inverted index also. I just want to focus on the data
structure for now and how it is different from the inverted index. Please do
not say "columnar format", as those two words really convey nothing to me.

If you can draw me the exact "columnar format" for the above example, then
it would be much appreciated.

Thanks
SG




On Thu, Dec 21, 2017 at 12:43 PM, Erick Erickson 
wrote:

> bq: I do not see why sorting or faceting on any field A, B or C would
> be a problem. All the values for a field are there in one
> data-structure and it should be easy to sort or group-by on that.
>
> This is totally true just totally incomplete: ;)
>
> for a given field:
>
> Inverted structure (leaving out position information and the like):
>
> term1: doc1,   doc37, doc 95
> term2: doc10, doc37, doc 950
>
> docValues structure (assuming multiValued):
>
> doc1: term1
> doc10: term2
> doc37: term1 term2
> doc95: term1
> doc950: term2
>
> They are used to answer two different questions.
>
> The inverted structure efficiently answers "for term1, what docs does
> it appear in?"
>
> The docValues structure efficiently answers "for doc1, what terms are
> in the field?"
>
> So imagine you have a search on term1. It's a simple iteration of the
> inverted structure to get my result set, namely docs 1, 37, and 95.
>
> But now I want to facet. I have to get the _values_ for my field from
> the entire result set in order to fill my count buckets. Using the
> uninverted structure, I'd have to scan the entire table term-by-term
> and look to see if the term appeared in any of docs 1, 37, 95 and add
> to my total for the term. Think "table scan".
>
> Instead I use the docValues structure which is much faster, I already
> know all I'm interested in is these three docs, so I just read the
> terms in the field for each doc and add to my counts. Again, to answer
> this question from the wrong (in this case inverted structure) I'd
> have to do a table scan. Also, this would be _extremely_ expensive to
> do from stored fields.
>
> And it's the inverse for searching the docValues structure. In order
> to find which doc has term1, I'd have to examine all the terms for the
> field for each document in my index. Horribly painful.
>
> So yes, the information is all there in one structure or the other and
> you _could_ get all of it from either one. You'd also have a system
> that was able to serve 0.1 QPS on a largish index.
>
> And remember that this is very simplified. If you have a complex query
> you need to get a result set before even considering the
> facet/sort/whatever question so gathering the term information as I
> searched wouldn't particularly work.
>
> Best,
> Erick
>
> On Thu, Dec 21, 2017 at 9:56 AM, S G  wrote:
> > Hi,
> >
> > It seems that docValues are not really explained well anywhere.
> > Here are 2 links that try to explain it:
> > 1) https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/
> > 2)
> > https://www.elastic.co/guide/en/elasticsearch/guide/
> current/docvalues.html
> >
> > And official Solr documentation that does not explain the internal
> details
> > at all:
> > 3) https://lucene.apache.org/solr

Confusing DocValues documentation

2017-12-21 Thread S G
Hi,

It seems that docValues are not really explained well anywhere.
Here are 2 links that try to explain it:
1) https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/
2)
https://www.elastic.co/guide/en/elasticsearch/guide/current/docvalues.html

And official Solr documentation that does not explain the internal details
at all:
3) https://lucene.apache.org/solr/guide/6_6/docvalues.html

The first links says that:
  The row-oriented (stored fields) are
  {
'doc1': {'A':1, 'B':2, 'C':3},
'doc2': {'A':2, 'B':3, 'C':4},
'doc3': {'A':4, 'B':3, 'C':2}
  }

  while column-oriented (docValues) are:
  {
'A': {'doc1':1, 'doc2':2, 'doc3':4},
'B': {'doc1':2, 'doc2':3, 'doc3':3},
'C': {'doc1':3, 'doc2':4, 'doc3':2}
  }

And the second link gives an example as:
Doc values maps documents to the terms contained by the document:

  Doc  Terms
  -
  Doc_1 | brown, dog, fox, jumped, lazy, over, quick, the
  Doc_2 | brown, dogs, foxes, in, lazy, leap, over, quick, summer
  Doc_3 | dog, dogs, fox, jumped, over, quick, the
  -


To me, this example is same as the row-oriented (stored fields) format in
the first link.
Which one is right?



Also, the column-oriented (docValues) mentioned above are:
{
  'A': {'doc1':1, 'doc2':2, 'doc3':4},
  'B': {'doc1':2, 'doc2':3, 'doc3':3},
  'C': {'doc1':3, 'doc2':4, 'doc3':2}
}
Isn't this what the inverted index also looks like?
Inverted index is an index of the term (A,B,C) to the document and the
position it is found in the document.


Or is it better to say that the inverted index is of the form:
{
   map-for-field-A: {1: doc1, 2: doc2, 4: doc3}
   map-for-field-B: {2: doc1, 3: [doc2,doc3]}
   map-for-field-C: {3: doc1, 4: doc2, 2: doc3}
}
But even if that is true, I do not see why sorting or faceting on any field
A, B or C would be a problem.
All the values for a field are there in one data-structure and it should be
easy to sort or group-by on that.

Can someone explain the above a bit more clearly please? A build-upon the
same example as above would be really good.


Thanks
SG


DocValues for multivalued strings and boolean fields

2017-12-20 Thread S G
Hi,

One of our Solr users is trying to set docValues="true" for multivalued
string fields and boolean-type fields.
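
Concretely, the kind of change in question, via the Schema API (field name
hypothetical):

curl -X POST -H 'Content-type:application/json' --data-binary \
  '{"add-field":{"name":"tags","type":"string","multiValued":true,"docValues":true,"stored":true}}' \
  http://my-solr-host:8983/solr/my_collection/schema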

I am not sure what the performance impact of that would be.
Can docValues negatively affect performance in any way?

We are using Solr 6.5.1 and also experimenting with 7.1.0

Thanks
SG


Re: JVM GC Issue

2017-12-04 Thread S G
I think the below article explains it well:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I was thinking that docValues need to be copied into the JVM from the OS
cache.
Turns out that is not required as the docValues are loaded into the virtual
address space by the OS.
The JVM need not think about loading them into its own memory as it can
just access the virtual memory as easily.
The OS keeps track of whether the docValues should be loaded into memory
(if their address is actually being accessed by the JVM) or they just keep
lying on the disk.


Thx - SG

On Sun, Dec 3, 2017 at 12:02 PM, Shawn Heisey  wrote:

> On 12/2/2017 6:59 PM, S G wrote:
>
>> I am a bit curious on the docValues implementation.
>> I understand that docValues do not use JVM memory and
>> they make use of OS cache - that is why they are more performant.
>>
>> But to return any response from the docValues, the values in the
>> docValues' column-oriented-structures would need to be brought
>> into the JVM's memory. And that will then increase the pressure
>> on the JVM's memory anyways. So how do docValues actually
>> help from memory perspective?
>>
>
> What I'm writing below is my understanding of docValues.  If it turns out
> that I've gotten any of it wrong, that is MY error, not Solr's.
>
> When there are no docValues, Solr must do something called "uninverting
> the index" in order to satisfy certain operations -- primarily faceting,
> grouping, and sorting.
>
> A Lucene index is an inverted index.  This means that it is a big list of
> terms, and then each of those entries is a second list that describes which
> fields and documents have the term, as well as some other information like
> positions.  Uninverting the index is pretty efficient, but it does take
> time.  The uninverted index structure is a list of all terms for a specific
> field.  Then there's a second phase -- the info in the uninverted field is
> read and processed for the query operation, which will use heap.  I do not
> know if there are additional phases. There might be.
>
> In case you don't know, in the Lucene index, docValues data on disk
> consists of every entry in the index for one field, written sequentially in
> an uncompressed format.
>
> This means that for those query types, docValues is *exactly* what Solr
> needs for the first phase.  And instead of generating it into heap memory
> and then reading it, Solr can just read the data right off the disk (which
> the OS caches, so it might be REALLY fast and use OS memory) in order to
> handle second and later phases.  This is faster than building an uninverted
> field, and consumes no heap memory.
>
> As I mentioned, the uninverted data is built from indexed terms.  The
> contents of docValue data is the same as a stored field -- the original
> indexed data.  Because docValues cannot be added to fields using
> solr.TextField, the only type that undergoes text analysis, there's no
> possibility of a difference between an uninverted field and docValues.
>
> Thanks,
> Shawn
>


Re: JVM GC Issue

2017-12-02 Thread S G
I am a bit curious on the docValues implementation.
I understand that docValues do not use JVM memory and
they make use of OS cache - that is why they are more performant.

But to return any response from the docValues, the values in the
docValues' column-oriented-structures would need to be brought
into the JVM's memory. And that will then increase the pressure
on the JVM's memory anyways. So how do docValues actually
help from memory perspective?

Thanks
SG


On Sat, Dec 2, 2017 at 12:39 AM, Dominique Bejean  wrote:

> Hi, Thank you for the explanations about faceting. I was thinking the hit
> count had the biggest impact on facet memory lifecycle. Regardless of the
> hit count, there is a query peak at the time the issue occurs. This is
> relative in regard to what Solr is supposed to be able to handle, but it
> should be sufficient to explain the GC activity growing.
>
> 198 10:07
> 208 10:08
> 267 10:09
> 285 10:10
> 244 10:11
> 286 10:12
> 277 10:13
> 252 10:14
> 183 10:15
> 302 10:16
> 299 10:17
> 273 10:18
> 348 10:19
> 468 10:20
> 496 10:21
> 673 10:22
> 496 10:23
> 101 10:24
>
> At the time the issue occurs, we see the CPU activity growing very high.
> Maybe there is a lack of CPU. So, I will suggest all actions that will
> remove pressure on heap memory:
>
> - enable docValues
> - divide cache sizes by 2 in order to go back to the Solr defaults
> - refine the fl parameter as I know it can be optimized
>
> Concerning the phonetic filter: it will be removed anyway, as a large
> number of results are really irrelevant. Regards, Dominique
>
>
> On Sat, Dec 2, 2017 at 4:25 AM, Erick Erickson wrote:
>
> > Dominique:
> >
> > Actually, the memory requirements shouldn't really go up as the number
> > of hits increases. The general algorithm is (say rows=10)
> > Calculate the score of each doc
> > If the score is zero, ignore
> > If the score is > the minimum in my current top 10, replace the lowest
> > scoring doc in my current top 10 with the new doc (a PriorityQueue
> > last I knew).
> > else discard the doc.
> >
> > When all docs have been scored, assemble the return from the top 10
> > (or whatever rows is set to).
> >
> > The key here is that most of the Solr index is kept in
> > MMapDirectory/OS space, see Uwe's excellent blog here:
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
> > In terms of _searching_, very little of the Lucene index structures
> > are kept in memory.
> >
> > That said, faceting plays a bit loose with the rules. If you have
> > docValues set to true, most of the memory structures are in the OS
> > memory space, not the JVM. If you have docValues set to false, then
> > the "uninverted" structure is built in the JVM heap space.
> >
> > Additionally, the JVM requirements are sensitive to the number of
> > unique values in the field being faceted on. For instance, let's say you
> > faceted by a date field with just facet.field=some_date_field. A
> > bucket would have to be allocated to hold the counts for each and
> > every unique date value, i.e. one for each millisecond in your search,
> > which might be something you're seeing. Conceptually this is just an
> > array[uniqueValues] of ints (longs? I'm not sure). This should be
> > relatively easily testable by omitting the facets while measuring.
> >
> > Where the number of rows _does_ make a difference is in the return
> > packet. Say I have rows=10. In that case I create a single return
> > packet with all 10 docs "fl" field. If rows = 10,000 then that return
> > packet is obviously 1,000 times as large and must be assembled in
> > memory.
> >
> > I rather doubt the phonetic filter is to blame. But you can test this
> > by just omitting the field containing the phonetic filter in the
> > search query. I've certainly been wrong before.
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean
> >  wrote:
> > > Hi,
> > >
> > >
> > > Thank you both for your responses.
> > >
> > >
> > > I just have the solr log for the very last period of the GC log.
> > >
> > >
> > > Grep command allows me to count queries per minute with hits above 1000
> > > or above 10000, and so with the biggest impact on memory and cpu during
> > > faceting
> > >
> > >
> > >> 1000
> > >
> > >  59 11:13
> > >  45 11:14
> > >  36 11:15
> > >  45 11:16
> > >  59 11:17
> > >  40 11:18
> > >  95 11:19
> > > 123 11:20
> > > 137 11:21
> > > 123 11:22
> > >  86 11:23
> > >  26 11:24
> > >  19 11:25
> > >  17 11:26
> > >
> > >> 1
> > >
> > >  55 11:19
> > >  78 11:20
> > >  48 11:21
> > > 134 11:22
> > >  93 11:23
> > >  10 11:24
> > >
> > > So we see that at the time the GC starts going nuts, the large result
> > > set counts increase.
> > >
> > > The query field includes a phonetic filter and the results are really
> > > not relevant due to this. I will suggest to:
> > >
> > 

How to debug slow update queries?

2017-11-29 Thread S G
Hi,

Our logs are spewing a lot of the following:

org.apache.solr.core.SolrCore; slow: [my_coll_shard8_replica1]
webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=1736

And the QTime is as high as 3-4 seconds in some cases.

Shouldn't the slow logger also print the document which took so long?
The /select and /get query handlers do a good job of helping in such
cases, as the entire URL is present there (I know it's a GET vs POST thing).
But shouldn't we print the POST request's payload for /update
requests to help debug more?

Thanks
SG
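(For context: the "slow:" prefix in that log line is driven by the
slow-request threshold in the <query> section of solrconfig.xml. A minimal
sketch of tuning it, where the 1000 ms value is purely illustrative:)

  <query>
    <!-- Requests slower than this many milliseconds are logged with the
         "slow:" prefix seen above (sketch; value is illustrative). -->
    <slowQueryThresholdMillis>1000</slowQueryThresholdMillis>
  </query>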


Re: NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread S G
My bad. I found it at https://issues.apache.org/jira/browse/SOLR-9453
But I could not find it in CHANGES.txt, perhaps because it's not yet resolved.

On Tue, Nov 21, 2017 at 9:15 AM, Erick Erickson 
wrote:

> Did you check the JIRA list? Or CHANGES.txt in more recent versions?
>
> On Tue, Nov 21, 2017 at 1:13 AM, S G  wrote:
> > Hi,
> >
> > We are running 6.2 version of Solr and hitting this error frequently.
> >
> > Error while trying to recover. core=my_core:java.lang.
> NullPointerException
> > at org.apache.solr.update.PeerSync.handleUpdates(
> PeerSync.java:605)
> > at org.apache.solr.update.PeerSync.handleResponse(
> PeerSync.java:344)
> > at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
> > at org.apache.solr.cloud.RecoveryStrategy.doRecovery(
> RecoveryStrategy.java:376)
> > at org.apache.solr.cloud.RecoveryStrategy.run(
> RecoveryStrategy.java:221)
> > at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at org.apache.solr.common.util.ExecutorUtil$
> MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> > Is this a known issue and fixed in some newer version?
> >
> >
> > Thanks
> > SG
>


NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread S G
Hi,

We are running 6.2 version of Solr and hitting this error frequently.

Error while trying to recover. core=my_core:java.lang.NullPointerException
at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605)
at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344)
at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:376)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



Is this a known issue and fixed in some newer version?


Thanks
SG


Re: DocValues

2017-11-17 Thread S G
Thank you Erick and Shawn.


1) So it seems like docValues should always be preferred over stored fields
for retrieval
if sorting-of-multivalued-fields is not a concern. Is that a correct
understanding?


2) Also, the in-place atomic updates (with docValues=true and
stored/indexed=false) should
be much faster than regular atomic updates (with docValues=anything and
stored/indexed=true).
This is so because in-place updates are just looking up the document
corresponding to the
field in the columnar-oriented lookup and changing the value there. The
document itself is
not re-indexed because stored is false and indexed is false for an in-place
update. If there is
any benchmark to verify this, it would be great.


3) If the performance of searching docValues=true,
indexed=false fields is dreadful, then
why is that even allowed? Shouldn't Solr just give an error for such cases?


Thanks
SG
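(A side note on point 2: a minimal SolrJ sketch of an in-place atomic update.
It assumes a numeric field declared with docValues="true", indexed="false",
stored="false"; the field, collection, and client names are illustrative.)

import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");
// "set" marks an atomic update; with a docValues-only, non-indexed,
// non-stored numeric field Solr can apply it in place.
doc.addField("popularity", Collections.singletonMap("set", 42));
solrClient.add("mycollection", doc);   // solrClient: an existing SolrClient
solrClient.commit("mycollection");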




On Fri, Nov 17, 2017 at 6:50 AM, Erick Erickson 
wrote:

> I'll add that using docValues in place of stored is much more
> efficient than using stored. To access stored=true data
> 1> a 16K block must be read from disk
> 2> the 16K block must be decompressed.
>
> With docValues, the value is a simple lookup, the value is probably in
> memory already (MMapped) and the decompression of a large block is
> unnecessary.
>
> There is one caveat: docValues uses (for multiValued fields) a
> SORTED_SET. Therefore multiple identical values are collapsed and the
> values are sorted. So if your input was
> 5, 6, 3, 4, 3, 3, 3
> the retrieved values would be
> 3, 4, 5, 6
>
> If this is NOT ok for your app, then you should use stored values to
> retrieve. Otherwise DocValues is preferred.
>
> Best,
> Erick
>
> On Fri, Nov 17, 2017 at 5:44 AM, Shawn Heisey  wrote:
> > On 11/17/2017 12:53 AM, S G wrote:
> >>
> >> Going through
> >>
> >> https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html
> >> ,
> >> is it possible to enable only docValues and disable stored/indexed
> >> attributes for a field?
> >
> >
> > Yes, this is possible.  In fact, if you want to do in-place Atomic
> updates,
> > this is how the field must be set up.
> >
> > https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
> >
> >> In that case, the field will become only sortable/facetable/pivotable
> but
> >> it cannot be searched nor can it be retrieved?
> >
> >
> > Recent Solr versions can use docValues instead of stored when retrieving
> > data for results.  This can be turned on/off on a per-field basis.  The
> > default setting is enabled if you're using a current schema version.
> >
> > https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-RetrievingDocValuesDuringSearch
> >
> > As I understand it, you actually *can* search docValues-only fields
> (which
> > would require a match to the entire field -- no text analysis), but
> because
> > it works similarly to a full-table scan in a database, the performance is
> > dreadful on most fields, and it's NOT recommended.
> >
> > Thanks,
> > Shawn
>


Re: DocValues

2017-11-16 Thread S G
Going through
https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html
,
is it possible to enable only docValues and disable stored/indexed
attributes for a field?

In that case, the field will become only sortable/facetable/pivotable but
it cannot be searched nor can it be retrieved?
I am guessing that stored comes naturally when a field has docValues
enabled.
Is that a correct understanding?

Thanks
SG



On Thu, Nov 16, 2017 at 11:48 PM, S G  wrote:

> Hi,
>
> I am trying to understand docValues.
>
> Almost every link I have gone through says that enable docValues if you
> want to sort/facet/pivot.
> Does that mean I should enable docValues even if I just want to index and
> store simple integer-type fields?
> If that is true, then the default numeric fields will not work for me as
> they have docValues=true.
>
> Is it recommended to create my own fields when I do not want to
> sort/facet/pivot but only want to index and store?
>
> Thanks
> SG
>


DocValues

2017-11-16 Thread S G
Hi,

I am trying to understand docValues.

Almost every link I have gone through says that enable docValues if you
want to sort/facet/pivot.
Does that mean I should enable docValues even if I just want to index and
store simple integer-type fields?
If that is true, then the default numeric fields will not work for me as
they have docValues=true.

Is it recommended to create my own fields when I do not want to
sort/facet/pivot but only want to index and store?

Thanks
SG


Re: Multiple collections for a write-alias

2017-11-13 Thread S G
We are actually very close to doing what Shawn has suggested.

Emir has a good point about new collections failing on deletes/updates of
older documents which are not present in the new collection. But even if
this feature can only be implemented for an append-only log, it would make
a good feature IMO.


The use-case for re-indexing everything again is generally an attribute
change, like
enabling "indexed" or "docValues" on a field, or adding a new field to a
schema.
While the reading client-code sits behind a flag to start using the new
attribute/field, we
have to re-index all the data without stopping older-format reads.
Currently, we have to do
dual writes to the new collections or play catch-up-after-a-bootstrap.


Note that the catch-up-after-a-bootstrap is not very easy either (it is very
similar to the approach
described by Shawn). If this special place is Kafka or some table in the
DB, then we have to
do dual writes to the regular source-of-truth and this special place. Dual
writes to the DB and Kafka
suffer from being transaction-less (and thus lack consistency), while dual
writes to the DB increase
the load on the DB.


Having created_date / modified_date fields and querying the DB to find
live-traffic documents has
its own problems and is taxing on the DB again.


Dual writes to multiple Solr collections directly are the simplest for a
client to implement, and
that is exactly what this new feature could be. With a
dual-write-collection-alias, it becomes
easier for the client to not implement any of the above if the
dual-write-collection-alias does the following:

- Deletes on missing documents in new collection are simply ignored.
- Incremental updates just throw an error for not being supported on
multi-write-collection-alias.
- Regular updates (i.e. Delete-Then-Insert) should work just fine because
they will just treat the document as a brand new one and versioning
strategies can take care of out-of-order updates.


SG


On Fri, Nov 10, 2017 at 6:33 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> This approach could work only if it is append only index. In case you have
> updates/deletes, you have to process in order, otherwise you will get
> incorrect results. I am thinking that is one of the reasons why it might
> not be supported since not too useful.
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 9 Nov 2017, at 19:09, S G  wrote:
> >
> > Hi,
> >
> > We have a use-case to re-create a solr-collection by re-ingesting
> > everything but not tolerate a downtime while that is happening.
> >
> > We are using collection alias feature to point to the new collection when
> > it has been re-ingested fully.
> >
> > However, re-ingestion takes several hours to complete and during that
> time,
> > the customer has to write to both the collections - previous collection
> and
> > the one being bootstrapped.
> > This dual-write is harder to do from the client side (because client
> needs
> > to have a retry logic to ensure any update does not succeed in one
> > collection and fails in another - consistency problem) and it would be a
> > real welcome addition if collection aliasing can support this.
> >
> > Proposal:
> > If we can enhance the write alias to point to multiple collections such
> > that any update to the alias is written to all the collections it points
> > to, it
> > would help the client to avoid dual writes and also issue just a single
> > http call from the client instead of multiple. It would also reduce the
> > retry logic inside the client code used to keep the collections
> consistent.
> >
> >
> > Thanks
> > SG
>
>


Multiple collections for a write-alias

2017-11-09 Thread S G
Hi,

We have a use-case to re-create a solr-collection by re-ingesting
everything but not tolerate a downtime while that is happening.

We are using collection alias feature to point to the new collection when
it has been re-ingested fully.

However, re-ingestion takes several hours to complete and during that time,
the customer has to write to both the collections - previous collection and
the one being bootstrapped.
This dual-write is harder to do from the client side (because client needs
to have a retry logic to ensure any update does not succeed in one
collection and fails in another - consistency problem) and it would be a
real welcome addition if collection aliasing can support this.

Proposal:
If we can enhance the write alias to point to multiple collections such that
any update to the alias is written to all the collections it points to, it
would help the client to avoid dual writes and also issue just a single
http call from the client instead of multiple. It would also reduce the
retry logic inside the client code used to keep the collections consistent.


Thanks
SG


Re: Solr staying constant on popularity indexes

2017-10-11 Thread S G
I find myself in the same boat as TI when a Solr node goes into recovery.
Solr UI and the logs are really of no help at that time.
It would be really nice to enhance the Solr UI with the features mentioned
in the original post.


On Tue, Oct 10, 2017 at 4:14 AM, Charlie Hull  wrote:

> On 10/10/2017 11:02, Bernd Fehling wrote:
>
>> Questions coming to my mind:
>>
>> Is there a "Resiliency Status" page for SolrCloud somewhere?
>>
>> How would SolrCloud behave in a Jepsen test?
>>
>
> This has been done in 2014 - see
> https://lucidworks.com/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks/
>
> Charlie
>
>>
>> Regards
>> Bernd
>>
>> Am 10.10.2017 um 09:22 schrieb Toke Eskildsen:
>>
>>> On Mon, 2017-10-09 at 20:50 -0700, Tech Id wrote:
>>>
 Being a long term Solr user, I tried to do a little comparison myself
 and actually found some interesting features in ES.

 1. No zookeeper  - I have burnt my hands with some zookeeper issues
 in the past and it is no fun to deal with. Kafka and Storm are also
 trying to burden zookeeper less and less because ZK cannot handle
 heavy traffic.

>>>
>>> ZooKeeper is not the easiest beast to tame, but it does have its
>>> plusses. The greatest being that it is pretty good at what it does:
>>> https://aphyr.com/posts/291-call-me-maybe-zookeeper
>>>
>>> Home-cooked distribution systems might be a lot easier to use,
>>> primarily because they tend to be a perfect fit for the technology they
>>> support, but they are hard to get right:
>>> https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0
>>>
>>> 2. REST APIs - this is a big wow over the complicated syntax Solr
 uses. I think V2 APIs are coming to address this, but they did come a
 bit late in the game.

>>>
>>> I guess you mean JSON APIs? Anyway, I fully agree that the old Solr
>>> syntax is extremely clunky as soon as we move beyond the simple "just
>>> supply a few search terms"-scenario.
>>>
>>> - Toke Eskildsen, Royal Danish Library
>>>
>>>
>> ---
>> This email has been checked for viruses by AVG.
>> http://www.avg.com
>>
>>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


Re: FilterCache size should reduce as index grows?

2017-10-05 Thread S G
So for large indexes, there is a chance that a filterCache of 128 entries
can cause bad GC.
And for smaller indexes, it would really not matter that much because, well,
the index size is small and probably the whole of it is in the OS cache
anyway.
So perhaps a default of 64 would be a much saner choice to get the best of
both worlds?

On Thu, Oct 5, 2017 at 7:23 AM, Yonik Seeley  wrote:

> On Thu, Oct 5, 2017 at 3:20 AM, Toke Eskildsen  wrote:
> > On Wed, 2017-10-04 at 21:42 -0700, S G wrote:
> >
> > It seems that the memory limit option maxSizeMB was added in Solr 5.2:
> > https://issues.apache.org/jira/browse/SOLR-7372
> > I am not sure if it works with all caches in Solr, but in my world it
> > is way better to define the caches by memory instead of count.
>
> Yes, that will work with the filterCache, but one needs to change the
> cache type as well (maxSizeMB is only an option on LRUCache, and
> filterCache uses FastLRUCache in the default solrconfig.xml)
>
> -Yonik
>
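(To make that concrete, here is a hedged sketch of the solrconfig.xml change
being discussed. Note: the thread calls the option maxSizeMB, but in Solr's
cache configuration the RAM-based limit attribute appears as maxRamMB; the
sizes below are illustrative, not recommendations.)

  <!-- Sketch: switch the filterCache from the default FastLRUCache to
       LRUCache so that the RAM-based limit applies (values illustrative). -->
  <filterCache class="solr.LRUCache"
               size="128"
               initialSize="64"
               autowarmCount="0"
               maxRamMB="256"/>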


FilterCache size should reduce as index grows?

2017-10-04 Thread S G
Hi,

Here is a discussion we had recently with a fellow Solr user.
It seems reasonable to me and wanted to see if this is an accepted theory.

The bit-vectors in the filterCache are as long as the maximum number of
documents in a core. If there are a billion docs per core, every bit vector
will have a billion bits, making its size 10^9 / 8 bytes, i.e. roughly 128 MB.
With such a big cache value per entry, the default cache size of 128 entries
will add up to 128 x 128 MB = 16 GB, which would not be very good for a
system running below 32 GB of memory.

If such a use-case is anticipated, either the JVM's max memory should be
increased beyond 40 GB or the filterCache size should be reduced to 32.

Thanks
SG


Re: Running Solr-Server inside other process

2017-08-27 Thread S G
Thanks Erick,

Yes, this seems to be what I want.
Is there a good example of how to use it?

Thanks
SG


On Thu, Aug 24, 2017 at 5:02 PM, Erick Erickson 
wrote:

> Solr has the EmbeddedSolrServer is that what you're looking for?
>
> Best,
> Erick
>
> On Thu, Aug 24, 2017 at 11:15 AM, S G  wrote:
> > Hi,
> >
> > We are looking to run Solr in-memory for testing and examples.
> >
> > For example:
> > 1) Cassandra has cassandra-unit:
> > https://github.com/jsevellec/cassandra-unit/wiki/How-to-use-it-in-your-code
> >
> > 2) Storm has local-mode:
> > http://storm.apache.org/releases/current/Local-mode.html
> >
> > Is there something similar for Solr too?
> >
> > Thanks
> > SG
>
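(For reference, a minimal sketch of EmbeddedSolrServer usage. It boots a core
container in-process with no HTTP listener; the solr-home path and core name
are assumptions, not defaults.)

import java.nio.file.Paths;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EmbeddedSolrExample {
  public static void main(String[] args) throws Exception {
    // Assumes ./solr-home contains a configured core named "test".
    EmbeddedSolrServer server =
        new EmbeddedSolrServer(Paths.get("solr-home"), "test");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    server.add(doc);
    server.commit();
    long hits = server.query(new SolrQuery("*:*")).getResults().getNumFound();
    System.out.println("hits = " + hits);
    server.close();
  }
}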


Running Solr-Server inside other process

2017-08-24 Thread S G
Hi,

We are looking to run Solr in-memory for testing and examples.

For example:
1) Cassandra has cassandra-unit:
https://github.com/jsevellec/cassandra-unit/wiki/How-to-use-it-in-your-code

2) Storm has local-mode:
http://storm.apache.org/releases/current/Local-mode.html

Is there something similar for Solr too?

Thanks
SG


Re: Limiting the number of queries/updates to Solr

2017-08-07 Thread S G
I tried using Jetty's QoS filter for rate limiting the queries.
It has a good option to apply different rates per URL pattern.

However, the same is not being picked up by Solr and the details of the
same are shared on
https://stackoverflow.com/questions/45536986/why-is-this-qos-jetty-filter-not-working

Has someone worked on this before who can help?

Thanks
SG
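(For readers hitting the same wall: the QoSFilter is normally wired into the
webapp's WEB-INF/web.xml roughly as below. This is a sketch only; it assumes
the jetty-servlets jar is on the classpath, and maxRequests=50 plus the
url-pattern are illustrative.)

  <filter>
    <filter-name>QoSFilter</filter-name>
    <filter-class>org.eclipse.jetty.servlets.QoSFilter</filter-class>
    <init-param>
      <!-- Max requests serviced concurrently; the rest are suspended. -->
      <param-name>maxRequests</param-name>
      <param-value>50</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>QoSFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>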


On Fri, Aug 4, 2017 at 5:51 PM, S G  wrote:

> The timeAllowed parameter is not a good choice for rate limiting and could
> crash the whole Solr cluster.
> In fact, the timeAllowed parameter should increase the chances of crashing
> the whole cluster:
>
> When the timeAllowed for a query is over, its client will get a failure
> but the server handling the query itself will not kill the thread running
> that query. So Solr itself would still be working on that long-running
> query but the client has got a timeOut.
> These failure-receiving client-threads are now free to process other
> requests: retry failed ones or fire new queries to Solr.
> This should suffocate Solr even more, although client application's
> threads will not get blocked ever.
>
> With a rate limiter, we save both - clients' extra traffic gets
> rejected-responses and all Solr nodes breathe easy too.
> IMO, timeAllowed parameter will almost always kill the whole Solr cluster.
>
> -SG
>
>
>
>
> On Fri, Aug 4, 2017 at 3:30 PM, Varun Thacker  wrote:
>
>> Hi Hrishikesh,
>>
>> I think SOLR-7344 is probably an important addition to Solr. It could help
>> users isolate analytical queries ( streaming ) , search queries and
>> indexing requests and throttle requests
>>
>> Let's continue the discussion on the Jira
>>
>> On Thu, Aug 3, 2017 at 2:03 AM, Rick Leir  wrote:
>>
>> >
>> >
>> > On 2017-08-02 11:33 PM, Shawn Heisey wrote:
>> >
>> >> On 8/2/2017 8:41 PM, S G wrote:
>> >>
>> >>> Problem is that peak load estimates are just estimates.
>> >>> It would be nice to enforce them from Solr side such that if a rate
>> >>> higher than that is seen at any core, the core will automatically
>> begin to
>> >>> reject the requests.
>> >>> Such a feature would contribute to cluster stability while making sure
>> >>> the customer gets an exception to remind them of a slower rate.
>> >>>
>> >> Solr doesn't have anything like this.  This is primarily because there
>> >> is no network server code in Solr.  The networking is provided by the
>> >> servlet container.  The container in modern Solr versions is nearly
>> >> guaranteed to be Jetty.  As long as I have been using Solr, it has
>> >> shipped with a Jetty container.
>> >>
>> >> https://wiki.apache.org/solr/WhyNoWar
>> >>
>> >> I have no idea whether Jetty is capable of the kind of rate limiting
>> >> you're after.  If it is, it would be up to you to figure out the
>> >> configuration.
>> >>
>> >> You could always put a proxy server like haproxy in front of Solr.  I'm
>> >> pretty sure that haproxy is capable rejecting connections when the
>> >> request rate gets too high.  Other proxy servers (nginx, apache, F5
>> >> BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
>> >> of this.
>> >>
>> >> IMHO, intentionally causing connections to fail when a limit is
>> exceeded
>> >> would not be a very good idea.  When the rate gets too high, the first
>> >> thing that happens is all the requests slow down.  The slowdown could
>> be
>> >> dramatic.  As the rate continues to increase, some of the requests
>> >> probably would begin to fail.
>> >>
>> >> What you're proposing would be guaranteed to cause requests to fail.
>> >> Failing requests are even more likely than slow requests to result in
>> >> users finding a new source for whatever service they are getting from
>> >> your organization.
>> >>
>> > Shawn,
>> > Agreed, a connection limit is not a good idea.  But there is the
> > timeAllowed parameter
> > <https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter>
>> > timeAllowed - This parameter specifies the amount of time, in
>> > milliseconds, allowed for a search to complete. If this time expires
>> before
>> > the search is complete, any partial results will be returned.
>> >
>> > https://stackoverflow.com/questions/19557476/timing-out-a-query-in-solr
>> >
>> > With timeAllowed, you need not estimate what connection rate is
>> > unbearable. Rather, you would set a max response time. If some queries
>> take
>> > much longer than other queries, then this would cause the long ones to
>> > fail, which might be a good strategy. However, if queries normally all
>> take
>> > about the same time, then this would cause all queries to return partial
>> > results until the server recovers, which might be a bad strategy. In
>> this
>> > case, Walter's post is sensible.
>> >
>> > A previous thread suggested that timeAllowed could cause bad performance
>> > on some cloud servers.
>> > cheers -- Rick
>> >
>> >
>> >
>> >
>> >
>>
>
>


Re: Limiting the number of queries/updates to Solr

2017-08-04 Thread S G
The timeAllowed parameter is not a good choice for rate limiting and could
crash the whole Solr cluster.
In fact, the timeAllowed parameter should increase the chances of crashing
the whole cluster:

When the timeAllowed for a query is over, its client will get a failure
but the server handling the query itself will not kill the thread running
that query. So Solr itself would still be working on that long-running
query but the client has got a timeOut.
These failure-receiving client-threads are now free to process other
requests: retry failed ones or fire new queries to Solr.
This should suffocate Solr even more, although client application's threads
will not get blocked ever.

With a rate limiter, we save both - clients' extra traffic gets
rejected-responses and all Solr nodes breathe easy too.
IMO, timeAllowed parameter will almost always kill the whole Solr cluster.

-SG
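(For reference, timeAllowed is set per request. A minimal SolrJ sketch; the
client setup is omitted and the 500 ms value is purely illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("*:*");
q.setTimeAllowed(500);   // milliseconds; hits found after this may be dropped
QueryResponse rsp = solrClient.query(q);   // solrClient: an existing SolrClient
// When the limit was hit, Solr flags the (possibly partial) results:
Object partial = rsp.getHeader().get("partialResults");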




On Fri, Aug 4, 2017 at 3:30 PM, Varun Thacker  wrote:

> Hi Hrishikesh,
>
> I think SOLR-7344 is probably an important addition to Solr. It could help
> users isolate analytical queries ( streaming ) , search queries and
> indexing requests and throttle requests
>
> Let's continue the discussion on the Jira
>
> On Thu, Aug 3, 2017 at 2:03 AM, Rick Leir  wrote:
>
> >
> >
> > On 2017-08-02 11:33 PM, Shawn Heisey wrote:
> >
> >> On 8/2/2017 8:41 PM, S G wrote:
> >>
> >>> Problem is that peak load estimates are just estimates.
> >>> It would be nice to enforce them from Solr side such that if a rate
> >>> higher than that is seen at any core, the core will automatically
> begin to
> >>> reject the requests.
> >>> Such a feature would contribute to cluster stability while making sure
> >>> the customer gets an exception to remind them of a slower rate.
> >>>
> >> Solr doesn't have anything like this.  This is primarily because there
> >> is no network server code in Solr.  The networking is provided by the
> >> servlet container.  The container in modern Solr versions is nearly
> >> guaranteed to be Jetty.  As long as I have been using Solr, it has
> >> shipped with a Jetty container.
> >>
> >> https://wiki.apache.org/solr/WhyNoWar
> >>
> >> I have no idea whether Jetty is capable of the kind of rate limiting
> >> you're after.  If it is, it would be up to you to figure out the
> >> configuration.
> >>
> >> You could always put a proxy server like haproxy in front of Solr.  I'm
> >> pretty sure that haproxy is capable rejecting connections when the
> >> request rate gets too high.  Other proxy servers (nginx, apache, F5
> >> BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
> >> of this.
> >>
> >> IMHO, intentionally causing connections to fail when a limit is exceeded
> >> would not be a very good idea.  When the rate gets too high, the first
> >> thing that happens is all the requests slow down.  The slowdown could be
> >> dramatic.  As the rate continues to increase, some of the requests
> >> probably would begin to fail.
> >>
> >> What you're proposing would be guaranteed to cause requests to fail.
> >> Failing requests are even more likely than slow requests to result in
> >> users finding a new source for whatever service they are getting from
> >> your organization.
> >>
> > Shawn,
> > Agreed, a connection limit is not a good idea.  But there is the
> > timeAllowed parameter
> > <https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter>
> > timeAllowed - This parameter specifies the amount of time, in
> > milliseconds, allowed for a search to complete. If this time expires
> before
> > the search is complete, any partial results will be returned.
> >
> > https://stackoverflow.com/questions/19557476/timing-out-a-query-in-solr
> >
> > With timeAllowed, you need not estimate what connection rate is
> > unbearable. Rather, you would set a max response time. If some queries
> take
> > much longer than other queries, then this would cause the long ones to
> > fail, which might be a good strategy. However, if queries normally all
> take
> > about the same time, then this would cause all queries to return partial
> > results until the server recovers, which might be a bad strategy. In this
> > case, Walter's post is sensible.
> >
> > A previous thread suggested that timeAllowed could cause bad performance
> > on some cloud servers.
> > cheers -- Rick
> >
> >
> >
> >
> >
>


Limiting the number of queries/updates to Solr

2017-08-02 Thread S G
Hi,

My team provides Solr clusters to several other teams in my company.
We get peak-requirements for query-rate and update-rate from our customers
and load-test the cluster based on the same.
This helps us arrive at a cluster suitable for a given peak load.

The problem is that peak load estimates are just estimates.
It would be nice to enforce them from the Solr side such that if a rate
higher than that is seen at any core, the core will automatically begin to
reject the requests.
Such a feature would contribute to cluster stability while making sure the
customer gets an exception to remind them of a slower rate.

A configuration like the following in managed-schema or solrconfig.xml
would be great:

  <coreRateLimiter>
    <update>500</update>
    <query>500</query>
  </coreRateLimiter>


If the rate exceeds the above limits, an exception like the following
should be thrown: "Cannot process more than 500 updates/second. Please slow
down or raise the coreRateLimiter.update limit in solrconfig.xml."

Is
https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/store/RateLimiter.SimpleRateLimiter.html
a step in that direction?

Thanks
SG
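(A note on that last question: Lucene's SimpleRateLimiter paces merge/write
IO in MB/sec rather than requests/sec, so a request limiter would look more
like a token bucket. A rough client-side sketch, purely illustrative and not
part of Solr or Lucene:)

// A rough token-bucket sketch for capping requests/sec on the client side.
public final class SimpleRequestLimiter {
  private final double permitsPerSecond;
  private double available;
  private long lastRefillNanos = System.nanoTime();

  public SimpleRequestLimiter(double permitsPerSecond) {
    this.permitsPerSecond = permitsPerSecond;
    this.available = permitsPerSecond;
  }

  // Returns false when the caller should reject (or retry) the request.
  public synchronized boolean tryAcquire() {
    long now = System.nanoTime();
    // Refill proportionally to elapsed time, capped at one second's worth.
    available = Math.min(permitsPerSecond,
        available + (now - lastRefillNanos) / 1e9 * permitsPerSecond);
    lastRefillNanos = now;
    if (available < 1.0) {
      return false;
    }
    available -= 1.0;
    return true;
  }
}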


CloudSolrClient preferred over LBHttpSolrClient

2017-07-17 Thread S G
Hi,

Does anyone know if CloudSolrClient is preferred over LBHttpSolrClient ?
If yes, why so and has there been any good performance benefits documented
anywhere?

Thanks
SG


admin/metrics API or read JMX by jolokia?

2017-06-25 Thread S G
Hi,

The admin/metrics API in the 6.x versions of Solr seems to be very good.
Is it performance friendly as well?

We want to use this API to query the metrics every minute or so from all
Solr nodes and push them to Grafana.
How does this compare with the performance overhead of reading JMX metrics
via Jolokia?

The REST API is surely easier to understand and parse.
However, it involves making a REST call that will pass through Jetty and
probably take up a thread for each request.
Is Jolokia lighter-weight in this respect?

Some recommendation on this would be great.

Thanks
SG


Re: SolrException: Error trying to proxy request for url: solr/sync-status/admin/system

2017-06-20 Thread S G
Got no response on the solr-user mailing list and so trying the dev-mailing
list.

Please guide me if this should not be done. But I thought that the issue
looks strange enough to post it here.

Thanks
SG


On Mon, Jun 19, 2017 at 8:13 PM, S G  wrote:

> Hi,
>
> We are stuck in a strange problem.
> Whole cluster is red. All nodes are being shown as down.
> Restart of the nodes is not helping either.
>
> All our nodes seem to have gone into a distributed lock.
> Here is the grep command I ran on all the solr.log files:
>
> grep "Error trying to proxy request" $f | cut -d" " -f14 | sort | uniq
> -c
> And the output from 10 different solr-nodes' solr.log file is shown below:
> (Basically each node is calling admin/system on other nodes and throwing
> exceptions. You can see the number of exceptions thrown by each server for
> every other server).
>
>
>
> SVR_1.log
>   13 http://SVR_2:8983/solr/my-collection/admin/system
>   18 http://SVR_3:8983/solr/my-collection/admin/system
>   19 http://SVR_4:8983/solr/my-collection/admin/system
>   15 http://SVR_6:8983/solr/my-collection/admin/system
>   13 http://SVR_7:8983/solr/my-collection/admin/system
>   13 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_2.log
>  335 http://SVR_3:8983/solr/my-collection/admin/system
>   23 http://SVR_4:8983/solr/my-collection/admin/system
>   21 http://SVR_6:8983/solr/my-collection/admin/system
>   23 http://SVR_7:8983/solr/my-collection/admin/system
>   23 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_3.log
>   24 http://SVR_2:8983/solr/my-collection/admin/system
>   14 http://SVR_4:8983/solr/my-collection/admin/system
>   13 http://SVR_6:8983/solr/my-collection/admin/system
>   14 http://SVR_7:8983/solr/my-collection/admin/system
>   16 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_4.log
>   11 http://SVR_2:8983/solr/my-collection/admin/system
>   29 http://SVR_3:8983/solr/my-collection/admin/system
>    7 http://SVR_6:8983/solr/my-collection/admin/system
>   16 http://SVR_7:8983/solr/my-collection/admin/system
>   11 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_5.log
>   18 http://SVR_2:8983/solr/my-collection/admin/system
>   16 http://SVR_3:8983/solr/my-collection/admin/system
>   13 http://SVR_4:8983/solr/my-collection/admin/system
>   12 http://SVR_6:8983/solr/my-collection/admin/system
>   16 http://SVR_7:8983/solr/my-collection/admin/system
>   11 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_6.log
>   44 http://SVR_2:8983/solr/my-collection/admin/system
>  296 http://SVR_3:8983/solr/my-collection/admin/system
>   40 http://SVR_4:8983/solr/my-collection/admin/system
>   15 http://SVR_7:8983/solr/my-collection/admin/system
>   15 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_7.log
>   59 http://SVR_2:8983/solr/my-collection/admin/system
>  215 http://SVR_3:8983/solr/my-collection/admin/system
>   62 http://SVR_4:8983/solr/my-collection/admin/system
>   47 http://SVR_6:8983/solr/my-collection/admin/system
>   61 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_8.log
>   13 http://SVR_2:8983/solr/my-collection/admin/system
>   18 http://SVR_3:8983/solr/my-collection/admin/system
>   10 http://SVR_4:8983/solr/my-collection/admin/system
>    7 http://SVR_6:8983/solr/my-collection/admin/system
>   12 http://SVR_7:8983/solr/my-collection/admin/system
>   13 http://SVR_9:8983/solr/my-collection/admin/system
>
> SVR_9.log
>   38 http://SVR_2:8983/solr/my-collection/admin/system
>  229 http://SVR_3:8983/solr/my-collection/admin/system
>   15 http://SVR_4:8983/solr/my-collection/admin/system
>   22 http://SVR_6:8983/solr/my-collection/admin/system
>   26 http://SVR_7:8983/solr/my-collection/admin/system
>
> SVR_10.log
>9 http://SVR_2:8983/solr/my-collection/admin/system
>   22 http://SVR_3:8983/solr/my-collection/admin/system
>   18 http://SVR_4:8983/solr/my-collection/admin/system
>   14 http://SVR_6:8983/solr/my-collection/admin/system
>   18 http://SVR_7:8983/solr/my-collection/admin/system
>   10 http://SVR_9:8983/solr/my-collection/admin/system
>
>
> Thanks
> SG
>


SolrException: Error trying to proxy request for url: solr/sync-status/admin/system

2017-06-19 Thread S G
Hi,

We are stuck in a strange problem.
Whole cluster is red. All nodes are being shown as down.
Restart of the nodes is not helping either.

All our nodes seem to have gone into a distributed lock.
Here is the grep command I ran on all the solr.log files:

grep "Error trying to proxy request" $f | cut -d" " -f14 | sort | uniq
-c
And the output from 10 different solr-nodes' solr.log file is shown below:
(Basically each node is calling admin/system on other nodes and throwing
exceptions. You can see number of exceptions thrown by each server for
every other server).



SVR_1.log
  13 http://SVR_2:8983/solr/my-collection/admin/system
  18 http://SVR_3:8983/solr/my-collection/admin/system
  19 http://SVR_4:8983/solr/my-collection/admin/system
  15 http://SVR_6:8983/solr/my-collection/admin/system
  13 http://SVR_7:8983/solr/my-collection/admin/system
  13 http://SVR_9:8983/solr/my-collection/admin/system

SVR_2.log
 335 http://SVR_3:8983/solr/my-collection/admin/system
  23 http://SVR_4:8983/solr/my-collection/admin/system
  21 http://SVR_6:8983/solr/my-collection/admin/system
  23 http://SVR_7:8983/solr/my-collection/admin/system
  23 http://SVR_9:8983/solr/my-collection/admin/system

SVR_3.log
  24 http://SVR_2:8983/solr/my-collection/admin/system
  14 http://SVR_4:8983/solr/my-collection/admin/system
  13 http://SVR_6:8983/solr/my-collection/admin/system
  14 http://SVR_7:8983/solr/my-collection/admin/system
  16 http://SVR_9:8983/solr/my-collection/admin/system

SVR_4.log
  11 http://SVR_2:8983/solr/my-collection/admin/system
  29 http://SVR_3:8983/solr/my-collection/admin/system
   7 http://SVR_6:8983/solr/my-collection/admin/system
  16 http://SVR_7:8983/solr/my-collection/admin/system
  11 http://SVR_9:8983/solr/my-collection/admin/system

SVR_5.log
  18 http://SVR_2:8983/solr/my-collection/admin/system
  16 http://SVR_3:8983/solr/my-collection/admin/system
  13 http://SVR_4:8983/solr/my-collection/admin/system
  12 http://SVR_6:8983/solr/my-collection/admin/system
  16 http://SVR_7:8983/solr/my-collection/admin/system
  11 http://SVR_9:8983/solr/my-collection/admin/system

SVR_6.log
  44 http://SVR_2:8983/solr/my-collection/admin/system
 296 http://SVR_3:8983/solr/my-collection/admin/system
  40 http://SVR_4:8983/solr/my-collection/admin/system
  15 http://SVR_7:8983/solr/my-collection/admin/system
  15 http://SVR_9:8983/solr/my-collection/admin/system

SVR_7.log
  59 http://SVR_2:8983/solr/my-collection/admin/system
 215 http://SVR_3:8983/solr/my-collection/admin/system
  62 http://SVR_4:8983/solr/my-collection/admin/system
  47 http://SVR_6:8983/solr/my-collection/admin/system
  61 http://SVR_9:8983/solr/my-collection/admin/system

SVR_8.log
  13 http://SVR_2:8983/solr/my-collection/admin/system
  18 http://SVR_3:8983/solr/my-collection/admin/system
  10 http://SVR_4:8983/solr/my-collection/admin/system
   7 http://SVR_6:8983/solr/my-collection/admin/system
  12 http://SVR_7:8983/solr/my-collection/admin/system
  13 http://SVR_9:8983/solr/my-collection/admin/system

SVR_9.log
  38 http://SVR_2:8983/solr/my-collection/admin/system
 229 http://SVR_3:8983/solr/my-collection/admin/system
  15 http://SVR_4:8983/solr/my-collection/admin/system
  22 http://SVR_6:8983/solr/my-collection/admin/system
  26 http://SVR_7:8983/solr/my-collection/admin/system

SVR_10.log
   9 http://SVR_2:8983/solr/my-collection/admin/system
  22 http://SVR_3:8983/solr/my-collection/admin/system
  18 http://SVR_4:8983/solr/my-collection/admin/system
  14 http://SVR_6:8983/solr/my-collection/admin/system
  18 http://SVR_7:8983/solr/my-collection/admin/system
  10 http://SVR_9:8983/solr/my-collection/admin/system


Thanks
SG


Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)

2017-06-06 Thread S G
Hi,

We are seeing some very bad performance on our performance test that tries
to load a 2 shard, 3 replica system with about 2000 writes/sec and 2000
reads/sec

The exception stack trace seems to point to a specific line of code and a
similar stack trace is reported by users on Elastic-Search forums too.

Could this be a common bug in Lucene which is affecting both systems?
https://issues.apache.org/jira/browse/SOLR-10806

One bad part about Solr is that once it happens, the whole system comes to
a grinding halt.
The Solr UI is not accessible, even for the nodes not hosting any collections!
It would be really nice to get rid of such an instability in the system.

Thanks
SG


Backing up indexes to an HDFS filesystem

2017-05-15 Thread S G
Hi,

I have a few doubts regarding the documentation at
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups
for backing up the indexes to an HDFS filesystem:

1) How frequently are the indexes backed up?
2) Is there a possibility of data-loss if Solr crashes between two backups?
3) Is it production ready?
4) What is the performance impact of backup?
5) How quick are the restores? (i.e. some benchmarking of time vs index size)


Thanks
SG


Re: Recommended index-size per core

2017-05-11 Thread S G
Thanks Toke. Your answer did help me a lot.

But one part of your answer is something that has always been confusing
to me.

> The JVM heap is not used for caching the index data directly (although it
holds derived data). What you need is free memory on your machine for OS
disk-caching.
> The ideal JVM size is extremely dependent on how you index, query and
adjust the filter-cache (secondarily the other caches, but the filter-cache
tends to be the large one).  A heap of 10GB might very well be fine for
handling your whole 50GB index. If that is on a 64GB machine, the remaining
54GB of RAM (minus the other stuff that is running) ought to ensure a fully
cached index.

How can a 50GB index be handled by a 10GB heap?
I am a developer myself and would love to know as many details as possible.
So a long answer would be much appreciated.

Also, on a related note - are stored fields or doc values part of the index
too? Are they stored in the JVM or the OS cache? (I would guess the latter,
but does that mean the JVM is just not required for those, or only a small
percentage of it?)

Thanks
SG




On Thu, May 11, 2017 at 7:33 AM, Shawn Heisey  wrote:

> On 5/10/2017 11:52 AM, S G wrote:
> > Is there a recommendation on the size of index that one should host
> > per core?
>
> No, there really isn't.
>
> I can list off a bunch of recommendations, but a whole bunch of things
> that I don't know about your install could make those recommendations
> completely wrong.  An index size that works really well for one person
> might have terrible performance for another.
>
> If you haven't already built it, then there are possibly even things
> that YOU don't know about your install yet that can affect what what you
> need.
>
> https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> The only general advice I can give you is this:  You're probably going
> to need more RAM.
>
> Thanks,
> Shawn
>
>


Recommended index-size per core

2017-05-10 Thread S G
Hi,

Is there a recommendation on the size of index that one should host per
core?
Idea is to come up with an *initial* shard/replica setting for a load test.
And then arrive at a good cluster size based on that testing.


*Example: *

Num documents: 100 million
Average document size: 1kb
So total space required:  100 gb

Indexable fields per document: 5 strings, average field-size: 100 chars
So total index space required for all docs: 50gb (assuming all unique words)


*Rough estimates for an initial size:*

50gb index is best served if all of it is in memory.
And JVMs perform the best if their max-heap is between 15-20gb
So a starting point for num-shards: 50gb/20gb ~ 3

Now if all index is in memory per core, then replicas can serve queries
with a much higher throughput.
So we can begin with 2 replicas per shard.

*Questions:*

Are there any other factors that we can consider *initially* to make our
calculations more precise?
Note that the goal of the exercise is not to get rid of load-testing, only
to start with a close-enough cluster setting so that load testing can
finish faster.

Thanks
SG


Re: Confusing debug=timing parameter

2016-12-18 Thread S G
Thank you Furkan.

I am still a little confused.
So I will shorten the response and post only the relevant pieces for easier
understanding.

 "responseHeader": {
"status": 0,
"QTime": 2978
}
 "response": {
"numFound": 1565135270,
  },
  "debug": {
"timing": {
  "time": 19320,
  "prepare": {
"time": 4,
"query": {
  "time": 3
},
  "process": {
"time": 19315,
"query": {
  "time": 19309
}
   }
}

As I understand it, QTime is the total time spent by the core.
"process", "prepare", etc. are all the parts that together make up the
query processing.
And so their times should approximately add up to the QTime.
Numbers-wise, I would have expected prepare-time + process-time <= QTime,
i.e. 4 + 19315 <= 2978.
This is obviously not correct.

Where am I making a mistake?
Any pointers would be greatly appreciated.

Thanks
SG




On Sun, Dec 18, 2016 at 4:40 AM, Furkan KAMACI 
wrote:

> Hi,
>
> Let me explain the *time parameters* in Solr:
>
> *Timing* parameter of debug returns information about how long the query
> took to process.
>
> *Query time* shows how long it took Solr to get the search
> results. It doesn't include reading bits from disk, etc.
>
> Also, there is another parameter named *elapsed time*. It covers the time
> frame from when the query is sent to Solr until the response is returned.
> It includes query time, reading bits from disk, constructing the response
> and transmitting it, etc.
>
> Kind Regards,
> Furkan KAMACI
>
> On Sat, Dec 17, 2016 at 6:43 PM, S G  wrote:
>
> > Hi,
> >
> > I am using Solr 4.10 and its response time for the clients is not very
> > good.
> > Even though the Solr's plugin/stats shows less than 200 milliseconds,
> > clients report several seconds in response time.
> >
> > So I tried using debug-timing parameter from the Solr UI and this is
> what I
> > got.
> > Note how the QTime is 2978 while the time in debug-timing is 19320.
> >
> > What does this mean?
> > How can Solr return a result in 3 seconds when time taken between two
> > points in the same path is 20 seconds ?
> >
> > {
> >   "responseHeader": {
> > "status": 0,
> > "QTime": 2978,
> > "params": {
> >   "q": "*:*",
> >   "debug": "timing",
> >   "indent": "true",
> >   "wt": "json",
> >   "_": "1481992653008"
> > }
> >   },
> >   "response": {
> > "numFound": 1565135270,
> > "start": 0,
> > "maxScore": 1,
> > "docs": [
> >   
> > ]
> >   },
> >   "debug": {
> > "timing": {
> >   "time": 19320,
> >   "prepare": {
> > "time": 4,
> > "query": {
> >   "time": 3
> > },
> > "facet": {
> >   "time": 0
> > },
> > "mlt": {
> >   "time": 0
> > },
> > "highlight": {
> >   "time": 0
> > },
> > "stats": {
> >   "time": 0
> > },
> > "expand": {
> >   "time": 0
> > },
> > "debug": {
> >   "time": 0
> > }
> >   },
> >   "process": {
> > "time": 19315,
> > "query": {
> >   "time": 19309
> > },
> > "facet": {
> >   "time": 0
> > },
> > "mlt": {
> >   "time": 1
> > },
> > "highlight": {
> >   "time": 0
> > },
> > "stats": {
> >   "time": 0
> > },
> > "expand": {
> >   "time": 0
> > },
> > "debug": {
> >   "time": 5
> > }
> >   }
> > }
> >   }
> > }
> >
>


How to identify documents failed in a batch request?

2016-12-17 Thread S G
Hi,

I am using the following code to send documents to Solr:

final UpdateRequest request = new UpdateRequest();
// Commit as part of this same request (waitFlush=false, waitSearcher=false).
request.setAction(UpdateRequest.ACTION.COMMIT, false, false);
request.add(docsList);  // the whole batch goes out in one request
UpdateResponse response = request.process(solrClient);

The response returned from the last line does not seem to be very helpful
in determining how I can identify the documents that failed in a batch request.

Does anyone know how this can be done?

Thanks
SG
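(One common workaround, sketched below: a batched update typically aborts on
the first bad document, so on failure the client can retry the documents one
at a time to identify the offenders. This is client-side logic, not a SolrJ
API; docsList and solrClient are as in the snippet above.)

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

List<SolrInputDocument> failed = new ArrayList<>();
for (SolrInputDocument doc : docsList) {
  try {
    solrClient.add(doc);    // one document at a time
  } catch (Exception e) {
    failed.add(doc);        // this individual document was rejected
  }
}
solrClient.commit();
// `failed` now holds the documents that could not be indexed.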


Confusing debug=timing parameter

2016-12-17 Thread S G
Hi,

I am using Solr 4.10 and its response time for the clients is not very good.
Even though the Solr's plugin/stats shows less than 200 milliseconds,
clients report several seconds in response time.

So I tried using debug-timing parameter from the Solr UI and this is what I
got.
Note how the QTime is 2978 while the time in debug-timing is 19320.

What does this mean?
How can Solr return a result in 3 seconds when time taken between two
points in the same path is 20 seconds ?

{
  "responseHeader": {
"status": 0,
"QTime": 2978,
"params": {
  "q": "*:*",
  "debug": "timing",
  "indent": "true",
  "wt": "json",
  "_": "1481992653008"
}
  },
  "response": {
"numFound": 1565135270,
"start": 0,
"maxScore": 1,
"docs": [
  
]
  },
  "debug": {
"timing": {
  "time": 19320,
  "prepare": {
"time": 4,
"query": {
  "time": 3
},
"facet": {
  "time": 0
},
"mlt": {
  "time": 0
},
"highlight": {
  "time": 0
},
"stats": {
  "time": 0
},
"expand": {
  "time": 0
},
"debug": {
  "time": 0
}
  },
  "process": {
"time": 19315,
"query": {
  "time": 19309
},
"facet": {
  "time": 0
},
"mlt": {
  "time": 1
},
"highlight": {
  "time": 0
},
"stats": {
  "time": 0
},
"expand": {
  "time": 0
},
"debug": {
  "time": 5
}
  }
}
  }
}


Re: Memory leak in Solr

2016-12-04 Thread S G
Thank you Eric.
Our Solr version is 4.10 and we are not doing any sorting or faceting.

I am trying to find some ways of investigating this problem.
Hence asking a few more questions to see what are the normal steps taken in
such situations.
(I did search a few of them on the Internet but could not find anything
good).
Any pointers provided here will help us resolve a little more quickly.


1) Is there a conclusive way to know about the memory leaks?
  How does Solr ensure with each release that there are no memory leaks?
  With a heap of 24 GB (-Xmx parameter), I sometimes see GC pauses of about
1 second now.
  Looks like we will need to scale it down.
  Total VM memory is 92gb and Solr is the only process running on it.


2) How can I know that the zookeeper connectivity to Solr is not good?
  What commands/steps are normally used to resolve this?
  Does Solr have some metrics that share the zookeeper interaction
statistics?


3) In a span of 9 hours, I see:
  4 times: java.net.SocketException: Connection reset
  32 times: java.net.SocketTimeoutException: Read timed out

And several other exceptions that ultimately bring a whole shard down
(leader is recovery-failed and replica is down).

I understand that the above information might not be sufficient to get the
full picture.
But just in case someone has resolved or debugged these issues before,
please share your experience.
It would be of great help to me.

Thanks,
SG





On Sun, Dec 4, 2016 at 8:59 AM, Erick Erickson 
wrote:

> All of this is consistent with not having a properly
> tuned Solr instance wrt # documents, usage
> pattern, memory allocated to the JVM, GC
> settings and the like.
>
> Your leader issues can be explained by long
> GC pauses too. Zookeeper periodically pings
> each replica it knows about and if the response
> times out (due to GC in this case) then Zookeeper
> thinks the node has gone away and marks
> it as "down". Similarly when a leader forwards
> an update to a follower and the request times
> out, the leader will mark the follower as down.
> Do this enough and the state of the cluster gets
> "interesting".
>
> You still haven't told us what version of Solr
> you're using, the "Version" you took from
> the core stats is the version of the _index_,
> not Solr.
>
> You have almost 200M documents on
> a single core. That's definitely on the high side,
> although I've seen that work. Assuming
> you aren't doing things like faceting and
> sorting and the like on non docValues fields.
>
> As others have pointed out, the link you
> provided doesn't provide much in the way of
> any "smoking guns" as far as a memory
> leak is concerned.
>
> I've certainly seen situations where memory
> required by Solr is close to the total memory
> allocated to the JVM for instance. Then the GC
> cycle kicks in and recovers just enough to
> go on for a very brief time before going into another
> GC cycle resulting in very poor performance.
>
> So overall this looks like you need to do some
> serious tuning of your Solr instances, take a
> hard look at how you're using your physical
> machines. You specify that these are VMs,
> but how many VMs are you running per box?
> How much JVM have you allocated for each?
> How much total physical memory do you have
> to work with per box?
>
> Even if you provide the answers to the above
> questions, there's not much we can do to
> help you resolve your issues assuming it's
> simply inappropriate sizing. I'd really recommend
> you create a stress environment so you can
> test different scenarios to become confident about
> your expected performance, here's a blog on the
> subject:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Sat, Dec 3, 2016 at 8:46 PM, S G  wrote:
> > The symptom we see is that the java clients querying Solr see response
> > times in 10s of seconds (not milliseconds).
> > And on the tomcat's gc.log file (where Solr is running), we see very bad
> GC
> > pauses - threads being paused for 0.5 seconds per second approximately.
> >
> > Some numbers for the Solr Cloud:
> >
> > *Overall infrastructure:*
> > - Only one collection
> > - 16 VMs used
> > - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
> >
> > *Overview from one core:*
> > - Num Docs:193,623,388
> > - Max Doc:230,577,696
> > - Heap Memory Usage:231,217,880
> > - Deleted Docs:36,954,308
> > - Version:2,357,757
> > - Segment Count:37
> >
> > *Stats from QueryHandler/select*
> > - requests:78,557

Re: Memory leak in Solr

2016-12-03 Thread S G
The symptom we see is that the java clients querying Solr see response
times in 10s of seconds (not milliseconds).
And in the tomcat gc.log file (where Solr is running), we see very bad GC
pauses - threads being paused for approximately 0.5 seconds per second.

Some numbers for the Solr Cloud:

*Overall infrastructure:*
- Only one collection
- 16 VMs used
- 8 shards (1 leader and 1 replica per shard - each core on separate VM)

*Overview from one core:*
- Num Docs:193,623,388
- Max Doc:230,577,696
- Heap Memory Usage:231,217,880
- Deleted Docs:36,954,308
- Version:2,357,757
- Segment Count:37

*Stats from QueryHandler/select*
- requests:78,557
- errors:358
- timeouts:0
- totalTime:1,639,975.27
- avgRequestsPerSecond:2.62
- 5minRateReqsPerSecond:1.39
- 15minRateReqsPerSecond:1.64
- avgTimePerRequest:20.87
- medianRequestTime:0.70
- 75thPcRequestTime:1.11
- 95thPcRequestTime:191.76

*Stats from QueryHandler/update*
- requests:33,555
- errors:0
- timeouts:0
- totalTime:227,870.58
- avgRequestsPerSecond:1.12
- 5minRateReqsPerSecond:1.16
- 15minRateReqsPerSecond:1.23
- avgTimePerRequest:6.79
- medianRequestTime:3.16
- 75thPcRequestTime:5.27
- 95thPcRequestTime:9.33

And yet the Solr clients are reporting timeouts and very long read times.

Plus, on every server, we are seeing lots of exceptions.
For example:

Between 8:06:55 PM and 8:21:36 PM, exceptions are:

1) Request says it is coming from leader, but we are the leader:
update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2

2) org.apache.solr.common.SolrException: Request says it is coming from
leader, but we are the leader

3) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

4) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

5) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

6) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

7) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request. Zombie server list:
[HOSTA_ca_1_1456429897]

8) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request. Zombie server list:
[HOSTA_ca_1_1456429897]

9) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

10) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

11) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

12) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

Why are we seeing so many timeouts then, and why such huge response times
on the client?

Thanks
SG



On Sat, Dec 3, 2016 at 4:19 PM,  wrote:

> What tool is that ? The stats I would like to run on my Solr instance
>
> Bill Bell
> Sent from mobile
>
>
> > On Dec 2, 2016, at 4:49 PM, Shawn Heisey  wrote:
> >
> >> On 12/2/2016 12:01 PM, S G wrote:
> >> This post shows some stats on Solr which indicate that there might be a
> >> memory leak in there.
> >>
> >> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
> >>
> >> Can someone please help to debug this?
> >> It might be a very good step in making Solr stable if we can fix this.
> >
> > +1 to what Walter said.
> >
> > I replied earlier on the stackoverflow question.
> >
> > FYI -- your 95th percentile request time of about 16 milliseconds is NOT
> > something that I would characterize as "very high."  I would *love* to
> > have statistics that good.
> >
> > Even your 99th percentile request time is not much more than a full
> > second.  If a search takes a couple of seconds, most users will not
> > really care, and some might not even notice.  It's when a large
> > percentage of queries start taking several seconds that complaints start
> > coming in.  On your system, 99 percent of your queries are completing in
> > 1.3 seconds or less, and 95 percent of them are less than 17
> > milliseconds.  That sounds quite good 

Memory leak in Solr

2016-12-02 Thread S G
Hi,

This post shows some stats on Solr which indicate that there might be a
memory leak in there.

http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr

Can someone please help to debug this?
It might be a very good step in making Solr stable if we can fix this.

Thanks
SG


Re: Whether SolrCloud can support 2 TB data?

2016-09-23 Thread S G
Hey Yago,

12 T is very impressive.

Can you also share some numbers about the shards, replicas, machine
count/specs and docs/second for your case?
I assume you are not running a single 12 TB index either. So some
details on that would be really helpful too.

https://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/
is a good post how LucidWorks achieved 150k docs/second.
If you have any such similar blog, that would be quite useful and popular
too.

--SG

On Fri, Sep 23, 2016 at 5:00 PM, Yago Riveiro 
wrote:

> In my company we have a SolrCloud cluster with 12T.
>
> My advices:
>
> Be generous with CPU, you will need it at some point (very important if you
> have no control over the kind of queries to the cluster; clients are greedy,
> they want all results at the same time).
>
> SSDs and memory (as much as you can afford if you will do facets).
>
> Full recoveries are a pain; the network is important and should be as fast
> as possible, never less than 1 Gbit.
>
> Divide and conquer, but too much of it can drive you to an expensive
> overhead, since data travels over the network. Find the sweet spot (you
> will only know by testing your use case).
>
> --
>
> /Yago Riveiro
>
> On 23 Sep 2016, 23:44 +0100, Pushkar Raste ,
> wrote:
> > Solr is RAM hungry. Make sure that you have enough RAM to keep most of
> > the index of a core in RAM itself.
> >
> > You should also consider using really good SSDs.
> >
> > That would be a good start. Like others said, test and verify your setup.
> >
> > --Pushkar Raste
> >
> > On Sep 23, 2016 4:58 PM, "Jeffery Yuan"  wrote:
> >
> > Thanks so much for your prompt reply.
> >
> > We are definitely going to use SolrCloud.
> >
I am just wondering whether SolrCloud can scale even at the TB data level
and what kind of hardware configuration it would need.
> >
> > Thanks.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Whether-solr-can-support-2-TB-data-tp4297790p4297800.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>