Re: Solr queries slow down over time

2020-09-25 Thread Goutham Tholpadi
Hi Mark, Thanks for confirming Dwane's advice from your own experience. I
will shift to a streaming expressions implementation.

Best
Goutham

On Fri, Sep 25, 2020 at 7:03 PM Mark H. Wood  wrote:

> On Fri, Sep 25, 2020 at 11:49:22AM +0530, Goutham Tholpadi wrote:
> > I have around 30M documents in Solr, and I am doing repeated *:* queries
> > with rows=1, and changing start to 0, 1, 2, and so on, in a
> > loop in my script (using pysolr).
> >
> > At the start of the iteration, the calls to Solr were taking less than 1
> > sec each. After running for a few hours (with start at around 27M) I
> found
> > that each call was taking around 30-60 secs.
> >
> > Any pointers on why the same fetch of 1 records takes much longer
> now?
> > Does Solr need to load all the 27M before getting the last 1 records?
>
> I and many others have run into the same issue.  Yes, each windowed
> query starts fresh, having to find at least enough records to satisfy
> the query, walking the list to discard the first 'start' worth of
> them, and then returning the next 'rows' worth.  So as 'start' increases,
> the work required of Solr increases and the response time lengthens.
>
> > Is there a better way to do this operation using Solr?
>
> Another answer in this thread gives links to resources for addressing
> the problem, and I can't improve on those.
>
> I can say that when I switched from start= windowing to cursormark, I
> got a very nice improvement in overall speed and did not see the
> progressive slowing anymore.  A query loop that ran for *days* now
> completes in under five minutes.  In some way that I haven't quite
> figured out, a cursormark tells Solr where in the overall document
> sequence to start working.
>
> So yes, there *is* a better way.
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>


Re: Solr queries slow down over time

2020-09-25 Thread Goutham Tholpadi
Thanks a ton, Dwane. I went through the article and the documentation link.
This corresponds exactly to my use case.

Best
Goutham

On Fri, Sep 25, 2020 at 2:59 PM Dwane Hall  wrote:

> Goutham I suggest you read Hossman's excellent article on deep paging and
> why returning rows=(some large number) is a bad idea. It provides a
> thorough overview of the concept and will explain it better than I ever
> could (
> https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18).
> In short, if you want to extract that many documents out of your corpus, use
> cursor mark, streaming expressions, or Solr's parallel SQL interface (that
> uses streaming expressions under the hood)
> https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html.
>
> Thanks,
>
> Dwane
> --
> *From:* Goutham Tholpadi 
> *Sent:* Friday, 25 September 2020 4:19 PM
> *To:* solr-user@lucene.apache.org 
> *Subject:* Solr queries slow down over time
>
> Hi,
>
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=1, and changing start to 0, 1, 2, and so on, in a
> loop in my script (using pysolr).
>
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
>
> Any pointers on why the same fetch of 1 records takes much longer now?
> Does Solr need to load all the 27M before getting the last 1 records?
> Is there a better way to do this operation using Solr?
>
> Thanks!
> Goutham
>


Re: Solr queries slow down over time

2020-09-25 Thread Mark H. Wood
On Fri, Sep 25, 2020 at 11:49:22AM +0530, Goutham Tholpadi wrote:
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=1, and changing start to 0, 1, 2, and so on, in a
> loop in my script (using pysolr).
> 
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
> 
> Any pointers on why the same fetch of 1 records takes much longer now?
> Does Solr need to load all the 27M before getting the last 1 records?

I and many others have run into the same issue.  Yes, each windowed
query starts fresh, having to find at least enough records to satisfy
the query, walking the list to discard the first 'start' worth of
them, and then returning the next 'rows' worth.  So as 'start' increases,
the work required of Solr increases and the response time lengthens.
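
The cost can be seen in miniature. Below is a rough sketch (not Solr's actual
code) of what offset-based paging forces an engine to do: collect the top
start+rows matches, then discard the first `start` of them.

```python
import heapq

def offset_page(scores, start, rows):
    """Offset ('start=N') paging over unsorted match scores.

    The engine must keep the top (start + rows) entries and then throw
    the first `start` of them away, so the work per request grows with
    `start` even though only `rows` results are returned.
    """
    top = heapq.nlargest(start + rows, scores)  # O(n log(start + rows))
    return top[start:start + rows]
```

With start near 27M, the discarded prefix is ~27M entries on every request,
which matches the progressive slowdown described above.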

> Is there a better way to do this operation using Solr?

Another answer in this thread gives links to resources for addressing
the problem, and I can't improve on those.

I can say that when I switched from start= windowing to cursormark, I
got a very nice improvement in overall speed and did not see the
progressive slowing anymore.  A query loop that ran for *days* now
completes in under five minutes.  In some way that I haven't quite
figured out, a cursormark tells Solr where in the overall document
sequence to start working.

So yes, there *is* a better way.
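
For later readers, a cursormark loop in the spirit of the original pysolr
script might look like the sketch below. The `search` callable stands in for
your client's query method (pysolr's `Solr.search` passes extra keyword
arguments through as query parameters); the response shape and the uniqueKey
field name `id` are assumptions, not the poster's actual code.

```python
def fetch_all(search, rows=1000, unique_key="id"):
    """Stream every document using cursorMark instead of start=N.

    Solr requires the sort to include the uniqueKey field so the cursor
    marks a stable position in the overall document sequence. Each
    response carries nextCursorMark; the scan is finished when the
    cursor stops changing.
    """
    cursor = "*"  # '*' means start at the beginning
    while True:
        resp = search("*:*", rows=rows,
                      sort=f"{unique_key} asc", cursorMark=cursor)
        for doc in resp["docs"]:
            yield doc
        if resp["nextCursorMark"] == cursor:  # no progress: done
            break
        cursor = resp["nextCursorMark"]
```

Unlike start=N, each request does a bounded amount of work no matter how deep
into the result set the scan has gone.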

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Solr queries slow down over time

2020-09-25 Thread Dwane Hall
Goutham I suggest you read Hossman's excellent article on deep paging and why 
returning rows=(some large number) is a bad idea. It provides a thorough 
overview of the concept and will explain it better than I ever could 
(https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18).
 In short, if you want to extract that many documents out of your corpus, use 
cursor mark, streaming expressions, or Solr's parallel SQL interface (that uses 
streaming expressions under the hood)
https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html.
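
For comparison, a streaming-expressions pull is a single long-lived request to
the /stream handler. A sketch of building such a request follows; the host,
collection, and field names are placeholders, and the expression syntax
follows the guide linked above.

```python
from urllib.parse import urlencode

def stream_search_url(base_url, collection, fl, sort, q="*:*"):
    """Build a /stream request wrapping a search() streaming expression.

    qt="/export" routes it through the export handler, which streams the
    entire sorted result set (docValues fields only) with no deep-paging
    penalty. All names here are illustrative.
    """
    expr = (f'search({collection}, q="{q}", fl="{fl}", '
            f'sort="{sort}", qt="/export")')
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})
```

e.g. stream_search_url("http://localhost:8983/solr", "docs", "id,name", "id asc")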

Thanks,

Dwane

From: Goutham Tholpadi 
Sent: Friday, 25 September 2020 4:19 PM
To: solr-user@lucene.apache.org 
Subject: Solr queries slow down over time

Hi,

I have around 30M documents in Solr, and I am doing repeated *:* queries
with rows=1, and changing start to 0, 1, 2, and so on, in a
loop in my script (using pysolr).

At the start of the iteration, the calls to Solr were taking less than 1
sec each. After running for a few hours (with start at around 27M) I found
that each call was taking around 30-60 secs.

Any pointers on why the same fetch of 1 records takes much longer now?
Does Solr need to load all the 27M before getting the last 1 records?
Is there a better way to do this operation using Solr?

Thanks!
Goutham


Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Anil
Thanks Guys.

i will try two-level document routing for file_collection.

i really don't understand why the index size is so high for file_collection
when the same file is available in main_collection.

(each file is indexed as one document, with all its commands, in
main_collection; the same file is indexed as a number of documents in
file_collection, one solr document per command).

does index size grow more with many distinct words, or with fewer distinct
words but a larger number of documents? let me know if i have not put the
question correctly.

Thanks,
Anil

On 15 March 2016 at 01:00, Susheel Kumar  wrote:

> If you can identify which fields (or a combination) in your documents divide
> / group the data together, those would be the fields for custom routing.  Solr
> supports up to two levels.
>
> E.g. a field such as documentType or country would help.  See the document
> routing docs at
>
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
>
>
> On Mon, Mar 14, 2016 at 3:14 PM, Erick Erickson 
> wrote:
>
> > Usually I just let the compositeId do its thing and only go for custom
> > routing when the default proves inadequate.
> >
> > Note: your 480M documents may very well be too many for three shards!
> > You really have to test
> >
> > Erick
> >
> >
> > On Mon, Mar 14, 2016 at 10:04 AM, Anil  wrote:
> > > Hi Erick,
> > > In b/w, Do you recommend any effective shard distribution method ?
> > >
> > > Regards,
> > > Anil
> > >
> > > On 14 March 2016 at 22:30, Erick Erickson 
> > wrote:
> > >
> > >> Try shards.info=true, but pinging the shard directly is the most
> > certain.
> > >>
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Mon, Mar 14, 2016 at 9:48 AM, Anil  wrote:
> > >> > HI Erik,
> > >> >
> > >> > we have used document routing to balance the shards load and for
> > >> > expand/collapse. it is mainly used for main_collection which holds
> > one to
> > >> > many relationship records. In file_collection, it is only for load
> > >> > distribution.
> > >> >
> > >> > 25GB for entire solr service. each machine will act as shard for
> some
> > >> > collections.
> > >> >
> > >> > we have not stress tested our servers at least for solr service. i
> > have
> > >> > read the link you have shared, i will do something on it. thanks
> > for
> > >> > sharing.
> > >> >
> > >> > i have checked other collections, where index size is max 90GB and 5
> > M as
> > >> > max number of documents. but for the particular file_collection_2014
> > , i
> > >> > see total index size across replicas is 147 GB.
> > >> >
> > >> > Can we get any hints if we run the query with debugQuery=true ?
> what
> > is
> > >> > the effective way of load distribution ? Please advise.
> > >> >
> > >> > Regards,
> > >> > Anil
> > >> >
> > >> > On 14 March 2016 at 20:32, Erick Erickson 
> > >> wrote:
> > >> >
> > >> >> bq: The slowness is happening for file_collection. though it has 3
> > >> shards,
> > >> >> documents are available in 2 shards. shard1 - 150M docs and shard2
> > has
> > >> 330M
> > >> >> docs , shard3 is empty.
> > >> >>
> > >> >> Well, this collection is terribly balanced. Putting 330M docs on a
> > single
> > >> >> shard is
> > >> >> pushing the limits, the only time I've seen that many docs on a
> > shard,
> > >> >> particularly
> > >> >> with 25G of ram, they were very small records. My guess is that you
> > will
> > >> >> find
> > >> >> the queries you send to that shard substantially slower than the
> 150M
> > >> >> shard,
> > >> >> although 150M could also be pushing your limits. You can measure
> this
> > >> >> by sending the query to the specific core (something like
> > >> >>
> > >> >> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
> > >> >>
> > >> >> My bet is that your QTime will be significantly different with the
> > two
> > >> >> shards.
> > >> >>
> > >> >> It also sounds like you're using implicit routing where you control
> > >> where
> > >> >> the
> > >> >> files go, it's easy to have unbalanced shards in that case, why did
> > you
> > >> >> decide
> > >> >> to do it this way? There are valid reasons, but...
> > >> >>
> > >> >> In short, my guess is that you've simply overloaded your shard with
> > >> >> 330M docs. It's
> > >> >> not at all clear that even 150 will give you satisfactory
> > performance,
> > >> >> have you stress
> > >> >> tested your servers? Here's the long form of sizing:
> > >> >>
> > >> >>
> > >> >>
> > >>
> >
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > >> >>
> > >> >> Best,
> > >> >> Erick
> > >> >>
> > >> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar <
> > susheel2...@gmail.com>
> > >> >> wrote:
> > >> >> > For each of the solr machines/shards you have.  Thanks.
> > >> >> >
> > >> >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <
> > >> 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Susheel Kumar
If you can identify which fields (or a combination) in your documents divide
/ group the data together, those would be the fields for custom routing.  Solr
supports up to two levels.

E.g. a field such as documentType or country would help.  See the document
routing docs at
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
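
Concretely, with the default compositeId router, routing is opt-in per
document: you prefix the uniqueKey with up to two '!'-separated levels. A
small helper, with made-up field values, might look like this:

```python
def composite_route_id(doc_id, *levels):
    """Build a compositeId routing key, e.g.
    composite_route_id('12345', 'USA', 'acme') -> 'USA!acme!12345'.

    Documents sharing a prefix hash to the same shard, so queries can be
    narrowed with the _route_ parameter instead of fanning out to every
    shard. At most two levels are supported; a level may carry a '/bits'
    suffix (e.g. 'USA/2') to control how many hash bits it contributes.
    """
    if not 1 <= len(levels) <= 2:
        raise ValueError("compositeId supports one or two routing levels")
    return "!".join([*levels, str(doc_id)])
```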



On Mon, Mar 14, 2016 at 3:14 PM, Erick Erickson 
wrote:

> Usually I just let the compositeId do its thing and only go for custom
> routing when the default proves inadequate.
>
> Note: your 480M documents may very well be too many for three shards!
> You really have to test
>
> Erick
>
>
> On Mon, Mar 14, 2016 at 10:04 AM, Anil  wrote:
> > Hi Erick,
> > In b/w, Do you recommend any effective shard distribution method ?
> >
> > Regards,
> > Anil
> >
> > On 14 March 2016 at 22:30, Erick Erickson 
> wrote:
> >
> >> Try shards.info=true, but pinging the shard directly is the most
> certain.
> >>
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 14, 2016 at 9:48 AM, Anil  wrote:
> >> > HI Erik,
> >> >
> >> > we have used document routing to balance the shards load and for
> >> > expand/collapse. it is mainly used for main_collection which holds
> one to
> >> > many relationship records. In file_collection, it is only for load
> >> > distribution.
> >> >
> >> > 25GB for entire solr service. each machine will act as shard for some
> >> > collections.
> >> >
> >> > we have not stress tested our servers at least for solr service. i
> have
> >> > read the link you have shared, i will do something on it. thanks
> for
> >> > sharing.
> >> >
> >> > i have checked other collections, where index size is max 90GB and 5
> M as
> >> > max number of documents. but for the particular file_collection_2014
> , i
> >> > see total index size across replicas is 147 GB.
> >> >
> >> > Can we get any hints if we run the query with debugQuery=true ?  what
> is
> >> > the effective way of load distribution ? Please advise.
> >> >
> >> > Regards,
> >> > Anil
> >> >
> >> > On 14 March 2016 at 20:32, Erick Erickson 
> >> wrote:
> >> >
> >> >> bq: The slowness is happening for file_collection. though it has 3
> >> shards,
> >> >> documents are available in 2 shards. shard1 - 150M docs and shard2
> has
> >> 330M
> >> >> docs , shard3 is empty.
> >> >>
> > >> >> Well, this collection is terribly balanced. Putting 330M docs on a
> single
> >> >> shard is
> >> >> pushing the limits, the only time I've seen that many docs on a
> shard,
> >> >> particularly
> >> >> with 25G of ram, they were very small records. My guess is that you
> will
> >> >> find
> >> >> the queries you send to that shard substantially slower than the 150M
> >> >> shard,
> >> >> although 150M could also be pushing your limits. You can measure this
> >> >> by sending the query to the specific core (something like
> >> >>
> >> >> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
> >> >>
> >> >> My bet is that your QTime will be significantly different with the
> two
> >> >> shards.
> >> >>
> >> >> It also sounds like you're using implicit routing where you control
> >> where
> >> >> the
> >> >> files go, it's easy to have unbalanced shards in that case, why did
> you
> >> >> decide
> >> >> to do it this way? There are valid reasons, but...
> >> >>
> >> >> In short, my guess is that you've simply overloaded your shard with
> >> >> 330M docs. It's
> >> >> not at all clear that even 150 will give you satisfactory
> performance,
> >> >> have you stress
> >> >> tested your servers? Here's the long form of sizing:
> >> >>
> >> >>
> >> >>
> >>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar <
> susheel2...@gmail.com>
> >> >> wrote:
> >> >> > For each of the solr machines/shards you have.  Thanks.
> >> >> >
> >> >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <
> >> susheel2...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Hello Anil,
> >> >> >>
> >> >> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
> >> >> >> parameters under System / share the snapshot. ?
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Susheel
> >> >> >>
> >> >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
> >> >> >>
> >> >> >>> HI Toke and Jack,
> >> >> >>>
> >> >> >>> Please find the details below.
> >> >> >>>
> >> >> >>> * How large are your 3 shards in bytes? (total index across
> >> replicas)
> >> >> >>>   --  *146G. i am using CDH (cloudera), not sure how to
> >> check
> >> >> the
> >> >> >>> index size of each collection on each shard*
> >> >> >>> * What storage system do you use (local SSD, local spinning
> drives,
> >> >> remote
> >> >> >>> storage...)? *Local (hdfs) spinning drives*
> >> >> >>> * How much 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Erick Erickson
Usually I just let the compositeId do its thing and only go for custom
routing when the default proves inadequate.

Note: your 480M documents may very well be too many for three shards!
You really have to test

Erick


On Mon, Mar 14, 2016 at 10:04 AM, Anil  wrote:
> Hi Erick,
> In b/w, Do you recommend any effective shard distribution method ?
>
> Regards,
> Anil
>
> On 14 March 2016 at 22:30, Erick Erickson  wrote:
>
>> Try shards.info=true, but pinging the shard directly is the most certain.
>>
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 14, 2016 at 9:48 AM, Anil  wrote:
>> > HI Erik,
>> >
>> > we have used document routing to balance the shards load and for
>> > expand/collapse. it is mainly used for main_collection which holds one to
>> > many relationship records. In file_collection, it is only for load
>> > distribution.
>> >
>> > 25GB for entire solr service. each machine will act as shard for some
>> > collections.
>> >
>> > we have not stress tested our servers at least for solr service. i have
>> > read the link you have shared, i will do something on it. thanks for
>> > sharing.
>> >
>> > i have checked other collections, where index size is max 90GB and 5 M as
>> > max number of documents. but for the particular file_collection_2014 , i
>> > see total index size across replicas is 147 GB.
>> >
>> > Can we get any hints if we run the query with debugQuery=true ?  what is
>> > the effective way of load distribution ? Please advise.
>> >
>> > Regards,
>> > Anil
>> >
>> > On 14 March 2016 at 20:32, Erick Erickson 
>> wrote:
>> >
>> >> bq: The slowness is happening for file_collection. though it has 3
>> shards,
>> >> documents are available in 2 shards. shard1 - 150M docs and shard2 has
>> 330M
>> >> docs , shard3 is empty.
>> >>
>> >> Well, this collection is terribly balanced. Putting 330M docs on a single
>> >> shard is
>> >> pushing the limits, the only time I've seen that many docs on a shard,
>> >> particularly
>> >> with 25G of ram, they were very small records. My guess is that you will
>> >> find
>> >> the queries you send to that shard substantially slower than the 150M
>> >> shard,
>> >> although 150M could also be pushing your limits. You can measure this
>> >> by sending the query to the specific core (something like
>> >>
>> >> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
>> >>
>> >> My bet is that your QTime will be significantly different with the two
>> >> shards.
>> >>
>> >> It also sounds like you're using implicit routing where you control
>> where
>> >> the
>> >> files go, it's easy to have unbalanced shards in that case, why did you
>> >> decide
>> >> to do it this way? There are valid reasons, but...
>> >>
>> >> In short, my guess is that you've simply overloaded your shard with
>> >> 330M docs. It's
>> >> not at all clear that even 150 will give you satisfactory performance,
>> >> have you stress
>> >> tested your servers? Here's the long form of sizing:
>> >>
>> >>
>> >>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar 
>> >> wrote:
>> >> > For each of the solr machines/shards you have.  Thanks.
>> >> >
>> >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <
>> susheel2...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hello Anil,
>> >> >>
>> >> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
>> >> >> parameters under System / share the snapshot. ?
>> >> >>
>> >> >> Thanks,
>> >> >> Susheel
>> >> >>
>> >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
>> >> >>
>> >> >>> HI Toke and Jack,
>> >> >>>
>> >> >>> Please find the details below.
>> >> >>>
>> >> >>> * How large are your 3 shards in bytes? (total index across
>> replicas)
>> >> >>>   --  *146G. i am using CDH (cloudera), not sure how to
>> check
>> >> the
>> >> >>> index size of each collection on each shard*
>> >> >>> * What storage system do you use (local SSD, local spinning drives,
>> >> remote
>> >> >>> storage...)? *Local (hdfs) spinning drives*
>> >> >>> * How much physical memory does your system have? *we have 15 data
>> >> nodes.
>> >> >>> multiple services installed on each data node (252 GB RAM for each
>> data
>> >> >>> node). 25 gb RAM allocated for solr service.*
>> >> >>> * How much memory is free for disk cache? *i could not find.*
>> >> >>> * How many concurrent queries do you issue? *very less. i dont see
>> any
>> >> >>> concurrent queries to this file_collection for now.*
>> >> >>> * Do you update while you search? *Yes.. its very less.*
>> >> >>> * What does a full query (rows, faceting, grouping, highlighting,
>> >> >>> everything) look like? *for the file_collection, rows - 100,
>> >> highlights =
>> >> >>> false, no facets, expand = false.*
>> >> >>> * How many documents 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Anil
Hi Erick,
In b/w, Do you recommend any effective shard distribution method ?

Regards,
Anil

On 14 March 2016 at 22:30, Erick Erickson  wrote:

> Try shards.info=true, but pinging the shard directly is the most certain.
>
>
> Best,
> Erick
>
> On Mon, Mar 14, 2016 at 9:48 AM, Anil  wrote:
> > HI Erik,
> >
> > we have used document routing to balance the shards load and for
> > expand/collapse. it is mainly used for main_collection which holds one to
> > many relationship records. In file_collection, it is only for load
> > distribution.
> >
> > 25GB for entire solr service. each machine will act as shard for some
> > collections.
> >
> > we have not stress tested our servers at least for solr service. i have
> > read the link you have shared, i will do something on it. thanks for
> > sharing.
> >
> > i have checked other collections, where index size is max 90GB and 5 M as
> > max number of documents. but for the particular file_collection_2014 , i
> > see total index size across replicas is 147 GB.
> >
> > Can we get any hints if we run the query with debugQuery=true ?  what is
> > the effective way of load distribution ? Please advise.
> >
> > Regards,
> > Anil
> >
> > On 14 March 2016 at 20:32, Erick Erickson 
> wrote:
> >
> >> bq: The slowness is happening for file_collection. though it has 3
> shards,
> >> documents are available in 2 shards. shard1 - 150M docs and shard2 has
> 330M
> >> docs , shard3 is empty.
> >>
> >> Well, this collection is terribly balanced. Putting 330M docs on a single
> >> shard is
> >> pushing the limits, the only time I've seen that many docs on a shard,
> >> particularly
> >> with 25G of ram, they were very small records. My guess is that you will
> >> find
> >> the queries you send to that shard substantially slower than the 150M
> >> shard,
> >> although 150M could also be pushing your limits. You can measure this
> >> by sending the query to the specific core (something like
> >>
> >> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
> >>
> >> My bet is that your QTime will be significantly different with the two
> >> shards.
> >>
> >> It also sounds like you're using implicit routing where you control
> where
> >> the
> >> files go, it's easy to have unbalanced shards in that case, why did you
> >> decide
> >> to do it this way? There are valid reasons, but...
> >>
> >> In short, my guess is that you've simply overloaded your shard with
> >> 330M docs. It's
> >> not at all clear that even 150 will give you satisfactory performance,
> >> have you stress
> >> tested your servers? Here's the long form of sizing:
> >>
> >>
> >>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar 
> >> wrote:
> >> > For each of the solr machines/shards you have.  Thanks.
> >> >
> >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <
> susheel2...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hello Anil,
> >> >>
> >> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
> >> >> parameters under System / share the snapshot. ?
> >> >>
> >> >> Thanks,
> >> >> Susheel
> >> >>
> >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
> >> >>
> >> >>> HI Toke and Jack,
> >> >>>
> >> >>> Please find the details below.
> >> >>>
> >> >>> * How large are your 3 shards in bytes? (total index across
> replicas)
> >> >>>   --  *146G. i am using CDH (cloudera), not sure how to
> check
> >> the
> >> >>> index size of each collection on each shard*
> >> >>> * What storage system do you use (local SSD, local spinning drives,
> >> remote
> >> >>> storage...)? *Local (hdfs) spinning drives*
> >> >>> * How much physical memory does your system have? *we have 15 data
> >> nodes.
> >> >>> multiple services installed on each data node (252 GB RAM for each
> data
> >> >>> node). 25 gb RAM allocated for solr service.*
> >> >>> * How much memory is free for disk cache? *i could not find.*
> >> >>> * How many concurrent queries do you issue? *very less. i dont see
> any
> >> >>> concurrent queries to this file_collection for now.*
> >> >>> * Do you update while you search? *Yes.. its very less.*
> >> >>> * What does a full query (rows, faceting, grouping, highlighting,
> >> >>> everything) look like? *for the file_collection, rows - 100,
> >> highlights =
> >> >>> false, no facets, expand = false.*
> >> >>> * How many documents does a typical query match (hitcount)? *it
> varies
> >> >>> with
> >> >>> each file. i have sort on int field to order commands in the query.*
> >> >>>
> >> >>> we have two sets of collections on solr cluster ( 17 data nodes)
> >> >>>
> >> >>> 1. main_collection - collection created per year. each collection
> uses
> >> 8
> >> >>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc
> >> >>>
> >> >>> 2. 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Anil
thanks Erick. i will try that. Somehow i am not able to run a query on the
shard directly because of kerberos. i even tried curl --negotiate.

Regards,
Anil

On 14 March 2016 at 22:30, Erick Erickson  wrote:

> Try shards.info=true, but pinging the shard directly is the most certain.
>
>
> Best,
> Erick
>
> On Mon, Mar 14, 2016 at 9:48 AM, Anil  wrote:
> > HI Erik,
> >
> > we have used document routing to balance the shards load and for
> > expand/collapse. it is mainly used for main_collection which holds one to
> > many relationship records. In file_collection, it is only for load
> > distribution.
> >
> > 25GB for entire solr service. each machine will act as shard for some
> > collections.
> >
> > we have not stress tested our servers at least for solr service. i have
> > read the link you have shared, i will do something on it. thanks for
> > sharing.
> >
> > i have checked other collections, where index size is max 90GB and 5 M as
> > max number of documents. but for the particular file_collection_2014 , i
> > see total index size across replicas is 147 GB.
> >
> > Can we get any hints if we run the query with debugQuery=true ?  what is
> > the effective way of load distribution ? Please advise.
> >
> > Regards,
> > Anil
> >
> > On 14 March 2016 at 20:32, Erick Erickson 
> wrote:
> >
> >> bq: The slowness is happening for file_collection. though it has 3
> shards,
> >> documents are available in 2 shards. shard1 - 150M docs and shard2 has
> 330M
> >> docs , shard3 is empty.
> >>
> >> Well, this collection is terribly balanced. Putting 330M docs on a single
> >> shard is
> >> pushing the limits, the only time I've seen that many docs on a shard,
> >> particularly
> >> with 25G of ram, they were very small records. My guess is that you will
> >> find
> >> the queries you send to that shard substantially slower than the 150M
> >> shard,
> >> although 150M could also be pushing your limits. You can measure this
> >> by sending the query to the specific core (something like
> >>
> >> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
> >>
> >> My bet is that your QTime will be significantly different with the two
> >> shards.
> >>
> >> It also sounds like you're using implicit routing where you control
> where
> >> the
> >> files go, it's easy to have unbalanced shards in that case, why did you
> >> decide
> >> to do it this way? There are valid reasons, but...
> >>
> >> In short, my guess is that you've simply overloaded your shard with
> >> 330M docs. It's
> >> not at all clear that even 150 will give you satisfactory performance,
> >> have you stress
> >> tested your servers? Here's the long form of sizing:
> >>
> >>
> >>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar 
> >> wrote:
> >> > For each of the solr machines/shards you have.  Thanks.
> >> >
> >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <
> susheel2...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hello Anil,
> >> >>
> >> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
> >> >> parameters under System / share the snapshot. ?
> >> >>
> >> >> Thanks,
> >> >> Susheel
> >> >>
> >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
> >> >>
> >> >>> HI Toke and Jack,
> >> >>>
> >> >>> Please find the details below.
> >> >>>
> >> >>> * How large are your 3 shards in bytes? (total index across
> replicas)
> >> >>>   --  *146G. i am using CDH (cloudera), not sure how to
> check
> >> the
> >> >>> index size of each collection on each shard*
> >> >>> * What storage system do you use (local SSD, local spinning drives,
> >> remote
> >> >>> storage...)? *Local (hdfs) spinning drives*
> >> >>> * How much physical memory does your system have? *we have 15 data
> >> nodes.
> >> >>> multiple services installed on each data node (252 GB RAM for each
> data
> >> >>> node). 25 gb RAM allocated for solr service.*
> >> >>> * How much memory is free for disk cache? *i could not find.*
> >> >>> * How many concurrent queries do you issue? *very less. i dont see
> any
> >> >>> concurrent queries to this file_collection for now.*
> >> >>> * Do you update while you search? *Yes.. its very less.*
> >> >>> * What does a full query (rows, faceting, grouping, highlighting,
> >> >>> everything) look like? *for the file_collection, rows - 100,
> >> highlights =
> >> >>> false, no facets, expand = false.*
> >> >>> * How many documents does a typical query match (hitcount)? *it
> varies
> >> >>> with
> >> >>> each file. i have sort on int field to order commands in the query.*
> >> >>>
> >> >>> we have two sets of collections on solr cluster ( 17 data nodes)
> >> >>>
> >> >>> 1. main_collection - collection created per year. each collection
> uses
> >> 8
> >> >>> shards 2 replicas ex: 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Erick Erickson
Try shards.info=true, but pinging the shard directly is the most certain.
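
Both probes are one-liners; a sketch of the request URLs, with host,
collection, and core names as placeholders:

```python
from urllib.parse import urlencode

def shard_info_url(base_url, collection, q):
    """Distributed query with shards.info=true: the response includes a
    per-shard section, a quick way to spot which shard is slow."""
    return (f"{base_url}/{collection}/select?"
            + urlencode({"q": q, "shards.info": "true"}))

def direct_core_url(base_url, core, q):
    """Ping one core with distrib=false so only that core answers; its
    QTime then reflects that single shard alone."""
    return (f"{base_url}/{core}/select?"
            + urlencode({"q": q, "distrib": "false"}))
```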


Best,
Erick

On Mon, Mar 14, 2016 at 9:48 AM, Anil  wrote:
> HI Erik,
>
> we have used document routing to balance the shards load and for
> expand/collapse. it is mainly used for main_collection which holds one to
> many relationship records. In file_collection, it is only for load
> distribution.
>
> 25GB for entire solr service. each machine will act as shard for some
> collections.
>
> we have not stress tested our servers at least for solr service. i have
> read the the link you have shared, i will do something on it. thanks for
> sharing.
>
> i have checked other collections, where index size is max 90GB and 5 M as
> max number of documents. but for the particular file_collection_2014 , i
> see total index size across replicas is 147 GB.
>
> Can we get any hints if we run the query with debugQuery=true ?  what is
> the effective way of load distribution ? Please advice.
>
> Regards,
> Anil
>
> On 14 March 2016 at 20:32, Erick Erickson  wrote:
>
>> bq: The slowness is happening for file_collection. though it has 3 shards,
>> documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M
>> docs , shard3 is empty.
>>
>> Well, this collection terribly balanced. Putting 330M docs on a single
>> shard is
>> pushing the limits, the only time I've seen that many docs on a shard,
>> particularly
>> with 25G of ram, they were very small records. My guess is that you will
>> find
>> the queries you send to that shard substantially slower than the 150M
>> shard,
>> although 150M could also be pushing your limits. You can measure this
>> by sending the query to the specific core (something like
>>
> >> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
>>
>> My bet is that your QTime will be significantly different with the two
>> shards.
>>
>> It also sounds like you're using implicit routing where you control where
>> the
>> files go, it's easy to have unbalanced shards in that case, why did you
>> decide
>> to do it this way? There are valid reasons, but...
>>
>> In short, my guess is that you've simply overloaded your shard with
>> 330M docs. It's
>> not at all clear that even 150 will give you satisfactory performance,
>> have you stress
>> tested your servers? Here's the long form of sizing:
>>
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar 
>> wrote:
>> > For each of the solr machines/shards you have.  Thanks.
>> >
>> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar 
>> > wrote:
>> >
>> >> Hello Anil,
>> >>
>> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
>> >> parameters under System / share the snapshot. ?
>> >>
>> >> Thanks,
>> >> Susheel
>> >>
>> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
>> >>
>> >>> HI Toke and Jack,
>> >>>
>> >>> Please find the details below.
>> >>>
>> >>> * How large are your 3 shards in bytes? (total index across replicas)
>> >>>   --  *146G. i am using CDH (cloudera), not sure how to check
>> the
>> >>> index size of each collection on each shard*
>> >>> * What storage system do you use (local SSD, local spinning drives,
>> remote
>> >>> storage...)? *Local (hdfs) spinning drives*
>> >>> * How much physical memory does your system have? *we have 15 data
>> nodes.
>> >>> multiple services installed on each data node (252 GB RAM for each data
>> >>> node). 25 gb RAM allocated for solr service.*
>> >>> * How much memory is free for disk cache? *i could not find.*
>> >>> * How many concurrent queries do you issue? *very less. i dont see any
>> >>> concurrent queries to this file_collection for now.*
>> >>> * Do you update while you search? *Yes.. its very less.*
>> >>> * What does a full query (rows, faceting, grouping, highlighting,
>> >>> everything) look like? *for the file_collection, rows - 100,
>> highlights =
>> >>> false, no facets, expand = false.*
>> >>> * How many documents does a typical query match (hitcount)? *it varies
>> >>> with
>> >>> each file. i have sort on int field to order commands in the query.*
>> >>>
>> >>> we have two sets of collections on solr cluster ( 17 data nodes)
>> >>>
>> >>> 1. main_collection - collection created per year. each collection uses
>> 8
>> >>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc
>> >>>
>> >>> 2. file_collection (where files having commands are indexed) -
>> collection
>> >>> created per 2 years. it uses 3 shards and 2 replicas. ex :
>> >>> file_collection_2014, file_collection_2016
>> >>>
>> >>> The slowness is happening for file_collection. though it has 3 shards,
>> >>> documents are available in 2 shards. shard1 - 150M docs and shard2 has
>> >>> 330M
>> >>> docs , shard3 is empty.
>> >>>
>> >>> main_collection is looks 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Anil
Hi Erick,

We have used document routing to balance the shard load and for
expand/collapse. It is mainly used for main_collection, which holds
one-to-many relationship records. In file_collection, it is used only for
load distribution.
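For reference, compositeId routing hashes a "key!" prefix on the document id to pick the shard; a sketch (the ids are hypothetical, not from this thread). A low-cardinality or skewed prefix, e.g. one huge fileId, sends all of those documents to a single shard, which is exactly the kind of imbalance discussed in this thread:

```python
def routed_id(shard_key, doc_id):
    """compositeId routing: Solr hashes the part before '!' to choose the
    shard, so all docs sharing a shard_key land on the same shard."""
    return f"{shard_key}!{doc_id}"

# If one file contributes most of the documents, its shard carries most of
# the index while the others stay small or empty.
```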

25 GB is allocated for the entire Solr service; each machine acts as a
shard for some collections.

We have not stress tested our servers, at least for the Solr service. I
have read the link you shared and will follow up on it. Thanks for sharing.

I have checked other collections, where the index size is at most 90 GB
and the document count at most 5M. But for file_collection_2014 in
particular, I see a total index size across replicas of 147 GB.

Can we get any hints by running the query with debugQuery=true? What is an
effective way to distribute load? Please advise.
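(debugQuery=true does return per-component timings: the response gains a debug.timing section with "prepare" and "process" phases. A sketch of summarizing it, assuming Solr's standard debug response shape, nothing specific to this cluster:)

```python
def timing_summary(response):
    """Flatten the debug.timing section of a debugQuery=true response into
    {phase: {component: ms}}; phases are "prepare" and "process"."""
    timing = response["debug"]["timing"]
    return {
        phase: {name: comp["time"] for name, comp in entries.items()
                if isinstance(comp, dict)}  # skip the scalar "time" totals
        for phase, entries in timing.items()
        if isinstance(entries, dict)
    }

# A large "process"/"query" time points at the search itself; a large
# "process"/"highlight" or "facet" time points at those components.
```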

Regards,
Anil

On 14 March 2016 at 20:32, Erick Erickson  wrote:

> bq: The slowness is happening for file_collection. though it has 3 shards,
> documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M
> docs , shard3 is empty.
>
> Well, this collection terribly balanced. Putting 330M docs on a single
> shard is
> pushing the limits, the only time I've seen that many docs on a shard,
> particularly
> with 25G of ram, they were very small records. My guess is that you will
> find
> the queries you send to that shard substantially slower than the 150M
> shard,
> although 150M could also be pushing your limits. You can measure this
> by sending the query to the specific core (something like
>
> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
>
> My bet is that your QTime will be significantly different with the two
> shards.
>
> It also sounds like you're using implicit routing where you control where
> the
> files go, it's easy to have unbalanced shards in that case, why did you
> decide
> to do it this way? There are valid reasons, but...
>
> In short, my guess is that you've simply overloaded your shard with
> 330M docs. It's
> not at all clear that even 150 will give you satisfactory performance,
> have you stress
> tested your servers? Here's the long form of sizing:
>
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar 
> wrote:
> > For each of the solr machines/shards you have.  Thanks.
> >
> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar 
> > wrote:
> >
> >> Hello Anil,
> >>
> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
> >> parameters under System / share the snapshot. ?
> >>
> >> Thanks,
> >> Susheel
> >>
> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
> >>
> >>> HI Toke and Jack,
> >>>
> >>> Please find the details below.
> >>>
> >>> * How large are your 3 shards in bytes? (total index across replicas)
> >>>   --  *146G. i am using CDH (cloudera), not sure how to check
> the
> >>> index size of each collection on each shard*
> >>> * What storage system do you use (local SSD, local spinning drives,
> remote
> >>> storage...)? *Local (hdfs) spinning drives*
> >>> * How much physical memory does your system have? *we have 15 data
> nodes.
> >>> multiple services installed on each data node (252 GB RAM for each data
> >>> node). 25 gb RAM allocated for solr service.*
> >>> * How much memory is free for disk cache? *i could not find.*
> >>> * How many concurrent queries do you issue? *very less. i dont see any
> >>> concurrent queries to this file_collection for now.*
> >>> * Do you update while you search? *Yes.. its very less.*
> >>> * What does a full query (rows, faceting, grouping, highlighting,
> >>> everything) look like? *for the file_collection, rows - 100,
> highlights =
> >>> false, no facets, expand = false.*
> >>> * How many documents does a typical query match (hitcount)? *it varies
> >>> with
> >>> each file. i have sort on int field to order commands in the query.*
> >>>
> >>> we have two sets of collections on solr cluster ( 17 data nodes)
> >>>
> >>> 1. main_collection - collection created per year. each collection uses
> 8
> >>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc
> >>>
> >>> 2. file_collection (where files having commands are indexed) -
> collection
> >>> created per 2 years. it uses 3 shards and 2 replicas. ex :
> >>> file_collection_2014, file_collection_2016
> >>>
> >>> The slowness is happening for file_collection. though it has 3 shards,
> >>> documents are available in 2 shards. shard1 - 150M docs and shard2 has
> >>> 330M
> >>> docs , shard3 is empty.
> >>>
> >>> main_collection is looks good.
> >>>
> >>> please let me know if you need any additional details.
> >>>
> >>> Regards,
> >>> Anil
> >>>
> >>>
> >>> On 13 March 2016 at 21:48, Anil  wrote:
> >>>
> >>> > Thanks Toke and Jack.
> >>> >
> >>> > Jack,
> >>> >
> >>> > Yes. it is 480 million :)
> >>> >
> >>> > 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Anil
Hi Susheel,

We have enabled Kerberos, so Solr is accessed through Hue only. I will
check whether I can get similar information using Hue. Thanks.

Regards,
Anil

On 14 March 2016 at 19:34, Susheel Kumar  wrote:

> Hello Anil,
>
> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
> parameters under System / share the snapshot. ?
>
> Thanks,
> Susheel
>
> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
>
> > HI Toke and Jack,
> >
> > Please find the details below.
> >
> > * How large are your 3 shards in bytes? (total index across replicas)
> >   --  *146G. i am using CDH (cloudera), not sure how to check the
> > index size of each collection on each shard*
> > * What storage system do you use (local SSD, local spinning drives,
> remote
> > storage...)? *Local (hdfs) spinning drives*
> > * How much physical memory does your system have? *we have 15 data nodes.
> > multiple services installed on each data node (252 GB RAM for each data
> > node). 25 gb RAM allocated for solr service.*
> > * How much memory is free for disk cache? *i could not find.*
> > * How many concurrent queries do you issue? *very less. i dont see any
> > concurrent queries to this file_collection for now.*
> > * Do you update while you search? *Yes.. its very less.*
> > * What does a full query (rows, faceting, grouping, highlighting,
> > everything) look like? *for the file_collection, rows - 100, highlights =
> > false, no facets, expand = false.*
> > * How many documents does a typical query match (hitcount)? *it varies
> with
> > each file. i have sort on int field to order commands in the query.*
> >
> > we have two sets of collections on solr cluster ( 17 data nodes)
> >
> > 1. main_collection - collection created per year. each collection uses 8
> > shards 2 replicas ex: main_collection_2016, main_collection_2015 etc
> >
> > 2. file_collection (where files having commands are indexed) - collection
> > created per 2 years. it uses 3 shards and 2 replicas. ex :
> > file_collection_2014, file_collection_2016
> >
> > The slowness is happening for file_collection. though it has 3 shards,
> > documents are available in 2 shards. shard1 - 150M docs and shard2 has
> 330M
> > docs , shard3 is empty.
> >
> > main_collection is looks good.
> >
> > please let me know if you need any additional details.
> >
> > Regards,
> > Anil
> >
> >
> > On 13 March 2016 at 21:48, Anil  wrote:
> >
> > > Thanks Toke and Jack.
> > >
> > > Jack,
> > >
> > > Yes. it is 480 million :)
> > >
> > > I will share the additional details soon. thanks.
> > >
> > >
> > > Regards,
> > > Anil
> > >
> > >
> > >
> > >
> > >
> > > On 13 March 2016 at 21:06, Jack Krupansky 
> > > wrote:
> > >
> > >> (We should have a wiki/doc page for the "usual list of suspects" when
> > >> queries are/appear slow, rather than need to repeat the same mantra(s)
> > for
> > >> every inquiry on this topic.)
> > >>
> > >>
> > >> -- Jack Krupansky
> > >>
> > >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <
> > t...@statsbiblioteket.dk>
> > >> wrote:
> > >>
> > >> > Anil  wrote:
> > >> > > i have indexed a data (commands from files) with 10 fields and 3
> of
> > >> them
> > >> > is
> > >> > > text fields. collection is created with 3 shards and 2 replicas. I
> > >> have
> > >> > > used document routing as well.
> > >> >
> > >> > > Currently collection holds 47,80,01,405 records.
> > >> >
> > >> > ...480 million, right? Funny digit grouping in India.
> > >> >
> > >> > > text search against text field taking around 5 sec. solr is query
> > just
> > >> > and
> > >> > > of two terms with fl as 7 fields
> > >> >
> > >> > > fileId:"file unique id" AND command_text:(system login)
> > >> >
> > >> > While not an impressive response time, it might just be that your
> > >> hardware
> > >> > is not enough to handle that amount of documents. The usual culprit
> is
> > >> IO
> > >> > speed, so chances are you have a system with spinning drives and not
> > >> enough
> > >> > RAM: Switch to SSD and/or add more RAM.
> > >> >
> > >> > To give better advice, we need more information.
> > >> >
> > >> > * How large are your 3 shards in bytes?
> > >> > * What storage system do you use (local SSD, local spinning drives,
> > >> remote
> > >> > storage...)?
> > >> > * How much physical memory does your system have?
> > >> > * How much memory is free for disk cache?
> > >> > * How many concurrent queries do you issue?
> > >> > * Do you update while you search?
> > >> > * What does a full query (rows, faceting, grouping, highlighting,
> > >> > everything) look like?
> > >> > * How many documents does a typical query match (hitcount)?
> > >> >
> > >> > - Toke Eskildsen
> > >> >
> > >>
> > >
> > >
> >
>


Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Erick Erickson
bq: The slowness is happening for file_collection. though it has 3 shards,
documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M
docs , shard3 is empty.

Well, this collection is terribly balanced. Putting 330M docs on a single
shard is pushing the limits; the only time I've seen that many docs on a
shard, particularly with 25G of RAM, the records were very small. My guess
is that queries sent to that shard will be substantially slower than on the
150M shard, although 150M could also be pushing your limits. You can
measure this by sending the query directly to a specific core (something like

solr/files_shard1_replica1/query?q=(your query here)&distrib=false

My bet is that QTime will differ significantly between the two shards.
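A minimal way to run that comparison (a sketch; the host and core names are assumptions): build a per-core URL with distrib=false so the core answers alone, then compare QTime:

```python
from urllib.parse import urlencode

def core_query_url(host, core, q):
    """Direct query to one core; distrib=false stops Solr from fanning out
    to the other shards, so QTime reflects that core alone."""
    params = urlencode({"q": q, "distrib": "false", "rows": 0, "wt": "json"})
    return f"http://{host}:8983/solr/{core}/select?{params}"

# Usage (assumed core names for the two populated shards):
# import requests
# for core in ("files_shard1_replica1", "files_shard2_replica1"):
#     qtime = requests.get(core_query_url("datanode1", core, "*:*")).json()[
#         "responseHeader"]["QTime"]
#     print(core, qtime, "ms")
```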

It also sounds like you're using implicit routing, where you control where
the files go; it's easy to end up with unbalanced shards in that case. Why
did you decide to do it this way? There are valid reasons, but...

In short, my guess is that you've simply overloaded your shard with 330M
docs. It's not at all clear that even 150M will give you satisfactory
performance. Have you stress tested your servers? Here's the long form of
sizing:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar  wrote:
> For each of the solr machines/shards you have.  Thanks.
>
> On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar 
> wrote:
>
>> Hello Anil,
>>
>> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
>> parameters under System / share the snapshot. ?
>>
>> Thanks,
>> Susheel
>>
>> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
>>
>>> HI Toke and Jack,
>>>
>>> Please find the details below.
>>>
>>> * How large are your 3 shards in bytes? (total index across replicas)
>>>   --  *146G. i am using CDH (cloudera), not sure how to check the
>>> index size of each collection on each shard*
>>> * What storage system do you use (local SSD, local spinning drives, remote
>>> storage...)? *Local (hdfs) spinning drives*
>>> * How much physical memory does your system have? *we have 15 data nodes.
>>> multiple services installed on each data node (252 GB RAM for each data
>>> node). 25 gb RAM allocated for solr service.*
>>> * How much memory is free for disk cache? *i could not find.*
>>> * How many concurrent queries do you issue? *very less. i dont see any
>>> concurrent queries to this file_collection for now.*
>>> * Do you update while you search? *Yes.. its very less.*
>>> * What does a full query (rows, faceting, grouping, highlighting,
>>> everything) look like? *for the file_collection, rows - 100, highlights =
>>> false, no facets, expand = false.*
>>> * How many documents does a typical query match (hitcount)? *it varies
>>> with
>>> each file. i have sort on int field to order commands in the query.*
>>>
>>> we have two sets of collections on solr cluster ( 17 data nodes)
>>>
>>> 1. main_collection - collection created per year. each collection uses 8
>>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc
>>>
>>> 2. file_collection (where files having commands are indexed) - collection
>>> created per 2 years. it uses 3 shards and 2 replicas. ex :
>>> file_collection_2014, file_collection_2016
>>>
>>> The slowness is happening for file_collection. though it has 3 shards,
>>> documents are available in 2 shards. shard1 - 150M docs and shard2 has
>>> 330M
>>> docs , shard3 is empty.
>>>
>>> main_collection is looks good.
>>>
>>> please let me know if you need any additional details.
>>>
>>> Regards,
>>> Anil
>>>
>>>
>>> On 13 March 2016 at 21:48, Anil  wrote:
>>>
>>> > Thanks Toke and Jack.
>>> >
>>> > Jack,
>>> >
>>> > Yes. it is 480 million :)
>>> >
>>> > I will share the additional details soon. thanks.
>>> >
>>> >
>>> > Regards,
>>> > Anil
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On 13 March 2016 at 21:06, Jack Krupansky 
>>> > wrote:
>>> >
>>> >> (We should have a wiki/doc page for the "usual list of suspects" when
>>> >> queries are/appear slow, rather than need to repeat the same mantra(s)
>>> for
>>> >> every inquiry on this topic.)
>>> >>
>>> >>
>>> >> -- Jack Krupansky
>>> >>
>>> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <
>>> t...@statsbiblioteket.dk>
>>> >> wrote:
>>> >>
>>> >> > Anil  wrote:
>>> >> > > i have indexed a data (commands from files) with 10 fields and 3 of
>>> >> them
>>> >> > is
>>> >> > > text fields. collection is created with 3 shards and 2 replicas. I
>>> >> have
>>> >> > > used document routing as well.
>>> >> >
>>> >> > > Currently collection holds 47,80,01,405 records.
>>> >> >
>>> >> > ...480 million, right? Funny digit grouping in India.
>>> >> >
>>> >> > > text search against text field taking around 5 sec. solr is query
>>> just
>>> >> > and
>>> >> > > of 

Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Susheel Kumar
For each of the Solr machines/shards you have. Thanks.

On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar 
wrote:

> Hello Anil,
>
> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
> parameters under System / share the snapshot. ?
>
> Thanks,
> Susheel
>
> On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:
>
>> HI Toke and Jack,
>>
>> Please find the details below.
>>
>> * How large are your 3 shards in bytes? (total index across replicas)
>>   --  *146G. i am using CDH (cloudera), not sure how to check the
>> index size of each collection on each shard*
>> * What storage system do you use (local SSD, local spinning drives, remote
>> storage...)? *Local (hdfs) spinning drives*
>> * How much physical memory does your system have? *we have 15 data nodes.
>> multiple services installed on each data node (252 GB RAM for each data
>> node). 25 gb RAM allocated for solr service.*
>> * How much memory is free for disk cache? *i could not find.*
>> * How many concurrent queries do you issue? *very less. i dont see any
>> concurrent queries to this file_collection for now.*
>> * Do you update while you search? *Yes.. its very less.*
>> * What does a full query (rows, faceting, grouping, highlighting,
>> everything) look like? *for the file_collection, rows - 100, highlights =
>> false, no facets, expand = false.*
>> * How many documents does a typical query match (hitcount)? *it varies
>> with
>> each file. i have sort on int field to order commands in the query.*
>>
>> we have two sets of collections on solr cluster ( 17 data nodes)
>>
>> 1. main_collection - collection created per year. each collection uses 8
>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc
>>
>> 2. file_collection (where files having commands are indexed) - collection
>> created per 2 years. it uses 3 shards and 2 replicas. ex :
>> file_collection_2014, file_collection_2016
>>
>> The slowness is happening for file_collection. though it has 3 shards,
>> documents are available in 2 shards. shard1 - 150M docs and shard2 has
>> 330M
>> docs , shard3 is empty.
>>
>> main_collection is looks good.
>>
>> please let me know if you need any additional details.
>>
>> Regards,
>> Anil
>>
>>
>> On 13 March 2016 at 21:48, Anil  wrote:
>>
>> > Thanks Toke and Jack.
>> >
>> > Jack,
>> >
>> > Yes. it is 480 million :)
>> >
>> > I will share the additional details soon. thanks.
>> >
>> >
>> > Regards,
>> > Anil
>> >
>> >
>> >
>> >
>> >
>> > On 13 March 2016 at 21:06, Jack Krupansky 
>> > wrote:
>> >
>> >> (We should have a wiki/doc page for the "usual list of suspects" when
>> >> queries are/appear slow, rather than need to repeat the same mantra(s)
>> for
>> >> every inquiry on this topic.)
>> >>
>> >>
>> >> -- Jack Krupansky
>> >>
>> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <
>> t...@statsbiblioteket.dk>
>> >> wrote:
>> >>
>> >> > Anil  wrote:
>> >> > > i have indexed a data (commands from files) with 10 fields and 3 of
>> >> them
>> >> > is
>> >> > > text fields. collection is created with 3 shards and 2 replicas. I
>> >> have
>> >> > > used document routing as well.
>> >> >
>> >> > > Currently collection holds 47,80,01,405 records.
>> >> >
>> >> > ...480 million, right? Funny digit grouping in India.
>> >> >
>> >> > > text search against text field taking around 5 sec. solr is query
>> just
>> >> > and
>> >> > > of two terms with fl as 7 fields
>> >> >
>> >> > > fileId:"file unique id" AND command_text:(system login)
>> >> >
>> >> > While not an impressive response time, it might just be that your
>> >> hardware
>> >> > is not enough to handle that amount of documents. The usual culprit
>> is
>> >> IO
>> >> > speed, so chances are you have a system with spinning drives and not
>> >> enough
>> >> > RAM: Switch to SSD and/or add more RAM.
>> >> >
>> >> > To give better advice, we need more information.
>> >> >
>> >> > * How large are your 3 shards in bytes?
>> >> > * What storage system do you use (local SSD, local spinning drives,
>> >> remote
>> >> > storage...)?
>> >> > * How much physical memory does your system have?
>> >> > * How much memory is free for disk cache?
>> >> > * How many concurrent queries do you issue?
>> >> > * Do you update while you search?
>> >> > * What does a full query (rows, faceting, grouping, highlighting,
>> >> > everything) look like?
>> >> > * How many documents does a typical query match (hitcount)?
>> >> >
>> >> > - Toke Eskildsen
>> >> >
>> >>
>> >
>> >
>>
>
>


Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Susheel Kumar
Hello Anil,

Can you go to the Solr Admin Panel -> Dashboard and share all four memory
parameters under System, or share a screenshot?

Thanks,
Susheel

On Mon, Mar 14, 2016 at 5:36 AM, Anil  wrote:

> HI Toke and Jack,
>
> Please find the details below.
>
> * How large are your 3 shards in bytes? (total index across replicas)
>   --  *146G. i am using CDH (cloudera), not sure how to check the
> index size of each collection on each shard*
> * What storage system do you use (local SSD, local spinning drives, remote
> storage...)? *Local (hdfs) spinning drives*
> * How much physical memory does your system have? *we have 15 data nodes.
> multiple services installed on each data node (252 GB RAM for each data
> node). 25 gb RAM allocated for solr service.*
> * How much memory is free for disk cache? *i could not find.*
> * How many concurrent queries do you issue? *very less. i dont see any
> concurrent queries to this file_collection for now.*
> * Do you update while you search? *Yes.. its very less.*
> * What does a full query (rows, faceting, grouping, highlighting,
> everything) look like? *for the file_collection, rows - 100, highlights =
> false, no facets, expand = false.*
> * How many documents does a typical query match (hitcount)? *it varies with
> each file. i have sort on int field to order commands in the query.*
>
> we have two sets of collections on solr cluster ( 17 data nodes)
>
> 1. main_collection - collection created per year. each collection uses 8
> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc
>
> 2. file_collection (where files having commands are indexed) - collection
> created per 2 years. it uses 3 shards and 2 replicas. ex :
> file_collection_2014, file_collection_2016
>
> The slowness is happening for file_collection. though it has 3 shards,
> documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M
> docs , shard3 is empty.
>
> main_collection is looks good.
>
> please let me know if you need any additional details.
>
> Regards,
> Anil
>
>
> On 13 March 2016 at 21:48, Anil  wrote:
>
> > Thanks Toke and Jack.
> >
> > Jack,
> >
> > Yes. it is 480 million :)
> >
> > I will share the additional details soon. thanks.
> >
> >
> > Regards,
> > Anil
> >
> >
> >
> >
> >
> > On 13 March 2016 at 21:06, Jack Krupansky 
> > wrote:
> >
> >> (We should have a wiki/doc page for the "usual list of suspects" when
> >> queries are/appear slow, rather than need to repeat the same mantra(s)
> for
> >> every inquiry on this topic.)
> >>
> >>
> >> -- Jack Krupansky
> >>
> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <
> t...@statsbiblioteket.dk>
> >> wrote:
> >>
> >> > Anil  wrote:
> >> > > i have indexed a data (commands from files) with 10 fields and 3 of
> >> them
> >> > is
> >> > > text fields. collection is created with 3 shards and 2 replicas. I
> >> have
> >> > > used document routing as well.
> >> >
> >> > > Currently collection holds 47,80,01,405 records.
> >> >
> >> > ...480 million, right? Funny digit grouping in India.
> >> >
> >> > > text search against text field taking around 5 sec. solr is query
> just
> >> > and
> >> > > of two terms with fl as 7 fields
> >> >
> >> > > fileId:"file unique id" AND command_text:(system login)
> >> >
> >> > While not an impressive response time, it might just be that your
> >> hardware
> >> > is not enough to handle that amount of documents. The usual culprit is
> >> IO
> >> > speed, so chances are you have a system with spinning drives and not
> >> enough
> >> > RAM: Switch to SSD and/or add more RAM.
> >> >
> >> > To give better advice, we need more information.
> >> >
> >> > * How large are your 3 shards in bytes?
> >> > * What storage system do you use (local SSD, local spinning drives,
> >> remote
> >> > storage...)?
> >> > * How much physical memory does your system have?
> >> > * How much memory is free for disk cache?
> >> > * How many concurrent queries do you issue?
> >> > * Do you update while you search?
> >> > * What does a full query (rows, faceting, grouping, highlighting,
> >> > everything) look like?
> >> > * How many documents does a typical query match (hitcount)?
> >> >
> >> > - Toke Eskildsen
> >> >
> >>
> >
> >
>


Re: Solr Queries are very slow - Suggestions needed

2016-03-14 Thread Anil
Hi Toke and Jack,

Please find the details below.

* How large are your 3 shards in bytes? (total index across replicas)
  --  *146 GB. I am using CDH (Cloudera); not sure how to check the
index size of each collection on each shard.*
* What storage system do you use (local SSD, local spinning drives, remote
storage...)? *Local (HDFS) spinning drives.*
* How much physical memory does your system have? *We have 15 data nodes,
with multiple services installed on each (252 GB RAM per data node);
25 GB RAM is allocated to the Solr service.*
* How much memory is free for disk cache? *I could not find out.*
* How many concurrent queries do you issue? *Very few; I don't see any
concurrent queries to this file_collection for now.*
* Do you update while you search? *Yes, but very rarely.*
* What does a full query (rows, faceting, grouping, highlighting,
everything) look like? *For the file_collection: rows = 100, highlights =
false, no facets, expand = false.*
* How many documents does a typical query match (hitcount)? *It varies
with each file; I sort on an int field to order commands in the query.*
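Two of the unknowns above can be checked directly (a sketch; the host and admin URL are assumptions). On Linux, the "Cached" line of /proc/meminfo approximates the memory currently used for disk cache, and Solr's core STATUS API reports per-core index size:

```python
def cached_gb(meminfo_text):
    """Parse the 'Cached:' line of /proc/meminfo (value in kB) into GB -
    roughly the page cache currently holding file data."""
    for line in meminfo_text.splitlines():
        if line.startswith("Cached:"):
            return int(line.split()[1]) / (1024 * 1024)
    return None

# Usage on a data node:
# print(cached_gb(open("/proc/meminfo").read()), "GB in page cache")
#
# Per-core index size via the core admin API (hypothetical host):
# import requests
# status = requests.get(
#     "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json").json()
# for name, core in status["status"].items():
#     print(name, core["index"]["sizeInBytes"] / 2**30, "GiB")
```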

We have two sets of collections on the Solr cluster (17 data nodes):

1. main_collection - created per year; each collection uses 8 shards and
2 replicas, e.g. main_collection_2016, main_collection_2015, etc.

2. file_collection (where files holding commands are indexed) - created
every 2 years; it uses 3 shards and 2 replicas, e.g.
file_collection_2014, file_collection_2016.

The slowness is happening for file_collection. Though it has 3 shards,
documents are present in only 2: shard1 has 150M docs, shard2 has 330M
docs, and shard3 is empty.

main_collection looks good.

Please let me know if you need any additional details.

Regards,
Anil


On 13 March 2016 at 21:48, Anil  wrote:

> Thanks Toke and Jack.
>
> Jack,
>
> Yes. it is 480 million :)
>
> I will share the additional details soon. thanks.
>
>
> Regards,
> Anil
>
>
>
>
>
> On 13 March 2016 at 21:06, Jack Krupansky 
> wrote:
>
>> (We should have a wiki/doc page for the "usual list of suspects" when
>> queries are/appear slow, rather than need to repeat the same mantra(s) for
>> every inquiry on this topic.)
>>
>>
>> -- Jack Krupansky
>>
>> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen 
>> wrote:
>>
>> > Anil  wrote:
>> > > i have indexed a data (commands from files) with 10 fields and 3 of
>> them
>> > is
>> > > text fields. collection is created with 3 shards and 2 replicas. I
>> have
>> > > used document routing as well.
>> >
>> > > Currently collection holds 47,80,01,405 records.
>> >
>> > ...480 million, right? Funny digit grouping in India.
>> >
>> > > text search against text field taking around 5 sec. solr is query just
>> > and
>> > > of two terms with fl as 7 fields
>> >
>> > > fileId:"file unique id" AND command_text:(system login)
>> >
>> > While not an impressive response time, it might just be that your
>> hardware
>> > is not enough to handle that amount of documents. The usual culprit is
>> IO
>> > speed, so chances are you have a system with spinning drives and not
>> enough
>> > RAM: Switch to SSD and/or add more RAM.
>> >
>> > To give better advice, we need more information.
>> >
>> > * How large are your 3 shards in bytes?
>> > * What storage system do you use (local SSD, local spinning drives,
>> remote
>> > storage...)?
>> > * How much physical memory does your system have?
>> > * How much memory is free for disk cache?
>> > * How many concurrent queries do you issue?
>> > * Do you update while you search?
>> > * What does a full query (rows, faceting, grouping, highlighting,
>> > everything) look like?
>> > * How many documents does a typical query match (hitcount)?
>> >
>> > - Toke Eskildsen
>> >
>>
>
>


Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Anil
Hi Shawn, Jack, and Erick,

Thank you very much.

Regards,
Anil




Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Shawn Heisey
On 3/13/2016 9:36 AM, Jack Krupansky wrote:
> (We should have a wiki/doc page for the "usual list of suspects" when
> queries are/appear slow, rather than need to repeat the same mantra(s) for
> every inquiry on this topic.)

There's this page, with the disclaimer that I wrote almost all of it:

https://wiki.apache.org/solr/SolrPerformanceProblems

It emphasizes RAM quite a bit, but when there are hundreds of millions
of documents, that's usually the problem.  I've just added some info
about high query rates.

Thanks,
Shawn



Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Jack Krupansky
Yeah, there's some good material there, but probably still too inaccessible
for the average "help, my queries are slow" inquiry we get so frequently on
this list.

Another useful page is:
https://wiki.apache.org/solr/SolrPerformanceProblems


-- Jack Krupansky



Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Erick Erickson
Jack:
https://wiki.apache.org/solr/SolrPerformanceFactors
and
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

are already there; we can add to them.

Best,
Erick



Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Anil
Thanks Toke and Jack.

Jack,

Yes. it is 480 million :)

I will share the additional details soon. thanks.


Regards,
Anil







Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Jack Krupansky
(We should have a wiki/doc page for the "usual list of suspects" when
queries are/appear slow, rather than need to repeat the same mantra(s) for
every inquiry on this topic.)


-- Jack Krupansky



Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Toke Eskildsen
Anil  wrote:
> I have indexed data (commands from files) with 10 fields, and 3 of them are
> text fields. The collection is created with 3 shards and 2 replicas. I have
> used document routing as well.

> Currently collection holds 47,80,01,405 records.

...480 million, right? Funny digit grouping in India.

> A text search against a text field takes around 5 sec. The Solr query is
> just an AND of two terms, with fl requesting 7 fields:

> fileId:"file unique id" AND command_text:(system login)

While not an impressive response time, it might just be that your hardware is 
not enough to handle that amount of documents. The usual culprit is IO speed, 
so chances are you have a system with spinning drives and not enough RAM: 
Switch to SSD and/or add more RAM.

To give better advice, we need more information.

* How large are your 3 shards in bytes?
* What storage system do you use (local SSD, local spinning drives, remote 
storage...)?
* How much physical memory does your system have?
* How much memory is free for disk cache?
* How many concurrent queries do you issue?
* Do you update while you search?
* What does a full query (rows, faceting, grouping, highlighting, everything) 
look like?
* How many documents does a typical query match (hitcount)?

- Toke Eskildsen
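Toke's RAM questions boil down to whether the on-disk index can live in the OS disk cache. A rough back-of-envelope sketch (the shard and memory figures below are purely hypothetical):

```python
def cache_coverage(shard_sizes_bytes, free_for_cache_bytes):
    """Fraction of the on-disk index that fits in the OS disk cache.
    A value well below 1.0 means many searches must touch the disk,
    which is the slow path on spinning drives."""
    total = sum(shard_sizes_bytes)
    if total == 0:
        return 1.0
    return min(1.0, free_for_cache_bytes / total)

# Hypothetical: three 90 GB shards, 64 GB of RAM left over for disk cache.
coverage = cache_coverage([90e9, 90e9, 90e9], 64e9)
print(f"{coverage:.0%} of the index can be cached")  # -> 24% of the index can be cached
```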


Re: Solr Queries

2011-08-22 Thread Shalin Shekhar Mangar
Hi Abhijeet,

On Mon, Aug 22, 2011 at 3:09 PM, abhijit bashetti <abhijitbashe...@gmail.com>
wrote:


 1. Can I update a specific field while re-indexing?


Solr doesn't support updating specific fields. You must always create a
complete document with values for all fields while indexing. If you keep the
same value for the unique key field, the new doc will replace the one in the
index.
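A minimal sketch of the full-document replace described above. The field names are hypothetical; the point is that every field must be re-sent with the unique key unchanged, so the new document overwrites the old one:

```python
import json

def replacement_doc(stored_doc, **changed):
    """Build the complete document to re-index: Solr replaces the whole
    document matching the unique key, so unchanged fields must be
    re-sent too, not just the ones being 'updated'."""
    doc = dict(stored_doc)   # start from the full stored document
    doc.update(changed)      # overlay the changed field values
    return doc

# Hypothetical document; "id" is the unique key field.
old = {"id": "doc-1", "title": "Solr in Action", "views": 10}
new = replacement_doc(old, views=11)
print(json.dumps(new))  # -> {"id": "doc-1", "title": "Solr in Action", "views": 11}
```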



 2. what are the ways to improve the performance of Indexing?


See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

The above page is for Lucene users but is useful for Solr users as well.
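One concrete tip from that page is to batch document adds rather than sending one document per update request. A minimal sketch (the batch size of 500 is an arbitrary example):

```python
def batches(docs, size=500):
    """Yield documents in groups so that each update request carries
    many docs; per-request overhead is a common indexing bottleneck."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

docs = [{"id": str(n)} for n in range(1200)]
print([len(b) for b in batches(docs)])  # -> [500, 500, 200]
```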



 3. What should be ideal system configuration for solr indexing server?


This is difficult to answer. It depends on your particular use-case.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr Queries

2009-12-16 Thread AHMET ARSLAN
 Hi,
      Suppose I have a content field of
 type text.
 An example of the content field is shown below:
 "After frustrated waiting period to get my credit card from
 the ICICI Bank,
 today I decided to write them a online petition stating my
 problem... Below
 is the unedited version of letter I sent to ICICI..."
 1. Can I use proximity search for 2 phrases,
 "frustrated waiting" and "credit card"?
 (I wanted to perform a search checking whether "frustrated waiting" and
 "credit card" are within 10 words of each other using proximity search,
 where "frustrated waiting" and "credit card" are exact phrases, i.e. can I
 search on each as a whole phrase and not as 2 different words in different
 parts of a document?)
 Does Solr support this kind of operation? If so, how do we structure our
 query, or could you give me an example?
 Thanks
 Raakhi Khatwani.
 

A similar discussion was:
http://old.nabble.com/Nested-phrases-with-proximity-in-Solr-td26012747.html

With the surround query parser, your query would be:
(frustrated w waiting) 10w (credit w card)

Hope this helps.
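On the HTTP side, the surround parser can be selected per-query with local params. A hedged sketch: the collection name and fl list are placeholders, and it assumes a Solr version that ships the surround parser (it became standard in 4.0; at the time of this thread it lived in Lucene contrib):

```python
from urllib.parse import urlencode, parse_qs

# {!surround} selects the parser; inside it, `w` means ordered-adjacent
# and `10w` means ordered within a distance of 10.
q = "{!surround}(frustrated w waiting) 10w (credit w card)"
params = urlencode({"q": q, "fl": "id,score", "rows": 10})
print("/solr/collection1/select?" + params)

# Round-trip check that the query survives URL encoding intact:
assert parse_qs(params)["q"] == [q]
```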





Re: Solr Queries

2009-11-12 Thread Grant Ingersoll

On Nov 12, 2009, at 8:55 AM, Rakhi Khatwani wrote:

 Hi,
 I am using Solr 1.3 and I have inserted some data in my comment
 field.
 For example:
 
 for document1:
 <str name="comment">
 The iPhone 3GS finally adds common cell phone features like multimedia
 messaging, video recording, and voice dialing. It runs faster; its promised
 battery life is longer; and the multimedia quality continues to shine.
 
 
 The iPhone 3GS' call quality shows no improvements and the 3G signal
 reception remains uneven. We still don't get Flash Lite, USB transfer and
 storage, or multitasking.
 </str>
 
 
 for document2:
 <str name="comment">
 Sony Ericsson c510 has a 3.2MP cybershot camera with smile detection. Amazing
 phone, faster than the Sony Ericsson w580i. The Sony Ericsson w580i camera is
 only 2MP with no autofocus and smile detection. It does not even have a flash,
 leading to poor quality pictures.
 </str>
 
 A]
 
 Now when I apply the following queries, I get 0 hits:
 1. comment:iph*e
 2. comment:iph?ne

What field type are you using?  This is in your schema.xml

 
 B] Can i apply range queries on part of the content?

 
 C] Can I apply more than one wildcard in a query? For example comment:ip*h*
 (this works, but it is equivalent to just using ipho*)

Yes.

 
 D] for fuzzy queries:
 content:iphone~0.7 returns both the documents.
 content:iphone~0.8 returns no documents (similarly for 0.9).
 

The fuzz factor there incorporates the edit distance.  I gather the first Sony 
doc has a match on phone and the score is between 0.7 and 0.8.  You can add 
debugQuery=true to see the explains. 

 However, if I change it to iPhone:
content:iPhone~0.7 returns 0 documents
content:iPhone~0.5 returns both the documents.
 
 Is fuzzy search case sensitive? Even if it is, why am I not able to retrieve
 the expected results?

Again, this all comes back to how you analyze the documents, which depends on
what field type you are using.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Re: Solr Queries

2009-11-12 Thread Rakhi Khatwani
Hi,
Sorry, I forgot to mention that the comment field is a text field.

Regards,
Raakhi
