Re: Solr queries slow down over time
Hi Mark,

Thanks for confirming Dwane's advice from your own experience. I will shift to a streaming expressions implementation.

Best,
Goutham

On Fri, Sep 25, 2020 at 7:03 PM Mark H. Wood wrote:
> [quoted text trimmed; Mark's message appears in full below]
Re: Solr queries slow down over time
Thanks a ton, Dwane. I went through the article and the documentation link. This corresponds exactly to my use case.

Best,
Goutham

On Fri, Sep 25, 2020 at 2:59 PM Dwane Hall wrote:
> [quoted text trimmed; Dwane's message appears in full below]
Re: Solr queries slow down over time
On Fri, Sep 25, 2020 at 11:49:22AM +0530, Goutham Tholpadi wrote:
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=1, and changing start to 0, 1, 2, and so on, in a loop in my
> script (using pysolr).
>
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
>
> Any pointers on why the same fetch of 1 records takes much longer now?
> Does Solr need to load all the 27M before getting the last 1 records?

I and many others have run into the same issue. Yes, each windowed query starts fresh, having to find at least enough records to satisfy the query, walking the list to discard the first 'start' worth of them, and then returning the next 'rows' worth. So as 'start' increases, the work required of Solr increases and the response time lengthens.

> Is there a better way to do this operation using Solr?

Another answer in this thread gives links to resources for addressing the problem, and I can't improve on those.

I can say that when I switched from start= windowing to cursorMark, I got a very nice improvement in overall speed and did not see the progressive slowing anymore. A query loop that ran for *days* now completes in under five minutes. In some way that I haven't quite figured out, a cursorMark tells Solr where in the overall document sequence to start working.

So yes, there *is* a better way.

--
Mark H. Wood
Lead Technology Analyst
University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
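[Editor's note] Mark's explanation of why start= windowing degrades while a cursor does not can be illustrated with a small in-memory model. This is purely illustrative Python, not the pysolr API; it only mimics the cursorMark contract (pass back the last sort value seen, get the next page):

```python
import bisect

# A toy sorted result list, standing in for Solr's index order.
DOCS = [f"doc{i:07d}" for i in range(100_000)]

def page_with_start(docs, start, rows):
    """start/rows windowing: the engine must collect and discard the
    first `start` hits on every request, so work grows with `start`."""
    work = start + rows            # documents touched to serve this page
    return docs[start:start + rows], work

def page_with_cursor(docs, cursor, rows):
    """cursorMark-style paging: the cursor encodes the last sort value
    seen, so the engine skips straight past it; work stays ~`rows`."""
    pos = 0 if cursor == "*" else bisect.bisect_right(docs, cursor)
    page = docs[pos:pos + rows]
    next_cursor = page[-1] if page else cursor
    return page, next_cursor, rows  # constant work per page

# Deep page at offset 90,000 via windowing:
window_page, window_work = page_with_start(DOCS, 90_000, 10)

# Reach the same page by iterating a cursor:
cursor = "*"
for _ in range(9_000):
    _, cursor, _ = page_with_cursor(DOCS, cursor, 10)
cursor_page, _, cursor_work = page_with_cursor(DOCS, cursor, 10)

assert window_page == cursor_page
print(window_work, cursor_work)    # work per request: 90010 vs 10
```

The same pages come back either way; only the per-request work differs, which matches the progressive slowdown Goutham observed.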
Re: Solr queries slow down over time
Goutham,

I suggest you read Hossman's excellent article on deep paging and why returning rows=(some large number) is a bad idea. It provides a thorough overview of the concept and will explain it better than I ever could:
https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18

In short, if you want to extract that many documents out of your corpus, use cursor mark, streaming expressions, or Solr's parallel SQL interface (which uses streaming expressions under the hood):
https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html

Thanks,

Dwane

From: Goutham Tholpadi
Sent: Friday, 25 September 2020 4:19 PM
To: solr-user@lucene.apache.org
Subject: Solr queries slow down over time

Hi,

I have around 30M documents in Solr, and I am doing repeated *:* queries with rows=1, and changing start to 0, 1, 2, and so on, in a loop in my script (using pysolr).

At the start of the iteration, the calls to Solr were taking less than 1 sec each. After running for a few hours (with start at around 27M) I found that each call was taking around 30-60 secs.

Any pointers on why the same fetch of 1 records takes much longer now? Does Solr need to load all the 27M before getting the last 1 records? Is there a better way to do this operation using Solr?

Thanks!
Goutham
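[Editor's note] For the streaming-expressions route Dwane mentions, a whole result set can be drained through the /export handler with a `search(...)` expression sent to the collection's /stream endpoint. A sketch of building such a request; collection, field, and host names below are placeholders, and the exported fields must have docValues:

```python
from urllib.parse import urlencode

def export_expression(collection, fl, sort, q="*:*"):
    """Streaming expression that drains a whole result set via the
    /export handler (sorted docValues streaming, no deep paging).
    Collection and field names here are placeholders."""
    return f'search({collection}, q="{q}", fl="{fl}", sort="{sort}", qt="/export")'

expr = export_expression("main_collection", "id", "id asc")

# The expression is sent to the collection's /stream endpoint:
url = "http://localhost:8983/solr/main_collection/stream?" + urlencode({"expr": expr})
print(expr)
```

The response then streams tuples until the set is exhausted, instead of re-running a paged query per window.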
Re: Solr Queries are very slow - Suggestions needed
Thanks, guys. I will try two-level document routing for file_collection.

I really don't understand why the index size is high for file_collection when the same files are available in main_collection (each file is indexed as one document with all its commands in main_collection, while in file_collection the same file is indexed as many documents, one Solr document per command).

Does index size grow more with more distinct words, or with few distinct words but a larger number of documents? Let me know if I have not put the question correctly.

Thanks,
Anil

On 15 March 2016 at 01:00, Susheel Kumar wrote:
> [quoted text trimmed; earlier messages appear in full below]
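[Editor's note] On the index-size question: roughly, the term dictionary grows with the number of *distinct* terms, while postings lists (plus stored fields, doc values, etc.) grow with the number of documents, so many small documents over a small vocabulary still grow the index. A toy model of just the inverted-index part (illustrative only; real Lucene segments carry much more):

```python
from collections import Counter

def index_footprint(docs):
    """Rough inverted-index model: returns (distinct terms, total
    postings).  The dictionary scales with vocabulary; the postings
    scale with how many documents each term appears in."""
    postings = Counter()               # term -> number of docs containing it
    for doc in docs:
        for term in set(doc.split()):
            postings[term] += 1
    return len(postings), sum(postings.values())

# The same text indexed as one big doc vs. split into many docs:
one_doc   = ["alpha beta gamma " * 1000]
many_docs = ["alpha beta gamma"] * 1000

print(index_footprint(one_doc))    # (3, 3)
print(index_footprint(many_docs))  # (3, 3000)
```

This matches the situation described: indexing each command as its own document multiplies per-document overhead even though the vocabulary is unchanged.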
Re: Solr Queries are very slow - Suggestions needed
If you can find/know which fields (or a combination) in your documents divide/group the data together, those would be the fields for custom routing. Solr supports up to two levels.

E.g., a field such as documentType or country would help. See the document routing section at
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

On Mon, Mar 14, 2016 at 3:14 PM, Erick Erickson wrote:
> [quoted text trimmed; earlier messages appear in full below]
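[Editor's note] The compositeId routing Susheel describes works by prefixing the document id with the routing key(s), separated by `!`; Solr derives the shard hash from the prefix, so documents sharing a prefix co-locate. A sketch of forming such ids (the field values are illustrative, not from this thread):

```python
def routed_id(doc_id, shard_key, sub_key=None):
    """Build a compositeId-routed document id.  One level: 'key!doc'.
    Two levels: 'key!subkey!doc'.  Solr hashes the prefix before the
    final '!' to pick the shard, so a shared prefix means a shared
    shard.  Keys shown here are hypothetical examples."""
    parts = [shard_key] + ([sub_key] if sub_key else []) + [doc_id]
    return "!".join(parts)

print(routed_id("12345", "IN"))           # IN!12345
print(routed_id("12345", "IN", "acme"))   # IN!acme!12345
```

The trade-off is the one Erick raises later in the thread: routing co-locates related documents, but a skewed key distribution produces skewed shards.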
Re: Solr Queries are very slow - Suggestions needed
Usually I just let the compositeId router do its thing and only go for custom routing when the default proves inadequate.

Note: your 480M documents may very well be too many for three shards! You really have to test.

Erick

On Mon, Mar 14, 2016 at 10:04 AM, Anil wrote:
> [quoted text trimmed; earlier messages appear in full below]
Re: Solr Queries are very slow - Suggestions needed
Hi Erick,

In the meantime, do you recommend any effective shard distribution method?

Regards,
Anil

On 14 March 2016 at 22:30, Erick Erickson wrote:
> [quoted text trimmed; earlier messages appear in full below]
Re: Solr Queries are very slow - Suggestions needed
Thanks Erick, I will try that. Somehow I am not able to run a query on the shard directly because of Kerberos; I even tried curl --negotiate.

Regards,
Anil

On 14 March 2016 at 22:30, Erick Erickson wrote:
> [quoted text trimmed; earlier messages appear in full below]
Re: Solr Queries are very slow - Suggestions needed
Try shards.info=true, but pinging the shard directly is the most certain. Best, Erick On Mon, Mar 14, 2016 at 9:48 AM, Anilwrote: > HI Erik, > > we have used document routing to balance the shards load and for > expand/collapse. it is mainly used for main_collection which holds one to > many relationship records. In file_collection, it is only for load > distribution. > > 25GB for entire solr service. each machine will act as shard for some > collections. > > we have not stress tested our servers at least for solr service. i have > read the the link you have shared, i will do something on it. thanks for > sharing. > > i have checked other collections, where index size is max 90GB and 5 M as > max number of documents. but for the particular file_collection_2014 , i > see total index size across replicas is 147 GB. > > Can we get any hints if we run the query with debugQuery=true ? what is > the effective way of load distribution ? Please advice. > > Regards, > Anil > > On 14 March 2016 at 20:32, Erick Erickson wrote: > >> bq: The slowness is happening for file_collection. though it has 3 shards, >> documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M >> docs , shard3 is empty. >> >> Well, this collection terribly balanced. Putting 330M docs on a single >> shard is >> pushing the limits, the only time I've seen that many docs on a shard, >> particularly >> with 25G of ram, they were very small records. My guess is that you will >> find >> the queries you send to that shard substantially slower than the 150M >> shard, >> although 150M could also be pushing your limits. You can measure this >> by sending the query to the specific core (something like >> >> solr/files_shard1_replica1/query?(your queryhere)=false >> >> My bet is that your QTime will be significantly different with the two >> shards. 
>> >> It also sounds like you're using implicit routing where you control where >> the >> files go, it's easy to have unbalanced shards in that case, why did you >> decide >> to do it this way? There are valid reasons, but... >> >> In short, my guess is that you've simply overloaded your shard with >> 330M docs. It's >> not at all clear that even 150 will give you satisfactory performance, >> have you stress >> tested your servers? Here's the long form of sizing: >> >> >> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ >> >> Best, >> Erick >> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar >> wrote: >> > For each of the solr machines/shards you have. Thanks. >> > >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar >> > wrote: >> > >> >> Hello Anil, >> >> >> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory >> >> parameters under System / share the snapshot. ? >> >> >> >> Thanks, >> >> Susheel >> >> >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil wrote: >> >> >> >>> HI Toke and Jack, >> >>> >> >>> Please find the details below. >> >>> >> >>> * How large are your 3 shards in bytes? (total index across replicas) >> >>> -- *146G. i am using CDH (cloudera), not sure how to check >> the >> >>> index size of each collection on each shard* >> >>> * What storage system do you use (local SSD, local spinning drives, >> remote >> >>> storage...)? *Local (hdfs) spinning drives* >> >>> * How much physical memory does your system have? *we have 15 data >> nodes. >> >>> multiple services installed on each data node (252 GB RAM for each data >> >>> node). 25 gb RAM allocated for solr service.* >> >>> * How much memory is free for disk cache? *i could not find.* >> >>> * How many concurrent queries do you issue? *very less. i dont see any >> >>> concurrent queries to this file_collection for now.* >> >>> * Do you update while you search? *Yes.. 
its very less.* >> >>> * What does a full query (rows, faceting, grouping, highlighting, >> >>> everything) look like? *for the file_collection, rows - 100, >> highlights = >> >>> false, no facets, expand = false.* >> >>> * How many documents does a typical query match (hitcount)? *it varies >> >>> with >> >>> each file. i have sort on int field to order commands in the query.* >> >>> >> >>> we have two sets of collections on solr cluster ( 17 data nodes) >> >>> >> >>> 1. main_collection - collection created per year. each collection uses >> 8 >> >>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc >> >>> >> >>> 2. file_collection (where files having commands are indexed) - >> collection >> >>> created per 2 years. it uses 3 shards and 2 replicas. ex : >> >>> file_collection_2014, file_collection_2016 >> >>> >> >>> The slowness is happening for file_collection. though it has 3 shards, >> >>> documents are available in 2 shards. shard1 - 150M docs and shard2 has >> >>> 330M >> >>> docs , shard3 is empty. >> >>> >> >>> main_collection is looks
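Erick's suggestion above, hitting each core directly and comparing QTime, is easy to script. A minimal sketch (the host and replica core names below are hypothetical; `distrib=false` is the standard Solr parameter that keeps the query on the receiving core instead of fanning it out):

```python
from urllib.parse import urlencode

def core_query_url(host, core, query, rows=0):
    """Build a URL that queries one core directly, bypassing
    distributed search, so QTime reflects that shard alone."""
    params = {
        "q": query,
        "rows": rows,        # rows=0: we only care about QTime and hitcount
        "distrib": "false",  # do not fan the request out to other shards
    }
    return f"http://{host}:8983/solr/{core}/select?{urlencode(params)}"

# Compare the two populated shards (core names are made up here):
for core in ("file_collection_2014_shard1_replica1",
             "file_collection_2014_shard2_replica1"):
    print(core_query_url("datanode01", core,
                         'fileId:"file unique id" AND command_text:(system login)'))
```

Adding shards.info=true to a normal distributed request similarly reports per-shard elapsed time in the response, which is what the "Try shards.info=true" advice refers to.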
Re: Solr Queries are very slow - Suggestions needed
Hi Erick, we have used document routing to balance the shards load and for expand/collapse. it is mainly used for main_collection which holds one to many relationship records. In file_collection, it is only for load distribution. 25GB for entire solr service. each machine will act as shard for some collections. we have not stress tested our servers at least for solr service. i have read the link you have shared, i will do something on it. thanks for sharing. i have checked other collections, where index size is max 90GB and 5 M as max number of documents. but for the particular file_collection_2014 , i see total index size across replicas is 147 GB. Can we get any hints if we run the query with debugQuery=true ? what is the effective way of load distribution ? Please advise. Regards, Anil On 14 March 2016 at 20:32, Erick Erickson wrote: > bq: The slowness is happening for file_collection. though it has 3 shards, > documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M > docs , shard3 is empty. > > Well, this collection is terribly balanced. Putting 330M docs on a single > shard is > pushing the limits, the only time I've seen that many docs on a shard, > particularly > with 25G of ram, they were very small records. My guess is that you will > find > the queries you send to that shard substantially slower than the 150M > shard, > although 150M could also be pushing your limits. You can measure this > by sending the query to the specific core (something like > > solr/files_shard1_replica1/query?q=(your query here)&distrib=false > > My bet is that your QTime will be significantly different with the two > shards. > > It also sounds like you're using implicit routing where you control where > the > files go, it's easy to have unbalanced shards in that case, why did you > decide > to do it this way? There are valid reasons, but... > > In short, my guess is that you've simply overloaded your shard with > 330M docs. 
It's > not at all clear that even 150 will give you satisfactory performance, > have you stress > tested your servers? Here's the long form of sizing: > > > https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ > > Best, > Erick > > On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar > wrote: > > For each of the solr machines/shards you have. Thanks. > > > > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar > > wrote: > > > >> Hello Anil, > >> > >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory > >> parameters under System / share the snapshot. ? > >> > >> Thanks, > >> Susheel > >> > >> On Mon, Mar 14, 2016 at 5:36 AM, Anil wrote: > >> > >>> HI Toke and Jack, > >>> > >>> Please find the details below. > >>> > >>> * How large are your 3 shards in bytes? (total index across replicas) > >>> -- *146G. i am using CDH (cloudera), not sure how to check > the > >>> index size of each collection on each shard* > >>> * What storage system do you use (local SSD, local spinning drives, > remote > >>> storage...)? *Local (hdfs) spinning drives* > >>> * How much physical memory does your system have? *we have 15 data > nodes. > >>> multiple services installed on each data node (252 GB RAM for each data > >>> node). 25 gb RAM allocated for solr service.* > >>> * How much memory is free for disk cache? *i could not find.* > >>> * How many concurrent queries do you issue? *very less. i dont see any > >>> concurrent queries to this file_collection for now.* > >>> * Do you update while you search? *Yes.. its very less.* > >>> * What does a full query (rows, faceting, grouping, highlighting, > >>> everything) look like? *for the file_collection, rows - 100, > highlights = > >>> false, no facets, expand = false.* > >>> * How many documents does a typical query match (hitcount)? *it varies > >>> with > >>> each file. 
i have sort on int field to order commands in the query.* > >>> > >>> we have two sets of collections on solr cluster ( 17 data nodes) > >>> > >>> 1. main_collection - collection created per year. each collection uses > 8 > >>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc > >>> > >>> 2. file_collection (where files having commands are indexed) - > collection > >>> created per 2 years. it uses 3 shards and 2 replicas. ex : > >>> file_collection_2014, file_collection_2016 > >>> > >>> The slowness is happening for file_collection. though it has 3 shards, > >>> documents are available in 2 shards. shard1 - 150M docs and shard2 has > >>> 330M > >>> docs , shard3 is empty. > >>> > >>> main_collection is looks good. > >>> > >>> please let me know if you need any additional details. > >>> > >>> Regards, > >>> Anil > >>> > >>> > >>> On 13 March 2016 at 21:48, Anil wrote: > >>> > >>> > Thanks Toke and Jack. > >>> > > >>> > Jack, > >>> > > >>> > Yes. it is 480 million :) > >>> > > >>> >
Re: Solr Queries are very slow - Suggestions needed
Hi Susheel, we have enabled kerberos. so solr is accessed using Hue only. i will check if I can get similar information using Hue. Thanks. Regards, Anil On 14 March 2016 at 19:34, Susheel Kumar wrote: > Hello Anil, > > Can you go to Solr Admin Panel -> Dashboard and share all 4 memory > parameters under System / share the snapshot. ? > > Thanks, > Susheel > > On Mon, Mar 14, 2016 at 5:36 AM, Anil wrote: > > > HI Toke and Jack, > > > > Please find the details below. > > > > * How large are your 3 shards in bytes? (total index across replicas) > > -- *146G. i am using CDH (cloudera), not sure how to check the > > index size of each collection on each shard* > > * What storage system do you use (local SSD, local spinning drives, > remote > > storage...)? *Local (hdfs) spinning drives* > > * How much physical memory does your system have? *we have 15 data nodes. > > multiple services installed on each data node (252 GB RAM for each data > > node). 25 gb RAM allocated for solr service.* > > * How much memory is free for disk cache? *i could not find.* > > * How many concurrent queries do you issue? *very less. i dont see any > > concurrent queries to this file_collection for now.* > > * Do you update while you search? *Yes.. its very less.* > > * What does a full query (rows, faceting, grouping, highlighting, > > everything) look like? *for the file_collection, rows - 100, highlights = > > false, no facets, expand = false.* > > * How many documents does a typical query match (hitcount)? *it varies > with > > each file. i have sort on int field to order commands in the query.* > > > > we have two sets of collections on solr cluster ( 17 data nodes) > > > > 1. main_collection - collection created per year. each collection uses 8 > > shards 2 replicas ex: main_collection_2016, main_collection_2015 etc > > > > 2. file_collection (where files having commands are indexed) - collection > > created per 2 years. it uses 3 shards and 2 replicas. 
ex : > > file_collection_2014, file_collection_2016 > > > > The slowness is happening for file_collection. though it has 3 shards, > > documents are available in 2 shards. shard1 - 150M docs and shard2 has > 330M > > docs , shard3 is empty. > > > > main_collection is looks good. > > > > please let me know if you need any additional details. > > > > Regards, > > Anil > > > > > > On 13 March 2016 at 21:48, Anil wrote: > > > > > Thanks Toke and Jack. > > > > > > Jack, > > > > > > Yes. it is 480 million :) > > > > > > I will share the additional details soon. thanks. > > > > > > > > > Regards, > > > Anil > > > > > > > > > > > > > > > > > > On 13 March 2016 at 21:06, Jack Krupansky > > > wrote: > > > > > >> (We should have a wiki/doc page for the "usual list of suspects" when > > >> queries are/appear slow, rather than need to repeat the same mantra(s) > > for > > >> every inquiry on this topic.) > > >> > > >> > > >> -- Jack Krupansky > > >> > > >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen < > > t...@statsbiblioteket.dk> > > >> wrote: > > >> > > >> > Anil wrote: > > >> > > i have indexed a data (commands from files) with 10 fields and 3 > of > > >> them > > >> > is > > >> > > text fields. collection is created with 3 shards and 2 replicas. I > > >> have > > >> > > used document routing as well. > > >> > > > >> > > Currently collection holds 47,80,01,405 records. > > >> > > > >> > ...480 million, right? Funny digit grouping in India. > > >> > > > >> > > text search against text field taking around 5 sec. solr is query > > just > > >> > and > > >> > > of two terms with fl as 7 fields > > >> > > > >> > > fileId:"file unique id" AND command_text:(system login) > > >> > > > >> > While not an impressive response time, it might just be that your > > >> hardware > > >> > is not enough to handle that amount of documents. 
The usual culprit > is > > >> IO > > >> > speed, so chances are you have a system with spinning drives and not > > >> enough > > >> > RAM: Switch to SSD and/or add more RAM. > > >> > > > >> > To give better advice, we need more information. > > >> > > > >> > * How large are your 3 shards in bytes? > > >> > * What storage system do you use (local SSD, local spinning drives, > > >> remote > > >> > storage...)? > > >> > * How much physical memory does your system have? > > >> > * How much memory is free for disk cache? > > >> > * How many concurrent queries do you issue? > > >> > * Do you update while you search? > > >> > * What does a full query (rows, faceting, grouping, highlighting, > > >> > everything) look like? > > >> > * How many documents does a typical query match (hitcount)? > > >> > > > >> > - Toke Eskildsen > > >> > > > >> > > > > > > > > >
Re: Solr Queries are very slow - Suggestions needed
bq: The slowness is happening for file_collection. though it has 3 shards, documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M docs , shard3 is empty. Well, this collection is terribly balanced. Putting 330M docs on a single shard is pushing the limits, the only time I've seen that many docs on a shard, particularly with 25G of ram, they were very small records. My guess is that you will find the queries you send to that shard substantially slower than the 150M shard, although 150M could also be pushing your limits. You can measure this by sending the query to the specific core (something like solr/files_shard1_replica1/query?q=(your query here)&distrib=false My bet is that your QTime will be significantly different with the two shards. It also sounds like you're using implicit routing where you control where the files go, it's easy to have unbalanced shards in that case, why did you decide to do it this way? There are valid reasons, but... In short, my guess is that you've simply overloaded your shard with 330M docs. It's not at all clear that even 150 will give you satisfactory performance, have you stress tested your servers? Here's the long form of sizing: https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar wrote: > For each of the solr machines/shards you have. Thanks. > > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar > wrote: > >> Hello Anil, >> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory >> parameters under System / share the snapshot. ? >> >> Thanks, >> Susheel >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil wrote: >> >>> HI Toke and Jack, >>> >>> Please find the details below. >>> >>> * How large are your 3 shards in bytes? (total index across replicas) >>> -- *146G. 
i am using CDH (cloudera), not sure how to check the >>> index size of each collection on each shard* >>> * What storage system do you use (local SSD, local spinning drives, remote >>> storage...)? *Local (hdfs) spinning drives* >>> * How much physical memory does your system have? *we have 15 data nodes. >>> multiple services installed on each data node (252 GB RAM for each data >>> node). 25 gb RAM allocated for solr service.* >>> * How much memory is free for disk cache? *i could not find.* >>> * How many concurrent queries do you issue? *very less. i dont see any >>> concurrent queries to this file_collection for now.* >>> * Do you update while you search? *Yes.. its very less.* >>> * What does a full query (rows, faceting, grouping, highlighting, >>> everything) look like? *for the file_collection, rows - 100, highlights = >>> false, no facets, expand = false.* >>> * How many documents does a typical query match (hitcount)? *it varies >>> with >>> each file. i have sort on int field to order commands in the query.* >>> >>> we have two sets of collections on solr cluster ( 17 data nodes) >>> >>> 1. main_collection - collection created per year. each collection uses 8 >>> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc >>> >>> 2. file_collection (where files having commands are indexed) - collection >>> created per 2 years. it uses 3 shards and 2 replicas. ex : >>> file_collection_2014, file_collection_2016 >>> >>> The slowness is happening for file_collection. though it has 3 shards, >>> documents are available in 2 shards. shard1 - 150M docs and shard2 has >>> 330M >>> docs , shard3 is empty. >>> >>> main_collection is looks good. >>> >>> please let me know if you need any additional details. >>> >>> Regards, >>> Anil >>> >>> >>> On 13 March 2016 at 21:48, Anil wrote: >>> >>> > Thanks Toke and Jack. >>> > >>> > Jack, >>> > >>> > Yes. it is 480 million :) >>> > >>> > I will share the additional details soon. thanks. 
>>> > >>> > >>> > Regards, >>> > Anil >>> > >>> > >>> > >>> > >>> > >>> > On 13 March 2016 at 21:06, Jack Krupansky >>> > wrote: >>> > >>> >> (We should have a wiki/doc page for the "usual list of suspects" when >>> >> queries are/appear slow, rather than need to repeat the same mantra(s) >>> for >>> >> every inquiry on this topic.) >>> >> >>> >> >>> >> -- Jack Krupansky >>> >> >>> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen < >>> t...@statsbiblioteket.dk> >>> >> wrote: >>> >> >>> >> > Anil wrote: >>> >> > > i have indexed a data (commands from files) with 10 fields and 3 of >>> >> them >>> >> > is >>> >> > > text fields. collection is created with 3 shards and 2 replicas. I >>> >> have >>> >> > > used document routing as well. >>> >> > >>> >> > > Currently collection holds 47,80,01,405 records. >>> >> > >>> >> > ...480 million, right? Funny digit grouping in India. >>> >> > >>> >> > > text search against text field taking around 5 sec. solr is query >>> just >>> >> > and >>> >> > > of
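On Erick's implicit-routing point: with the default compositeId router, Solr hashes the part of the id before `!` to pick a shard, so documents sharing a route key stay together (as collapse/expand needs) while distinct keys spread across shards by hash instead of by manual placement. A hedged sketch of the id layout; the field names come from this thread, but the ids and helper below are made up for illustration:

```python
def composite_id(route_key, doc_id):
    """Format an id for Solr's compositeId router: everything before
    '!' is hashed to choose the shard, so docs with the same route
    key co-locate while different keys distribute evenly."""
    return f"{route_key}!{doc_id}"

# Commands from one file land on one shard (needed for collapse/expand
# on fileId); different files hash to different shards automatically.
doc = {
    "id": composite_id("file-8842", "cmd-000017"),
    "fileId": "file-8842",
    "command_text": "system login",
}
print(doc["id"])
```

This trades the manual control of implicit routing for hash-based balance, which is exactly the imbalance (330M vs. 150M vs. 0 docs) being diagnosed above.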
Re: Solr Queries are very slow - Suggestions needed
For each of the solr machines/shards you have. Thanks. On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar wrote: > Hello Anil, > > Can you go to Solr Admin Panel -> Dashboard and share all 4 memory > parameters under System / share the snapshot. ? > > Thanks, > Susheel > > On Mon, Mar 14, 2016 at 5:36 AM, Anil wrote: > >> HI Toke and Jack, >> >> Please find the details below. >> >> * How large are your 3 shards in bytes? (total index across replicas) >> -- *146G. i am using CDH (cloudera), not sure how to check the >> index size of each collection on each shard* >> * What storage system do you use (local SSD, local spinning drives, remote >> storage...)? *Local (hdfs) spinning drives* >> * How much physical memory does your system have? *we have 15 data nodes. >> multiple services installed on each data node (252 GB RAM for each data >> node). 25 gb RAM allocated for solr service.* >> * How much memory is free for disk cache? *i could not find.* >> * How many concurrent queries do you issue? *very less. i dont see any >> concurrent queries to this file_collection for now.* >> * Do you update while you search? *Yes.. its very less.* >> * What does a full query (rows, faceting, grouping, highlighting, >> everything) look like? *for the file_collection, rows - 100, highlights = >> false, no facets, expand = false.* >> * How many documents does a typical query match (hitcount)? *it varies >> with >> each file. i have sort on int field to order commands in the query.* >> >> we have two sets of collections on solr cluster ( 17 data nodes) >> >> 1. main_collection - collection created per year. each collection uses 8 >> shards 2 replicas ex: main_collection_2016, main_collection_2015 etc >> >> 2. file_collection (where files having commands are indexed) - collection >> created per 2 years. it uses 3 shards and 2 replicas. ex : >> file_collection_2014, file_collection_2016 >> >> The slowness is happening for file_collection. 
though it has 3 shards, >> documents are available in 2 shards. shard1 - 150M docs and shard2 has >> 330M >> docs , shard3 is empty. >> >> main_collection is looks good. >> >> please let me know if you need any additional details. >> >> Regards, >> Anil >> >> >> On 13 March 2016 at 21:48, Anil wrote: >> >> > Thanks Toke and Jack. >> > >> > Jack, >> > >> > Yes. it is 480 million :) >> > >> > I will share the additional details soon. thanks. >> > >> > >> > Regards, >> > Anil >> > >> > >> > >> > >> > >> > On 13 March 2016 at 21:06, Jack Krupansky >> > wrote: >> > >> >> (We should have a wiki/doc page for the "usual list of suspects" when >> >> queries are/appear slow, rather than need to repeat the same mantra(s) >> for >> >> every inquiry on this topic.) >> >> >> >> >> >> -- Jack Krupansky >> >> >> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen < >> t...@statsbiblioteket.dk> >> >> wrote: >> >> >> >> > Anil wrote: >> >> > > i have indexed a data (commands from files) with 10 fields and 3 of >> >> them >> >> > is >> >> > > text fields. collection is created with 3 shards and 2 replicas. I >> >> have >> >> > > used document routing as well. >> >> > >> >> > > Currently collection holds 47,80,01,405 records. >> >> > >> >> > ...480 million, right? Funny digit grouping in India. >> >> > >> >> > > text search against text field taking around 5 sec. solr is query >> just >> >> > and >> >> > > of two terms with fl as 7 fields >> >> > >> >> > > fileId:"file unique id" AND command_text:(system login) >> >> > >> >> > While not an impressive response time, it might just be that your >> >> hardware >> >> > is not enough to handle that amount of documents. The usual culprit >> is >> >> IO >> >> > speed, so chances are you have a system with spinning drives and not >> >> enough >> >> > RAM: Switch to SSD and/or add more RAM. >> >> > >> >> > To give better advice, we need more information. >> >> > >> >> > * How large are your 3 shards in bytes? 
>> >> > * What storage system do you use (local SSD, local spinning drives, >> >> remote >> >> > storage...)? >> >> > * How much physical memory does your system have? >> >> > * How much memory is free for disk cache? >> >> > * How many concurrent queries do you issue? >> >> > * Do you update while you search? >> >> > * What does a full query (rows, faceting, grouping, highlighting, >> >> > everything) look like? >> >> > * How many documents does a typical query match (hitcount)? >> >> > >> >> > - Toke Eskildsen >> >> > >> >> >> > >> > >> > >
Re: Solr Queries are very slow - Suggestions needed
Hello Anil, Can you go to Solr Admin Panel -> Dashboard and share all 4 memory parameters under System / share the snapshot. ? Thanks, Susheel On Mon, Mar 14, 2016 at 5:36 AM, Anil wrote: > HI Toke and Jack, > > Please find the details below. > > * How large are your 3 shards in bytes? (total index across replicas) > -- *146G. i am using CDH (cloudera), not sure how to check the > index size of each collection on each shard* > * What storage system do you use (local SSD, local spinning drives, remote > storage...)? *Local (hdfs) spinning drives* > * How much physical memory does your system have? *we have 15 data nodes. > multiple services installed on each data node (252 GB RAM for each data > node). 25 gb RAM allocated for solr service.* > * How much memory is free for disk cache? *i could not find.* > * How many concurrent queries do you issue? *very less. i dont see any > concurrent queries to this file_collection for now.* > * Do you update while you search? *Yes.. its very less.* > * What does a full query (rows, faceting, grouping, highlighting, > everything) look like? *for the file_collection, rows - 100, highlights = > false, no facets, expand = false.* > * How many documents does a typical query match (hitcount)? *it varies with > each file. i have sort on int field to order commands in the query.* > > we have two sets of collections on solr cluster ( 17 data nodes) > > 1. main_collection - collection created per year. each collection uses 8 > shards 2 replicas ex: main_collection_2016, main_collection_2015 etc > > 2. file_collection (where files having commands are indexed) - collection > created per 2 years. it uses 3 shards and 2 replicas. ex : > file_collection_2014, file_collection_2016 > > The slowness is happening for file_collection. though it has 3 shards, > documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M > docs , shard3 is empty. > > main_collection is looks good. 
> > please let me know if you need any additional details. > > Regards, > Anil > > > On 13 March 2016 at 21:48, Anil wrote: > > > Thanks Toke and Jack. > > > > Jack, > > > > Yes. it is 480 million :) > > > > I will share the additional details soon. thanks. > > > > > > Regards, > > Anil > > > > > > > > > > > > On 13 March 2016 at 21:06, Jack Krupansky > > wrote: > > > >> (We should have a wiki/doc page for the "usual list of suspects" when > >> queries are/appear slow, rather than need to repeat the same mantra(s) > for > >> every inquiry on this topic.) > >> > >> > >> -- Jack Krupansky > >> > >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen < > t...@statsbiblioteket.dk> > >> wrote: > >> > >> > Anil wrote: > >> > > i have indexed a data (commands from files) with 10 fields and 3 of > >> them > >> > is > >> > > text fields. collection is created with 3 shards and 2 replicas. I > >> have > >> > > used document routing as well. > >> > > >> > > Currently collection holds 47,80,01,405 records. > >> > > >> > ...480 million, right? Funny digit grouping in India. > >> > > >> > > text search against text field taking around 5 sec. solr is query > just > >> > and > >> > > of two terms with fl as 7 fields > >> > > >> > > fileId:"file unique id" AND command_text:(system login) > >> > > >> > While not an impressive response time, it might just be that your > >> hardware > >> > is not enough to handle that amount of documents. The usual culprit is > >> IO > >> > speed, so chances are you have a system with spinning drives and not > >> enough > >> > RAM: Switch to SSD and/or add more RAM. > >> > > >> > To give better advice, we need more information. > >> > > >> > * How large are your 3 shards in bytes? > >> > * What storage system do you use (local SSD, local spinning drives, > >> remote > >> > storage...)? > >> > * How much physical memory does your system have? > >> > * How much memory is free for disk cache? > >> > * How many concurrent queries do you issue? 
> >> > * Do you update while you search? > >> > * What does a full query (rows, faceting, grouping, highlighting, > >> > everything) look like? > >> > * How many documents does a typical query match (hitcount)? > >> > > >> > - Toke Eskildsen > >> > > >> > > > > >
Re: Solr Queries are very slow - Suggestions needed
Hi Toke and Jack, Please find the details below. * How large are your 3 shards in bytes? (total index across replicas) -- *146G. i am using CDH (cloudera), not sure how to check the index size of each collection on each shard* * What storage system do you use (local SSD, local spinning drives, remote storage...)? *Local (hdfs) spinning drives* * How much physical memory does your system have? *we have 15 data nodes. multiple services installed on each data node (252 GB RAM for each data node). 25 gb RAM allocated for solr service.* * How much memory is free for disk cache? *i could not find.* * How many concurrent queries do you issue? *very less. i dont see any concurrent queries to this file_collection for now.* * Do you update while you search? *Yes.. its very less.* * What does a full query (rows, faceting, grouping, highlighting, everything) look like? *for the file_collection, rows - 100, highlights = false, no facets, expand = false.* * How many documents does a typical query match (hitcount)? *it varies with each file. i have sort on int field to order commands in the query.* we have two sets of collections on solr cluster ( 17 data nodes) 1. main_collection - collection created per year. each collection uses 8 shards 2 replicas ex: main_collection_2016, main_collection_2015 etc 2. file_collection (where files having commands are indexed) - collection created per 2 years. it uses 3 shards and 2 replicas. ex : file_collection_2014, file_collection_2016 The slowness is happening for file_collection. though it has 3 shards, documents are available in 2 shards. shard1 - 150M docs and shard2 has 330M docs , shard3 is empty. main_collection looks good. please let me know if you need any additional details. Regards, Anil On 13 March 2016 at 21:48, Anil wrote: > Thanks Toke and Jack. > > Jack, > > Yes. it is 480 million :) > > I will share the additional details soon. thanks. 
> > > Regards, > Anil > > > > > > On 13 March 2016 at 21:06, Jack Krupansky > wrote: > >> (We should have a wiki/doc page for the "usual list of suspects" when >> queries are/appear slow, rather than need to repeat the same mantra(s) for >> every inquiry on this topic.) >> >> >> -- Jack Krupansky >> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen >> wrote: >> >> > Anil wrote: >> > > i have indexed a data (commands from files) with 10 fields and 3 of >> them >> > is >> > > text fields. collection is created with 3 shards and 2 replicas. I >> have >> > > used document routing as well. >> > >> > > Currently collection holds 47,80,01,405 records. >> > >> > ...480 million, right? Funny digit grouping in India. >> > >> > > text search against text field taking around 5 sec. solr is query just >> > and >> > > of two terms with fl as 7 fields >> > >> > > fileId:"file unique id" AND command_text:(system login) >> > >> > While not an impressive response time, it might just be that your >> hardware >> > is not enough to handle that amount of documents. The usual culprit is >> IO >> > speed, so chances are you have a system with spinning drives and not >> enough >> > RAM: Switch to SSD and/or add more RAM. >> > >> > To give better advice, we need more information. >> > >> > * How large are your 3 shards in bytes? >> > * What storage system do you use (local SSD, local spinning drives, >> remote >> > storage...)? >> > * How much physical memory does your system have? >> > * How much memory is free for disk cache? >> > * How many concurrent queries do you issue? >> > * Do you update while you search? >> > * What does a full query (rows, faceting, grouping, highlighting, >> > everything) look like? >> > * How many documents does a typical query match (hitcount)? >> > >> > - Toke Eskildsen >> > >> > >
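On the "how much memory is free for disk cache" question that went unanswered above: on Linux the kernel reports this in /proc/meminfo. A rough sketch of reading it off a node, parsing a hypothetical sample string here for illustration; treating MemFree plus Cached as cache headroom is only an approximation (newer kernels expose MemAvailable as a better estimate):

```python
def cache_headroom_gb(meminfo_text):
    """Rough disk-cache headroom from /proc/meminfo contents: the
    kernel's 'Cached' pages plus 'MemFree' approximate what the OS
    can devote to caching index files."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])  # values are in kB
    free_kb = fields.get("MemFree", 0) + fields.get("Cached", 0)
    return free_kb / (1024 * 1024)

# Hypothetical node with 8 GiB free and 40 GiB of page cache:
sample = "MemTotal: 264241152 kB\nMemFree: 8388608 kB\nCached: 41943040 kB"
print(round(cache_headroom_gb(sample), 1))  # → 48.0
```

On a real node you would pass `open("/proc/meminfo").read()` instead of the sample, or simply look at the buff/cache column of `free -m`.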
Re: Solr Queries are very slow - Suggestions needed
Hi Shawn, Jack and Erick, Thank you very much. Regards, Anil On 14 March 2016 at 02:55, Shawn Heisey wrote: > On 3/13/2016 9:36 AM, Jack Krupansky wrote: > > (We should have a wiki/doc page for the "usual list of suspects" when > > queries are/appear slow, rather than need to repeat the same mantra(s) > for > > every inquiry on this topic.) > > There's this page, with the disclaimer that I wrote almost all of it: > > https://wiki.apache.org/solr/SolrPerformanceProblems > > It emphasizes RAM quite a bit, but when there are hundreds of millions > of documents, that's usually the problem. I've just added some info > about high query rates. > > Thanks, > Shawn > >
Re: Solr Queries are very slow - Suggestions needed
On 3/13/2016 9:36 AM, Jack Krupansky wrote: > (We should have a wiki/doc page for the "usual list of suspects" when > queries are/appear slow, rather than need to repeat the same mantra(s) for > every inquiry on this topic.) There's this page, with the disclaimer that I wrote almost all of it: https://wiki.apache.org/solr/SolrPerformanceProblems It emphasizes RAM quite a bit, but when there are hundreds of millions of documents, that's usually the problem. I've just added some info about high query rates. Thanks, Shawn
Re: Solr Queries are very slow - Suggestions needed
Yeah, there's some good material there, but probably still too inaccessible for the average "help, my queries are slow" inquiry we get so frequently on this list. Another useful page is: https://wiki.apache.org/solr/SolrPerformanceProblems -- Jack Krupansky On Sun, Mar 13, 2016 at 2:58 PM, Erick Erickson wrote: > Jack: > https://wiki.apache.org/solr/SolrPerformanceFactors > and > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed > > are already there, we can add to them > > Best, > Erick > > On Sun, Mar 13, 2016 at 9:18 AM, Anil wrote: > > Thanks Toke and Jack. > > > > Jack, > > > > Yes. it is 480 million :) > > > > I will share the additional details soon. thanks. > > > > > > Regards, > > Anil > > > > > > > > > > > > On 13 March 2016 at 21:06, Jack Krupansky > wrote: > > > >> (We should have a wiki/doc page for the "usual list of suspects" when > >> queries are/appear slow, rather than need to repeat the same mantra(s) > for > >> every inquiry on this topic.) > >> > >> > >> -- Jack Krupansky > >> > >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen < t...@statsbiblioteket.dk> > >> wrote: > >> > >> > Anil wrote: > >> > > i have indexed a data (commands from files) with 10 fields and 3 of > >> them > >> > is > >> > > text fields. collection is created with 3 shards and 2 replicas. I > have > >> > > used document routing as well. > >> > > >> > > Currently collection holds 47,80,01,405 records. > >> > > >> > ...480 million, right? Funny digit grouping in India. > >> > > >> > > text search against text field taking around 5 sec. solr is query > just > >> > and > >> > > of two terms with fl as 7 fields > >> > > >> > > fileId:"file unique id" AND command_text:(system login) > >> > > >> > While not an impressive response time, it might just be that your > >> hardware > >> > is not enough to handle that amount of documents. 
The usual culprit > is IO > >> > speed, so chances are you have a system with spinning drives and not > >> enough > >> > RAM: Switch to SSD and/or add more RAM. > >> > > >> > To give better advice, we need more information. > >> > > >> > * How large are your 3 shards in bytes? > >> > * What storage system do you use (local SSD, local spinning drives, > >> remote > >> > storage...)? > >> > * How much physical memory does your system have? > >> > * How much memory is free for disk cache? > >> > * How many concurrent queries do you issue? > >> > * Do you update while you search? > >> > * What does a full query (rows, faceting, grouping, highlighting, > >> > everything) look like? > >> > * How many documents does a typical query match (hitcount)? > >> > > >> > - Toke Eskildsen > >> > > >> >
Re: Solr Queries are very slow - Suggestions needed
Jack:

https://wiki.apache.org/solr/SolrPerformanceFactors
and
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

are already there, we can add to them

Best,
Erick

On Sun, Mar 13, 2016 at 9:18 AM, Anil wrote:
> Thanks Toke and Jack.
>
> Jack,
>
> Yes. it is 480 million :)
>
> I will share the additional details soon. thanks.
>
> Regards,
> Anil
>
> [earlier quoted messages trimmed]
Re: Solr Queries are very slow - Suggestions needed
Thanks Toke and Jack.

Jack,

Yes. it is 480 million :)

I will share the additional details soon. thanks.

Regards,
Anil

On 13 March 2016 at 21:06, Jack Krupansky wrote:
> (We should have a wiki/doc page for the "usual list of suspects" when
> queries are/appear slow, rather than need to repeat the same mantra(s)
> for every inquiry on this topic.)
>
> [earlier quoted messages trimmed]
Re: Solr Queries are very slow - Suggestions needed
(We should have a wiki/doc page for the "usual list of suspects" when
queries are/appear slow, rather than need to repeat the same mantra(s) for
every inquiry on this topic.)

-- Jack Krupansky

On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen wrote:
> [quoted message trimmed]
Re: Solr Queries are very slow - Suggestions needed
Anil wrote:
> i have indexed a data (commands from files) with 10 fields and 3 of them
> is text fields. collection is created with 3 shards and 2 replicas. I
> have used document routing as well.
>
> Currently collection holds 47,80,01,405 records.

...480 million, right? Funny digit grouping in India.

> text search against text field taking around 5 sec. solr is query just
> and of two terms with fl as 7 fields
>
> fileId:"file unique id" AND command_text:(system login)

While not an impressive response time, it might just be that your hardware
is not enough to handle that amount of documents. The usual culprit is IO
speed, so chances are you have a system with spinning drives and not enough
RAM: switch to SSD and/or add more RAM.

To give better advice, we need more information.

* How large are your 3 shards in bytes?
* What storage system do you use (local SSD, local spinning drives,
  remote storage...)?
* How much physical memory does your system have?
* How much memory is free for disk cache?
* How many concurrent queries do you issue?
* Do you update while you search?
* What does a full query (rows, faceting, grouping, highlighting,
  everything) look like?
* How many documents does a typical query match (hitcount)?

- Toke Eskildsen
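Two of the numbers asked for above, the hitcount and the server-side query time, are present in every Solr response. A minimal sketch of pulling them out of Solr's standard JSON response format (the embedded response here is fabricated sample data, not output from the poster's system):

```python
import json

# A trimmed, fabricated example of Solr's standard JSON response (wt=json).
raw = '''{
  "responseHeader": {"status": 0, "QTime": 5123},
  "response": {"numFound": 478001405, "start": 0, "docs": []}
}'''

data = json.loads(raw)
qtime_ms = data["responseHeader"]["QTime"]   # server-side query time in ms
hitcount = data["response"]["numFound"]      # documents matching the query

print(f"hitcount={hitcount} QTime={qtime_ms}ms")
```

Logging these two values per query is usually enough to see whether slowness tracks the hitcount or is uniform across queries.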
Re: Solr Queries
Hi Abhijeet,

On Mon, Aug 22, 2011 at 3:09 PM, abhijit bashetti
<abhijitbashe...@gmail.com> wrote:

> 1. Can I update a specific field while re-indexing?

Solr doesn't support updating specific fields. You must always send a
complete document with values for all fields while indexing. If you keep
the same value for the unique key field, the new document will replace the
one in the index.

> 2. What are the ways to improve the performance of indexing?

See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

The above page is for Lucene users but is useful for Solr users as well.

> 3. What should be the ideal system configuration for a Solr indexing
> server?

This is difficult to answer. It depends on your particular use-case.

--
Regards,
Shalin Shekhar Mangar.
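Concretely, "updating" one field means re-sending the whole document in Solr's update XML with the same unique key. A sketch, where the field names (`id`, `title`, `body`) are illustrative and must match your schema:

```xml
<add>
  <doc>
    <!-- same unique key value as the existing doc: the old doc is replaced -->
    <field name="id">doc-42</field>
    <!-- the field you actually wanted to change -->
    <field name="title">New title</field>
    <!-- unchanged fields must still be sent in full -->
    <field name="body">The original body text, re-sent verbatim</field>
  </doc>
</add>
```

Omitting the unchanged fields would not leave them intact; the replacement document would simply lack them.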
Re: Solr Queries
Hi,

Suppose I have a content field of type text. An example of the content
field is shown below:

"After frustrated waiting period to get my credit card from the ICICI Bank,
today I decided to write them a online petition stating my problem... Below
is the unedited version of letter I sent to ICICI..."

1. Can I use proximity search for the two phrases "frustrated waiting" and
"credit card"? I want to check whether "frustrated waiting" and "credit
card" occur within 10 words of each other, where each is an exact phrase
(i.e., matched as a whole, not as two separate words in different parts of
the document). Does Solr support this kind of operation? If so, how do we
structure the query? Could you give me an example?

Thanks,
Raakhi Khatwani.

A similar discussion was:
http://old.nabble.com/Nested-phrases-with-proximity-in-Solr-td26012747.html

With surround, your query would be:

(frustrated w waiting) 10w (credit w card)

Hope this helps.
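If your Solr version ships the surround query parser (it became a bundled parser only in later releases, so treat its availability and the `defType=surround` parameter as assumptions to verify), the request could be assembled like this hypothetical sketch:

```python
from urllib.parse import urlencode

# Hypothetical request parameters for a surround-parser proximity query,
# using the query syntax suggested in the thread: each "w" builds an
# ordered phrase, and "10w" requires the phrases within 10 positions.
params = {
    "q": "(frustrated w waiting) 10w (credit w card)",
    "defType": "surround",   # assumption: surround parser is available
    "wt": "json",
}
query_string = urlencode(params)
print(query_string)
```

Note that the surround parser does not analyze its terms, so the terms in the query must match the indexed (e.g. lowercased) forms.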
Re: Solr Queries
On Nov 12, 2009, at 8:55 AM, Rakhi Khatwani wrote:

> Hi,
> I am using solr 1.3 and i have inserted some data in my comment field.
> For example:
>
> for document1:
> <str name="comment">The iPhone 3GS finally adds common cell phone
> features like multimedia messaging, video recording, and voice dialing.
> It runs faster; its promised battery life is longer; and the multimedia
> quality continues to shine. The iPhone 3GS' call quality shows no
> improvements and the 3G signal reception remains uneven. We still don't
> get Flash Lite, USB transfer and storage, or multitasking.</str>
>
> for document2:
> <str name="comment">Sony Ericsson c510 has 3.2MP cybershot camera with
> smile detection. Amazing phone, faster than Sony Ericsson w580i. Sony
> Ericsson w580i camera is only 2MP with no autofocus and smile detection.
> It does not even have a flash, leading to poor quality pictures.</str>
>
> A] Now when i apply the following queries, i get 0 hits:
> 1. comment:iph*e
> 2. comment:iph?ne

What field type are you using? This is in your schema.xml.

> B] Can i apply range queries on part of the content?
>
> C] Can i apply more than one wildcard in a query? For example
> comment:ip*h* (this command works but it's equivalent to just using
> ipho*)

Yes.

> D] For fuzzy queries:
> content:iphone~0.7 returns both the documents.
> content:iphone~0.8 returns no documents (similarly for 0.9).

The fuzz factor there incorporates the edit distance. I gather the first
Sony doc has a match on "phone" and the score is between 0.7 and 0.8. You
can add debugQuery=true to see the explains.

> However if i change it to iPhone,
> content:iPhone~0.7 returns 0 documents
> content:iPhone~0.5 returns both the documents.
> Is fuzzy search case sensitive? Even if it is, why am i not able to
> retrieve expected results?

Again, this all comes back to how you analyze the documents, based on what
Field Type you are using.
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
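The fuzzy thresholds discussed above map to Levenshtein edit distance. A sketch of the arithmetic (the exact Lucene-era similarity formula is an assumption here, roughly 1 - distance / min(term lengths), so treat the boundary behavior at the threshold as approximate):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # edit in a
                           cur[j - 1] + 1,               # edit in b
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# "iphone" vs the indexed term "phone": one edit (drop the leading 'i').
dist = levenshtein("iphone", "phone")

# Approximate pre-4.x similarity: 1 - distance / min(query len, term len).
# 1 - 1/5 = 0.8, which is consistent with the thread's observation that a
# 0.7 threshold matches "phone" while 0.8 and above may not.
similarity = 1 - dist / min(len("iphone"), len("phone"))
print(dist, similarity)
```

The case-sensitivity observation is also consistent with fuzzy terms not being run through the field's analyzer, so "iPhone" is compared letter-for-letter against the lowercased indexed terms.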
Re: Solr Queries
Hi,

Sorry, I forgot to mention that the comment field is a text field.

Regards,
Raakhi

On Thu, Nov 12, 2009 at 8:05 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> [quoted message trimmed]