Re: Performance troubles with solr
Thank you all for your fast replies.

Changing photo_id:* to a boolean has_photo field, populated via a transformer when importing data, *fixed my problems*, reducing query times to *~30 ms*. I'll try to optimize further by following your advice on filter query usage and the int => tint change (I'll read up on it first).

On Thu, Sep 15, 2011 at 1:31 AM, Chris Hostetter wrote:
> photo_id:* does not mean what you probably think it means. you most
> likely want photo_id:[* TO *] given your current schema, but i would
> recommend adding a new "has_photo" boolean field and using that instead.
> [...]
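The fix described above can be pictured as a small pre-indexing step. The sketch below is only an illustration in Python, not the import transformer the poster actually used; the helper name add_has_photo and the dict-based document shape are assumptions made for the example.

```python
def add_has_photo(doc):
    """Return a copy of doc with a boolean has_photo field derived from photo_id.

    Hypothetical helper: a document is assumed to be a plain dict whose
    photo_id is missing, None, or an empty string when there is no photo.
    """
    enriched = dict(doc)
    enriched["has_photo"] = doc.get("photo_id") not in (None, "")
    return enriched


# Three example documents: with a photo, with a null photo_id, without the field.
docs = [{"id": 1, "photo_id": 42}, {"id": 2, "photo_id": None}, {"id": 3}]
indexed = [add_has_photo(d) for d in docs]
```

With such a field in place, the query can test has_photo:true as a plain term match instead of the wildcard photo_id:* clause.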
Re: Performance troubles with solr
: &q=photo_id:* AND gender:true AND country:MALAWI AND online:false

photo_id:* does not mean what you probably think it means. You most likely want photo_id:[* TO *] given your current schema, but I would recommend adding a new "has_photo" boolean field and using that instead.

That alone should explain a big part of why those queries would be slow.

You didn't describe how your "q" param varies in your test queries (just your fq). I'm assuming "gender" and "online" can vary, that you sometimes don't use the "photo_id" clause, and that the "country" clause can vary, but that these clauses are always all mandatory.

In which case I would suggest using "fq" for all of them individually, and leaving your "q" param as "*:*" (unless you sometimes sort on the actual Solr score, in which case leave it as whatever part of the query you actually want to contribute to the score).

Lastly: I don't remember off the top of my head how "int" and "tint" are defined in the example schema files, but you should consider your usage of them carefully -- particularly with the precisionStep and which fields you do range queries on.

-Hoss
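The suggestion above, one fq per mandatory clause with q left as *:*, might look roughly like this when building the request. This is a minimal sketch in Python that only assembles the parameters; the function name build_params and its arguments are assumptions, and has_photo is the new boolean field recommended above.

```python
from urllib.parse import urlencode


def build_params(gender, country, online, birth_range, has_photo=None):
    """Build Solr request parameters with every mandatory clause as its own fq."""
    fqs = [
        f"gender:{str(gender).lower()}",
        f"country:{country}",
        f"online:{str(online).lower()}",
        f"birth:{birth_range}",
    ]
    if has_photo is not None:
        # Only filter on photos when the caller asks for it.
        fqs.append(f"has_photo:{str(has_photo).lower()}")
    # q matches everything; the result set is narrowed by the filters alone.
    params = [("q", "*:*"), ("sort", "userscore desc"), ("start", "0"), ("rows", "150")]
    params += [("fq", fq) for fq in fqs]
    return params


params = build_params(
    True, "MALAWI", False, "[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]", has_photo=True
)
query_string = urlencode(params)  # repeated fq keys are kept as separate parameters
```

Because each clause is a separate fq, a filter such as gender:true can be cached and reused across queries that differ only in country or age range.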
RE: Performance troubles with solr
How about this:

Start with just what you had in your query (q) without the filter queries. Then add the fq's back in one at a time to see what is giving you problems -- leaving the birth filter query to the very last.

Others on the list more experienced with filter queries might have a more direct answer...

JRJ

-----Original Message-----
From: Yusuf Karakaya [mailto:karakaya...@gmail.com]
Sent: Wednesday, September 14, 2011 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance troubles with solr

I tried moving age query from filter query to normal query but nothing really changed. [...]
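The one-at-a-time approach suggested in this reply can be sketched as a list of query variants to time in turn. This is only an illustration in Python; the helper name incremental_variants is an assumption, and sending the requests and reading QTime is left out.

```python
def incremental_variants(q, fqs):
    """Yield (q, fq_subset) pairs: first q alone, then one more fq each time."""
    for n in range(len(fqs) + 1):
        yield q, fqs[:n]


q = "photo_id:* AND gender:true AND country:MALAWI AND online:false"
fqs = [
    "userscore:[500 TO *]",
    "lastlogin:[* TO NOW-6MONTHS/DAY]",
    "birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]",  # added last, as suggested
]
variants = list(incremental_variants(q, fqs))
```

Timing each variant separately shows which added filter is responsible for a jump in response time.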
Re: Performance troubles with solr
I tried moving the age query from the filter query to the normal query, but nothing really changed. But when I moved everything into the query itself (removing all filter queries), QTimes got much slower.

I don't have a problem with memory or CPU usage; my problem is query response times. When I send only one query, response times vary from 500 ms to 1000 ms (non-cached), which is too much. When I send a set of random queries (10-20 queries per second), response times go crazy (8 seconds to 60+ seconds).

On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT wrote:
> [...]
> Maybe keeping 16,384 filter queries around, particularly caching the ones
> with "random age ranges" is eating your memory up -- so perhaps try moving
> just that particular fq into q instead (since it is "random") and just cache
> the ones where the number of "options" is limited?
>
> What happens if you try your test without the filter queries? What happens
> if you put the additional criteria that are in your filter query into the
> query itself?
RE: Performance troubles with solr
I don't have enough experience with filter queries to advise well on when to use fq vs. putting it in the query itself, but I do know that we are not using filter queries, and with index sizes ranging from 7 Million to 27+ Million we have not seen this kind of issue.

Maybe keeping 16,384 filter queries around, particularly caching the ones with "random age ranges" is eating your memory up -- so perhaps try moving just that particular fq into q instead (since it is "random") and just cache the ones where the number of "options" is limited?

What happens if you try your test without the filter queries? What happens if you put the additional criteria that are in your filter query into the query itself?

JRJ

-----Original Message-----
From: Yusuf Karakaya [mailto:karakaya...@gmail.com]
Sent: Wednesday, September 14, 2011 9:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance troubles with solr

Thank you for your reply.
I tried to give most of the information i can but obviously i missed some.
[...]
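The memory concern raised in this reply can be checked with rough arithmetic. The sketch below is a worst-case estimate in Python, assuming every cached filter is held as a full bitset with one bit per document; Solr can use a smaller representation for filters that match few documents, so the real figure may be well below this. The function name is an assumption; the document count and cache size are the ones mentioned in the thread.

```python
def filter_cache_upper_bound_bytes(max_doc, cache_entries):
    """Worst-case filter cache size, assuming one full bitset per cached filter."""
    bytes_per_bitset = (max_doc + 7) // 8  # one bit per document, rounded up
    return bytes_per_bitset * cache_entries


# 2,000,000 documents and 16,384 cached filters, as discussed in the thread.
bound = filter_cache_upper_bound_bytes(2_000_000, 16_384)
bound_gib = bound / 2**30
```

Under that assumption a completely full cache would need about 4.1 GB (roughly 3.8 GiB), a sizable share of the 6000m heap, which is why a filter with many distinct values is a candidate for being kept out of the cache.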
Re: Performance troubles with solr
Thank you for your reply.
I tried to give as much information as I could, but obviously I missed some.

> 1. Just what does your "test script" do? Is it doing updates, or just
> queries of the sort you mentioned below?
The test script only sends random queries.

> 2. If the test script is doing updates, how are those updates being fed to
> Solr?
There are no updates right now, because I am still stuck on the performance problem.

> 3. What version of Solr are you running?
I'm using Solr 3.3.0.

> 4. Why did you increase the default for jetty (around 384m) to 6000m,
> particularly given your relatively modest number of documents (2,000,000).
I was trying everything before asking here.

> 5. Machine characteristics, particularly operating system and physical
> memory on the machine.
OS => Debian 6.0, Physical Memory => 32 GB, CPU => 2x Intel Quad Core

On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT wrote:
> I think folks are going to need a *lot* more information.
> [...]
RE: Performance troubles with solr
I think folks are going to need a *lot* more information. Particularly:

1. Just what does your "test script" do? Is it doing updates, or just queries of the sort you mentioned below?
2. If the test script is doing updates, how are those updates being fed to Solr?
3. What version of Solr are you running?
4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000)?
5. Machine characteristics, particularly operating system and physical memory on the machine.

Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional guidance in using the mailing list to get help.

-----Original Message-----
From: Yusuf Karakaya [mailto:karakaya...@gmail.com]
Sent: Wednesday, September 14, 2011 9:19 AM
To: solr-user@lucene.apache.org
Subject: Performance troubles with solr

Hi, i'm having performance troubles with solr. [...]
Performance troubles with solr
Hi, I'm having performance troubles with Solr. I don't know if I'm expecting too much from Solr or if I misconfigured it.

When I run a single query, its QTime is ~500-1000 ms (without any use of caches).
When I run my test script (with use of caches), QTime increases sharply, reaching ~8000 to ~60000 ms, and CPU usage also increases to ~550%.

My solr-start script:
java -Duser.timezone=EET -Xmx6000m -jar ./start.jar

~2,000,000 documents. Currently there aren't any commits, but in future there will be ~5,000 updates/additions to documents every ~3-5 min via delta import.

Search Query
sort=userscore+desc
&start=0
&q=photo_id:* AND gender:true AND country:MALAWI AND online:false
&fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY] ( Random age ranges )
&fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options, [* TO NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
&fq=userscore:[500 TO *] ( Only 2 options, [500 TO *] or [* TO 500] )
&rows=150

Schema
[field definitions not preserved in the archive]

Cache Sizes & Lazy Load
[three cache definitions, each with autowarmCount="4096"; other attributes not preserved in the archive]
true
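The filter combinations described in this message can be sketched as a small generator, which makes it easier to see that the birth filter takes many distinct values while the other two take only two each. This is a hypothetical Python stand-in for the poster's test script, which is not shown in the thread; the function name random_filters and the 17 to 80 age bounds are assumptions.

```python
import random

LASTLOGIN_OPTIONS = ["[* TO NOW-6MONTHS/DAY]", "[NOW-6MONTHS/DAY TO *]"]
USERSCORE_OPTIONS = ["[500 TO *]", "[* TO 500]"]


def random_filters(rng, min_age=17, max_age=80):
    """Return one random set of fq values shaped like the query in the message."""
    low = rng.randint(min_age, max_age - 1)   # youngest age in the range
    high = rng.randint(low + 1, max_age)      # oldest age, always above low
    return [
        f"birth:[NOW-{high}YEARS/DAY TO NOW-{low}YEARS/DAY]",
        f"lastlogin:{rng.choice(LASTLOGIN_OPTIONS)}",
        f"userscore:{rng.choice(USERSCORE_OPTIONS)}",
    ]


rng = random.Random(0)  # fixed seed so a test run can be repeated
sample = [random_filters(rng) for _ in range(5)]
```

Each generated list can be sent as three separate fq parameters alongside the q clause shown in the message.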