Re: Performance troubles with solr

2011-09-15 Thread Yusuf Karakaya
Thank you all for your fast replies,
Replacing the photo_id:* clause with a boolean has_photo field, populated by a
transformer at import time, *fixed my problems* and reduced query times to
about *30 ms*.
I'll try to optimize further by following your advice on filter query usage
and on the int => tint change (I'll read up on it first).
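
For readers who want to see the idea in code, here is a minimal sketch of the
import-time step described above. It is not the DataImportHandler transformer
API; the function name add_has_photo and the sample rows are made up for
illustration, and it only assumes that a missing or empty photo_id means
"no photo".

```python
def add_has_photo(row):
    """Return a copy of an import row with a boolean has_photo field.

    Hypothetical helper: a row is a plain dict, and a missing, None, or
    empty photo_id is treated as "no photo".
    """
    out = dict(row)
    photo_id = out.get("photo_id")
    out["has_photo"] = photo_id is not None and str(photo_id).strip() != ""
    return out


# Made-up sample rows, only to show the three cases.
rows = [{"id": 1, "photo_id": 42}, {"id": 2, "photo_id": None}, {"id": 3}]
converted = [add_has_photo(r) for r in rows]
```

The query can then test has_photo:true, a single indexed term, instead of
photo_id:*.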




Re: Performance troubles with solr

2011-09-14 Thread Chris Hostetter

: &q=photo_id:* AND gender:true AND country:MALAWI AND online:false

photo_id:* does not mean what you probably think it means.  you most 
likely want photo_id:[* TO *] given your current schema, but i would 
recommend adding a new "has_photo" boolean field and using that instead.

that alone should explain a big part of why those queries would be slow.

you didn't describe how your "q" param varies in your test queries (just 
your fq).  I'm assuming "gender" and "online" can vary, and that you 
sometimes don't use the "photo_id" clauses, and that the "country" clause 
can vary, but that these clauses are always all mandatory.

in which case i would suggest using "fq" for all of them individually, and
leaving your "q" param as "*:*" (unless you sometimes sort on the actual
solr score, in which case leave it as whatever part of the query you
actually want to contribute to the score)
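
A rough sketch of what that request could look like, using only the Python
standard library to build the query string. The helper name build_params is
made up, and has_photo is assumed to be the new boolean field suggested
above; sort, start and rows are taken from the original post.

```python
from urllib.parse import urlencode


def build_params(filters, sort="userscore desc", rows=150):
    """Build Solr request parameters with q=*:* and one fq per clause."""
    params = [("q", "*:*")]
    # Each mandatory clause becomes its own fq parameter.
    params += [("fq", f"{field}:{value}") for field, value in filters]
    params += [("sort", sort), ("start", "0"), ("rows", str(rows))]
    return params


filters = [
    ("has_photo", "true"),  # assumed boolean field replacing photo_id:*
    ("gender", "true"),
    ("country", "MALAWI"),
    ("online", "false"),
]
params = build_params(filters)
query_string = urlencode(params)  # repeated fq keys are kept in order
```

Because each fq is sent separately, each clause can be matched against the
filter cache on its own, whatever the other clauses are.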

Lastly: I don't remember off the top of my head how "int" and "tint" are
defined in the example schema.xml, but you should consider your
usage of them carefully -- particularly with the precisionStep and which
fields you do range queries on.



-Hoss


RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
How about this:  Start with just what you had in your query (q) without the 
filter queries.  Then add the fq's back in one at a time to see what is giving 
you problems -- leaving the birth filter query to the very last.
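
One way to organize that experiment, as a sketch only: generate the parameter
sets step by step, then send each one and compare the QTime values. The
function name incremental_param_sets is made up; the query and filter strings
are copied from the original post, with the birth filter placed last.

```python
BASE_Q = "photo_id:* AND gender:true AND country:MALAWI AND online:false"
FILTERS = [
    "lastlogin:[* TO NOW-6MONTHS/DAY]",
    "userscore:[500 TO *]",
    "birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]",  # birth filter goes last
]


def incremental_param_sets(q, filters):
    """Yield one parameter list per step: q alone, then q plus 1..n filters."""
    for n in range(len(filters) + 1):
        yield [("q", q)] + [("fq", fq) for fq in filters[:n]]


steps = list(incremental_param_sets(BASE_Q, FILTERS))
```

The step at which the timing jumps points at the filter that was added in
that step.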

Others on the list more experienced with filter queries might have a more 
direct answer...

JRJ



Re: Performance troubles with solr

2011-09-14 Thread Yusuf Karakaya
I tried moving the age query from a filter query into the main query, but
nothing really changed.
But when I moved everything into the query itself (removing all filter
queries), QTimes got much slower.
I don't have a problem with memory or CPU usage; my problem is query response
times.
When I send only one query, response times vary from 500 ms to 1000 ms
(non-cached), which is too much.
When I send a set of random queries (10-20 queries per second), response
times go crazy (8 seconds to 60+ seconds).



RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
I don't have enough experience with filter queries to advise well on when to 
use fq vs. putting it in the query itself, but I do know that we are not using 
filter queries, and with index sizes ranging from 7 Million to 27+ Million we 
have not seen this kind of issue.

Maybe keeping 16,384 filter queries around, particularly caching the ones with 
"random age ranges" is eating your memory up -- so perhaps try moving just that 
particular fq into q instead (since it is "random") and just cache the ones 
where the number of "options" is limited?
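
A sketch of that split, assuming the field names from the original post. The
helper names split_filters and build_params are made up: filters on fields
with only a few possible values stay in fq, and the random birth range is
appended to q.

```python
def split_filters(filters, cacheable_fields):
    """Separate filters worth caching (few distinct values) from the rest."""
    cached = [f"{f}:{rng}" for f, rng in filters if f in cacheable_fields]
    uncached = [f"{f}:{rng}" for f, rng in filters if f not in cacheable_fields]
    return cached, uncached


def build_params(base_q, filters, cacheable_fields):
    """Put low-cardinality filters in fq and AND the rest into q."""
    cached, uncached = split_filters(filters, cacheable_fields)
    q = " AND ".join([f"({base_q})"] + uncached)
    return [("q", q)] + [("fq", fq) for fq in cached]


filters = [
    ("birth", "[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]"),  # random per request
    ("lastlogin", "[* TO NOW-6MONTHS/DAY]"),            # only 2 options
    ("userscore", "[500 TO *]"),                        # only 2 options
]
params = build_params(
    "photo_id:* AND gender:true AND country:MALAWI AND online:false",
    filters,
    cacheable_fields={"lastlogin", "userscore"},
)
```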

What happens if you try your test without the filter queries?  What happens if 
you put the additional criteria that are in your filter query into the query 
itself?

JRJ



Re: Performance troubles with solr

2011-09-14 Thread Yusuf Karakaya
Thank you for your reply.
I tried to give as much information as I could, but obviously I missed some.

> 1.  Just what does your "test script" do?   Is it doing updates, or just
> queries of the sort you mentioned below?
The test script only sends random queries.

> 2.  If the test script is doing updates, how are those updates being fed to
> Solr?
There are no updates right now, because I have not gotten past the
performance problem yet.

> 3.  What version of Solr are you running?
I'm using Solr 3.3.0.

> 4.  Why did you increase the default for jetty (around 384m) to 6000m,
> particularly given your relatively modest number of documents (2,000,000)?
I was trying everything I could before asking here.

> 5.  Machine characteristics, particularly operating system and physical
> memory on the machine.
OS => Debian 6.0,  Physical Memory => 32 GB, CPU => 2x Intel Quad Core



RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
I think folks are going to need a *lot* more information.  Particularly

1.  Just what does your "test script" do?   Is it doing updates, or just 
queries of the sort you mentioned below?  
2.  If the test script is doing updates, how are those updates being fed to 
Solr?  
3.  What version of Solr are you running?
4.  Why did you increase the default for jetty (around 384m) to 6000m,
particularly given your relatively modest number of documents (2,000,000)?
5.  Machine characteristics, particularly operating system and physical memory 
on the machine.

Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional 
guidance in using the mailing list to get help.



Performance troubles with solr

2011-09-14 Thread Yusuf Karakaya
Hi, I'm having performance troubles with Solr. I don't know if I'm expecting
too much from Solr or if I have misconfigured it.
When I run a single query, its QTime is about 500-1000 ms (without any use of
caches).
When I run my test script (with use of caches), QTime increases
exponentially, reaching roughly 8,000 ms to 60,000+ ms, and CPU usage also
increases to about 550%.

My solr-start script:
java -Duser.timezone=EET -Xmx6000m -jar ./start.jar

About 2,000,000 documents. Currently there aren't any commits, but in the
future there will be about 5,000 updates/additions to documents every 3-5
minutes via delta import.

Search Query
sort=userscore+desc
&start=0
&q=photo_id:* AND gender:true AND country:MALAWI AND online:false
&fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]  ( Random age ranges )
&fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options,   [* TO
NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
&fq=userscore:[500 TO *]  ( Only 2 options, [500 TO *] or [* TO 500] )
&rows=150

Schema













Cache Sizes & Lazy Load




true