Re: Performance optimization of Proximity/Wildcard searches

2011-02-07 Thread Otis Gospodnetic
Hi,


Yes, assuming you didn't change the index files, say by optimizing the index, 
the hot portions of the index should remain in the OS cache unless something 
else kicked them out.

Re other thread - I don't think I have those messages any more.

Otis
---
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Re: Performance optimization of Proximity/Wildcard searches

2011-02-06 Thread Salman Akram
Only a couple of thousand documents are added daily, so the old OS cache should
still be useful since the old documents remain the same, right?

Also, can you please comment on my other thread related to Term Vectors?
Thanks!


Re: Performance optimization of Proximity/Wildcard searches

2011-02-05 Thread Salman Akram
Correct me if I am wrong.

A commit on the index flushes the SOLR cache, but of course the OS cache would
still be useful? If an index is updated every hour, then a warm-up that takes
less than 5 mins should be more than enough, right?

On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Salman,

 Warming up may be useful if your caches are getting decent hit ratios.
 Plus, you are warming up the OS cache when you warm up.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



-- 
Regards,

Salman Akram


Re: Performance optimization of Proximity/Wildcard searches

2011-02-05 Thread Salman Akram
All queries return the total count as well, and on average a query matches
10% of the total documents. The index I am talking about holds around 13
million documents, so around 1.3 million documents match on average. Of
course the result sets won't all overlap, so I am guessing that around 30-50%
of the documents match the daily queries.
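
As a quick check of the arithmetic above, here is a minimal sketch. The variable names are illustrative only, and the 30-50% range is the poster's own guess about the union of daily result sets, not a measurement.

```python
# Figures quoted in the message above; names are illustrative only.
total_docs = 13_000_000      # approximate number of documents in the index
avg_match_pct = 10           # an average query matches ~10% of documents

avg_matches_per_query = total_docs * avg_match_pct // 100
print(avg_matches_per_query)  # 1300000

# Guessed share of the index touched by all daily queries combined.
low = total_docs * 30 // 100
high = total_docs * 50 // 100
print(low, high)              # 3900000 6500000
```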

I searched a lot for a way to tell SOLR to stop searching after a certain
count. I don't mean the number of rows returned, but something like MySQL's
LIMIT, so that it doesn't have to spend time calculating the total count when
it is only returning a few rows to the UI. We are OK with showing the count as
1000+ (if it's more than 1000), but I couldn't find any way.
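
The capped-count idea can be sketched independently of Solr. The functions below are hypothetical and are not a Solr or Lucene API; they only illustrate, under that assumption, why stopping at a cap saves work when the UI is happy to show "1000+".

```python
from itertools import islice
from typing import Iterable, Tuple

def capped_count(matches: Iterable[int], cap: int = 1000) -> Tuple[int, bool]:
    """Count matches, but stop consuming the stream after cap + 1 items.

    Returns (count, exceeded); count never exceeds cap.
    """
    seen = sum(1 for _ in islice(matches, cap + 1))
    exceeded = seen > cap
    return (cap if exceeded else seen), exceeded

def format_count(count: int, exceeded: bool) -> str:
    """Render the count the way the UI would show it."""
    return f"{count}+" if exceeded else str(count)

# Stand-in for a lazy stream of matching document ids.
many = (doc_id for doc_id in range(1_300_000))
print(format_count(*capped_count(many)))       # 1000+
print(format_count(*capped_count(range(42))))  # 42
```

Only 1001 items of the large stream are consumed here; whether a search engine can stop early in the same way depends on how it collects and ranks hits.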

On Sat, Feb 5, 2011 at 7:45 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Heh, I'm not sure if this is valid thinking. :)

 By *matching* doc distribution I meant: what proportion of your millions of
 documents actually ever get matched, and then how many of those make it to
 the UI.
 If you have 1000 queries in a day and they all end up matching only 3 of
 your docs, the system will need less RAM than a system where 1000 queries
 match 5 different docs.
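
One rough way to measure the distribution described above is to take the union of matched document ids across a day's queries. This is only a sketch: it assumes you can collect the matched ids per query (for example by replaying logged queries), and the function name and input shape are illustrative.

```python
from typing import Dict, Set

def matched_fraction(results: Dict[str, Set[int]], total_docs: int) -> float:
    """Fraction of the index matched by at least one query."""
    touched: Set[int] = set()
    for doc_ids in results.values():
        touched |= doc_ids
    return len(touched) / total_docs

# Toy data: two workloads with the same number of queries.
concentrated = {"q1": {1, 2, 3}, "q2": {1, 2, 3}, "q3": {2, 3}}
spread = {"q1": {1, 2, 3}, "q2": {4, 5, 6}, "q3": {7, 8}}

print(matched_fraction(concentrated, total_docs=10))  # 0.3
print(matched_fraction(spread, total_docs=10))        # 0.8
```

The workload with the higher fraction touches more of the index, so more of it has to stay cached for queries to be fast.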

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



-- 
Regards,

Salman Akram


Re: Performance optimization of Proximity/Wildcard searches

2011-02-05 Thread Otis Gospodnetic
Yes, OS cache mostly remains (obviously index files that are no longer around 
are going to remain in the OS cache for a while, but will be useless and 
gradually replaced by new index files).
How long warmup takes is not relevant here; what matters is which queries you 
use to warm up the index and how much you auto-warm the caches.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Salman,

I only skimmed your email, but wanted to say that this part sounds a little 
suspicious:

 Our warm up script currently  executes all distinct queries in our logs
 having count  5. It was run  yesterday (with all the indexing update every

It sounds like this will make warmup take a long time, assuming you have 
more than a handful of distinct queries in your logs.
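
One way to bound warm-up time is to warm with only the N most frequent queries instead of every distinct query above a count threshold. The sketch below assumes a query log with one query string per line; the function name and log format are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

def pick_warmup_queries(log_lines: Iterable[str], top_n: int = 50) -> List[str]:
    """Return the top_n most frequent distinct queries, most frequent first."""
    counts = Counter(line.strip() for line in log_lines if line.strip())
    return [query for query, _ in counts.most_common(top_n)]

# Toy log; real logs would be read from a file.
sample_log = [
    'title:"new york" ', "body:solr*", 'title:"new york"',
    '"cache warming"~5', "body:solr*", 'title:"new york"',
]
print(pick_warmup_queries(sample_log, top_n=2))
# ['title:"new york"', 'body:solr*']
```

Capping at a fixed N keeps warm-up time roughly constant as the logs grow, which a count threshold alone does not.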

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Salman Akram salman.ak...@northbaysolutions.net
 To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk
 Sent: Tue, January 25, 2011 6:32:48 AM
 Subject: Re: Performance optimization of Proximity/Wildcard searches
 
 By warmed index, do you mean only warming the SOLR cache, or the OS cache
 too? As I said, our index is updated every hour, so I am not sure how much the
 SOLR cache would help, but the OS cache should still be helpful, right?
 
 I haven't compared the results with a proper script, but from manual testing
 here are some observations.
 
 'Recent' queries which are in the cache of course return immediately (only if
 they are exactly the same, even if they took 3-4 mins the first time). I will
 need to test how many recent queries stay in the cache, but still this would
 work only for very common queries. Users can run different queries and I want
 those to be at least at an 'acceptable' level (5-10 secs), even if not very
 fast.
 
 Our warm up script currently executes all distinct queries in our logs
 having count  5. It was run yesterday (with the index being updated every
 hour after that) and today when I executed some of the same queries again
 their time seemed a little less (around 15-20%). I am not sure if this means
 anything. However, their time is still not acceptable.
 
 What do you think is the best way to compare results? First run all the
 warm-up queries and then execute the same ones in random order and compare?
 
 We are using a Windows server; would it make a big difference if we moved to
 Linux? Our load is not high but some queries are really complex.
 
 Also, I was hoping to move to SSD last, after trying out all software
 options. Is it an agreed fact that on large indexes (which don't fit in
 RAM) proximity/wildcard/phrase queries (on common words) will be slow, and
 that this can only be improved by cache warm-up and better hardware?
 Otherwise, with an index of around 150GB, will such queries take more than a
 minute?
 
 If that's the case, I know this question is very subjective, but if a single
 query takes 2 min on SAS 10K RPM, what would its approximate time be on a
 good SSD (everything else the same)?
 
 Thanks!
 
 
 On Tue, Jan 25, 2011 at 3:44 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 
   On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote:
   Cache warming is a good option too, but the index gets updated every hour,
   so I am not sure how much that would help.
 
  What is the time difference between queries with a warmed index and a
  cold one? If the warmed index performs satisfactorily, then one answer is
  to upgrade your underlying storage. As always for IO-caused performance
  problems in Lucene/Solr-land, SSD is the answer.
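
The cold-versus-warm comparison asked about above can be put into numbers by timing the same queries twice. This is only a sketch: `run_query` is a placeholder for whatever sends a query to the search server (nothing here is a Solr API), the queries are assumed to be distinct strings, and the first pass is only truly "cold" if the server and OS caches were empty beforehand.

```python
import random
import time
from statistics import median
from typing import Callable, Dict, List, Tuple

def time_queries(run_query: Callable[[str], object], queries: List[str]) -> Dict[str, float]:
    """Run each query once and return elapsed seconds per query."""
    timings: Dict[str, float] = {}
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        timings[query] = time.perf_counter() - start
    return timings

def compare_cold_warm(run_query: Callable[[str], object], queries: List[str]) -> Tuple[float, float]:
    """Return (median first-pass time, median second-pass time) in seconds."""
    first = time_queries(run_query, queries)
    shuffled = random.sample(queries, len(queries))  # second pass in random order
    second = time_queries(run_query, shuffled)
    return median(first.values()), median(second.values())

# Placeholder standing in for a real search call.
calls: List[str] = []
def fake_run_query(query: str) -> None:
    calls.append(query)

cold_s, warm_s = compare_cold_warm(fake_run_query, ['"a b"~3', "solr*"])
print(f"median first pass: {cold_s:.6f}s, median second pass: {warm_s:.6f}s")
```

Medians are used so that a single multi-minute outlier does not dominate the comparison.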
 
 
 
 
 -- 
 Regards,
 
 Salman Akram
 


Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Hi,


 Sharding is an option too, but it comes with its own limitations, so I want to
 keep that as a last resort. I think there must be other things to try, because
 150GB is not too big for one drive/server with 32GB RAM.

Hmm what makes you think 32 GB is enough for your 150 GB index?
It depends on queries and distribution of matching documents, for example.  
What's yours like?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Salman Akram salman.ak...@northbaysolutions.net
 To: solr-user@lucene.apache.org
 Sent: Tue, January 25, 2011 4:20:34 AM
 Subject: Performance optimization of Proximity/Wildcard searches
 
 Hi,
 
 I am facing performance issues with three types of queries (and their
 combinations). Some of the queries take more than 2-3 mins. The index size is
 around 150GB.
 
 - Wildcard
 - Proximity
 - Phrases (with common words)
 
 I know CommonGrams and stop words are a good way to resolve such issues, but
 they don't fulfill our functional requirements (CommonGrams seem to have
 issues with phrase proximity, stop words have issues with exact match, etc.).
 
 Sharding is an option too, but it comes with its own limitations, so I want to
 keep that as a last resort. I think there must be other things to try, because
 150GB is not too big for one drive/server with 32GB RAM.
 
 Cache warming is a good option too, but the index gets updated every hour, so
 I am not sure how much that would help.
 
 What are the other main tips that can help with performance optimization of
 the above queries?
 
 Thanks
 
 -- 
 Regards,
 
 Salman Akram
 


Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Salman Akram
I know, so we are not really using it for regular warm-ups (in any case the
index is updated on an hourly basis). I just tried it a few times to compare
results. The issue is that I am not even sure if warming up is useful with
such regular updates.



-- 
Regards,

Salman Akram


Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Salman Akram
Well, I assume many people out there have indexes larger than 100GB, and
I don't think you would normally have more than 32GB or 64GB of RAM!

As I mentioned, the queries are mostly phrase, proximity, wildcard and
combinations of these.

What exactly do you mean by distribution of documents? On this index our
documents are no more than a few hundred KB on average (file system size)
and there are around 14 million documents. 80% of the index size is taken up
by the position file. I am not sure if this is what you asked?
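
To put rough numbers on the figures in this thread (a 150GB index, 32GB of
RAM, about 80% of the index in the position file), here is a small
back-of-the-envelope sketch. The function name is made up for illustration,
and it assumes, optimistically, that all RAM is available to the OS cache; in
practice the JVM heap and other processes take a share, so the real fraction
is lower.

```python
def cache_coverage(index_gb, ram_gb, positions_share):
    """Rough upper bound on how much of an index the OS cache could hold."""
    # Size of the position data, which phrase and proximity queries must read.
    positions_gb = index_gb * positions_share
    # Assumption: every byte of RAM is usable as OS cache (it is not, really).
    max_cached_fraction = min(1.0, ram_gb / index_gb)
    return {
        "positions_gb": positions_gb,
        "max_cached_fraction": max_cached_fraction,
    }


if __name__ == "__main__":
    print(cache_coverage(index_gb=150, ram_gb=32, positions_share=0.8))
```

With these inputs the position data alone is about 120GB, and at most roughly
a fifth of the index can sit in the OS cache at any one time.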

On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Hi,


  Sharding is an  option too but that too comes with limitations so want to
  keep that as a last  resort but I think there must be other things coz
 150GB
  is not too big for  one drive/server with 32GB Ram.

 Hmm what makes you think 32 GB is enough for your 150 GB index?
 It depends on queries and distribution of matching documents, for example.
 What's yours like?

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 




-- 
Regards,

Salman Akram


Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Salman,

Warming up may be useful if your caches are getting decent hit ratios. Plus,
you are warming up the OS cache when you warm up.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Salman Akram salman.ak...@northbaysolutions.net
 To: solr-user@lucene.apache.org
 Sent: Fri, February 4, 2011 3:33:41 PM
 Subject: Re: Performance optimization of Proximity/Wildcard searches
 
 I know so we are not really using it for regular warm-ups (in any case  index
 is updated on hourly basis). Just tried few times to compare results.  The
 issue is I am not even sure if warming up is useful for such  regular
 updates.
 
 
 
 On Fri, Feb 4, 2011 at 5:16 PM, Otis  Gospodnetic otis_gospodne...@yahoo.com
   wrote:
 
  Salman,
 
  I only skimmed your email, but wanted  to say that this part sounds a little
  suspicious:
 
Our warm up script currently  executes all distinct queries in our  logs
   having count  5. It was run  yesterday (with all the  indexing update
  every
 
  It sounds like this will make  warmup take a long time, assuming you
  have
  more than a  handful distinct queries in your logs.
 
  Otis
  
   Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem  search :: http://search-lucene.com/
 
 
 
  
 
 
 
 
 -- 
 Regards,
 
 Salman Akram
 


Re: Performance optimization of Proximity/Wildcard searches

2011-02-04 Thread Otis Gospodnetic
Heh, I'm not sure if this is valid thinking. :)

By *matching* doc distribution I meant: what proportion of your millions of
documents actually ever get matched and then how many of those make it to the
UI.
If you have 1000 queries in a day and they all end up matching only 3 of your
docs, the system will need less RAM than a system where 1000 queries match
5 different docs.
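
One way to estimate that proportion is sketched below. This is only an
illustration: it assumes you can collect the matched document IDs per query
(from logs or by replaying queries), and the function name is hypothetical.

```python
def matched_fraction(matches_per_query, total_docs):
    """Fraction of the corpus that is matched by at least one query.

    matches_per_query: iterable of iterables of document IDs, one per query.
    total_docs: number of documents in the index.
    """
    touched = set()
    for doc_ids in matches_per_query:
        touched.update(doc_ids)
    return len(touched) / total_docs


if __name__ == "__main__":
    # Three queries that keep hitting the same few documents.
    sample = [[1, 2, 3], [2, 3], [3]]
    print(matched_fraction(sample, total_docs=10))
```

A small fraction suggests that a small hot set of index data serves most
queries; a large one suggests the working set is close to the whole index.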

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Salman Akram salman.ak...@northbaysolutions.net
 To: solr-user@lucene.apache.org
 Sent: Fri, February 4, 2011 3:38:55 PM
 Subject: Re: Performance optimization of Proximity/Wildcard searches
 
 Well I assume many people out there would have indexes larger than 100GB  and
 I don't think so normally you will have more RAM than 32GB or  64!
 
 As I mentioned the queries are mostly phrase, proximity, wildcard  and
 combination of these.
 
 What exactly do you mean by distribution of  documents? On this index our
 documents are not more than few hundred KB's on  average (file system size)
 and there are around 14 million documents. 80% of  the index size is taken up
 by position file. I am not sure if this is what  you asked?
 
 On Fri, Feb 4, 2011 at 5:19 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
   wrote:
 
  Hi,
 
 
   Sharding is an  option  too but that too comes with limitations so want to
   keep that as a  last  resort but I think there must be other things coz
   150GB
   is not too big for  one drive/server with 32GB  Ram.
 
  Hmm what makes you think 32 GB is enough for your 150  GB index?
  It depends on queries and distribution of matching documents,  for example.
  What's yours like?
 
  Otis
   
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem  search :: http://search-lucene.com/
 
 
 
   
 
 
 
 
 -- 
 Regards,
 
 Salman Akram
 


Performance optimization of Proximity/Wildcard searches

2011-01-25 Thread Salman Akram
Hi,

I am facing performance issues in three types of queries (and their
combination). Some of the queries take more than 2-3 mins. Index size is
around 150GB.


   - Wildcard
   - Proximity
   - Phrases (with common words)

I know CommonGrams and stop words are a good way to resolve such issues, but
they don't fulfill our functional requirements (CommonGrams seem to have
issues with phrase proximity, stop words have issues with exact match etc).

Sharding is an option too, but it comes with limitations, so I want to
keep that as a last resort. I think there must be other things to try, because
150GB is not too big for one drive/server with 32GB RAM.

Cache warming is a good option too, but the index gets updated every hour, so
I am not sure how much that would help.

What are the other main tips that can help in performance optimization of
the above queries?

Thanks

-- 
Regards,

Salman Akram


Re: Performance optimization of Proximity/Wildcard searches

2011-01-25 Thread Toke Eskildsen
On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote:
 Cache warming is a good option too but the index get updated every hour so
 not sure how much would that help.

What is the time difference between queries with a warmed index and a
cold one? If the warmed index performs satisfactorily, then one answer is
to upgrade your underlying storage. As always for IO-caused performance
problems in Lucene/Solr-land, SSD is the answer.



Re: Performance optimization of Proximity/Wildcard searches

2011-01-25 Thread Salman Akram
By warmed index, do you mean warming only the SOLR cache, or the OS cache
too? As I said, our index is updated every hour, so I am not sure how much the
SOLR cache would help, but the OS cache should still be helpful, right?

I haven't compared the results with a proper script, but from manual testing
here are some observations.

'Recent' queries which are in the cache of course return immediately (only if
they are exactly the same, even if they took 3-4 mins the first time). I will
need to test how many recent queries stay in the cache, but this would still
only work for very common queries. Users can run different queries, and I want
those to at least be at an 'acceptable' level (5-10 secs) even if not very fast.

Our warm-up script currently executes all distinct queries in our logs
having count > 5. It was run yesterday (with the index being updated every
hour after that), and today when I executed some of the same queries again
their time seemed a little lower (around 15-20%). I am not sure if this means
anything. However, their time is still not acceptable.
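
For reference, the selection step of such a warm-up script might look roughly
like the sketch below. It is not our actual script: the function name, the
`limit` parameter and the assumption that the log has already been reduced to
one query string per entry are all made up for illustration.

```python
from collections import Counter


def select_warmup_queries(logged_queries, min_count=5, limit=None):
    """Return distinct queries seen more than min_count times.

    Queries are ordered most frequent first. `limit` optionally caps the
    list so that a warm-up run does not grow without bound.
    """
    counts = Counter(q.strip() for q in logged_queries if q.strip())
    selected = [q for q, n in counts.most_common() if n > min_count]
    return selected if limit is None else selected[:limit]


if __name__ == "__main__":
    log = ['"of the" ~5'] * 7 + ["foo*"] * 6 + ["rare term"] * 2
    print(select_warmup_queries(log))
```

Capping the list with `limit` is one way to keep the warm-up run short when
the logs contain many distinct queries.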

What do you think is the best way to compare results? First run all the
warm-up queries, and then execute the same ones in random order and compare?
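
Something like the following sketch is what I have in mind for the comparison.
`run_query` is a placeholder for whatever actually sends a query to the server
(it is not a real Solr client call), and the other names are made up as well.

```python
import random
import statistics
import time


def time_queries(run_query, queries, seed=0):
    """Run each query once, in shuffled order, and return {query: seconds}."""
    order = list(queries)
    random.Random(seed).shuffle(order)  # fixed seed keeps runs comparable
    timings = {}
    for q in order:
        start = time.perf_counter()
        run_query(q)
        timings[q] = time.perf_counter() - start
    return timings


def compare(cold, warm):
    """Summarise two timing dicts over the queries they have in common."""
    shared = sorted(set(cold) & set(warm))
    speedups = [cold[q] / warm[q] for q in shared if warm[q] > 0]
    return {
        "queries": len(shared),
        "median_cold_s": statistics.median(cold[q] for q in shared),
        "median_warm_s": statistics.median(warm[q] for q in shared),
        "median_speedup": statistics.median(speedups),
    }


if __name__ == "__main__":
    fake = lambda q: time.sleep(0.01)  # stand-in for a real search request
    t = time_queries(fake, ["a", "b", "c"])
    print(compare(t, t))
```

The idea would be to call `time_queries` once before warming and once after,
then pass both results to `compare`. Since exactly repeated queries come
straight from the cache, the timed set should probably also include queries
that are not in the warm-up list.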

We are using Windows Server; would it make a big difference if we moved to
Linux? Our load is not high, but some queries are really complex.

Also, I was hoping to move to SSD last, after trying out all software
options. Is it an agreed fact that on large indexes (which don't fit in
RAM) proximity/wildcard/phrase queries (on common words) will be slow, and
that this can only be improved by cache warm-up and better hardware? Otherwise,
with an index of around 150GB, such queries will take more than a minute?

If that's the case, I know this question is very subjective, but if a single
query takes 2 min on SAS 10K RPM, what would its approximate time be on a good
SSD (everything else the same)?

Thanks!


On Tue, Jan 25, 2011 at 3:44 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

 On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote:
  Cache warming is a good option too but the index get updated every hour so
  not sure how much would that help.

 What is the time difference between queries with a warmed index and a
 cold one? If the warmed index performs satisfactory, then one answer is
 to upgrade your underlying storage. As always for IO-caused performance
 problem in Lucene/Solr-land, SSD is the answer.




-- 
Regards,

Salman Akram