TrieDateField, precisionStep impact on sorting performance

2014-07-16 Thread Kuehn, Dennis
Hello,

I'd like to sort on a TrieDateField which currently has a precisionStep value 
of 6.
From what I got so far, the precisionStep value only affects range query 
performance and index size.

However, the documentation for TrieDateField says:
'precisionStep=0 enables efficient date sorting and minimizes index size; 
precisionStep=8 (the default) enables efficient range queries.'

Does this mean sorting performance will suffer for precisionStep values other 
than 0?

Cheers,
Dennis


Re: TrieDateField, precisionStep impact on sorting performance

2014-07-16 Thread Yonik Seeley
On Wed, Jul 16, 2014 at 5:51 AM, Kuehn, Dennis
dennis.ku...@brands4friends.de wrote:
> I'd like to sort on a TrieDateField which currently has a precisionStep value
> of 6.
> From what I got so far, the precisionStep value only affects range query
> performance and index size.
>
> However, the documentation for TrieDateField says:
> 'precisionStep=0 enables efficient date sorting and minimizes index size;
> precisionStep=8 (the default) enables efficient range queries.'
>
> Does this mean sorting performance will suffer for precisionStep values other
> than 0?

No, sorting speed is unaffected by precisionStep.  That comment looks
slightly misleading.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: TrieDateField, precisionStep impact on sorting performance

2014-07-16 Thread Kuehn, Dennis
Thanks for clarifying!

Dennis






Re: Sorting performance

2012-06-08 Thread Dmitry Kan
Hi,

This may help you get started:

https://issues.apache.org/jira/browse/SOLR-1297

Dmitry





-- 
Regards,

Dmitry Kan


Sorting performance

2012-06-04 Thread Gau
Here is the use case:
I am using synonym expansion at query time to get results. This is
essentially a name search, so a search for Jim may be expanded at query time
to James, Jung, Jimmy, etc.

So ranking factors like TF, IDF, and norms do not mean anything to me. I just
reset them to zero, so all the results I get have the same rank. I
have used a copy field to boost the weight of exact matches, so Jim would be
boosted to the top.

However, I want the other results like Jimmy, Jung, James to be sorted by
Levenshtein distance with respect to the word Jim (the original query). The
number of results returned is quite large, so a general strdist sort takes
6-7 seconds. Is there any other option than applying a sort= in the query to
achieve the same functionality? Any particular way to index the data to
achieve the same result? Any idea to boost the performance and get the
intended functionality?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-performance-tp3987632.html
Sent from the Solr - User mailing list archive at Nabble.com.
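
The ordering asked for above can be pictured as: compute the edit distance between the original query term and each expanded name, then sort by it. The sketch below does this on the client side in plain Python. It is not Solr's strdist implementation and it is not a solution proposed in the thread; the names `levenshtein` and `rank_by_distance` are made up for illustration. Whether moving the sort out of the query would actually beat the reported 6-7 seconds would have to be measured.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_by_distance(query: str, names: list[str]) -> list[str]:
    """Sort names by edit distance to the query, ties broken alphabetically."""
    q = query.lower()
    return sorted(names, key=lambda n: (levenshtein(q, n.lower()), n))


if __name__ == "__main__":
    print(rank_by_distance("Jim", ["James", "Jung", "Jimmy", "Jim"]))
```

The exact match sorts first because its distance is 0, which mirrors the copy-field boost described in the message.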




Re: Sorting performance + replication of index between cores

2009-09-03 Thread Sreeram Vaidyanathan

Did you guys find a solution?
I am having a similar issue.

Setup:
One indexer box & 2 searcher boxes, each having 6 different Solr cores.
We have a lot of updates (in the range of a couple thousand items every few
mins).
The Snappuller/Snapinstaller pulls and commits every 5 mins.

Query response time peaks at 60+ seconds when a new searcher is being
prepared.
I have disabled the caches (filter, query & document).

We have a strict requirement of response time < 10 secs at all times.

Thanks
Sreeram



-- 
View this message in context: 
http://www.nabble.com/Sorting-performance-tp20037712p25286018.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sorting performance + replication of index between cores

2009-04-17 Thread sunnyfr

Hi Christophe, 

Did you find a way to fix your problem? Even with replication you will
still have this problem: a lot of updates means the caches get cleared, and
you have to manage that.
I have the same issue. I'm just wondering whether I should turn off servers
during updates?
How did you fix that?

Thanks,
sunny



-- 
View this message in context: 
http://www.nabble.com/Sorting-performance-tp20037712p23094174.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sorting performance + replication of index between cores

2008-10-27 Thread christophe

Hi,

After fully reloading my index, sorting on a field other than a Date does not
help that much.

Using a warmup query avoids having the first request slow, but:
- Frequent commits mean that the Searcher is reloaded frequently
and, as the warmup takes time, the clients must wait.
- Having warmup slows down the indexing process (I guess this is
because after a commit, the Searchers are recreated)


So I'm considering, as suggested,  to have two instances: one for 
indexing and one for searching.
I was wondering if there are simple ways to replicate the index in a 
single Solr server running two cores ? Any such config already tested ? 
I guess that the standard replication based on rsync can be simplified a 
lot in this case as the two indexes are on the same server.


Thanks
Christophe



Re: Sorting performance

2008-10-23 Thread christophe

Hi,

I'm now reloading my index.
The issue might be related to the way dates are handled (I was sorting
on a date field).
Now, I have added an integer field that represents the date (but in
minutes instead of milliseconds).
With 4M documents (and indexing running in the background), I get an
acceptable response time, even for the first query. I still want to check
with 10M and more documents.
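
The workaround described here can be sketched as follows. This only illustrates the conversion, assuming timestamps are reduced to whole minutes since the Unix epoch before indexing; the function name `to_epoch_minutes` is hypothetical, and the message does not say how the integer was actually produced. Minute resolution yields far fewer distinct values than millisecond resolution, which is presumably why the sort cache loads faster, and the result fits in a signed 32-bit integer.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_epoch_minutes(dt: datetime) -> int:
    """Whole minutes since 1970-01-01T00:00Z; seconds and below are dropped."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    minutes = (dt - EPOCH) // timedelta(minutes=1)
    if not INT32_MIN <= minutes <= INT32_MAX:
        raise ValueError("value does not fit in a 32-bit integer field")
    return minutes


if __name__ == "__main__":
    a = datetime(2008, 10, 23, 12, 0, 5, tzinfo=timezone.utc)
    b = datetime(2008, 10, 23, 12, 0, 59, tzinfo=timezone.utc)
    # Both fall in the same minute, so they map to the same sort value.
    print(to_epoch_minutes(a) == to_epoch_minutes(b))
```

The trade-off is that documents created within the same minute become indistinguishable to the sort.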


Once my index is fully loaded, I will try the config parameters you suggest.

Thanks
Christophe



RE: Sorting performance

2008-10-22 Thread Beniamin Janicki
:so you can send your updates anytime you want, and as long as you only
:commit every 5 minutes (or commit on a master as often as you want, but
:only run snappuller/snapinstaller on your slaves every 5 minutes) your
:results will be at most 5 minutes + warming time stale.

This is what I do as well (commits are done once every 5 minutes). I've got a
master-slave configuration. The master has all caches turned off (commented out
in solrconfig.xml) and maxWarmingSearchers set to only 2. The index size is 5GB,
Xmx=1GB, and committing takes around 10 secs (with the default configuration
and warming it took from 30 mins up to 2 hours).

Slave caches are configured with autowarmCount=0 and
maxWarmingSearchers=1, and I have new data 1 second after the snapshot is
done. I haven't noticed any huge delays while serving search requests.
Try those values; maybe they'll help in your case too.

Ben Janicki





Re: Sorting performance

2008-10-21 Thread christophe
I'm now wondering whether Solr (Lucene) is a good choice when we have a
huge number of indexed documents and a large number of new documents
that need to be indexed every day.


Maybe I'm wrong, but my feeling is that, given the way the sort caches are
handled (recreated after each new commit, not shared between Searchers), the
solution does not scale. And it is not just a memory issue (memory is
cheap), but more the inability to update an existing cache.


I'm testing whether I can sort on a field that might be faster to cache: any
hints on this? Would it make a difference if I used a field with fewer
distinct values than a timestamp? I'm looking for some details on how
the cache is populated on the first query. Also, for the code insiders
;-), would it be difficult to change this caching mechanism to allow
updating and reusing an existing cache?


Thanks for your help
Christophe

christophe wrote:
> The problem is that I will have hundreds of users doing queries, and a
> continuous flow of documents coming in.
> So a delay in warming up a cache could be acceptable if I do it a
> few times per day. But not on a too regular basis (right now, the
> first query that loads the cache takes 150s).
>
> However: I'm not sure why it looks not to be a good idea to update the
> caches when updates are committed? Any centralized cache (memcached
> is a good one) that is maintained up to date by the update/commit
> process would be great. Config options could then let the user
> decide if the cache is shared between servers or not. Creating a new
> cache and then swapping it will double the necessary memory.
>
> I also have a related question regarding readers: a new reader is
> opened when documents are committed. And the cache is associated with
> the reader (if I got it right). Are all user requests served by this
> reader? How does that scale if I have many concurrent users?
>
> C.
>
> Norberto Meijome wrote:
>> On Mon, 20 Oct 2008 16:28:23 +0300
>> christophe [EMAIL PROTECTED] wrote:
>>
>>> Hum, this means I have to wait before I index new documents and
>>> avoid indexing when they are created (I have about 50 000 new
>>> documents created each day and I was planning to make those
>>> searchable ASAP).
>>
>> you can always index + optimize out of band in a 'master' / RW server,
>> and then send the updated index to your slave (the one actually serving
>> the requests).
>> This *will NOT* remove the need to refresh your cache, but it will
>> remove any delay introduced by commit/indexing + optimise.
>>
>>> Too bad there is no way to have a centralized cache that can be
>>> shared AND updated when new documents are created.
>>
>> hmm not sure it makes sense like that... but maybe along the lines of
>> having an active cache that is used to serve queries, and new ones being
>> prepared, and then swapped when ready.
>> Speaking of which (or not :P), has anyone thought about / done any
>> work on using memcached for these internal solr caches? I guess it would
>> make sense for setups with several slaves (or even a master updating
>> memcached too...)... though for a setup with shards it would be slightly
>> more involved (although it *could* be used to support several slaves per
>> 'data shard').
>>
>> All the best,
>> B
>> _
>> {Beto|Norberto|Numard} Meijome
>>
>> RTFM and STFW before anything bad happens.
>>
>> I speak for myself, not my employer. Contents may be hot. Slippery
>> when wet. Reading disclaimers makes you go blind. Writing them is worse.
>> You have been Warned.
  






Re: Sorting performance

2008-10-21 Thread Chris Hostetter

: The problem is that I will have hundreds of users doing queries, and a
: continuous flow of document coming in.
: So a delay in warming up a cache could be acceptable if I do it a few times
: per day. But not on a too regular basis (right now, the first query that loads
: the cache takes 150s).
: 
: However: I'm not sure why it looks not to be a good idea to update the caches

you can refresh the caches automatically after updating, the newSearcher 
event is fired whenever a searcher is opened (but before it's used by 
clients) so you can configure warming queries for it -- it doesn't have to 
be done manually (or by the first user to use that reader)

so you can send your updates anytime you want, and as long as you only 
commit every 5 minutes (or commit on a master as often as you want, but 
only run snappuller/snapinstaller on your slaves every 5 minutes) your 
results will be at most 5 minutes + warming time stale.
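
The staleness bound described above is simple arithmetic. A small sketch, using the 5-minute interval from the message and, purely as an assumed figure, a warming time equal to the 150 s first-query cost quoted at the top; the function name is illustrative only.

```python
def worst_case_staleness_s(commit_interval_s: float, warming_time_s: float) -> float:
    """Upper bound on how old search results can be, in seconds.

    A document indexed just after a commit waits one full interval for the
    next commit, and then for the new searcher to finish warming.
    """
    if commit_interval_s < 0 or warming_time_s < 0:
        raise ValueError("durations must be non-negative")
    return commit_interval_s + warming_time_s


if __name__ == "__main__":
    # 5-minute commits, assumed 150 s of warming.
    print(worst_case_staleness_s(5 * 60, 150))
```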


-Hoss



Re: Sorting performance

2008-10-20 Thread christophe

Will do so. Thanks.
Are there any metrics on how to compute memory requirements (based on
average doc size, number of sorted fields, number of indexed documents +
number of new documents / day)?


Thanks
Christophe


Mark Miller wrote:
You need to set up a warming query that sorts so that the initial long
query is done behind the scenes. Users' first query will then be fast.
See solrconfig.xml.


- Mark


On Oct 18, 2008, at 1:34 AM, christophe [EMAIL PROTECTED] 
wrote:


Here are the memory parameters I'm using now(Tomcat): -Xms2024m 
-Xmx2024m
With those values, the second query is way faster. Only the first one 
is very slow.

Thanks for the tip.
However, I'm wondering if will be enough and I will not hit the same 
issues when I will have many users searching at the same time: I will 
do a stress test to check this.


Thanks
Christophe

christophe wrote:
It is slow each time I run it. (I test it from the Solr admin 
console or from a JAVA program using the Http client).

I do not get the OOM each time.

Thx
Christophe

Otis Gospodnetic wrote:

Is the sorted query slow only the first time or every time you run it?

You got an OOM?  What -Xmx value are you using?  Try increasing it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 


From: christophe [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, October 17, 2008 1:28:52 PM
Subject: Sorting performance
Hi,

I'm doing some tests with Solr1.3
I have loaded around 7M documents, each with a few stored and 
indexed fields.


This query: text:sometext returns the results, sorted by score, in
a few milliseconds. (I display 10 out of 8747 matched documents)
This one: text:sometext;id desc   takes something like 60s or more
to return the data (when it doesn't fail with an out of memory
error). (id is a string type).

I have tried to display only id, same results.

Any ideas ? I'm sure I'm doing something wrong.

My schema is based on the sample, with the following fields:

 /   multiValued=true /
 default=NOW multiValued=false/

Thanks
Christophe












Re: Sorting performance

2008-10-20 Thread christophe
When I start indexing new documents, searches take a long time
again: is the sort cache flushed when new documents are indexed?


Thanks
Christophe












Re: Sorting performance

2008-10-20 Thread Erick Erickson
Caches are tied to the searcher that was opened. So whenever you open a new
reader, the caches are rebuilt for that searcher. If you are picking up your
changes, you MUST be opening a new reader so yes, indeed, your caches are
being flushed.

You can get around this by firing a few warmup queries at the server before
using it for real.

If you are opening a new reader for each request, well, you shouldn't do
that <G>.

Best
Erick










Re: Sorting performance

2008-10-20 Thread Mark Miller

christophe wrote:
 When I start indexing new documents, searches take a long time
 again: is the sort cache flushed when new documents are indexed?

When you commit, a new Reader will be opened (or reopened) so that the
freshly added docs can be seen. This would make the first search slow
again, but if you have the warming queries, it should be warmed before
being put into use. Be sure the warming query sorts on the right field.

 Are there any metrics on how to compute memory requirements (based on
 average doc size, number of sorted fields, number of indexed documents
 + number of new documents / day)?

Depends on the field type, but I think it's 32 bits x numDocs for most
datatypes, with the String datatype also requiring an array of all the
unique terms to index into. That's not everything, but it dominates.
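The rule of thumb above can be turned into a rough calculation. This is a sketch under stated assumptions: the function name and the `avg_term_bytes` parameter are hypothetical, and JVM per-object overhead is ignored, so the result is a lower bound rather than a measured figure.

```python
def estimate_sort_cache_bytes(num_docs, is_string=False,
                              unique_terms=0, avg_term_bytes=0):
    """Rough lower bound on memory needed to sort on one field."""
    per_doc = 4 * num_docs  # 32 bits per document in the index
    if not is_string:
        return per_doc
    # String fields also hold every unique term the per-doc entries point to.
    return per_doc + unique_terms * avg_term_bytes


# 7M documents, numeric field: 28 MB for the per-document array.
numeric = estimate_sort_cache_bytes(7_000_000)

# 7M documents, unique string ids, assuming 20 bytes per id (an assumption).
string_ids = estimate_sort_cache_bytes(
    7_000_000, is_string=True, unique_terms=7_000_000, avg_term_bytes=20)
```

For a unique string field such as `id`, the term array grows with the number of documents, which is why it can dominate the per-document array.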





Re: Sorting performance

2008-10-20 Thread christophe
Hmm, this means I have to wait before I index new documents and avoid 
indexing them as they are created (I have about 50,000 new documents 
created each day and I was planning to make those searchable ASAP).
Too bad there is no way to have a centralized cache that can be shared 
AND updated when new documents are created.


C.


Re: Sorting performance

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 16:28:23 +0300
christophe [EMAIL PROTECTED] wrote:

 Hum. this mean I have to wait before I index new documents and avoid 
 indexing when they are created (I have about 50 000 new documents 
 created each day and I was planning to make those searchable ASAP).

You can always index + optimize out of band on a 'master' / RW server, and
then send the updated index to your slave (the one actually serving the
requests).

This *will NOT* remove the need to refresh your cache, but it will remove any
delay introduced by commit/indexing + optimize.

 Too bad there is no way to have a centralized cache that can be shared 
 AND updated when new documents are created.

Hmm, not sure it makes sense like that... but maybe something along the lines
of having an active cache that is used to serve queries, with new ones being
prepared and then swapped in when ready.
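The "prepare, then swap" idea can be sketched as follows. This is an illustration only, not Solr's implementation; the class name and the `build` callback are hypothetical.

```python
import threading


class SwappableCache:
    """Serve reads from an active cache while a replacement is built."""

    def __init__(self, build):
        self._build = build            # callable returning a fresh dict
        self._lock = threading.Lock()  # serializes swaps between refreshers
        self._active = build()

    def get(self, key):
        # Readers always see a complete cache, old or new.
        return self._active.get(key)

    def refresh(self):
        fresh = self._build()  # slow step; the old cache keeps serving
        with self._lock:
            self._active = fresh  # swap in the prepared cache
```

While `refresh` runs, both the old and the new cache are held in memory, so peak memory is roughly twice the size of one cache.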

Speaking of which (or not :P), has anyone thought about / done any work on
using memcached for these internal Solr caches? I guess it would make sense for
setups with several slaves (or even a master updating memcached
too...), though for a setup with shards it would be slightly more involved
(although it *could* be used to support several slaves per 'data shard').

All the best,
B
_
{Beto|Norberto|Numard} Meijome

RTFM and STFW before anything bad happens.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Sorting performance

2008-10-20 Thread christophe
The problem is that I will have hundreds of users doing queries, and a 
continuous flow of documents coming in.
So a delay in warming up a cache could be acceptable if I do it a few 
times per day, but not too regularly (right now, the first 
query that loads the cache takes 150s).


However, I'm not sure why it seems not to be a good idea to update the 
caches when updates are committed. Any centralized cache (memcached is 
a good one) that is kept up to date by the update/commit process 
would be great. Config options could then let the user decide whether 
the cache is shared between servers or not. Creating a new cache and 
then swapping it in will double the memory needed.


I also have a related question regarding readers: a new reader is 
opened when documents are committed, and the cache is associated with 
the reader (if I got it right). Are all user requests served by this 
reader? How does that scale if I have many concurrent users?


C.


RE: Sorting performance

2008-10-20 Thread Lance Norskog
According to previous posters on this topic, sorting requires an array with an
entry per document in the entire index. Each entry has 32 bits for the 'int'
type, and 32 bits plus the field representation length for other types. Not
knowing Lucene internals, I have a hard time believing that it really has to
be this wasteful, but oh well.

Since 'sint' is needed to do range queries on a field, and 'int' is needed
for efficient sorting, we wound up having one field of each type and a
copyField to make sure they both get the same numbers. Yes, it's
annoying.


Re: Sorting performance

2008-10-18 Thread christophe
It is slow each time I run it. (I test it from the Solr admin console or 
from a Java program using the HTTP client.)

I do not get the OOM each time.

Thx
Christophe


Re: Sorting performance

2008-10-18 Thread christophe

Here are the memory parameters I'm using now (Tomcat): -Xms2024m -Xmx2024m
With those values, the second query is way faster. Only the first one is 
very slow.

Thanks for the tip.
However, I'm wondering whether that will be enough, or whether I will hit 
the same issues when I have many users searching at the same time: I will 
do a stress test to check this.


Thanks
Christophe


Re: Sorting performance

2008-10-18 Thread Mark Miller
You need to set up a warming query that sorts, so that the initial long 
query is done behind the scenes. The user's first query will then be fast. 
Look in solrconfig.xml.


- Mark



Sorting performance

2008-10-17 Thread christophe

Hi,

I'm doing some tests with Solr 1.3.
I have loaded around 7M documents, each with a few stored and indexed 
fields.


This query: text:sometext returns the results, sorted by score, in a few 
milliseconds. (I display 10 out of 8747 matched documents.)
This one: text:sometext;id desc   takes something like 60s or more to 
return the data (when it doesn't fail with an out of memory error). (id 
is a string type.)

I have tried to display only id, same results.

Any ideas ? I'm sure I'm doing something wrong.

My schema is based on the sample, with the following fields:

  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="url" type="string" indexed="true" stored="true"/>

  <field name="type" type="string" indexed="true" stored="true"/>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="true" />
  <field name="tag" type="string" indexed="true" stored="true" multiValued="true" />
  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
  <dynamicField name="*" type="ignored" />


Thanks
Christophe





Re: Sorting performance

2008-10-17 Thread Otis Gospodnetic
Is the sorted query slow only the first time or every time you run it?

You got an OOM?  What -Xmx value are you using?  Try increasing it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


