Re: FunctionQueries and FieldCache and OOM

2011-03-16 Thread Chris Hostetter

: Alright, i can now confirm the issue has been resolved by reducing precision. 
: The garbage collector on nodes without reduced precision has a real hard time 
: keeping up and clearly shows a very different graph of heap consumption.
: 
: Consider using MINUTE, HOUR or DAY as precision in case you suffer from 
: excessive memory consumption:
: 
: recip(ms(NOW/PRECISION,DATE_FIELD),TIME_FRACTION,1,1)

FWIW: it sounds like your problem wasn't actually related to your 
fieldCache, but probably instead it was because of how big your 
queryResultCache is.

:   Am i correct in assuming that Lucene FieldCache entries are added for
:   each unique function query?  In that case, every query is a unique cache

...no, the FieldCache has one entry per field name, and the value of that 
cache entry is an array keyed off of the internal docId of every doc in the 
index, holding the corresponding field value (it's an uninverted version of 
lucene's inverted index, for doing fast value lookups by document)

changes in the *values* used in your function queries won't affect 
FieldCache usage -- only changing the *fields* used in your functions 
would impact that.
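
to make that concrete, here's a sketch (the field names and the constant are 
made up, not taken from your setup): these two functions both read the single 
FieldCache entry for the "created_at" field, even though NOW resolves to a 
different value on every request:

  recip(ms(NOW,created_at),3.16e-11,1,1)
  recip(ms(NOW/DAY,created_at),3.16e-11,1,1)

only using a different field (say "updated_at") would add a second 
FieldCache entry.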

:   each unique function query?  In that case, every query is a unique cache
:   entry because it operates on milliseconds. If all else fails i might be

what you describe is correct, but not in the FieldCache -- the 
queryResultCache is where queries that deal with the main result set (ie: 
paginated and/or sorted) wind up ... having lots of distinct queries in 
the bq (or q) param will make the number of unique items in that cache 
grow significantly (just like having lots of distinct queries in the fq 
will cause your filterCache to grow significantly)
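
to illustrate (the timestamps and field name here are hypothetical): once the 
query is parsed, NOW has already been resolved to the current time in 
milliseconds, so two requests that arrive a few hundred ms apart effectively 
carry two different boost queries, e.g.

  recip(ms(1299772800123,date_field),3.16e-11,1,1)
  recip(ms(1299772800456,date_field),3.16e-11,1,1)

...and each of those becomes its own entry in the queryResultCache.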

you should definitely check out what max size you have configured for your 
queryResultCache ... it sounds like it's probably too big, if you were 
getting OOM errors from having high precision dates in your boost queries.  
while i think using less precision is a wise choice, you should still 
consider dialing that max size down, so that if some other usage pattern 
still causes lots of unique queries in a short time period (a bot crawling 
your site map perhaps) it doesn't fill up and cause another OOM
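
for reference, that max size lives in solrconfig.xml ... something along 
these lines (the numbers here are just an illustration of a smaller cache, 
not a tuned recommendation for your hardware):

  <queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>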



-Hoss


Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Well, it's quite hard to debug because the values listed on the stats page in 
the fieldCache section don't make much sense. Reducing precision with 
NOW/HOUR, however, does seem to make a difference.
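
(Concretely, the kind of change i mean, with a made up field name:

  before: bf=recip(ms(NOW,last_modified),3.16e-11,1,1)
  after:  bf=recip(ms(NOW/HOUR,last_modified),3.16e-11,1,1)

so the resolved value only changes once per hour instead of on every request.)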

It is hard (or impossible) to reproduce this in a test setup with the same 
index but without continuous updates and without stress tests. Firing manual 
queries with different values for the bf parameter doesn't show any difference 
in the values listed on the stats page.

Would someone care to provide an explanation?

Thanks

On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
 Hi,
 
 In one of the environments i'm working on (4 Solr 1.4.1 nodes with
 replication, 3+ million docs, ~5.5GB index size, high commit rate
 (~1-2min), high query rate (~50q/s), high number of updates
 (~1000docs/commit)) the nodes continuously run out of memory.
 
 During development we frequently ran excessive stress tests and after
 tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
 bq parameter for boosting recent documents; documents older than a day
 receive 50% less boost, similar to the example but with a much steeper
 slope. For clarity, i'm not using the ordinal function (which the wiki
 warns against for Solr 1.4.1) but the reciprocal version in the bq
 parameter.
 
 This week we started the stress tests and nodes are going down again. I've
 reconfigured the nodes to have different settings for the bq parameter (or
 no bq parameter).
 
 It seems the bq is the cause of the misery.
 
 Issue SOLR- keeps popping up but it has not been resolved. Is there
 anyone who can confirm one of those patches fixes this issue before i
 waste hours of work finding out it doesn't? ;)
 
 Am i correct in assuming that Lucene FieldCache entries are added for
 each unique function query?  In that case, every query is a unique cache
 entry because it operates on milliseconds. If all else fails i might be
 able to reduce precision by operating on minutes or even coarser instead of
 milliseconds. I, however, cannot use other nice math functions in the ms()
 parameter so that might make things difficult.
 
 However, date math seems to be available (NOW/HOUR) so i assume it would
 work for SOME_DATE_FIELD/HOUR as well. This way i just might prevent
 useless entries.
 
 My apologies for this long mail but it may prove useful for other users and
 hopefully we find the solution and can update the wiki to add this warning.
 
 Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Alright, i can now confirm the issue has been resolved by reducing precision. 
The garbage collector on nodes without reduced precision has a real hard time 
keeping up and clearly shows a very different graph of heap consumption.

Consider using MINUTE, HOUR or DAY as precision in case you suffer from 
excessive memory consumption:

recip(ms(NOW/PRECISION,DATE_FIELD),TIME_FRACTION,1,1)
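
For example, with HOUR precision and a boost that drops to roughly half for 
documents about a day old, that works out to something like this (the field 
name is just a placeholder and the constant is only approximate):

recip(ms(NOW/HOUR,publish_date),1.16e-8,1,1)

...since 1/1.16e-8 is roughly the number of milliseconds in a day.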

On Thursday 10 March 2011 15:14:25 Markus Jelsma wrote:
 Well, it's quite hard to debug because the values listed on the stats page
 in the fieldCache section don't make much sense. Reducing precision with
 NOW/HOUR, however, does seem to make a difference.
 
 It is hard (or impossible) to reproduce this in a test setup with the same
 index but without continuous updates and without stress tests. Firing manual
 queries with different values for the bf parameter doesn't show any
 difference in the values listed on the stats page.
 
 Would someone care to provide an explanation?
 
 Thanks
 
 On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
  Hi,
  
  In one of the environments i'm working on (4 Solr 1.4.1 nodes with
  replication, 3+ million docs, ~5.5GB index size, high commit rate
  (~1-2min), high query rate (~50q/s), high number of updates
  (~1000docs/commit)) the nodes continuously run out of memory.
  
  During development we frequently ran excessive stress tests and after
  tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
  bq parameter for boosting recent documents; documents older than a day
  receive 50% less boost, similar to the example but with a much steeper
  slope. For clarity, i'm not using the ordinal function (which the wiki
  warns against for Solr 1.4.1) but the reciprocal version in the bq
  parameter.
  
  This week we started the stress tests and nodes are going down again.
  I've reconfigured the nodes to have different settings for the bq
  parameter (or no bq parameter).
  
  It seems the bq is the cause of the misery.
  
  Issue SOLR- keeps popping up but it has not been resolved. Is there
  anyone who can confirm one of those patches fixes this issue before i
  waste hours of work finding out it doesn't? ;)
  
  Am i correct in assuming that Lucene FieldCache entries are added for
  each unique function query?  In that case, every query is a unique cache
  entry because it operates on milliseconds. If all else fails i might be
  able to reduce precision by operating on minutes or even coarser instead
  of milliseconds. I, however, cannot use other nice math functions in the
  ms() parameter so that might make things difficult.
  
  However, date math seems to be available (NOW/HOUR) so i assume it would
  work for SOME_DATE_FIELD/HOUR as well. This way i just might prevent
  useless entries.
  
  My apologies for this long mail but it may prove useful for other users
  and hopefully we find the solution and can update the wiki to add this
  warning.
  
  Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350