Re: FunctionQueries and FieldCache and OOM
: Alright, i can now confirm the issue has been resolved by reducing
: precision. The garbage collector on nodes without reduced precision has a
: real hard time keeping up and clearly shows a very different graph of
: heap consumption.
:
: Consider using MINUTE, HOUR or DAY as precision in case you suffer from
: excessive memory consumption:
:
: recip(ms(NOW/PRECISION,DATE_FIELD),TIME_FRACTION,1,1)

FWIW: it sounds like your problem wasn't actually related to your
fieldCache, but probably instead it was because of how big your
queryResultCache is.

: Am i correct when i assume that Lucene FieldCache entries are added for
: each unique function query? In that case, every query is a unique cache

...no, the FieldCache has one entry per field name, and the value of that
cache is an array keyed off of the internal docId of every doc in the
index, holding the corresponding field value (it's an uninverted version
of lucene's inverted index, for doing fast value lookups by document).

changes in the *values* used in your function queries won't affect
FieldCache usage -- only changing the *fields* used in your functions
would impact that.

: each unique function query? In that case, every query is a unique cache
: entry because it operates on milliseconds. If all this doesn't work i

what you describe is correct, but not in the FieldCache -- the
queryResultCache is where queries that deal with the main result set (ie:
paginated and/or sorted) wind up. having lots of distinct queries in the
bq (or q) param will make the number of unique items in that cache grow
significantly (just like having lots of distinct queries in the fq will
cause your filterCache to grow significantly).

you should definitely check out what max size you have configured for
your queryResultCache ... it sounds like it's probably too big if you
were getting OOM errors from having high precision dates in your boost
queries.

while i think using less precision is a wise choice, you should still
consider dialing that max size down, so that if some other usage pattern
still causes lots of unique queries in a short time period (a bot
crawling your site map, perhaps) it doesn't fill up and cause another
OOM.

-Hoss
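For reference, the max size Hoss mentions lives in solrconfig.xml. A
minimal sketch of a dialed-down queryResultCache follows; the numbers are
illustrative defaults, not values taken from this thread:

  <!-- solrconfig.xml: cap how many distinct result sets are cached, so a
       flood of unique bq/q variants cannot exhaust the heap -->
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>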
Re: FunctionQueries and FieldCache and OOM
Well, it's quite hard to debug because the values listed on the stats
page in the fieldCache section don't make much sense. Reducing precision
with NOW/HOUR, however, does seem to make a difference.

It is hard (or impossible) to reproduce this in a test setup with the
same index but without continuous updates and without stress tests.
Firing manual queries with different values for the bf parameter doesn't
show any difference in the values listed on the stats page. Does someone
care to provide an explanation?

Thanks

On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
: Hi,
:
: In one of the environments i'm working on (4 Solr 1.4.1 nodes with
: replication, 3+ million docs, ~5.5GB index size, high commit rate
: (~1-2 min), high query rate (~50 q/s), high number of updates
: (~1000 docs/commit)) the nodes continuously run out of memory. During
: development we frequently ran excessive stress tests and after tuning
: JVM and Solr settings all ran fine.
:
: A while ago i added the DisMax bq parameter for boosting recent
: documents; documents older than a day receive 50% less boost, similar
: to the example but with a much steeper slope. For clarity, i'm not
: using the ordinal function but the reciprocal version in the bq
: parameter, which is warned against when using Solr 1.4.1 according to
: the wiki.
:
: This week we started the stress tests and nodes are going down again.
: I've reconfigured the nodes to have different settings for the bq
: parameter (or no bq parameter). It seems the bq is the cause of the
: misery. Issue SOLR- keeps popping up but it has not been resolved. Is
: there anyone who can confirm one of those patches fixes this issue
: before i waste hours of work finding out it doesn't? ;)
:
: Am i correct when i assume that Lucene FieldCache entries are added for
: each unique function query? In that case, every query is a unique cache
: entry because it operates on milliseconds. If all this doesn't work i
: might be able to reduce precision by operating on minutes or even more
: instead of milliseconds. I, however, cannot use other nice math
: functions in the ms() parameter so that might make things difficult.
: However, date math seems available (NOW/HOUR) so i assume it would also
: work for SOME_DATE_FIELD/HOUR as well. This way i just might prevent
: useless entries.
:
: My apologies for this long mail but it may prove useful for other users
: and hopefully we find the solution and can update the wiki to add this
: warning.
:
: Cheers,

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
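For concreteness, the two usual ways (as of Solr 1.4) to attach such a
function to a DisMax request are sketched below; mydatefield is a
placeholder, and 3.16e-11 is roughly 1/(milliseconds per year), as in the
wiki example:

  bf=recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1)

  bq=_val_:"recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1)"

Either way, each distinct NOW timestamp yields a distinct query string;
rounding with NOW/HOUR is what keeps the string identical between
requests within the same hour.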
Re: FunctionQueries and FieldCache and OOM
Alright, i can now confirm the issue has been resolved by reducing
precision. The garbage collector on nodes without reduced precision has a
real hard time keeping up and clearly shows a very different graph of
heap consumption.

Consider using MINUTE, HOUR or DAY as precision in case you suffer from
excessive memory consumption:

recip(ms(NOW/PRECISION,DATE_FIELD),TIME_FRACTION,1,1)

On Thursday 10 March 2011 15:14:25 Markus Jelsma wrote:
: Well, it's quite hard to debug because the values listed on the stats
: page in the fieldCache section don't make much sense. Reducing
: precision with NOW/HOUR, however, does seem to make a difference.
:
: It is hard (or impossible) to reproduce this in a test setup with the
: same index but without continuous updates and without stress tests.
: Firing manual queries with different values for the bf parameter
: doesn't show any difference in the values listed on the stats page.
: Does someone care to provide an explanation?
:
: Thanks
:
: On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
: : Hi,
: :
: : In one of the environments i'm working on (4 Solr 1.4.1 nodes with
: : replication, 3+ million docs, ~5.5GB index size, high commit rate
: : (~1-2 min), high query rate (~50 q/s), high number of updates
: : (~1000 docs/commit)) the nodes continuously run out of memory.
: : During development we frequently ran excessive stress tests and
: : after tuning JVM and Solr settings all ran fine.
: :
: : A while ago i added the DisMax bq parameter for boosting recent
: : documents; documents older than a day receive 50% less boost,
: : similar to the example but with a much steeper slope. For clarity,
: : i'm not using the ordinal function but the reciprocal version in the
: : bq parameter, which is warned against when using Solr 1.4.1
: : according to the wiki.
: :
: : This week we started the stress tests and nodes are going down
: : again. I've reconfigured the nodes to have different settings for
: : the bq parameter (or no bq parameter). It seems the bq is the cause
: : of the misery. Issue SOLR- keeps popping up but it has not been
: : resolved. Is there anyone who can confirm one of those patches fixes
: : this issue before i waste hours of work finding out it doesn't? ;)
: :
: : Am i correct when i assume that Lucene FieldCache entries are added
: : for each unique function query? In that case, every query is a
: : unique cache entry because it operates on milliseconds. If all this
: : doesn't work i might be able to reduce precision by operating on
: : minutes or even more instead of milliseconds. I, however, cannot use
: : other nice math functions in the ms() parameter so that might make
: : things difficult. However, date math seems available (NOW/HOUR) so i
: : assume it would also work for SOME_DATE_FIELD/HOUR as well. This way
: : i just might prevent useless entries.
: :
: : My apologies for this long mail but it may prove useful for other
: : users and hopefully we find the solution and can update the wiki to
: : add this warning.
: :
: : Cheers,

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
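Filling in that template with the numbers from earlier in the thread:
recip(x,m,a,b) computes a/(m*x+b), so with a=b=1 a boost that halves for
documents a day old needs m * 86,400,000 ms = 1, i.e. m of roughly
1.16e-8. An illustrative instance (pubdate is a hypothetical field name):

  recip(ms(NOW/HOUR,pubdate),1.16e-8,1,1)

Because NOW/HOUR is constant within a given hour, every request in that
hour produces identical query text, so the queryResultCache sees at most
one new entry per hour per distinct query instead of one per request.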