Re: FieldCache usage for custom field collapse in solr 1.4
Hey Yonik,

Thanks for clarifying. The reason I went with rolling my own: I asked previously whether there's any plan to back-port field collapsing to Solr 1.4, and I understood that it's not at all straightforward. If you think it would be fairly easy to look at the new code in Solr 4.0 trunk and use that as a basis, for example, I'd go ahead and do that.

Q - does the field collapse component expect the field being collapsed on to be stored, or does it also try to use FieldCache trickery?

Thanks,
Adam

On Mon, Dec 6, 2010 at 9:42 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Sun, Dec 5, 2010 at 6:12 PM, Adam H. jimmoe...@gmail.com wrote:
>> StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader, collapseField);
>>
>> where 'reader' is the instance of the SolrIndexReader passed along to the component with the ResponseBuilder.SolrQueryRequest object. As I understand, this can double memory usage due to (re)loading this fieldcache on a reader-wide basis rather than on a per segment basis?
>
> Yep. Sorting and function queries use per-segment FieldCache entries. So if you also request a FieldCache from the top-level reader, it won't reuse the per-segment caches and hence will take up 2x memory over just using per-segment.
>
> Solr's field collapsing already works on a per-segment basis... if your needs are at all general, it could make sense to try and get it rolled into Solr rather than implementing custom code.
>
> -Yonik
> http://www.lucidimagination.com
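A rough way to picture the 2x figure Yonik describes, as a minimal sketch rather than Lucene's actual implementation: the cache is modeled as a map with one entry per (reader, field) pair, so an entry requested with the top-level reader cannot be served from the per-segment entries. `ToyFieldCache`, `getOrds`, the string reader keys and the segment sizes are all hypothetical names and numbers chosen for illustration; the per-document `int[]` stands in for the `order` array of a `StringIndex`.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Lucene code): one cache entry per (reader, field) pair. */
final class ToyFieldCache {
    // Key is "readerKey/field"; value holds one int per document in that reader.
    private final Map<String, int[]> entries = new HashMap<>();

    /** Returns the cached per-document array, building it on first access. */
    int[] getOrds(String readerKey, String field, int maxDoc) {
        return entries.computeIfAbsent(readerKey + "/" + field, k -> new int[maxDoc]);
    }

    /** Total number of ints held across all cache entries. */
    long cachedInts() {
        long total = 0;
        for (int[] ords : entries.values()) {
            total += ords.length;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] segmentSizes = {600, 300, 100}; // hypothetical segments, 1000 docs in total
        ToyFieldCache cache = new ToyFieldCache();

        // Sorting and function queries populate one entry per segment.
        for (int i = 0; i < segmentSizes.length; i++) {
            cache.getOrds("segment-" + i, "collapseField", segmentSizes[i]);
        }
        System.out.println("per-segment only: " + cache.cachedInts() + " ints");

        // A component asking with the top-level reader adds a separate,
        // index-wide entry; the per-segment arrays are not reused.
        cache.getOrds("top-level", "collapseField", 1000);
        System.out.println("plus top-level: " + cache.cachedInts() + " ints");
    }
}
```

In this model the per-document storage goes from 1000 to 2000 ints once the top-level entry is added. The real cost also includes the term strings, which the sketch ignores.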
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 3:24 PM, Adam H. jimmoe...@gmail.com wrote:
> Hey Yonik. Thanks for clarifying. The reason I went rolling my own way - I asked previously whether there's any plan to back-port the field collapse to Solr 1.4 and I understood that it's not at all straightforward.

Ahhh... I'd just use trunk if possible ;-)
The risk of being in production on custom code that no one else uses is perhaps greater than that of running on a widely used development version.

But yes... I don't see a backport happening for 1.4.

-Yonik
http://www.lucidimagination.com
Re: FieldCache usage for custom field collapse in solr 1.4
Fair enough - I might give it a shot if, to your mind, most functionality is compatible with Solr 1.4.1 and is fairly stable?

One last Q regarding correct usage of per-segment FieldCache in Solr components - since this is something I might also have issues with elsewhere, and I suspect other people who work on custom logic might as well, I think it would be useful to have some documentation and/or a simple programmatic interface for implementing the correct access path to these inside a custom SolrComponent. I looked around the Grouping code a bit and have yet to fully understand what's going on, but is the ValueSource supposed to take care of access to the underlying field?

On Mon, Dec 6, 2010 at 12:34 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Mon, Dec 6, 2010 at 3:24 PM, Adam H. jimmoe...@gmail.com wrote:
>> Hey Yonik. Thanks for clarifying. The reason I went rolling my own way - I asked previously whether there's any plan to back-port the field collapse to Solr 1.4 and I understood that it's not at all straightforward.
>
> Ahhh... I'd just use trunk if possible ;-)
> The risk of being in production on custom code that no one else uses is perhaps greater than that of running on a widely used development version.
>
> But yes... I don't see a backport happening for 1.4.
>
> -Yonik
> http://www.lucidimagination.com
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
> Fair enough - I might give it a shot if, to your mind, most functionality is compatible with Solr 1.4.1 and is fairly stable?

Yes, the external APIs are very compatible. The internal APIs - not so much. You should reindex also.

> One last Q regarding correct usage of per-segment FieldCache in Solr components - since this is something I might also have issues with elsewhere, and I suspect other people who work on custom logic might as well, I think it would be useful to have some documentation and/or a simple programmatic interface for implementing the correct access path to these inside a custom SolrComponent. I looked around the Grouping code a bit and have yet to fully understand what's going on, but is the ValueSource supposed to take care of access to the underlying field?

Yes - you can actually group on arbitrary function queries even. That will be more useful when we add some bucketing functions.

-Yonik
http://www.lucidimagination.com
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
>> Fair enough - I might give it a shot if, to your mind, most functionality is compatible with Solr 1.4.1 and is fairly stable?
>
> Yes, the external APIs are very compatible. The internal APIs - not so much. You should reindex also.

And not be (too) surprised if things change before the official 4.x release -- the chances are good that something will change that may require reindexing.

ryan
Re: FieldCache usage for custom field collapse in solr 1.4
So, summing up all the information I now have: I have some additional custom components that use the FieldCache, so migrating to Solr 4.0 for field collapsing is not a complete solution to my problems. It seems to me more and more like I might have to actually implement a custom Solr QueryComponent, to which I would pass multiple collectors (perhaps via some kind of MultiCollector interface, similar to what Grouping uses) that do their appropriate field value collection/aggregation as results are being fetched.

In other words, using a per-segment FieldCache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?).

Is this accurate? Thanks again for all the info here.

Adam

On Mon, Dec 6, 2010 at 1:48 PM, Ryan McKinley ryan...@gmail.com wrote:
> On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley yo...@lucidimagination.com wrote:
>> On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
>>> Fair enough - I might give it a shot if, to your mind, most functionality is compatible with Solr 1.4.1 and is fairly stable?
>>
>> Yes, the external APIs are very compatible. The internal APIs - not so much. You should reindex also.
>
> And not be (too) surprised if things change before the official 4.x release -- the chances are good that something will change that may require reindexing.
>
> ryan
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote:
> In other words, using a per-segment FieldCache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?).

Sure, it's possible, and not too hard (as long as no sort field involves score). Just instruct the QueryComponent to retrieve the set of all matching documents, then you can use that to run them through whatever collectors you want again. I've been meaning to implement this optimization to field collapsing...

Depending on the details, either replacing the QueryComponent with your custom one, or inserting an additional component after the query component could make sense.

-Yonik
http://www.lucidimagination.com
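The replay step Yonik outlines can be sketched in plain Java, under stated assumptions: the matching documents are available as top-level doc ids in ascending order, and the first doc id of each segment is known. `SegmentReplay`, `SegmentCollector` and `setNextSegment` are hypothetical names; the callback is only loosely shaped like Lucene's `Collector`, whose `setNextReader(IndexReader, int docBase)` and `collect(int doc)` play the corresponding roles. In Solr the full match set is typically requested with `ResponseBuilder.setNeedDocSet(true)`, as the facet component does, but verify that against your version.

```java
/** Toy model (not Lucene code) of replaying a matched doc set segment by segment. */
final class SegmentReplay {

    /** Hypothetical callback, loosely shaped like Lucene's Collector. */
    interface SegmentCollector {
        void setNextSegment(int segmentIndex, int docBase);
        void collect(int segmentLocalDoc);
    }

    /**
     * Feeds ascending top-level doc ids to the collector, announcing each
     * segment before its first document. segmentStarts[i] is the first
     * top-level doc id of segment i; maxDoc is the total document count.
     */
    static void replay(int[] sortedDocIds, int[] segmentStarts, int maxDoc,
                       SegmentCollector collector) {
        int seg = -1;
        int segEnd = 0; // exclusive upper bound of the current segment
        for (int doc : sortedDocIds) {
            if (doc < 0 || doc >= maxDoc) {
                throw new IllegalArgumentException("doc id out of range: " + doc);
            }
            boolean moved = false;
            while (doc >= segEnd) { // advance to the segment containing doc
                seg++;
                segEnd = (seg + 1 < segmentStarts.length) ? segmentStarts[seg + 1] : maxDoc;
                moved = true;
            }
            if (moved) {
                collector.setNextSegment(seg, segmentStarts[seg]);
            }
            collector.collect(doc - segmentStarts[seg]); // segment-local id
        }
    }

    public static void main(String[] args) {
        int[] starts = {0, 600, 900}; // hypothetical segment offsets
        replay(new int[] {5, 599, 600, 950}, starts, 1000, new SegmentCollector() {
            public void setNextSegment(int segmentIndex, int docBase) {
                System.out.println("segment " + segmentIndex + " docBase " + docBase);
            }
            public void collect(int segmentLocalDoc) {
                System.out.println("  local doc " + segmentLocalDoc);
            }
        });
    }
}
```

A collector written this way can fetch its per-segment cache once in `setNextSegment` and index it with the segment-local id in `collect`, which is what keeps it off the top-level FieldCache entry.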
Re: FieldCache usage for custom field collapse in solr 1.4
Ah! So, just so I can get cracking on this - can you be a little more specific? E.g. in my component implementation that runs in the request handling after the normal QueryComponent, how would I access the specific field value for the documents that were retrieved? I.e. how would it fit in code like this, if at all:

    // docList is the matching documents for the given offset/rows/query
    DocIterator it = docList.iterator();
    while (it.hasNext()) {
        int docId = it.nextDoc();   // top-level doc id
        float score = it.score();
        // this would've worked if this was a stored field:
        // reader.document(docId).get(fieldName)
        // ??
    }

On Mon, Dec 6, 2010 at 2:57 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote:
>> In other words, using a per-segment FieldCache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?).
>
> Sure, it's possible, and not too hard (as long as no sort field involves score). Just instruct the QueryComponent to retrieve the set of all matching documents, then you can use that to run them through whatever collectors you want again. I've been meaning to implement this optimization to field collapsing...
>
> Depending on the details, either replacing the QueryComponent with your custom one, or inserting an additional component after the query component could make sense.
>
> -Yonik
> http://www.lucidimagination.com
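One possible shape for the lookup asked about in the loop above, as a minimal sketch and not an answer given in the thread: resolve the top-level doc id to a segment by its start offset, then read the value from that segment's arrays. `PerSegmentLookup`, `SegmentValues`, `segmentOf` and `valueOf` are hypothetical names. `SegmentValues` stands in for a per-segment `FieldCache.StringIndex` with its `order` and `lookup` arrays, which in real code would be obtained once per segment reader (for example from `IndexReader.getSequentialSubReaders()`). The sketch assumes strictly ascending segment offsets, so no empty segments.

```java
import java.util.Arrays;

/** Toy model (not Lucene code) of reading a field value through per-segment caches. */
final class PerSegmentLookup {

    /** Stand-in for a per-segment StringIndex: order[doc] is an index into lookup. */
    static final class SegmentValues {
        final int[] order;
        final String[] lookup; // slot 0 is null, meaning "document has no value"

        SegmentValues(int[] order, String[] lookup) {
            this.order = order;
            this.lookup = lookup;
        }
    }

    /** Index of the segment containing topLevelDoc, given strictly ascending start offsets. */
    static int segmentOf(int topLevelDoc, int[] segmentStarts) {
        int pos = Arrays.binarySearch(segmentStarts, topLevelDoc);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owning
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Field value for a top-level doc id, or null if the document has none. */
    static String valueOf(int topLevelDoc, int[] segmentStarts, SegmentValues[] segments) {
        int seg = segmentOf(topLevelDoc, segmentStarts);
        int localDoc = topLevelDoc - segmentStarts[seg];
        SegmentValues values = segments[seg];
        return values.lookup[values.order[localDoc]];
    }

    public static void main(String[] args) {
        // Hypothetical index: segment 0 holds docs 0..2, segment 1 holds docs 3..4.
        int[] starts = {0, 3};
        SegmentValues[] segments = {
            new SegmentValues(new int[] {1, 2, 1}, new String[] {null, "a", "b"}),
            new SegmentValues(new int[] {0, 1}, new String[] {null, "c"})
        };
        for (int doc = 0; doc < 5; doc++) {
            System.out.println(doc + " -> " + valueOf(doc, starts, segments));
        }
    }
}
```

The binary search suits random access over a paged `DocList`. When walking a full match set in doc id order, announcing each segment once and passing segment-local ids avoids the per-document search.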
Re: FieldCache usage for custom field collapse in solr 1.4
One more comment/question - having looked at the Solr stats panel, I do not see detailed memory usage for the field I'm collapsing on in the Lucene FieldCache entry listings. As I understand it (after having looked through this ticket: https://issues.apache.org/jira/browse/SOLR-1292), this means that it's not an 'insanity' instance, and so I am actually not using double the memory, but rather only have this field in the FieldCache at the whole-index level.

This got me thinking - if I'm not using any segment-level field caching for this field, there's no reason not to use an index-wide one, as long as I can guarantee that's the only use case for this field in the FieldCache. Is this correct?

Thanks again for helping me out with this delicate subject :)

Adam

On Mon, Dec 6, 2010 at 3:21 PM, Adam H. jimmoe...@gmail.com wrote:
> Ah! So, just so I can get cracking on this - can you be a little more specific? E.g. in my component implementation that runs in the request handling after the normal QueryComponent, how would I access the specific field value for the documents that were retrieved? I.e. how would it fit in code like this, if at all:
>
>     // docList is the matching documents for the given offset/rows/query
>     DocIterator it = docList.iterator();
>     while (it.hasNext()) {
>         int docId = it.nextDoc();   // top-level doc id
>         float score = it.score();
>         // this would've worked if this was a stored field:
>         // reader.document(docId).get(fieldName)
>         // ??
>     }
>
> On Mon, Dec 6, 2010 at 2:57 PM, Yonik Seeley yo...@lucidimagination.com wrote:
>> On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote:
>>> In other words, using a per-segment FieldCache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?).
>>
>> Sure, it's possible, and not too hard (as long as no sort field involves score). Just instruct the QueryComponent to retrieve the set of all matching documents, then you can use that to run them through whatever collectors you want again. I've been meaning to implement this optimization to field collapsing...
>>
>> Depending on the details, either replacing the QueryComponent with your custom one, or inserting an additional component after the query component could make sense.
>>
>> -Yonik
>> http://www.lucidimagination.com
FieldCache usage for custom field collapse in solr 1.4
Hey,

I'm trying to use the Lucene FieldCache for a custom field collapsing implementation: basically I'm collapsing on a non-stored field, and so am using the FieldCache to retrieve field values at run time. I noticed I'm getting some OOMs after deploying it, and after looking into it for a bit, figured that it might have to do with using a call like this:

    StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader, collapseField);

where 'reader' is the instance of the SolrIndexReader passed along to the component with the ResponseBuilder.SolrQueryRequest object. As I understand it, this can double memory usage due to (re)loading this FieldCache on a reader-wide basis rather than on a per-segment basis?

If so, what would be a way to migrate this code to use a per-segment cache? I'm not sure I understand the semantics there at all...

Any help will be greatly appreciated, thanks a lot!

Adam