Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
I just committed a fix Ryan - should work with upgraded Lucene jars.
- Mark

Ryan McKinley wrote:
thanks!

On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:
Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look.
- Mark

Ryan McKinley wrote:
Ok, not totally resolved. Things work fine when I have my custom Filter alone or with other Filters; however, if I add a query string to the mix it breaks with an IllegalStateException:

java.lang.IllegalStateException: Auto should be resolved before now
    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)

This is for a query: /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
bounds=XXX triggers my custom filter to kick in. Any thoughts where to look? This error is new since upgrading the lucene libs (in recent solr).
Thanks!
ryan

On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
thanks! everything got better when I removed my logic to cache based on the index modification time.
On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley ryan...@gmail.com wrote:

> This issue started on java-user, but I am moving it to solr-dev:
> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
> I am using solr trunk and building an RTree from stored document fields. This process worked fine until a recent change in 2.9 that has a different document id strategy than I was used to. In that thread, Yonik suggested:
> - pop back to the top level from the sub-reader, if you really need a single set
> - if a set-per-reader will work, then cache per segment (better for incremental updates anyway)
> I'm not quite sure what you mean by a set-per-reader.

I meant RTree per reader (per segment reader).

> Previously I was building a single RTree and using it until the last modified time had changed. This avoided building an index anytime a new reader was opened and the index had not changed.

I *think* that our use of re-open will return the same IndexReader instance if nothing has changed... so you shouldn't have to try and do that yourself.

> I'm fine building a new RTree for each reader if that is required.

If that works just as well, it will put you in a better position for faster incremental updates... new RTrees will be built only for those segments that have changed.

> Is there any existing code that deals with this situation?

To cache an RTree per reader, you could use the same logic as FieldCache uses... a weak map with the reader as the key. If a single top-level RTree that covers the entire index works better for you, then you can cache the RTree based on the top level multi reader and translate the ids... that was my fix for ExternalFileField. See FileFloatSource.getValues() for the implementation.
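[Editor's note] A minimal, library-free sketch of the weak-map caching pattern Yonik describes. `PerReaderCache` and `buildRTree` are hypothetical stand-ins (not Solr or Lucene code); the key point is that `WeakHashMap` holds its keys weakly, so a cached structure is dropped as soon as its reader is no longer referenced anywhere:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of FieldCache-style caching: one expensive structure per
// (segment) reader, keyed weakly so closed/discarded readers don't leak.
public class PerReaderCache {
    // IndexReader does not override equals(), so WeakHashMap's
    // equals()-based lookup degenerates to identity, which is what we want:
    // a reopened reader is a new key and triggers a fresh build.
    private final Map<Object, Object> cache =
        Collections.synchronizedMap(new WeakHashMap<Object, Object>());

    public Object getRTree(Object reader) {
        Object tree = cache.get(reader);
        if (tree == null) {
            tree = buildRTree(reader);   // expensive: walk the segment once
            cache.put(reader, tree);
        }
        return tree;
    }

    private Object buildRTree(Object reader) {
        return new Object();             // placeholder for the real RTree build
    }

    public static boolean demo() {
        PerReaderCache c = new PerReaderCache();
        Object reader = new Object();
        // Same reader -> same cached instance; a different reader -> new build.
        return c.getRTree(reader) == c.getRTree(reader)
            && c.getRTree(reader) != c.getRTree(new Object());
    }

    public static void main(String[] args) {
        System.out.println(demo());      // prints "true"
    }
}
```

With Solr's reopen behavior described above, unchanged segments keep their reader instance, so their cached RTrees are reused across commits.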
Yonik also suggested:

Relatively new in 2.9, you can pass null to enumerate over all non-deleted docs:
TermDocs td = reader.termDocs(null);
It would probably be a lot faster to iterate over indexed values though.

> If I iterate over indexed values (from the FieldCache I presume) then how do I get access to the document id?

IndexReader.terms(Term t) returns a TermEnum that can iterate over terms, starting at t. IndexReader.termDocs(Term t or TermEnum te) will give you the list of documents that match a term.

-Yonik

--
- Mark
http://www.lucidimagination.com
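[Editor's note] The id translation mentioned earlier (caching one top-level RTree against the multi reader) works because a MultiReader numbers documents by concatenating its sub-readers: a segment-local doc id plus the segment's starting offset (the sum of `maxDoc()` of all preceding segments, often called the docBase) gives the top-level id. A self-contained arithmetic sketch, with made-up segment sizes:

```java
// Sketch of per-segment -> top-level doc id translation. Each segment
// numbers its docs from 0; the top-level id adds the segment's docBase.
// The segment sizes below are invented for illustration.
public class DocIdTranslation {
    // Cumulative starting offsets: bases[i] = sum of maxDoc of segments < i.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    static int toTopLevel(int segment, int localDocId, int[] bases) {
        return bases[segment] + localDocId;
    }

    public static void main(String[] args) {
        int[] maxDocs = {100, 50, 25};    // three hypothetical segments
        int[] bases = docBases(maxDocs);  // {0, 100, 150}
        // doc 10 of the middle segment is doc 110 of the MultiReader
        System.out.println(toTopLevel(1, 10, bases)); // prints 110
    }
}
```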
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
thanks Mark!
how far is lucene /trunk from what is currently in solr? Is it something we should consider upgrading?

On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:
I just committed a fix Ryan - should work with upgraded Lucene jars.
- Mark
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
I think Shalin upgraded the jars this morning, so I'd just grab them again real quick.

4/24 4:46 am : Upgraded to Lucene 2.9-dev r768228

Ryan McKinley wrote:
thanks Mark! how far is lucene /trunk from what is currently in solr? Is it something we should consider upgrading?
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
Yes, I upgraded the lucene jars a few hours ago for trie api updates. Do you want me to upgrade them again?

On Fri, Apr 24, 2009 at 7:51 PM, Mark Miller markrmil...@gmail.com wrote:
I think Shalin upgraded the jars this morning, so I'd just grab them again real quick.
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
Yes, that would be great!
the changes we need are in rev 768275:
http://svn.apache.org/viewvc?view=rev&revision=768275
thanks

On Apr 24, 2009, at 11:23 AM, Shalin Shekhar Mangar wrote:
Yes, I upgraded the lucene jars a few hours ago for trie api updates. Do you want me to upgrade them again?
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
On Fri, Apr 24, 2009 at 9:07 PM, Ryan McKinley ryan...@gmail.com wrote:
Yes, that would be great! the changes we need are in rev 768275:
http://svn.apache.org/viewvc?view=rev&revision=768275

Done. I upgraded to r768336.

--
Regards,
Shalin Shekhar Mangar.
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
Ok, not totally resolved. Things work fine when I have my custom Filter alone or with other Filters; however, if I add a query string to the mix it breaks with an IllegalStateException:

java.lang.IllegalStateException: Auto should be resolved before now
    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)

This is for a query: /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
bounds=XXX triggers my custom filter to kick in. Any thoughts where to look? This error is new since upgrading the lucene libs (in recent solr).
Thanks!
ryan

On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
thanks! everything got better when I removed my logic to cache based on the index modification time.
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look.
- Mark

Ryan McKinley wrote:
Ok, not totally resolved. Things work fine when I have my custom Filter alone or with other Filters; however, if I add a query string to the mix it breaks with an IllegalStateException: java.lang.IllegalStateException: Auto should be resolved before now

--
- Mark
http://www.lucidimagination.com
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
thanks! On Apr 23, 2009, at 6:32 PM, Mark Miller wrote: Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look. - Mark Ryan McKinley wrote: Ok, not totally resolved. Things work fine when I have my custom Filter alone or with other Filters; however, if I add a query string to the mix it breaks with an IllegalStateException:
java.lang.IllegalStateException: Auto should be resolved before now
    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
    at org.apache.lucene.search.FieldSortedHitQueue.init(FieldSortedHitQueue.java:58)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
This is for the query: /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
bounds=XXX triggers my custom filter to kick in. Any thoughts on where to look? This error is new since upgrading the Lucene libs (in recent Solr). Thanks! ryan On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote: thanks! everything got better when I removed my logic to cache based on the index modification time. 
lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
This issue started on java-user, but I am moving it to solr-dev: http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception I am using solr trunk and building an RTree from stored document fields. This process worked fine until a recent change in 2.9 that has a different document id strategy than I was used to. In that thread, Yonik suggested: - pop back to the top level from the sub-reader, if you really need a single set - if a set-per-reader will work, then cache per segment (better for incremental updates anyway) I'm not quite sure what you mean by a set-per-reader. Previously I was building a single RTree and using it until the last modified time had changed. This avoided rebuilding any time a new reader was opened and the index had not changed. I'm fine building a new RTree for each reader if that is required. Is there any existing code that deals with this situation? - - - - Yonik also suggested: Relatively new in 2.9, you can pass null to enumerate over all non-deleted docs: TermDocs td = reader.termDocs(null); It would probably be a lot faster to iterate over indexed values though. If I iterate over indexed values (from the FieldCache, I presume), then how do I get access to the document id? - - - - - - thanks for any pointers. ryan
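The doc-id change the thread is wrestling with comes down to this: in Lucene 2.9 search works per segment, so ids handed to a filter are relative to the sub-reader, while a structure built from the top-level MultiReader uses global ids. Translating between the two is just an offset: the global id is the segment-relative id plus the sum of maxDoc() of the preceding segments. A minimal self-contained sketch of that arithmetic (plain Java, no Lucene classes; the SegmentOffsets name is mine, not from the thread):

```java
// Sketch: mapping segment-relative doc ids to top-level (MultiReader) doc ids.
// docBase[i] is the cumulative maxDoc of segments 0..i-1, i.e. the offset
// that Lucene's MultiReader effectively adds for segment i.
public class SegmentOffsets {
    private final int[] docBase;

    public SegmentOffsets(int[] maxDocPerSegment) {
        docBase = new int[maxDocPerSegment.length];
        int base = 0;
        for (int i = 0; i < maxDocPerSegment.length; i++) {
            docBase[i] = base;          // offset of this segment
            base += maxDocPerSegment[i]; // accumulate for the next one
        }
    }

    /** Translate a segment-relative doc id to a top-level doc id. */
    public int toTopLevel(int segment, int segmentDocId) {
        return docBase[segment] + segmentDocId;
    }

    public static void main(String[] args) {
        // Three segments holding 10, 5, and 20 docs respectively.
        SegmentOffsets offsets = new SegmentOffsets(new int[] {10, 5, 20});
        System.out.println(offsets.toTopLevel(0, 3)); // 3
        System.out.println(offsets.toTopLevel(1, 2)); // 12
        System.out.println(offsets.toTopLevel(2, 0)); // 15
    }
}
```

This is the same translation FileFloatSource does when it caches against the top-level reader; going the other direction (global to segment-relative) is a subtraction after locating the right segment.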
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley ryan...@gmail.com wrote: This issue started on java-user, but I am moving it to solr-dev: http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception I am using solr trunk and building an RTree from stored document fields. This process worked fine until a recent change in 2.9 that has a different document id strategy than I was used to. In that thread, Yonik suggested: - pop back to the top level from the sub-reader, if you really need a single set - if a set-per-reader will work, then cache per segment (better for incremental updates anyway) I'm not quite sure what you mean by a set-per-reader. I meant RTree per reader (per segment reader). Previously I was building a single RTree and using it until the last modified time had changed. This avoided rebuilding any time a new reader was opened and the index had not changed. I *think* that our use of re-open will return the same IndexReader instance if nothing has changed... so you shouldn't have to try and do that yourself. I'm fine building a new RTree for each reader if that is required. If that works just as well, it will put you in a better position for faster incremental updates... new RTrees will be built only for those segments that have changed. Is there any existing code that deals with this situation? To cache an RTree per reader, you could use the same logic as FieldCache uses... a weak map with the reader as the key. If a single top-level RTree that covers the entire index works better for you, then you can cache the RTree based on the top-level multi reader and translate the ids... that was my fix for ExternalFileField. See FileFloatSource.getValues() for the implementation. - - - - Yonik also suggested: Relatively new in 2.9, you can pass null to enumerate over all non-deleted docs: TermDocs td = reader.termDocs(null); It would probably be a lot faster to iterate over indexed values though. 
If I iterate over indexed values (from the FieldCache, I presume), then how do I get access to the document id? IndexReader.terms(Term t) returns a TermEnum that can iterate over terms, starting at t. IndexReader.termDocs(Term t or TermEnum te) will give you the list of documents that match a term. -Yonik
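The "same logic as FieldCache" that Yonik points to is a weak map keyed on the reader: the cached structure is built once per (segment) reader and is dropped automatically once the reader itself is garbage collected, so unchanged segments keep their cache across reopens. A minimal self-contained sketch of the pattern in plain Java (the PerReaderCache/Builder names are mine; in real code the key would be the SegmentReader and the value the RTree):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of FieldCache-style per-reader caching: a synchronized WeakHashMap
// keyed on the reader object. Entries vanish when the reader is collected.
public class PerReaderCache<Reader, Value> {
    private final Map<Reader, Value> cache =
        Collections.synchronizedMap(new WeakHashMap<Reader, Value>());

    public interface Builder<Reader, Value> {
        Value build(Reader reader);
    }

    /** Return the cached value for this reader, building it on first use. */
    public Value get(Reader reader, Builder<Reader, Value> builder) {
        Value value = cache.get(reader);
        if (value == null) {
            value = builder.build(reader); // built once per segment reader
            cache.put(reader, value);
        }
        return value;
    }

    public static void main(String[] args) {
        PerReaderCache<Object, String> cache = new PerReaderCache<Object, String>();
        Object reader = new Object(); // stand-in for a SegmentReader
        String first = cache.get(reader, r -> "rtree-for-" + System.identityHashCode(r));
        String second = cache.get(reader, r -> "never-built");
        System.out.println(first.equals(second)); // true: second lookup hits the cache
    }
}
```

Note the weak reference is on the key, not the value, which is why the reader object itself must be the key; keying on something derived from it (a path, a version number) would defeat the automatic cleanup.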
Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
thanks! everything got better when I removed my logic to cache based on the index modification time.