Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Mark Miller

I just committed a fix Ryan - should work with upgraded Lucene jars.

- Mark

--
- Mark

http://www.lucidimagination.com




Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Ryan McKinley

thanks Mark!

how far is lucene /trunk from what is currently in solr?

Is it something we should consider upgrading?


On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:


I just committed a fix Ryan - should work with upgraded Lucene jars.

- Mark





Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Mark Miller
I think Shalin upgraded the jars this morning, so I'd just grab them 
again real quick.


4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228

Ryan McKinley wrote:

thanks Mark!

how far is lucene /trunk from what is currently in solr?

Is it something we should consider upgrading?



--
- Mark

http://www.lucidimagination.com





Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Shalin Shekhar Mangar
Yes, I upgraded the lucene jars a few hours ago for trie api updates. Do you
want me to upgrade them again?

On Fri, Apr 24, 2009 at 7:51 PM, Mark Miller markrmil...@gmail.com wrote:

 I think Shalin upgraded the jars this morning, so I'd just grab them again
 real quick.

 4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228



Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Ryan McKinley

Yes, that would be great! The changes we need are in rev 768275:
http://svn.apache.org/viewvc?view=rev&revision=768275

thanks



On Apr 24, 2009, at 11:23 AM, Shalin Shekhar Mangar wrote:

Yes, I upgraded the lucene jars a few hours ago for trie api updates. Do you want me to upgrade them again?

Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Shalin Shekhar Mangar
On Fri, Apr 24, 2009 at 9:07 PM, Ryan McKinley ryan...@gmail.com wrote:

 Yes, that would be great! The changes we need are in rev 768275:
 http://svn.apache.org/viewvc?view=rev&revision=768275


Done. I upgraded to r768336.

-- 
Regards,
Shalin Shekhar Mangar.


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-23 Thread Ryan McKinley

Ok, not totally resolved

Things work fine when I have my custom Filter alone or with other Filters; however, if I add a query string to the mix it breaks with an IllegalStateException:


java.lang.IllegalStateException: Auto should be resolved before now
	at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
	at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
	at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)


This is for a query:
  /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN

bounds=XXX triggers my custom filter to kick in.

Any thoughts where to look?  This error is new since upgrading the  
lucene libs (in recent solr)


Thanks!
ryan








Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-23 Thread Mark Miller
Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look.


- Mark

--
- Mark

http://www.lucidimagination.com





Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-23 Thread Ryan McKinley

thanks!


On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:

Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look.

- Mark







lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-20 Thread Ryan McKinley

This issue started on java-user, but I am moving it to solr-dev:
http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception

I am using solr trunk and building an RTree from stored document fields. This process worked fine until a recent change in 2.9 that has a different document id strategy than I was used to.


In that thread, Yonik suggested:
- pop back to the top level from the sub-reader, if you really need a  
single set

- if a set-per-reader will work, then cache per segment (better for
incremental updates anyway)

I'm not quite sure what you mean by a set-per-reader. Previously I was building a single RTree and using it until the last modified time had changed. This avoided rebuilding it anytime a new reader was opened and the index had not changed. I'm fine building a new RTree for each reader if that is required.


Is there any existing code that deals with this situation?

- - - -

Yonik also suggested:

  Relatively new in 2.9, you can pass null to enumerate over all non-deleted docs:

  TermDocs td = reader.termDocs(null);

  It would probably be a lot faster to iterate over indexed values though.
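
For concreteness, a minimal sketch of that enumeration (Lucene 2.9-era API; the per-doc work is a placeholder for whatever you feed the RTree):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.TermDocs;

  // Passing null enumerates every non-deleted doc in the reader.
  void visitAllDocs(IndexReader reader) throws java.io.IOException {
    TermDocs td = reader.termDocs(null);
    try {
      while (td.next()) {
        int docId = td.doc(); // id is relative to this reader
        // e.g. load stored fields for docId and add them to the RTree
      }
    } finally {
      td.close();
    }
  }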


If I iterate over indexed values (from the FieldCache I presume) then how do I get access to the document id?


- - - -

thanks for any pointers.

ryan


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-20 Thread Yonik Seeley
On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley ryan...@gmail.com wrote:
 This issue started on java-user, but I am moving it to solr-dev:
 http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception

 I am using solr trunk and building an RTree from stored document fields.
  This process worked fine until a recent change in 2.9 that has a different
 document id strategy than I was used to.

 In that thread, Yonik suggested:
 - pop back to the top level from the sub-reader, if you really need a single
 set
 - if a set-per-reader will work, then cache per segment (better for
 incremental updates anyway)

 I'm not quite sure what you mean by a set-per-reader.

I meant RTree per reader (per segment reader).

  Previously I was
 building a single RTree and using it until the last modified time had
 changed.  This avoided building an index anytime a new reader was opened and
 the index had not changed.

I *think* that our use of re-open will return the same IndexReader
instance if nothing has changed... so you shouldn't have to try and do
that yourself.
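
A sketch of the reopen idiom that relies on that behavior (assuming the caller owns both readers):

  import org.apache.lucene.index.IndexReader;

  static IndexReader refresh(IndexReader reader) throws java.io.IOException {
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
      reader.close(); // index changed; we still have to close the old reader
    }
    return newReader; // same instance when nothing changed, so caches stay valid
  }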

 I'm fine building a new RTree for each reader if
 that is required.

If that works just as well, it will put you in a better position for
faster incremental updates... new RTrees will be built only for those
segments that have changed.

 Is there any existing code that deals with this situation?

To cache an RTree per reader, you could use the same logic as
FieldCache uses... a weak map with the reader as the key.
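
As a sketch of that FieldCache-style pattern (RTree is a placeholder for your own structure; the weak keys let an entry disappear once its segment reader is closed and collected):

  import java.util.Map;
  import java.util.WeakHashMap;
  import org.apache.lucene.index.IndexReader;

  class RTreeCache {
    static class RTree {} // stand-in for the real spatial structure

    // Weak keys: the entry goes away when the reader is GC'd.
    private static final Map<IndexReader, RTree> cache =
        new WeakHashMap<IndexReader, RTree>();

    static synchronized RTree get(IndexReader reader) throws java.io.IOException {
      RTree tree = cache.get(reader);
      if (tree == null) {
        tree = build(reader); // build from this segment only
        cache.put(reader, tree);
      }
      return tree;
    }

    private static RTree build(IndexReader reader) throws java.io.IOException {
      return new RTree(); // walk this segment's docs/terms here
    }
  }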

If a single top-level RTree that covers the entire index works better
for you, then you can cache the RTree based on the top level multi
reader and translate the ids... that was my fix for ExternalFileField.
 See FileFloatSource.getValues() for the implementation.
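
The id translation itself is just a per-segment offset; a sketch, assuming a 2.9-style top-level reader that exposes its sequential sub-readers:

  import org.apache.lucene.index.IndexReader;

  // base[i] is the top-level id of segment i's first document.
  static int[] docBases(IndexReader top) {
    IndexReader[] subs = top.getSequentialSubReaders();
    int[] base = new int[subs.length];
    for (int i = 1; i < subs.length; i++) {
      base[i] = base[i - 1] + subs[i - 1].maxDoc();
    }
    return base;
  }

  // top-level id = the segment's base + the id local to that segment
  static int toTopLevel(int[] base, int segment, int localId) {
    return base[segment] + localId;
  }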


 - - - -

 Yonik also suggested:

  Relatively new in 2.9, you can pass null to enumerate over all non-deleted
 docs:
  TermDocs td = reader.termDocs(null);

  It would probably be a lot faster to iterate over indexed values though.

 If I iterate over indexed values (from the FieldCache I presume) then how do I
 get access to the document id?

IndexReader.terms(Term t) returns a TermEnum that can iterate over
terms, starting at t.
IndexReader.termDocs(Term t or TermEnum te) will give you the list of
documents that match a term.
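
A sketch of that loop over one field's indexed values ("bounds" is a stand-in field name, and the per-value work is a placeholder):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;
  import org.apache.lucene.index.TermEnum;

  void visitFieldValues(IndexReader reader) throws java.io.IOException {
    TermEnum te = reader.terms(new Term("bounds", "")); // first term of the field
    TermDocs td = reader.termDocs();
    try {
      do {
        Term t = te.term();
        if (t == null || !"bounds".equals(t.field())) break; // ran past the field
        td.seek(te); // position the doc iterator on the current term
        while (td.next()) {
          int docId = td.doc();    // id relative to this reader
          String value = t.text(); // the indexed value
          // e.g. parse value and insert (value, docId) into the RTree
        }
      } while (te.next());
    } finally {
      te.close();
      td.close();
    }
  }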


-Yonik


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-20 Thread Ryan McKinley

thanks!

everything got better when I removed my logic to cache based on the  
index modification time.


