Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Mark Miller

I just committed a fix Ryan - should work with upgraded Lucene jars.

- Mark

--
- Mark

http://www.lucidimagination.com




Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Ryan McKinley

thanks Mark!

how far is lucene /trunk from what is currently in solr?

Is it something we should consider upgrading?


On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:


I just committed a fix Ryan - should work with upgraded Lucene jars.

- Mark





Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Mark Miller
I think Shalin upgraded the jars this morning, so I'd just grab them 
again real quick.


4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228

Ryan McKinley wrote:

thanks Mark!

how far is lucene /trunk from what is currently in solr?

Is it something we should consider upgrading?



--
- Mark

http://www.lucidimagination.com





Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Shalin Shekhar Mangar
Yes, I upgraded the lucene jars a few hours ago for trie api updates. Do you
want me to upgrade them again?

On Fri, Apr 24, 2009 at 7:51 PM, Mark Miller markrmil...@gmail.com wrote:

 I think Shalin upgraded the jars this morning, so I'd just grab them again
 real quick.

 4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228



Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Ryan McKinley

Yes, that would be great! The changes we need are in rev 768275:
http://svn.apache.org/viewvc?view=rev&revision=768275

thanks



On Apr 24, 2009, at 11:23 AM, Shalin Shekhar Mangar wrote:

Yes, I upgraded the lucene jars a few hours ago for trie api updates. Do you want me to upgrade them again?

Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-24 Thread Shalin Shekhar Mangar
On Fri, Apr 24, 2009 at 9:07 PM, Ryan McKinley ryan...@gmail.com wrote:

 Yes, that would be great! The changes we need are in rev 768275:
 http://svn.apache.org/viewvc?view=rev&revision=768275


Done. I upgraded to r768336.

-- 
Regards,
Shalin Shekhar Mangar.


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-23 Thread Ryan McKinley

Ok, not totally resolved

Things work fine when I have my custom Filter alone or with other Filters; however, if I add a query string to the mix it breaks with an IllegalStateException:


java.lang.IllegalStateException: Auto should be resolved before now
	at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
	at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
	at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)


This is for a query:
  /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN

bounds=XXX triggers my custom filter to kick in.

Any thoughts where to look?  This error is new since upgrading the  
lucene libs (in recent solr)


Thanks!
ryan








Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-23 Thread Mark Miller
Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look.


- Mark

--
- Mark

http://www.lucidimagination.com





Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-23 Thread Ryan McKinley

thanks!


On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:

Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look.

- Mark







lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-20 Thread Ryan McKinley

This issue started on java-user, but I am moving it to solr-dev:
http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception

I am using solr trunk and building an RTree from stored document fields. This process worked fine until a recent change in 2.9 that has a different document id strategy than I was used to.


In that thread, Yonik suggested:
- pop back to the top level from the sub-reader, if you really need a  
single set

- if a set-per-reader will work, then cache per segment (better for
incremental updates anyway)

I'm not quite sure what you mean by a set-per-reader. Previously I was building a single RTree and using it until the last modified time had changed. This avoided rebuilding it anytime a new reader was opened and the index had not changed. I'm fine building a new RTree for each reader if that is required.


Is there any existing code that deals with this situation?

- - - -

Yonik also suggested:

  Relatively new in 2.9, you can pass null to enumerate over all non-deleted docs:

  TermDocs td = reader.termDocs(null);

  It would probably be a lot faster to iterate over indexed values though.
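
For concreteness, a minimal sketch of that enumeration (Lucene 2.9-era API; the per-doc work is a placeholder for whatever you feed the RTree):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.TermDocs;

  // Passing null enumerates every non-deleted doc in the reader.
  void visitAllDocs(IndexReader reader) throws java.io.IOException {
    TermDocs td = reader.termDocs(null);
    try {
      while (td.next()) {
        int docId = td.doc(); // id is relative to this reader
        // e.g. load stored fields for docId and add them to the RTree
      }
    } finally {
      td.close();
    }
  }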


If I iterate over indexed values (from the FieldCache I presume) then how do I get access to the document id?


- - - -

thanks for any pointers.

ryan


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-20 Thread Yonik Seeley
On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley ryan...@gmail.com wrote:
 This issue started on java-user, but I am moving it to solr-dev:
 http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception

 I am using solr trunk and building an RTree from stored document fields.
  This process worked fine until a recent change in 2.9 that has a different
 document id strategy than I was used to.

 In that thread, Yonik suggested:
 - pop back to the top level from the sub-reader, if you really need a single
 set
 - if a set-per-reader will work, then cache per segment (better for
 incremental updates anyway)

 I'm not quite sure what you mean by a set-per-reader.

I meant RTree per reader (per segment reader).

  Previously I was
 building a single RTree and using it until the last modified time had
 changed.  This avoided building an index anytime a new reader was opened and
 the index had not changed.

I *think* that our use of re-open will return the same IndexReader
instance if nothing has changed... so you shouldn't have to try and do
that yourself.
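
A sketch of the reopen idiom that relies on that behavior (assuming the caller owns both readers):

  import org.apache.lucene.index.IndexReader;

  static IndexReader refresh(IndexReader reader) throws java.io.IOException {
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
      reader.close(); // index changed; we still have to close the old reader
    }
    return newReader; // same instance when nothing changed, so caches stay valid
  }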

 I'm fine building a new RTree for each reader if
 that is required.

If that works just as well, it will put you in a better position for
faster incremental updates... new RTrees will be built only for those
segments that have changed.

 Is there any existing code that deals with this situation?

To cache an RTree per reader, you could use the same logic as
FieldCache uses... a weak map with the reader as the key.
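
As a sketch of that FieldCache-style pattern (RTree is a placeholder for your own structure; the weak keys let an entry disappear once its segment reader is closed and collected):

  import java.util.Map;
  import java.util.WeakHashMap;
  import org.apache.lucene.index.IndexReader;

  class RTreeCache {
    static class RTree {} // stand-in for the real spatial structure

    // Weak keys: the entry goes away when the reader is GC'd.
    private static final Map<IndexReader, RTree> cache =
        new WeakHashMap<IndexReader, RTree>();

    static synchronized RTree get(IndexReader reader) throws java.io.IOException {
      RTree tree = cache.get(reader);
      if (tree == null) {
        tree = build(reader); // build from this segment only
        cache.put(reader, tree);
      }
      return tree;
    }

    private static RTree build(IndexReader reader) throws java.io.IOException {
      return new RTree(); // walk this segment's docs/terms here
    }
  }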

If a single top-level RTree that covers the entire index works better
for you, then you can cache the RTree based on the top level multi
reader and translate the ids... that was my fix for ExternalFileField.
 See FileFloatSource.getValues() for the implementation.
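
The id translation itself is just a per-segment offset; a sketch, assuming a 2.9-style top-level reader that exposes its sequential sub-readers:

  import org.apache.lucene.index.IndexReader;

  // base[i] is the top-level id of segment i's first document.
  static int[] docBases(IndexReader top) {
    IndexReader[] subs = top.getSequentialSubReaders();
    int[] base = new int[subs.length];
    for (int i = 1; i < subs.length; i++) {
      base[i] = base[i - 1] + subs[i - 1].maxDoc();
    }
    return base;
  }

  // top-level id = the segment's base + the id local to that segment
  static int toTopLevel(int[] base, int segment, int localId) {
    return base[segment] + localId;
  }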


 - - - -

 Yonik also suggested:

  Relatively new in 2.9, you can pass null to enumerate over all non-deleted
 docs:
  TermDocs td = reader.termDocs(null);

  It would probably be a lot faster to iterate over indexed values though.

 If I iterate over indexed values (from the FieldCache I presume) then how do I
 get access to the document id?

IndexReader.terms(Term t) returns a TermEnum that can iterate over
terms, starting at t.
IndexReader.termDocs(Term t or TermEnum te) will give you the list of
documents that match a term.
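
A sketch of that loop over one field's indexed values ("bounds" is a stand-in field name, and the per-value work is a placeholder):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;
  import org.apache.lucene.index.TermEnum;

  void visitFieldValues(IndexReader reader) throws java.io.IOException {
    TermEnum te = reader.terms(new Term("bounds", "")); // first term of the field
    TermDocs td = reader.termDocs();
    try {
      do {
        Term t = te.term();
        if (t == null || !"bounds".equals(t.field())) break; // ran past the field
        td.seek(te); // position the doc iterator on the current term
        while (td.next()) {
          int docId = td.doc();    // id relative to this reader
          String value = t.text(); // the indexed value
          // e.g. parse value and insert (value, docId) into the RTree
        }
      } while (te.next());
    } finally {
      te.close();
      td.close();
    }
  }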


-Yonik


Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

2009-04-20 Thread Ryan McKinley

thanks!

everything got better when I removed my logic to cache based on the  
index modification time.


