[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-05-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708307#action_12708307
 ] 

Mark Miller commented on LUCENE-831:


I likely won't be getting to this anytime soon, so if someone else wants to 
work on it, go ahead. If not, I'll get back to it at some point.

I believe the latest patch is a nice base to work from.

It's still not clear to me whether it's best to start merging using the 
ValueSource somehow, or to do something where the ValueSource has a merge 
implementation (allowing for a more efficient private merge). It seems the 
merge code for fields, norms, dels is fairly specialized now, but could become 
a bit more generic. Then perhaps you could add any old ValueSource (other than 
norms, fields, dels) and easily hook into the merge process. Maybe even in-RAM 
merges of RAM-based ValueSources: FieldCache etc. Of course, I guess you could 
also still do things specialized as now, and just provide access to the files 
through a ValueSource. That really crimps the pluggability though.

The next step (in terms of the current patch) seems to be to start working 
ValueSource into norms, dels, possibly stored fields. Eventually they should 
become pluggable, but I'm not sure how best to plug them in. I was thinking you 
could set a default ValueSource by field for the FieldCache using the Reader 
open method with a new param. Perhaps it should take a ValueSourceFactory that 
can provide a variety of ValueSources based on field, norms, dels, stored 
fields, with variations for read-only? The proposed componentization of 
IndexReader could be another approach if it materializes, or it could be 
worked into this issue.

I don't think I'll understand what's needed for updatability until I'm in 
deeper. It almost seems like something like setInt(int doc, int n), setByte(int 
doc, byte b) on the ValueSource might work. They could possibly throw 
Unsupported. I know there are a lot of little difficulties involved in all of 
this though, so I'm not very sure of anything at the moment. The backing impl 
would be free to update in RAM (say synced dels), or do a copy on write, etc. I 
guess all methods would throw Unsupported by default, but if you override a 
getXXX you would have the option of overriding a setXXX.
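
A minimal sketch of the "throw Unsupported by default" idea, assuming a plain abstract class. Only setInt(int doc, int n) and setByte(int doc, byte b) come from the comment above; the class names SketchValueSource and InMemoryIntSource are hypothetical and are not taken from the patch.

```java
// Hypothetical base class: every accessor and mutator throws unless overridden.
abstract class SketchValueSource {
    public int getInt(int doc) {
        throw new UnsupportedOperationException("getInt not supported");
    }

    public byte getByte(int doc) {
        throw new UnsupportedOperationException("getByte not supported");
    }

    // Setters throw by default too, so read-only sources need no extra code.
    public void setInt(int doc, int n) {
        throw new UnsupportedOperationException("setInt not supported");
    }

    public void setByte(int doc, byte b) {
        throw new UnsupportedOperationException("setByte not supported");
    }
}

// A RAM-backed int source that opts in to getInt and, optionally, setInt.
final class InMemoryIntSource extends SketchValueSource {
    private final int[] values;

    InMemoryIntSource(int maxDoc) {
        this.values = new int[maxDoc];
    }

    @Override
    public int getInt(int doc) {
        return values[doc];
    }

    @Override
    public void setInt(int doc, int n) {
        values[doc] = n; // updates in RAM; a copy-on-write impl could go here instead
    }
}
```

A caller that asks for the wrong type (say setByte on an int source) gets an UnsupportedOperationException, which is the "you have to know the right one to call" cost of this design.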

ValueSources also need the ability to be sharable across IndexReaders with the 
ability to do copy on write if they are shared and updatable.



 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
Assignee: Mark Miller
 Fix For: 3.0

 Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
 fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
 LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, 
 LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch


 Motivation:
 1) Completely overhaul the API/implementation of FieldCache type things...
 a) eliminate global static map keyed on IndexReader (thus
 eliminating synch block between completely independent IndexReaders)
 b) allow more customization of cache management (ie: use 
 expiration/replacement strategies, disk backed caches, etc)
 c) allow people to define custom cache data logic (ie: custom
 parsers, complex datatypes, etc... anything tied to a reader)
 d) allow people to inspect what's in a cache (list of CacheKeys) for
 an IndexReader so a new IndexReader can be likewise warmed. 
 e) Lend support for smarter cache management if/when
 IndexReader.reopen is added (merging of cached data from subReaders).
 2) Provide backwards compatibility to support existing FieldCache API with
 the new implementation, so there is no redundant caching as client code
 migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702526#action_12702526
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
Grandma! But yeah we need to somehow support probably plain Java
objects rather than every primitive derivative?
{quote}

You mean big arrays (one per doc) of plain-java-objects?  Is Bobo doing that 
today?  Or do you mean a single Java object that, internally, deals with lookup 
by docID?

{quote}
(In reference to Mark's post 2nd to last post) Bobo efficiently
nicely calculates facets for multiple values per doc which is
the same thing as multi value faceting?
{quote}

Neat.  How do you compactly represent (in RAM) multiple values per doc?

{quote}
Are norms and deletes implemented? These would just be byte
arrays in the current approach? If not how would they be
represented? It seems like for deleted docs we'd want the
BitVector returned from a ValueSource.get type of method?
{quote}

The current patch doesn't do this -- but we should think about how this change 
could absorb norms/deleted docs, in the future.  We would add a bit variant 
of getXXX (eg that returns BitVector, BitSet, something).

{quote}
Hmm... Does this mean we'd replace the current IndexReader
method of performing updates on norms and deletes with this more
generic update mechanism?
{quote}

Probably we'd still leave the sugar APIs in place, but under the hood their 
impls would be switched to this.

bq. It would be cool to get CSF going?

Most definitely!!




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701751#action_12701751
 ] 

Jason Rutherglen commented on LUCENE-831:
-

I'm trying to figure out how to integrate Bobo faceting field
caches with this patch. I applied the patch, browsed the
ValueSource API and yeah, it's not what I expected.

bq. we can return arrays, objects, or anything and your grandmother

Not Grandma! But yeah we need to somehow support probably plain Java
objects rather than every primitive derivative? 

(In reference to Mark's post 2nd to last post) Bobo efficiently
nicely calculates facets for multiple values per doc which is
the same thing as multi value faceting? 

bq. Something makes me think we will still be a bit hindered by back compat with deletes, norms though.

Are norms and deletes implemented? These would just be byte
arrays in the current approach? If not how would they be
represented? It seems like for deleted docs we'd want the
BitVector returned from a ValueSource.get type of method?

M.M.: Updatability is tricky... ValueSource would maybe need a
startChanges() API, which would copy the array (copy-on-write)
if it's not already private

Hmm... Does this mean we'd replace the current IndexReader
method of performing updates on norms and deletes with this more
generic update mechanism?

It would be cool to get CSF going?




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700571#action_12700571
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. I was also thinking that some of these issues could force back up to 
multi-reader support though. 

Hopefully not...

bq. I want field handling to become easier in Lucene, but I hope we don't lose 
any of our super on the fly settings. +1 on making field handling easier, but I 
am much more wary of a fixed schema type thing.

I think consolidating per-field details (FieldType) is well decoupled from 
forcing every occurrence of a field to be the same (fixed schema).  We can 
(and I think should) do FieldType without forcing a fixed schema.

bq. I am very interested in having updatable CSF's (much too easy to mistype 
that). There are many cool things to use it for, especially in combination with 
near realtime search (tagging variations).

For tags we'd presumably want multi-valued fields handled in ValueSource, plus 
updatability, plus NRT.

Updatability is tricky... ValueSource would maybe need a startChanges() API, 
which would copy the array (copy-on-write) if it's not already private.  The 
problem is... direct array access precludes more efficient data structures that 
amortize the copy-on-write cost (eg by block), which we are wanting to 
eventually get to for deleted docs and norms (it's likely a large cost in NRT 
reader turnaround, though I haven't yet measured just how costly).
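
A minimal sketch of the startChanges() idea, assuming the simplest whole-array copy-on-write (not the block-based structure mentioned above). The class CowByteValues, its share() method and the isPrivate flag are hypothetical names for illustration, and the sketch is not thread-safe.

```java
// Hypothetical holder for one byte per doc (e.g. norms) with whole-array copy-on-write.
final class CowByteValues {
    private byte[] values;
    private boolean isPrivate; // false while the array may be shared with another holder

    CowByteValues(byte[] shared) {
        this.values = shared;
        this.isPrivate = false; // assume the caller may still hold the array
    }

    // Hands the same backing array to a second holder, e.g. for a reopened reader.
    CowByteValues share() {
        isPrivate = false;
        return new CowByteValues(values);
    }

    // Copies the array once, the first time this holder is about to change it.
    void startChanges() {
        if (!isPrivate) {
            values = values.clone();
            isPrivate = true;
        }
    }

    byte get(int doc) {
        return values[doc];
    }

    void set(int doc, byte b) {
        startChanges();
        values[doc] = b;
    }
}
```

The cost the comment worries about is visible here: the first set() after a share copies the entire array, however few docs change. A block-based layout would copy only the touched block, but callers could then no longer be handed a single raw array.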




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700583#action_12700583
 ] 

Mark Miller commented on LUCENE-831:


bq. I was also thinking that some of these issues could force back up to 
multi-reader support though. 

bq. Hopefully not...

Yes, I don't know enough yet to know for sure. My thought was that things like 
norms and deletes that are available from multireader now will have to either 
still be, or straddle multi/segment for a while. I guess that doesn't become 
much of an issue if we go with the same method of "just don't load from both 
single and multi or you will double your reqs"? It just gets ugly trying to 
prevent multireader use with valuesource, but then having to support it due to 
all the back compat reqs.

bq. We can (and I think should) do FieldType without forcing a fixed schema.

Fair enough, fair enough. I wasn't really taking this completely from this 
discussion, but from a variety of ideas about fields that have been spilling 
out on the list. Of course we can still get a lot better (easier) without 
hitting fixed.

bq. For tags we'd presumably want multi-valued fields handled in ValueSource, 
plus updatability, plus NRT.

Well I'm glad it's a small order. Yonik did do some multi value faceting work 
that I never really looked at. I'll go dig it up.

It may just be best if this sits for a while and we see what happens with a 
couple other issues floating around it. I said I had sweat to pump into this, 
not intelligence ;) If we hit all this stuff (and yes, you're not saying we need 
to or should, but) this ends up touching most things in IndexReader, then 
possibly writing and merging and what not in IndexWriter (pluggable norms etc 
still need to be written, merged, loaded, etc), and ... 






[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700585#action_12700585
 ] 

Uwe Schindler commented on LUCENE-831:
--

I am still thinking about the difference between function query's ValueSource 
and the new ValueSource and I would really like to combine both.
I know for sorting, the array approach is faster, but maybe the new ValueSource 
could provide both ways to access. In the array approach, one would only get 
arrays for single segments, but the method-like access could still map the 
document ids to the correct segment, to have a uniform access even to multi 
readers.
So, maybe there is a possibility to merge both approaches and only provide one 
ValueSource supplying both access strategies.
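
A rough sketch of what "one ValueSource supplying both access strategies" could look like, assuming int values and one array per segment. MultiSegmentInts and its method names are hypothetical and are not part of either existing ValueSource.

```java
import java.util.Arrays;

// Hypothetical source that offers per-segment arrays and top-level doc id lookups.
final class MultiSegmentInts {
    private final int[][] segments; // one value array per segment
    private final int[] starts;     // starts[i] = first top-level doc id of segment i
    private final int maxDoc;

    MultiSegmentInts(int[][] segments) {
        this.segments = segments;
        this.starts = new int[segments.length];
        int base = 0;
        for (int i = 0; i < segments.length; i++) {
            starts[i] = base;
            base += segments[i].length;
        }
        this.maxDoc = base;
    }

    // Array-style access: the fast path for sorting within a single segment.
    int[] getInts(int segment) {
        return segments[segment];
    }

    // Method-style access: maps a top-level doc id to the segment that holds it.
    int getInt(int doc) {
        if (doc < 0 || doc >= maxDoc) {
            throw new IndexOutOfBoundsException("doc " + doc);
        }
        int idx = Arrays.binarySearch(starts, doc);
        if (idx < 0) {
            idx = -idx - 2; // insertion point minus one: last segment starting before doc
        }
        // Skip empty segments, which share a start offset with the next segment.
        while (idx + 1 < starts.length && starts[idx + 1] <= doc) {
            idx++;
        }
        return segments[idx][doc - starts[idx]];
    }
}
```

The method-style path pays a binary search per lookup, which is why the array path would likely stay the preferred one for sorting.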




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700586#action_12700586
 ] 

Mark Miller commented on LUCENE-831:


bq. but maybe the new ValueSource could provide both ways to access

Yeah, this goes with what Mike pointed out above: we can return arrays, 
objects, or anything and your grandmother. My main worry with that idea is the 
ValueSource API: it could have tens of accessors, but only 1 or 2 are 
generally implemented and you have to know the right one to call. It could 
work of course, but on first thought, it's fairly ugly. You could make a fair 
point that we are already a ways down that path with the design we already 
have, I guess.

bq. So, maybe there is a possibility to merge both approaches and only provide 
one ValueSource supplying both access strategies. 

It's a good point. Something makes me think we will still be a bit hindered by 
back compat with deletes, norms though.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700430#action_12700430
 ] 

Earwin Burrfoot commented on LUCENE-831:


{quote}
Allowing values to change, just like we can call
IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
copy-on-write approach, like norms and deleted docs.
{quote}
On the other hand, maybe, we shouldn't?
Deleted docs should definitely be mutable, but that's it.
Anybody is updating norms on a regular basis on a serious project? But still 
everyone pays for the feature with running ugly synchronization code for norms. 
Let's dump it!
As for mutable fields, okay, users of Sphinx have them. They use them mostly.. 
hehehe.. for implementing deletions that Sphinx lacks. I bet there could exist 
some other usecases, but they can be handled with a custom ValueSource without 
the need to bring it into API everyone must implement.

{quote}
Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
deleted docs one wanted.
{quote}
That's why I think this is a misfeature. Deleted docs have different meaning 
from field values. They can be updated, and they should be checked against 
uberfast.
Swapping in another impl is cool, while forcing everyone and his dog under the 
same usage API is not so cool.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700479#action_12700479
 ] 

Mark Miller commented on LUCENE-831:


{quote}Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
deleted docs one wanted.{quote}

Some of your comments seem to indicate you think we will need to end up with an 
object rather than raw arrays?




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700523#action_12700523
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Some of your comments seem to indicate you think we will need to end up 
with an object rather than raw arrays?

Well, really I threw out all these future items to stir up the pot and
see if some clarity comes out of it ;) This is what I try to do
whenever I'm stuck on how to design something... some sort of defense
mechanism.

That said, what requires object instead of array?  EG for binary
fields (deleted docs) we'd have eg BitVector getBits(...).
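
A minimal sketch of a bit-valued source, assuming java.util.BitSet as a stand-in for Lucene's BitVector. DeletedDocsSketch and countLive are hypothetical names; only the getBits idea comes from the comment.

```java
import java.util.BitSet;

// Hypothetical one-bit-per-doc source: a set bit means the doc is deleted.
final class DeletedDocsSketch {
    private final BitSet deleted;
    private final int maxDoc;

    DeletedDocsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
        this.deleted = new BitSet(maxDoc);
    }

    void delete(int doc) {
        deleted.set(doc);
    }

    // The "bit variant" of getXXX: hands back the whole structure, like an array getter.
    BitSet getBits() {
        return deleted;
    }

    // Example consumer: counts docs whose bit is clear.
    int countLive() {
        return maxDoc - deleted.cardinality();
    }
}
```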

For multi-valued fields, I'm not sure what's best.  I think Yonik did
something neat with Solr for holding multi-valued fields but I can't
find it now.  But, with ValueSource, we have the freedom to use arrays
for simple cases and something else for interesting ones?  It's not
either/or?

bq. And we would want to lose exposing Parser so that CFS can be a seamless 
backing. 

I see the CFS/CSF confusion has already struck!

But yes cleaner API would be a nice step forward...

bq. We have it? Just pass the CSFValueSource at IndexReader creation?

Yes I think we have this one.

Though... I feel like ValueSource should represent a single field's
values, and something else (FieldType?) returns the ValueSource for
that field.  Ie, I think we are overloading ValueSource now?

bq. Good point. We need a way to update, that can throw USO Exception?

Maybe... or we can defer for future.  We don't need full answers nor
impls for all of these now...

{quote}
 Possible future when Lucene computes sort cache (for text fields)
 and stores in the index

I'm not familiar with that idea, so not sure what effect this has...
{quote}

Sort cache is just getStringIndex()... all other types just use the
values directly (no need for separate ords).  If it's costly to
compute per-reopen we may want to store it in the index.  But
honestly, since we load the full thing into RAM, I wonder how
different the time'd really be loading it vs recomputing it.
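
To make the "separate ords" point concrete, here is a minimal sketch of what a string sort cache holds: the sorted unique terms plus one ord per doc. StringIndexSketch and its constructor input are hypothetical stand-ins for illustration, not the patch's API, and slot 0 is reserved for "no value" as an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical stand-in for a string sort cache: sorted unique terms plus one ord per doc. */
final class StringIndexSketch {
    final String[] lookup; // lookup[0] is null (doc has no value); the rest are sorted unique terms
    final int[] order;     // order[doc] is an index into lookup

    StringIndexSketch(String[] valuePerDoc) {
        // Collect the unique terms in natural (sorted) order.
        TreeSet<String> unique = new TreeSet<>();
        for (String v : valuePerDoc) {
            if (v != null) unique.add(v);
        }
        lookup = new String[unique.size() + 1];
        int i = 1;
        for (String term : unique) lookup[i++] = term;

        order = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            String v = valuePerDoc[doc];
            // Binary search over the sorted slice [1, lookup.length); missing values keep ord 0.
            order[doc] = (v == null) ? 0 : Arrays.binarySearch(lookup, 1, lookup.length, v);
        }
    }

    /** Sorting two docs by this field needs only an int comparison, never a String comparison. */
    int compareDocs(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }
}
```

The expensive part is building order[] (one lookup per doc), which is the cost that would be paid again per reopen unless the result were stored in the index.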

bq. Good point again. Getting norms under this API will add a bit more meat to 
this issue.

Yeah I'm not sure whether norms/deleted docs fit; certainly we'd
need updatability first.  It's just that, from a distance, they are
clearly a value per doc for every doc in the index.  If we had norms
& deletions under this API then suddenly, [almost] for free, we'd get
pluggability of deleted docs & norms.

bq. I am kind of liking Uwe's idea of assigning ValueSources per field, though 
that could probably get messy. Perhaps a default, and then per field overrides?

I'm also more liking per field to be somehow handled.  Whether
IndexReader exposes that vs a FieldType (that also holds other
per-field stuff), I'm not sure.

bq. Is anybody updating norms on a regular basis on a serious project?

This is a good question -- I'd love to know too.

But I think updating CSFs would be compelling; having to reindex the
entire doc because only 1 or 2 metadata fields had changed is a common
annoyance.  Of course we'd have to figure out (or rule out) updating
the postings for such changes...



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700529#action_12700529
 ] 

Mark Miller commented on LUCENE-831:


{quote}But, with ValueSource, we have the freedom to use arrays
for simple cases and something else for interesting ones? It's not
either/or?{quote}

Good point. I was also thinking that some of these issues could force us back up 
to multi-reader support though. But I guess that is not such a worry now that 
we search per segment, is it? A lot of that could probably be deprecated (though 
I really don't know how easily - I hope to spend a lot more time getting more 
familiar with the IndexReader code).

{quote}[almost] for free, we'd get
pluggability of deleted docs & norms.{quote}
I like that idea as well. Pluggability is nice.

{quote}I'm also more liking per field to be somehow handled. Whether
IndexReader exposes that vs a FieldType (that also holds other
per-field stuff), I'm not sure.{quote}
I want field handling to become easier in Lucene, but I hope we don't lose any 
of our super on the fly settings. +1 on making field handling easier, but I am 
much more wary of a fixed schema type thing.

{quote}But I think updating CSFs would be compelling; having to reindex the
entire doc because only 1 or 2 metadata fields had changed is a common
annoyance. {quote}

I am very interested in having updatable CSFs (much too easy to mistype that). 
There are many cool things to use it for, especially in combination with near 
realtime search (tagging variations).




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700177#action_12700177
 ] 

Michael McCandless commented on LUCENE-831:
---

I've been struggling with the right way forward here... despite
following all comments and aggressive ongoing mulling, I still don't
have much clarity.

It feels like one of those features that just hasn't quite clicked
yet (to me at least).  In fact, the more I try to think about it, the
less clarity I get!

I think there're some concrete reasons to create a new API (some
overlap w/ Mark's list above):

  * Make caching external/public so you can control when things are
evicted

  * Cleaner API -- it's just awkward that you now must call a separate
place (ExtendedFieldCache.EXT_DEFAULT) to getInts.  FieldCache 
ExtendedFieldCache are awkward, and they are interfaces.  It makes
more sense to ask the reader directly for ints (or a future
component of the reader).

  * Better extensibility on uninversion (either via you make your own
ValueSource entirely, or you can subclass Uninverted and tweak
it).  Trie needs this (though, we have a viable approach in field
cache).  Fields with more than one value want custom control to
pick one.

  * Making it not-so-easy to get all field values at the reader level
(don't set dangerous API traps)

Honestly these reasons are not net/net compelling enough to warrant a
whole new API?  They are fairly minor.  And I agree: LUCENE-1483 has
already achieved the biggest step forward here.

Furthermore, there are other innovations happening that may affect how
we do this. EG LUCENE-1597 introduces type information for fields (at
least at indexing time), and Earwin is working on componentizing
SegmentReader.  Normally I don't like letting big distant future
feature X prevent progress on today's feature Y, but since we lack
clarity on Y...

I can imagine a future when the FieldType would be the central place
that records all details for a field:

  * The analyzer to use (so we don't need PerFieldAnalyzerWrapper)

  * The ValueSource

  * Its native type (now switched in many places, like
FieldComparator, SortField, FieldCache, etc.)

  * All the index-time configuration

And then instead of having ValueSource dispatch per field, we'd simply
ask the FieldType what its source is.

Finally, there are a number of future improvements we should take into
account.  We wouldn't try to accomplish these right now, but we ought
to think about them (eg, not preclude them) in whatever approach we
settle on:

  * We need source pluggability for when CSF arrives (but, admittedly,
we could wait until CSF actually does arrive)

  * Allowing values to change, just like we can call
IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
copy-on-write approach, like norms & deleted docs.

  * How would norms be folded into this?  Ideally, each field could
choose to pull its norms from any source.  Document level norms
was discussed somewhere, and should easily fit as another norms
source.  We'd need to relax how per-doc-field boosting is computed
at runtime to pull from such arbitrary sources.

  * Deleted docs could also be represented as a ValueSource?  Just one
bit per doc.  This way one could swap in whatever source for
deleted docs one wanted.

  * Allowing for docs that have more than one value.  (We'd also need
to extend sorting to be able to compare multiple values).

  * An mmap implementation (like Lucy/KS) -- should feel just like CSF
or uninversion (ie, just another impl).

  * Impls of getStrings and getStringIndex that are based on offsets
into char[] (not actual individual String object).

  * Good impls for the enum case (all strings could be considered
enums), eg if there are only 100 unique strings in that field, you
only need 7 bits per ord derefing into the char[] values.

  * Possible future when Lucene computes sort cache (for text fields)
and stores in the index

  * Allowing field sort to use an entirely external source of values

There's a lot to think about :)
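
As a worked example of the enum case in the list above: with 100 unique strings the ords run from 0 to 99, and since 2^7 = 128 >= 100, 7 bits per doc suffice. The sketch below packs such ords into a long[]. PackedOrds is a hypothetical name for illustration only and is not an existing Lucene class.

```java
/** Hypothetical fixed-width packed ord array; a sketch, not an existing Lucene class. */
final class PackedOrds {
    private final long[] blocks;
    private final int bitsPerOrd;
    private final long mask;

    /** Smallest number of bits able to hold every ord in [0, uniqueCount); expects uniqueCount >= 1. */
    static int bitsRequired(int uniqueCount) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(uniqueCount - 1));
    }

    PackedOrds(int docCount, int uniqueCount) {
        bitsPerOrd = bitsRequired(uniqueCount);
        mask = (1L << bitsPerOrd) - 1;
        blocks = new long[(int) (((long) docCount * bitsPerOrd + 63) / 64)];
    }

    void set(int doc, int ord) {
        long bitPos = (long) doc * bitsPerOrd;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = (long) ord & mask;
        // Clear the slot, then write the low part of the value into this block.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerOrd - 64;
        if (spill > 0) { // the value straddles two longs: write the high part into the next block
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> (bitsPerOrd - spill));
        }
    }

    int get(int doc) {
        long bitPos = (long) doc * bitsPerOrd;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerOrd - 64;
        if (spill > 0) { // pull the high part from the next block
            value |= blocks[block + 1] << (bitsPerOrd - spill);
        }
        return (int) (value & mask);
    }

    /** Bytes used by the packed storage itself. */
    long ramBytes() {
        return blocks.length * 8L;
    }
}
```

For one million docs at 7 bits each this needs 875,000 bytes, against 4,000,000 bytes for a plain int[] of ords. The trade-off is a few shifts and masks on every lookup.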



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700391#action_12700391
 ] 

Mark Miller commented on LUCENE-831:


I've got a bit of the same feeling. My list was more or less cherry picked from 
all of the above comments, and my initial feeling was that there was not enough 
motivation as well. But the more I thought about it, the uglier FieldCache 
looked. And we would want to lose exposing Parser so that CFS can be a 
seamless backing. That makes FieldCache even uglier for a while. Clickless thus 
far here too, but I think we have a good base to work with still.

{quote}Honestly these reasons are not net/net compelling enough to warrant a
whole new API? They are fairly minor. And I agree: LUCENE-1483 has
already achieved the biggest step forward here.{quote}

Not only that, but almost all of those reasons can be handled by allowing a 
custom FieldCache to be used, rather than just hard coding to the default 
singleton.

A couple responses:

{quote}We need source pluggability for when CSF arrives (but, admittedly,
we could wait until CSF actually does arrive){quote}
We have it? Just pass the CSFValueSource at IndexReader creation?

{quote}
Allowing values to change, just like we can call
IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
copy-on-write approach, like norms & deleted docs.{quote}
Good point. We need a way to update, that can throw USO Exception?
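
A minimal sketch of how those two ideas could fit together, assuming an optional setInt that throws UnsupportedOperationException by default and a copy-on-write variant for sources that do allow updates. IntValues, ReadOnlyIntValues and CopyOnWriteIntValues are hypothetical names for illustration and are not taken from the patch.

```java
/** Hypothetical per-field int values; updating is optional and unsupported by default. */
abstract class IntValues {
    abstract int getInt(int doc);

    void setInt(int doc, int value) {
        throw new UnsupportedOperationException("this source is read-only");
    }
}

/** Read-only view over an array, e.g. one produced by uninversion. */
final class ReadOnlyIntValues extends IntValues {
    private final int[] values;

    ReadOnlyIntValues(int[] values) {
        this.values = values;
    }

    @Override
    int getInt(int doc) {
        return values[doc];
    }
}

/** Copy-on-write: shares the array it was given until the first update, then works on a private copy. */
final class CopyOnWriteIntValues extends IntValues {
    private int[] values;
    private boolean shared = true;

    CopyOnWriteIntValues(int[] sharedValues) {
        this.values = sharedValues;
    }

    @Override
    int getInt(int doc) {
        return values[doc];
    }

    @Override
    void setInt(int doc, int value) {
        if (shared) { // first write: stop sharing so other holders of the array are unaffected
            values = values.clone();
            shared = false;
        }
        values[doc] = value;
    }
}
```

This sketch is not thread-safe and copies the whole array on first write; a real implementation would also have to decide how such changes get persisted.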

{quote}
How would norms be folded into this? Ideally, each field could
choose to pull its norms from any source. Document level norms
was discussed somewhere, and should easily fit as another norms
source. We'd need to relax how per-doc-field boosting is computed
at runtime to pull from such arbitrary sources.{quote}
Good point again. Getting norms under this API will add a bit more meat to this 
issue.

{quote}
Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
deleted docs one wanted.{quote}
You've got me here at the moment. I don't know the delete code very well, but I 
will in time :)

{quote}
  Allowing for docs that have more than one value. (We'd also need
  to extend sorting to be able to compare multiple values).
{quote}
This is an interesting one, because I wonder if we can do it and stick with 
arrays? A multi dimensional array seems a bit much...
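
One way to stay with flat arrays, offered here only as a sketch of a common encoding and not as something the patch does: keep all values in a single int[] and use a second int[] of per-doc offsets. MultiIntValues and its min() policy are hypothetical.

```java
/** Hypothetical multi-valued int field held in two flat arrays instead of an int[][]. */
final class MultiIntValues {
    private final int[] offsets; // values of doc d live in values[offsets[d] .. offsets[d + 1])
    private final int[] values;

    /** The int[][] is only the build-time input of this sketch; it is not retained. */
    MultiIntValues(int[][] perDoc) {
        offsets = new int[perDoc.length + 1];
        int total = 0;
        for (int doc = 0; doc < perDoc.length; doc++) {
            offsets[doc] = total;
            total += perDoc[doc].length;
        }
        offsets[perDoc.length] = total;
        values = new int[total];
        for (int doc = 0; doc < perDoc.length; doc++) {
            System.arraycopy(perDoc[doc], 0, values, offsets[doc], perDoc[doc].length);
        }
    }

    int count(int doc) {
        return offsets[doc + 1] - offsets[doc];
    }

    int value(int doc, int i) {
        return values[offsets[doc] + i];
    }

    /** One possible "pick one" policy for sorting: the smallest value; docs without values sort last. */
    int min(int doc) {
        int best = Integer.MAX_VALUE;
        for (int i = offsets[doc]; i < offsets[doc + 1]; i++) {
            best = Math.min(best, values[i]);
        }
        return best;
    }
}
```

The cost is one extra int per doc for the offsets, and sorting still needs a policy (min, max, first) to reduce each doc to a single comparable value.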

{quote}
An mmap implementation (like Lucy/KS) - should feel just like CSF
or uninversion (ie, just another impl).{quote}
This is already fairly independent I think...

{quote}
Good impls for the enum case (all strings could be considered
enums), eg if there are only 100 unique strings in that field, you
only need 7 bits per ord derefing into the char[] values.
{quote}
+1. Yes.

{quote}
Possible future when Lucene computes sort cache (for text fields)
and stores in the index{quote}
I'm not familiar with that idea, so not sure what effect this has...

{quote}
Allowing field sort to use an entirely external source of values
{quote}
I think both options allow that now - if you pass the ValueSource from the 
reader, it can get its values from everywhere. If you override the reader 
valuesource with the sortfield valuesource, it too can load from anywhere. I am 
just not sure both options are really needed. I am kind of liking Uwe's idea of 
assigning ValueSources per field, though that could probably get messy. Perhaps 
a default, and then per field overrides? 
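
The "default, and then per field overrides" idea could look roughly like this. It is only a sketch: FieldValueSource and PerFieldValueSource are hypothetical names, and the single getInts method stands in for whatever the real ValueSource would expose.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal source: produces the int values for one field. */
interface FieldValueSource {
    int[] getInts(String field);
}

/** Dispatches to a per-field override when one is registered, otherwise to the default source. */
final class PerFieldValueSource implements FieldValueSource {
    private final FieldValueSource defaultSource;
    private final Map<String, FieldValueSource> overrides = new HashMap<>();

    PerFieldValueSource(FieldValueSource defaultSource) {
        this.defaultSource = defaultSource;
    }

    /** Registers an override and returns this so calls can be chained. */
    PerFieldValueSource override(String field, FieldValueSource source) {
        overrides.put(field, source);
        return this;
    }

    @Override
    public int[] getInts(String field) {
        return overrides.getOrDefault(field, defaultSource).getInts(field);
    }
}
```

Because the dispatcher is itself a FieldValueSource, it could be handed to a reader as its single source, which keeps the messy part (the per-field map) in one place.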


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699545#action_12699545
 ] 

Uwe Schindler commented on LUCENE-831:
--

Hi, looks good:

I am just not sure what the right caching ValueSource would be. If you use a 
caching value source externally from IndexReader, which one should you use? The 
original trie patch used CachingValueSource (when the patch was written, 
only CachingValueSource existed):

{code}
+  public static final ValueSource TRIE_VALUE_SOURCE = new CachingValueSource(new TrieValueSource());
{code}

But wouldn't CacheByReaderValueSource as a per-JVM singleton be correct? For the 
tests it is not a problem, because there is only one index with one segment. 
If I use CachingValueSource as a singleton, would it cache all values from all 
index readers mixed together?
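
To illustrate the concern: a singleton cache keyed only by field would hand back whatever array was loaded first, regardless of which reader asks. A cache keyed by reader first and field second avoids that. The sketch below uses hypothetical names (IntLoader, ByReaderCache) and plain Objects in place of readers; it is not the patch's CacheByReaderValueSource.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical loader: pulls one field's values from a reader (stand-in for uninversion). */
interface IntLoader {
    int[] load(Object reader, String field);
}

/** Cache keyed by reader, then by field, so values from different readers never mix. */
final class ByReaderCache {
    private final IntLoader loader;
    // Weak keys let a reader's entries be collected once the reader itself is unreachable.
    private final Map<Object, Map<String, int[]>> cache = new WeakHashMap<>();

    ByReaderCache(IntLoader loader) {
        this.loader = loader;
    }

    synchronized int[] getInts(Object reader, String field) {
        Map<String, int[]> perField = cache.computeIfAbsent(reader, r -> new HashMap<>());
        return perField.computeIfAbsent(field, f -> loader.load(reader, f));
    }
}
```

Note that the single synchronized method is exactly the kind of global lock across independent readers that this issue set out to remove, and that WeakHashMap compares keys with equals/hashCode; hanging the cache off each segment reader would sidestep both.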





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699644#action_12699644
 ] 

Mark Miller commented on LUCENE-831:


Right, you really want to use CacheByReaderValueSource. Better would probably 
be to get that cache on the segment reader as well. But I think that would mean 
bringing back some sort of general cache to IndexReader. You would have to be 
able to attach arbitrary ValueSources to the reader. We will see what ends up 
materializing. I am agonizingly slow at understanding anything, but quick to 
move anyway ;)



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699649#action_12699649
 ] 

Uwe Schindler commented on LUCENE-831:
--

This was the idea behind the FieldType: you register the parsers/valuesources 
at the top-level IndexReader/MultiReader/whatever (e.g. in a map keyed 
by field), all subreaders also get this map (passed through), and if someone 
asks for cache values for a specific field, they get the correctly decoded 
field values (from CSF, Uninverter, TrieUninverter, Stored Fields [not really, 
but it would be possible]). This was the original approach of this issue: attach 
caching to the individual index/segment readers (with the possibility to register 
valuesources for specific fields).
In this case the SortField ctors taking ValueSource or Parser can be removed 
(and we can do this for 2.9, as the Parser ctor of SortField has not yet been 
released!).



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699663#action_12699663
 ] 

Mark Miller commented on LUCENE-831:



That's somewhat possible now (with the exception that you can't yet set the 
value source for the segment reader - it would likely become an argument to 
the static open methods): ValueSource gets a field as an argument, so it is 
also easy enough to set a ValueSource that does trie encoding for arbitrary 
fields on the SegmentReader, eg FieldTypeValueSource could take arguments to 
configure it per field and then you set it on the IndexReader when you open it. 
That's all still in the patch - it's just a bit more of a pain than being able to 
set it at any time on the SortField as an override.

I guess I almost see things going just to the segment reader valuesource option 
though - once FieldCache goes back to standard, it might make sense to drop the 
SortField valuesource support too, and just do the segment ValueSource. Being 
able to init the SegmentReader with a ValueSource really allows for anything 
needed - I just wasn't sure if it was too much of a pain in comparison to also 
having a dynamic SortField override.



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699678#action_12699678
 ] 

Mark Miller commented on LUCENE-831:


So I'm flopping around on this, but I guess my latest take is that:

I want to drop the SortField ValueSource override option. Everything would need 
to be handled by overriding the segment reader ValueSource.

Drop the current back compat code for FieldCache - it's mostly unnecessary I 
think. Instead, perhaps go back to the orig FieldCache impl, except if the Reader 
is a segment reader, use the new ValueSource API? Grrr - except if someone has 
mucked with the ValueSource or used a custom FieldCache Parser, it won't match 
correctly... that's it - you just can't straddle the two APIs. So I'll revert 
FieldCache to its former self and just deprecate it.



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699880#action_12699880
 ] 

Mark Miller commented on LUCENE-831:


Okay, now that I halfway understand this issue, I think I have to go back to 
the basic motivations. The original big win was taken away by 1483, so let's see 
if we really need a new API for the wins we have left.

h3. Advantages of new API (roughly as it is in the patch)
* FieldCache is an interface and it would be nice to move to an abstract 
class; ExtendedFieldCache is ugly.
* Avoids the global sync by IndexReader to access the cache.
* It's easier/cleaner to block caching by multireaders (though I am almost 
thinking I would prefer warnings/advice about performance and encouragement to 
move to per segment).
* It becomes easier to share a ValueSource instance across readers.

h3. Disadvantages of new API
If we want only SegmentReaders to have a ValueSource, you can't efficiently 
back the old API with the new, causing RAM requirements to jump if you straddle 
the two APIs and ask for the same array data from each.

It's probably a higher barrier for a custom Parser to implement and init a 
Reader with a ValueSource (presumably one that works per field) than to simply 
pass the Parser on a SortField. However, Parser stops making sense if we end up 
being able to back ValueSource with column stride fields. We could allow 
ValueSource to be passed on the SortField (the current incarnation of this 
patch), but then you have to go back to a global cache keyed by reader for the 
ValueSources passed that way (you would also still have the per segment reader, 
settable ValueSource).

h3. Advantages of staying with old API
Avoids forcing a large migration on users, with possible RAM penalties if 
they don't switch from deprecated code (we are doing something similar with 
LUCENE-1483 even without deprecated code though: if you were using an external 
multireader FieldCache that matched a sort FieldCache key, you'd double your 
RAM requirements).

h3. Thoughts
If we stayed with the old API, we could still allow a custom FieldCache to be 
supplied. We could still back FieldCacheImpl with Uninverter to reduce code. We 
could still have CachingFieldCache. Though CachingValueSource is much better :) 
FieldCache implies caching, and so the name would be confusing. We could also 
avoid CachingFieldCache though, as just making a pluggable FieldCache would 
allow alternate caching implementations (with a bit more effort).

We could deprecate the Parser methods and force supplying a new FieldCache impl 
for custom uninversion to get to an API suitable to be backed by CSF.

Or:

We could also move to ValueSource, but allow a ValueSource on multi-readers. 
That would probably make straddling the APIs much more possible (and 
efficient) in the default case. We could advise that it's best to work per 
segment, but leave the option to the user.

h3. Conclusion
I am not sure. I thought I was convinced we might as well not even move from 
FieldCache at all, but now that I've written a bit out, I'm thinking it would 
be worth going to ValueSource. I'm just not positive on what we should support. 
SortField ValueSource override keyed by reader? ValueSources on MultiReaders?


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699893#action_12699893
 ] 

Uwe Schindler commented on LUCENE-831:
--

The problem with the ValueSource override is not limited to SortField. Function 
queries need the additional ValueSource override too, as do other places. So a 
central place to register a ValueSource per field for an IndexReader 
(MultiReader, ... passing down to segments) would be really nice.

For the caching problem: possibly the ValueSource given to SortField etc. 
behaves like the current parser. The cache in IndexReader should also be keyed 
by the ValueSource, so the SortField/FunctionQuery ValueSource override is 
passed down to IndexReader's cache. If the IndexReader has an entry in its 
cache for the same (field, ValueSource, ...) key, it could use the arrays from 
there; if not, it fills the cache with an array from the overridden 
ValueSource. I would really make the ValueSource per-field.

The Uninverter inner class should be made public, and the Uninverter should 
accept a starting term to iterate (overwrite ), and the newTerm() method should 
be able to return false to stop iterating (see my ValueSource example for 
trie). With that, one could easily create a subclass of Uninverter with its own 
parser logic (like trie).
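
To make the keying idea above concrete, here is a minimal sketch of a 
per-reader cache keyed by (field, ValueSource). It is not code from the patch: 
the ValueSource interface, CacheKey and ReaderCache are hypothetical stand-ins 
invented for illustration, and only int arrays are handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class KeyedReaderCacheSketch {

    /** Hypothetical stand-in for the ValueSource under discussion. */
    interface ValueSource {
        int[] getInts(String field);
    }

    /** Cache key: the field name plus the ValueSource that fills the array. */
    static final class CacheKey {
        private final String field;
        private final ValueSource source;

        CacheKey(String field, ValueSource source) {
            this.field = Objects.requireNonNull(field);
            this.source = Objects.requireNonNull(source);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CacheKey)) return false;
            CacheKey other = (CacheKey) o;
            return field.equals(other.field) && source.equals(other.source);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, source);
        }
    }

    /** Hypothetical per-reader cache; one instance would belong to each reader. */
    static final class ReaderCache {
        private final Map<CacheKey, int[]> arrays = new HashMap<>();
        private final ValueSource defaultSource;

        ReaderCache(ValueSource defaultSource) {
            this.defaultSource = Objects.requireNonNull(defaultSource);
        }

        /** Pass null as override to use the reader's default ValueSource. */
        synchronized int[] getInts(String field, ValueSource override) {
            ValueSource source = (override != null) ? override : defaultSource;
            CacheKey key = new CacheKey(field, source);
            int[] values = arrays.get(key);
            if (values == null) {
                values = source.getInts(field); // fill on first request only
                arrays.put(key, values);
            }
            return values;
        }

        synchronized int size() {
            return arrays.size();
        }
    }

    public static void main(String[] args) {
        ValueSource plain = field -> new int[] {1, 2, 3};
        ValueSource custom = field -> new int[] {10, 20, 30};
        ReaderCache cache = new ReaderCache(plain);

        int[] a = cache.getInts("price", null);
        int[] b = cache.getInts("price", null);   // same key: cached array reused
        int[] c = cache.getInts("price", custom); // other source: separate entry

        System.out.println("reused=" + (a == b) + " entries=" + cache.size()
            + " first override value=" + c[0]);
    }
}
```

Keying on the ValueSource instance means an override only shares arrays when 
the same instance (or an equal one) is passed each time, which is the same 
constraint the current parser-keyed cache places on callers.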




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1263#action_1263
 ] 

Mark Miller commented on LUCENE-831:


I think we don't want to expose Uninverter though? The API should be neutral 
enough to naturally support loading from CSF, in which case Uninverter doesn't 
make sense... so we were going to go with having to override the value source 
to handle Uninverter-type stuff.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699166#action_12699166
 ] 

Mark Miller commented on LUCENE-831:


Thinking a bit on this this morning:

I think that will work out right. We would have 3 different attacks at 
ValueSource.

1. A ValueSource per segment reader that handles all default ValueSource 
needs: you get it with IndexReader.getValueSource. It's an 
UninversionValueSource wrapped by a CachingValueSource by default.

2. A singleton back compat value source that is wrapped by 
CacheByReaderValueSource. It has extra methods that take Uninverters, allowing 
custom Uninverters and caching by Uninverter.

3. You can override the ValueSource used for Sorting by attaching it to the 
SortField. Likely, you would wrap it in a CacheByReaderValueSource and have 
your own singleton.

I think that gives us back compat and the best of both worlds?
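
The default arrangement in point 1 is a plain decorator, roughly as sketched 
below. The class names echo the ones mentioned above, but the bodies are 
invented for illustration and are not the patch's implementation; the 
"uninversion" here just allocates an array and counts how often it is asked to 
load.

```java
import java.util.HashMap;
import java.util.Map;

final class CachingDecoratorSketch {

    /** Hypothetical stand-in for the ValueSource under discussion. */
    interface ValueSource {
        int[] getInts(String field);
    }

    /** Stand-in for an uninverting source; counts how often it has to load. */
    static final class CountingUninversionSource implements ValueSource {
        int loads = 0;
        private final int maxDoc;

        CountingUninversionSource(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        @Override
        public int[] getInts(String field) {
            loads++;
            return new int[maxDoc]; // a real source would walk the field's terms
        }
    }

    /** Decorator that remembers the array it got for each field. */
    static final class CachingValueSource implements ValueSource {
        private final ValueSource delegate;
        private final Map<String, int[]> byField = new HashMap<>();

        CachingValueSource(ValueSource delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized int[] getInts(String field) {
            return byField.computeIfAbsent(field, delegate::getInts);
        }
    }

    public static void main(String[] args) {
        CountingUninversionSource raw = new CountingUninversionSource(4);
        ValueSource perSegment = new CachingValueSource(raw);

        perSegment.getInts("year");
        perSegment.getInts("year"); // served from the cache, no second load

        System.out.println("loads=" + raw.loads);
    }
}
```

Because the caching lives in the wrapper, a source backed by something other 
than uninversion could be dropped in underneath without touching callers.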




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699435#action_12699435
 ] 

Uwe Schindler commented on LUCENE-831:
--

bq. Patch is not done, but all tests now pass except for TrieRange (have not 
run back compat tests yet due to Trie failure - havnt looked into yet either).

The TrieRange tests do not pass because the FieldCache.StopFillCacheException 
is not handled in the uninverter code (part of LUCENE-1582, 
FieldCache.StopFillCacheException). Once Trie is switched to its own 
ValueSource, StopFillCacheException can be removed (you can do this in this 
patch; it was just a temporary hack).




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699433#action_12699433
 ] 

Mark Miller commented on LUCENE-831:


Bah, just wrote a bunch and hit cancel.

Attacking this from the old incarnation of the patch has me trying too hard to 
back FieldCache with the new API. Looking more closely now, I don't see 
that being necessary. We just want the SegmentReader level ValueSource and the 
option for SortField override. FieldCache can use its current implementation. 
The motivation to back it is to minimize the RAM reqs of straddling the two 
APIs. If we want the new API to work at the SegmentReader level though, we can 
never really achieve that. Might as well not half *ss it. I'll return the 
FieldCache to as is at some point and just deprecate.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699439#action_12699439
 ] 

Uwe Schindler commented on LUCENE-831:
--

I cannot get the patch to work on trunk: merging works, but it produces lots of 
compile failures (because of the changes in LUCENE-1575), mostly because the 
comparators got new ctors and so on.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699440#action_12699440
 ] 

Mark Miller commented on LUCENE-831:


Thanks Uwe. I had it in my mind to update to trunk about an hour ago and...

I'll repost in a moment.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699448#action_12699448
 ] 

Uwe Schindler commented on LUCENE-831:
--

By the way: in the patch, the ValueSource set in SortField never seems to be 
used when building the comparators. If it were used, the trie tests should also 
work once the patch I attached some days ago (LUCENE-831-trieimpl.patch) is 
applied.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699016#action_12699016
 ] 

Mark Miller commented on LUCENE-831:


I was going to throw in that constructor Uwe, and I got caught up doing a few 
things. I'm not so sure we can stick with attaching the ValueSource on the 
SortField. In the end, we'd really like the cache to be held per segment in the 
reader, rather than a monster reader cache. So I have moved the standard 
ValueSource back to the reader. I've got a separate massive reader cache (as 
in, everything goes in one cache keyed by reader) for handling the backcompat 
issues with custom parsers in FieldCache (it would be awesome to get this out 
before having to deprecate the SortField parser stuff).

This doesn't really jibe well with passing in ValueSources on the fly with 
SortField. Which sucks, because that was nice.

What do you think about having to pull off Trie through the ValueSource set on 
your reader at reader init? I'm not thinking it's super pretty - I guess you 
take which fields to override at init, and then do trie stuff for certain 
fields, but pass through to the default ValueSource impl for other fields?

I hope that can be improved, but I don't see how at the moment...
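
For what it's worth, the "override some fields at init, pass through for the 
rest" idea could look something like the sketch below. Everything here is 
hypothetical (the ValueSource interface, PerFieldValueSource, the field names); 
it only illustrates the delegation, not how Trie would actually decode values.

```java
import java.util.HashMap;
import java.util.Map;

final class FieldOverrideSketch {

    /** Hypothetical stand-in for the ValueSource under discussion. */
    interface ValueSource {
        long[] getLongs(String field);
    }

    /** Routes chosen fields to a custom source and everything else to a default. */
    static final class PerFieldValueSource implements ValueSource {
        private final ValueSource fallback;
        private final Map<String, ValueSource> overrides;

        PerFieldValueSource(ValueSource fallback, Map<String, ValueSource> overrides) {
            this.fallback = fallback;
            this.overrides = new HashMap<>(overrides); // fixed once, at reader init
        }

        @Override
        public long[] getLongs(String field) {
            ValueSource source = overrides.getOrDefault(field, fallback);
            return source.getLongs(field);
        }
    }

    public static void main(String[] args) {
        ValueSource defaultSource = field -> new long[] {1L, 2L};
        ValueSource trieLikeSource = field -> new long[] {100L, 200L};

        Map<String, ValueSource> overrides = new HashMap<>();
        overrides.put("timestamp", trieLikeSource); // hypothetical trie-encoded field

        ValueSource readerSource = new PerFieldValueSource(defaultSource, overrides);

        System.out.println("timestamp[0]=" + readerSource.getLongs("timestamp")[0]
            + " title[0]=" + readerSource.getLongs("title")[0]);
    }
}
```

The downside is the one noted above: the set of overridden fields has to be 
known when the reader is opened, rather than at sort time.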




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699028#action_12699028
 ] 

Mark Miller commented on LUCENE-831:


Or maybe we can just keep all 3 options? Backcompat parser stuff goes through a 
reader keyed cache, built in stuff goes through a segment level ValueSource, 
and custom stuff using SortField can do whatever to override the ValueSource - 
they would cache if they wanted, etc.

So you could either override the default ValueSource, or provide your own 
override on the fly.

I guess that does seem to work out anyway ... ?




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698394#action_12698394
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Also, this early, I know you'll have me changing directions fast enough 
that I'm still in pretty rough hammer out mode.

Yeah I hear you ;) Things're fast moving... I'm glad you're good with
The Eclipse.

bq. Do we need Uninverter if overriding getXXX is easy enough and we pass the 
ValueSource on SortField?

Good question... though it is nice/important to not have to implement
your own TermEnum/TermDocs iteration logic.

bq. Being able to change the ValueSource on the fly now with SortField has 
implications for CachingValueSource.

I think an app would need to pay attention here, ie, if caching is
needed the app should always pass the same CachingValueSource to all
sort fields.  It's up to the app to scope their ValueSources
correctly.  You're right that if you have a bug and don't always use a
consistent CachingValueSource, you can use too much RAM since a given
field's int[] can be double cached; but I think that's a bug in the
app?  It's analogous to using 2 IndexReaders instead of sharing?
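
A minimal sketch of the double-caching concern, assuming a hypothetical
IntValueSource interface and a per-instance cache (neither is the API from
the patch). Two CachingIntValueSource instances wrapping the same underlying
source each hold their own int[] for the same field:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the ValueSource discussed here; not Lucene's API.
interface IntValueSource {
  int[] getInts(String field);
}

// Pretends to uninvert a field and counts how often the expensive load runs.
class CountingSource implements IntValueSource {
  int loads = 0;

  public int[] getInts(String field) {
    loads++;
    return new int[] {1, 2, 3};
  }
}

// Caches per instance, so two instances never share entries.
class CachingIntValueSource implements IntValueSource {
  private final IntValueSource delegate;
  private final Map<String, int[]> cache = new HashMap<>();

  CachingIntValueSource(IntValueSource delegate) {
    this.delegate = delegate;
  }

  public synchronized int[] getInts(String field) {
    int[] values = cache.get(field);
    if (values == null) {
      values = delegate.getInts(field);
      cache.put(field, values);
    }
    return values;
  }
}
```

Passing one shared CachingIntValueSource to every sort field loads the array
once; creating a second instance for the same field loads (and retains) it
again, which is the RAM cost described above.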

Though... I'm not sure.  Something's not yet right about our approach
but I can't think of how to fix it... will mull it over.

I wonder if we can somehow fit norms handling into this?  Norms ought
to be some sort of XXX.getBytes() somewhere, but I'm not sure how
yet.  It's tricky because norms accept changes and thus must implement
copy-on-write.  So maybe we leave them out for now...
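
To make the copy-on-write requirement concrete, here is a rough sketch,
assuming a hypothetical CopyOnWriteBytes holder (this is not how Lucene's
norms are implemented). A clone shares the byte[] until either side writes:

```java
// Hypothetical copy-on-write holder for a norms-like byte[]; not Lucene code.
class CopyOnWriteBytes {
  private byte[] bytes;
  // True while another holder may still reference the same array.
  private boolean shared;

  CopyOnWriteBytes(byte[] initial) {
    this(initial, false);
  }

  private CopyOnWriteBytes(byte[] bytes, boolean shared) {
    this.bytes = bytes;
    this.shared = shared;
  }

  // A clone (for example for a reopened reader) shares the array until a write.
  CopyOnWriteBytes cloneShared() {
    this.shared = true;
    return new CopyOnWriteBytes(bytes, true);
  }

  byte get(int doc) {
    return bytes[doc];
  }

  // The first write after sharing copies the array, so the other holder
  // never observes the change. This may copy once more than strictly needed.
  void set(int doc, byte value) {
    if (shared) {
      bytes = bytes.clone();
      shared = false;
    }
    bytes[doc] = value;
  }
}
```

A getBytes()-style read path stays cheap; only the write path pays for the
copy, which is what makes fitting norms behind a read-only ValueSource awkward.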



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698419#action_12698419
 ] 

Mark Miller commented on LUCENE-831:


bq. It worked until now without serialization and so I think we should remove 
serialization from SortField. 

I don't think we can right now because of remote searchable, I think. I agree 
the factories are silly, but now I know why they exist! It had eluded me before.

I can roll your patch in, Uwe - sorry I missed it with that last one - I had 
meant to, but it slipped my mind.

I've been waiting to pin down how ValueSource handles its cache (either 
lightning will strike my mind, or more likely, Mike will tell me what to do) - 
but the main reason for the factory at the moment is to get the remote tests to 
pass - since SortField is serializable, it allows us to pass the ValueSource 
without it being serializable.
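
A rough sketch of the factory idea, with all names (SourceSketch,
SourceFactorySketch, RemoteSortField) invented for illustration and not taken
from the patch. Only the factory is Serializable; the source it creates is
built on the receiving side and never crosses the wire:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a ValueSource; deliberately NOT Serializable.
interface SourceSketch {
  int[] getInts(String field);
}

class EmptySource implements SourceSketch {
  public int[] getInts(String field) {
    return new int[0];
  }
}

// The factory is the only part that gets serialized.
interface SourceFactorySketch extends Serializable {
  SourceSketch create();
}

class EmptySourceFactory implements SourceFactorySketch {
  private static final long serialVersionUID = 1L;

  public SourceSketch create() {
    return new EmptySource();
  }
}

// Hypothetical SortField-like class that carries a factory, not a source.
class RemoteSortField implements Serializable {
  private static final long serialVersionUID = 1L;
  final String field;
  final SourceFactorySketch factory;

  RemoteSortField(String field, SourceFactorySketch factory) {
    this.field = field;
    this.factory = factory;
  }

  // Serializes and deserializes this object, roughly as a remote call would.
  RemoteSortField roundTrip() {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
        out.writeObject(this);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
        return (RemoteSortField) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The cost of this design is the extra indirection noted above: callers hand
over a factory even when they are not searching remotely.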


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698421#action_12698421
 ] 

Uwe Schindler commented on LUCENE-831:
--

I am just wondering why the parsers and locales work, since none of them are 
serializable. But they are null by default. So in principle, if I do a remote 
search with a custom parser or comparator, it would fail, too.
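
The behaviour described here follows from plain Java serialization and can be
checked with a small sketch. The class names (IntParserSketch, SortKeySketch)
are made up for illustration; they are not Lucene classes. A Serializable
object with a null non-serializable field serializes fine, while a non-null
one throws NotSerializableException:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical parser type that is not Serializable.
interface IntParserSketch {
  int parseInt(String text);
}

class PlainIntParser implements IntParserSketch {
  public int parseInt(String text) {
    return Integer.parseInt(text);
  }
}

// Hypothetical SortField-like class with an optional parser.
class SortKeySketch implements Serializable {
  private static final long serialVersionUID = 1L;
  final String field;
  final IntParserSketch parser; // null means "use the default"

  SortKeySketch(String field, IntParserSketch parser) {
    this.field = field;
    this.parser = parser;
  }
}

class SerializationCheck {
  // Returns true if the object serializes, false if a field is not serializable.
  static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}
```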


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698423#action_12698423
 ] 

Mark Miller commented on LUCENE-831:


Okay, good point. I've got to take a closer look at what we are required to 
support for remote searching.


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698196#action_12698196
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. I like this idea, but I am a little bit concerned about only one ValueSource 
for the Reader. 

Thinking more about this...

Over in KS/Lucy, the approach Marvin is taking is something called a
FieldSpec, to define the extended type for a field.  The idea is to
strongly decouple a field's type from its value, allowing that type to
be shared across different fields and instances of the same field.

So in KS/Lucy, presumably IndexReader would simply consult the
FieldSpec for a given field, to determine which ValueSource impl is
responsible for producing values for this field.

Right now details for a field are scattered about (PerFieldValueSource
and PerFieldAnalyzerWrapper and Field.Index/Store/TermVector.*,
FieldInfo, etc.). This then requires a lot of app-level code to
properly use Trie* fields -- you have to use Trie* to analyze the
field, use Trie* to construct the query, use PerFieldValueSource to
populate the FieldCache, etc.

Maybe, as part of the cleanup of our three *Field classes, and index
vs search time documents, we should make steps towards having a
consolidated class that represents the extended type of a field.
Then in theory one could make a Field, attach a NumericFieldType() to
it (after renaming Trie* -> Numeric*), and then everything would
default properly.
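
One way to picture the consolidated type, as a sketch only: a per-field type
object that owns both the index-time encoding and the cache-time decoding, so
the two cannot drift apart. FieldTypeSketch, PaddedIntType and SchemaSketch
are hypothetical names, and the zero-padded encoding is just a simple
stand-in for whatever a real numeric type would use:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical consolidated field type: one object decides how a value is
// written to the index and how it is read back when filling a cache.
abstract class FieldTypeSketch {
  abstract String encode(int value);
  abstract int decode(String term);
}

// Zero-padded decimal so that string order matches numeric order.
// Handles non-negative values only, to keep the sketch short.
class PaddedIntType extends FieldTypeSketch {
  String encode(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative values not supported in this sketch");
    }
    return String.format(Locale.ROOT, "%010d", value);
  }

  int decode(String term) {
    return Integer.parseInt(term);
  }
}

// A reader-side lookup: consult the field's type instead of wiring up an
// analyzer, a query builder and a cache parser separately.
class SchemaSketch {
  private final Map<String, FieldTypeSketch> types = new HashMap<>();

  void define(String field, FieldTypeSketch type) {
    types.put(field, type);
  }

  FieldTypeSketch typeOf(String field) {
    FieldTypeSketch type = types.get(field);
    if (type == null) {
      throw new IllegalArgumentException("no type defined for field: " + field);
    }
    return type;
  }
}
```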


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698195#action_12698195
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Any ideas on where parser fits in with valuesource?

I think the UninversionValueSource would accept a custom parser (String -> 
native type), like what's done today.

Maybe it should also allow stopping the loop early (which Trie* uses), or 
perhaps outright overriding of the inversion loop itself (which if we make that 
class subclass-able should be simple).
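
A minimal sketch of an uninversion loop that accepts a custom parser. The
sorted map of term text to doc IDs is an in-memory stand-in for the
TermEnum/TermDocs iteration, and IntParser and UninversionSketch are
hypothetical names rather than the classes in the patch:

```java
import java.util.Map;
import java.util.SortedMap;

// Hypothetical parser: indexed term text -> native int.
interface IntParser {
  int parseInt(String term);
}

class UninversionSketch {
  // postings maps each term to the doc IDs that contain it, in term order.
  static int[] getInts(SortedMap<String, int[]> postings, int maxDoc, IntParser parser) {
    int[] values = new int[maxDoc];
    for (Map.Entry<String, int[]> entry : postings.entrySet()) {
      int value = parser.parseInt(entry.getKey()); // parse once per term
      for (int doc : entry.getValue()) {
        values[doc] = value;
      }
    }
    return values;
  }
}
```

The caller chooses the parser, so a field whose terms are not plain decimal
strings only needs a different IntParser, not a different loop.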


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698214#action_12698214
 ] 

Mark Miller commented on LUCENE-831:


{quote}
I think the UninversionValueSource would accept a custom parser (String -> 
native type), like what's done today.

Maybe it should also allow stopping the loop early (which Trie* uses), or 
perhaps outright overriding of the inversion loop itself (which if we make that 
class subclass-able should be simple).
{quote}

It's the accepting that seems tricky though. If the getInts() calls take the 
parser, you have to use instanceof code to work with ValueSource.  That's why I 
was thinking maybe callbacks - if a new type is added you just add a new one 
returning a default parser. Then you can just extend and replace the parsers 
you want to. I wasn't a big fan of that idea, but I am not sure of a nice, 
clean, extensible way to specify a bunch of parsers to UninversionValueSource 
that allows the API to cleanly be used from ValueSource. It already kind of 
seemed annoying that you would have to set a new ValueSource on the reader just 
to specify different parsers. I guess at least that has to be accepted though.
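
The callback idea might look roughly like this. It is a sketch under the
assumption that each native type gets one overridable method returning its
default parser; ParserCallbacks and HexIntCallbacks are invented names:

```java
// Hypothetical base class: one callback per native type, each returning a
// default parser. Supporting a new type means adding one more callback.
class ParserCallbacks {
  interface IntParser {
    int parse(String term);
  }

  interface FloatParser {
    float parse(String term);
  }

  protected IntParser intParser(String field) {
    return Integer::parseInt;
  }

  protected FloatParser floatParser(String field) {
    return Float::parseFloat;
  }

  // Callers never pass a parser, so no instanceof checks are needed.
  final int parseInt(String field, String term) {
    return intParser(field).parse(term);
  }

  final float parseFloat(String field, String term) {
    return floatParser(field).parse(term);
  }
}

// Replaces only the int parser, and only for one field.
class HexIntCallbacks extends ParserCallbacks {
  @Override
  protected IntParser intParser(String field) {
    if ("color".equals(field)) {
      return term -> Integer.parseInt(term, 16);
    }
    return super.intParser(field);
  }
}
```

The drawback raised above still applies: a different set of parsers means a
different subclass instance, which then has to be set on the reader.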


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698215#action_12698215
 ] 

Earwin Burrfoot commented on LUCENE-831:


I'm using a similar approach.

There's a FieldType, that governs conversions from Java type into Lucene 
strings and declares 'abilities' of that type. Like - conversion is 
order-preserving (all numerics + some others), converted values can be 
meaningfully prefix-searched (like TreeId, that is essentially an int[], used 
to represent things like nested category trees). Some types can also declare 
themselves as derivatives of others, like DateType being derived from LongType.

Then there's a FieldInfo, that defines field name, FieldType used for it, and 
actions we're going to take on the field. E.g. if we want to sort on it, build 
clusters with certain characteristics, load values for this field for each 
found document, use fast rangefilters, store/filter on field being 
null/notnull, apply transforms on the field before storing/searching, copy 
value of the field to another field (with probable transformation) when 
indexing, etc. From FieldType and desired actions, FieldInfo is able to deduce 
tokenize/index/store/cache behaviour, and can say that additional lucene fields 
are required (e.g. for handling null/notnull searches, or trie ranges, or a 
special sort-form).

Then there's an interface that contains FieldInfo constants and a special 
constant FieldEnum FIELDS = fieldsOf(ResumeFields.class); that is essentially a 
navigable list of all FieldInfos defined in this interface and interfaces it 
extends (allows me to have CommonFields + ResumeFields extends CommonFields, 
VacancyFields extends CommonFields).

FieldType, and consequently FieldInfo is type-parameterized with the java type 
associated with the field, so you get the benefit of type-safety when 
storing/loading/searching the field. All 
Filters/Queries/Sorters/Loaders/Document accept FieldInfo instead of String for 
field name, so for example Filters.Range(field, fromValue, fromInclusive, 
toValue, toInclusive) knows whether to use a simple range filter or a trie one, 
ensures from/toValues are of a proper type and converts them properly. 
Filters.IsSet(field) can consult an additional field created during indexation, 
or access a FieldCache. DocLoader will either get a value for the field from 
index or from the cache. etc, etc, etc.

While I like resulting schema-style very much, I don't want to see the likes of 
it within Lucene core. Better to have some contrib/extension/whatever that 
builds on core-defined primitives. That way, if someone needs to build their own 
somewhat divergent schema, they can easily do it instead of trying to fit 
theirs over Lucene's. For the very same reason I'd like to see fieldcaches 
moved away from the core, and depending on the same in-core IndexReader segment 
creation/deletion/whatever hooks that users will use to build their extensions. 
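
A compressed sketch of the type-parameterized schema described above. All
names (TypedFieldType, LongFieldType, TypedFieldInfo, RangeSketch) are
hypothetical reconstructions from the description, not the actual code, and
the zero-padded encoding is only one simple order-preserving choice:

```java
import java.util.Locale;

// Hypothetical FieldType<T>: converts between a Java type and indexed
// strings, and declares abilities such as "conversion preserves order".
abstract class TypedFieldType<T> {
  abstract String toIndexed(T value);
  abstract T fromIndexed(String term);

  boolean orderPreserving() {
    return false;
  }
}

class LongFieldType extends TypedFieldType<Long> {
  // Zero-padded to 19 digits; non-negative values only in this sketch.
  String toIndexed(Long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative values not supported in this sketch");
    }
    return String.format(Locale.ROOT, "%019d", value);
  }

  Long fromIndexed(String term) {
    return Long.parseLong(term);
  }

  @Override
  boolean orderPreserving() {
    return true;
  }
}

// Hypothetical FieldInfo<T>: a field name bound to its type.
final class TypedFieldInfo<T> {
  final String name;
  final TypedFieldType<T> type;

  TypedFieldInfo(String name, TypedFieldType<T> type) {
    this.name = name;
    this.type = type;
  }
}

final class RangeSketch {
  // Accepts a typed field instead of a String name, so the compiler checks
  // the bounds and the field decides how they are converted.
  static <T> String[] rangeTerms(TypedFieldInfo<T> field, T from, T to) {
    if (!field.type.orderPreserving()) {
      throw new IllegalArgumentException("range needs an order-preserving type: " + field.name);
    }
    return new String[] {field.type.toIndexed(from), field.type.toIndexed(to)};
  }
}
```

With a TypedFieldInfo<Long>, passing a String bound to rangeTerms does not
compile, which is the type-safety benefit mentioned above.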


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698226#action_12698226
 ] 

Uwe Schindler commented on LUCENE-831:
--

Looks good, one addition: newTerm() could return false to stop iterating (for 
trie).

But I do not know how performant this is with the class-global variable 
currentVal...
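
A sketch of the suggested variant, where newTerm() returns false to stop the
iteration. The sorted map stands in for TermEnum/TermDocs, and the "a" prefix
rule is a made-up stopping condition for illustration; it is not how trie
terms are actually encoded:

```java
import java.util.Map;
import java.util.SortedMap;

abstract class StoppableUninverter {
  // Returns false to stop iterating over further terms.
  abstract boolean newTerm(String text);

  abstract void handleDoc(int docID);

  // In-memory stand-in for the TermEnum/TermDocs loop: terms arrive in
  // sorted order, each with the doc IDs that contain it.
  final void go(SortedMap<String, int[]> postings) {
    for (Map.Entry<String, int[]> entry : postings.entrySet()) {
      if (!newTerm(entry.getKey())) {
        break;
      }
      for (int doc : entry.getValue()) {
        handleDoc(doc);
      }
    }
  }
}

class PrefixedIntUninverter extends StoppableUninverter {
  final int[] values;
  private int currentVal;

  PrefixedIntUninverter(int maxDoc) {
    values = new int[maxDoc];
  }

  boolean newTerm(String text) {
    // Hypothetical rule: only terms prefixed with "a" carry full values;
    // once a different prefix shows up, nothing later is of interest.
    if (!text.startsWith("a")) {
      return false;
    }
    currentVal = Integer.parseInt(text.substring(1));
    return true;
  }

  void handleDoc(int docID) {
    values[docID] = currentVal;
  }
}
```

Here currentVal is an instance field written once per term and read once per
document, which is the pattern the performance question above is about.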


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698225#action_12698225
 ] 

Michael McCandless commented on LUCENE-831:
---


How about something like this (NOTE: not compiled/tested):

{code}
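import org.apache.lucene.index.IndexReader;

abstract class Uninverter {
  // Called once per term, before the docs for that term.
  abstract void newTerm(String text);
  // Called once per document containing the current term.
  abstract void handleDoc(int docID);
  void go(IndexReader r) {
    // ... TermEnum/TermDocs uninvert code ...
  }
}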
{code}

and then:

{code}
class IntUninverter extends Uninverter {
  final int[] values;
  int currentVal;  // value parsed from the current term

  IntUninverter(IndexReader r) {
    values = new int[r.maxDoc()];
  }

  @Override
  void newTerm(String text) {
    currentVal = Integer.parseInt(text);
  }

  @Override
  void handleDoc(int docID) {
    values[docID] = currentVal;
  }
}
{code}



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698219#action_12698219
 ] 

Marvin Humphrey commented on LUCENE-831:


FieldType is probably a better name than FieldSpec, as it implies
subclasses with Type as a suffix: FullTextType, StringType, BlobType,
Float32Type, etc.


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698224#action_12698224
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. FieldType is probably a better name than FieldSpec

+1




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698230#action_12698230
 ] 

Mark Miller commented on LUCENE-831:


bq. I like this idea, but i am a little bit concerned about only one 
ValueSource for the Reader. This makes plugging in different sources for 
different field types hard.

I guess you would just have to set a TrieEnabledValueSource when creating your 
IndexReaders? I suppose it could extend UninversionValueSource and, for given 
fields, do Trie decoding and uninverting, and for all other fields yield to the 
super impl?
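
A minimal sketch of the subclassing idea above, for illustration only. The class names come from the thread, but the decode method, its signature, and the hex "trie" decoding are assumptions made up for this example; they are not the API of the attached patch.

```java
import java.util.Set;

// Hypothetical stand-in for the uninverting ValueSource discussed in the thread.
class UninversionValueSource {
    /** Decodes an indexed term for the given field; here, plain decimal text. */
    long decode(String field, String term) {
        return Long.parseLong(term);
    }
}

// Hypothetical subclass: custom decoding for selected fields, super for the rest.
class TrieEnabledValueSource extends UninversionValueSource {
    private final Set<String> trieFields;

    TrieEnabledValueSource(Set<String> trieFields) {
        this.trieFields = trieFields;
    }

    @Override
    long decode(String field, String term) {
        if (trieFields.contains(field)) {
            // Placeholder only: hex parsing stands in for real trie decoding.
            return Long.parseLong(term, 16);
        }
        return super.decode(field, term);
    }
}
```

Each additional encoding would need its own subclass under this scheme, which is the cost of choosing inheritance as the extension point.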




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698232#action_12698232
 ] 

Mark Miller commented on LUCENE-831:


Note: I think we should add the option to sort nulls first or last with this.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698233#action_12698233
 ] 

Earwin Burrfoot commented on LUCENE-831:


bq. I guess you would just have to set a TrieEnabledValueSource when creating 
your IndexReaders? I suppose it could extend UninversionValueSource and for 
given fields do Trie unencoding, uninverting, and all other fields, yield to the 
super impl?

And then if you get some other XXX encoding, you'll end up with XXXVS extends 
UVS, TrieVS extends UVS, and XXXAndTrieVS extends XXXVS or TrieVS plus duplicated 
code from the other one. Ugly.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698234#action_12698234
 ] 

Michael McCandless commented on LUCENE-831:
---

That is why I'd love to somehow move to a FieldType class that holds such 
per-field details.

As things stand now (or, soon) you have to do many separate things to use 
TrieXXX:

  * Create the right tokenStream for it, and stick that into your Field

  * Make the right query type at search time

  * Make the right sort-field parser

  * Make the right ValueSource

It's crazy.  I should be able to make a TrieFieldType (SingleNumberFieldType, 
or something, after the rename), make that the type of my field when I add it 
to my document, and then have all these places that do range search, sorting, 
value retrieval consult the FieldInfo and see that this is a trie field, and 
act accordingly.
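
To make the FieldType idea concrete, here is a rough sketch under stated assumptions. FieldType, TrieFieldType and FieldTypeRegistry below are hypothetical names for this example (the registry stands in for a FieldInfo lookup), and the hex encoding is a placeholder that only handles non-negative values; none of this is existing Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field type: one object answers every per-field question.
interface FieldType {
    String encode(long value);                           // index time: value -> term
    long decode(String term);                            // sorting / value retrieval
    String rangeQuery(String field, long min, long max); // search time
}

// Illustrative numeric type; hex encoding is a placeholder, not real trie coding.
class TrieFieldType implements FieldType {
    public String encode(long value) { return Long.toHexString(value); }
    public long decode(String term) { return Long.parseLong(term, 16); }
    public String rangeQuery(String field, long min, long max) {
        return field + ":[" + encode(min) + " TO " + encode(max) + "]";
    }
}

// Stand-in for a FieldInfo-like lookup that sorting and range search would consult.
class FieldTypeRegistry {
    private final Map<String, FieldType> types = new HashMap<>();

    void register(String field, FieldType type) { types.put(field, type); }

    FieldType typeOf(String field) {
        FieldType type = types.get(field);
        if (type == null) {
            throw new IllegalArgumentException("no FieldType registered for " + field);
        }
        return type;
    }
}
```

The point of the design is that the caller declares the type once when adding the field, and every later consumer asks the registry instead of being configured separately.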




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698235#action_12698235
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Note: I think we should add the option to sort nulls first or last with 
this.

You mean for getStringIndex()?  I agree!

Actually, it'd be nice to disallow nulls entirely, somehow, since this forces 
us to sprinkle null checks all over the place in StringOrdValComparator.  
Maybe we would allow you to pass in a null equivalent, eg you could use , 
UNDEFINED, whatever, as long as it's a valid string.
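
Both options can be sketched in a few lines. This assumes a Java 8 or later runtime for Comparator.nullsFirst and Comparator.nullsLast; the NullOrdering class and its method names are made up for the example and are not part of FieldCache.

```java
import java.util.Comparator;

class NullOrdering {
    /** Comparator that orders null values before or after all non-null strings. */
    static Comparator<String> withNulls(boolean nullsFirst) {
        Comparator<String> natural = Comparator.naturalOrder();
        return nullsFirst ? Comparator.nullsFirst(natural) : Comparator.nullsLast(natural);
    }

    /** Replaces nulls with a caller-chosen equivalent so later code needs no null checks. */
    static String[] replaceNulls(String[] values, String nullEquivalent) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] == null ? nullEquivalent : values[i];
        }
        return out;
    }
}
```

With the second approach the position of missing values in the sort order follows from where the chosen equivalent string sorts, so the caller picks the string accordingly.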




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698238#action_12698238
 ] 

Michael McCandless commented on LUCENE-831:
---

Another thing that'd be great to fix about FieldCache is its
intermittent checking of the "some docs had more than one token in the
field" error.  The current check only catches it in limited cases,
which is deadly because you can test like crazy and think you're OK,
only to index slightly different content in production months later
and hit the RuntimeException.

But I can't think of a cheap way to do it reliably.

At least, we should upgrade the exception from RuntimeException to a
checked exception.  Or, we could turn the check off entirely (which I
think is better than intermittently catching it).

We should also somehow allow turning off the check on a case by case
basis, since there are really times when it's OK.  (Though maybe you
just make your own ValueSource, or maybe subclass Uninverter,
or... something?).
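
For reference, a sketch of what a check that fires on every affected index could look like, with a per-call switch to turn it off. This is an illustration, not the FieldCache code: the Uninverter class, the postings map and the strict flag are assumptions for the example. It costs one extra bit per document plus a test per posting, and whether that is cheap enough is exactly the open question above.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;

class Uninverter {
    /**
     * Builds a doc -> term-ordinal array from sorted term -> docIDs postings.
     * Ordinal 0 means "no value". When strict is true, a second term for the
     * same doc raises an error on every index that contains one.
     */
    static int[] uninvert(SortedMap<String, int[]> postings, int maxDoc, boolean strict) {
        int[] ords = new int[maxDoc];
        BitSet seen = new BitSet(maxDoc); // one bit per doc: already has a value?
        int ord = 0;
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            ord++;
            for (int doc : entry.getValue()) {
                if (strict && seen.get(doc)) {
                    throw new IllegalStateException(
                        "doc " + doc + " has more than one term in this field");
                }
                seen.set(doc);
                ords[doc] = ord; // when not strict, the last term wins
            }
        }
        return ords;
    }
}
```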





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698239#action_12698239
 ] 

Mark Miller commented on LUCENE-831:


bq. Ugly.

Well no worries yet :) Still in early design mode, so if it can be made better, 
I'm sure it will. Of course I'd love to get to 'everything works right for the 
right field automagically' as well - not sure that will fit into the scope of 
this issue though (though nothing saying this issue can't be further delayed). 
We will do the best we can regardless though.

I'm kind of worried that any change is going to hurt Apps like Solr - if you 
end up using the new built in API, but also have code that must stick for a 
while with the old API (for multireader fieldcache or something), you'll likely 
increase RAM usage more than before - how much of a concern that ends up being, 
I'm not sure. I suppose eventually it has to become up to the upgrader to 
consider and deal with it if we want to move to segment level caching.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698241#action_12698241
 ] 

Mark Miller commented on LUCENE-831:


bq. Or, we could turn the check off entirely (which I think is better than 
intermittently catching it).

+1 - agreed that the check is nasty - better to never trip it if you're only 
going to trip it depending...




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698242#action_12698242
 ] 

Marvin Humphrey commented on LUCENE-831:


> Another thing that'd be great to fix about FieldCache is its
> intermittent checking of the "some docs had more than one token in the
> field" error.

Add a FieldType that only allows one value per document. At index-time, 
verify when the doc is added that indeed, only one value was supplied.

In Lucy, I expect StringType to fill this role. FullTextType is for multi-token
fields.

Optionally, add a NOT NULL check to verify that each doc supplies a
value, or allow the FieldType object to specify a default value that should
be inserted.
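
A minimal sketch of such an index-time check, assuming a document is modeled as a map from field name to its list of values. SingleValueFieldType and validate are hypothetical names for this example and are not taken from Lucy or Lucene.

```java
import java.util.List;
import java.util.Map;

// Hypothetical field type that admits exactly one value per document.
class SingleValueFieldType {
    private final String field;
    private final String defaultValue; // null means a value is required (NOT NULL)

    SingleValueFieldType(String field, String defaultValue) {
        this.field = field;
        this.defaultValue = defaultValue;
    }

    /** Returns the single value for the field, or the default if none was supplied. */
    String validate(Map<String, List<String>> doc) {
        List<String> values = doc.get(field);
        if (values == null || values.isEmpty()) {
            if (defaultValue == null) {
                throw new IllegalArgumentException("field " + field + " requires a value");
            }
            return defaultValue;
        }
        if (values.size() > 1) {
            throw new IllegalArgumentException(
                "field " + field + " allows one value per document, got " + values.size());
        }
        return values.get(0);
    }
}
```

Rejecting the document when it is added moves the failure to the moment the bad content arrives, rather than to a later sort or cache load.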





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698243#action_12698243
 ] 

Earwin Burrfoot commented on LUCENE-831:


bq. At least, we should upgrade the exception from RuntimeException to a 
checked exception.

Checked exceptions are for expected conditions that can be adequately handled 
by the caller. RuntimeExceptions are for possible but unexpected conditions 
that can theoretically be handled by the caller, but most of the time the 
caller will terminate anyway.
Having several values in a place where only one should be at all times is 
obviously an unexpected fault of the indexer. So by using a checked exception 
here you'll only provoke some ugly rethrowing/wrapping code, or propagation of 
said checked exception up the call hierarchy, without gaining any benefit at all.

bq. Or, we could turn the check off entirely.

Yes!




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698247#action_12698247
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
> Or, we could turn the check off entirely (which I think is better than 
> intermittently catching it).

+1 - agreed that the check is nasty - better to never trip it if you're only 
going to trip it depending...
{quote}

OK -- let's turn this check off entirely.

I like Marvin's approach but we don't quite have a FieldType just yet...




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698248#action_12698248
 ] 

Earwin Burrfoot commented on LUCENE-831:


bq. I'm kind of worried that any change is going to hurt Apps like Solr - if 
you end up using the new built in API, but also have code that must stick for a 
while with the old API (for multireader fieldcache or something), you'll likely 
increase RAM usage more than before - how much of a concern that ends up being, 
I'm not sure.

My personal stance is that until you have one perfectly thought out API, 
nothing should restrain you from changing it. It's better to feel pain once or 
twice, when you adapt to API changes, than to feel it constantly, each time 
you're using that half-assed thing you're keeping around for back-compat. Look 
at google-collections. They made some really breaking changes since they 
released, but most of them eased my life after I made my project compile and 
run with the new version of their library.

bq. In Lucy, I expect StringType to fill this role. FullTextType is for 
multi-token fields.

In our case multiplicity is defined at the FieldInfo level, because we can have 
one Int field that holds a single value and another Int field that holds 
several ids.
The same goes for the String field - you might want tags on a document that are 
represented as untokenized strings, but each document can have 0..n of them.

 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
Assignee: Mark Miller
 Fix For: 3.0

 Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
 fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
 LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
 LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch


 Motivation:
 1) Completely overhaul the API/implementation of FieldCache type things...
    a) eliminate global static map keyed on IndexReader (thus
       eliminating synch block between completely independent IndexReaders)
    b) allow more customization of cache management (ie: use 
       expiration/replacement strategies, disk backed caches, etc)
    c) allow people to define custom cache data logic (ie: custom
       parsers, complex datatypes, etc... anything tied to a reader)
    d) allow people to inspect what's in a cache (list of CacheKeys) for
       an IndexReader so a new IndexReader can be likewise warmed. 
    e) Lend support for smarter cache management if/when
       IndexReader.reopen is added (merging of cached data from subReaders).
 2) Provide backwards compatibility to support existing FieldCache API with
    the new implementation, so there is no redundant caching as client code
    migrates to new API.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698254#action_12698254
 ] 

Mark Miller commented on LUCENE-831:


I began switching to allow an Inverter to be attached to a SortField, like a 
parser, with the idea of doing backcompat by wrapping the parser with an 
Inverter. I caught myself though, because the reason I wasn't doing that 
before is that an Inverter may not make sense for a given ValueSource. By 
forcing you to set the inverters per type on the UninversionValueSource instead 
(current posted patch), this issue is kind of avoided. The possible problem is 
that it seems somewhat less dynamic: you set it once on the ValueSource and 
leave it, instead of being able to pass any impl at any time on a SortField. 
Perhaps not so bad. But then how do I set the Uninverter for back compat with 
an old Parser? It doesn't seem wise to allow arbitrary Uninverter updates to 
the UninversionValueSource, does it? Thread safety issues, and ... but then how 
to handle back compat with the parser? ... 




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698255#action_12698255
 ] 

Mark Miller commented on LUCENE-831:


I guess we do need to have Uninverters settable per run somehow ... Then if a 
Parser comes in on SortField, we can downcast to UninversionValueSource and 
pass the Uninverter wrapping the Parser. I don't think we should pass the 
Uninverter on SortField though, going forward. Uninverter may not apply to 
ValueSource, and internally, things will work with ValueSource.

So a user cannot pass an Uninverter for internal sorting per sort; they have to 
init the UninversionValueSource with the right Uninverters, but for backcompat, 
FieldComparator would be able to pass an Uninverter that takes precedence?

That sounds somewhat alright to me, I'll roll that way a bit for now.
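
To make the back-compat idea concrete, here is a minimal sketch of an 
Uninverter that wraps an old-style parser. Every type below is a hypothetical 
stand-in written for this example, not a class from the patch or from Lucene; 
in particular, the newTerm(term, docs) signature is an assumption about how 
an uninverter might be fed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for an old-style FieldCache parser (hypothetical, not the Lucene interface).
interface LegacyIntParser {
    int parseInt(String term);
}

// Hypothetical uninverter: receives each term together with the docs that contain it.
abstract class IntUninverterSketch {
    protected final int[] values;

    IntUninverterSketch(int maxDoc) {
        this.values = new int[maxDoc];
    }

    abstract void newTerm(String term, int[] docs);

    int[] getArray() {
        return values;
    }
}

// Back-compat adapter: the legacy parser decides how a term becomes an int.
class ParserBackedUninverter extends IntUninverterSketch {
    private final LegacyIntParser parser;

    ParserBackedUninverter(int maxDoc, LegacyIntParser parser) {
        super(maxDoc);
        this.parser = parser;
    }

    @Override
    void newTerm(String term, int[] docs) {
        int parsed = parser.parseInt(term);
        for (int doc : docs) {
            values[doc] = parsed;
        }
    }
}

class ParserBackCompatDemo {
    // Feeds a toy term -> docs map through the uninverter and returns the per-doc array.
    static int[] uninvert(Map<String, int[]> postings, IntUninverterSketch uninverter) {
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            uninverter.newTerm(entry.getKey(), entry.getValue());
        }
        return uninverter.getArray();
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("7", new int[] {0, 2});
        postings.put("42", new int[] {1});
        LegacyIntParser parser = Integer::parseInt;
        int[] values = uninvert(postings, new ParserBackedUninverter(3, parser));
        System.out.println(java.util.Arrays.toString(values));
    }
}
```

The adapter keeps the parser out of the ValueSource API itself: only the 
back-compat path would need to know that a parser exists.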




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698257#action_12698257
 ] 

Michael McCandless commented on LUCENE-831:
---

How about SortField taking ValueSource?




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698258#action_12698258
 ] 

Michael McCandless commented on LUCENE-831:
---

This issue is fast moving!  Here're some thoughts on the last patch:

  * Need to pass in ValueSource impl to IndexReader.open, defaulting
to Cached(Uninverted)

  * Maybe ValueSource should provide default methods that throw
UnsupportedOperationException (UOE) instead of abstract ones, since
implementing all of them is rather cumbersome for a subclass that
only intends to provide eg getInts().

  * Should CachingValueSource really include IndexReader in its key?
I think it shouldn't?  The cache is already scoped to a reader
because each SegmentReader will have its own instance.  Also, one
might cache ValueSources that don't have a reader (eg pull from
a DB or custom file format or something).  Cloning an SR for now
should just copy over the same cache.

  * I think we should back deprecated FieldCache with ValueSource for
the reader; one simple way to not ignore the Parser passed in is
to anonymously subclass Uninverter and invoke the passed in
ByteParser from its newTerm method?

  * Why do we need to define StringInverter, IntInverter, etc in
UninversionValueSource?  Couldn't someone instead subclass
UninversionValueSource, and override whichever getXXX's they want
using their own anonymous uninverter subclass?

  * Should we deprecate function.FieldCacheSource, and make a new
function.ReaderValueSource (ie pulls its values from the
IndexReader's ValueSource)?
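
The suggestion about default UOE methods can be sketched roughly as follows. 
This is only an illustration under assumed names: SketchValueSource and 
TimesTenIntSource are hypothetical, and the real ValueSource in the patch may 
take different arguments (for example an IndexReader).

```java
import java.util.Arrays;

// Hypothetical base class: every accessor has a throwing default instead of
// being abstract, so a subclass only overrides what it can actually serve.
abstract class SketchValueSource {
    public int[] getInts(String field) {
        throw new UnsupportedOperationException("getInts not supported for " + field);
    }

    public byte[] getBytes(String field) {
        throw new UnsupportedOperationException("getBytes not supported for " + field);
    }

    public String[] getStrings(String field) {
        throw new UnsupportedOperationException("getStrings not supported for " + field);
    }
}

// A toy source that only knows how to produce ints (value = docId * 10).
class TimesTenIntSource extends SketchValueSource {
    private final int maxDoc;

    TimesTenIntSource(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    @Override
    public int[] getInts(String field) {
        int[] values = new int[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            values[doc] = doc * 10;
        }
        return values;
    }
}

class UoeDefaultsDemo {
    public static void main(String[] args) {
        SketchValueSource source = new TimesTenIntSource(3);
        System.out.println(Arrays.toString(source.getInts("price")));
        try {
            source.getBytes("price");
        } catch (UnsupportedOperationException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}
```

The trade-off is that an unsupported type is only detected at run time, 
whereas abstract methods would force every subclass to deal with every type 
at compile time.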





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698264#action_12698264
 ] 

Mark Miller commented on LUCENE-831:


bq. How about SortField taking ValueSource? 

Right - there's the right thought I think. I'll play with that.

bq. Need to pass in ValueSource impl to IndexReader.open, defaulting to 
Cached(Uninverted)

Yes - I only haven't because that code is kind of unfirm - it already seems like 
it will prob be moved out to SortField :) So I was just short-cut initing it.

bq. Should CachingValueSource really include IndexReader in its key?

Probably not then :) I'll be going over that more tightly - I was in speed mode 
and haven't considered the CachingValueSource much yet - I kind of just banged 
out a quick impl (gotta love Eclipse's generate hashCode/equals), put it in and 
tested it.  To handle all of this 'do it for Long, repeat for Int, repeat for 
Byte, etc' I go somewhat into robot mode. Also, this early, I know you'll have 
me changing directions fast enough that I'm still in pretty rough hammer out 
mode.

bq. Why do we need to define StringInverter, IntInverter, etc in 
UninversionValueSource? 

Yes true. I was using anon classes in the prev patch, but upon switching to 
Uninverter, I just did mostly what came to mind quickest looking at your 
example code and what I had.
Indeed, overriding getXXX is simple and effective. As I think about it, I 
believe I did it so that the getArray call returns the right type (easy with 
an anon class, but with a custom passed Uninverter?). Could return Object and 
cast though ...

Do we need Uninverter if overriding getXXX is easy enough and we pass the 
ValueSource on SortField?

bq. Should we deprecate function.FieldCacheSource,

Yeah for sure. I'll take a closer look at the function package soon. Thus far, 
I've just got it compiling and the main sort tests passing. As soon as I feel 
the API is firming (which, sweet, it already is a bit I think), I'll start 
polishing and filling in the missing pieces, more thorough nocommits, comments.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698272#action_12698272
 ] 

Mark Miller commented on LUCENE-831:


bq. Should CachingValueSource really include IndexReader in its key?

I see now - I wasn't always holding the ValueSource in the Reader - I was using 
getXXX(reader, field). Now I am back to that - and since the CachingValueSource 
holds the map, I am back to needing the reader in the key. Being able to change 
the ValueSource on the fly now with SortField has implications for 
CachingValueSource. There has to be some kind of default ValueSource, which 
would likely do caching. Once you start using ValueSources from SortField, 
they won't share the cache. Get enough of them going and the RAM reqs jump.
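
The RAM concern can be illustrated with a toy caching wrapper. All names here 
(IntValues, CountingSource, CachingIntValues) are hypothetical and only stand 
in for the idea of a CachingValueSource that owns its own map; the sketch 
counts how often the underlying source is asked to build an array.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal value source interface for this sketch.
interface IntValues {
    int[] getInts(String field);
}

// Pretends to uninvert a field and counts how many arrays it had to build.
class CountingSource implements IntValues {
    int loads = 0;

    @Override
    public int[] getInts(String field) {
        loads++;
        return new int[] {3, 1, 2};
    }
}

// Caching wrapper: the map lives inside the wrapper, so two wrappers never share entries.
class CachingIntValues implements IntValues {
    private final IntValues delegate;
    private final Map<String, int[]> cache = new HashMap<>();

    CachingIntValues(IntValues delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized int[] getInts(String field) {
        int[] cached = cache.get(field);
        if (cached == null) {
            cached = delegate.getInts(field);
            cache.put(field, cached);
        }
        return cached;
    }
}

class CacheSharingDemo {
    public static void main(String[] args) {
        CountingSource raw = new CountingSource();

        // One shared wrapper: the second lookup is served from the cache.
        IntValues shared = new CachingIntValues(raw);
        shared.getInts("price");
        shared.getInts("price");
        System.out.println("loads with one shared cache: " + raw.loads);

        // Two private wrappers (e.g. one per sort): each builds its own array.
        IntValues perSortA = new CachingIntValues(raw);
        IntValues perSortB = new CachingIntValues(raw);
        perSortA.getInts("price");
        perSortB.getInts("price");
        System.out.println("loads after two private caches: " + raw.loads);
    }
}
```

Each private wrapper holds its own copy of the array for the same field, 
which is the memory growth described above.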




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698286#action_12698286
 ] 

Mark Miller commented on LUCENE-831:


I really like CachingValueSource, but somehow the cache has to move to the 
segmentreader or something... (not that there is not a segmentreader - 
valuesource mapping)




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698300#action_12698300
 ] 

Uwe Schindler commented on LUCENE-831:
--

By the way: If this new API goes into 2.9, the SortField ctors with parser and 
the whole parser support in SortField/Search-code can be removed, if Parser 
itself is deprecated (as support for LUCENE-1478 has not been released yet). 
New code could always use ValueSource.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698309#action_12698309
 ] 

Mark Miller commented on LUCENE-831:


I guess the caching is not the problem I thought. It's got to be per 
value source anyway. I guess I just have to stick the default 
ValueSource in a better place. Seems CachingValueSource is still legit.

Still, I introduced the reader back into the caching, and you wanted to 
avoid that...

- Mark

http://www.lucidimagination.com







[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698096#action_12698096
 ] 

Mark Miller commented on LUCENE-831:


Thanks Mike! Everything makes sense on first read, so I'll work in that 
direction.

In regards to FieldCache - yes it def will be deprecated (pretty rough patch to 
start so I might have missed some of that - I know plenty is undone).

As far as back compat with it, I was trying to make it so that if you happened 
to have code that still used it, and you used the new cache, you wouldn't double 
your mem needs (the original point of backing FieldCache with the new API). 
That's pretty restrictive though - indeed, things look much nicer if we don't 
attempt that (and in the case of a MultiReader cache array, we wouldn't be able 
to avoid it anyway, so I guess it does make sense we don't worry about it so 
much).

bq. Do we even need ComparatorFactory*?

Probably not then - I didn't really touch any of the custom type stuff yet. I 
mainly just got all of the sort tests except remote and custom to pass (though 
tests elsewhere are still failing).

I'll make another push with your suggestions.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698101#action_12698101
 ] 

Uwe Schindler commented on LUCENE-831:
--

{quote}
How about we create a ValueSource abstract base class, that defines
abstract byte[] getBytes(IndexReader r, String field),
int[] getInts(IndexReader r, String field), etc. (Just like
ExtendedFieldCache).

This is subclassed to things like UninversionValueSource (what
FieldCache does today), CSFValueSource (in the future) both of which
take an IndexReader when created.

UninversionValueSource should provide basic ways to customize the
uninversion. Hopefully, we can share more code than the current
FieldCacheImpl does (eg, a single enum terms & terms docs loop that
switches out to a handler to deal with each term & doc, w/
subclasses that handle byte, int, etc.).

And then I can also make MyFunkyValueSource (for extensibility) that
does whatever to produce the values.

Then we make CachingValueSource, that wraps any other ValueSource.

And finally expose a way in IndexReader to set its ValueSource when
you open it? It would default to
CachedValueSource(UninversionValueSource()). I think we should
require that you set this on opening the reader, and you can't later
change it.
{quote}

I like this idea, but I am a little bit concerned about having only one 
ValueSource for the Reader. This makes plugging in different sources for 
different field types hard.

E.g.: Someone has a CSF and a TrieField and several normal int/float fields. 
Each of these fields needs a different ValueSource. The CSF field can be loaded 
from payloads, the TrieField by decoding the prefix-encoded values, and the 
others as they are now.

So the IndexReader's ValueSource should be a Map of FieldValueSources, so the 
user could register FieldValueSources for different field types.

The idea of setting the ValueSource for the IndexReader is nice. We could then 
simply remove the extra SortField constructors I added in LUCENE-1478, as it 
would be possible to specify the type for sorting when creating the 
IndexReader. Search code would then simply say "sort by fields a, b, c" without 
knowing what type each field is. The sort code would get the type and the 
arrays from the underlying IndexReaders.

The same goes for the current function query value sources (just a question: 
are the function query value sources then obsolete, and can they be merged with 
the new ValueSource?).

 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
Assignee: Mark Miller
 Fix For: 3.0

 Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
 fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
 LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
 LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch


 Motivation:
 1) Completely overhaul the API/implementation of FieldCache type things...
    a) eliminate the global static map keyed on IndexReader (thus
       eliminating the synch block between completely independent IndexReaders)
    b) allow more customization of cache management (ie: use 
       expiration/replacement strategies, disk backed caches, etc)
    c) allow people to define custom cache data logic (ie: custom
       parsers, complex datatypes, etc... anything tied to a reader)
    d) allow people to inspect what's in a cache (list of CacheKeys) for
       an IndexReader so a new IndexReader can be likewise warmed. 
    e) Lend support for smarter cache management if/when
       IndexReader.reopen is added (merging of cached data from subReaders).
 2) Provide backwards compatibility to support the existing FieldCache API with
    the new implementation, so there is no redundant caching as client code
    migrates to the new API.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698104#action_12698104
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. E.g.: Suppose someone has a CSF field, a TrieField, and several normal 
int/float fields. Each of these fields needs a different ValueSource.

Couldn't we make a PerFieldValueSource impl to handle this?  (And leave the 
switching logic out of IndexReader).
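
A rough sketch of what such a PerFieldValueSource could look like. IntSource is a hypothetical, simplified ValueSource invented for this example (it omits the reader argument), and register() is an assumed method name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified ValueSource: one int per document for a field.
interface IntSource {
    int[] getInts(String field);
}

// Routes each field to the source registered for it, so the reader itself
// needs no switching logic. Unregistered fields use the fallback source.
class PerFieldValueSource implements IntSource {
    private final Map<String, IntSource> byField = new HashMap<>();
    private final IntSource fallback;

    PerFieldValueSource(IntSource fallback) {
        this.fallback = fallback;
    }

    PerFieldValueSource register(String field, IntSource source) {
        byField.put(field, source);
        return this;
    }

    @Override
    public int[] getInts(String field) {
        return byField.getOrDefault(field, fallback).getInts(field);
    }
}
```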

I think another useful ValueSource would be one that first consults CSF and 
uses that, if present, else falls back to the uninversion source.
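
The fallback idea can be sketched the same way. This assumes, purely for illustration, that a source signals "no data for this field" by returning null; IntSource is again a hypothetical simplified interface.

```java
// Hypothetical, simplified ValueSource; returns null when it has no data
// for the requested field (an assumption made for this sketch).
interface IntSource {
    int[] getInts(String field);
}

// Consults the first source (e.g. CSF) and falls back to the second
// (e.g. uninversion) only when the first has nothing for the field.
class FallbackValueSource implements IntSource {
    private final IntSource first;
    private final IntSource second;

    FallbackValueSource(IntSource first, IntSource second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public int[] getInts(String field) {
        int[] values = first.getInts(field);
        return values != null ? values : second.getInts(field);
    }
}
```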

bq.  The sort code would get the type and the arrays from the underlying 
IndexReaders.

I'm not sure this'll work -- IndexReader still won't know what type to ask for, 
for a given field?

bq. are the function query value sources then obsolete, and can they be merged 
with the new ValueSource

Well, the API is a little different (this API returns int[], but the function 
query's ValueSource has int intVal(int docID)), so I think we'd wrap the new 
API to match function query's  (ie, cut over function query's use of FieldCache 
to this new FieldCache).
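
As a sketch of that wrapping: an adapter that exposes an array from the new API through a per-document accessor. DocValues and ArrayDocValues here are illustrative names for this example, not the function query package's actual classes.

```java
// Hypothetical per-document accessor in the style of function queries.
interface DocValues {
    int intVal(int docID);
}

// Adapts an int[] obtained from the array-returning API to the
// per-document method; the array is shared, not copied.
class ArrayDocValues implements DocValues {
    private final int[] values;

    ArrayDocValues(int[] values) {
        this.values = values;
    }

    @Override
    public int intVal(int docID) {
        return values[docID];
    }
}
```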




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698135#action_12698135
 ] 

Mark Miller commented on LUCENE-831:


Any ideas on where Parser fits in with ValueSource? It's easy enough to more or 
less keep it how it is, but then what if CSF values can be stored as a byte 
representation of an int or something? parse(String) won't make any sense. If 
we move Parser up to an impl of ValueSource, we have to special-case things.

Any thoughts? Just stick with allowing the passing of a 'String to type' Parser 
and worry about possible byte handling later? A different parser object of some 
kind?
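
One possible shape for "a different parser object", sketched under the assumption that the raw form is a type parameter. IntDecoder and the 4-byte big-endian layout are made up for this example; nothing here reflects how CSF would actually store values.

```java
// Hypothetical decoder whose input type varies with the value source:
// String for uninverted terms, byte[] for some binary representation.
interface IntDecoder<S> {
    int decode(S raw);
}

class Decoders {
    // Term text to int, as the existing String-based parsers do.
    static final IntDecoder<String> FROM_TEXT = Integer::parseInt;

    // Assumed layout: 4 bytes, most significant byte first.
    static final IntDecoder<byte[]> FROM_BIG_ENDIAN = b ->
        ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
}
```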




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698151#action_12698151
 ] 

Mark Miller commented on LUCENE-831:


Or parsing is just done by the FieldValue implementation, with overrides or 
something? To change parsers you override UninversionValueSource, returning your 
parsers in the callbacks (or something similar?)




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697812#action_12697812
 ] 

Mark Miller commented on LUCENE-831:


Some random thoughts:

If we are going to allow random access, I like the idea of sticking with the 
arrays. They are faster than hiding behind a method, and it allows easier 
movement from the old API. It would be nice if we can still deprecate all of 
that by backing it with the new impl (as done with the old patch).

The current API (from this patch) still looks fairly good to me - a given 
cachekey gets your data, and knows how to construct it. You get data something 
like: return (byte[]) reader.getCachedData(new ByteCacheKey(field, parser)). It 
could be improved, but it seems a good start to me.
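
To make the "key gets your data and knows how to construct it" idea concrete, here is a toy sketch. SketchReader stores one term per document per field, and the parser argument from the call above is left out for brevity; all class shapes are assumptions, not the patch's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key: identifies cached data and knows how to build it.
abstract class CacheKey<T> {
    final String field;

    CacheKey(String field) {
        this.field = field;
    }

    abstract T build(SketchReader reader);

    // Two keys of the same class for the same field address the same entry.
    @Override
    public boolean equals(Object o) {
        return o != null && o.getClass() == getClass()
            && ((CacheKey<?>) o).field.equals(field);
    }

    @Override
    public int hashCode() {
        return 31 * getClass().hashCode() + field.hashCode();
    }
}

class ByteCacheKey extends CacheKey<byte[]> {
    ByteCacheKey(String field) {
        super(field);
    }

    @Override
    byte[] build(SketchReader reader) {
        String[] terms = reader.termPerDoc(field);
        byte[] out = new byte[terms.length];
        for (int i = 0; i < terms.length; i++) {
            out[i] = Byte.parseByte(terms[i]);
        }
        return out;
    }
}

// Toy reader: one term per document per field, plus a per-reader cache
// (so there is no global static map shared between readers).
class SketchReader {
    private final Map<String, String[]> fields;
    private final Map<CacheKey<?>, Object> cache = new HashMap<>();

    SketchReader(Map<String, String[]> fields) {
        this.fields = fields;
    }

    String[] termPerDoc(String field) {
        return fields.get(field);
    }

    @SuppressWarnings("unchecked")
    synchronized <T> T getCachedData(CacheKey<T> key) {
        Object data = cache.get(key);
        if (data == null) {
            data = key.build(this);
            cache.put(key, data);
        }
        return (T) data;
    }
}
```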

The immediate problem I see is how to handle MultiReader vs. reader. Not being 
able to treat them the same is a real pain. In the segment case, you just want 
an array back; in the multi-segment case, perhaps an array of arrays? Or 
unsupported? I haven't thought of anything nice.

We have always been able to customize a lot of behavior with our custom sort 
types - I guess the real issue is making the built-in sort types customizable. 
So I guess we need some way to say, use this cachekey for this built-in type?

When we load the new caches in FieldComparator, can we count on those being 
SegmentReaders? We can Lucene-wise, but not API-wise, right? Does that matter? I 
suppose it's really tied in with the MultiReader vs. reader API.

 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
 Fix For: 3.0

 Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
 fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
 LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
 LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697820#action_12697820
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
If we are going to allow random access, I like the idea of sticking
with the arrays. They are faster than hiding behind a method, and it
allows easier movement from the old API.
{quote}

I agree.

{quote}
It would be nice if we can
still deprecate all of that by backing it with the new impl (as done
with the old patch).
{quote}

That seems fine?

bq. The current API (from this patch) still looks fairly good to me - a given 
cachekey gets your data, and knows how to construct it. You get data something 
like: return (byte[]) reader.getCachedData(new ByteCacheKey(field, parser)). It 
could be improved, but it seems a good start to me.

Agreed.

bq. The immediate problem I see is how to handle MultiReader vs. reader. Not 
being able to treat them the same is a real pain. In the segment case, you just 
want an array back; in the multi-segment case, perhaps an array of arrays? Or 
unsupported? I haven't thought of anything nice.

I would lean towards throwing UOE, and suggesting that you call
getSequentialSubReaders instead.

Eg with the new getUniqueTermCount() we do that.

bq. We have always been able to customize a lot of behavior with our custom 
sort types - I guess the real issue is making the built-in sort types 
customizable. So I guess we need some way to say, use this cachekey for this 
built-in type?

I don't quite follow that last sentence.

We'll have a lot of customizability here, ie, if you want to change how
String is parsed to int, if you want to fully override how uninversion
works, etc.  At first the core will only support uninversion as a
source of values, but once CSF is online that should be an alternate
pluggable source, presumably plugging in the same way that
customization would allow you to override uninversion.

bq. When we load the new caches in FieldComparator, can we count on those being 
SegmentReaders? We can Lucene-wise, but not API-wise, right? Does that matter? I 
suppose it's really tied in with the MultiReader vs. reader API.

Once getSequentialSubReaders() is called (and, recursively if needed),
then those atomic readers should be able to provide values.  I guess
that's the contract we require of a given IndexReader impl?
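
A toy sketch of that contract, covering both the UOE suggestion and the recursive descent. SketchReader and its subclasses are hypothetical stand-ins; in this sketch an atomic reader returns null from getSequentialSubReaders().

```java
import java.util.List;

// Hypothetical stand-in for IndexReader.
abstract class SketchReader {
    // null means "atomic": this reader has no sequential sub-readers.
    SketchReader[] getSequentialSubReaders() {
        return null;
    }

    abstract int[] getInts(String field);
}

class SegmentSketchReader extends SketchReader {
    private final int[] values;

    SegmentSketchReader(int[] values) {
        this.values = values;
    }

    @Override
    int[] getInts(String field) {
        return values;
    }
}

class MultiSketchReader extends SketchReader {
    private final SketchReader[] subs;

    MultiSketchReader(SketchReader[] subs) {
        this.subs = subs;
    }

    @Override
    SketchReader[] getSequentialSubReaders() {
        return subs;
    }

    // A composite reader refuses to hand out one massive array.
    @Override
    int[] getInts(String field) {
        throw new UnsupportedOperationException(
            "call getSequentialSubReaders() and ask each segment instead");
    }
}

class Leaves {
    // Recursively collects the atomic readers, in document order.
    static void gather(SketchReader reader, List<SketchReader> out) {
        SketchReader[] subs = reader.getSequentialSubReaders();
        if (subs == null) {
            out.add(reader);
            return;
        }
        for (SketchReader sub : subs) {
            gather(sub, out);
        }
    }
}
```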





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697822#action_12697822
 ] 

Mark Miller commented on LUCENE-831:


{quote}
bq. We have always been able to customize a lot of behavior with our custom 
sort types - I guess the real issue is making the built-in sort types 
customizable. So I guess we need some way to say, use this cachekey for this 
built-in type?

I don't quite follow that last sentence.

We'll have a lot of customizability here, ie, if you want to change how
String is parsed to int, if you want to fully override how uninversion
works, etc.  At first the core will only support uninversion as a
source of values, but once CSF is online that should be an alternate
pluggable source, presumably plugging in the same way that
customization would allow you to override uninversion.
{quote}

Right - since a custom cachekey builds the array from a reader, you can pretty 
much do anything. What I meant was that you could do anything before with a 
custom sort type as well - the problem was that you could not say "use this 
custom sort type" when sorting on a built-in type (eg INT, BYTE, STRING). So 
that's all we need, right? A way to say, use this builder (cachekey) for LONG, 
use this one for INT, etc. When we get CSF, you would set it to use cachekeys 
that built arrays from that data.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697830#action_12697830
 ] 

Michael McCandless commented on LUCENE-831:
---

bq.  A way to say, use this builder (cachekey) for LONG, use this one for INT, 
etc. When we get CSF, you would set it to use cachekeys that built arrays from 
that data.

That sounds right, though it'd presumably be field dependent rather than 
relying on only the native type?  Ie I may have 3 fields that should load 
long[]'s, but each has its own custom decoding to be done.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697832#action_12697832
 ] 

Mark Miller commented on LUCENE-831:


Yes, good point. Okay, I think I have a much clearer picture of what needs to 
be done - this may be less work than I thought - a lot of what has been done is 
probably still helpful.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697485#action_12697485
 ] 

Mark Miller commented on LUCENE-831:


{quote}
I'd like to see the new FieldCache API de-emphasize "get me a single array 
holding all values for all docs in the index" for a MultiReader. That 
invocation is exceptionally costly in the context of reopened readers, and 
providing the illusion that one can simply get this array is dangerous. It's a 
leaky API, like how the virtual memory API pretends you can use more memory 
than is physically available.

I think it's OK to return an array-of-arrays (ie, one contiguous array per 
underlying segment); if the app really wants to make a massive array & 
concatenate it, they can do so outside of the FieldCache API. 
{quote}

Is there much difference between one massive array and an array of arrays? It's 
just as much space and just as dangerous, right? Some apps will need random 
access to the field cache for any given document, right? Don't we always have to 
support that in some way, and won't it always be severely limited by RAM (until 
IO is as fast)?

I like the idea of an iterator API, but it seems we will still have to provide 
random access with all its problems, right?
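
For what it's worth, random access by global doc id is still possible on top of per-segment arrays. A small sketch, assuming non-empty segments laid out in document order; SegmentedInts is a made-up helper, not part of any patch.

```java
import java.util.Arrays;

// Random access by global doc id over one array per segment.
class SegmentedInts {
    private final int[][] segments;
    private final int[] starts; // starts[i] = global doc id of segment i's first doc

    SegmentedInts(int[][] segments) {
        this.segments = segments;
        this.starts = new int[segments.length];
        int base = 0;
        for (int i = 0; i < segments.length; i++) {
            if (segments[i].length == 0) {
                // Keeps starts strictly increasing so the lookup is unambiguous.
                throw new IllegalArgumentException("empty segment at index " + i);
            }
            starts[i] = base;
            base += segments[i].length;
        }
    }

    int get(int globalDoc) {
        int idx = Arrays.binarySearch(starts, globalDoc);
        if (idx < 0) {
            // Not a segment start: (insertion point - 1) is the containing segment.
            idx = -idx - 2;
        }
        return segments[idx][globalDoc - starts[idx]];
    }
}
```

The lookup costs a binary search over the segment starts per call, which is the price of not building the single large array.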

{quote}
We should also set this API up as much as possible for LUCENE-1231. Ie, the 
current un-invert the field approach that FieldCache takes is merely one 
source of values per doc. Column stride fields in the future will be a 
different (faster) source of values, that should be able to just plug in 
under the hood somehow to this same exposure API.
{quote}

Definitely.

{quote}
On Uwe's suggestion for some flexibility on how the un-inversion takes place, I 
think allowing differing degrees of extension makes sense. EG we already allow 
you to provide a custom parser. We need to allow control on whether a given 
value replaces the already-seen value (LUCENE-1372), or whether to stop the 
looping early (Uwe's needs for improving Trie). We should also allow an 
outright, entirely custom class that creates the value array.
{quote}

Allow a custom field cache loader for each type?





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697492#action_12697492
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Is there much difference between one massive array and an array of arrays? 
It's just as much space and just as dangerous, right?

One massive array is far more dangerous during reopen() (ie that's why we did 
LUCENE-1483) since it's non-incremental.

Array-per-segment I think is OK.

But yes both of them consume RAM, but I don't consider that dangerous.
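
A sketch of why array-per-segment is incremental across reopen: arrays are cached per segment, so only segments not seen before are loaded. SegmentCache, the use of segment names as keys, and the loader function are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Caches one value array per segment, keyed by a segment name.
class SegmentCache {
    private final Map<String, int[]> bySegment = new HashMap<>();
    int loads = 0; // number of segments actually loaded, for illustration

    // Returns one array per segment; unchanged segments reuse their array.
    int[][] valuesFor(String[] segmentNames, Function<String, int[]> loader) {
        int[][] out = new int[segmentNames.length][];
        for (int i = 0; i < segmentNames.length; i++) {
            out[i] = bySegment.computeIfAbsent(segmentNames[i], name -> {
                loads++;
                return loader.apply(name);
            });
        }
        return out;
    }
}
```

After a reopen that adds one new segment, only that segment pays the loading cost, whereas a single massive array would have to be rebuilt in full.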

bq. I like the idea of an iterator API, but it seems we will still have to 
provide random access with all its problems, right?

Right.

bq. Some apps will need random access to the field cache for any given 
document, right?

Yes but I think such apps should move to the per-segment model (eg a Filter's 
getDocIdSet is called per segment reader).

If an app really wants to make a single massive array, they can certainly do 
so, outside of Lucene.
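
For example, an app could concatenate the per-segment arrays itself with a few lines like the following (Concat is just an illustrative helper name):

```java
// Builds one array covering all documents from the per-segment arrays,
// which are assumed to be given in document order.
class Concat {
    static int[] concat(int[][] perSegment) {
        int total = 0;
        for (int[] segment : perSegment) {
            total += segment.length;
        }
        int[] out = new int[total];
        int pos = 0;
        for (int[] segment : perSegment) {
            System.arraycopy(segment, 0, out, pos, segment.length);
            pos += segment.length;
        }
        return out;
    }
}
```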

bq. Allow a custom field cache loader for each type?

Yes, possibly w/ different degrees of extension (much like Collector).  EG 
maybe you just want to override how you parse an int, or maybe you want to take 
control over the entire uninversion.
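
A sketch of those degrees of extension on a toy inverted index (a sorted map from term text to the documents containing it). IntUninverter and its hook names are hypothetical; the three hooks correspond to the custom parser, the replace-or-keep decision, and the early stop discussed in this thread.

```java
import java.util.Map;
import java.util.TreeMap;

// One enum-terms / term-docs loop with overridable hooks.
class IntUninverter {
    // Override to change how term text becomes an int.
    protected int parse(String term) {
        return Integer.parseInt(term);
    }

    // Override to decide whether a later term's value replaces one
    // already recorded for the same document.
    protected boolean replace(int doc, int oldValue, int newValue) {
        return true;
    }

    // Override to stop enumerating terms early.
    protected boolean stopAt(String term) {
        return false;
    }

    final int[] uninvert(TreeMap<String, int[]> postings, int maxDoc) {
        int[] values = new int[maxDoc];
        boolean[] seen = new boolean[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            if (stopAt(entry.getKey())) {
                break;
            }
            int value = parse(entry.getKey());
            for (int doc : entry.getValue()) {
                if (!seen[doc] || replace(doc, values[doc], value)) {
                    values[doc] = value;
                    seen[doc] = true;
                }
            }
        }
        return values;
    }
}
```

Taking control of the entire uninversion would then mean supplying a different class altogether rather than overriding a hook.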




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697495#action_12697495
 ] 

Mark Miller commented on LUCENE-831:


{quote}
One massive array is far more dangerous during reopen() (ie that's why we did 
LUCENE-1483) since it's non-incremental.

Array-per-segment I think is OK.

But yes both of them consume RAM, but I don't consider that dangerous.
{quote}

Okay, I got you now. We will force everyone to migrate to using FieldCache at the 
segment level rather than at the MultiReader (MR) level (or to create their own 
array from the subarrays).




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697519#action_12697519
 ] 

Mark Miller commented on LUCENE-831:


bq. But yes both of them consume RAM, but I don't consider that dangerous.

I guess you meant dangerous as in dangerous to reopen then? I actually thought 
you meant as in dangerous because it could require too many resources. Dangerous 
is a tough word to pin down ;)

So what are the advantages of the iterator API again then? It's not likely you 
are going to stream the values, and random access will likely still have a use 
as mentioned.

Just trying to get a clearer picture in my head - I doubt I'll have time, but 
I'd love to put a little sweat into this issue.

It probably makes sense to start from one of Hoss's original patches or even 
from scratch.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697540#action_12697540
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
I guess you meant dangerous as in dangerous to reopen then? I actually thought 
you meant as in dangerous because it could require too many resources. Dangerous 
is a tough word to pin down 
{quote}
Dangerous is a dangerous word ;)

I meant: I don't like exposing non-performant APIs; they are sneaky traps.  (EG 
TermEnum.skipTo is another such API).

bq. So what are the advantages of the iterator API again then?

The big advantage is the possibility of backing it with eg an IndexInput, so 
that the values need not all be in RAM for one segment.  Though, as Lucy is 
doing, we could leave things on disk and still have random access via mmap, 
which perhaps should be an option for Lucene as well.  However, iterator only 
messes up out-of-order scoring (BooleanScorer for OR queries), so I'm 
tentatively leaning against iterator only at this point.
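
To make the trade-off concrete, here is a minimal sketch of the two access styles. RandomAccessIntValues and ForwardOnlyIntValues are hypothetical classes written for this example (not a Lucene API), and the int[] in the forward-only class merely stands in for a sequential read from something like an IndexInput.

```java
// Hypothetical classes for illustration; not a Lucene API.

// Random access: any docID, in any order.
final class RandomAccessIntValues {
    private final int[] values;

    RandomAccessIntValues(int[] values) {
        this.values = values;
    }

    int get(int docID) {
        return values[docID];
    }
}

// Iterator-only access: docIDs must be requested in increasing order.
final class ForwardOnlyIntValues {
    private final int[] source; // stands in for a sequential on-disk read
    private int lastDoc = -1;

    ForwardOnlyIntValues(int[] source) {
        this.source = source;
    }

    int advance(int docID) {
        if (docID <= lastDoc) {
            throw new IllegalStateException("docIDs must increase: " + docID);
        }
        lastDoc = docID;
        return source[docID];
    }
}
```

A scorer that delivers hits out of docID order would trip the check in advance(), which is the problem with going iterator only.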

bq. but I'd love to put a little sweat into this issue.

That would be AWESOME (if you can somehow make time)!

We should hash out the design a bit before figuring out how/where to start.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693464#action_12693464
 ] 

Michael McCandless commented on LUCENE-831:
---

Let's make sure the new API fixes LUCENE-1579.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-27 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12689867#action_12689867
 ] 

Earwin Burrfoot commented on LUCENE-831:


Adding to Tim, I'd like to see the ability not only to be notified of 
SegmentReader destruction, but of SegmentReader creation (within reopen) too. 
And new FieldCache logic should be built on these notifications.
Then it's possible to extend/replace Lucene's native FieldCache, then it's 
possible to create a cache specialized for trie-fields.

Linking objects to each other with WeakHashMaps is insanely evil, especially in 
the case when object creation/destruction is clearly visible.
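
A minimal sketch of a cache driven by explicit notifications rather than weak references. SegmentLifecycleListener and ExplicitSegmentCache are hypothetical names invented for this example, and segments are identified by a plain String only to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback interface; Lucene has no such type.
interface SegmentLifecycleListener {
    void segmentOpened(String segmentName);
    void segmentClosed(String segmentName);
}

// A cache that adds and drops entries on explicit notifications, so it
// never has to guess when a segment's data can be released.
final class ExplicitSegmentCache implements SegmentLifecycleListener {
    private final Map<String, int[]> bySegment = new HashMap<String, int[]>();

    public void segmentOpened(String segmentName) {
        bySegment.put(segmentName, new int[0]); // placeholder for real cached values
    }

    public void segmentClosed(String segmentName) {
        bySegment.remove(segmentName);
    }

    int size() {
        return bySegment.size();
    }
}
```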




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12689624#action_12689624
 ] 

Uwe Schindler commented on LUCENE-831:
--

I will attach my comments regarding the problem with the TrieRangeFilter and 
sorting: either stop collecting terms into the cache when the lower precisions 
begin, or only collect terms within a specific range (like a range filter). So 
you could fill a FieldCache by specifying a starting term and an ending term; 
all terms in between would be put into the cache, and those outside left out. 
In this way, it would be possible to just use TrieUtils.prefixCodeLong() to 
specify the upper and lower integer bounds encoded in the highest precision.
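
A minimal sketch of the range-limited fill, assuming the terms arrive in sorted order as they would from a term enumeration. RangeLimitedFill and termsInRange are hypothetical names for this example; the bounds are plain strings here rather than actual trie-encoded terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class RangeLimitedFill {
    // Keeps only the terms in [lower, upper]; everything else is never cached.
    static List<String> termsInRange(String[] sortedTerms, String lower, String upper) {
        List<String> kept = new ArrayList<String>();
        for (String term : sortedTerms) {
            if (term.compareTo(lower) < 0) {
                continue; // before the starting term
            }
            if (term.compareTo(upper) > 0) {
                break; // input is sorted, so nothing later can match
            }
            kept.add(term);
        }
        return kept;
    }
}
```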




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-26 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12689629#action_12689629
 ] 

Tim Smith commented on LUCENE-831:
--

One requirement I would like to request is the ability to attach an arbitrary 
object to each Segment.
This will allow people using Lucene to store any arbitrary per-segment caches 
and statistics that their application requires (fully free form).

Would like to see the following:
* add SegmentReader.setCustomCacheManager(CacheManager m) // maybe add a string for a CacheManager id (to allow registration of multiple cache managers)
* add SegmentReader.getCustomCacheManager() // to allow accessing the manager

CacheManager should be a very light interface (just a close() method that is 
called when the SegmentReader is closed).
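
A minimal sketch of the request, using the string-id variant. SegmentReaderStandIn is a stand-in written for this example; the real SegmentReader has no such methods, and the exact signatures here are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// The proposed light interface: just a close() hook.
interface CacheManager {
    void close();
}

// Stand-in for SegmentReader, only to show the proposed contract.
final class SegmentReaderStandIn {
    private final Map<String, CacheManager> managers = new HashMap<String, CacheManager>();

    void setCustomCacheManager(String id, CacheManager manager) {
        managers.put(id, manager);
    }

    CacheManager getCustomCacheManager(String id) {
        return managers.get(id);
    }

    // Closing the reader closes every attached manager exactly once.
    void close() {
        for (CacheManager manager : managers.values()) {
            manager.close();
        }
        managers.clear();
    }
}
```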






[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-02-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674255#action_12674255
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Are there still things planned for this issue now that LUCENE-1483 has been 
committed? 

Good question... I think it'd still be nice to 1) have the IndexReader
expose the API for accessing the FieldCache values, 2) allow for
customization of the caching policy.

Though maybe we should hold off on those changes until we do
LUCENE-1231 (column stride fields), which I think would use exactly
the same API with the only difference being whether under-the-hood
there was a more efficient (column-stride storage) representation for
the field values vs the slower uninvert and re-sort (for StringIndex)
approach that FieldCache does today.

Also, in the new API I'd like to make it not-so-easy to materialize
the full array.  I think it's OK to ask for the full array of a
sub-reader, but if you want to access @ the MultiReader level, we
should encourage either random access getX(int docID), iteration or
get-sub-arrays and append yourself.
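
A minimal sketch of random access at the MultiReader level without materializing the full array. MultiIntValues is a hypothetical class for this example; it assumes a starts array where starts[i] is the first top-level docID of sub-reader i.

```java
// Hypothetical class, not part of Lucene.
final class MultiIntValues {
    private final int[] starts;      // starts[i] = doc base of sub-reader i
    private final int[][] subValues; // one array per sub-reader

    MultiIntValues(int[] starts, int[][] subValues) {
        this.starts = starts;
        this.subValues = subValues;
    }

    // Resolves a top-level docID to (sub-reader, local docID) on each call.
    int getInt(int docID) {
        int sub = subIndex(docID);
        return subValues[sub][docID - starts[sub]];
    }

    // Binary search for the last sub-reader whose doc base is <= docID.
    private int subIndex(int docID) {
        int lo = 0;
        int hi = subValues.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= docID) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}
```

The per-call lookup is what makes this slower than a flat array, but nothing has to be rebuilt when only some segments change on reopen.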





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-02-16 Thread Jeremy Volkman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674023#action_12674023
 ] 

Jeremy Volkman commented on LUCENE-831:
---

Are there still things planned for this issue now that LUCENE-1483 has been 
committed? 




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12657409#action_12657409
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
  this will turn more into an API overhaul than an IndexReader reopen time 
 saver.
{quote}
...and given the progress on LUCENE-1483 (copying values into the sort queues), 
I think this new FieldCache API should probably be primarily an iteration API.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-17 Thread Jeremy Volkman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12657401#action_12657401
 ] 

Jeremy Volkman commented on LUCENE-831:
---

A couple things:

# Looking at the getCachedData method for MultiReader and MultiSegmentReader, 
it doesn't appear that the CacheData objects from merge operations are cached.  
Is there any reason for this?
# I've written a merge method for StringIndexCacheKey. The process isn't all 
that complicated (apart from all of the off-by-ones), but it's expensive.

{code:java}
  public boolean isMergable() {
    return true;
  }

  private static class OrderNode {
    int index;
    OrderNode next;
  }

  public CacheData mergeData(int[] starts, CacheData[] data)
      throws UnsupportedOperationException {
    int[] mergedOrder = new int[starts[starts.length - 1]];
    // Lookup map is 1-based
    String[] mergedLookup = new String[starts[starts.length - 1] + 1];

    // Unwrap cache payloads and flip order arrays
    StringIndex[] unwrapped = new StringIndex[data.length];

    /* Flip the order arrays (reverse indices and values)
     * Since the ord map has a many-to-one relationship with the lookup table,
     * the flipped structure must be one-to-many which results in an array of
     * linked lists.
     */
    OrderNode[][] flippedOrders = new OrderNode[data.length][];
    for (int i = 0; i < data.length; i++) {
      StringIndex si = (StringIndex) data[i].getCachePayload();
      unwrapped[i] = si;
      flippedOrders[i] = new OrderNode[si.lookup.length];
      for (int j = 0; j < si.order.length; j++) {
        OrderNode a = new OrderNode();
        a.index = j;
        a.next = flippedOrders[i][si.order[j]];
        flippedOrders[i][si.order[j]] = a;
      }
    }

    // Lookup map is 1-based
    int[] lookupIndices = new int[unwrapped.length];
    Arrays.fill(lookupIndices, 1);

    int lookupIndex = 0;
    String currentVal;
    int currentSeg;
    while (true) {
      currentVal = null;
      currentSeg = -1;
      int remaining = 0;
      // Find the next ordered (smallest) value from all the segments
      for (int i = 0; i < unwrapped.length; i++) {
        if (lookupIndices[i] < unwrapped[i].lookup.length) {
          remaining++;
          String that = unwrapped[i].lookup[lookupIndices[i]];
          if (currentVal == null || currentVal.compareTo(that) > 0) {
            currentVal = that;
            currentSeg = i;
          }
        }
      }
      if (remaining == 1) {
        break;
      } else if (remaining == 0) {
        /* The only way this could happen is if there are 0 segments or if
         * all segments have 0 terms. In either case, we can return
         * early.
         */
        return new CacheData(new StringIndex(
            new int[starts[starts.length - 1]], new String[1]));
      }
      if (!currentVal.equals(mergedLookup[lookupIndex])) {
        lookupIndex++;
        mergedLookup[lookupIndex] = currentVal;
      }
      OrderNode a = flippedOrders[currentSeg][lookupIndices[currentSeg]];
      while (a != null) {
        mergedOrder[a.index + starts[currentSeg]] = lookupIndex;
        a = a.next;
      }
      lookupIndices[currentSeg]++;
    }
{code}
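
For readers who want to see the input and output shapes without the bookkeeping, here is a much simpler stand-alone sketch of the same kind of merge. It is not the technique above: instead of a k-way merge over flipped order lists, it takes the sorted union of all segment terms and remaps each document with a binary search. SimpleStringIndex is a hypothetical stand-in for StringIndex, assuming the same conventions (lookup is 1-based with lookup[0] null, and order 0 means no value).

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical stand-in for StringIndex, for illustration only.
final class SimpleStringIndex {
    final int[] order;     // order[doc] = position in lookup, 0 = no value
    final String[] lookup; // sorted terms, lookup[0] is always null

    SimpleStringIndex(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
    }

    // starts[i] is the doc base of segment i; the last entry is the total doc count.
    static SimpleStringIndex merge(int[] starts, SimpleStringIndex[] segs) {
        // Sorted union of every segment's terms.
        TreeSet<String> all = new TreeSet<String>();
        for (SimpleStringIndex seg : segs) {
            for (int k = 1; k < seg.lookup.length; k++) {
                all.add(seg.lookup[k]);
            }
        }
        String[] lookup = new String[all.size() + 1];
        int n = 1;
        for (String term : all) {
            lookup[n++] = term;
        }
        // Remap each doc's segment-local ord to its position in the merged lookup.
        int[] order = new int[starts[starts.length - 1]];
        for (int i = 0; i < segs.length; i++) {
            for (int doc = 0; doc < segs[i].order.length; doc++) {
                int ord = segs[i].order[doc];
                if (ord != 0) {
                    order[starts[i] + doc] =
                        Arrays.binarySearch(lookup, 1, lookup.length, segs[i].lookup[ord]);
                }
            }
        }
        return new SimpleStringIndex(order, lookup);
    }
}
```

This does a binary search per document, so it is likely even more expensive than the merge above; it is only meant to show what goes in and what comes out.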




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-16 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12657151#action_12657151
 ] 

Robert Newson commented on LUCENE-831:
--

I was wondering if the next version of the patch could include a sample 
disk-based cache? It seems that CacheKey classes are fine for an in-memory 
HashMap (since SimpleMapCache works just fine) but I wonder if equals/hashCode 
is sufficient when the data is on disk?

 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
 Fix For: 3.0

 Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
 fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
 LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
 LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch


 Motivation:
 1) Completely overhaul the API/implementation of FieldCache type things...
 a) eliminate global static map keyed on IndexReader (thus
 eliminating synch block between completely independent IndexReaders)
 b) allow more customization of cache management (ie: use 
 expiration/replacement strategies, disk backed caches, etc)
 c) allow people to define custom cache data logic (ie: custom
 parsers, complex datatypes, etc... anything tied to a reader)
 d) allow people to inspect what's in a cache (list of CacheKeys) for
 an IndexReader so a new IndexReader can be likewise warmed. 
 e) Lend support for smarter cache management if/when
 IndexReader.reopen is added (merging of cached data from subReaders).
 2) Provide backwards compatibility to support existing FieldCache API with
 the new implementation, so there is no redundant caching as client code
 migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655988#action_12655988
 ] 

Michael McCandless commented on LUCENE-831:
---


{quote}
 At present, KS only caches the docID -> ord map as an array. It builds that
 array by iterating over the terms in the sort field's Lexicon and mapping the
 docIDs from each term's posting list.
{quote}

OK, that corresponds to the order array in Lucene's
FieldCache.StringIndex class.

{quote}
 Building the docID -> ord array is straightforward for a single-segment
 SegLexicon. The multi-segment case requires that several SegLexicons be
 collated using a priority queue. In KS, there's a MultiLexicon class which
 handles this; I don't believe that Lucene has an analogous class.
{quote}

Lucene achieves the same functionality by using a MultiReader to read
the terms in order (which uses MultiSegmentReader.MultiTermEnum, which
uses a pqueue under the hood) and building up StringIndex from that.
It's very costly.
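
To make the structure concrete, here is a minimal sketch of building a docID -> ord array by walking terms in sorted order. It uses plain Java collections instead of TermEnum/TermDocs; the `OrdBuilder` class, the postings map, and the choice to reserve ord 0 for docs without a value are assumptions made for illustration, not code from Lucene or KS.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: OrdBuilder and the postings map stand in for a real
// term dictionary and posting lists; they are not Lucene or KS classes.
class OrdBuilder {
    /**
     * Builds order[docID] = ordinal of the doc's term. Ordinals are assigned
     * in sorted term order starting at 1; 0 is left for docs with no value.
     */
    static int[] buildOrder(SortedMap<String, List<Integer>> postings, int maxDoc) {
        int[] order = new int[maxDoc];
        int ord = 0;
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            ord++;                       // next term in sorted order
            for (int docID : e.getValue()) {
                order[docID] = ord;      // every doc holding this term gets its ord
            }
        }
        return order;
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> postings = new TreeMap<>();
        postings.put("banana", Arrays.asList(0, 3));
        postings.put("apple", Arrays.asList(2));
        System.out.println(Arrays.toString(buildOrder(postings, 4)));
    }
}
```

The cost is one pass over every term and every posting of the field, which is why rebuilding it for the whole index is expensive.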

{quote}
 Relying on the docID -> ord array alone works quite well until you get to the
 MultiSearcher case. As you know, at that point you need to be able to
 retrieve the actual field values from the ordinal numbers, so that you can
 compare across multiple searchers (since the ordinal values are meaningless).
{quote}

Right, and we are trying to move towards pushing searcher down to the
segment.  Then we can use the per-segment ords for within-segment
collection, and then the real values for merging the separate pqueues
at the end (but, initial results from LUCENE-1483 show that collecting
N queues then merging in the end adds ~20% slowdown for N = 100
segments).

{quote}
 Lex_Seek_By_Num(lexicon, term_num);
 field_val = Lex_Get_Term(lexicon);
 
 The problem is that seeking by ordinal value on a MultiLexicon iterator
 requires a gnarly implementation and is very expensive. I got it working, but
 I consider it a dead-end design and a failed experiment.
{quote}

OK.

{quote}
 The planned replacement for these iterator-based quasi-FieldCaches involves
 several topics of recent discussion:
 
 1) A keyword field type, implemented using a format similar to what Nate
 and I came up with for the lexicon index.
 2) Write per-segment docID -> ord maps at index time for sort fields.
 3) Memory mapping.
 4) Segment-centric searching.
 
 We'd mmap the pre-composed docID -> ord map and use it for intra-segment
 sorting. The keyword field type would be implemented in such a way that we'd
 be able to mmap a few files and get a per-segment field cache, which we'd then
 use to sort hits from multiple segments.
{quote}

OK so your keyword field type would expose random-access to field
values by docID, to be used to merge the N segments' pqueues into a
single final pqueue?

The alternative is to use iterator but pull the values into your
pqueues when they are inserted.  The benefit is iterator-only
exposure, but the downside is likely higher net cost of insertion.
And if the assumption is these fields can generally be ram resident
(explicitly or via mmap), then the net benefit of iterator-only API is
not high.





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-12 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12656150#action_12656150
 ] 

Marvin Humphrey commented on LUCENE-831:


>> Building the docID -> ord array is straightforward for a single-segment
>> SegLexicon. The multi-segment case requires that several SegLexicons be
>> collated using a priority queue. In KS, there's a MultiLexicon class which
>> handles this; I don't believe that Lucene has an analogous class.
>
> Lucene achieves the same functionality by using a MultiReader to read
> the terms in order (which uses MultiSegmentReader.MultiTermEnum, which
> uses a pqueue under the hood) and building up StringIndex from that.
> It's very costly.

Ah, you're right, that class is analogous.  The difference is that
MultiTermEnum doesn't implement seek(), let alone seekByNum().  I was pretty
sure you wouldn't have bothered, since by loading the actual term values into
an array you eliminate the need for seeking the iterator.

> OK so your keyword field type would expose random-access to field
> values by docID,

Yes.  There would be three files for each keyword field in a segment.

  * docID -> ord map.  A stack of i32_t, one per doc.
  * Character data.  Each unique field value would be stored as uncompressed
UTF-8, sorted lexically (by default).
  * Term offsets.  A stack of i64_t, one per term plus one, demarcating the 
term text boundaries in the character data file.

Assuming that we've mmap'd those files -- or slurped them -- here's the
function to find the keyword value associated with a doc num:

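{code}
void
KWField_Look_Up(KeyWordField *self, i32_t doc_num, ViewCharBuf *target)
{
    if (doc_num > self->max_doc) {
        CONFESS("Doc num out of range: %u32 %u32", doc_num, self->max_doc);
    }
    else {
        /* Offsets are stored per term, so map the doc to its ord first.
         * `ords` is an assumed name for the mmap'd docID -> ord map. */
        i32_t ord         = self->ords[doc_num];
        i64_t offset      = self->offsets[ord];
        i64_t next_offset = self->offsets[ord + 1];
        i64_t len         = next_offset - offset;
        ViewCB_Assign_Str(target, self->chardata + offset, len);
    }
}
{code}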

I'm not sure whether IndexReader.fetchDoc() should retrieve the values for
keyword fields by default, but I lean towards yes.  The locality isn't ideal,
but I don't think it'll be bad enough to contemplate storing keyword values
redundantly alongside the other stored field values.

> to be used to merge the N segments' pqueues into a
> single final pqueue?

Yes, although I think you only need two priority queues total: one
dedicated to iterating intra-segment, which gets emptied out after each
seg into the other, final queue.
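
A rough sketch of that two-queue arrangement, under simplifying assumptions: the `TwoQueueSort` and `Segment` names are made up, each doc has exactly one value, and only the values (not the doc numbers) are carried through. The intra-segment queue compares ords, which are cheap but meaningful only inside one segment; the final queue compares real values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch; Segment and TwoQueueSort are hypothetical names.
class TwoQueueSort {
    static class Segment {
        final int[] ords;       // docID -> ord, local to this segment
        final String[] values;  // ord -> value, sorted within this segment
        Segment(int[] ords, String[] values) { this.ords = ords; this.values = values; }
    }

    /** Returns the n smallest field values across all segments, ascending. */
    static List<String> topN(List<Segment> segments, int n) {
        // Final queue: max-heap on real values, so the worst kept value is on top.
        PriorityQueue<String> finalQueue = new PriorityQueue<>(Comparator.reverseOrder());
        for (Segment seg : segments) {
            // Intra-segment queue: max-heap on ords.
            PriorityQueue<Integer> segQueue = new PriorityQueue<>(Comparator.reverseOrder());
            for (int ord : seg.ords) {
                segQueue.add(ord);
                if (segQueue.size() > n) segQueue.poll();
            }
            // Empty the segment queue into the final queue, switching to values.
            while (!segQueue.isEmpty()) {
                finalQueue.add(seg.values[segQueue.poll()]);
                if (finalQueue.size() > n) finalQueue.poll();
            }
        }
        List<String> result = new ArrayList<>(finalQueue);
        Collections.sort(result);
        return result;
    }
}
```

Only up to n values per segment are ever looked up by ord, so the string comparisons are limited to the hand-off into the final queue.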

> The alternative is to use iterator but pull the values into your
> pqueues when they are inserted. The benefit is iterator-only
> exposure, but the downside is likely higher net cost of insertion.
> And if the assumption is these fields can generally be ram resident
> (explicitly or via mmap), then the net benefit of iterator-only API is
> not high.

If I understand where you're going, you'd like to apply the design of the
deletions iterator to this problem?

For that to work, we'd need to store values for each document, rather than
only unique values... right?  And they couldn't be stored in sorted order,
because we aren't pre-sorting the docs in the segment according to the value
of a keyword field -- which means string diffs don't help.  You'd have a
single file, with each doc's values encoded as a vbyte byte-count followed by
UTF-8 character data.
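
For illustration, a small sketch of that single-file layout: one record per doc, each a vbyte byte-count followed by UTF-8 bytes. The `PerDocValues` class is hypothetical, and the vbyte encoding shown (7 bits per byte, low-order group first, high bit as a continuation flag) is an assumption about the format rather than something specified above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a "vbyte byte-count + UTF-8" record layout.
class PerDocValues {
    /** Encodes one value per doc, in docID order. */
    static byte[] encode(List<String> valuesByDoc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String value : valuesByDoc) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            int len = utf8.length;
            // vbyte: 7 bits per byte, high bit set means "more bytes follow".
            while ((len & ~0x7F) != 0) {
                out.write((len & 0x7F) | 0x80);
                len >>>= 7;
            }
            out.write(len);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    /** Iterates the records in order; there is no random access by docID. */
    static List<String> decode(byte[] data) {
        List<String> values = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int len = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                len |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values.add(new String(data, pos, len, StandardCharsets.UTF_8));
            pos += len;
        }
        return values;
    }
}
```

Because records are variable length, reaching doc N means decoding the N records before it, which is the iterator-only trade-off under discussion.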


Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Mark Miller

Michael McCandless wrote:

> I think it does make sense (it's well defined).  This is what the
> SubsearcherTopDocs.convertTopDoc method is doing (in the
> multisearcher.take2.patch on LUCENE-1471).
>
> In fact, returning by document order is a particularly trivial sort,
> since you'd just have to concatenate the results coming out of the
> pqueues (ie you wouldn't need a 2nd pqueue).  In fact, any SortField[]
> that contains a SortField.FIELD_DOC could be truncated since that sort
> order is total.  But these are minor optimizations which we
> shouldn't worry about for now...
>
> Mike

Yeah, right again. Just trying to get out of what wasn't working and
seemed like it should without work from me.






Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Michael McCandless


Mark Miller wrote:

> Michael McCandless wrote:
>
>> Mark Miller wrote:
>>
>>> Mark Miller wrote:
>>>
>>>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>>>
>>>> Yes. First thing I did was try and patch this in, but the sort
>>>> tests failed. It would be the right order, but like the two
>>>> center docs would be reversed or something. No time to dig in, so
>>>> I just switched to the trunk MultiSearcher and all tests passed
>>>> except for the two with the above issues.
>>>
>>> Spoke too soon. Wasn't LUCENE-1471's fault, it was just hitting
>>> different aspects of an issue that's messed up with the old
>>> MultiSearcher as well.
>>
>> OK.  If you're building on LUCENE-1471, make sure you start from
>> the first patch.  It'd be good to factor that logic (2nd pqueue for
>> merging) out so it can be reused b/w IndexSearcher & MultiSearcher.
>
> I actually worked with the second. I'll take a look at the first
> instead. I'm sticking with using the MultiSearcher for the first
> patch - it can be worked out later if it speeds things up.

OK.  And, the first now has a 2nd iteration (factors
ParallelMultiSearcher to do the merge sort too).

> Does returning by document id order even make sense with this
> though? Did it make sense with MultiSearcher? They are pseudo ids
> (mapped), so it almost seems I can't support that right...it would
> depend on the order of the readers.

I think it does make sense (it's well defined).  This is what the
SubsearcherTopDocs.convertTopDoc method is doing (in the
multisearcher.take2.patch on LUCENE-1471).

In fact, returning by document order is a particularly trivial sort,
since you'd just have to concatenate the results coming out of the
pqueues (ie you wouldn't need a 2nd pqueue).  In fact, any SortField[]
that contains a SortField.FIELD_DOC could be truncated since that sort
order is total.  But these are minor optimizations which we
shouldn't worry about for now...


Mike




Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Michael McCandless


Mark Miller wrote:

> Mark Miller wrote:
>
>> Mark Miller wrote:
>>
>>>> Which new sort stuff are you referring to?  Is it LUCENE-1471?
>>>
>>> Yes. First thing I did was try and patch this in, but the sort
>>> tests failed. It would be the right order, but like the two center
>>> docs would be reversed or something. No time to dig in, so I just
>>> switched to the trunk MultiSearcher and all tests passed except for
>>> the two with the above issues.
>>
>> Got the auto detection working though.
>
> Bah, I didn't. Brought up an old bug I've seen before - if you use
> multisearcher and an index doesn't have the field, AUTO won't work.
> Advice I always got was don't use AUTO, but even Lucene uses it
> internally. Thought I had a workaround, but it didn't quite work. Not
> sure what to do about this one - I'll have to mull it and the ids
> issue over a bit I suppose.

Hmm... I think we have to keep the AUTO -> true type resolution that
MultiReader would do?  Ie, ask MultiReader for the TermEnum, not the
first sub-reader, for resolving.


In fact we should factor out an explicit method to do this; it's  
currently in ExtendedFieldCache.autoCache.createValue.


As long as you do that resolving up front w/ the MultiReader, and pass  
only resolved SortField[] to each sub-reader, that should fix it?
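
A minimal sketch of resolving up front, assuming each sub-reader can be modelled as the sorted set of terms it holds for the field. `AutoResolver` and `SortType` are hypothetical names, and the int, then float, then string ladder is a simplification assumed for the sketch, not a copy of the logic in ExtendedFieldCache.

```java
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch; AutoResolver and SortType are hypothetical names, and
// each sub-reader is modelled as the sorted set of terms it holds for one field.
class AutoResolver {
    enum SortType { INT, FLOAT, STRING }

    /** Guesses a type from a single term, trying int, then float, then string. */
    static SortType typeOf(String term) {
        try {
            Integer.parseInt(term);
            return SortType.INT;
        } catch (NumberFormatException e) {
            // not an int, fall through
        }
        try {
            Float.parseFloat(term);
            return SortType.FLOAT;
        } catch (NumberFormatException e) {
            // not a float, fall through
        }
        return SortType.STRING;
    }

    /**
     * Resolves once against the merged view: the first term in sorted order
     * across all sub-readers, skipping sub-readers that lack the field.
     */
    static SortType resolve(List<TreeSet<String>> subReaderTerms) {
        String first = null;
        for (TreeSet<String> terms : subReaderTerms) {
            if (terms.isEmpty()) continue;
            String candidate = terms.first();
            if (first == null || candidate.compareTo(first) < 0) first = candidate;
        }
        if (first == null) throw new IllegalStateException("no terms for field");
        return typeOf(first);
    }
}
```

The resolved type is computed once and would then be handed to every sub-reader, so a sub-reader that happens to lack the field never has to guess on its own.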


Mike




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654820#action_12654820
 ] 

Michael McCandless commented on LUCENE-831:
---

Marvin, does KS/Lucy have something like FieldCache?  If so, what API do you 
use?  Is it iterator-only?




Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Michael McCandless


On thinking more about this... I think with a few small changes we
could achieve Sort by field without materializing a full array.  We
can decouple this change from LUCENE-831.

I think all that's needed is:

  * Expose sub-readers (LUCENE-1475) by adding IndexReader[]
IndexReader.getSubReaders.  Default impl could just return
length-1 array of itself.

  * Change IndexSearcher.sort that takes a Sort, to first call
IndexReader.getSubReaders, and then do the same logic that
MultiSearcher does, with improvements from LUCENE-1471 (run
separate search per-reader, then merge-sort the top hits from
each).
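
A rough sketch of those two steps, using made-up stand-ins rather than Lucene classes: `Reader`, `leaf`, `multi` and `topDocs` are all hypothetical, the sort key is a plain int per doc, only one level of sub-readers is handled, and the final merge is done with a simple sort of the per-reader candidates instead of a priority queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch; Reader is a hypothetical stand-in, not Lucene's IndexReader.
class SubReaderSort {
    interface Reader {
        int maxDoc();
        int[] sortKeys();   // sort key per local docID (leaf readers only)
        /** Default: a reader with no sub-readers reports a length-1 array of itself. */
        default Reader[] getSubReaders() { return new Reader[] { this }; }
    }

    static Reader leaf(int... keys) {
        return new Reader() {
            public int maxDoc() { return keys.length; }
            public int[] sortKeys() { return keys; }
        };
    }

    static Reader multi(Reader... subs) {
        return new Reader() {
            public int maxDoc() { int n = 0; for (Reader r : subs) n += r.maxDoc(); return n; }
            public int[] sortKeys() { throw new UnsupportedOperationException(); }
            public Reader[] getSubReaders() { return subs; }
        };
    }

    /** Returns global docIDs of the n hits with the smallest keys (ties by docID). */
    static List<Integer> topDocs(Reader reader, int n) {
        Comparator<int[]> byKeyThenDoc =
            Comparator.<int[]>comparingInt(h -> h[0]).thenComparingInt(h -> h[1]);
        List<int[]> merged = new ArrayList<>();
        int docBase = 0;
        for (Reader sub : reader.getSubReaders()) {
            // Separate search per sub-reader, rebasing local docIDs to global ones.
            List<int[]> local = new ArrayList<>();
            int[] keys = sub.sortKeys();
            for (int doc = 0; doc < keys.length; doc++) {
                local.add(new int[] { keys[doc], docBase + doc });
            }
            local.sort(byKeyThenDoc);
            merged.addAll(local.subList(0, Math.min(n, local.size())));
            docBase += sub.maxDoc();
        }
        // Merge the per-reader top hits.
        merged.sort(byKeyThenDoc);
        List<Integer> docs = new ArrayList<>();
        for (int[] hit : merged.subList(0, Math.min(n, merged.size()))) docs.add(hit[1]);
        return docs;
    }
}
```

In this toy model, `topDocs(multi(leaf(5, 1, 3), leaf(2, 0)), 3)` and `topDocs(leaf(5, 1, 3, 2, 0), 3)` return the same global docIDs.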

The results should be functionally identical to what we have today,
but, searching after doing a reopen() should be much faster since we'd
no longer re-build the global FieldCache array.

Does this make sense?  It's a small change for a big win, I think.
Does anyone want to take a crack at this patch?

Mike

Mark Miller wrote:

> Michael McCandless wrote:
>
>> I'd like to decouple "upgraded to Object" vs "materialize full
>> array", ie, so we can access native values w/o materializing the
>> full array.  I also think "upgrade to Object" is dangerous to even
>> offer since it's so costly.
>
> I'm right with you. I didn't think the Object approach was really an
> upgrade (beyond losing the merge, which is especially important for
> StringIndex - it has no merge option at the moment) which is why I
> left both options for now. So I def agree we need to move to
> iterator, drop object, etc.
>
> It's the doin' that ain't so easy. The iterator approach seems
> somewhat straightforward (though it's complicated by needing to
> provide a random access object as well), but I'm still working
> through how we control so many iterator types (I don't see how you
> can use polymorphism yet).
>
> - Mark




Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Mark Miller
What do we get from this though? A MultiSearcher (with the scoring
issues) that can properly do rewrite? Won't we have to take
MultiSearcher's scoring baggage into this as well?





Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Michael McCandless


Mark Miller wrote:

> What do we get from this though? A MultiSearcher (with the scoring
> issues) that can properly do rewrite? Won't we have to take
> MultiSearcher's scoring baggage into this as well?


If this can work, what we'd get is far better reopen() performance
when you sort-by-field, with no change to the returned results
(rewrite, scores, sort order are identical).

Say you have a 1MM doc index, and then you add 100 docs & commit.
Today, when you reopen() and then do a search, FieldCache recomputes
from scratch (iterating through all Terms in entire index) the global
arrays for the fields you're sorting on.  The cost is in proportion to
total index size.

With this change, only the new segment's terms will be iterated on, so
the cost is in proportion to what new segments appeared.
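
A toy sketch of why the cost drops, assuming cached values are keyed per segment. `SegmentCache`, the segment names, and the `termsIterated` counter (standing in for the work of iterating a segment's terms) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; SegmentCache and the segment names are hypothetical.
class SegmentCache {
    private final Map<String, int[]> bySegment = new HashMap<>();
    int termsIterated = 0;   // stands in for the cost of loading a field

    /** Returns cached values for a segment, loading them only on first use. */
    int[] get(String segmentName, int[] segmentValues) {
        return bySegment.computeIfAbsent(segmentName, name -> {
            termsIterated += segmentValues.length;   // work done only for unseen segments
            return segmentValues.clone();
        });
    }
}
```

After a reopen, asking for an already-seen segment is a map hit; only the newly appeared segment adds to `termsIterated`.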

This is the same benefit we are seeking with LUCENE-831, for all uses
of FieldCache (not just sort-by-field), it's just that I think we can
achieve this speedup to sort-by-field without LUCENE-831.

I think there would be no change to the scoring: we would still create
a Weight based on the toplevel IndexReader, but then search each
sub-reader separately, using that Weight.

Though... that is unusual (to create a Weight with the parent
IndexSearcher and then use it in the sub-searchers) -- will something
break if we do that?  (This is new territory for me).

If something will break, I think we can still achieve this, but it
will be a more invasive change and probably will have to be re-coupled
to the new API we will introduce with LUCENE-831.  Marvin actually
referred to how to do this, here:

  https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650854#action_12650854

in the paragraph starting with "If our goal is minimal impact".
Basically during collection, the FieldSortedHitQueue would have to
keep track of subReaderIndex/subReaderDocID (mapping, through
iteration, from the primary docID w/o doing a wasteful new binary
search for each) and enroll into different pqueues indexed by
subReaderIndex, then do the merge sort in the end.
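
A small sketch of the "mapping, through iteration" part, assuming hits arrive in increasing docID order and every docID is below maxDoc. `SubReaderTracker` and its `starts` array (first global docID of each sub-reader, with maxDoc as the final entry) are hypothetical names.

```java
// Illustrative sketch; SubReaderTracker is a hypothetical name.
class SubReaderTracker {
    // starts[i] = first global docID of sub-reader i; the last entry is maxDoc.
    private final int[] starts;
    private int subReaderIndex = 0;

    SubReaderTracker(int[] starts) {
        this.starts = starts;
    }

    /**
     * Maps a global docID to {subReaderIndex, subReaderDocID}. Relies on docs
     * being collected in increasing docID order, so the index only ever moves
     * forward and no binary search is needed per hit.
     */
    int[] locate(int docID) {
        while (docID >= starts[subReaderIndex + 1]) {
            subReaderIndex++;   // also steps over empty sub-readers
        }
        return new int[] { subReaderIndex, docID - starts[subReaderIndex] };
    }
}
```

The returned subReaderIndex would pick which per-sub-reader pqueue the hit is enrolled into.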

Mike






Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Mark Miller

Michael McCandless wrote:

> Mark Miller wrote:
>
>> What do we get from this though? A MultiSearcher (with the scoring
>> issues) that can properly do rewrite? Won't we have to take
>> MultiSearcher's scoring baggage into this as well?
>
> If this can work, what we'd get is far better reopen() performance
> when you sort-by-field, with no change to the returned results
> (rewrite, scores, sort order are identical).
>
> Say you have a 1MM doc index, and then you add 100 docs & commit.
> Today, when you reopen() and then do a search, FieldCache recomputes
> from scratch (iterating through all Terms in entire index) the global
> arrays for the fields you're sorting on.  The cost is in proportion to
> total index size.
>
> With this change, only the new segment's terms will be iterated on, so
> the cost is in proportion to what new segments appeared.
>
> This is the same benefit we are seeking with LUCENE-831, for all uses
> of FieldCache (not just sort-by-field), it's just that I think we can
> achieve this speedup to sort-by-field without LUCENE-831.

Yup, I'm with you on all that. Except the "without LUCENE-831" part - we
need some FieldCache meddling, right? The current FieldCache approach
doesn't allow us to meddle much. Isn't it more like, we want the
LUCENE-831 API (or something similar), but we won't need the object array
or merge stuff?

> I think there would be no change to the scoring: we would still create
> a Weight based on the toplevel IndexReader, but then search each
> sub-reader separately, using that Weight.
>
> Though... that is unusual (to create a Weight with the parent
> IndexSearcher and then use it in the sub-searchers) -- will something
> break if we do that?  (This is new territory for me).

Okay, right. That does change things. Would love to hear more opinions,
but that certainly seems reasonable to me. You score each segment using
tf/idf stats from all of the segments.








[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654413#action_12654413
 ] 

Michael McCandless commented on LUCENE-831:
---


bq. It seems with this field cache approach and the recent 
FieldCacheRangeFilter on trunk, that Lucene has a robust and coherent answer to 
performing efficient sorting and range filtering for float, double, short, int 
and long values, perhaps it's time to enhance Document. That might cut down the 
size of the API, which in turn makes it easy to test and tune. Document could 
preclude tokenization for such fields, I suspect I'm not the only one to build 
a type-safe replacement to Document.

This is an interesting idea.  Say we create IntField, a subclass of
Field.  It could directly accept a single int value and not accept
tokenization options.  It could assert not null, if the field wanted
that.  FieldInfo could store that it's an int and expose more strongly
typed APIs from IndexReader.document as well.  If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields.  Etc
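
A minimal sketch of the IntField idea, assuming nothing about Lucene's real
classes: SketchField and SketchIntField are hypothetical stand-ins written
only to show a typed subclass that takes a single int and offers no
tokenization option.

```java
// Hypothetical stand-in for a generic field; NOT Lucene's Field class.
class SketchField {
    private final String name;
    private final String stringValue;
    private final boolean tokenized;

    SketchField(String name, String stringValue, boolean tokenized) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        this.name = name;
        this.stringValue = stringValue;
        this.tokenized = tokenized;
    }

    String name() { return name; }
    String stringValue() { return stringValue; }
    boolean isTokenized() { return tokenized; }
}

// Hypothetical typed field: exactly one int value, never tokenized.
final class SketchIntField extends SketchField {
    private final int value;

    SketchIntField(String name, int value) {
        // The constructor exposes no tokenization choice at all.
        super(name, Integer.toString(value), false);
        this.value = value;
    }

    // Strongly typed accessor, so callers never parse a String.
    int intValue() { return value; }
}
```

Because the int arrives as a primitive, the "assert not null" case
disappears for the value itself; only the name still needs a check.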


 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
 Fix For: 3.0

 Attachments: fieldcache-overhaul.032208.diff, 
 fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
 LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
 LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch


 Motivation:
 1) Completely overhaul the API/implementation of FieldCache type things...
 a) eliminate global static map keyed on IndexReader (thus
 eliminating synch block between completely independent IndexReaders)
 b) allow more customization of cache management (ie: use 
 expiration/replacement strategies, disk backed caches, etc)
 c) allow people to define custom cache data logic (ie: custom
 parsers, complex datatypes, etc... anything tied to a reader)
 d) allow people to inspect what's in a cache (list of CacheKeys) for
 an IndexReader so a new IndexReader can be likewise warmed. 
 e) Lend support for smarter cache management if/when
 IndexReader.reopen is added (merging of cached data from subReaders).
 2) Provide backwards compatibility to support existing FieldCache API with
 the new implementation, so there is no redundant caching as client code
 migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654417#action_12654417
 ] 

Uwe Schindler commented on LUCENE-831:
--

{quote}This is an interesting idea. Say we create IntField, a subclass of
Field. It could directly accept a single int value and not accept
tokenization options. It could assert not null, if the field wanted
that. FieldInfo could store that it's an int and expose more strongly
typed APIs from IndexReader.document as well. If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields. Etc{quote}

Maybe this document could also manage the encoding of these fields to the index 
format. With that it would be possible to extend Document to automatically use 
my trie-based encoding for storing the raw term values. On the other hand, 
RangeQuery would be aware of the field encoding and could switch dynamically to 
the correct search/sort algorithm. Great!





[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654418#action_12654418
 ] 

Robert Newson commented on LUCENE-831:
--


Yes, something like that. I made a Document class with an add method for each 
primitive type which allowed only the sensible choices for Store and Index. 
Field subclasses would achieve the same thing. A subclass per primitive type 
might be excessive, they'd be 99% identical to each other. A NumericField that 
could hold a single short, int, long, float, double or Date might be enough 
(new NumericField(name, 99.99F, true), the final boolean toggling YES/NO for 
Store, since Index is always UNANALYZED_NO_NORMS).

Adding this to FieldInfo would change the on-disk format such that it remembers 
that a particular field is of a special type?  That way all the places that 
Lucene currently has a multiplicity of classes or constants (SortField.INT, 
etc) could be eliminated, replaced by first class support in Document/Field.

A remaining question would be whether field name is sufficient for uniqueness; 
I suggest it becomes fieldname+type. This also implies changes to the Query and 
Filter hierarchy. 

If it helps, I can post my Document class, which had helper methods for 
RangeFilter and TermQuery for each type. It's not a complicated class; you 
can probably already picture it.





Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Michael McCandless


Mark Miller wrote:


Michael McCandless wrote:


Mark Miller wrote:

What do we get from this though? A MultiSearcher (with the
scoring issues) that can properly do rewrite? Won't we have to
take MultiSearcher's scoring baggage into this as well?


If this can work, what we'd get is far better reopen() performance
when you sort-by-field, with no change to the returned results
(rewrite, scores, sort order are identical).

Say you have 1MM doc index, and then you add 100 docs and commit.
Today, when you reopen() and then do a search, FieldCache recomputes
from scratch (iterating through all Terms in entire index) the global
arrays for the fields you're sorting on.  The cost is in proportion
to total index size.

With this change, only the new segment's terms will be iterated on,
so the cost is in proportion to what new segments appeared.

This is the same benefit we are seeking with LUCENE-831, for all uses
of FieldCache (not just sort-by-field), it's just that I think we can
achieve this speedup to sort-by-field without LUCENE-831.


Yup, I'm with you on all that. Except the without LUCENE-831 part -  
we need some FieldCache meddling right? The current FieldCache  
approach doesn't allow us to meddle much. Isn't it more like, we  
want the LUCENE-831 API (or something similar), but we won't need  
the objectarray or merge stuff?


We wouldn't need any change to FieldCache, because we only ask  
FieldCache for int[] (eg) on the SegmentReader instances.  Because  
reopen() shares SegmentReader instances, only the new segments would  
have a cache miss in FieldCache.  I think?
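
A rough sketch of why shared segment instances keep the cache warm across
reopen(). SketchSegment and SketchPerSegmentCache are hypothetical stand-ins,
not Lucene classes, and the IdentityHashMap is an assumption made to keep the
example small; the point is only that a cache keyed on the segment instance
misses once per new segment.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for one segment's reader; NOT SegmentReader.
final class SketchSegment {
    final int[] values; // per-document int values of one field

    SketchSegment(int... values) { this.values = values; }
}

// Hypothetical cache keyed on the segment instance itself.
final class SketchPerSegmentCache {
    private final Map<SketchSegment, int[]> cache = new IdentityHashMap<>();
    private int misses = 0;

    int[] getInts(SketchSegment segment) {
        int[] cached = cache.get(segment);
        if (cached == null) {
            misses++;
            // Stands in for iterating the terms of this one segment.
            cached = segment.values.clone();
            cache.put(segment, cached);
        }
        return cached;
    }

    int misses() { return misses; }
}
```

If a reopened reader holds the same two old segment instances plus one new
segment, asking for all three costs one extra miss, not three.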


Once we do LUCENE-831, minus objectarray and merging, this change  
would be basically the same, ie, accessing per-segment int values,  
just with a new API.  Ie, by doing this change first I don't think  
we're going to waste much in then cutting over in the future to  
LUCENE-831's API (vs waiting for LUCENE-831 api).


I think there would be no change to the scoring: we would still
create a Weight based on the toplevel IndexReader, but then search each
sub-reader separately, using that Weight.

Though... that is unusual (to create a Weight with the parent
IndexSearcher and then use it in the sub-searchers) -- will something
break if we do that?  (This is new territory for me).


Okay, right. That does change things. Would love to hear more  
opinions, but that certainly seems reasonable to me. You score each  
segment using tf/idf stats from all of the segments.


That's my expectation (hope).  So the results are identical but  
performance is much better.
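
To illustrate scoring each segment with stats from all segments, here is a
small sketch. SketchGlobalIdf is a hypothetical helper, not a Lucene class;
the idf formula is written in the style of DefaultSimilarity
(ln(numDocs / (docFreq + 1)) + 1), and summing per-segment counts is an
assumption about how the top-level stats would be gathered.

```java
// Hypothetical helper; NOT part of Lucene.
final class SketchGlobalIdf {
    // idf in the style of DefaultSimilarity.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Sum the per-segment stats once, so that every segment is later
    // scored with the same index-wide idf.
    static double globalIdf(int[] segmentDocFreqs, int[] segmentMaxDocs) {
        int docFreq = 0;
        int numDocs = 0;
        for (int i = 0; i < segmentDocFreqs.length; i++) {
            docFreq += segmentDocFreqs[i];
            numDocs += segmentMaxDocs[i];
        }
        return idf(docFreq, numDocs);
    }
}
```

A segment scored with its own local idf would generally get a different
value, which is the MultiSearcher-style discrepancy this approach avoids.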



If something will break, I think we can still achieve this, but it
will be a more invasive change and probably will have to be re-coupled
to the new API we will introduce with LUCENE-831.  Marvin actually
referred to how to do this, here:

 https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650854#action_12650854

in the paragraph starting with "If our goal is minimal impact".
Basically during collection, the FieldSortedHitQueue would have to
keep track of subReaderIndex/subReaderDocID (mapping, through
iteration, from the primary docID w/o doing a wasteful new binary
search for each) and enroll into different pqueues indexed by
subReaderIndex, then do the merge sort in the end.
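
The docID mapping "through iteration" could look roughly like this.
SketchDocMapper is a hypothetical class, and it assumes docIDs are collected
in non-decreasing order, so a cursor that only moves forward can replace a
binary search per hit.

```java
// Hypothetical helper; NOT part of Lucene.
final class SketchDocMapper {
    // starts[i] is the first top-level docID of sub-reader i;
    // the last entry is the total document count.
    private final int[] starts;
    private int sub = 0; // cursor: current sub-reader index

    SketchDocMapper(int[] subReaderMaxDocs) {
        starts = new int[subReaderMaxDocs.length + 1];
        for (int i = 0; i < subReaderMaxDocs.length; i++) {
            starts[i + 1] = starts[i] + subReaderMaxDocs[i];
        }
    }

    // Returns {subReaderIndex, subReaderDocID}.
    // Assumes calls arrive with non-decreasing docID.
    int[] map(int docID) {
        if (docID < starts[sub] || docID >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("docID out of order or out of range: " + docID);
        }
        // Advance the cursor; total work over a whole search is linear
        // in the number of sub-readers.
        while (docID >= starts[sub + 1]) {
            sub++;
        }
        return new int[] { sub, docID - starts[sub] };
    }
}
```

The returned subReaderIndex is what would select the per-reader pqueue; the
final merge sort across those queues is not shown here.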

Mike



Michael McCandless wrote:


On thinking more about this... I think with a few small changes we
could achieve Sort by field without materializing a full array.  We
can decouple this change from LUCENE-831.

I think all that's needed is:

* Expose sub-readers (LUCENE-1475) by adding IndexReader[]
  IndexReader.getSubReaders.  Default impl could just return
  length-1 array of itself.

* Change IndexSearcher.search that takes a Sort, to first call
  IndexReader.getSubReaders, and then do the same logic that
  MultiSearcher does, with improvements from LUCENE-1471 (run
  separate search per-reader, then merge-sort the top hits from
  each).

The results should be functionally identical to what we have today,
but, searching after doing a reopen() should be much faster since
we'd no longer re-build the global FieldCache array.
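
The two steps above can be sketched as follows. This is not the MultiSearcher
code: SketchHit and SketchSegmentedSort are hypothetical, every document is
treated as a match, and each segment is just an int[] of sort values. It only
shows taking the top n per sub-reader and then merging, with docIDs rebased
to top-level IDs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hit: top-level docID plus the sort field's value.
final class SketchHit {
    final int doc;
    final int value;

    SketchHit(int doc, int value) {
        this.doc = doc;
        this.value = value;
    }
}

// Hypothetical helper; NOT part of Lucene.
final class SketchSegmentedSort {
    // Sort by value, then by docID, so ties break as in a global sort.
    static final Comparator<SketchHit> ORDER =
        Comparator.comparingInt((SketchHit h) -> h.value)
                  .thenComparingInt(h -> h.doc);

    static List<SketchHit> topN(int[][] perSegmentValues, int n) {
        List<SketchHit> merged = new ArrayList<>();
        int docBase = 0; // first top-level docID of the current segment
        for (int[] values : perSegmentValues) {
            // Separate "search" per segment, using only its own values.
            List<SketchHit> segmentHits = new ArrayList<>();
            for (int doc = 0; doc < values.length; doc++) {
                segmentHits.add(new SketchHit(docBase + doc, values[doc]));
            }
            segmentHits.sort(ORDER);
            // Only each segment's top n can reach the global top n.
            merged.addAll(segmentHits.subList(0, Math.min(n, segmentHits.size())));
            docBase += values.length;
        }
        // A plain sort stands in for the final merge-sort of the
        // per-segment result lists.
        merged.sort(ORDER);
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }
}
```

Since any document in the global top n is also in its own segment's top n,
the merged result matches a sort over one global array.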

Does this make sense?  It's a small change for a big win, I think.
Does anyone want to take a crack at this patch?

Mike

Mark Miller wrote:


Michael McCandless wrote:


I'd like to decouple "upgrade to Object" from "materialize full  
array", ie, so we can access native values w/o materializing  
the full array.  I also think upgrade to Object is dangerous  
to even offer since it's so costly.



I'm right with you. I didn't think the Object approach was  
really an upgrade (beyond losing the merge, which is especially  
important for StringIndex - it has no merge option at the  
moment) which is why I left both options for now. So I def agree  
we need to move to iterator, drop object, etc.


It's the doin' that ain't so easy. The iterator approach seems  
somewhat straightforward (though it's complicated by needing to  
provide a random 
