subject:"\[jira\] Commented\: \(LUCENE\-831\) Complete overhaul of FieldCache API\/Implementation"

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2007-03-27 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484518
 ] 

Otis Gospodnetic commented on LUCENE-831:
-

I haven't looked at the patch yet.  However, I do know that a colleague of mine 
is about to start porting some FieldCache-based Filter stuff to Lucene's trunk. 
 Because that work may conflict with Hoss' changes here, we should see if this 
get applied first.


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Attachments: fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2007-07-03 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509958
 ] 

Mark Miller commented on LUCENE-831:


I think this patch is great.  Not only does it make all of the sort caching 
logic much easier to decipher, being able to throw in a sophisticated cache 
manager like ehcache (which can let caches overflow to disk) could end up being 
pretty powerful. I am also very interested in the possibility of pre-warming a 
Reader.

I will spend some time playing with what is here so far.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2007-07-18 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513777
 ] 

Hoss Man commented on LUCENE-831:
-

thanks for the feedback mark ... i honestly haven't looked at this patch since 
the last time i updated the issue ... (i'm not sure if i've even thought about 
it once since then).  it's the kind of things that seemed really cool important 
at the time, but then ... you know, other things come up.

by all means, feel free to update it.

as i recall, the biggest thing about this patch that was really just pie in the 
sky and may not make any sense is the whole concept of merging and letting 
subreaders of MultiReader do their own caching which could then percolate up.  
I did it on the assumption that it would come in handy when reopening an 
IndexReader that contains several segments -- many of which may not have 
changed since the last time you opened the index.  but i really didn't have any 
idea how the whole reopening things would work.  i see now there is some reopen 
code in LUCENE-743, but frankly i'm still not sure wether the API makes sense, 
or is total overkill.

it might be better to gut the merging logic from the patch and add it later 
if/when there becomes a more real use case for it (the existing mergeData and 
isMergable methods could always be re-added to the abstract base classes if it 
turns out they do make sense)


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2007-07-26 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515914
 ] 

Otis Gospodnetic commented on LUCENE-831:
-

Catching up with java-dev (just 300-400 more emails to go! ;)), I spotted this 
in "Lucene 2.2 soon"  thread (Doron's email):

The part I am not sure of in this regard are my changes
to FieldCache and FieldcacheImpl, while LUCENE-831 is
ongoing too. (though btw 831 doesn't apply cleanly on
current trunk). (It is my plan to get into LUCENE-831
but I haven't got to it yet.)


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-01-15 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559008#action_12559008
 ] 

Uwe Schindler commented on LUCENE-831:
--

I did some extensive tests with Lucene 2.3 today and was wondering, that 
IndexReader.reopen() in combination with FieldCache/Sorting does not bring a 
performance increase.

Until now, even if you reopen an Index using IndexReader.reopen(), you will get 
a new IndexReader instance which is not available in the default FieldCache. 
Because of that, all values from the cached fields must be reloaded. As 
reopen() only opens new/changed segments, a FieldCache implementation directly 
embedded into IndexReader as propsed by this issue would help here. Each 
segment would have its own FieldCache and reopening would be quite fast.

As with Lucene 2.3 the reopen is possible, how about this issue? Would this be 
possible for 2.4?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-01-15 Thread Michael Busch (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559341#action_12559341
 ] 

Michael Busch commented on LUCENE-831:
--

{quote}
As with Lucene 2.3 the reopen is possible, how about this issue? Would this be 
possible for 2.4?
{quote}

Yeah, this is a limitation currently of reopen(). I'm planning to work on it 
after the 2.3 release is
out and LUCENE-584 is committed!

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-21 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581189#action_12581189
 ] 

Mark Miller commented on LUCENE-831:


I spent a little time getting this patch somewhat updated to the trunk and 
running various benchmarks with reopen. As expected, the sequence of searching 
a large index with a sort, adding a few docs and then reopening a Reader to 
perform a sorted search, can be 10 of times faster.

I think the new API is great too - I really like being able to experiment with 
other caching strategies. I find the code easier to follow as well.

Did you have any ideas for merging a String index? That is something that still 
needs to be done...

- Mark

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-22 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581307#action_12581307
 ] 

Hoss Man commented on LUCENE-831:
-

Mark:

I haven't looked at this issue or any of the code in my patch since my last 
updated, nor have i had a chance to look at your updated patch, but a few 
comments...

1) you rock.

2) I have no idea what i had in mind for dealing with StringIndex when i said 
"a bit complicated, but should be possible/efficient".  I do distinctly 
remember thinking that we should add support for *just* getting the string 
indexes (ie: the current int[numDocs]) for when you don't care what the strings 
are, just the sort order or *just* getting a String[numDocs] when you aren't 
doing sorting and just want an "inverted inverted index" on the field ... but 
obviously we still need to support the curent StringIndex (it's used in 
MultiSearcher right?)


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-23 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581375#action_12581375
 ] 

Mark Miller commented on LUCENE-831:


Right, I think its used in MultiSearcher and Parallel-MultSearcher and I 
believe its because you cannot compare simple ord arrays across Searchers and 
so you need the original sort term?

I havn't been able to come up with anything that would be very efficient for 
merging a StringIndex, but I have not thought to much about it yet. Anyone out 
there have any ideas? Fastest way to merge two String[], each with an int[] 
indexing into the String[]? Is there a *really* fast way?

I agree that it would be nice to skip the String[] if a MultiSearcher was not 
being used.

I'll keep playing with it

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-24 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581758#action_12581758
 ] 

Yonik Seeley commented on LUCENE-831:
-

> I agree that it would be nice to skip the String[] if a MultiSearcher was not 
> being used.

If you're going to incrementally update a FieldCache of a MultiReader, it's the 
same issue... can't merge the ordinals without the original (String) values.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-26 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582422#action_12582422
 ] 

Michael McCandless commented on LUCENE-831:
---


One question here: should we switch to a method call, instead of a
straight array, to retrieve a cached value for a doc?

If we did that, then MultiSearchers would forward the request to the
right IndexReader.

The benefit then is that reopen() of a reader would not have to
allocate & bulk copy massive arrays when updating the caches.  It
would keep the cost of reopen closer to the size of the new segments.
And this way the old reader & the new one would not double-allocate
the RAM required to hold the common parts of the cache.

We could always still provide a "give me the full array" fallback if
people really wanted that (and were willing to accept the cost).



> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-26 Thread Michael Busch (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582443#action_12582443
 ] 

Michael Busch commented on LUCENE-831:
--

{quote}
The benefit then is that reopen() of a reader would not have to
allocate & bulk copy massive arrays when updating the caches. It
would keep the cost of reopen closer to the size of the new segments.
{quote}

I agree, Mike. Currently during reopen() the MultiSegmentReader 
allocates a new norms array with size maxDoc(), which is, as you said,
inefficient if only some (maybe even small) segments changed.

The method call might be a little slower than the array lookup, but
I doubt that this would be very significant. We can make this change for
the norms and run performance tests to measure the slowdown.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-26 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582471#action_12582471
 ] 

Mark Miller commented on LUCENE-831:


>If you're going to incrementally update a FieldCache of a MultiReader, it's 
>the same issue... can't merge the ordinals without the original (String) 
>>values.

That is a great point.

>should we switch to a method call, instead of a straight array, to retrieve a 
>cached value for a doc?

Sounds like a great idea to me. Solves the StringIndex merge and eliminates all 
merge costs at the price of a method call per access.


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-26 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582480#action_12582480
 ] 

Mark Miller commented on LUCENE-831:


Hmm...how do we avoid having to pull the cached field values through a sync on 
every call? The field data has to be cached...and the method to return the 
single cached field value has to be multi-threaded...

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-03-27 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582576#action_12582576
 ] 

Michael McCandless commented on LUCENE-831:
---

I think if we can finally move to having read-only IndexReaders then they would 
not sync on this method?

Also, we should still provide the "give me the full array as of right now" 
fallback which in a read/write usage would allow you to spend lots of RAM in 
order to not synchronize.  Of course you'd also have to update your array (or, 
periodically ask for a new one) if you are altering fields.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael Busch
> Fix For: 2.4
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-08-21 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624529#action_12624529
 ] 

Mark Miller commented on LUCENE-831:


Deprecating the function package is problematic. Doing it by method leaves the 
classes open to issues maintaining two possible states that could be mixed - 
the FieldCache and parsers are used as method params - you can't really replace 
the old internal representation with the new one. You really have to support 
the old method and new method and tell people not to use both or something. 
Doing it by class means duplicating half a dozen classes with new names. 
Neither is very satisfying. This stuff is all marked experimental and subject 
to change though - could it just be done clean sweep style? A little abrupt 
but...experimental is experimental 

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-08-22 Thread Fuad Efendi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624835#action_12624835
 ] 

Fuad Efendi commented on LUCENE-831:


Would be nice to have TermVectorCache (if term vectors are stored in the index)

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-08-22 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624844#action_12624844
 ] 

Mark Miller commented on LUCENE-831:


That patch may have a goof , I'll peel off another soon. Unfortunately, it 
looks like this is slower than the orig implementation. I have to run more 
benchmarks, but it might even be in the 10-20% mark. My guess is that its the 
method calls - you may gain much faster reopen, but you appear to lose quite a 
bit on standard sort speed...

Could give the choice of going with the arrays, if they prove as fast as orig, 
but then back to needing an efficient StringIndex merge...

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-08-23 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625084#action_12625084
 ] 

Mark Miller commented on LUCENE-831:


I've got the function package happily deprecated with replacement methods.

I am going to ballpark the sort speed with method calls to be about 10% slower, 
but there are a lot of variables and I am still benchmarking.

I've added a check for a system property so that by default, sorting will still 
use primitive arrays, but if you want to pay the small sort cost you can turn 
on the method calls and get faster reopen without cache merging.

So that wraps up everything I plan to do unless comments, criticisms bring up 
anything else.

The one piece that is missing and should be addressed is an efficient merge for 
the StringIndex.

Patch to follow.

- mark

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-08-23 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625098#action_12625098
 ] 

Mark Miller commented on LUCENE-831:


bq. change norm caching to use new caches (if not the same
main cache, then at least the same classes in a private cache)


What benefit do you see to this? Does it offer anything over the simple map 
used now?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-08-25 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625470#action_12625470
 ] 

Hoss Man commented on LUCENE-831:
-

bq. What benefit do you see to this? Does it offer anything over the simple map 
used now?

I *think* what i ment there was that if the norm caching used the same 
functionality then it could simplify the code, and norms would be optimized in 
the reopen case as well ... plus custom Cache Impls could report stats on norm 
usage (just like anything else) so people could  see when norms were getting 
used for fields they didn't expect them to be.

But as i've said before ... most of my early comments were pie in the sky 
theoretical brainstorming comments -- don't read too much into them if they 
don't really make sense.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-03 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644786#action_12644786
 ] 

Mark Miller commented on LUCENE-831:


This is missing a good way to manage the caching of the field values per 
cachekey (allowing for a more fine grained  weak or disk strategy for example). 
That type of stuff would have to be built into the cachkey currently - so to 
add a new strategy, youd have to implement a lot of new cachekeys and multiply 
your options to manage...for every int/long/byte/etc array and object array you 
would have to implement a new cachkey for a weakref solution. Is there an API 
that can overcome this?

The current cache that you can easily override, cachkey to data, is pretty 
course...I am not sure how many helpful things you can do easily.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-19 Thread Alex Vigdor (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649180#action_12649180
 ] 

Alex Vigdor commented on LUCENE-831:


Another useful feature that seems like it would be pretty easy to implement on 
top of this patch is cache warming; that is, the ability to reopen an 
IndexReader and repopulate its caches before making it "live".  The main thing 
missing is a Cache.getKeys() method which could be used to discover what caches 
are already in use, in order to repopulate them after reopening the 
IndexReader.  This cache warming could be performed externally to the 
IndexReader (by calling getCacheData for each of the keys after reopening), or 
perhaps the reopen method could be overloaded with a boolean "reloadCaches" to 
perform this in the same method call.

The rationale for this I hope is clear; my application, like many I'm sure, 
keeps a single IndexReader for handling searches, and in a separate thread from 
search request handling commits writes and reopens the IndexReader before 
replacing the IndexReader reference for new searches (reference counting is 
then used to eventually close the old reader instances).  It would be ideal for 
that separate thread to also bear the expense of cache warming; even with this 
patch against our 4 GB indices, along with -Duse.object.array.sort=true,  a 
search request coming immediately after reopening the index will pause 20-40 
seconds while the caches refill.  Preferably that could be done in the 
background and request handling threads would never be slowed down.



> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-19 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649188#action_12649188
 ] 

Mark Miller commented on LUCENE-831:


Also - your reopen time will vary greatly depending on how many segments need 
to be reopened and what size they are...so I think a lower merge factor would 
help, while hurting overall search speed of course.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-19 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649191#action_12649191
 ] 

Mark Miller commented on LUCENE-831:


You've tried the patch? Awesome!

How long did it take to fill the cache before the patch?

I agree that we should be as friendly to warming as we can be. Keep in 
mind that you can warm the reader yourself by issuing a sorted search 
before putting the reader into duty - of course you don't get to warm 
from RAM like with what you suggest.

Keep that feedback coming. I've been building momentum on coming back to 
this issue, unless a commiter beats me to it.




> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-20 Thread Alex Vigdor (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649385#action_12649385
 ] 

Alex Vigdor commented on LUCENE-831:


To be honest, the cache never successfully refilled before the patch - or at 
least I gave up after waiting 10 minutes.  I was about to give up on sorting.  
It could have to do with the fact that we're running with a relatively modest 
amount of RAM (768M) given our index size. But with the patch at least sorting 
is a realistic option!

I will look at adding the warming to my own code as you suggest; it is another 
peculiarity of this project that I can't know in the code what fields will be 
used for sorting, but I'll just track the searches coming through and aggregate 
any sorts they perform into a warming query.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-20 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649391#action_12649391
 ] 

Mark Miller commented on LUCENE-831:


bq. i haven't had any time to do further work on this issue ... partly because 
i haven't had a lot of time, but mainly because i'm hoping to get some feedback 
on the overall approach before any more serious effort investment. 

Wheres that investment Hoss? You've orphaned your baby. There is a fairly 
decent amount of feedback here.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-20 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649393#action_12649393
 ] 

Mark Miller commented on LUCENE-831:


I think this would actually be better if all cachekey types had to implement 
both ObjectArray access as well as primitive Array access. Makes the code 
cleaner and cuts down on the cachekey explosion. Should have done it this way 
to start, but couldnt see the forest through the trees back then i suppose.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-05 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653757#action_12653757
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. change norm caching to use new caches (if not the same

I think we could go even further, and [eventually] change norms to use an 
iterator API, which'd also have the same benefit of not requiring costly 
materialization of a full byte[] array for every doc in the index (ie, reopen() 
cost would be in proportion to changed segments not total index size).

Likewise field cache / stored fields / column stride fields could eventually 
open up an iterator API as well.  This API would be useful if eg in a custom 
HitCollector you wanted to look at a field's value in order to do custom 
filtering/scoring.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-06 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654055#action_12654055
 ] 

Michael McCandless commented on LUCENE-831:
---


[Note: my understanding of this area in general, and this patch in
particular, is still rather spotty... so please correct my
misconceptions in what follows...]

This change is a great improvement, since the cache management would
be per-IndexReader, and more public so that you could see what's
cached, access the cache via the reader, swap in your own cache
management, etc.

But I'm concerned, because this change continues the "materialize
massive array for entire index" approach, which is the major remaining
cost when (re)opening readers.  EG, isMergable()/mergeData() methods
build up the whole array from sub readers.

What would it take to never require materializing the full array for
the index, for Lucene's internal purposes (external users may continue
to do so if they want)?  Ie, leave the array bound to the "leaf"
IndexReader (ie, SegmentReader).  It was briefly touched on here:

  
https://issues.apache.org/jira/browse/LUCENE-1458?focusedCommentId=12650964#action_12650964

I realize this is a big change, but I think we need to get there
eventually.

EG I can see in this patch that MultiReader & MultiSegmentReader do
expose a CacheData that has get and get2 (why do we have get2?) that
delegate to child readers, which is good, but it's not good that they
return Object (requires casting for every lookup).  We don't have
per-atomic-type variants?  Couldn't we expose eg an IntData class (and
all other types) that has int get(docID) abstract method, that
delegate to child readers?  (I'm also generally confused by why we
have the per-atomic-type switching happening in CacheKey subclasses
and not CacheData.)

Then... and probably the hardest thing to fix here: for all the
comparators we now materialize the full array.  I realize we use the
full array when sorting during a search of an
IndexSearcher(MultiReader(...)), because FieldSortedHitQueue is called
for every doc visited and must be able to quickly make its comparison.

However, stepping back, this is poor approach.  We should instead be
doing what MultiSearcher does, which is gather top results
per-sub-reader, and then merge-sort the results.  At that point, to do
the merge, we only need actual field values for those docs in the top
N.

If we could fix field-sorting like that (and I'm hazy on exactly how
to do so), I think Lucene internally would then never need the full
array?

This change also adds USE_OA_SORT, which is scary to me because Object
overhead per doc can be exceptionally costly.  Why do we need to even
offer that?


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-06 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654057#action_12654057
 ] 

Michael McCandless commented on LUCENE-831:
---


One more thing here... while random-access lookup of a field's value
via MultiReader dispatch requires the binary search to find the right
sub-reader, I think in most internal uses, the access could be
switched to an iterator instead, in which case the lookup should be
far faster.

EG when sorting by field, we could pull say an IntData iterator from
the reader, and then access the int values in docID order as we visit
the docs.

For norms, which we should eventually switch to FieldCache +
column-stride fields, it would be the same story.

Accessing via iterator should go a long ways to reducing the overhead
of "using a method" instead of accessing the full array directly.


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-06 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654064#action_12654064
 ] 

Mark Miller commented on LUCENE-831:


Ah, the dirty secret of 831 - there is plenty more to do :) I've been pushing 
it down the path, but I've expected radical changes to be needed before it goes 
in.

bq. But I'm concerned, because this change continues the "materialize massive 
array for entire index" approach, which is the major remaining cost when 
(re)opening readers. EG, isMergable()/mergeData() methods build up the whole 
array from sub readers.

Originally, 3.0 wasn't so close, so there was more concern with back 
compatibility than there might be now. I think the method call will be a slight 
slowdown no matter what as well...even with an iterator approach. Perhaps other 
"wins" will make up for it though. Its certainly cleaner to support 'one' mode.

bq. What would it take to never require materializing the full array for the 
index, for Lucene's internal purposes (external users may continue to do so if 
they want)? Ie, leave the array bound to the "leaf" IndexReader (ie, 
SegmentReader). 

I'm not sure I fully understand yet. If you use the ObjectArray mode, this is 
what happens right? Each sub array is bound to the IndexReader and MultiReader 
will distribute the requests to the right subreader. Only if you use the 
primitive arrays and merging do you get the full arrays (when not using 
USE_OA_SORT).

bq. I realize this is a big change, but I think we need to get there eventually.

Sounds good to me.

bq. (why do we have get2?) 

Because a StringIndex needs to access both the array of Strings and a second 
array indexing into that. None of the other types need to access two arrays.

bq. Couldn't we expose eg an IntData class (and all other types) that has int 
get(docID) abstract method, that delegate to child readers?

Yeah, I think this would be possible. If casting does indeed cost so much, this 
may bring things closer to the primitive array speed.

bq. I'm also generally confused by why we have the per-atomic-type switching 
happening in CacheKey subclasses and not CacheData.

>From Hoss' original design. What are your concerns here? The right key gets 
>you the right data :) I've actually mulled this over some, buts its too early 
>in the morning to remember I suppose. I'll look at it some more.

bq. If we could fix field-sorting like that (and I'm hazy on exactly how to do 
so), I think Lucene internally would then never need the full array?

That would be cool. Again, I'll try to explore in this direction. It doesn't 
need the full array when using the ObjectArray stuff now though (well, it kind 
of does, just split up over the readers).
 
bq. This change also adds USE_OA_SORT, which is scary to me because Object 
overhead per doc can be exceptionally costly. Why do we need to even offer that?

All this does at the moment (and I hate system properties, but for the moment, 
thats whats working) is switch between using the primitive arrays and merging 
or using the distributed ObjectArray for internal sorting. It defaults to using 
the primitive arrays and merging because its 5-10% faster than using the 
ObjectArrays. The ObjectArray approach is just an ObjectArray backed by an 
array for each Reader - a MultiReader distributes a requests for a doc field to 
the right Readers ObjectArray.

To your second comment...I'm gong to have to spend some more time :)

No worries though, this is very much a work in progress. I'd love to have it in 
by 3.0 though. Glad to see someone else taking more of an interest - very hard 
for me to find the time to dig into it all that often. I'll work with the code 
some as I can, thinking more about your comments, and perhaps I can come up 
with some better responses/ideas. 



> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-06 Thread Robert Newson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654069#action_12654069
 ] 

Robert Newson commented on LUCENE-831:
--

This enhancement is particularly interesting to me (and the application my team 
is building). I'm not sure how much time I can donate but since I'd likely have 
to enhance this area of Lucene for our app anyway and it would be better to 
have it in core, I'd like to help out where I can.

The patch applies more or less cleanly against 2.4.0, but not trunk, btw. Is it 
possible to get the patch committed to a feature branch off of trunk perhaps?

Finally, I'm most interested in the ability to make disk-backed caches. A very 
quick attempt to put the cache into JDBM failed as the CacheKey classes are not 
Comparable, which seems necessary for most kinds of disk lookup structures. 
SimpleMapCache uses a HashMap, which just needs equals/hashcode methods.

The other benefit to this approach is that it allows the data structures needed 
for sorting to be reused for range filtering. My application needs both, though 
on numeric field types (dates predominantly).

Finally, this might also be a good time to add first class support for 
non-String field types. The formatting that NumberTools supplies is 
incompatible with SortField (the former outputs in base 36, the latter parses 
with parseLong), so there's clearly been several approaches to the general 
problem. In my case, I wrote a new Document class with addLong(name, value), 
etc.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-06 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654105#action_12654105
 ] 

Uwe Schindler commented on LUCENE-831:
--

Maybe every asignee should tag his issues that are related to sorting to be 
related (or similar) to this one. I am thinking about the latest developments 
in LUCENE-1478 and LUCENE-1481

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-06 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654106#action_12654106
 ] 

Uwe Schindler commented on LUCENE-831:
--

Maybe we need two trunks or branches or whatever :-). One for 2.9 and one for 
3.0. This is a typical example for that in my opinion.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-06 Thread Robert Newson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654109#action_12654109
 ] 

Robert Newson commented on LUCENE-831:
--


The conflict was easy to resolve, it was just an FYI, I appreciate trunk 
changes rapidly. I was just wondering if a feature branch would make 
synchronization easier.

Some meta-task to combine or track these things would be great. I hit the 
problem that LUCENE-1478 describes. I'd previously indexed numeric values with 
NumberTools, which gives String-based sorting, the most memory-intensive one.

It seems with this field cache approach and the recent FieldCacheRangeFilter on 
trunk, that Lucene has a robust and coherent answer to performing efficient 
sorting and range filtering for float, double, short, int and long values, 
perhaps it's time to enhance Document. That might cut down the size of the API, 
which in turn makes it easy to test and tune. Document could preclude 
tokenization for such fields, I suspect I'm not the only one to build a 
type-safe replacement to Document.

For what it's worth, I'm currently indexed longs using String.format("%019d") 
and treating dates as longs (getTime()) coupled with a long[] version of 
FieldCacheRangeFilter. It achieves a similar goal to this task, the long[] used 
for sorting is the same as for range filtering. 


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654413#action_12654413
 ] 

Michael McCandless commented on LUCENE-831:
---


bq. It seems with this field cache approach and the recent 
FieldCacheRangeFilter on trunk, that Lucene has a robust and coherent answer to 
performing efficient sorting and range filtering for float, double, short, int 
and long values, perhaps it's time to enhance Document. That might cut down the 
size of the API, which in turn makes it easy to test and tune. Document could 
preclude tokenization for such fields, I suspect I'm not the only one to build 
a type-safe replacement to Document.

This is an interesting idea.  Say we create IntField, a subclass of
Field.  It could directly accept a single int value and not accept
tokenization options.  It could assert "not null", if the field wanted
that.  FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well.  If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields.  Etc


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654417#action_12654417
 ] 

Uwe Schindler commented on LUCENE-831:
--

{quote}This is an interesting idea. Say we create IntField, a subclass of
Field. It could directly accept a single int value and not accept
tokenization options. It could assert "not null", if the field wanted
that. FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well. If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields. Etc{quote}

Maybe this document could also manage the encoding of these fields to the index 
format. With that it would be possible to extend Docuemnt, to automatically use 
my trie-based encoding for storing the raw term values. On the otrher hand 
RangeQuery would be aware of the field encoding and can switch dynamically to 
the correct search/sort algorithm. Great!


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Robert Newson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654418#action_12654418
 ] 

Robert Newson commented on LUCENE-831:
--


Yes, something like that. I made a Document class with an add method for each 
primitive type which allowed only the sensible choices for Store and Index. 
Field subclasses would achieve the same thing. A subclass per primitive type 
might be excessive, they'd be 99% identical to each other. A NumericField that 
could hold a single short, int, long, float, double or Date might be enough 
(new NumericField(name, 99.99F, true), the final boolean toggling YES/NO for 
Store, since Index is always UNANALYZED_NO_NORMS).

Adding this to FieldInfo would change the on-disk format such that it remembers 
that a particular field is of a special type?  That way all the places that 
Lucene currently has a multiplicity of classes or constants (SortField.INT, 
etc) could be eliminated, replaced by first class support in Document/Field.

A remaining question would be whether field name is sufficient for uniqueness, 
I suggest it becomes fieldname+type. This also implies changes to the Query and 
Filter hierarchy. 

If it helps, I can post my Document class, which had helper methods for 
RangeFilter and TermQuery's for each type. It's not a complicated class, you 
can probably already picture it.


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654488#action_12654488
 ] 

Jason Rutherglen commented on LUCENE-831:
-

M. McCandless:

"This is an interesting idea. Say we create IntField, a subclass of
Field. It could directly accept a single int value and not accept
tokenization options. It could assert "not null", if the field wanted
that. FieldInfo could store that it's an int and expose more stronly
typed APIs from IndexReader.document as well. If in the future we
enable Term to be things-other-than-String, we could do the right
thing with typed fields. Etc"

+1 For 3.0 this will be of great benefit in the effort to remove the excessive 
string creation
that happens right now with Lucene.  Term should also be more generic such that 
it
can also accept primitive or user defined types (and index format encodings).  

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654820#action_12654820
 ] 

Michael McCandless commented on LUCENE-831:
---

Marvin, does KS/Lucy have something like FieldCache?  If so, what API do you 
use?  Is it iterator-only?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Marvin Humphrey (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655081#action_12655081
 ] 

Marvin Humphrey commented on LUCENE-831:


> Marvin, does KS/Lucy have something like FieldCache? If so, what API do you
> use? Is it iterator-only? 

At present, KS only caches the docID -> ord map as an array.  It builds that
array by iterating over the terms in the sort field's Lexicon and mapping the
docIDs from each term's posting list.

Building the docID -> ord array is straightforward for a single-segment
SegLexicon.  The multi-segment case requires that several SegLexicons be
collated using a priority queue.  In KS, there's a MultiLexicon class which
handles this; I don't believe that Lucene has an analogous class.

Relying on the docID -> ord array alone works quite well until you get to the
MultiSearcher case.  As you know, at that point you need to be able to
retrieve the actual field values from the ordinal numbers, so that you can
compare across multiple searchers (since the ordinal values are meaningless).

{code}
Lex_Seek_By_Num(lexicon, term_num);
field_val = Lex_Get_Term(lexicon);
{code}

The problem is that seeking by ordinal value on a MultiLexicon iterator
requires a gnarly implementation and is very expensive.  I got it working, but
I consider it a dead-end design and a failed experiment.

The planned replacement for these iterator-based quasi-FieldCaches involves
several topics of recent discussion:

  1) A "keyword" field type, implemented using a format similar to what Nate 
 and I came up with for the lexicon index.
  2) Write per-segment docID -> ord maps at index time for sort fields.
  3) Memory mapping.
  4) Segment-centric searching.

We'd mmap the pre-composed docID -> ord map and use it for intra-segment
sorting.  The keyword field type would be implemented in such a way that we'd
be able to mmap a few files and get a per-segment field cache, which we'd then
use to sort hits from multiple segments.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655988#action_12655988
 ] 

Michael McCandless commented on LUCENE-831:
---


{quote}
> At present, KS only caches the docID -> ord map as an array. It builds that
> array by iterating over the terms in the sort field's Lexicon and mapping the
> docIDs from each term's posting list.
{quote}

OK, that corresponds to the "order" array in Lucene's
FieldCache.StringIndex class.

{quote}
> Building the docID -> ord array is straightforward for a single-segment
> SegLexicon. The multi-segment case requires that several SegLexicons be
> collated using a priority queue. In KS, there's a MultiLexicon class which
> handles this; I don't believe that Lucene has an analogous class.
{quote}

Lucene achieves the same functionality by using a MultiReader to read
the terms in order (which uses MultiSegmentReader.MultiTermEnum, which
uses a pqueue under the hood) and building up StringIndex from that.
It's very costly.

{quote}
> Relying on the docID -> ord array alone works quite well until you get to the
> MultiSearcher case. As you know, at that point you need to be able to
> retrieve the actual field values from the ordinal numbers, so that you can
> compare across multiple searchers (since the ordinal values are meaningless).
{quote}

Right, and we are trying to move towards pushing searcher down to the
segment.  Then we can use the per-segment ords for within-segment
collection, and then the real values for merging the separate pqueues
at the end (but, initial results from LUCENE-1483 show that collecting
N queues then merging in the end adds ~20% slowdown for N = 100
segments).

{quote}
> Lex_Seek_By_Num(lexicon, term_num);
> field_val = Lex_Get_Term(lexicon);
> 
> The problem is that seeking by ordinal value on a MultiLexicon iterator
> requires a gnarly implementation and is very expensive. I got it working, but
> I consider it a dead-end design and a failed experiment.
{quote}

OK.

{quote}
> The planned replacement for these iterator-based quasi-FieldCaches involves
> several topics of recent discussion:
> 
> 1) A "keyword" field type, implemented using a format similar to what Nate
> and I came up with for the lexicon index.
> 2) Write per-segment docID -> ord maps at index time for sort fields.
> 3) Memory mapping.
> 4) Segment-centric searching.
> 
> We'd mmap the pre-composed docID -> ord map and use it for intra-segment
> sorting. The keyword field type would be implemented in such a way that we'd
> be able to mmap a few files and get a per-segment field cache, which we'd then
> use to sort hits from multiple segments.
{quote}

OK so your "keyword" field type would expose random-access to field
values by docID, to be used to merge the N segments' pqueues into a
single final pqueue?

The alternative is to use iterator but pull the values into your
pqueues when they are inserted.  The benefit is iterator-only
exposure, but the downside is likely higher net cost of insertion.
And if the "assumption" is these fields can generally be ram resident
(explicitly or via mmap), then the net benefit of iterator-only API is
not high.


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-12 Thread Marvin Humphrey (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656150#action_12656150
 ] 

Marvin Humphrey commented on LUCENE-831:


>> Building the docID -> ord array is straightforward for a single-segment
>> SegLexicon. The multi-segment case requires that several SegLexicons be
>> collated using a priority queue. In KS, there's a MultiLexicon class which
>> handles this; I don't believe that Lucene has an analogous class.
> 
> Lucene achieves the same functionality by using a MultiReader to read
> the terms in order (which uses MultiSegmentReader.MultiTermEnum, which
> uses a pqueue under the hood) and building up StringIndex from that.
> It's very costly.

Ah, you're right, that class is analogous.  The difference is that
MultiTermEnum doesn't implement seek(), let alone seekByNum().  I was pretty
sure you wouldn't have bothered, since by loading the actual term values into
an array you eliminate the need for seeking the iterator.

> OK so your "keyword" field type would expose random-access to field
> values by docID, 

Yes.  There would be three files for each keyword field in a segment.

  * docID -> ord map.  A stack of i32_t, one per doc.
  * Character data.  Each unique field value would be stored as uncompressed
UTF-8, sorted lexically (by default).
  * Term offsets.  A stack of i64_t, one per term plus one, demarcating the 
term text boundaries in the character data file.

Assuming that we've mmap'd those files -- or slurped them -- here's the
function to find the keyword value associated with a doc num:

{code}
void
KWField_Look_Up(KeyWordField *self, i32_t doc_num, ViewCharBuf *target)
{
if (doc_num > self->max_doc) {
CONFESS("Doc num out of range: %u32 %u32", 
}
else {
i64_t offset  = self->offsets[doc_num];
i64_t next_offset = self->offsets[doc_num + 1];
i64_t len = next_offset - offset;
ViewCB_Assign_Str(target, self->chardata + offset, len);
}
}
{code}

I'm not sure whether IndexReader.fetchDoc() should retrieve the values for
keyword fields by default, but I lean towards yes.  The locality isn't ideal,
but I don't think it'll be bad enough to contemplate storing keyword values
redundantly alongside the other stored field values.

> to be used to merge the N segments' pqueues into a
> single final pqueue?

Yes, although I think you only need one two priority queues total: one
dedicated to iterating intra-segment, which gets emptied out after each
seg into the other, final queue.

> The alternative is to use iterator but pull the values into your
> pqueues when they are inserted. The benefit is iterator-only
> exposure, but the downside is likely higher net cost of insertion.
> And if the "assumption" is these fields can generally be ram resident
> (explicitly or via mmap), then the net benefit of iterator-only API is
> not high.

If I understand where you're going, you'd like to apply the design of the
deletions iterator to this problem?

For that to work, we'd need to store values for each document, rather than
only unique values... right?  And they couldn't be stored in sorted order,
because we aren't pre-sorting the docs in the segment according to the value
of a keyword field -- which means string diffs don't help.  You'd have a
single file, with each doc's values encoded as a vbyte byte-count followed by
UTF-8 character data.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached dat

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-16 Thread Robert Newson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657151#action_12657151
 ] 

Robert Newson commented on LUCENE-831:
--

I was wondering if the next version of the patch could include a sample 
disk-based cache? It seems that CacheKey classes are fine for an in-memory 
HashMap (since SimpleMapCache works just fine) but I wonder if equals/hashCode 
is sufficient when the data is on disk?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-17 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657404#action_12657404
 ] 

Mark Miller commented on LUCENE-831:


Thanks Jeremey! Thats great news. 

I think the issues is going to heavily affected by LUCENE-1483, so once thats 
done, I hope I'll be able to come right back to this.

bq. Looking at the getCachedData method for MultiReader and MultiSegmentReader, 
it doesn't appear that the CacheData objects from merge operations are cached. 
Is there any reason for this?

I think its just not done yet - I havn't looked at the merging since I started 
playing with the ObjectArrays - I think most of this stuff becomes moot with 
1483 though - this will turn more into an API overhaul than an IndexReader 
reopen time saver.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-17 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657409#action_12657409
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
>  this will turn more into an API overhaul than an IndexReader reopen time 
> saver.
{quote}
...and given the progress on LUCENE-1483 (copying values into the sort queues), 
I think this new FieldCache API should probably be primarily an iteration API.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-17 Thread Jeremy Volkman (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657401#action_12657401
 ] 

Jeremy Volkman commented on LUCENE-831:
---

A couple things:

# Looking at the getCachedData method for MultiReader and MultiSegmentReader, 
it doesn't appear that the CacheData objects from merge operations are cached.  
Is there any reason for this?
# I've written a merge method for StringIndexCacheKey. The process isn't all 
that complicated (apart from all of the off-by-ones), but it's expensive.

{code:java}
  public boolean isMergable() {
return true;
  }

  private static class OrderNode {
  int index;
  OrderNode next;
  }
  
  public CacheData mergeData(int[] starts, CacheData[] data) 
  throws UnsupportedOperationException {
int[] mergedOrder = new int[starts[starts.length - 1]];
// Lookup map is 1-based
String[] mergedLookup = new String[starts[starts.length - 1] + 1];

// Unwrap cache payloads and flip order arrays
StringIndex[] unwrapped = new StringIndex[data.length];

/* Flip the order arrays (reverse indices and values)
 * Since the ord map has a many-to-one relationship with the lookup table,
 * the flipped structure must be one-to-many which results in an array of
 * linked lists.
 */
OrderNode[][] flippedOrders = new OrderNode[data.length][];
for (int i = 0; i < data.length; i++) {
StringIndex si = (StringIndex) data[i].getCachePayload();
unwrapped[i] = si;
flippedOrders[i] = new OrderNode[si.lookup.length];
for (int j = 0; j < si.order.length; j++) {
OrderNode a = new OrderNode();
a.index = j;
a.next = flippedOrders[i][si.order[j]];
flippedOrders[i][si.order[j]] = a;
}
}

// Lookup map is 1-based
int[] lookupIndices = new int[unwrapped.length];
Arrays.fill(lookupIndices, 1);

int lookupIndex = 0;
String currentVal;
int currentSeg;
while (true) {
currentVal = null;
currentSeg = -1;
int remaining = 0;
// Find the next ordered value from all the segments
for (int i = 0; i < unwrapped.length; i++) {
if (lookupIndices[i] < unwrapped[i].lookup.length) {
remaining++;
String that = unwrapped[i].lookup[lookupIndices[i]];
if (currentVal == null || currentVal.compareTo(that) > 0) {
currentVal = that;
currentSeg = i;
}
}
}
if (remaining == 1) {
break;
} else if (remaining == 0) {
/* The only way this could happen is if there are 0 segments or if
 * all segments have 0 terms. In either case, we can return
 * early.
 */
return new CacheData(new StringIndex(
new int[starts[starts.length - 1]], new String[1]));
}
if (!currentVal.equals(mergedLookup[lookupIndex])) {
lookupIndex++;
mergedLookup[lookupIndex] = currentVal;
}
OrderNode a = flippedOrders[currentSeg][lookupIndices[currentSeg]];
while (a != null) {
mergedOrder[a.index + starts[currentSeg]] = lookupIndex;
a = a.next;
}
lookupIndices[currentSeg]++;
}
{code}



> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibilit

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-02-16 Thread Jeremy Volkman (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674023#action_12674023
 ] 

Jeremy Volkman commented on LUCENE-831:
---

Are there still things planned for this issue now that LUCENE-1483 has been 
committed? 

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-02-17 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674255#action_12674255
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Are there still things planned for this issue now that LUCENE-1483 has been 
committed? 

Good question... I think it'd still be nice to 1) have the IndexReader
expose the API for accessing the FieldCache values, 2) allow for
customization of the caching policy.

Though maybe we should hold off on those changes until we do
LUCENE-1231 (column stride fields), which I think would use exactly
the same API with the only difference being whether under-the-hood
there was a more efficient (column-stride storage) representation for
the field values vs the slower uninvert & resort (for StringIndex)
approach that FieldCache does today.

Also, in the new API I'd like to make it not-so-easy to materialize
the full array.  I think it's OK to ask for the full array of a
sub-reader, but if you want to access @ the MultiReader level, we
should encourage either random access getX(int docID), iteration or
get-sub-arrays and append yourself.


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-26 Thread Michael Busch (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689620#action_12689620
 ] 

Michael Busch commented on LUCENE-831:
--

Shall we try to gather all requirements we have for this feature here?
I think recently new requirements were mentioned and they are now scattered 
accross different issues and email threads.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-26 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689624#action_12689624
 ] 

Uwe Schindler commented on LUCENE-831:
--

I will attach my comments regarding the problem with the TrieRangeFilter and 
sorting (stop collecting terms into cache when lower precisions begin or only 
collect terms using a specific range (like a range filter). So you could fill a 
FieldCache and specify a starting term and ending term, all terms inbetween 
could be put into the cache, others outside left out. In this way, it would be 
possible to just use TrieUtils.prefixCodeLong() to specify the upper and lower 
integer bound encoded in the highest precision.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-26 Thread Tim Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689629#action_12689629
 ] 

Tim Smith commented on LUCENE-831:
--

One requirement i would like to request is the ability to attach an arbitrary 
object to each Segment.
This will allow people using lucene to store any arbitrary per segment caches 
and statistics that their application requires (fully free form)

Would like to see the following:
* add SegmentReader.setCustomCacheManager(CacheManager m) // mabye add a string 
for a CacheManager id (to allow registration of multiple cache managers)
* add SegmentReader.getCustomCacheManager() // to allow accessing the manager

CacheManager should be a very light interface (just a close() method that is 
called when the SegmentReader is closed)



> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-27 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689863#action_12689863
 ] 

Michael McCandless commented on LUCENE-831:
---

I'd like to see the new FieldCache API de-emphasize "get me a single array 
holding all values for all docs in the index" for a MultiReader. That 
invocation is exceptionally costly in the context of reopened readers, and 
providing the illusion that one can simply get this array is dangerous. It's a 
"leaky API", like how virtual memory API pretends you can use more memory than 
is physically available. 

I think it's OK to return an array-of-arrays (ie, one contiguous array per 
underlying segment); if the app really wants to make a massive array & 
concatenate it, they can do so outside of the FieldCache API. 

Or an method you call to get a given doc's value (like DocValues in 
o.a.l.search.function). Or an iterator API to step through all the values. 

We should also set this API up as much as possible for LUCENE-1231. Ie, the 
current "un-invert the field" approach that FieldCache takes is merely one 
source of values per doc. Column stride fields in the future will be a 
different (faster) source of values, that should be able to "just plug in" 
under the hood somehow to this same exposure API. 

On Uwe's suggestion for some flexibility on how the un-inversion takes place, I 
think allowing differing degrees of extension makes sense. EG we already allow 
you to provide a custom parser. We need to allow control on whether a given 
value replaces the already-seen value (LUCENE-1372), or whether to stop the 
looping early (Uwe's needs for improving Trie). We should also allow outright 
entire custom class that creates the value array.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-27 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689867#action_12689867
 ] 

Earwin Burrfoot commented on LUCENE-831:


Adding to Tim, I'd like to see the ability not only to be notified of 
SegmentReader destruction, but of SegmentReader creation (within reopen) too. 
And new FieldCache logic should be built on these notifications.
Then it's possible to extend/replace Lucene's native FieldCache, then it's 
possible to create a cache specialized for trie-fields.

Linking objects to each other with WeakHashmaps is insanely evil, especially in 
the case when object creation/destruction is clearly visible.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-28 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693464#action_12693464
 ] 

Michael McCandless commented on LUCENE-831:
---

Let's make sure the new API fixes LUCENE-1579.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697485#action_12697485
 ] 

Mark Miller commented on LUCENE-831:


{quote}
I'd like to see the new FieldCache API de-emphasize "get me a single array 
holding all values for all docs in the index" for a MultiReader. That 
invocation is exceptionally costly in the context of reopened readers, and 
providing the illusion that one can simply get this array is dangerous. It's a 
"leaky API", like how virtual memory API pretends you can use more memory than 
is physically available.

I think it's OK to return an array-of-arrays (ie, one contiguous array per 
underlying segment); if the app really wants to make a massive array & 
concatenate it, they can do so outside of the FieldCache API. 
{quote}

Is there much difference in one massive array or an array of arrays? Its just 
as much space and just as dangerous, right? Some apps will need random access 
to the field cache for any given document right? Don't we always have to 
support that in some way, and won't it always be severely limited by RAM (until 
IO is as fast)?

I like the idea of an iterator API, but it seems we will still have to provide 
random access with all its problems, right?

{quote}
We should also set this API up as much as possible for LUCENE-1231. Ie, the 
current "un-invert the field" approach that FieldCache takes is merely one 
source of values per doc. Column stride fields in the future will be a 
different (faster) source of values, that should be able to "just plug in" 
under the hood somehow to this same exposure API.
{quote}

Definitely.

{quote}
On Uwe's suggestion for some flexibility on how the un-inversion takes place, I 
think allowing differing degrees of extension makes sense. EG we already allow 
you to provide a custom parser. We need to allow control on whether a given 
value replaces the already-seen value (LUCENE-1372), or whether to stop the 
looping early (Uwe's needs for improving Trie). We should also allow outright 
entire custom class that creates the value array.
{quote}

Allow a custom field cache loader for each type?


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697492#action_12697492
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Is there much difference in one massive array or an array of arrays? Its 
just as much space and just as dangerous, right?

One massive array is far more dangerous during reopen() (ie that's why we did 
LUCENE-1483) since it's non-incremental.

Array-per-segment I think is OK.

But yes both of them consume RAM, but I don't consider that "dangerous".

bq. I like the idea of an iterator API, but it seems we will still have to 
provide random access with all its problems, right?

Right.

bq. Some apps will need random access to the field cache for any given document 
right?

Yes but I think such apps should move to the per-segment model (eg a Filter's 
getDocIdSet is called per segment reader).

If an app really wants to make a single massive array, they can certainly do 
so, outside of Lucene.

bq. Allow a custom field cache loader for each type?

Yes, possibly w/ different degrees of extension (much like Collector).  EG 
maybe you just want to override how you parse an int, or maybe you want to take 
control over the entire uninversion.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697495#action_12697495
 ] 

Mark Miller commented on LUCENE-831:


{quote}
One massive array is far more dangerous during reopen() (ie that's why we did 
LUCENE-1483) since it's non-incremental.

Array-per-segment I think is OK.

But yes both of them consume RAM, but I don't consider that "dangerous".
{quote}

Okay, I got you now. We will force everyone to migrate to use fieldcache at the 
segment level rather than MR (or create their own array from the subarrays).

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697519#action_12697519
 ] 

Mark Miller commented on LUCENE-831:


bq. But yes both of them consume RAM, but I don't consider that "dangerous".

I guess you meant dangerous as in dangerous to reopen then? I actually thought 
you meant as in dangerous because it could require too may resources. Dangerous 
is a tough to pin down word ;)

So what are the advantages of the iterator API again then? It not likely you 
are going to stream the values, and random access will likely still have a use 
as mentioned.

Just trying to get a clearer picture in my head - I doubt I'll have time, but 
I'd love to put a little sweat into this issue.

It probably makes sense to start from one of Hoss's original patches or even 
from scratch.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697540#action_12697540
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
I guess you meant dangerous as in dangerous to reopen then? I actually thought 
you meant as in dangerous because it could require too may resources. Dangerous 
is a tough to pin down word 
{quote}
Dangerous is a dangerous word ;)

I meant: I don't like exposing non-performant APIs; they are sneaky traps.  (EG 
TermEnum.skipTo is another such API).

bq. So what are the advantages of the iterator API again then?

The big advantage is the possibility of backing it with eg an IndexInput, so 
that the values need to all be in RAM for one segment.  Though, as Lucy is 
doing, we could leave things on disk and still have random access via mmap, 
which perhaps should be an option for Lucene as well.  However, iterator only 
messes up out-of-order scoring (BooleanScorer for OR queries), so I'm 
tentatively leaning against iterator only at this point.

bq. but I'd love to put a little sweat into this issue.

That would be AWESOME (if you can somehow make time)!

We should hash out the design a bit before figuring out how/where to start.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697812#action_12697812
 ] 

Mark Miller commented on LUCENE-831:


Some random thoughts:

If we are going to allow random access, I like the idea of sticking with the 
arrays. They are faster than hiding behind a method, and it allows easier 
movement from the old API. It would be nice if we can still deprecate all of 
that by backing it with the new impl (as done with the old patch).

The current API (from this patch) still looks fairly good to me - a given 
cachekey gets your data, and knows how to construct it. You get data something 
like: return (byte[]) reader.getCachedData(new ByteCacheKey(field, parser)). It 
could be improved, but it seems a good start to me.

The immediate problem I see is how to handle multireader vs reader. Not being 
able to treat them the same is a real pain. In the segment case, you just want 
an array back, in the multi-segment perhaps an array of arrays? Or unsupported? 
I havn't thought of anything nice.

We have always been able to customize a lot of behavior with our custom sort 
types - I guess the real issue is making the built in sort types customizable. 
So I guess we need someway to say, use this "cachekey" for this built in type?

When we load the new caches in FieldComparator, can we count on those being 
segmentreaders? We can Lucene wise, but not API wise right? Does that matter? I 
suppose its really tied in with the multireader vs reader API.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697820#action_12697820
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
If we are going to allow random access, I like the idea of sticking
with the arrays. They are faster than hiding behind a method, and it
allows easier movement from the old API.
{quote}

I agree.

{quote}
It would be nice if we can
still deprecate all of that by backing it with the new impl (as done
with the old patch).
{quote}

That seems fine?

bq. The current API (from this patch) still looks fairly good to me - a given 
cachekey gets your data, and knows how to construct it. You get data something 
like: return (byte[]) reader.getCachedData(new ByteCacheKey(field, parser)). It 
could be improved, but it seems a good start to me.

Agreed.

bq. The immediate problem I see is how to handle multireader vs reader. Not 
being able to treat them the same is a real pain. In the segment case, you just 
want an array back, in the multi-segment perhaps an array of arrays? Or 
unsupported? I havn't thought of anything nice.

I would lean towards throwing UOE, and suggesting that you call
getSequentialReaders instead.

Eg with the new getUniqueTermCount() we do that.

bq. We have always been able to customize a lot of behavior with our custom 
sort types - I guess the real issue is making the built in sort types 
customizable. So I guess we need someway to say, use this "cachekey" for this 
built in type?

I don't quite follow that last sentence.

We'll have alot of customizability here, ie, if you want to change how
String is parsed to int, if you want to fully override how uninversion
works, etc.  At first the core will only support uninversion as a
source of values, but once CSF is online that should be an alternate
pluggable source, presumably plugging in the same way that
customization would allow you to override uninversion.

bq. When we load the new caches in FieldComparator, can we count on those being 
segmentreaders? We can Lucene wise, but not API wise right? Does that matter? I 
suppose its really tied in with the multireader vs reader API.

Once getSequentialSubReaders() is called (and, recursively if needed),
then those "atomic" readers should be able to provide values.  I guess
that's the contract we require of a given IndexReader impl?


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697822#action_12697822
 ] 

Mark Miller commented on LUCENE-831:


{quote}
bq. We have always been able to customize a lot of behavior with our custom 
sort types - I guess the real issue is making the built in sort types 
customizable. So I guess we need someway to say, use this "cachekey" for this 
built in type?

I don't quite follow that last sentence.

We'll have alot of customizability here, ie, if you want to change how
String is parsed to int, if you want to fully override how uninversion
works, etc.  At first the core will only support uninversion as a
source of values, but once CSF is online that should be an alternate
pluggable source, presumably plugging in the same way that
customization would allow you to override uninversion.
{quote}

Right - since a custom cachekey builds the array from a reader, you can pretty 
much do anything. What I meant was that you could do anything before with a 
custom sort type as well - the problem was that you could not say use this 
custom sort type when sorting on a built in type (eg INT, BYTE, STRING). So 
thats all we need, right? A way to say, use this builder (cachekey) for LONG, 
use this one for INT, etc. When we get CSF, you would set it to use cachekeys 
that built arrays from that data.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697830#action_12697830
 ] 

Michael McCandless commented on LUCENE-831:
---

bq.  A way to say, use this builder (cachekey) for LONG, use this one for INT, 
etc. When we get CSF, you would set it to use cachekeys that built arrays from 
that data.

That sounds right, though it'd presumably be field dependent rather than 
relying on only the native type?  Ie I may have 3 fields that should load 
long[]'s, but each has its own custom decoding to be done.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-10 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697832#action_12697832
 ] 

Mark Miller commented on LUCENE-831:


Yes, good point. Okay, I think I have a much clearer picture of what needs to 
be done - this may be less work than I thought - a lot of what has been done is 
probably still helpful.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698089#action_12698089
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Iterate iterate iterate I suppose.

Here here!  Ready, set, GO!

bq. We may need the multireader to do the full array for back compat.

Can't we just create the "make massive array & copy sub arrays in"
inside the old FieldCache?  (And deprecate the old FieldCache
entirely).

bq. I dont think any of the custom type stuff is setup to work yet.

How about we create a ValueSource abstract base class, that defines
abstract byte[] getBytes(IndexReader r, String field),
int[] getInts(IndexReader r, String field), etc.  (Just like
ExtendedFieldCache).

This is subclassed to things like UninversionValueSource (what
FieldCache does today), CSFValueSource (in the future) both of which
take an IndexReader when created.

UninversionValueSource should provide basic ways to customize the
uninversion.  Hopefully, we can share mode code than the current
FieldCacheImpl does (eg, a single "enum terms & terms docs" loop that
switches out to a "handler" to deal with each term & doc, w/
subclasses that handle to byte, int, etc.).

And then I can also make MyFunkyValueSource (for extensibility) that
does whatever to produce the values.

Then we make CachingValueSource, that wraps any other ValueSource.

And finally expose a way in IndexReader to set its ValueSource when
you open it?  It would default to
CachedValueSource(UninversionValueSource()).  I think we should
require that you set this on opening the reader, and you can't later
change it.

This would mean a single CachingValueSource can be used for more than
one reader, which is good because IndexReader.open would send it down
to all SegmentReaders it opens.

This would then replace *CacheKey.

This approach is not that different from what we have today, but I
think there are important differences:

  * Decouple value generation (ValueSource) from caching

  * Tell IndexReader what its ValueSource is, so eg when you do
sorting the sort pulls from your ValueSource and not a global
default one.

  * Hopefully don't duplicate so much code (eg uninversion)

Other thoughts:

  * Presumably, at this point, the arrays returned by field cache
should be considered readonly by the app, right?  So cloning
a reader should simply make a shallow clone of the cache.  (Longer
term, with CSF as the source, we think updating fields should be
possible, so we'd need a copy-on-write solution, like we now do w/
deleted docs).

  * Looks like some accidental regressions snuck in, eg in
DirIndexReader:
{code}
-final String[] files = dir.listAll();
+final String[] files = dir.list();
{code}
and in IndexReader:
{code}
protected IndexReader(Directory directory) {
 this();
-this.directory = directory;
}
{code}

  * Do we even need ComparatorFactory*?  Seems like this patch
shouldn't be be in the business of creating comparators.

  * You should hit UOE if you try to getXXX() on a MultiReader

  * Shouldn't FieldCache be deprecated entirely?  I would think, going
forward, I interact only w/ the IndexReader's default ValueSource?


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there i

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698096#action_12698096
 ] 

Mark Miller commented on LUCENE-831:


Thanks Mike! Everything makes sense on first read, so I'll work in that 
direction.

In regards to FieldCache - yes it def will be deprecated (pretty rough patch to 
start so I might have missed some of that - I know plenty is undone).

As far as back compat with it, I was trying to make it so that if you happened 
to have code that still used it, and you used the new cache, you woudn't double 
your mem needs (the original point of backing FieldCache with the new API). 
Thats pretty restrictive though - indeed, things look much nicer if we don't 
attempt that (and in the case of a MultiReader cache array, we wouldn't be able 
to avoid it anyway, so I guess it does make sense we don't worry about it so 
much).

bq. Do we even need ComparatorFactory*?

Probably not then - I didn't really touch any of the custom type stuff yet. I 
mainly just got all of the sort tests except remote and custom to pass (though 
tests elsewhere are still failing).

I'll make another push with your suggestions.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698101#action_12698101
 ] 

Uwe Schindler commented on LUCENE-831:
--

{quote}
How about we create a ValueSource abstract base class, that defines
abstract byte[] getBytes(IndexReader r, String field),
int[] getInts(IndexReader r, String field), etc. (Just like
ExtendedFieldCache).

This is subclassed to things like UninversionValueSource (what
FieldCache does today), CSFValueSource (in the future) both of which
take an IndexReader when created.

UninversionValueSource should provide basic ways to customize the
uninversion. Hopefully, we can share mode code than the current
FieldCacheImpl does (eg, a single "enum terms & terms docs" loop that
switches out to a "handler" to deal with each term & doc, w/
subclasses that handle to byte, int, etc.).

And then I can also make MyFunkyValueSource (for extensibility) that
does whatever to produce the values.

Then we make CachingValueSource, that wraps any other ValueSource.

And finally expose a way in IndexReader to set its ValueSource when
you open it? It would default to
CachedValueSource(UninversionValueSource()). I think we should
require that you set this on opening the reader, and you can't later
change it.
{quote}

I like this idea, but i am a little bit concerned about only one ValueSource 
for the Reader. This makes plugging in different sources for different field 
types hard.

E.g.: One have a CSF and a TrieField and several normal int/float fields. For 
each of these fields he needs another ValueSource. The CSF field can be loaded 
from Payloads, the TrieField by decoding the prefix encoded values and the 
others like it is now.

So the IndexReaders ValueSource should be a Map of FieldValueSources, so the 
user could register FieldValueSources for different field types.

The idea of setting the ValueSource for the IndexReader is nice, we then could 
simply remove the extra SortField constructors, I added in LUCENE-1478, as it 
would be possible to specify the type for Sorting when creating the 
IndexReader. Search code then would simply say, sort by fields a, b, c without 
knowing what type of field it is. The sort code would get the type and the 
arrays from the underlying IndexReaders.

The same with the current function query value sources (just a question: are 
the functions query value sources then obsolete and can be merged with the 
"new" ValueSource)?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698104#action_12698104
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. E.g.: One have a CSF and a TrieField and several normal int/float fields. 
For each of these fields he needs another ValueSource.

Couldn't we make a "PerFieldValueSource" impl to handle this?  (And leave the 
switching logic out of IndexReader).

I think another useful ValueSource would be one that first consults CSF and 
uses that, if present, else falls back to the uninversion source.

bq.  The sort code would get the type and the arrays from the underlying 
IndexReaders.

I'm not sure this'll work -- IndexReader still won't know what type to ask for, 
for a given field?

bq. are the functions query value sources then obsolete and can be merged with 
the "new" ValueSource

Well, the API is a little different (this API returns int[], but the function 
query's ValueSource has int intVal(int docID)), so I think we'd wrap the new 
API to match function query's  (ie, cut over function query's use of FieldCache 
to this new FieldCache).

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698135#action_12698135
 ] 

Mark Miller commented on LUCENE-831:


Any ideas on where parser fits in with valuesource? Its easy enough to kind of 
keep it how it is, but then what if CSF can be stored as a byte rep of an int 
or something? parse(String) won't make any sense. If we move Parser up to an 
Impl of valuesource, we have to special case things -

Any thoughts? Just stick with allowing the passing of a 'String to type' Parser 
and worry about possible byte handling later? A different parser object of some 
kind?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-11 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698151#action_12698151
 ] 

Mark Miller commented on LUCENE-831:


or parsing is just done by the FieldValue implementation, with overrides or 
something? To change parsers you override UnivertedValuedSource returning your 
parsers in the callbacks (or something similiar?)

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698196#action_12698196
 ] 

Michael McCandless commented on LUCENE-831:
---

> I like this idea, but i am a little bit concerned about only one ValueSource 
> for the Reader. 

Thinking more about this...

Over in KS/Lucy, the approach Marvin is taking is something called a
FieldSpec, to define the "extended type" for a field.  The idea is to
strongly decouple a field's type from its value, allowing that type to
be shared across different fields & instances of the same field.

So in KS/Lucy, presumably IndexReader would simply consult the
FieldSpec for a given field, to determine which ValueSource impl is
responsible for producing values for this field.

Right now details for a field are scattered about (PerFieldValueSource
and PerFieldAnalyzerWrapper and Field.Index/Store/TermVector.*,
FieldInfo, etc.). This then requires alot of app-level code to
properly use Trie* fields -- you have to use Trie* to analyze the
field, use Trie* to construct the query, use PerFieldValueSource to
populate the FieldCache, etc.

Maybe, as part of the cleanup of our three *Field classes, and index
vs search time documents, we should make steps towards having a
consolidated class that represents the "extended type" of a field.
Then in theory one could make a Field, attach a NumericFieldType() to
it (after renaming Trie* -> Numeric*), and then everything would
default properly.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698195#action_12698195
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Any ideas on where parser fits in with valuesource?

I think the UninversionValueSource would accept a custom parser (String -> 
native type), like what's done today.

Maybe it should also allow stopping the loop early (which Trie* uses), or 
perhaps outright overriding of the inversion loop itself (which if we make that 
class subclass-able should be simple).

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698214#action_12698214
 ] 

Mark Miller commented on LUCENE-831:


{quote}
I think the UninversionValueSource would accept a custom parser (String -> 
native type), like what's done today.

Maybe it should also allow stopping the loop early (which Trie* uses), or 
perhaps outright overriding of the inversion loop itself (which if we make that 
class subclass-able should be simple).
{quote}

Its the accepting that seems tricky though. If the getInts() calls take the 
parser, you have to use instanceof code to work with ValueSource.  Thats why I 
was thinking maybe callbacks - if a new type is added you just add a new one 
returning a default parser. Then you can just extend and replace the parsers 
you want to. I wasn't a big fan of that idea, but I am not sure of a nice, 
clean, extensible way to specify a bunch of parsers to UninversionValueSource 
that allows the API to cleanly be used from ValueSource. It already kind of 
seemed annoying that you would have to set a new ValueSource on the reader just 
to specify different parsers. I guess at least that has to be accepted though.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698215#action_12698215
 ] 

Earwin Burrfoot commented on LUCENE-831:


I'm using a similar approach.

There's a FieldType, that governs conversions from Java type into Lucene 
strings and declares 'abilities' of that type. Like - conversion is 
order-preserving (all numerics + some others), converted values can be 
meaningfully prefix-searched (like TreeId, that is essentially an int[], used 
to represent things like nested category trees). Some types can also declare 
themselves as derivatives of others, like DateType being derived from LongType.

Then there's a FieldInfo, that defines field name, FieldType used for it, and 
actions we're going to take on the field. E.g. if we want to sort on it, build 
clusters with certain characteristics, load values for this field for each 
found document, use fast rangefilters, store/filter on field being 
null/notnull, apply transforms on the field before storing/searching, copy 
value of the field to another field (with probable transformation) when 
indexing, etc. From FieldType and desired actions, FieldInfo is able to deduce 
tokenize/index/store/cache behaviour, and can say that additional lucene fields 
are required (e.g. for handling null/notnull searches, or trie ranges, or a 
special sort-form).

Then there's an interface that contains FieldInfo constants and a special 
constant FieldEnum FIELDS = fieldsOf(ResumeFields.class); that is essentially a 
navigable list of all FieldInfos defined in this interface and interfaces it 
extends (allows me to have CommonFields + ResumeFields extends CommonFields, 
VacancyFields extends CommonFields).

FieldType, and consequently FieldInfo is type-parameterized with the java type 
associated with the field, so you get the benefit of type-safety when 
storing/loading/searching the field. All 
Filters/Queries/Sorters/Loaders/Document accept FieldInfo instead of String for 
field name, so for example Filters.Range(field, fromValue, fromInclusive, 
toValue, toInclusive) knows whether to use a simple range filter or a trie one, 
ensures from/toValues are of a proper type and converts them properly. 
Filters.IsSet(field) can consult an additional field created during indexation, 
or access a FieldCache. DocLoader will either get a value for the field from 
index or from the cache. etc, etc, etc.

While I like resulting schema-style very much, I don't want to see the likes of 
it within Lucene core. Better to have some contrib/extension/whatever that 
builds on core-defined primitives. That way if one needs to build his own 
somewhat divergent schema, they can easily do it, instead of trying to fit 
theirs over Lucene's. For the very same reason I'd like to see fieldcaches 
moved away from the core, and depending on the same in-core IndexReader segment 
creation/deletion/whatever hooks that users will use to build their extensions. 

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsub

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698226#action_12698226
 ] 

Uwe Schindler commented on LUCENE-831:
--

Looks good, one addition: newTerm() could return false to stop iterating (for 
trie).

But I do not know how performat this is with the class-global variable 
currentVal...

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698225#action_12698225
 ] 

Michael McCandless commented on LUCENE-831:
---


How about something like this (NOTE: not compiled/tested):

{code}
abstract class Uninverter {
  abstract void newTerm(String text);
  abstract void handleDoc(int docID);
  void go(IndexReader r) {
... TermEnum/TermDocs uninvert code...
  }
}
{code}

and then:

{code}
class IntUninverter extends Uninverter {
  final int[] values;
  IntUninverter(IndexReader r) {
values = new int[r.maxDoc()];
  }

  int currentVal;
  void newTerm(String text) {
currentVal = Intger.parseInt(text);
  }

  void handleDoc(int docID) {
values[docID] = currentVal;
  }
}
{code}


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Marvin Humphrey (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698219#action_12698219
 ] 

Marvin Humphrey commented on LUCENE-831:


"FieldType" is probably a better name than "FieldSpec", as it implies
subclasses with "Type" as a suffix: FullTextType, StringType, BlobType,
Float32Type, etc.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698224#action_12698224
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. "FieldType" is probably a better name than "FieldSpec"

+1

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698230#action_12698230
 ] 

Mark Miller commented on LUCENE-831:


bq. I like this idea, but i am a little bit concerned about only one 
ValueSource for the Reader. This makes plugging in different sources for 
different field types hard.

I guess you would just have to set a TrieEnabledValueSource when creating your 
IndexReaders? I suppose it could extend UninversionValueSource and for given 
fields do Trie unencoding, uninverting, and all other fields, yeld to the super 
impl?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698232#action_12698232
 ] 

Mark Miller commented on LUCENE-831:


Note: I think we should add the option to sort nulls first or last with this.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698233#action_12698233
 ] 

Earwin Burrfoot commented on LUCENE-831:


bq. I guess you would just have to set a TrieEnabledValueSource when creating 
your IndexReaders? I suppose it could extend UninversionValueSource and for 
given fields do Trie unencoding, uninverting, and all other fields, yeld to the 
super impl?
And then if you get some other XXX encoding, you'll end up with XXXVS extends 
UVS, TrieVS extends UVS, XXXAndTrieVS extends XXXVS or TrieVS + duplicate code 
from the other one. Ugly.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698234#action_12698234
 ] 

Michael McCandless commented on LUCENE-831:
---

That is why I'd love to somehow move to a FieldType class that holds such 
per-field details.

As things stand now (or, soon) you have to do many separate things to use 
TrieXXX:

  * Create the right tokenStream for it, and stick that into your Field

  * Make the right query type at search time

  * Make the right sort-field parser

  * Make the right ValueSource

It's crazy.  I should be able to make a TrieFieldType (SingleNumberFieldType, 
or something, after the rename), make that the type of my field when I add it 
to my document, and then have all these places that do range search, sorting, 
value retrieval consult the FieldInfo and see that this is a trie field, and 
act accordingly.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698235#action_12698235
 ] 

Michael McCandless commented on LUCENE-831:
---

bq. Note: I think we should add the option to sort nulls first or last with 
this.

You mean for getStringIndex()?  I agree!

Actually, it'd be nice to disallow nulls entirely, somehow, since this forces 
us to sprinkle null checks all over the place in StringOrdVarlComparator.  
Maybe we would allow you to pass in a "null equivalent", eg you could use "", 
"UNDEFINED", whatever, as long as it's a valid string.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698238#action_12698238
 ] 

Michael McCandless commented on LUCENE-831:
---

Another thing that'd be great to fix about FieldCache is its
intermittent checking of the "some docs had more than one token in the
field" error.  The current check only catches it in limited cases,
which is deadly because you can test like crazy and think you're OK
only in production months later to index slightly different content
and hit the exception. RuntimeException

But I can't think of a cheap way to do it reliably.

At least, we should upgrade the exception from RuntimeException to a
checked exception.  Or, we could turn the check off entirely (which I
think is better than intermittently catching it).

We should also somehow allow turning off the check on a case by case
basis, since there are really times when it's OK.  (Though maybe you
just make your own ValueSource, or maybe subclass Uninverter,
or... something?).


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698239#action_12698239
 ] 

Mark Miller commented on LUCENE-831:


bq. Ugly.

Well no worries yet :) Still in early design mode, so if it can be made better, 
I'm sure it will. Of course I'd love to get to 'everything works right for the 
right field automagically' as well - not sure that will fit into the scope of 
this issue though (though nothing saying this issue can't be further delayed). 
We will do the best we can regardless though.

I'm kind of worried that any change is going to hurt Apps like Solr - if you 
end up using the new built in API, but also have code that must stick for a 
while with the old API (for multireader fieldcache or something), you'll likely 
increase RAM usage more than before - how much of a concern that ends up being, 
I'm not sure. I suppose eventually it has to become up to the upgrader to 
consider and deal with it if we want to move to segment level caching.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698241#action_12698241
 ] 

Mark Miller commented on LUCENE-831:


bq. Or, we could turn the check off entirely (which I think is better than 
intermittently catching it).

+1 - agreed that the check is nasty - better to never trip it if your only 
going to trip it depending...

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Marvin Humphrey (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698242#action_12698242
 ] 

Marvin Humphrey commented on LUCENE-831:


> Another thing that'd be great to fix about FieldCache is its
> intermittent checking of the "some docs had more than one token in the
> field" error.

Add a FieldType that only allows one value per document. At index-time, 
verify when the doc is added that indeed, only one value was supplied.

In Lucy, I expect StringType to fill this role. FullTextType is for multi-token
fields.

Optionally, add a "NOT NULL" check to verify that each doc supplies a
value, or allow the FieldType object to specify a default value that should
be inserted.


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698243#action_12698243
 ] 

Earwin Burrfoot commented on LUCENE-831:


bq. At least, we should upgrade the exception from RuntimeException to a 
checked exception.
Exceptions are for expected conditions that can be adequately handled by the 
caller. RuntimeExceptions are for possible, but unexpected conditions that can 
theoretically be handled by the caller, but most of the times caller will 
terminate anyway.
Having several values in place where only one should be at all times is 
obviously an unexpected indexer's fault. So by using checked exception here 
you'll only provoke some ugly rethrowing/wrapping code, or propagation of said 
checked exception up the method hierarchy, without gaining any benefit at all.

bq. Or, we could turn the check off entirely.
Yes!

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698247#action_12698247
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
> Or, we could turn the check off entirely (which I think is better than 
> intermittently catching it).

+1 - agreed that the check is nasty - better to never trip it if your only 
going to trip it depending...
{quote}
OK -- let's turn this check off entirely.

I like Marvin's approach but we don't quite have a FieldType just yet...

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698248#action_12698248
 ] 

Earwin Burrfoot commented on LUCENE-831:


bq. I'm kind of worried that any change is going to hurt Apps like Solr - if 
you end up using the new built in API, but also have code that must stick for a 
while with the old API (for multireader fieldcache or something), you'll likely 
increase RAM usage more than before - how much of a concern that ends up being, 
I'm not sure.
My personal stance is that until you have one perfectly thought out API, 
nothing should restrain you from changing it. It's better to feel pain once or 
twice, when you adapt to API changes, then to feel it constantly, each time 
you're using that half-assed thing you're keeping around for back-compat. Look 
at google-collections. They did some really breaking changes since they 
released, but most of them eased my life after I made my project compile and 
run with the new version of their library.

bq. In Lucy, I expect StringType to fill this role. FullTextType is for 
multi-token fields.
In our case multiplicity is defined on FieldInfo level. Because we can have one 
Int field that holds some value, and another Int field that holds several ids.
Same goes for the String field - you might want tags on a document that are 
represented as untokenized strings, but each document can have 0..n of them.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698254#action_12698254
 ] 

Mark Miller commented on LUCENE-831:


I began to make the switch of allowing an Inverter to be attached to a 
SortField like parser with the idea of doing backcompat by wrapping the parser 
with an Inverter. I caught myself though, because the reason I wasn't doing 
that before is that Inverter may not make sense for a given ValueSource. By 
forcing you to set the inverters per type on the UninversionValueSource instead 
(current posted patch), this issue is kind of avoided - the possible problem is 
that it seems somewhat less dynamic in that you set it once on the ValueSource 
and leave it instead of being able to pass any impl any time on a SortField. 
Perhaps not so bad. But then how do I set the Uninverter for back compat with 
an old Parser? It doesn't seem wise to allow arbitrary Uninverter updates to 
the UninversionValueSource does it? Thread safety issues, and ... but then how 
to handle back compat with the parser? ... 

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698255#action_12698255
 ] 

Mark Miller commented on LUCENE-831:


I guess we do need to have Uninverters settable per run somehow ... Then if a 
Parser comes in on SortField, we can downcast to UninversionValueSource and 
pass the Uninverter wrapping the Parser. I don't think we should pass the 
Uninverter on SortField though, going forward. Uninverter may not apply to 
ValueSource, and internally, things will work with ValueSource.

So a user cannot pass an Uninverter for internal sorting per sort, it has to 
init the UninversionValueSource with the right Uninverters, but for backcompat, 
FieldComparator would be able to pass an Uninverter that takes precedence?

That sounds somewhat alright to me, I'll roll that way a bit for now.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698257#action_12698257
 ] 

Michael McCandless commented on LUCENE-831:
---

How about SortField taking ValueSource?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698258#action_12698258
 ] 

Michael McCandless commented on LUCENE-831:
---

This issue is fast moving!  Here're some thoughts on the last patch:

  * Need to pass in ValueSource impl to IndexReader.open, defaulting
to Cached(Uninverted)

  * Maybe ValueSource should provide default "throws UOE" exceptions
methods (instead of abstract) since it's rather cumbersome for a
subclass that only intends to provide eg getInts().

  * Should CachingValueSource really include IndexReader in its key?
I think it shouldn't?  The cache is already "scoped" to a reader
because each SegmentReader will have its own instance.  Also, one
might cache ValueSources that don't "have" a reader (eg pull from
a DB or custom file format or something).  Cloning an SR for now
should just copy over the same cache.

  * I think we should back deprecated FieldCache with ValueSource for
the reader; one simple way to not ignore the Parser passed in is
to anonymously subclass Uninverter and invoke the passed in
ByteParser from its "newTerm" method?

  * Why do we need to define StringInverter, IntInverter, etc in
UninversionValueSource?  Couldn't someone instead subclass
UninversionValueSource, and override whichever getXXX's they want
using their own anonymous uninverter subclass?

  * Should we deprecate function.FieldCacheSource, and make a new
function.ReaderValueSource (ie pulls its values from the
IndexReader's ValueSource)?


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698264#action_12698264
 ] 

Mark Miller commented on LUCENE-831:


bq. How about SortField taking ValueSource? 

Right - theres the right thought I think. I'll play with that.

bq. Need to pass in ValueSource impl to IndexReader.open, defaulting to 
Cached(Uninverted)

Yes - only havn't because that code is kind of unfirm - it already seems like 
it will prob be moved out to SortField :) So I was just short-cut initing it.

bq. Should CachingValueSource really include IndexReader in its key?

Probably not then :) I'll be going over that tighter - I was in speed mode and 
havn't considered the CachingValueSource much yet - I kind of just banged out a 
quick impl (gotto love eclipses generate hashcode/equals), put it in and tested 
it.  To handle all of this 'do it for Long, repeat for Int, repeat for Byte, 
etc' I go somewhat into robot mode. Also, this early, I know you'll have me 
changing directions fast enough that I'm still in pretty rough hammer out mode.

bq. Why do we need to define StringInverter, IntInverter, etc in 
UninversionValueSource? 

Yes true. I was using anon classes in the prev patch, but upon switch to 
Uninverter, I just did mostly what came to mind quickest looking at your 
example code and what I had.
Indeed, overriding getXXX is simple and effective. As I think about it - now I  
am thinking I did it for the getArray call to return the right type (easy with 
anon class, but custom passed Uninverter ?). Could return object and cast 
though ...

Do we need Uninverter if overriding getXXX is easy enough and we pass the 
ValueSource on SortField?

bq. Should we deprecate function.FieldCacheSource,

Yeah for sure. I'll take a closer look at the function package soon. Thus far, 
I've just got it compiling and the main sort tests passing. As soon as I feel 
the API is firming (which, sweet, it already is a bit I think), I'll start 
polishing and filling in the missing pieces, more thorough nocommits, comments.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698272#action_12698272
 ] 

Mark Miller commented on LUCENE-831:


bq. Should CachingValueSource really include IndexReader in its key?

I see now - I wasnt always holding the ValueSource in the Reader - I was using 
getXXX(reader, field). Now I am back to that - and since the CachingValueSource 
holds the map, I am back to needing that. Being able to change the ValueSource 
on the fly now with SortField has implications for CachingValueSource. There 
has to be some kind of default ValueSource, which would likely do caching. Once 
you started using ValueSources from SortField, they won't share the cache. Get 
enough of them going and the RAM reqs jump.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698286#action_12698286
 ] 

Mark Miller commented on LUCENE-831:


I really like CachingValueSource, but somehow the cache has to move to the 
segmentreader or something... (not that there is not a segmentreader -> 
valuesource mapping)

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698300#action_12698300
 ] 

Uwe Schindler commented on LUCENE-831:
--

By the way: If this new API goes into 2.9, the SortField ctors with parser and 
the whole parser support in SortField/Search-code can be removed, if Parser 
itsself is deprecated (as support for LUCENE-1478 was not released until now). 
New code could always use ValueSource.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, 
> LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completley independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundent caching as client code
> migrades to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

1 2 >

1 - 100 of 179 matches

Mail list logo