[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789398#action_12789398
 ] 

Michael McCandless commented on LUCENE-2133:


bq. And we are back to 831

Yes :)  But maybe the fresh perspective will get us through it!

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones later, which not at

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789384#action_12789384
 ] 

Mark Miller commented on LUCENE-2133:
-

bq. Something along these lines maybe?

And we are back to 831 :)

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones later, which not at all. Whenever new classes depend on

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-11 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789377#action_12789377
 ] 

Earwin Burrfoot commented on LUCENE-2133:
-

bq. I would like to hear the opionions of the other people involved
bq.  Earwin I think is working on a patch for componentizing SR . 
FieldCache, "just" one of IndexReaders components ..

Mike answered for me :)
My wish is to keep IR's as basic as possible, while plugging all extras (that 
includes sorting/whatever caches) on as-needed basis.
Right now I have a basic impl that works for me for half a year, but in 
practice it ended up a bit ugly and it doesn't handle IW-produced readers 
(never needed/liked this feature). So I hope to fix these two deficiencies on a 
weekend.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFi

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789249#action_12789249
 ] 

Michael McCandless commented on LUCENE-2133:


bq.  I was just a little scared about the plethora of classes in that package. 

Let's separately worry about that?  I think consumers of the cache API
shouldn't have to move.

bq. Caching fields is only one functionality provided by IndexCache. I took the 
opportunity to demonstrate caching something other than fields. You now save a 
few bytes per SegmentReader instance because of that change.

OK but let's also do this separately (Earwin I think is working on a
patch for componentizing SR, which'd be great).  FieldCache, "just"
one of IndexReaders components, is hard enough to fix; let's just
focus on it :)

Also, as hackish as it is, the getFieldCacheKey() works well.  The
need for insanity checking is annoying, so let's just disallow getting
cached @ Dir/MultiReader level (and offer sugar APIs if really
needed).

And as annoying as the static FieldCache is, LUCENE-2135 plugs yet
another hold in the dike, ie, now entries are purged on close, and you
can also explicitly purge a given IndexReader.

I don't think we should wholesale move the cache APIs just for the
sake of moving them.  That's alot of API noise; we need some real
improvements to justify making users switch.

bq. Since there are indeed several people working on things that are closely 
related, esp. LUCENE-2135) a branch would make sense

bq. Can we summarize all the open points for a "worthy" cache overhaul, close 
LUCENE-831 in favor of LUCENE-2133 and continue working here?

I don't think we need a branch just yet.  (LUCENE-2135 is a very
standalone change from this issue).  And, before closing other issues,
committing this one as a start, etc, we really first need to reach
some consensus on what even to do, here.

I really want to see us first fix FieldCache's deep problems, then
concern ourselves with where the resulting API will go / what it looks
like.  Ie, we're putting the horse before the cart, here, so far...

EG, what [minimal] changes could we make to allow someone to plugin
their own values source?  If we make this:

{code}
class FieldValues {
  public FIELD_TYPE getType();
  public int[] getInts() {};
 // and all the other types...
}

class FieldValuesSource implements Closeable {
  public FieldValues getValues(FieldInfo field);
  public void close();
}
{code}

We could then make an UninverterFieldValuesSource, subclassing
PerFieldValueSource, that takes IndexReader to its ctor, lets your
register parsers by field.  And it'd allow customization beyond simply
changing the parser, eg to control behavior of multi-valued fields,
stopping early (NRQ does this), etc.

We'd have CachedFieldValueSource that'd wrap any FieldValuesSource and
cache, allowing you to subclass it if you want to do eviction, etc.
It'd by default implement the same simplistic policy we have today --
retain everything, until close at which point you discard everything.

All consumers of field cache (sorting by field,
FieldCacheRange/TermsFilter, function queries, etc.) should all allow
me to pass in my own FieldValuesSource.  IndexReader would let me
customize, but would default to the cached uninversion source.

I could then in theory [external to Lucene] make a FieldValueSource
that slurped stuff from disk, and I could create LUCENE-1231 (= CSF)
outside of Lucene.  [And, when we build CSF inside of Lucene it'd also
just be another source].

Something along these lines maybe?


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construc

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788956#action_12788956
 ] 

Christian Kohlschütter commented on LUCENE-2133:


Mark,

that's great to hear overall.
What I want to avoid are different competing implementations. Since there are 
indeed several people working on things that are closely related, esp. 
LUCENE-2135) a branch would make sense (if all agree, of course).

Can we summarize all the open points for a "worthy" cache overhaul, close 
LUCENE-831 in favor of LUCENE-2133 and continue working here?
I would like to hear the opionions of the other people involved: Earwin, Hoss, 
Mike, Uwe, Yonik to name a few.

Best,
Christian


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilt

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788951#action_12788951
 ] 

Mark Miller commented on LUCENE-2133:
-

bq. That is, it adds a lot of duplicated code / different possible 
implementations for the same thing.

Things were still ugly were not likely to stick around - 831 was very much a 
work in progress. The solution there to handle back compat issues was a working 
solution that would need to be improved upon. 831 was still in experimentation 
state - issues that need more though had hacked in working solutions. We had a 
more general cache at one point, and began working towards ValueSources based 
on discussion. The latest 831 patch is an exploration of that, not a final 
product.

 bq. They should store arbitrary data, allow cache inspection, eviction of 
entries and so on.

Thats extremely simple to add to an IndexReader - we were thinking of a 
ValueSource as something different than a basic cache.

{quote}
It is indeed a complex problem but it can easily be split into several subtasks 
that can be addressed by different people in parallel. To allow such a 
development, we have to somehow get the base code it into SVN, not necessarily 
trunk, admittedly, a branch would also do. Of course, this requires also 
additional work to keep it in sync with trunk. If we can really assume to have 
3.1 in one year, we have lots of time for developing a stable, powerful new API 
directly in trunk. Of course, this is a decision related to release management 
and not to the actual problem. I can live with both ways (trunk vs. branch), 
but, in my opinion, managing the changes just as patch files in jira is not a 
viable option.
{quote}

A branch is certainly a possibility, but with only one person working on it, I 
think its overkill. With some additional interest, a branch can make sense - 
otherwise its not worth the merging headaches. You also have to have a 
committer(s) thats willing to take on the merging.

At one point, 831 was much more like this patch. Discussion along what Mike 
brought up above started transforming it to something else. We essentially 
decided that unless that much was brought to the table, the disrupting change 
just wasn't worth it for a different cache API.

I'm def a proponent of FieldCache reform - but I think we want to fully flesh 
it out before committing to something in trunk.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at le

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788936#action_12788936
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. bq.LUCENE-831 still requires a static FieldCache, the root of all evil  
bq. It doesn't require one though? It supports a cache per segment reader just 
like this. Except its called a ValueSource.

With "requires" I mean it's still there, not marked as deprecated and still 
used. Plus a lot of "ifs" like

{{{
 if(parserUninverter != null) {
currentReaderValues = uninversionValueSource.getLongs(reader, field, 
parserUninverter);
  } else if(valueSource != null) {
currentReaderValues = valueSource.getLongs(reader, field);
  } else {
currentReaderValues = reader.getValueSource().getLongs(reader, field);
  }
}}}

That is, it adds a lot of duplicated code / different possible implementations 
for the same thing.

I am not saying LUCENE-831 was a bad idea. And apparently, apart from the 
different wording, we see a few similarities with LUCENE-2133. We just need to 
move on at some point.

What is still different in my proposal is the additional abstraction layer of 
"IndexCache". Was this already somehow planned with "ValueSourceFactory"? That 
class exists in LUCENE-831 but was never used.

As we see from LUCENE-2135 Index-specific caches are much more than 
FieldCache/ValueSource implementations. They should store arbitrary data, allow 
cache inspection, eviction of entries and so on.

bq. bq.Let's make it simple, submit what we have and build upon that.

bq. I dont think thats simple The patch can be iterated on outside of trunk as 
easy as in.

It is indeed a complex problem but it can easily be split into several subtasks 
that can be addressed by different people in parallel. To allow such a 
development, we have to somehow get the base code it into SVN, not necessarily 
trunk, admittedly, a branch would also do. Of course, this requires also 
additional work to keep it in sync with trunk. If we can really assume to have 
3.1 in one year, we have lots of time for developing a stable, powerful new API 
directly in trunk. Of course, this is a decision related to release management 
and not to the actual problem. I can live with both ways (trunk vs. branch), 
but, in my opinion, managing the changes just as patch files in jira is not a 
viable option.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are exp

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788908#action_12788908
 ] 

Mark Miller commented on LUCENE-2133:
-

bq. LUCENE-831 still requires a static FieldCache, the root of all evil  :) 

It doesn't require one though? It supports a cache per segment reader just like 
this. Except its called a ValueSource.

The CacheByReaderValueSource is just there to handle a back compat issue - its 
something that we would want to get around and use the reader valuesource for 
instead - but that patch still had a long way to go.

Overall, from what I can see, the approach was about the same.

bq. It probably makes sense to start from one of Hoss's original patches or 
even from scratch

That was said before a lot more work was done. The API was actually starting to 
shape up nicely.

bq. The more complex the patches are, the longer it will take to integrate them 
into a new version.

Of course - and this is a complex issue with a lot of upgrade pain. Like with 
831, it not really worth the pain to users without more benefits.

bq. The more such patches you have, the longer it will take to get to a new 
release.

Thats not really true. 3.1 does't need this patch - there would be no reason to 
hold it for it. Patches go in when they are ready.

bq. Let's make it simple, submit what we have and build upon that.

I dont think thats simple :) The patch can be iterated on outside of trunk as 
easy as in.



> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() meth

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788896#action_12788896
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. I don't know that back compat is really a concern if we are just leaving 
the old API intact as part of that, with its own caching mechanism? 

I don't think that this is an option. FieldCache is fundamentally flawed and 
already creates a lot of trouble that has only been somehow fixed recently 
(FieldCacheKey, Insanity checks).

FieldCache, the static "registry" of caches sort of violates fundamental OOP 
concepts -- I mean, almost *all* methods require IndexReader as the first 
parameter. Since we allow IndexReader composition and decoration this 
apparently was not the right way to go because it exactly caused the problems 
that have later been addressed by the aformentioned FieldCacheKey and insanity 
checks.

bq. Just deprecate the old API, and make a new one. This is a big pain, because 
you have to be sure you don't straddle the two apis on upgrading, but thats the 
boat we will be in anyway.

That is always an option, but as you see Uwe's initial reaction I think it is 
not so easy to convince everybody. Which is fine, because we now have a 
solution that is somehow intermediate between the two extremes (keeping as it 
is vs. complete rework) and still is completely bw-compatible.
With a "real" complete overhaul of FieldCache, I imagine a lot of additional 
work to be then done for other projects such as SOLR, Nutch etc., which I would 
rather see not to happen abruptly.

The IndexCache abstraction allows us to separate the two issues 1: "make caches 
non-static" and 2: "make the fieldcache snappier" very easily. We start with 
issue 1 here in LUCENE-2133, and then develop a new FancyFieldCache in 
LUCENE-xyz (which can be used in parallel to the then-old IndexFieldCache).

In parallel we can integrate Michael's, Earwin's and Yonik's suggestions from 
LUCENE-2135 without starting the discussion from the beginning.

I am not suggesting to take the current solution as a "temporary workaround", 
but as a base for future work. Anything else would make no sense indeed.

bq. Since I don't see that this provides anything over 831, I don't see how its 
not in the same boat.

LUCENE-831 still requires a static FieldCache, the root of all evil :) It 
addresses orthogonal problems (ValueSource, Tries etc.).
Finally, to cite yourself from LUCENE-831: "It probably makes sense to start 
from one of Hoss's original patches or even from scratch."

bq. I'm not sure we should target a specific release with this - we don't even 
know when 3.1 is going to happen. 2.9 took a year. Its anybodies guess - we 
should prob just do what makes sense and commit it when its ready.

The more complex the patches are, the longer it will take to integrate them 
into a new version.
The more such patches you have, the longer it will take to get to a new release.

Let's make it simple, submit what we have and build upon that.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> e

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788874#action_12788874
 ] 

Mark Miller commented on LUCENE-2133:
-

I don't know that back compat is really a concern if we are just leaving the 
old API intact as part of that, with its own caching mechanism?

Just deprecate the old API, and make a new one. This is a big pain, because you 
have to be sure you don't straddle the two apis on upgrading, but thats the 
boat we will be in anyway.

Which means a new impl should provide enough benefits to make that large pain 
worth enduring. 831 was not committed for the same reason - it didn't bring 
enough to table to be worth it after we got to a per segment cache in another 
way. Since I don't see that this provides anything over 831, I don't see how 
its not in the same boat.

I'm not sure we should target a specific release with this - we don't even know 
when 3.1 is going to happen. 2.9 took a year. Its anybodies guess - we should 
prob just do what makes sense and commit it when its ready.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes 

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788863#action_12788863
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. This patch is a good step forward - it associates the cache directly
with IndexReader, where it belongs; it cleanly decouples cache from
reader (vs the hack we have today with IndexReader.getFieldCacheKey),
so that eg cloned readers can share the same cache; it also preserves
back compat, which is quite a stunning accomplishment

Yes, backwards compatibility was actually the driving factor for this patch.
I actually have not addressed major changes in functionality. This would 
definitely require rework that breaks backwards compatibility.
I just wanted to get rid of the static FieldCache, which caused me a lot of 
OOME headaches.

As I wrote above, I see this patch as a good starting point for further 
development. Imagine I had submitted a real, complete rework of the FieldCache 
like in LUCENE-831 -- it would be stuck as an open issue forever just like its 
predecessor.

There is nothing wrong with the current patch (that's why I suggest comitting 
it to trunk soon-ish) -- it does not break anything (it could even go into 3.1 
I guess).
Starting from a common codebase we can then later on extend it and address all 
the points you mentioned in Michael's most recent post:

bq. But... there are many more things I don't like about FieldCache, that I'm 
not sure the patch addresses:
bq. (snip) 

None of these (very important) issues have been addressed by the patch. 
Intentionally (as described).

bq. Some other questions about the patch:
bq.
bq.* Consumers of the cache API (the sort comparator,
bq.  FieldCacheTerms/RangeFilter, and any other future users of "the
bq.  field cache") shouldn't have to move down into fields sub-package?

As you wish. I don't have problems with keep it in o.a.l.search. I was just a 
little scared about the plethora of classes in that package. Since we do not 
need to utilize package-level protection, it was actually possible to 
"encapsulate" that field-related functionality in another namespace. I just 
personally prefer to use many nested packages, I do not have problems of moving 
classes back to its parent package.

bq.* It's a little strange that the term vectors & fields reader also
bq.  got pulled into the cache?

They are not in the FieldCache. Caching fields is only one functionality 
provided by IndexCache. I took the opportunity to demonstrate caching something 
other than fields. You now save a few bytes per SegmentReader instance because 
of that change. Maybe it would make sense to move SegmentReader.CoreReaders to 
SegmentReaderIndexCache completely. However, if I had included that as well 
into this issue, it would again be too large to be discussed in time for the 
next release.

If you have strong objections against moving these SegmentReader-specific 
things to the SegmentReaderIndexCache *now*, I can remove that part from the 
patch. However, I should probably then file another issue. Unfortunately, this 
will then depend on the outcome of this LUCENE-2133.

To summarize, I suggest:

1. Complete the additions to src/test (i.e. do not remove/modify the existing 
test cases but rather add new ones, as discussed previously)
2. Commit the patch to svn, target release for Lucene 3.1.
3. File another issue and discuss functional improvements for the 
IndexFieldCache part. Use Michael's wishlist as a starting point. Agree on 
breaking backwards compatibility, target for Lucene 4.0, 3.5 or whatever fancy 
version number. Incorporate things from LUCENE-831, LUCENE-1231 and also 
LUCENE-2135.
4. File a new issue for the improvements in SegmentReader (move things that are 
shared between the original reader and its clones to a common 
SegmentReaderIndexCache, such as CoreReader and ThreadLocals), keep backwards 
compatibility and target for Lucene 3.1.

What do you think?


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be m

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788846#action_12788846
 ] 

Mark Miller commented on LUCENE-2133:
-

My impression is that this is not much different than LUCENE-831, and we are 
stuck on the same issues (831 started down the path of working with these 
goals, but its severely out of date now).

bq. It's a little strange that the term vectors & fields reader also got pulled 
into the cache?

Couldn't figure this out either ... if anything, you'd think norms might goto 
the cache, but the vector and fields reader ... ?

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=>

Re: [jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread Erick Erickson
Mike:

Which of these do you think this patch *should* address before committing?
Just the last two?
As many as Christian has energy for ?

On Thu, Dec 10, 2009 at 12:24 PM, Michael McCandless (JIRA)  wrote:

>
>[
> https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788798#action_12788798]
>
> Michael McCandless commented on LUCENE-2133:
> 
>
> This patch is a good step forward -- it associates the cache directly
> with IndexReader, where it belongs; it cleanly decouples cache from
> reader (vs the hack we have today with IndexReader.getFieldCacheKey),
> so that eg cloned readers can share the same cache; it also preserves
> back compat, which is quite a stunning accomplishment :)
>
> But... there are many more things I don't like about FieldCache, that
> I'm not sure (?) the patch addresses:
>
>  * Uninversion to derive eg an int[] is horribly slow, compared to
>say loading the pre-encoded binary ints from disk, created during
>indexing.  Ie, I think, if we are going to overhaul FieldCache
>API, we should somehow make LUCENE-1231 feasible.
>
>  * There's no pluggability to customize where the int[] comes from
>for a given field -- most you can do is provide your own IntParser
>that the uninverter uses.  EG the fact that the patch had to
>"move" FieldCacheRange/TermsFilter down, is strange -- these
>filters (and in general any future "cache consumers") should live
>in oal.search, but simply pull from a different int[] source,
>somehow.
>
>  * Error checking of single-value-per-field (for StringIndex) is
>dangerous, today -- it's intermittent, and, it's an unchecked
>exception.  We should probably just remove the exception, or,
>maybe make it checked.  Actually I'll go open a new issue for
>this.  We should simply fix this.
>
>  * Single-value-per-field limitation (though, that's a nice to have,
>future improvement)
>
>  * Even accepting the single-value-per-field limitaiton, we should
>allow multiple values per field during uninversion, w/
>customizable logic about which value is kept as the single one
>(there is an issue open for this I think).  This really should be
>some sort of added extensibility to whatever class drives
>uninversion...
>
>  * The terror of accidentally asking for the array at the top-level
>of Multi/DirReader.  I think this shouldn't even be allowed, at
>least not easily, ie Dir/MultiReader.getIndexCache should throw
>UOE.  If we really wanted to, we could provide sugar methods in
>maybe ReaderUtil to "glom N int[]'s into a new int[]".  But it
>should be named something scary :) Then we wouldn't need any
>insanity checking.
>
>  * No control over caching policy (cannot evict things)
>
>  * If we make field cache flexible enough, we could maybe fold norms
>& deleted docs into it (would be a separate future issue to
>actually do so...).
>
> Some other questions about the patch:
>
>  * Consumers of the cache API (the sort comparator,
>FieldCacheTerms/RangeFilter, and any other future users of "the
>field cache") shouldn't have to move down into fields sub-package?
>
>  * It's a little strange that the term vectors & fields reader also
>got pulled into the cache?
>
>
> > [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> > -
> >
> > Key: LUCENE-2133
> > URL: https://issues.apache.org/jira/browse/LUCENE-2133
> > Project: Lucene - Java
> >  Issue Type: Improvement
> >  Components: Search
> >Affects Versions: 2.9.1, 3.0
> >Reporter: Christian Kohlschütter
> > Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch,
> LUCENE-2133.patch, LUCENE-2133.patch
> >
> >
> > Hi all,
> > up to the current version Lucene contains a conceptual flaw, that is the
> FieldCache. The FieldCache is a singleton which is supposed to cache certain
> information for every IndexReader that is currently open
> > The FieldCache is flawed because it is incorrect to assume that:
> > 1. one IndexReader instance equals one index. In fact, there can be many
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access
> the very same data.
> > 2. the cache information remains valid for the lifetime of an
> IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may
> contain completely different information.
> > 3. all IndexReaders need the same type of cache. In fact, because of the
> limitations imposed by the singleton construct there was no implementation
> other than FieldCacheImpl.
> > Furthermore, FieldCacheImpl and FieldComparator are bloated by several
> static inner-classes that could be moved to package level.
> > There have been a few attempts 

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788798#action_12788798
 ] 

Michael McCandless commented on LUCENE-2133:


This patch is a good step forward -- it associates the cache directly
with IndexReader, where it belongs; it cleanly decouples cache from
reader (vs the hack we have today with IndexReader.getFieldCacheKey),
so that eg cloned readers can share the same cache; it also preserves
back compat, which is quite a stunning accomplishment :)

But... there are many more things I don't like about FieldCache, that
I'm not sure (?) the patch addresses:

  * Uninversion to derive eg an int[] is horribly slow, compared to
say loading the pre-encoded binary ints from disk, created during
indexing.  Ie, I think, if we are going to overhaul FieldCache
API, we should somehow make LUCENE-1231 feasible.

  * There's no pluggability to customize where the int[] comes from
for a given field -- most you can do is provide your own IntParser
that the uninverter uses.  EG the fact that the patch had to
"move" FieldCacheRange/TermsFilter down, is strange -- these
filters (and in general any future "cache consumers") should live
in oal.search, but simply pull from a different int[] source,
somehow.

  * Error checking of single-value-per-field (for StringIndex) is
dangerous, today -- it's intermittent, and, it's an unchecked
exception.  We should probably just remove the exception, or,
maybe make it checked.  Actually I'll go open a new issue for
this.  We should simply fix this.

  * Single-value-per-field limitation (though, that's a nice to have,
future improvement)

  * Even accepting the single-value-per-field limitaiton, we should
allow multiple values per field during uninversion, w/
customizable logic about which value is kept as the single one
(there is an issue open for this I think).  This really should be
some sort of added extensibility to whatever class drives
uninversion...

  * The terror of accidentally asking for the array at the top-level
of Multi/DirReader.  I think this shouldn't even be allowed, at
least not easily, ie Dir/MultiReader.getIndexCache should throw
UOE.  If we really wanted to, we could provide sugar methods in
maybe ReaderUtil to "glom N int[]'s into a new int[]".  But it
should be named something scary :) Then we wouldn't need any
insanity checking.

  * No control over caching policy (cannot evict things)

  * If we make field cache flexible enough, we could maybe fold norms
& deleted docs into it (would be a separate future issue to
actually do so...).

Some other questions about the patch:

  * Consumers of the cache API (the sort comparator,
FieldCacheTerms/RangeFilter, and any other future users of "the
field cache") shouldn't have to move down into fields sub-package?

  * It's a little strange that the term vectors & fields reader also
got pulled into the cache?


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787791#action_12787791
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. But we still want its functionality - we still want to check for "doubling" 
up insanity. 

Mark, 
the latest patch re-inserts the commented fragment.

The "readerkey" in the original FieldCacheSanityChecker refers to the 
FieldCacheKey in IndexReader, which is now obsoleted by 
IndexReader#getIndexCache.getIndexReader(). I now use this as the reader key, 
even though I am still not quite sure if it is really necessary. Checking for 
insanity sounds so paranoid to me. If we know that there is a conceptional bug 
in the code, we should fix it, not circumvent and hotfix it using a insanity 
checker. But this is another story.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.field

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787772#action_12787772
 ] 

Uwe Schindler commented on LUCENE-2133:
---

bq. I still think it might be a little early. This has a lot of consequences.

I mean we should not wait for years like with LUCENE-831 :-) No heavy 
committing please *g*.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immed

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787768#action_12787768
 ] 

Christian Kohlschütter commented on LUCENE-2133:


Uwe, thanks a lot for your assessment!

I will definitely look at the flex branch and see what needs to be changed.

Yes, I still work at L3S, working hard for my PhD (planned to finish mid 2010). 
This is probably the main reason for not being so active in the Lucene 
community in the past. However, as I use Lucene for research and commercial 
systems over the last years, I always try to contribute back patches whenever 
possible.

Jan Brase doesn't work at L3S anymore, but I will happily forward the greetings.

Cheers,
Christian


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787767#action_12787767
 ] 

Mark Miller commented on LUCENE-2133:
-

bq. not bind the cache so hard to the IndexReader (which was also the problem 
with the last FieldCache), instead just make it a plugin component

At a minimum, you should be able to set the cache for the reader.

bq. For the functionality of Lucene, FieldCache is not needed, sorting is just 
an addon on searching

The way he has it, this is not just for the fieldache, but also the 
fieldsreader and vectorreader - if we go down that road, we should consider 
norms as well.

bq.  I see no problems with appling it soon

I still think it might be a little early. This has a lot of consequences.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTerms

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787763#action_12787763
 ] 

Mark Miller commented on LUCENE-2133:
-

But we still want its functionality - we still want to check for "doubling" up 
insanity.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones later, which not at all. Whenever new classes depend o

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787759#action_12787759
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. And what about the doubling up insanity? It looks like you just commented 
out that check? It appears to me that thats still an issue we want to check for 
- we want to make sure Lucene core and users have a way to be sure they are not 
using a toplevel reader and its sub readers for caches unless they really 
intend to. 

The new IndexFieldCacheSanityChecker can be removed out-of-the-box (as 
documented in the class itself). It is just kind of a "showcase" class for this 
issue.
The section I commented out is non-functional with the new change since there 
is no more ReaderKey in IndexFieldCache.CacheEntry.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> In

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787757#action_12787757
 ] 

Uwe Schindler commented on LUCENE-2133:
---

After removing the very problematic change to the Collector class (which is 
very central in Lucene and should not be changed again after 2.9), thatI told 
you to do in the morning, the other changes are no longer too intrusive. I am 
quite happy with your new patch and I now also like it very much. It is as a 
good candiate for replacing the very, very ugly FieldCache impl we currently 
have.

I am not really sure, if the package name is good and I would like to also add 
Earwins changes and not bind the cache so hard to the IndexReader (which was 
also the problem with the last FieldCache), instead just make it a plugin 
component (like all the other flex parts). For the functionality of Lucene, 
FieldCache is not needed, sorting is just an addon on searching, but not realy 
basic functionality (lots of users have other sorting algos). Also not sure if 
all classes in search that contain "FieldCache" should be renamed. The 
FieldCacheTermsFilter and RangeFilter only use the cache internally, they 
should simply be changed to use the new API and maybe only get additional ctors 
for the other parser instance classes. So some stuff still needs to be changed.

If it fits good together with the new flexible indexing branch, I see no 
problems with appling it soon. So its all good work. It is a pity, tht we heard 
not much from you in the past, the patch suddenly appeared in JIRA and almost 
nobody know you. I only found one introduction from you long time ago on 
java-dev. Are you still working at L3S? If yes, send nice greetings also to Jan 
Brase! :-)

This patch and the flex branch together and the many deprecations would make a 
3.9 soon and 4.0 without all ugly stuff soon would be nice. I again would do 
the cleaning heavy commiting police man!

So, good work. Thanks!

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionalit

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787754#action_12787754
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. I didn't notice that you actually just deprecated the originals - I guess 
thats not a complete rug pull ...

Writing the new code was one day, then making it backwards compatible with all 
these deprecations another one :-b
I actually wouldn't care so much about backwards compatibility in the most 
cases, but I think it is better now to allow a smooth transition to the new 
code.

bq. By the way, I don't think you need to deprecate something in a new class ( 
IndexFieldCacheImpl):

Indeed. This was a leftover from the early changes to src/test code. It's 
removed now (locally).


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - 

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787752#action_12787752
 ] 

Mark Miller commented on LUCENE-2133:
-

And what about the doubling up insanity? It looks like you just commented out 
that check? It appears to me that thats still an issue we want to check for - 
we want to make sure Lucene core and users have a way to be sure they are not 
using a toplevel reader and its sub readers for caches unless they *really* 
intend to.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final note

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787750#action_12787750
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. It looks like FieldCacheTermsFilterDocIdSet is using the wrong StringIndex? 
And I think the FieldCache import in that class can be removed (same with 
IndexFieldCacheRangeFilter).

Good catch. Since o.a.l.search.field.StringIndex extends 
o.a.l.search.StringIndex it's just a matter of declaration. The imports can 
indeed be removed (they were only required for javadocs, which should now also 
refer to the new classes instead). I have made the changes locally and will put 
them into the next update for this patch.

bq. I dont quite understand why we would have no more insanity? What happens 
when you get the cache from a multireader and then from each segment reader 
within it? You double up right? Insanity?

The MultiReader has a different cache than the internal SegmentReaders, there 
is no interconnection between the two. While change the testcases, my patch 
initially even triggered a JUnit failure because of this -- 
o.a.l.util.TestFieldCacheSanityChecker expected a cache error which now will 
not happen anymore.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexRea

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787748#action_12787748
 ] 

Mark Miller commented on LUCENE-2133:
-

bq.  I think it does not hurt either.

I didn't notice that you actually just deprecated the originals - I guess thats 
not a complete rug pull ...

By the way, I don't think you need to deprecate something in a new class (  
IndexFieldCacheImpl):

{code}

  /**
   * @deprecated Use {...@link #clear()} instead.
   */
  public void purgeAllCaches() {
init();
  }
{code}

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparat

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787743#action_12787743
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. I know the FieldComparator class is ugly, but I'm not sure we should pull 
the rug by putting the impls in a new package. On the other hand, its not 
likely to affect many and it was experimental - so its a tough call. Its a lot 
of classes in there

I think it makes sense to move the stuff into another package because 
o.a.l.search is already filled with a lot of classes. Since there are no 
package-level-protection dependencies to o.a.l.search, I think it does not hurt 
either.

bq. I'm also not sure if fields is the right package name? And do the Filters 
belong in that package?

I couldn't really come up with a better name. It's all about the fields somehow 
(caches, comparators and value parsers).

bq. Also, almost a non issue, but extending a deprecated class is going to be 
an ultra minor back compat break when its removed. Not likely a problem though. 
But we might put a note to that affect to be clear. It is almost self 
documenting anyway though 

Yes. I thought it's better to make it as clear as possibe.

Also, almost a non issue, but extending a deprecated class is going to be an 
ultra minor back compat break when its removed. Not likely a problem though. 
But we might put a note to that affect to be clear. It is almost self 
documenting anyway though

bq. Rather then changing the tests to the new classes, we should prob copy them 
and make new ones - then remove them when the deprecations are removed.

Yes, good idea. This way we can test the old and the new behaviour at once. 
Maybe it is enough to add new test methods to the same class?

bq. Also, you should pull the author tag(s) - all credit is through JIRA and 
Changes. (I only see it like once, so I bet thats eclipse?)

Sorry, Eclipse defaults. I thought I had removed all of them.

bq. I haven't done a thorough review it all, but this is pretty great stuff to 
appear so complete and out of nowhere 

Yes, it also just came over me the last weekend ;-)


I will make a new patch tomorrow (CET time) with the new test cases 
incorporated.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these is

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787740#action_12787740
 ] 

Mark Miller commented on LUCENE-2133:
-

I dont quite understand why we would have no more insanity? What happens when 
you get the cache from a multireader and then from each segment reader within 
it? You double up right? Insanity?

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/m

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787734#action_12787734
 ] 

Mark Miller commented on LUCENE-2133:
-

It looks like FieldCacheTermsFilterDocIdSet is using the wrong StringIndex? And 
I think the FieldCache import in that class can be removed.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones lat

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787729#action_12787729
 ] 

Mark Miller commented on LUCENE-2133:
-

A couple more quick notes:

I know the FieldComparator class is ugly, but I'm not sure we should pull the 
rug by putting the impls in a new package. On the other hand, its not likely to 
affect many and it was experimental - so its a tough call. Its a lot of classes 
in there ;)

I'm also not sure if fields is the right package name? And do the Filters 
belong in that package?

Also, almost a non issue, but extending a deprecated class is going to be an 
ultra minor back compat break when its removed. Not likely a problem though. 
But we might put a note to that affect to be clear. It is almost self 
documenting anyway though :)

Rather then changing the tests to the new classes, we should prob copy them and 
make new ones - then remove them when the deprecations are removed.

Also, you should pull the author tag(s) - all credit is through JIRA and 
Changes. (I only see it like once, so I bet thats eclipse?)

I havn't done a thorough review it all, but this is pretty great stuff to 
appear so complete and out of nowhere :)

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed t

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787720#action_12787720
 ] 

Christian Kohlschütter commented on LUCENE-2133:


bq. Hmm ... nevermind. The exception is related and most of the imports are 
correct - brain spin.

OK, no problem.

bq. This looks pretty nice overall.

Thanks :-)


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/method

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787718#action_12787718
 ] 

Christian Kohlschütter commented on LUCENE-2133:


Mark, could you please name the changes you would like to be excluded?
I thought I had only included necessary changes.

Some dependent changes are not so obvious, but necessary.

For example, Analyzer.close now needs to throw an IOException because of 
CloseableThreadLocal now closing an IOException because of doClose() throwing 
an IOException because it may close references from an IndexCache that might 
throw an IOException. Luckily this is covered by java.io.Closeable (which 
declared #close() to throw an IOException), and Analyzer implements that 
interface already.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheT

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787715#action_12787715
 ] 

Mark Miller commented on LUCENE-2133:
-

Hmm ... nevermind. The exception is related and most of the imports are correct 
- brain spin.

Didn't see that 

import org.apache.lucene.search.SortField; // for javadocs 

wasn't being used anymore anyway.

import org.apache.lucene.search.fields.IndexFieldCache in NumericQuery should 
get a //javadoc so someone doesn't accidently remove it.

And I guess the t to threadLocal change doesn't hurt with the amount your 
changing that anyway. Its a better name.

This looks pretty nice overall.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> Inde

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787711#action_12787711
 ] 

Mark Miller commented on LUCENE-2133:
-

There are a bunch or unrelated changes (imports/names/exception thrown) that 
should be pulled from this patch.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, 
> LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones later, which not at all. Wheneve

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787699#action_12787699
 ] 

Michael McCandless commented on LUCENE-2133:


bq. I have removed the test, contrib and demo files because they actually do 
not belong to the patch (since LUCENE-2133.patch really provides complete 
backwards compatibility now).

Ahh OK

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediat

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787667#action_12787667
 ] 

Christian Kohlschütter commented on LUCENE-2133:


Michael, LUCENE-2133.patch now contains all the patches for src/java. I have 
removed the test, contrib and demo files because they actually do not belong to 
the patch (since LUCENE-2133.patch really provides complete backwards 
compatibility now). It would not make much sense to claim backwards 
compatibility and at the same time modify a bunch of test classes, would it? :)

Nevertheless, I am now preparing an updated LUCENE-2133-complete.patch with all 
these additional patches included. In the meantime you may simply apply 
LUCENE-2133.patch to your local trunk copy and see if it works for you.


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTerms

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787661#action_12787661
 ] 

Michael McCandless commented on LUCENE-2133:


Christian, could you roll up all patches into a single attachment?  (Actually 
it looks like the separate demo, contrib, test, core patches were removed).

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: LUCENE-2133.patch, LUCENE-2133.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Lucene community has to decide which classes/methods can immediately be 
> removed, which ones later,

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787502#action_12787502
 ] 

Christian Kohlschütter commented on LUCENE-2133:


Hi Uwe,

your arguments seem reasonable to me. I have added a new patch (src/java only), 
leaving all test and contrib classes intact (but still passes all tests).
The new patch now keeps setNextReader as it is.

Cheers,
Christian


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: contrib.patch, demo.patch, java.patch, test.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
> - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
> - FieldCache (=> IndexFieldCache)
> - FieldCacheImpl (=> IndexFieldCacheImpl)
> - all classes in FieldCacheImpl (=> several package-level classes)
> - all subclasses of FieldComparator (=> several package-level classes)
> Final notes:
> - The patch would be simpler if no backwards compatibility was necessary. The 
> Luc

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787387#action_12787387
 ] 

Uwe Schindler commented on LUCENE-2133:
---

bq. The change of setNextReader() to use only the cache may not be the correct 
approach, as the collection of results is IndexReader-specific and not 
cache-specific (the Collectors in core for sorting only need that, but other 
collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. 
needed for building filters and so on.

This is one of the backwards breaks. As noted, the Collector abstract methods 
cannot be changed and should not, as the IndexReader is the important part used 
for collecting the results. The cache is only used by sorting but not in 
general. Use of IndexCache here would be a design flaw, because it combines 
unrelated things.

> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: contrib.patch, demo.patch, java.patch, test.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl and the related classes (FieldComparator, 
> SortField) 
> I have provided an patch which takes care of all these issues. It passes all 
> JUnit tests.
> The patch is quite large, admittedly, but the change required several 
> modifications and some more to preserve backwards-compatibility. 
> Backwards-compatibility is preserved by moving some of the updated 
> functionality in the package org.apache.lucene.search.fields (field 
> comparators and parsers, SortField) while adding wrapper instances and 
> keeping old code in org.apache.lucene.search.
> In detail and besides the above mentioned improvements, the following is 
> provided:
> 1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved 
> from SegmentReader to SegmentReaderIndexCache.
> 2. A housekeeping improvement to CloseableThreadLocal. Now delegates the 
> close() method to all registered instances by calling an onClose() method 
> with the threads' instances.
> 3. Analyzer.close now may throw an IOException (this already is covered by 
> java.io.Closeable).
> 4. A change to Collector: allow IndexCache instead of IndexReader being 
> passed to setNextReader()
> 5. SortField's numeric types have been replaced by direct assignments of 
> FieldComparatorSource. This removes the "switch" statements and the 
> possibility to throw IllegalArgumentExceptions because of unsupported type 
> values.
> The following classes have been deprecated and replaced by new classes in 
> org.apache.lucene.search.fields:
> - FieldCacheR

[jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

2009-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787378#action_12787378
 ] 

Christian Kohlschütter commented on LUCENE-2133:


Hi Uwe,

thanks for reviewing. Just a quick response to your comment:

# The problem with your current patch is the same as with the other propsals: 
backwards compatibility is not yet adressed, but is the most hairy problem with 
all other approaches.

As I wrote, the patch *does* provide backwards compatibility. Please correct me 
if I am wrong :)
I think actually over-stressed the backwards-compatibility issue because many 
classes were marked as "Expert; can be changed in incompatible ways in the next 
release" (e.g. SortField/FieldComparator, Collector)

# The change of setNextReader() to use only the cache may not be the correct 
approach, as the collection of results is IndexReader-specific and not 
cache-specific (the Collectors in core for sorting only need that, but other 
collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. 
needed for building filters and so on.

You can access the IndexReader from IndexCache and vice versa. The patch 
actually contains a few changes for contrib where this is actually utilized.

Passing IndexCache instead of IndexReader to setNextReader gives a slight gain 
in performance for most cases (i.e. whenever only the IndexCache is accessed), 
since there is no need to traverse the IndexReader-to-IndexCache method chain 
(IndexReader#getIndexCache() and the synchronized getOrCreateIndexCache()) for 
each call to setNextReader.

In all the other cases, newCache.getIndexReader() gives full access to the 
IndexReader. There even is a default method which calls the 
(now-to-be-deprecated) newIndexReader(IndexReader,int) method, so there are 
actually zero changes necessary for existing code.

# Also the patch file changes files, that are not related or removes import 
statements, that are clearly marked as "// javadocs only".

The corresponding references in the javadocs comments have been removed (e.g. 
FieldCache has been replaced to IndexFieldCache), so it made sense to remove 
the imports as well.

# For version 3.1: The "affects version", setting should be 2.9/3.0, as 3.1 is 
not yet released. An fix version cannot be yet given, thats correct.

Indeed.


Best regards,
Christian


> [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
> -
>
> Key: LUCENE-2133
> URL: https://issues.apache.org/jira/browse/LUCENE-2133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Christian Kohlschütter
> Attachments: contrib.patch, demo.patch, java.patch, test.patch
>
>
> Hi all,
> up to the current version Lucene contains a conceptual flaw, that is the 
> FieldCache. The FieldCache is a singleton which is supposed to cache certain 
> information for every IndexReader that is currently open
> The FieldCache is flawed because it is incorrect to assume that:
> 1. one IndexReader instance equals one index. In fact, there can be many 
> clones (of SegmentReader) or decorators (FilterIndexReader) which all access 
> the very same data.
> 2. the cache information remains valid for the lifetime of an IndexReader. In 
> fact, some IndexReaders may be reopen()'ed and thus they may contain 
> completely different information.
> 3. all IndexReaders need the same type of cache. In fact, because of the 
> limitations imposed by the singleton construct there was no implementation 
> other than FieldCacheImpl.
> Furthermore, FieldCacheImpl and FieldComparator are bloated by several static 
> inner-classes that could be moved to package level.
> There have been a few attempts to improve FieldCache, namely LUCENE-831, 
> LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: 
> There is a central registry for assigning Caches to IndexReader instances.
> I now propose the following:
> 1. Obsolete FieldCache and FieldCacheKey and provide index-specific, 
> extensible cache instances ("IndexCache"). IndexCaches provide common caching 
> functionality for all IndexReaders and may be extended (for example, 
> SegmentReader would have a SegmentReaderIndexCache and store different data 
> than a regular IndexCache)
> 2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. 
> IndexFieldCache is an interface just like FieldCache and may support 
> different implementations.
> 3. The IndexCache instances may be flushed/closed by the associated 
> IndexReaders whenever necessary.
> 4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected 
> (or at least, they do not impact the overall performance)
> 5. Refactor FieldCacheImpl an