[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

Uwe Schindler (JIRA) Fri, 30 Jan 2009 02:52:23 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668812#action_12668812 ]


Uwe Schindler commented on LUCENE-1478:
---------------------------------------

bq. Uwe, would that result in a memory leak? Ie, a single long-lived segment 
would accumulate multiple entries for each new XXXParser instance used during 
sorting? (Unless there's logic to evict the "stale" entries).

As noted before on Dec 9, 2008, the parser should be a singleton for all field 
caches or implement hashCode()/equals(). You create a FloatParser and supply it 
to SortField. When sorting against this field, the new sort implementation 
creates a FieldCache for each segment. When a segment is reloaded, its 
FieldCache becomes unused and a new one is created for the replacement segment 
(this is the same as with the standard field parser). The FieldCache uses 
(IndexReader,SortField.type,Parser) as key.
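
To make the leak question concrete, here is a minimal sketch that does not use 
Lucene at all. The Key class, the LongParser interface, TYPE_LONG and 
entriesAfter() are hypothetical stand-ins that only model a cache keyed on 
(IndexReader,SortField.type,Parser); the real FieldCache internals may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of a cache keyed on (reader, type, parser); not Lucene code.
final class ParserCacheKeyDemo {

    /** Stand-in for a parser interface such as a long parser. */
    interface LongParser {
        long parseLong(String value);
    }

    /** One shared instance: the default identity equals()/hashCode() is enough. */
    static final LongParser SHARED_PARSER = new LongParser() {
        @Override
        public long parseLong(String value) {
            return Long.parseLong(value);
        }
    };

    /** Arbitrary type constant, assumed for illustration only. */
    static final int TYPE_LONG = 6;

    /** Cache key modelled on (IndexReader, SortField.type, Parser). */
    static final class Key {
        final Object reader;
        final int type;
        final Object parser;

        Key(Object reader, int type, Object parser) {
            this.reader = reader;
            this.type = type;
            this.parser = parser;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            // Readers are compared by identity, parsers via their own equals().
            return reader == k.reader && type == k.type && Objects.equals(parser, k.parser);
        }

        @Override
        public int hashCode() {
            return Objects.hash(System.identityHashCode(reader), type, parser);
        }
    }

    /** Number of cache entries after sorting 'searches' times on one long-lived segment. */
    static int entriesAfter(int searches, boolean freshParserPerSearch) {
        Map<Key, long[]> cache = new HashMap<>();
        Object segmentReader = new Object(); // the same segment for every search
        for (int i = 0; i < searches; i++) {
            LongParser parser = SHARED_PARSER;
            if (freshParserPerSearch) {
                // New instance without equals()/hashCode(): never matches an old key.
                parser = new LongParser() {
                    @Override
                    public long parseLong(String value) {
                        return Long.parseLong(value);
                    }
                };
            }
            cache.putIfAbsent(new Key(segmentReader, TYPE_LONG, parser), new long[0]);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(entriesAfter(5, false)); // singleton parser: 1 entry
        System.out.println(entriesAfter(5, true));  // fresh parser per search: 5 entries
    }
}
```

In this model the entry count only grows when a new parser instance without 
hashCode()/equals() is created for every search, which is the case the 
singleton recommendation avoids.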

If the FieldCache is used for CachingFilters, there is also no problem: The new 
search algorithm executes each filter's getDocIdSet() for each single 
SegmentReader.

So the only problem is that, with the new search implementation, you can no 
longer rely on the fact that only one FieldCache exists for a MultiReader and 
that every filter's getDocIdSet() is executed only once. So injecting a custom 
FieldCache into the cache for the whole MultiReader before the search (by 
getting it) is no longer possible.

I tested the new search impl with trie fields; sorting works perfectly using 
TrieUtils.getSortField()/TrieUtils.LONG_PARSER, with no leaks (because the 
FieldParser is a singleton). I only had to modify the test case (Revision: 
737079), because with an unoptimized index, the statistics for retrieving the 
number of visited terms did not work anymore (because the Filter was called 
more than once per search).

The problem of stale entries when the external file changes is a separate 
problem that also existed before the new sort impl.

bq. It seems like LUCENE-831, which would expose / allow custom control over 
FieldCache's caching impl, would help here too.

Yes!

> Missing possibility to supply custom FieldParser when sorting search results
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-1478
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1478
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1478-cleanup.patch, 
> LUCENE-1478-no-superinterface.patch, LUCENE-1478.patch, LUCENE-1478.patch, 
> LUCENE-1478.patch, LUCENE-1478.patch, LUCENE-1478.patch
>
>
> When implementing the new TrieRangeQuery for contrib (LUCENE-1470), I was 
> confronted by the problem that the special trie-encoded values (which are 
> longs in a special encoding) cannot be sorted by Searcher.search() and 
> SortField. The problem is: If you use SortField.LONG, you get 
> NumberFormatExceptions. The trie encoded values may be sorted using 
> SortField.STRING (as the encoding is such that they are sortable as 
> Strings), but this is very memory-inefficient.
> ExtendedFieldCache gives the possibility to specify a custom LongParser when 
> retrieving the cached values. But you cannot use this during searching, 
> because there is no possibility to supply this custom LongParser to the 
> SortField.
> I propose a change in the sort classes:
> Include a pointer to the parser instance to be used in SortField (if not 
> given use the default). My idea is to create a SortField using a new 
> constructor
> {code}SortField(String field, int type, Object parser, boolean reverse){code}
> The parser is typed as Object because the current parsers have no common 
> super-interface. The ideal solution would be to have:
> {code}SortField(String field, int type, FieldCache.Parser parser, boolean reverse){code}
> and FieldCache.Parser is a super-interface (just empty, more like a 
> marker-interface) of all other parsers (like LongParser...). The sort 
> implementation then must be changed to respect the given parser (if not 
> NULL), else use the default FieldCache.getXXXX without parser.
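
The proposal quoted above can be sketched roughly as follows. This is not the 
attached patch: Parser, LongParser, SortField, TYPE_LONG, HEX_LONG_PARSER and 
loadValues() are simplified, hypothetical stand-ins, and the hex encoding is 
only a placeholder for a custom term encoding (it is not the trie encoding).

```java
import java.util.Arrays;

// Hypothetical sketch of a marker super-interface plus a parser-aware SortField.
final class SortFieldParserSketch {

    /** Empty marker interface shared by all parser types. */
    interface Parser {
    }

    interface LongParser extends Parser {
        long parseLong(String value);
    }

    /** Default behaviour: plain decimal longs. */
    static final LongParser DEFAULT_LONG_PARSER = new LongParser() {
        @Override
        public long parseLong(String value) {
            return Long.parseLong(value);
        }
    };

    /** Custom parser for an assumed hex encoding; kept as a singleton. */
    static final LongParser HEX_LONG_PARSER = new LongParser() {
        @Override
        public long parseLong(String value) {
            return Long.parseLong(value, 16);
        }
    };

    /** Arbitrary type constant, assumed for illustration only. */
    static final int TYPE_LONG = 6;

    /** Simplified sort field that carries an optional parser. */
    static final class SortField {
        final String field;
        final int type;
        final Parser parser; // null means "use the default parser"
        final boolean reverse;

        SortField(String field, int type, Parser parser, boolean reverse) {
            this.field = field;
            this.type = type;
            this.parser = parser;
            this.reverse = reverse;
        }
    }

    /** Decodes the terms with the given parser if present, else with the default. */
    static long[] loadValues(String[] terms, SortField sortField) {
        LongParser parser = sortField.parser != null
                ? (LongParser) sortField.parser
                : DEFAULT_LONG_PARSER;
        long[] values = new long[terms.length];
        for (int i = 0; i < terms.length; i++) {
            values[i] = parser.parseLong(terms[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        String[] terms = {"ff", "10"};
        SortField custom = new SortField("price", TYPE_LONG, HEX_LONG_PARSER, false);
        System.out.println(Arrays.toString(loadValues(terms, custom)));
    }
}
```

With a null parser the same terms would hit the default decimal parser and 
fail with a NumberFormatException, which mirrors the problem described in the 
issue.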

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
