[ https://issues.apache.org/jira/browse/LUCENE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Cowan updated LUCENE-1372:
-------------------------------
Attachment: lucene-multisort.patch
Patch that handles this for Strings, with a test case. This is a
proof of concept; if people are happy with the approach, I'll implement it for
the other types (float, int, etc.) as well, since I think it makes sense there too.
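One wrinkle when extending this to the numeric caches: in the string-index cache, ordinal 0 appears to be reserved for "no value", so a first-value-wins check can simply test retArray[doc] == 0. For int or float, 0 is a legitimate value, so some separate record of which documents have already been assigned seems necessary. The sketch below is a hypothetical, standalone illustration that uses a java.util.BitSet for this. FirstValueIntCache and buildInts are made-up names, not Lucene API, and a TreeMap from term text to doc ids stands in for TermEnum/TermDocs. It assumes ints are indexed as plain decimal term text, so "first" means first in term-text order, not numerically smallest ("10" comes before "9").

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class FirstValueIntCache {

    // Hypothetical first-value-wins variant of an int cache loop.
    // Because 0 is a valid int value, "retArray[doc] == 0" cannot mean
    // "not yet set"; a BitSet records which docs already have a value.
    public static int[] buildInts(TreeMap<String, int[]> postings, int maxDoc) {
        int[] retArray = new int[maxDoc];
        BitSet seen = new BitSet(maxDoc);
        // TreeMap iterates keys in String order, mimicking term-text order.
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            int value = Integer.parseInt(e.getKey());
            for (int doc : e.getValue()) {
                if (!seen.get(doc)) {
                    retArray[doc] = value;
                    seen.set(doc);
                }
            }
        }
        return retArray;
    }

    public static void main(String[] args) {
        // Assumed example data: term text -> docs containing that term.
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("0", new int[] {0, 1});
        postings.put("10", new int[] {2});
        postings.put("9", new int[] {1, 2});
        System.out.println(Arrays.toString(buildInts(postings, 3)));
    }
}
```

Running main prints [0, 0, 10]. With a plain retArray[doc] == 0 check instead of the BitSet, doc 1 would wrongly end up as 9, because its first value (0) looks like "unset".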
> Proposal: introduce more sensible sorting when a doc has multiple values for a term
> -----------------------------------------------------------------------------------
>
> Key: LUCENE-1372
> URL: https://issues.apache.org/jira/browse/LUCENE-1372
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.3.2
> Reporter: Paul Cowan
> Priority: Minor
> Attachments: lucene-multisort.patch
>
>
> At the moment, FieldCacheImpl produces somewhat disconcerting results when
> sorting on a field for which one document has multiple values. For example,
> imagine a field "fruit" which is added to a document multiple times, with the
> values as follows:
> doc 1: {"apple"}
> doc 2: {"banana"}
> doc 3: {"apple", "banana"}
> doc 4: {"apple", "zebra"}
> If one sorts on the field "fruit", the loop in
> FieldCacheImpl.stringsIndexCache.createValue() (and similarly the loops in
> the other FieldCacheImpl caches) does the following:
>   while (termDocs.next()) {
>     retArray[termDocs.doc()] = t;
>   }
> which means that we loop over the terms in their natural (lexicographic)
> order and, for each term, overwrite retArray[doc] for every document that
> contains it. This overwriting means that a string sort in this situation
> effectively sorts by each document's LAST term lexicographically, so the docs
> above will be sorted as if they had the single values ("apple", "banana",
> "banana", "zebra"), which is counterintuitive. Changing this to sort on each
> document's first term in the TermEnum seems relatively trivial and
> low-overhead; while it's not perfect (it's not locale-aware, for example),
> the behaviour seems much more sensible to me. Interested to see what people
> think.
> Patch to follow.
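To make the difference concrete, here is a small standalone Java simulation of the quoted loop on the "fruit" example. It is not Lucene code: a TreeMap from term to doc ids stands in for TermEnum/TermDocs, and MultiValueSortDemo, buildOrds and sortKeys are hypothetical names. Ordinal 0 is assumed to mean "no value", and the only difference between the two behaviours is whether a doc's ordinal is overwritten or kept once set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class MultiValueSortDemo {

    // Walk the field's terms in sorted order and record each term's
    // ordinal for every document containing it. Ordinal 0 = no value.
    public static int[] buildOrds(TreeMap<String, int[]> postings, int maxDoc, boolean firstWins) {
        int[] retArray = new int[maxDoc];
        int t = 1;
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                // Current behaviour: unconditional overwrite, so the last
                // (lexicographically greatest) term wins.
                // Proposed behaviour: only write if nothing was set yet,
                // so the first (lexicographically smallest) term wins.
                if (!firstWins || retArray[doc] == 0) {
                    retArray[doc] = t;
                }
            }
            t++;
        }
        return retArray;
    }

    // Map each document's ordinal back to the term it will be sorted by.
    public static List<String> sortKeys(TreeMap<String, int[]> postings, int[] ords, int fromDoc) {
        List<String> terms = new ArrayList<>(postings.keySet());
        List<String> keys = new ArrayList<>();
        for (int doc = fromDoc; doc < ords.length; doc++) {
            keys.add(ords[doc] == 0 ? null : terms.get(ords[doc] - 1));
        }
        return keys;
    }

    public static void main(String[] args) {
        // The "fruit" example; doc 0 is unused so doc numbers match the text.
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {1, 3, 4});
        postings.put("banana", new int[] {2, 3});
        postings.put("zebra", new int[] {4});
        int maxDoc = 5;
        System.out.println("last term wins:  " + sortKeys(postings, buildOrds(postings, maxDoc, false), 1));
        System.out.println("first term wins: " + sortKeys(postings, buildOrds(postings, maxDoc, true), 1));
    }
}
```

Running it prints:
last term wins:  [apple, banana, banana, zebra]
first term wins: [apple, banana, apple, apple]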
--
This message is automatically generated by JIRA.