[
https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905020#action_12905020
]
Toke Eskildsen commented on LUCENE-2369:
----------------------------------------
{quote}
ICU keys are just byte[] just like regular terms. they are "regular terms"
{quote}
Do they or do they not need to be loaded into heap in order to be used for
sorted search?
{quote}
Can we forget about the stupid runtime Locale sort, if you have a way to
improve memory usage for byte[] terms, lets look just at that? Then this could
be more general and more useful.
{quote}
Easy now. The whole runtime-vs-index-time issue is something that I don't care
much about at this point. Pre-sorting can be done both at index and search
time. Let's just say that we do it at index-time and go from there.
Not holding the sort-terms in memory (whether they be Strings, BytesRefs,
regular terms or ICU keys) and doing all possible sorting up front (in the case
of a hybrid ICU-approach: A merge-sort of the already sorted segments), is what
I'm looking at. Could you please re-read my comment with that in mind and see
if my breakdown and trade-off lists make sense? It seems to me that you're
quite certain that there is something I've missed, but I haven't yet understood
what it is. I do know that ICU keys are just regular terms in the technical
sense. When I use the designation ICU keys, I do it to make it clear that we're
getting locale-specific ordering.
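To make the merge step concrete, here is a rough sketch of merging already
sorted per-segment key lists into one global ordering. It is only an
illustration under assumptions: the class and method names are made up, the
keys are held in plain arrays for brevity (the point of the proposal is to
avoid exactly that, so a real version would stream them from the index), and
it relies on Arrays.compareUnsigned from Java 9 or later for bytewise
comparison of the keys.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: k-way merge of per-segment sorted keys.
class SegmentMergeSketch {

    /** Returns, per segment, a map from segment-local ordinal to global ordinal. */
    static int[][] mergeToGlobalOrdinals(byte[][][] sortedKeysPerSegment) {
        int segments = sortedKeysPerSegment.length;
        int[][] localToGlobal = new int[segments][];

        // Queue entries are {segment, position}, ordered by the key they point at.
        // Keys are compared as unsigned bytes, which is how binary sort keys order.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) ->
                Arrays.compareUnsigned(sortedKeysPerSegment[a[0]][a[1]],
                                       sortedKeysPerSegment[b[0]][b[1]]));

        for (int seg = 0; seg < segments; seg++) {
            localToGlobal[seg] = new int[sortedKeysPerSegment[seg].length];
            if (sortedKeysPerSegment[seg].length > 0) {
                queue.add(new int[] {seg, 0});
            }
        }

        int global = -1;
        byte[] previous = null;
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            byte[] key = sortedKeysPerSegment[head[0]][head[1]];
            // Equal keys from different segments share one global ordinal.
            if (previous == null || !Arrays.equals(previous, key)) {
                global++;
                previous = key;
            }
            localToGlobal[head[0]][head[1]] = global;
            // Advance within the segment the head came from.
            if (head[1] + 1 < sortedKeysPerSegment[head[0]].length) {
                queue.add(new int[] {head[0], head[1] + 1});
            }
        }
        return localToGlobal;
    }
}
```

Each segment's keys are read once and in order, so the merge itself only
needs one key per segment in the queue at any time.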
Deep breaths, ok? I'm going to fetch the kids from school, so you don't need to
rush your answer.
> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
> Key: LUCENE-2369
> URL: https://issues.apache.org/jira/browse/LUCENE-2369
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Toke Eskildsen
> Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache
> which keeps all sort terms in memory. Besides the huge memory overhead,
> searching requires comparison of terms with collator.compare every time,
> making searches with millions of hits fairly expensive.
> This proposed alternative implementation is to create a packed list of
> pre-sorted ordinals for the sort terms and a map from document-IDs to entries
> in the sorted ordinals list. This results in very low memory overhead and
> faster sorted searches, at the cost of increased startup-time. As the
> ordinals can be resolved to terms after the sorting has been performed, this
> approach supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335
> which contains previous discussions on the subject.
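
The idea in the description can be sketched roughly as follows. This is a
minimal illustration, not the proposed patch: the class and method names are
hypothetical, a plain int[] stands in for the packed structure, and the sort
terms are passed in as an in-memory array only to keep the example short. The
collator is used once, up front; sorting hits afterwards compares integers.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.IntStream;

// Hypothetical sketch, not Lucene code: pre-sorted ordinals plus a doc-to-ordinal map.
class OrdinalSortSketch {

    /** Maps each document ID (array index) to the ordinal of its sort term in collator order. */
    static int[] buildDocToOrdinal(String[] termByDoc, Collator collator) {
        // Collect the unique terms in collator order. This is the startup cost.
        TreeMap<String, Integer> ordinals = new TreeMap<>(collator);
        for (String term : termByDoc) {
            ordinals.put(term, 0);
        }
        int next = 0;
        for (Map.Entry<String, Integer> entry : ordinals.entrySet()) {
            entry.setValue(next++);
        }
        // A real implementation would pack these values into fewer bits per document.
        int[] docToOrdinal = new int[termByDoc.length];
        for (int doc = 0; doc < termByDoc.length; doc++) {
            docToOrdinal[doc] = ordinals.get(termByDoc[doc]);
        }
        return docToOrdinal;
    }

    /** Sorts hits (document IDs) by ordinal; no collator calls and no terms needed. */
    static int[] sortHits(int[] hits, int[] docToOrdinal) {
        return IntStream.of(hits)
                .boxed()
                .sorted(Comparator.comparingInt(doc -> docToOrdinal[doc]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        String[] termByDoc = {"cherry", "apple", "Banana", "apple"};
        int[] docToOrdinal = buildDocToOrdinal(termByDoc, Collator.getInstance(Locale.ENGLISH));
        int[] sorted = sortHits(new int[] {0, 1, 2, 3}, docToOrdinal);
        System.out.println(java.util.Arrays.toString(sorted));
    }
}
```

Because only ordinals are compared during the search, the terms themselves are
needed again only when resolving the top hits for fillFields=true.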
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.