[jira] Commented: (LUCENE-2482) Index sorter

Eks Dev (JIRA) Thu, 27 May 2010 13:46:03 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872357#action_12872357
 ]


Eks Dev commented on LUCENE-2482:
---------------------------------

Nice!
There is another interesting use case for sorting an index: performance and 
index size.

We use a couple of low-cardinality fields (zip code, user group and the 
like). Sorting the index on these makes RLE compression of postings really 
effective, so all values can be loaded into a couple of megabytes of RAM.
At the moment we just sort the collection before indexing.
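
To illustrate why the sort order matters for RLE, here is a minimal sketch in plain Java. It is not Lucene code: the names RleSketch and encode are made up for illustration, and it run-length encodes a per-document field value (indexed by doc ID) rather than real postings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RleSketch {
    /**
     * Run-length encodes one field value per document, indexed by doc ID.
     * Each run is returned as {value, runLength}.
     */
    static List<int[]> encode(int[] valueByDocId) {
        List<int[]> runs = new ArrayList<>();
        int start = 0;
        while (start < valueByDocId.length) {
            int end = start;
            // Extend the run while the next document carries the same value.
            while (end < valueByDocId.length && valueByDocId[end] == valueByDocId[start]) {
                end++;
            }
            runs.add(new int[] {valueByDocId[start], end - start});
            start = end;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Hypothetical "user group" IDs for eight documents, in indexing order.
        int[] unsorted = {3, 1, 3, 2, 1, 3, 2, 1};
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted); // stands in for sorting the collection before indexing

        System.out.println("runs, unsorted: " + encode(unsorted).size()); // 8
        System.out.println("runs, sorted:   " + encode(sorted).size());   // 3
    }
}
```

With the documents sorted, the number of runs equals the number of distinct values, independent of the number of documents, which is why the encoded form stays small for low-cardinality fields.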

Would it be possible to sort on a combination of stored fields and to 
specify a comparator? Even comparing them as byte[] would do the trick for this 
use case, since it only matters that equal values stay together; the order 
is irrelevant. Of course, a decoder that decodes the byte[] before comparing 
would be useful (e.g. for composite fields), but it would work in many cases 
without one.
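
A rough sketch of the byte[] idea in plain Java follows. Everything here is hypothetical (ByteKeySortSketch, sortedDocOrder, the sample keys) and says nothing about how the attached patch works. It only shows that a pluggable Comparator<byte[]> over raw stored-field bytes is enough to group equal values together.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

public class ByteKeySortSketch {
    /**
     * Returns the old doc IDs in the order they would be written to the
     * sorted index, given one raw stored-field key per document.
     */
    static Integer[] sortedDocOrder(byte[][] keyByDocId, Comparator<byte[]> cmp) {
        Integer[] order = new Integer[keyByDocId.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so documents with equal keys
        // keep their original relative order.
        Arrays.sort(order, (a, b) -> cmp.compare(keyByDocId[a], keyByDocId[b]));
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical zip-code keys, stored as raw bytes.
        byte[][] keys = {
            "94103".getBytes(StandardCharsets.UTF_8),
            "10001".getBytes(StandardCharsets.UTF_8),
            "94103".getBytes(StandardCharsets.UTF_8),
            "10001".getBytes(StandardCharsets.UTF_8),
        };
        // Plain unsigned byte-wise comparison; no decoder needed.
        Integer[] order = sortedDocOrder(keys, Arrays::compareUnsigned);
        System.out.println(Arrays.toString(order)); // [1, 3, 0, 2]
    }
}
```

A decoder for composite fields could be layered on top by passing a different comparator, for example one built with Comparator.comparing over a decoding function, without changing sortedDocOrder.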

This works fine even with a moderate update rate, as you can re-sort 
periodically. The index does not have to be totally sorted: everything still 
works, just with slightly more memory needed for filters.

With flex, postings that use RLE compression are quite possible... this 
tool could become an "optimizeHard()" tool for some indexes :)

> Index sorter
> ------------
>
>                 Key: LUCENE-2482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2482
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 3.1
>            Reporter: Andrzej Bialecki 
>             Fix For: 3.1
>
>         Attachments: indexSorter.patch
>
>
> A tool to sort index according to a float document weight. Documents with 
> high weight are given low document numbers, which means that they will be 
> first evaluated. When using a strategy of "early termination" of queries (see 
> TimeLimitedCollector) such sorting significantly improves the quality of 
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as 
> document weights - thus the ordering was limited by the limited resolution of 
> norms. This is a pure Lucene version of the tool, and it uses arbitrary 
> floats from a specified stored field).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

