On Oct 22, 2007, at 10:09 AM, Marvin Humphrey wrote:
> The conclusion I reached was that you needed to have a dedicated
> TermEnum for each field, implying individual term dictionary files
> (.tis, .tii).
I realized that I should explain the reasoning behind this.
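If KS allows users to supply Perl sort subs as collators, the cost
per comparison will be high, because every comparison invokes a Perl
subroutine. Sorting n hits takes on the order of n log n comparisons,
so this doesn't scale well for large result sets.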
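To make the scaling concern concrete, here is a minimal Python sketch (not KS code) that counts how many times a user-supplied collator is called while sorting n items. `count_collator_calls` and `case_insensitive` are hypothetical names; `case_insensitive` merely stands in for a Perl sort sub. Whatever the per-call overhead is, it gets multiplied by roughly n log n.

```python
import functools
import math
import random


def count_collator_calls(values, collator):
    """Sort values with a three-way collator; return (sorted, number of calls)."""
    calls = 0

    def counting_cmp(a, b):
        nonlocal calls
        calls += 1  # every comparison pays the collator's per-call cost
        return collator(a, b)

    result = sorted(values, key=functools.cmp_to_key(counting_cmp))
    return result, calls


def case_insensitive(a, b):
    # Hypothetical stand-in for a user-supplied Perl sort sub.
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)


if __name__ == "__main__":
    random.seed(0)
    for n in (1_000, 10_000, 50_000):
        words = ["w%06d" % random.randrange(10**6) for _ in range(n)]
        _, calls = count_collator_calls(words, case_insensitive)
        print(f"n={n:>6}  collator calls={calls:>8}  n*log2(n)={int(n * math.log2(n)):>8}")
```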
One solution is to move the sorting cost to index time for individual
fields. Since KS has global field semantics, it's possible to
associate a collator with a field name and sort that field's terms
within the term dictionary by it.
However, using multiple collators within the
same term dictionary is messy, because it's difficult to decide which
one you should be using at any given point during a scan. Using a
dedicated TermEnum for each field cleans that up.
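A minimal Python sketch of the index-time approach, assuming a toy in-memory index rather than KS's actual file formats: each field gets its own term dictionary (standing in for a dedicated TermEnum) sorted by that field's collator, and each document records its term's ordinal. At search time, hits are sorted by comparing those integers, so the collator is never invoked. `FieldTermDict`, `Index`, and `case_insensitive` are hypothetical names, and the sketch assumes every document has exactly one term in every field.

```python
import functools


def case_insensitive(a, b):
    # Hypothetical per-field collator (three-way compare, like a Perl sort sub).
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)


class FieldTermDict:
    """One term dictionary per field, sorted by that field's own collator."""

    def __init__(self, terms, collator=None):
        key = functools.cmp_to_key(collator) if collator else None
        self.terms = sorted(set(terms), key=key)  # collation order
        self.ordinal = {t: i for i, t in enumerate(self.terms)}


class Index:
    def __init__(self, collators):
        self.collators = collators  # field name -> collator (absent = code-point order)
        self.docs = []              # each doc: {field name: term}
        self.term_dicts = {}        # field name -> FieldTermDict
        self.doc_ords = {}          # field name -> term ordinal per doc id

    def add_doc(self, doc):
        self.docs.append(doc)

    def commit(self):
        # Index time: the possibly expensive collator runs only here.
        if not self.docs:
            return
        for field in self.docs[0]:
            td = FieldTermDict((d[field] for d in self.docs), self.collators.get(field))
            self.term_dicts[field] = td
            self.doc_ords[field] = [td.ordinal[d[field]] for d in self.docs]

    def sort_hits(self, hit_ids, field):
        # Search time: compare precomputed integer ordinals, no collator calls.
        ords = self.doc_ords[field]
        return sorted(hit_ids, key=lambda doc_id: ords[doc_id])


if __name__ == "__main__":
    idx = Index({"title": case_insensitive})
    for title in ["banana", "apple", "Cherry"]:
        idx.add_doc({"title": title})
    idx.commit()
    print(idx.sort_hits([0, 1, 2], "title"))  # → [1, 0, 2]
```

Because each field's terms live in their own sorted list, there is never a question of which collator applies during a scan; a field without a collator simply falls back to code-point order (which would put "Cherry" first).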
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/