On Oct 22, 2007, at 10:09 AM, Marvin Humphrey wrote:

> The conclusion I reached was that you needed to have a dedicated TermEnum for each field, implying individual term dictionary files (.tis, .tii).

I realized that I needed to explain this.

If KS allows users to supply Perl sort subs as collators, the cost per comparison will be high. Paying that cost at search time, every time hits get sorted, doesn't scale well for large result sets.

One solution is to move the sorting cost to index-time for individual fields. Since KS has global field semantics, it's possible to associate a collator with a field name and sort that field's terms within the term dictionary by it. However, using multiple collators within the same term dictionary is messy, because it's difficult to decide which one should be in use at any given point during a scan. Giving each field a dedicated TermEnum, backed by its own term dictionary files, cleans that up.
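
To make the per-field idea concrete, here is a rough sketch in Python rather than anything from the KS code base. The collators, the field names "title" and "price", and the helpers build_term_dicts and term_enum are all hypothetical; the comparison functions stand in for user-supplied sort subs. Each field gets its own term list, sorted once at index time with that field's collator, so a scan over one field never has to work out which collator applies.

```python
from functools import cmp_to_key

# Hypothetical collators, standing in for user-supplied sort subs.
def case_insensitive(a, b):
    ka, kb = a.lower(), b.lower()
    return (ka > kb) - (ka < kb)

def numeric(a, b):
    na, nb = int(a), int(b)
    return (na > nb) - (na < nb)

# Assumed mapping: one collator per field name (global field semantics).
COLLATORS = {"title": case_insensitive, "price": numeric}

def build_term_dicts(postings):
    """Index time: group terms by field, then sort each field's terms
    once with that field's own collator."""
    by_field = {}
    for field, term in postings:
        by_field.setdefault(field, set()).add(term)
    return {
        field: sorted(terms, key=cmp_to_key(COLLATORS[field]))
        for field, terms in by_field.items()
    }

def term_enum(term_dicts, field):
    """Search time: scan a single field's terms. They are already in
    collator order, so no comparison callbacks are needed here."""
    yield from term_dicts[field]

dicts = build_term_dicts([
    ("title", "cherry"), ("title", "apple"), ("title", "Banana"),
    ("price", "100"), ("price", "9"), ("price", "10"),
])
print(list(term_enum(dicts, "title")))
print(list(term_enum(dicts, "price")))
```

With a single shared term list, the scan would have to switch comparison rules whenever it crossed from one field's terms into another's; keeping a separate list per field removes that decision entirely.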

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


