On Mon, 2008-01-14 at 10:58 -0500, Alex Wang wrote:
> Toke, you mentioned "Using a Collator works but does take a fair amount
> of memory", can you please elaborate a little more on that. Thanks.

We have an index with 10 million records that takes up 37GB. Practically
all records have a title, which can be used for sorting. The size of the
titles is about 450MB in total. Using a Collator the usual way means
that a CollationKey is stored for each of those titles, along with the
titles themselves.
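
To illustrate where the memory goes, here is a minimal sketch using only
java.text. It is not Lucene's own sorting code. The class name, method name
and sample titles are made up for the example.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationKeySortSketch {

    /** Returns the titles ordered by the collation rules of the given locale. */
    static String[] sortTitles(String[] titles, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // One CollationKey per title. Each key keeps a reference to its
        // source string plus the computed sort key, which is the extra
        // memory on top of the titles themselves.
        CollationKey[] keys = new CollationKey[titles.length];
        for (int i = 0; i < titles.length; i++) {
            keys[i] = collator.getCollationKey(titles[i]);
        }

        // CollationKey is Comparable, so comparing keys is cheap once built.
        Arrays.sort(keys);

        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] titles = {"banana", "apple", "Cherry"};
        // A plain String sort would put "Cherry" first; the Collator does not.
        System.out.println(Arrays.toString(sortTitles(titles, Locale.ENGLISH)));
    }
}
```

With 10 million titles, that is 10 million key objects held in memory for
as long as sorting has to be fast.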

As for using a Collator, there's a neat tutorial at
http://wiki.epc.ub.uu.se/display/~ronnie/International+Sorting


Unfortunately for us, the memory requirements for this were too much for
our machine. Instead of using the built-in caching mechanism, we used
the fact that our index is currently static and calculated a single
integer for every document, stating its position relative to all other
documents in locale sort order. That gives us fast sorting with low memory
overhead. Of course we'll have to discard that idea when we move to a
non-static index.
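
Roughly, the idea looks like the sketch below. This is not our actual
code: the class and method names are invented, and it assumes the titles
are available in an array indexed by document id and that every document
has a title.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class SortPositionSketch {

    /**
     * Returns an array where positions[docId] is the rank of that
     * document's title in the collated order for the given locale.
     */
    static int[] computePositions(String[] titlesByDocId, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // Sort the document ids by title, using the Collator only during
        // this one-time build step.
        Integer[] docIds = new Integer[titlesByDocId.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Arrays.sort(docIds,
                (a, b) -> collator.compare(titlesByDocId[a], titlesByDocId[b]));

        // Keep only one int per document: its position in sorted order.
        int[] positions = new int[titlesByDocId.length];
        for (int rank = 0; rank < docIds.length; rank++) {
            positions[docIds[rank]] = rank;
        }
        return positions;
    }

    public static void main(String[] args) {
        String[] titles = {"banana", "apple", "Cherry"};
        System.out.println(
                Arrays.toString(computePositions(titles, Locale.ENGLISH)));
    }
}
```

At search time two documents are compared by their integers alone, so
neither the Collator nor the titles need to stay in memory. For 10 million
documents the array is about 40MB. Adding or removing a document shifts the
ranks, which is why this only works while the index is static.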

