Hi, Paul,

I think whether to warm up or not needs to be benchmarked for the specific
application.

For the implementation of the sort fields: when I talked about norms in
Lucene, I meant that we could borrow the same implementation that the norms
use.

But, on a higher level, my idea is really just to create an array of
integers for each sort field. The array length is NumOfDocs in the index.
Each integer corresponds to a displayable string value. For example, if you
have a field of different colors, you can assign integers like this:

0 <=> white
1 <=> blue
2 <=> yellow
...

Thus, you don't need to use strings for sorting. For example, if you have
documents number 0, 1, 2, which store the colors blue, white, yellow
respectively, the array would be:

{1, 0, 2}.
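
The encoding above can be sketched in plain Java. This is only an
illustration, not Lucene code: the class SortFieldOrdinals and the method
encode are hypothetical names, and the value-to-integer table is assumed to
be given up front.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class SortFieldOrdinals {

    // valuesByOrdinal.get(i) is the displayable string for integer i.
    // valuePerDoc.get(doc) is the field value stored in document doc.
    static int[] encode(List<String> valuesByOrdinal, List<String> valuePerDoc) {
        Map<String, Integer> ordinalOf = new HashMap<>();
        for (int i = 0; i < valuesByOrdinal.size(); i++) {
            ordinalOf.put(valuesByOrdinal.get(i), i);
        }
        int[] ordinals = new int[valuePerDoc.size()];
        for (int doc = 0; doc < ordinals.length; doc++) {
            Integer ord = ordinalOf.get(valuePerDoc.get(doc));
            if (ord == null) {
                throw new IllegalArgumentException(
                        "no integer assigned to value: " + valuePerDoc.get(doc));
            }
            ordinals[doc] = ord;
        }
        return ordinals;
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("white", "blue", "yellow");
        List<String> docs = Arrays.asList("blue", "white", "yellow");
        System.out.println(Arrays.toString(encode(colors, docs))); // [1, 0, 2]
    }
}
```

Note that comparing the integers only reproduces the string order if the
integers were assigned in the desired sort order (per locale, if needed).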

To do the sorting, this array could be pre-loaded into memory (warming up
the index), or, while collecting the hits (in HitCollector), the relevant
integer values could be loaded from disk given a doc id.
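
A minimal sketch of the pre-loaded variant, again in plain Java rather than
against the real HitCollector API: OrdinalSortingCollector, collect and
sortedHits are hypothetical names, and the ordinals array is assumed to be
already in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a hit collector; not the Lucene class.
final class OrdinalSortingCollector {
    private final int[] ordinals;                  // ordinals[docId] = integer for that doc
    private final List<Integer> hits = new ArrayList<>();

    OrdinalSortingCollector(int[] ordinals) {
        this.ordinals = ordinals;
    }

    // Called once per matching document.
    void collect(int docId) {
        hits.add(docId);
    }

    // Returns the collected doc ids ordered by their integer sort value.
    // Only int comparisons are done; no strings are touched.
    List<Integer> sortedHits() {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Integer doc) -> ordinals[doc]));
        return sorted;
    }
}
```

With the array {1, 0, 2} and hits collected in the order 2, 0, 1, sortedHits()
returns the doc ids 1, 0, 2. The load-from-disk variant would replace the
array lookup with a read at an offset derived from the doc id.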

If you have 10 million documents, then for one sort field you will have a
10 million x 4 bytes = 40 MB array.
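
The same arithmetic as a small sketch (SortArrayMemory and bytesNeeded are
hypothetical names; it assumes one 4-byte int per document per sort field and
ignores any per-array overhead):

```java
// Hypothetical helper for estimating the size of the sort arrays.
final class SortArrayMemory {

    // One int (4 bytes) per document, per sort field.
    static long bytesNeeded(long numDocs, int numSortFields) {
        return numDocs * Integer.BYTES * numSortFields;
    }

    public static void main(String[] args) {
        System.out.println(bytesNeeded(10_000_000L, 1));  // 40000000, i.e. 40 MB
        System.out.println(bytesNeeded(10_000_000L, 10)); // 400000000, i.e. 400 MB
    }
}
```

The second line corresponds to the case of 10 sort fields, one per locale.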

Cheers,

Jian


On 4/9/07, Paul Smith <[EMAIL PROTECTED]> wrote:

>
> In our application, we have to sync up the index pretty frequently,
> the warm-up of the index is killing it.
>

Yep, it speeds up the first sort, but at the cost of making all the
others slower (maybe significantly so).  That's obviously not ideal
but could make use of sorts in larger indexes practical.

> To address your concern about single sort locale, what about creating a
> sort field for each sort locale? So, if you have, say, 10 locales, you
> will have 10 sort fields, each utilizing the mechanism of constructing
> the norms.
>

I really don't understand norms properly so I'm not sure exactly how
that would help.  I'll have to go over your original email again to
understand.

My main goal is to get some discussion going amongst the community,
which hopefully we've kicked along.


Paul


