Let's back up a minute. The number of matched records is not
important when sorting, what's important is the number of unique
terms in the field being sorted. Do you have any figures on that? One
very common sorting issue is sorting on very fine date time resolutions,
although your examples don't include that...

Now, cache loading is an issue. The very first time you sort on a field,
all the values are read into a cache. Is it possible this is the source
of your problems? You can cure this with warmup queries. The take-away
is that measuring the response time for the first sorted query is
very misleading.

Although if you're sorting on many, many, many email addresses,
that could be "interesting".

The comment "returning 10,000,000 documents" is, I hope, a
misstatement. If you're trying to *return* 10M docs sorting
is irrelevant compared to assembling that many documents. If
you're trying to return the first 100 of 10M documents, it should
work.

Overall, we need more details on what you're sorting and what
you're trying to return as well as how you're measuring before
we can say much....

Along with how much memory you're giving your JVM to work with,
what "exploding" means. Are you CPU bound? IO bound? Swapping?
You need to characterize what is going wrong before worrying about
solutions......

Best
Erick

On Mon, Aug 16, 2010 at 3:08 PM, Michel Nadeau <aka...@gmail.com> wrote:

> Hi,
>
> we are building an application using Lucene and we have HUGE data sets (our
> index contains millions and millions and millions of documents), which
> obviously cause us very important problems when sorting. In fact, we
> disabled sorting completely because the servers were just exploding when
> trying to sort results in RAM. The users using the system can search for
> whatever they want, so we never know how many results will be returned - a
> search can return 10 documents (no problem with sorting) or 10,000,000
> (huge
> sorting problems).
>
> I woke up this morning and had a flash : is it possible with Lucene to have
> a "natural sorting" of documents? For example, let's say I have 3 columns I
> want to be able to sort by : first name, last name, email; I would have 3
> different indexes with the very same data but with a different primary key
> for sorting. I know it's far fetched, and I have never seen anything like
> that since I use Lucene, but we're just desperate... how people do to have
> huge data sets, a lot of users, and sort!?
>
> Thanks,
>
> Mike
>

Reply via email to