James åé:
Hi Che-

The presort method was our first approach but this doesn't work
in practice because we update the index incrementally and insertion order
doesn't match date ordering as we add updates.

I don't think sorting top hits only will deliver what the user is
expecting -- that is, results listed in most-recent-first order. Is there a better way to do it?
I don't think a single index can match both your requirment on speed and compacity.
How about split databased into multi level: daily / weekly / monthly if searcher can get enough top hits in top db.



BTW, we're not seeing a ridiculous performance degredation arising out of sorts on large result sets. But on the other hand, sort doesn't seem to be working very well, so far...
I check the HitCollector: seem Lucene will pre-fetch first 100 results by default. Lucene with pre-fetch to 200 results only if you walk to 101.

BTW: I feel lucene is getting too complex after adding to many functions on customized sorting and fuzzy query.


Regards,
James


Che Dong
http://www.chedong.com/


--- Che Dong <[EMAIL PROTECTED]> wrote:

Just like Google said: full text search service is not traditional database application. Lucene is not a database too: if you wanna sort on some fields, you'd better pre-sort it before it indexed: like date. then get results by doc id.

For lucene you can only sort results in top hits. if you sort 400k result hits by date: you lost the speed of Lucene.


Thanks

Che Dong
http://www.chedong.com/

Erik Hatcher ÃââÃÂâ:

On Apr 21, 2005, at 5:22 PM, James Levine wrote:


I have an index of around 3 million records, and typical queries
can result in result sets of between 1 and 400,000 results.

We have indexed "dateTime" fields in the form 20050415142, that is, to
10-minute precision.

When I try to sort queries I get something back that is roughly sorted
on index, but not quite. Stuff is out of order just a bit. The
size of the result set does not seem to be related occurance of
this problem.

We've tried lucene 1.4-final and1.4.3.

my code looks like this

s = new Sort( new SortField[] { new SortField( "dateTime", SortField.STRING,
true ), SortField.FIELD_SCORE } );


...

hits = searcher.search( qry, s );


Any help is appreciated, I'm so far baffled by this problem.


I don't have a solution, but rather some questions to check.... are all dateTime's the same width, zero padded on the right? Does every document have a dateTime field?

I recommend you sort with type INT instead of STRING if it fits, or maybe LONG. STRING will use the most resources for sorting.

   Erik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to