Re: Possible to remove duplicate documents in sort API?

2004-09-07 Thread Doug Cutting
Kevin A. Burton wrote:
My problem is that I have two machines... one for searching, one for 
indexing.

The searcher has an existing index.
The indexer found an UPDATED document and then adds it to a new index 
and pushes that new index over to the searcher.

The searcher then reloads and when someone performs a search BOTH 
documents could show up (including the stale document).

I can't do a delete() on the searcher because the indexer doesn't have 
the entire index as the searcher.
I can think of a couple ways to fix this.
If the indexer box kept copies of the indexes that it has already sent 
to the searcher, then it can mark updated documents as deleted in these 
old indexes.  Then you can, with the new index, also distribute new .del 
files for the old indexes.

Alternately, you could, on the searcher box, before you open the new 
index, open an IndexReader on all of the existing indexes and mark all 
new documents as deleted in the old indexes.  This shouldn't take more 
than a few seconds.

IndexReader.delete() just sets a bit in a bit vector that is written to 
file by IndexReader.close().  So it's quite fast.

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Possible to remove duplicate documents in sort API?

2004-09-05 Thread Kevin A. Burton
I want to sort a result set but perform a group by as well... IE remove 
duplicate items. 

Is this possible with the new API?  Seems like a huge drawback to lucene 
right now.

Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]