We're using Lucene with one large target index, which right now is 5 GB. Every night we take sub-indexes of about 500 MB each and merge them into this main index. This merge (done via IndexWriter.addIndexes(Directory[])) is taking way too much time.

Looking at the stats for the box, we're essentially blocked on reads: the disk is saturated with read IO while CPU sits at 5%. If I'm right, I think this could be minimized by repeatedly picking the two smallest indexes and merging them, then picking the next two smallest (counting the result of the previous merge) and merging those, and so on until we're down to one index.
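
To make the idea concrete, here is a rough sketch that only compares merge orders on paper; it does not call Lucene at all. It assumes a simplified cost model in which each merge reads roughly as many MB as the size of its output, and it assumes four nightly sub-indexes (the real count may differ). The class and method names are made up for illustration.

```java
import java.util.PriorityQueue;

// Hypothetical cost model, not Lucene code: a merge is assumed to
// process about as many MB as the size of the merged result.
final class MergePlan {

    // Total MB processed when the two smallest indexes are always merged first.
    static long smallestFirstCost(long[] sizesMb) {
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (long size : sizesMb) {
            heap.add(size);
        }
        long total = 0;
        while (heap.size() > 1) {
            long merged = heap.poll() + heap.poll(); // two smallest remaining
            total += merged;
            heap.add(merged); // the merged index competes in later rounds
        }
        return total;
    }

    // Total MB processed when each sub-index is merged into the large
    // target index one at a time, rewriting the target on every step.
    static long oneAtATimeCost(long targetMb, long[] subIndexesMb) {
        long total = 0;
        long current = targetMb;
        for (long size : subIndexesMb) {
            current += size;
            total += current;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] subs = {500, 500, 500, 500};       // assumed: four 500 MB sub-indexes
        long[] all = {5000, 500, 500, 500, 500};  // 5 GB target plus the sub-indexes
        System.out.println("smallest-first: " + smallestFirstCost(all) + " MB");
        System.out.println("one-at-a-time:  " + oneAtATimeCost(5000, subs) + " MB");
    }
}
```

Under these assumptions the large index is only touched in the final merge, which is where the saving would come from. Whether addIndexes behaves like either of these orders internally is exactly what I'm unsure about.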

Does this sound about right?

--

Please reply using PGP.

http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
