On Apr 2, 2007, at 2:50 PM, Steven Parkes wrote:

> On the one hand, creating separate per-article files is "clean" in that
> when you then ingest, you only have disk i/o that's going to affect the
> ingest performance (as opposed to, say, uncompressing/parsing). On the
> other hand, that's a lot of disk i/o (compresses by about 5X) and a lot
> of directory lookups.

One reason I was expanding the elements into individual files was so that I could compare different libraries against Lucene, including those in other languages. It was important to measure the engines themselves, not SGML parsers.
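
As a rough illustration of that kind of pre-processing step, here is a minimal Python sketch that splits an SGML-style corpus into one file per article, so that every engine under test can read plain files and no parser cost lands in the timings. The <DOC> element name, the file-naming scheme, and the split_articles function are assumptions made for the example; the thread does not describe the actual script.

```python
import re
from pathlib import Path

# Hypothetical element name: the thread does not say which tag delimits an article.
ARTICLE_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)


def split_articles(sgml_text, out_dir):
    """Write the body of each <DOC>...</DOC> element to its own file.

    Returns the list of paths written, in document order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, match in enumerate(ARTICLE_RE.finditer(sgml_text)):
        # Zero-padded names keep the files in corpus order when sorted.
        path = out / f"article{i:05d}.txt"
        path.write_text(match.group(1).strip(), encoding="utf-8")
        paths.append(path)
    return paths
```

Running this once up front moves the parsing cost out of the benchmark, at the price of the extra disk i/o and directory lookups noted in the quoted message.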

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


