On Apr 2, 2007, at 2:50 PM, Steven Parkes wrote:
> On the one hand, creating separate per-article files is "clean" in that
> when you then ingest, you only have disk i/o that's going to affect the
> ingest performance (as opposed to, say, uncompressing/parsing). On the
> other hand, that's a lot of disk i/o (compresses by about 5X) and a lot
> of directory lookups.

One reason I was expanding the elements into individual files was so
that I could compare different libraries against Lucene, including
those in other languages. It was important to measure the engines
themselves, not SGML parsers.
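
For illustration, here is a minimal sketch of that kind of pre-expansion step, not the actual benchmark code. It assumes a hypothetical corpus file in which each article body is wrapped in <BODY>...</BODY>; the function name, the regex, and the output file naming are all made up for the example.

```python
import re
from pathlib import Path

# Assumption: every article body is delimited by <BODY>...</BODY>.
# A real SGML corpus may need a proper parser rather than a regex.
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL | re.IGNORECASE)


def expand_articles(sgml_text, out_dir):
    """Write each article body to its own plain-text file.

    Returns the list of paths written, in document order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, match in enumerate(BODY_RE.finditer(sgml_text)):
        # Hypothetical naming scheme: article00000.txt, article00001.txt, ...
        path = out / f"article{i:05d}.txt"
        path.write_text(match.group(1).strip(), encoding="utf-8")
        paths.append(path)
    return paths
```

Once the files exist, each engine under test only has to read plain text from disk, so no markup parsing ends up inside the timed section.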
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/