Re: [jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

Marvin Humphrey Mon, 02 Apr 2007 14:59:34 -0700


On Apr 2, 2007, at 2:50 PM, Steven Parkes wrote:

On the one hand, creating separate per-article files is "clean" in that
when you then ingest, you only have disk i/o that's going to affect the
ingest performance (as opposed to, say, uncompressing/parsing). On the
other hand, that's a lot of disk i/o (compresses by about 5X) and a lot
of directory lookups.

One reason I was expanding the elements into individual files was so that I could compare different libraries against Lucene, including those in other languages. It was important to measure the engines themselves, not SGML parsers.
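
For illustration only, here is a minimal sketch of that kind of expansion step in Python. It is not the script used for these benchmarks. The function name expand_dump, the <page>/<title>/<text> element names, the per_dir bucket size, and the output file naming are all assumptions made for the example.

```python
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def expand_dump(dump_path, out_dir, per_dir=1000):
    """Write each <page> of an XML dump to its own plain-text file.

    Hypothetical helper: assumes each <page> holds a <title> and a <text>
    element somewhere beneath it. Returns the number of files written.
    """
    count = 0
    # iterparse streams the dump, so the whole file is never held in memory.
    for _event, elem in ET.iterparse(dump_path, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        # Bucket files into numbered subdirectories so that no single
        # directory grows without bound.
        subdir = os.path.join(out_dir, "%04d" % (count // per_dir))
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "%07d.txt" % count)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(title + "\n\n" + text)
        count += 1
        elem.clear()  # release the parsed page before moving on
    return count
```

With the parsing done once up front, each engine under test only has to read plain files, and the numbered subdirectories are one way to soften the directory-lookup cost mentioned above.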


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



