Cass,
Thanks for converting it. I've posted it to my blog:
http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
Sorry for the XML tags: I guess I followed the instructions on the
Lucene performance benchmarks page to literally ("Post these figures
to the lucene-user mailing list using this template.").
Sorry if it hurt your eyes! :-)
-Glen
On 15/04/2008, Cass Costello <[EMAIL PROTECTED]> wrote:
> I just did that so I could read it. :) I'll leave it up until Glen resends
> or posts it somewhere...
> http://www.casscostello.com/?page_id=28
>
>
>
>
> On Tue, Apr 15, 2008 at 5:18 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
>
> > Hi Glen.
> > can you resend this in plain text?
> > or put the HTML up on a server somewhere and point to it with a brief
> > summary in the post?
> > I'd love to look and read it, all those tags are making me go blind.
> >
> >
> > Glen Newton wrote:
> >
> > > <benchmark>
> > > <ul>
> > > <p>
> > > <b>Hardware Environment</b><br/>
> > > <li><i>Dedicated machine for indexing</i>: yes</li>
> > > <li><i>CPU</i>: Dual processor dual core Xeon CPU 3.00GHz;
> > > hyperthreading ON for 8 virtual cores</li>
> > > <li><i>RAM</i>: 8GB</li>
> > > <li><i>Drive configuration</i>: Dell EMC AX150 storage array fibre
> > > channel</li>
> > > </p>
> > > <p>
> > > <b>Software environment</b><br/>
> > > <li><i>Lucene Version</i>: 2.3.1</li>
> > > <li><i>Java Version</i>: Java(TM) SE Runtime Environment (build
> > > 1.6.0_02-b05)</li>
> > > <li><i>Java VM</i>: Java HotSpot(TM) 64-Bit Server VM (build
> > > 1.6.0_02-b05, mixed mode)</li>
> > > <li><i>OS Version</i>: Linux OpenSUSE 10.2 (64-bit X86-64)</li>
> > > <li><i>Location of index</i>: Filesystem, on attached storage</li>
> > > </p>
> > > <p>
> > > <b>Lucene indexing variables</b><br/>
> > > <li><i>Number of source documents</i>: 6,404,464</li>
> > > <li><i>Total filesize of source documents</i>: 141GB; Note that this
> > > is only the full-text: the metadata (title, author(s), abstract,
> > > keywords, journal name) are in addition to this</li>
> > > <li><i>Average filesize of source documents</i>:
> > > 22KB + metadata (see above)</li>
> > > <li><i>Source documents storage location</i>: Where are the documents
> > > being indexed located?
> > > Filesystem</li>
> > > <li><i>File type of source documents</i>: text (PDFs converted to
> > > text then gzipped)</li>
> > > <li><i>Parser(s) used, if any</i>: None, but files GZIPed & had to
> > > be un-gziped by Java application which also did indexing</li>
> > > <li><i>Analyzer(s) used</i>: StandardAnalyzer</li>
> > > <li><i>Number of fields per document</i>: 24</li>
> > > <li><i>Type of fields</i>: all text; 20 stored; 3 of indexed
> > > tokenized with term vector (full-text [not stored], title, abstract);
> > > 10 stored with no parsing; </li>
> > > <li><i>Index persistence</i>: FSDirectory</li>
> > > <li><i>Index size</i>: 83GB</li>
> > > <li><i>Number of terms</i>: 143,298,010</li>
> > > </p>
> > > <p>
> > > <b>Figures</b><br/>
> > > <li><i>Time taken (in ms/s as an average of at least 3 indexing
> > > runs)</i>: 20.5 hours</li>
> > > <li><i>Time taken / 1000 docs indexed</i>: 11.5 seconds </li>
> > > <li><i>Memory consumption</i>: -Xms4000m -Xmx6000m</li>
> > > <li><i>Query speed</i>: average time a query takes, type
> > > of queries (e.g. simple one-term query, phrase query),
> > > not measuring any overhead outside Lucene</li>
> > > </p>
> > > <p>
> > > <b>Notes</b><br/>
> > > <li><i>Notes</i>:
> > > <ul>
> > > <li>
> > > These are journal articles, so the additional fields besides
> > > the
> > > full-text are bibliographic metadata, such as title, authors,
> > > abstract, keywords, journal name, volume, issue, start page, year.
> > > </li>
> > > <li>Java command line directives: -XX:+AggressiveOpts
> > > -XX:+ScavengeBeforeFullGC -XX:-UseParallelGC -server -Xms4000m
> > > -Xmx6000m
> > > </li>
> > > <li>Highly multithreaded & pipelined architecture using
> > > java.util.concurrent.ThreadPoolExecutor
> > > </li>
> > > <li>File system file reading and Un-gzip performed multithreaded
> > > </li>
> > > <li>Eight separate parallel IndexWriters are fed by the pipeline
> > > (creation of Document objects occurs in parallel with 64 threads),
> > > merged at end into single index. Each parallel index had slightly
> > > different RAM_BUFFER_SIZE_MB (64, 67, 70, 73, 76, 79, 83, 85 MB
> > > respectively), so that flushing wouldn't all happen at the same time.
> > > </li>
> > > <li>
> > > Contact: glen DOT newton AT nrc-cnrc DOT gc DOT ca
> > > </li>
> > >
> > > </ul>
> > > </li>
> > > </p>
> > > </ul>
> > > </benchmark>
> > >
> > >
> > >
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
>
>
> --
> Lego timeline:
> http://cache.gizmodo.com/assets/resources/2008/01/lego-brick4-timeline.jpg
>
--
-
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]