Hello,

Here is a benchmark. I am not sure whether this is proper etiquette, but I will just paste it into this mail and hope that it gets funneled into the right channels.
Cheers!
Jochen

<benchmark>
<ul>
<p>
<b>Hardware Environment</b><br/>
<li><i>Dedicated machine for indexing</i>: No, some other work was performed on it. This shouldn't influence the results much since it's a multiprocessor machine.</li>
<li><i>CPU</i>: 2x Intel Xeon 3.05GHz</li>
<li><i>RAM</i>: 4GB</li>
<li><i>Drive configuration</i>: SCSI</li>
</p>
<p>
<b>Software environment</b><br/>
<li><i>Java Version</i>: 1.4.2-b28</li>
<li><i>Java VM</i>: Java HotSpot Client VM 1.4.2</li>
<li><i>OS Version</i>: Redhat 8</li>
<li><i>Location of index</i>: local</li>
</p>
<p>
<b>Lucene indexing variables</b><br/>
<li><i>Number of source documents</i>: 5,000,000</li>
<li><i>Total filesize of source documents</i>: 40GB</li>
<li><i>Average filesize of source documents</i>: 8kB</li>
<li><i>Source documents storage location</i>: DB on remote server</li>
<li><i>File type of source documents</i>: pre-parsed HTML</li>
<li><i>Parser(s) used, if any</i>: n/a</li>
<li><i>Analyzer(s) used</i>: StandardAnalyzer</li>
<li><i>Number of fields per document</i>: 5</li>
<li><i>Type of fields</i>: actual text is indexed but not stored in the Lucene index</li>
<li><i>Index persistence</i>: Where the index is stored, e.g. FSDirectory, SqlDirectory, etc</li>
</p>
<p>
<b>Figures</b><br/>
<li><i>Time taken (in ms/s as an average of at least 3 indexing runs)</i>: 332 minutes</li>
<li><i>Time taken / 1000 docs indexed</i>: 4 sec</li>
<li><i>Memory consumption</i>: about 100MB</li>
</p>
<p>
<b>Notes</b><br/>
<li><i>Notes</i>: With the above configuration we pretty consistently achieve an indexing rate of 250 docs/sec. The actual text cannot be retrieved from the index; this keeps the index size down (6.1GB) and increases indexing speed. When the actual documents are stored in the index, the rate drops by about 30% to 160 docs/sec.</li>
</p>
</ul>
</benchmark>
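
For anyone who wants to cross-check the figures above, here is a minimal sketch of the arithmetic behind them. The class and method names (`BenchmarkRates`, `docsPerSecond`, and so on) are made up for illustration and are not part of Lucene; only the input numbers come from the benchmark.

```java
// Hypothetical helper, not part of Lucene: re-derives the reported rates
// from the raw figures quoted in the benchmark.
final class BenchmarkRates {

    // Documents indexed per second, given a document count and a run time in minutes.
    static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    // Seconds needed to index 1000 documents at the measured average rate.
    static double secondsPer1000Docs(long docs, double minutes) {
        return (minutes * 60.0) / (docs / 1000.0);
    }

    // Relative slowdown in percent when the rate falls from 'before' to 'after'.
    static double percentDrop(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        long docs = 5000000L;   // number of source documents (from the benchmark)
        double minutes = 332.0; // total indexing time (from the benchmark)

        System.out.printf("docs/sec:        %.1f%n", docsPerSecond(docs, minutes));
        System.out.printf("sec / 1000 docs: %.2f%n", secondsPer1000Docs(docs, minutes));
        System.out.printf("drop 250 -> 160: %.0f%%%n", percentDrop(250.0, 160.0));
    }
}
```

The first two values come out at roughly 251 docs/sec and just under 4 seconds per 1000 documents, which agrees with the reported figures. The drop from 250 to 160 docs/sec works out to about 36%, so the "about 30%" in the notes should be read as a rough estimate.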