Add "tokenize documents only" task to contrib/benchmark -------------------------------------------------------
                 Key: LUCENE-967
                 URL: https://issues.apache.org/jira/browse/LUCENE-967
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/benchmark
    Affects Versions: 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.3
         Attachments: LUCENE-967.patch

I've been looking at performance improvements to tokenization by re-using Tokens. To help benchmark my changes, I've added a new task called ReadTokens that just steps through all fields in a document, gets a TokenStream, and reads all the tokens out of it. E.g., this alg just reads all Tokens for all docs in the Reuters collection:

  doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
  doc.maker.forever=false
  {ReadTokens > : *

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
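
To illustrate the shape of the ReadTokens loop described above, here is a minimal, hypothetical sketch in plain Java. The class and method names are invented for illustration, and a toy whitespace splitter stands in for Lucene's Analyzer/TokenStream; the actual patched task iterates a real TokenStream per field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not the LUCENE-967 patch itself): step through all
// fields of a "document" and read every token out, counting as we go.
public class ReadTokensSketch {

    // In real Lucene this loop would call analyzer.tokenStream(field, reader)
    // and pull Tokens until the stream is exhausted; here we just split on
    // whitespace to show the per-field, per-token structure of the task.
    static int readTokens(Map<String, String> doc) {
        int tokenCount = 0;
        for (String fieldText : doc.values()) {
            for (String token : fieldText.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokenCount++; // one "Token read" per term
                }
            }
        }
        return tokenCount;
    }

    public static void main(String[] args) {
        // A stand-in for one Reuters doc with two fields.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "sample reuters title");
        doc.put("body", "some body text for the benchmark");
        System.out.println("tokens read: " + readTokens(doc));
    }
}
```

Counting tokens this way (rather than indexing them) isolates analysis cost, which is what makes the task useful for measuring Token re-use.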