730 msecs is the correct number for 10 * 16k docs with StandardTokenizer! The 11ms per doc figure in my post was for highlighlighting using a \ lower-case-filter-only analyzer. 5ms of this figure was the cost of the \ lower-case-filter-only analyzer.
73 msecs is the cost of JUST StandardTokenizer (no highlighting) StandardAnalyzer uses StandardTokenizer so is probably used in a lot of apps. It \ tries to keep certain text eg email addresses as one term. I can live without it and \ I suspect most apps can too. I haven't looked into why its slow but I notice it does \ make use of Vectors. I think a lot of people's highlighter performance issues may \ extend from this. Cheers Mark (apologies for the cross-post to lucene-dev - wrong group!) --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]