This is real, and not just for very short docs. I think the reflection overhead is pretty expensive. Here are some stats from the Hamshahri corpus (I have been TREC-testing Persian just to ensure everything is OK):

SimpleAnalyzer (has reusableTokenStream):
  Total time: 47816 ms
  Unique tokens: 441660

PersianAnalyzer (no reuse):
  Total time: 53928 ms
  Unique tokens: 438286

PersianAnalyzer (with reusableTokenStream):
  Total time: 47704 ms
  Unique tokens: 438286

On Mon, Aug 10, 2009 at 10:35 AM, Mark Miller <markrmil...@gmail.com> wrote:
> Discussion on speed of new TokenStream API in Solr.
>
> see:
> http://search.lucidimagination.com/search/document/d0040ebe6addad4b/indexing_slowdown_with_latest_lucene_udpate
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org

--
Robert Muir
rcm...@gmail.com
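
For anyone following along, the difference between the "no reuse" and "with reusableTokenStream" rows is whether the stream setup cost is paid for every document or once per thread. Below is a minimal sketch of that pattern in plain Java. It does not use Lucene: SketchStream, SketchAnalyzer and drain are hypothetical stand-ins, and the reflective call in the constructor is only an assumed placeholder for whatever setup work the real stream does.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {

    /** Hypothetical stand-in for a token stream that is costly to construct. */
    static final class SketchStream {
        // Counts constructor calls, for illustration only (not thread-safe).
        static int constructed = 0;

        private String[] tokens = new String[0];
        private int pos = 0;

        SketchStream() {
            constructed++;
            // Assumed placeholder for per-instance setup cost such as reflective lookups.
            getClass().getDeclaredMethods();
        }

        /** Points this instance at new text so it can be used again. */
        void reset(String text) {
            String trimmed = text.trim();
            tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            pos = 0;
        }

        /** Returns the next whitespace-separated token, or null when exhausted. */
        String next() {
            return pos < tokens.length ? tokens[pos++] : null;
        }
    }

    /** Hypothetical analyzer offering both a fresh stream and a reused one. */
    static final class SketchAnalyzer {
        // One cached stream per thread, so reuse needs no locking.
        private final ThreadLocal<SketchStream> previous = new ThreadLocal<>();

        /** No reuse: builds a new stream for every call. */
        SketchStream tokenStream(String text) {
            SketchStream stream = new SketchStream();
            stream.reset(text);
            return stream;
        }

        /** Reuse: builds a stream on first use per thread, then only resets it. */
        SketchStream reusableTokenStream(String text) {
            SketchStream stream = previous.get();
            if (stream == null) {
                stream = new SketchStream();
                previous.set(stream);
            }
            stream.reset(text);
            return stream;
        }
    }

    /** Consumes a stream and collects its tokens. */
    static List<String> drain(SketchStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        SketchAnalyzer analyzer = new SketchAnalyzer();
        String[] docs = {"one two", "three", "four five six"};

        int before = SketchStream.constructed;
        for (String doc : docs) {
            drain(analyzer.tokenStream(doc));
        }
        System.out.println("no reuse: " + (SketchStream.constructed - before) + " streams built");

        before = SketchStream.constructed;
        for (String doc : docs) {
            drain(analyzer.reusableTokenStream(doc));
        }
        System.out.println("with reuse: " + (SketchStream.constructed - before) + " streams built");
    }
}
```

In this sketch the construction count grows with the number of documents in the first loop and stays at one in the second, which is the effect the timings above would be consistent with if setup cost dominates.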