After profiling in-memory indexing, I noticed that calls to String.intern() showed up surprisingly high, especially the one from the Field() constructor. This is understandable given the overhead String.intern() has (it is a native, synchronized method, and the overhead is incurred even if the String is already interned) and the fact that it essentially gets called once per document+field combination.
Now, it would be quite easy (in theory) to improve things a bit, so that most intern() calls could be avoided, transparently to the calling app; for example, each IndexWriter could use a simple HashMap to cache interned Strings. This approach is more than twice as fast as calling intern() directly. One could also use a per-thread cache or a global one; all of these would probably be faster than plain intern().

However, the Field constructor hard-codes the call to intern(), so it would be necessary to add a new constructor that indicates the field name is already known to be interned. There would also need to be a way to invoke the new optional functionality.

Has anyone tried this approach to see if the speedup is worth the hassle (in my case it'd probably be something like 2 - 3%, assuming the profiler's 5% for intern() is accurate)?

-+ Tatu +-
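
P.S. To make the idea concrete, here is a minimal sketch of the HashMap-based cache described above. InternCache and its methods are hypothetical names, not part of Lucene, and the class is not thread-safe, so it assumes one instance per IndexWriter (or per thread).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Lucene class): remembers Strings that have
 * already been interned, so repeated lookups skip String.intern().
 * Not thread-safe; assumes one instance per writer or per thread.
 */
final class InternCache {
    // Maps each interned String to itself, so any equal String
    // can be swapped for the canonical interned instance.
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the interned instance equal to s, calling String.intern() only on a cache miss. */
    String intern(String s) {
        String interned = cache.get(s);
        if (interned == null) {
            interned = s.intern();          // the expensive call happens once per distinct name
            cache.put(interned, interned);
        }
        return interned;
    }

    /** Number of distinct Strings cached so far. */
    int size() {
        return cache.size();
    }
}
```

The calling code would pass field names through cache.intern(name) before building a Field. Actually skipping the intern() call inside Field would still need the extra constructor mentioned above, so this only shows the caching half of the idea.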
