[ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606052#action_12606052 ]
Rene Schwietzke commented on LUCENE-1308:
-----------------------------------------

I wrote a small test case that runs a single-threaded search as well as a
multithreaded search using the same IndexSearcher. Replacing String.intern()
pays off especially in the threaded context, but even the single thread is
faster. I measured the following numbers:

String.intern(), Single Searcher
[main] Search took: 3453ms
[Thread-2] Search took: 17812ms
[Thread-3] Search took: 18313ms
[Thread-1] Search took: 18234ms
[Thread-0] Search took: 18562ms

WeakHashMap, Single Searcher
[main] Search took: 3156ms
[Thread-3] Search took: 14953ms
[Thread-1] Search took: 15593ms
[Thread-0] Search took: 15656ms
[Thread-2] Search took: 16188ms

ConcurrentHashMap, Single Searcher
[main] Search took: 2844ms
[Thread-1] Search took: 14812ms
[Thread-0] Search took: 14890ms
[Thread-2] Search took: 15172ms
[Thread-3] Search took: 14656ms

> Remove String.intern() from Field.java to increase performance and lower contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene Schwietzke
>         Attachments: yad.zip
>
>
> Right now, *document.Field is interning all field names. While this makes
> sense because it lowers the overall memory consumption, the method intern()
> of String is known to be difficult to handle.
> 1) It is a native call and therefore slower than anything on the Java level.
> 2) The String pool is part of the perm space and not of the general heap, so
> its size is more restricted and needs extra VM params to be managed.
> 3) Some VMs show GC problems with strings in the string pool.
>
> The suggested solution is a WeakHashMap instead, which takes care of unifying
> the String instances while keeping the pool in the heap space and releasing a
> String when it is no longer needed. For extra performance in a concurrent
> environment, a ConcurrentHashMap-like implementation of a weak hashmap is
> recommended, because we mostly read from the pool.
>
> We saw a 10% improvement in throughput and response time of our application,
> and the application is not only doing searches (we read a lot of documents
> from the result). So a test case that measures only searching could show even
> more improvement in single and concurrent usage.
>
> The cache:
>
> /** Cache to replace the expensive String.intern() call with the java version */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>     Collections.synchronizedMap(new WeakHashMap<String, WeakReference<String>>(109));
>
> The access to it, instead of this.name = name.intern();
>
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference<String>(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
>
> I guess it is sufficient to have almost all field names interned, so I
> skipped the additional synchronization around the access and take the risk
> that only 99.99% :) of all field names are interned.
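
For illustration, the ConcurrentHashMap variant from the measurements might
look roughly like the sketch below. This is not the code from the attached
yad.zip; the class name FieldNamePool and the method unify() are made up for
this example. It also assumes that field names are few enough to be held
strongly: a plain ConcurrentHashMap never releases its entries, so this
version trades the "weak" behaviour of the WeakHashMap approach for cheaper
reads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical helper (not part of Lucene): unifies equal strings on the
 * Java heap instead of calling String.intern().
 */
final class FieldNamePool {

    // Assumption: entries are held strongly and are never evicted.
    private static final ConcurrentMap<String, String> POOL =
        new ConcurrentHashMap<String, String>(109);

    private FieldNamePool() {
    }

    /** Returns the canonical instance for the given name. */
    static String unify(String name) {
        // Common path: the name is already pooled, so a plain read suffices.
        String existing = POOL.get(name);
        if (existing != null) {
            return existing;
        }
        // First sighting: putIfAbsent is atomic, so if two threads race,
        // both end up with the instance that was stored first.
        existing = POOL.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }
}
```

In Field this would be used as this.name = FieldNamePool.unify(name);
Unlike the unsynchronized get/put sequence quoted above, putIfAbsent
guarantees that all threads agree on one instance per name, at the cost of
keeping every name alive for the lifetime of the class.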