[ 
https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606052#action_12606052
 ] 

Rene Schwietzke commented on LUCENE-1308:
-----------------------------------------

A wrote a small test case that runs a single thread search, as well as a 
multithreaded search using the same indexsearcher. Especially when running in a 
threaded context, the replacement of String.intern() pays off. Even the single 
thread is faster.

I measured the following numbers:

String.Intern, Single Searcher
[main] Search  took: 3453ms
[Thread-2] Search  took: 17812ms
[Thread-3] Search  took: 18313ms
[Thread-1] Search  took: 18234ms
[Thread-0] Search  took: 18562ms

WeakHashMap, Single Searcher
[main] Search  took: 3156ms
[Thread-3] Search  took: 14953ms
[Thread-1] Search  took: 15593ms
[Thread-0] Search  took: 15656ms
[Thread-2] Search  took: 16188ms

ConcurrentHashMap, Single Searcher
[main] Search  took: 2844ms
[Thread-1] Search  took: 14812ms
[Thread-0] Search  took: 14890ms
[Thread-2] Search  took: 15172ms
[Thread-3] Search  took: 14656ms

> Remove String.intern() from Field.java to increase performance and lower 
> contention
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1308
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3.2
>            Reporter: Rene Schwietzke
>         Attachments: yad.zip
>
>
> Right now, *document.Field is interning all field names. While this makes 
> sense because it lowers the overall memory consumption, the method intern() 
> of String is know to be difficult to handle. 
> 1) it is a native call and therefore slower than anything on the Java level
> 2) the String pool is part of the perm space and not of the general heap, so 
> it's size is more restricted and needs extra VM params to be managed
> 3) Some VMs show GC problems with strings in the string pool
> Suggested solution is a WeakHashMap instead, that takes care of unifying the 
> String instances and at the same time keeping the pool in the heap space and 
> releasing the String when it is not longer needed. For extra performance in a 
> concurrent environment, a ConcurrentHashMap-like implementation of a weak 
> hashmap is recommended, because we mostly read from the pool.
> We saw a 10% improvement in throughout and response time of our application 
> and the application is not only doing searches (we read a lot of documents 
> from the result). So a single measurement test case could show even more 
> improvement in single and concurrent usage.
> The Cache:
> /** Cache to replace the expensive String.intern() call with the java version 
> */
> private final static Map<String, WeakReference<String>> unifiedStringsCache =
>    Collections.synchronizedMap(new WeakHashMap<String, 
> WeakReference<String>>(109));
> The access to it, instead of this.name = name.intern;
> // unify the strings, but do not use the expensive String.intern() version
> // which is not "weak enough", uses the perm space and is a native call
> String unifiedName = null;
> WeakReference<String> ref = unifiedStringsCache.get(name);
> if (ref != null)
> {
>     unifiedName = ref.get();
> }
> if (unifiedName == null)
> {
>     unifiedStringsCache.put(name, new WeakReference(name));
>     unifiedName = name;
> }
> this.name = unifiedName;
> I guess it is sufficient to have mostly all fields names interned, so I 
> skipped the additional synchronization around the access and take the risk 
> that only 99.99% :) of all field names are interned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to