[ 
https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Cowan updated LUCENE-806:
------------------------------

    Attachment: lucene-806.patch

Just to clarify, the specific issue here is that RuleBasedCollator (the only 
concrete implementation of Collator in the JDK, and the one that is always 
returned by Collator.getInstance(Locale)) has a synchronized compare(), meaning 
that if you have many threads building FieldSortedHitQueues with large results 
and locale-sensitive sorting, and they share a Collator, they end up waiting 
for each other in that method (which can obviously be called tens of thousands 
of times during a very large search). The way to get a Collator is by calling 
Collator.getInstance(Locale), which makes it look like this problem is the 
JDK's fault; however, Collator.getInstance(Locale) actually returns a clone() 
of the object from the cache. The caching mechanism seems to be to prevent 
having to rebuld the rule tables, rather than the objects themselves. 
Therefore, the JDK version balances performance and thread safety quite well. 

On the Lucene end, though, the FieldSortedHitQueue implements its own caching 
mechanism, meaning that the generated ScoreDocComparators are cached (with no 
way to disable this behaviour, even if you wanted to). Therefore, one you've 
got your comparator (the unique key being {reader, fieldname, type, locale, 
factory}), every search using sorting on the same field in the same way on the 
same reader will use the same Collator, possibly causing a synchronization 
bottleneck. Even providong your own factory to the SortField doesn't REALLY 
help, as they're cached one level 'above' that (you can work around this; see 
below)

Attached is a patch which provides a 'quick and dirty' way of dealing with 
this. NOTE: THIS PATCH IS NOT PRODUCTION QUALITY, it's just a proof of concept. 
If people like the idea, I'll tidy it up substantially.

This works by adding a flag, usePerThreadLocaleComparators, set by a static 
method, to FieldSortedHitQueue. If this flag is NOT set, behaviour remains the 
same. If it's set to true, however, createValue calls a per-thread version of 
comparatorStringLocale, which returns a simple wrapper ScoreDocComparator which 
delegates to the one created by comparatorStringLocale, using a ThreadLocal to 
make sure they're not shared between threads.

For demonstration purposes, I have added a quick demo main() method to 
FieldSortedHitQueue, which does a simple timing test -- 20 threads each 
inserting 20000 dummy documents into a 200-element FieldSortedHitQueue. Note 
that it uses CountDownLatches to coordinate the threads, so this dummy test 
will only run under Java 5. Sorry, but as a quick demo it will do for now. By 
changing the values of
    final int threadCount = 20;
    final int docCount = 10000;
    final int queueSize = 200;
you can change the parameters I mentioned above. However, the figures seem to 
roughly the same proportion regardless of how high or low those numbers are, 
within reason; the parameters provided are  enough to spend a LOT of time 
waiting for locks; making them higher doesn't really make that much difference. 
If anything, making the queuesize larger makes the new version of the code 
(with the flag set) look better in comparison. On my dev machine (1.8 GHz 
Celeron laptop) the test as-is gives the following figures:
   Shared=5219ms
   PerThread=2140ms
this is a pretty substantial difference, and (I think) makes it worth pursuing 
this further. Your mileage may vary, but between 2x-4x faster seems typical for 
anywhere above, say, 5 threads, 1000 docs, and queue size of 50.

So if people are happy for me to proceed down this path, I'm equally happy to 
tidy up and produce a cleaner, documented etc. version of this patch. However, 
the more I look at this, the more I'd like to refactor this code -- it's not 
the nicest code in Lucene, and I think it could be tidied up (personally). My 
proposal would be something along the lines of changing all those static 
methods in FieldSortedHitQueue (comparatorXXXX) to be implementations of 
SortComparatorSource. There'd be a StringComparatorSource, 
AutoComparatorSource, etc. Everything in FSHQ would be made to deal with the 
SortComparatorSources only, abstracting out all the hard work. The logic of the 
create() method  could be replaced by a PerFieldComparatorSource, which 
produces one or more of the others depending on the field type, much as it does 
now. Everything else could be implemented (possibly) using the Decorator 
pattern to implement new SortComparatorSources. Namely, a 
CachingComparatorSource would use some sort of caching mechanism (possibly the 
FieldCacheImpl.Cache, as it does now, though that seems like an odd coupling) 
to cache the SortComparatorSources produced by the PerFieldComparatorSource; 
then we're basically back where we are now. Along comes the 
PerThreadComparatorSource, which uses ThreadLocals to do basically what my 
patch above does. All these classes would be available externally, for people 
to wrap around their own SortComparatorSources when setting up their 
SortFields; if no factory is provided in the SortField, things work much as 
they do now.

What do people think? Is the quick and dirty way (a) worthwhile, and (b) good 
enough? Should I look at implementing the bigger, fancier patch which will be 
more work and more complicated but ultimately (I think) make 
FieldSortedHitQueues much cleaner? Or is there another alternative (for 
example, another low-impact option would be doing something like what  I've 
done now, but instead using a ThreadLocal in a Comparator implementation, which 
could mean that the API for FieldSortedHitQueue doesn't need to change at all). 
Or is this not worth making part of the source tree, given that there are ways 
around it (supplying your own SortComparatorSource which manages its own 
ThreadLocals).  The performance gain IS substantial, though...

Apologies for length, this is quite a confusing area and I wanted to make sure 
I hadn't forgotten anything.

> Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-806
>                 URL: https://issues.apache.org/jira/browse/LUCENE-806
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.0
>            Reporter: Paul Cowan
>            Priority: Minor
>         Attachments: lucene-806.patch
>
>
> The below is from a post by (my colleague) Paul Smith to the java-users list:
> ---
> Hi ho peoples.
> We have an application that is internationalized, and stores data from many 
> languages (each project has it's own index, mostly aligned with a single 
> language, maybe 2).
> Anyway, I've noticed during some thread dumps diagnosing some performance 
> issues, that there appears to be a _potential_ synchronization bottleneck 
> using Locale-based sorting of Strings.  I don't think this problem is the 
> root cause of our performance problem, but I thought I'd mention it here.  
> Here's the stack dump of a thread waiting:
> "http-1001-Processor245" daemon prio=1 tid=0x31434da0 nid=0x3744 waiting for 
> monitor entry [0x2cd44000..0x2cd45f30]
>         at java.text.RuleBasedCollator.compare(RuleBasedCollator.java)
>         - waiting to lock <0x6b1e8c68> (a java.text.RuleBasedCollator)
>         at 
> org.apache.lucene.search.FieldSortedHitQueue$4.compare(FieldSortedHitQueue.java:320)
>         at 
> org.apache.lucene.search.FieldSortedHitQueue.lessThan(FieldSortedHitQueue.java:114)
>         at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:120)
>         at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:47)
>         at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java:58)
>         at 
> org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:90)
>         at 
> org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:97)
>         at 
> org.apache.lucene.search.TopFieldDocCollector.collect(TopFieldDocCollector.java:47)
>         at 
> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:291)
>         at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>         at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
>         at 
> com.aconex.index.search.FastLocaleSortIndexSearcher.search(FastLocaleSortIndexSearcher.java:90)
> .....
> In our case we had 12 threads waiting like this, while one thread had the 
> lock on the RuleBasedCollator.  Turns out RuleBasedCollator's.compare(...) 
> method is synchronized.  I wonder if a ThreadLocal based collator would be 
> better here... ?  There doesn't appear to be a reason for other threads 
> searching the same index to wait on this sort.  Be just as easy to use their 
> own.  (Is RuleBasedCollator a "heavy" object memory wise?  Wouldn't have 
> thought so, per thread)
> Thoughts?
> ---
> I've investigated this somewhat, and agree that this is a potential problem 
> with a series of possible workarounds. Further discussion (including 
> proof-of-concept patch) to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to