[jira] [Comment Edited] (LUCENE-4930) Lucene's use of WeakHashMap at index time prevents full use of cores on some multi-core machines, due to contention

Uwe Schindler (JIRA) Fri, 12 Apr 2013 07:56:17 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630125#comment-13630125 ]


Uwe Schindler edited comment on LUCENE-4930 at 4/12/13 2:55 PM:
----------------------------------------------------------------

bq. The reason is that Lucene internally creates TokenStream objects when you 
pass a token array to IndexableField, and doesn’t reuse them,

That's the source of the issue. The problem is not contention in this specific 
part (addAttribute); the source is the heavy cost of creating new TokenStreams. 
This was a change in Lucene 4.0 (I am not happy with it, and Robert and I 
already discussed it a while ago). In Lucene 3.x, for single-token 
fields, IndexWriter/DocInverter had a private single-token AttributeSource 
that was reused for new fields. Unfortunately, with StringField this is gone, 
and it indeed creates a new single-token TokenStream over and over. This should 
be fixed to behave like Lucene 3.x (reuse a SingleToken TokenStream in StringField).
                
      was (Author: thetaphi):
    bq. The reason is that Lucene internally creates TokenStream objects when 
you pass a token array to IndexableField, and doesn’t reuse them,

That's the source of the issue. The problem is not contention in this specific 
part (addAttribute); the source is the heavy cost of creating new TokenStreams. 
This was a change in Lucene 4.0 (I am not happy with it, and Robert and I 
already discussed it a while ago). In Lucene 4.0, for single-token 
fields, IndexWriter/DocInverter had a private single-token AttributeSource 
that was reused for new fields. Unfortunately, with StringField this is gone, 
and it indeed creates a new single-token TokenStream over and over. This should 
be fixed to behave like Lucene 3.x (reuse a SingleToken TokenStream in StringField).
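
To illustrate the reuse pattern described in the comment, here is a minimal sketch in plain Java. It does not use Lucene's classes: SingleTokenStream and PerThreadInverter are hypothetical stand-ins, and the instance counter exists only for the demo. The idea is that each indexing thread keeps one single-token stream and re-points it at every new untokenized field value, instead of allocating a fresh stream (and re-adding its attributes) per field.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleTokenReuseSketch {

    /** Hypothetical stand-in for a single-token stream; not a Lucene class. */
    static final class SingleTokenStream {
        // Counts allocations, for the demo only (not thread-safe).
        static int instancesCreated = 0;

        private String value;
        private boolean exhausted = true;

        SingleTokenStream() {
            instancesCreated++;
        }

        /** Points the stream at a new field value and rewinds it. */
        void setValue(String value) {
            this.value = value;
            this.exhausted = false;
        }

        /** Returns true exactly once after each setValue() call. */
        boolean incrementToken() {
            if (exhausted) {
                return false;
            }
            exhausted = true;
            return true;
        }

        String term() {
            return value;
        }
    }

    /** Hypothetical per-thread indexing state that owns one reusable stream. */
    static final class PerThreadInverter {
        private final SingleTokenStream reusable = new SingleTokenStream();
        final List<String> indexedTerms = new ArrayList<>();

        void invertUntokenizedField(String fieldValue) {
            // Reuse the existing stream: no allocation per field.
            reusable.setValue(fieldValue);
            while (reusable.incrementToken()) {
                indexedTerms.add(reusable.term());
            }
        }
    }

    public static void main(String[] args) {
        PerThreadInverter inverter = new PerThreadInverter();
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            inverter.invertUntokenizedField(id);
        }
        System.out.println(inverter.indexedTerms + " indexed using "
            + SingleTokenStream.instancesCreated + " stream instance(s)");
    }
}
```

Because the stream is owned by per-thread state, no synchronization is needed around it. Whether the real fix should live in StringField or in the inverter is not settled by this sketch.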
                  
> Lucene's use of WeakHashMap at index time prevents full use of cores on some 
> multi-core machines, due to contention
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4930
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4930
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.2
>         Environment: Dell blade system with 16 cores
>            Reporter: Karl Wright
>         Attachments: thread_dump.txt
>
>
> Our project does not make full use of the available processing power under 
> indexing load on Lucene 4.2.0.  The reason is the 
> AttributeSource.addAttribute() method, which goes through a WeakHashMap 
> synchronizer that apparently serializes threads for a significant amount of 
> time.  Have a look at the following trace:
> "pool-1-thread-28" prio=10 tid=0x00007f47fc104800 nid=0x672b waiting for monitor entry [0x00007f47d19ed000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:98)
>         - waiting to lock <0x00000005c5cd9988> (a java.lang.ref.ReferenceQueue$Lock)
>         at org.apache.lucene.util.WeakIdentityMap.reap(WeakIdentityMap.java:189)
>         at org.apache.lucene.util.WeakIdentityMap.get(WeakIdentityMap.java:82)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:74)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:65)
>         at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:107)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1148)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1129)
> …
> We’ve had to make significant changes to the way we were indexing in order to 
> not hit this issue as much, such as indexing using TokenStreams which we 
> reuse, when it would have been more convenient to index with just tokens.  
> (The reason is that Lucene internally creates TokenStream objects when you 
> pass a token array to IndexableField, and doesn’t reuse them, and the 
> addAttribute() causes massive contention as a result.)  However, as you can 
> see from the trace above, we’re still running into contention due to other 
> addAttribute() method calls that are buried deep inside Lucene.
> I can see two ways forward: either stop using WeakHashMap (or use it in a 
> more efficient way), or make darned sure no addAttribute() calls are made on 
> the main indexing execution path.  (I think it would be easy to fix 
> DocInverterPerField in that way, FWIW.  I just don’t know what we’ll run into 
> next.)
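
As one possible shape for the first option (not using a synchronized weak map on the hot path), here is a minimal sketch that caches the interface-to-implementation lookup in a java.lang.ClassValue. This is an assumption about how such a cache could look, not Lucene's actual code: TermAttr, TermAttrImpl, createInstance and the lookup counter are hypothetical names used only for the demo, and the "interface name + Impl" convention is modeled on what getClassForInterface appears to do in the trace above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AttributeImplCacheSketch {

    /** Hypothetical attribute interface and implementation, for the demo only. */
    interface TermAttr { }

    static final class TermAttrImpl implements TermAttr { }

    /** Counts how often the slow name-based lookup actually runs. */
    static final AtomicInteger lookups = new AtomicInteger();

    // ClassValue associates a lazily computed value with each Class object,
    // so no shared ReferenceQueue has to be polled on every lookup.
    private static final ClassValue<Class<?>> IMPL_FOR_INTERFACE =
        new ClassValue<Class<?>>() {
            @Override
            protected Class<?> computeValue(Class<?> iface) {
                lookups.incrementAndGet();
                try {
                    // Assumed naming convention: implementation = interface name + "Impl".
                    return Class.forName(iface.getName() + "Impl", true,
                                         iface.getClassLoader());
                } catch (ClassNotFoundException e) {
                    throw new IllegalArgumentException(
                        "No implementation found for " + iface.getName(), e);
                }
            }
        };

    /** Creates a new attribute instance for the given interface. */
    static <A> A createInstance(Class<A> iface) {
        try {
            Class<?> impl = IMPL_FOR_INTERFACE.get(iface);
            return iface.cast(impl.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TermAttr first = createInstance(TermAttr.class);
        TermAttr second = createInstance(TermAttr.class);
        System.out.println(first.getClass().getSimpleName() + ", "
            + second.getClass().getSimpleName()
            + ", slow lookups: " + lookups.get());
    }
}
```

Note that computeValue may run more than once if several threads race on the first lookup for the same class, so it should stay free of side effects that matter. ClassValue requires Java 7, which may or may not be acceptable for the affected Lucene versions.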
