[
https://issues.apache.org/jira/browse/LUCENE-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karl Wright updated LUCENE-4930:
--------------------------------
Attachment: thread_dump.txt
A complete thread dump of the application while running.
> Lucene's use of WeakHashMap at index time prevents full use of cores on some
> multi-core machines, due to contention
> -------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-4930
> URL: https://issues.apache.org/jira/browse/LUCENE-4930
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 4.2
> Environment: Dell blade system with 16 cores
> Reporter: Karl Wright
> Attachments: thread_dump.txt
>
>
> Our project does not make full use of the available processing power under
> indexing load on Lucene 4.2.0. The culprit is the
> AttributeSource.addAttribute() method, which goes through a synchronized
> WeakHashMap lookup that is effectively single-threaded for a significant
> amount of time. Have a look at the following trace:
> "pool-1-thread-28" prio=10 tid=0x00007f47fc104800 nid=0x672b waiting for monitor entry [0x00007f47d19ed000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:98)
>         - waiting to lock <0x00000005c5cd9988> (a java.lang.ref.ReferenceQueue$Lock)
>         at org.apache.lucene.util.WeakIdentityMap.reap(WeakIdentityMap.java:189)
>         at org.apache.lucene.util.WeakIdentityMap.get(WeakIdentityMap.java:82)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:74)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:65)
>         at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:107)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1148)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1129)
>         …
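The bottleneck pattern in the trace can be reproduced outside Lucene: a weak-keyed cache that reaps a shared java.lang.ref.ReferenceQueue on every get() serializes all readers on the queue's internal lock, even on pure cache hits. A minimal self-contained sketch of that pattern (hypothetical class and method names, not Lucene code):

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the WeakIdentityMap pattern in the trace above:
// every get() first drains a shared ReferenceQueue, and ReferenceQueue.poll()
// synchronizes on a single internal lock, so all concurrent readers contend
// there even when the map itself would allow lock-free reads.
final class WeakKeyCache<K, V> {
    private final Map<WeakReference<K>, V> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();

    V get(K key) {
        reap(); // this is where threads pile up: poll() takes the queue lock
        for (Map.Entry<WeakReference<K>, V> e : map.entrySet()) {
            if (e.getKey().get() == key) { // identity semantics, as in WeakIdentityMap
                return e.getValue();
            }
        }
        return null;
    }

    void put(K key, V value) {
        map.put(new WeakReference<>(key, queue), value);
    }

    private void reap() {
        Reference<? extends K> ref;
        while ((ref = queue.poll()) != null) { // synchronized inside poll()
            map.remove(ref);
        }
    }
}
```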
> We’ve had to make significant changes to the way we index in order to avoid
> hitting this issue as often, such as indexing with TokenStreams that we
> reuse, when it would have been more convenient to index with plain tokens.
> (The reason is that Lucene internally creates a new TokenStream object every
> time you pass a token array to an IndexableField, does not reuse it, and the
> resulting addAttribute() calls cause massive contention.) However, as you can
> see from the trace above, we’re still running into contention from other
> addAttribute() calls buried deep inside Lucene.
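The reuse strategy described above can be sketched in plain Java without the Lucene API: pay the expensive attribute lookup once at construction time, then reset and refill the same instance for every document. The class and method names here are illustrative only (they mirror the incrementToken()/reset() contract of a Lucene TokenStream, but this is not Lucene code):

```java
// Hypothetical sketch of the reuse strategy: the expensive per-stream setup
// (in Lucene, the addAttribute() call that hits the contended weak map)
// happens once in the constructor, and the same instance is reset for each
// new document instead of being rebuilt from scratch.
final class ReusableTokenSource {
    // Stands in for a cached attribute instance obtained once up front.
    private final StringBuilder term = new StringBuilder();
    private String[] tokens;
    private int pos;

    ReusableTokenSource() {
        // In Lucene this is where addAttribute() would run: once per
        // reusable stream, instead of once per indexed document.
    }

    // Rebind the source to a new document's tokens; no allocation, no lookup.
    void reset(String[] newTokens) {
        tokens = newTokens;
        pos = 0;
    }

    // Advance to the next token; returns false when exhausted.
    boolean incrementToken() {
        if (pos >= tokens.length) return false;
        term.setLength(0);
        term.append(tokens[pos++]);
        return true;
    }

    String term() {
        return term.toString();
    }
}
```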
> I can see two ways forward: either stop using WeakHashMap (or use it more
> efficiently), or make darned sure no addAttribute() calls happen on the main
> indexing execution path. (I think it would be easy to fix DocInverterPerField
> in that way, FWIW; I just don’t know what we’ll run into next.)
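For the first option, the JDK already provides a per-class cache that avoids a shared lock on the read path: java.lang.ClassValue (since Java 7) computes a value once per Class and serves subsequent lookups without contention, while still permitting class unloading. A hedged sketch of how an interface-to-implementation lookup like the one in getClassForInterface() could be cached this way; the "Impl" naming convention matches Lucene's DefaultAttributeFactory, but the class and demo types below are illustrative, not a patch:

```java
// Sketch: replace a synchronized weak-map lookup with a ClassValue.
// ClassValue.get() is uncontended after the first computation per class.
final class AttributeImplLookup {
    private static final ClassValue<Class<?>> IMPL_CLASS = new ClassValue<Class<?>>() {
        @Override
        protected Class<?> computeValue(Class<?> attInterface) {
            // Mirrors the DefaultAttributeFactory convention: the
            // implementation class is named "<Interface>Impl".
            try {
                return Class.forName(attInterface.getName() + "Impl");
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException(
                    "No implementation found for " + attInterface.getName(), e);
            }
        }
    };

    static Class<?> implFor(Class<?> attInterface) {
        return IMPL_CLASS.get(attInterface); // uncontended on the hot path
    }
}

// Demo types for the naming convention (hypothetical, for illustration only).
interface DemoAttribute {}

class DemoAttributeImpl implements DemoAttribute {}
```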
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]