[
https://issues.apache.org/jira/browse/LUCENE-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-4930:
----------------------------------
Attachment: LUCENE-4930.patch
bq. A possibly better option than making reap a no-op could be to only reap on
put. I mean one usually invokes get() but once that event of unloading an
interface actually happens and something new needs to be added one would reap
the old keys (in worst case perhaps one unloading later).
This is a good idea. The attached patch implements this (optionally). This
makes our own implementation better suited to such use cases than the original
WeakHashMap (which always reaps on get). As WeakIdentityMap is an internal API,
I made the setting a required parameter on creation.
Currently I pass false for the maps which have few keys and many gets, and
where keys are unlikely to be removed by the garbage collector.
For MMapDirectory, for example, I have to pass true (but it does not matter,
as this map is never read except once, on close() of the master
ByteBufferIndexInput).
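For readers who have not opened the patch, here is a minimal sketch of the
reap-on-put idea. It is an illustration only and not the code in
LUCENE-4930.patch: the class name ReapOnPutWeakIdentityMap, the flag name
reapOnRead and the choice of ConcurrentHashMap as backing map are assumptions
made for this example.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a weak identity map that can skip reaping on reads. */
final class ReapOnPutWeakIdentityMap<K, V> {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final ConcurrentHashMap<IdentityWeakRef, V> backing = new ConcurrentHashMap<>();
    private final boolean reapOnRead; // assumed name for the required creation-time setting

    ReapOnPutWeakIdentityMap(boolean reapOnRead) {
        this.reapOnRead = reapOnRead;
    }

    V get(Object key) {
        // With reapOnRead == false, readers never touch the ReferenceQueue lock.
        if (reapOnRead) {
            reap();
        }
        // Lookup key is not registered with any queue (null queue is permitted).
        return backing.get(new IdentityWeakRef(key, null));
    }

    V put(K key, V value) {
        // Writers always reap, so stale entries are cleaned up on the next put.
        reap();
        return backing.put(new IdentityWeakRef(key, queue), value);
    }

    int size() {
        return backing.size(); // may include entries that were not reaped yet
    }

    void reap() {
        Reference<?> stale;
        while ((stale = queue.poll()) != null) {
            backing.remove(stale); // matches by reference identity, see equals()
        }
    }

    /** Weak key that compares referents with == instead of equals(). */
    private static final class IdentityWeakRef extends WeakReference<Object> {
        private final int hash;

        IdentityWeakRef(Object referent, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true; // lets reap() remove a cleared reference
            }
            if (!(o instanceof IdentityWeakRef)) {
                return false;
            }
            Object mine = get();
            return mine != null && mine == ((IdentityWeakRef) o).get();
        }
    }
}
```

In this sketch, passing false corresponds to the "few keys, many gets" case
above: get() is a plain concurrent lookup, and cleared keys are removed the
next time something is put. Passing true restores the reap-on-every-access
behaviour.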
> Lucene's use of WeakHashMap at index time prevents full use of cores on some
> multi-core machines, due to contention
> -------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-4930
> URL: https://issues.apache.org/jira/browse/LUCENE-4930
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 4.2
> Environment: Dell blade system with 16 cores
> Reporter: Karl Wright
> Attachments: LUCENE-4930.patch, thread_dump.txt
>
>
> Our project is not optimally using full processing power under
> indexing load on Lucene 4.2.0. The reason is the
> AttributeSource.addAttribute() method, which goes through a WeakHashMap
> synchronizer, which is apparently single-threaded for a significant amount of
> time. Have a look at the following trace:
> "pool-1-thread-28" prio=10 tid=0x00007f47fc104800 nid=0x672b waiting for monitor entry [0x00007f47d19ed000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:98)
> - waiting to lock <0x00000005c5cd9988> (a java.lang.ref.ReferenceQueue$Lock)
> at org.apache.lucene.util.WeakIdentityMap.reap(WeakIdentityMap.java:189)
> at org.apache.lucene.util.WeakIdentityMap.get(WeakIdentityMap.java:82)
> at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:74)
> at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:65)
> at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
> at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:107)
> at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
> at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
> at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1148)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1129)
> …
> We’ve had to make significant changes to the way we were indexing in order to
> not hit this issue as much, such as indexing using TokenStreams which we
> reuse, when it would have been more convenient to index with just tokens.
> (The reason is that Lucene internally creates TokenStream objects when you
> pass a token array to IndexableField, and doesn’t reuse them, and the
> addAttribute() causes massive contention as a result.) However, as you can
> see from the trace above, we’re still running into contention due to other
> addAttribute() method calls that are buried deep inside Lucene.
> I can see two ways forward. Either not use WeakHashMap or use it in a more
> efficient way, or make darned sure no addAttribute() calls are done in the
> main code indexing execution path. (I think it would be easy to fix
> DocInverterPerField in that way, FWIW. I just don’t know what we’ll run into
> next.)