Hello,

I've been looking more carefully at the nightly benchmarks recently, and
I'm puzzled that indexing spends almost 5% of its time in
AttributeSource#addAttribute. Here is the link to the profile:
<http://people.apache.org/~mikemccand/lucenebench/2021.10.20.08.24.09.html#profiler_4kb_indexing_1_cpu>

4.37%         14731         org.apache.lucene.util.AttributeSource#addAttribute()
                              at org.apache.lucene.document.Field$StringTokenStream#<init>()
                              at org.apache.lucene.document.Field#tokenStream()
                              at org.apache.lucene.index.IndexingChain$PerField#invert()
                              at org.apache.lucene.index.IndexingChain#processField()
                              at org.apache.lucene.index.IndexingChain#processDocument()
                              at org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments()
                              at org.apache.lucene.index.DocumentsWriter#updateDocuments()
                              at org.apache.lucene.index.IndexWriter#updateDocuments()
                              at org.apache.lucene.index.IndexWriter#updateDocument()
                              at org.apache.lucene.index.IndexWriter#addDocument()
                              at perf.IndexThreads$IndexThread#run()

Given that nightly benchmarks reuse Field instances across documents, this
should only happen once per thread, so why does it show up as a bottleneck
in our nightly benchmarks? I tried to reproduce locally, but I'm not seeing
AttributeSource among top CPU consumers.
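
To make the expectation concrete, here is a minimal, self-contained sketch
of the reuse pattern I have in mind. It is a simplified model and not
Lucene's actual code: ReuseSketch, ModelField and ModelTokenStream are
hypothetical names, and the counter stands in for the attribute setup done
in the token stream constructor. It assumes the previously created stream is
handed back on every call, which is what I would expect when instances are
reused.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReuseSketch {
    // Counts how many times the (hypothetical) token stream constructor runs.
    static final AtomicInteger constructions = new AtomicInteger();

    // Hypothetical stand-in for a token stream whose constructor performs
    // the attribute registration that shows up in the profile.
    static final class ModelTokenStream {
        String value;

        ModelTokenStream() {
            constructions.incrementAndGet(); // models the addAttribute() cost
        }

        void setValue(String value) {
            this.value = value;
        }
    }

    // Hypothetical stand-in for a reusable field: it only builds a new
    // stream when no previous one is handed back for reuse.
    static final class ModelField {
        private String value;

        void setStringValue(String value) {
            this.value = value;
        }

        ModelTokenStream tokenStream(ModelTokenStream reuse) {
            ModelTokenStream ts = (reuse != null) ? reuse : new ModelTokenStream();
            ts.setValue(value);
            return ts;
        }
    }

    // Indexes numDocs "documents" on one thread, reusing the same field and
    // stream, and returns how many streams were constructed along the way.
    static int indexDocs(int numDocs) {
        constructions.set(0);
        ModelField field = new ModelField();
        ModelTokenStream cached = null;
        for (int i = 0; i < numDocs; i++) {
            field.setStringValue("doc" + i);
            cached = field.tokenStream(cached);
        }
        return constructions.get();
    }

    public static void main(String[] args) {
        System.out.println(indexDocs(1000)); // prints 1
    }
}
```

If reuse works as in this model, the constructor cost is paid once per
thread no matter how many documents are indexed, which is why a 4.37% share
in the profile surprises me.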

-- 
Adrien
