[
https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563044#comment-17563044
]
LuYunCheng edited comment on LUCENE-10627 at 7/6/22 8:20 AM:
-------------------------------------------------------------
[~jpountz] Hi, I tried using ByteBuffersDataInput to reduce memory copies,
because it can be obtained from ByteBuffersDataOutput.toDataInput without
copying. I am not sure whether we can change the Compressor interface's
compress input param from byte[] to ByteBuffersDataInput. In this commit:
# using ByteBuffersDataInput to reduce memory copies in
{{CompressingStoredFieldsWriter}} during {{flush}}
# using ByteBuffersDataInput to reduce memory copies in
{{CompressingTermVectorsWriter}} during {{flush}}
# using ByteBuffer to *reduce memory copies* in
*{{CompressingStoredFieldsWriter}} during {{copyOneDoc}}*
# replacing the Compressor interface param from byte[] to
ByteBuffersDataInput (see the sketch after this list)
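A minimal sketch of the interface change I have in mind (simplified; the
exact shape in the commit may differ):
{code:java}
import java.io.Closeable;
import java.io.IOException;

import org.apache.lucene.store.ByteBuffersDataInput;
import org.apache.lucene.store.DataOutput;

public abstract class Compressor implements Closeable {
  // before: callers had to flatten their buffers into one byte[] first
  // public abstract void compress(byte[] bytes, int off, int len, DataOutput out)
  //     throws IOException;

  // after (proposed): the compressor reads the buffered blocks directly,
  // so the write path no longer needs bufferedDocs.toArrayCopy
  public abstract void compress(ByteBuffersDataInput buffersInput, DataOutput out)
      throws IOException;
}
{code}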
I also ran the runStoredFieldsBenchmark with the JVM StatisticsHelper; it
shows the following:
||Msec to index||BEST_SPEED||BEST_SPEED YGC||BEST_COMPRESSION||BEST_COMPRESSION YGC||
|Baseline|317973|1176 ms (258 collections)|605492|1476 ms (264 collections)|
|Candidate|314765|1012 ms (238 collections)|601253|1175 ms (234 collections)|
was (Author: luyuncheng):
[~jpountz] Hi, I tried using ByteBuffersDataInput to reduce memory copies,
because it can be obtained from ByteBuffersDataOutput.toDataInput.
> Using CompositeByteBuf to Reduce Memory Copy
> --------------------------------------------
>
> Key: LUCENE-10627
> URL: https://issues.apache.org/jira/browse/LUCENE-10627
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs, core/store
> Reporter: LuYunCheng
> Priority: Major
>
> Code: [https://github.com/apache/lucene/pull/987]
> I see that when Lucene does flush and merge of stored fields, it needs many
> memory copies:
> {code:java}
> Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x00007ee990002c50 nid=0x3aac54 runnable [0x00007f17718db000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
>     at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
>     at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
> {code}
> When Lucene's *CompressingStoredFieldsWriter* flushes documents, it needs
> many memory copies:
> With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}:
> # bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous
> array for chunk compression (sketched below)
> # the compressor copies dict and data into one block buffer
> # do the compression
> # copy the compressed data out
> With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}:
> # bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous
> array for chunk compression
> # do the compression
> # copy the compressed data out
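> A rough sketch of the baseline flush path described above (simplified; the
> surrounding bookkeeping in CompressingStoredFieldsWriter is omitted):
> {code:java}
> // Baseline (simplified): flatten every buffered block into one new array...
> byte[] content = bufferedDocs.toArrayCopy(); // copies all buffered bytes
> bufferedDocs.reset();
> // ...then compress; the preset-dict modes may copy dict + data again internally
> compressor.compress(content, 0, content.length, fieldsStream);
> {code}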
>
> I think we can use CompositeByteBuf to reduce temp memory copies:
> # we do not have to call *bufferedDocs.toArrayCopy* when we just need
> contiguous content for chunk compression (see the sketch below)
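> A minimal sketch of the proposed path, assuming the Compressor signature is
> changed to take a ByteBuffersDataInput:
> {code:java}
> // Proposed (simplified): expose the buffered blocks as a DataInput view
> ByteBuffersDataInput content = bufferedDocs.toDataInput(); // wraps the buffers, no copy
> compressor.compress(content, fieldsStream); // compressor reads block by block
> bufferedDocs.reset();
> {code}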
>
> I wrote a simple mini benchmark in test code ([link
> |https://github.com/apache/lucene/blob/5a406a5c483c7fadaf0e8a5f06732c79ad174d11/lucene/core/src/test/org/apache/lucene/codecs/lucene90/compressing/TestCompressingStoredFieldsFormat.java#L353]):
> *LZ4WithPresetDict run* Capacity: 41943040 (bytes), iter 10 times: Origin
> elapsed: 5391 ms, New elapsed: 5297 ms
> *DeflateWithPresetDict run* Capacity: 41943040 (bytes), iter 10 times: Origin
> elapsed: {*}115 ms{*}, New elapsed: {*}12 ms{*}
>
> And I ran runStoredFieldsBenchmark with doc_limit=-1, which shows:
> ||Msec to index||BEST_SPEED||BEST_COMPRESSION||
> |Baseline|318877.00|606288.00|
> |Candidate|314442.00|604719.00|