[ https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LuYunCheng updated LUCENE-10627:
--------------------------------
Description:

Code: [https://github.com/apache/lucene/pull/987]

When Lucene flushes and merges stored fields, it performs many memory copies:
{code:java}
Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x00007ee990002c50 nid=0x3aac54 runnable [0x00007f17718db000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
    at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
{code}
When *CompressingStoredFieldsWriter* flushes documents, it needs several memory copies.

With Lucene90 using *LZ4WithPresetDictCompressionMode*:
# bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
# the compressor copies the dictionary and the data into one block buffer
# compress
# copy the compressed data out

With Lucene90 using *DeflateWithPresetDictCompressionMode*:
# bufferedDocs.toArrayCopy copies the buffered blocks into one contiguous array for chunk compression
# compress
# copy the compressed data out

I think we can use a -CompositeByteBuf- to reduce these temporary copies:
# we do not need *bufferedDocs.toArrayCopy* when we only need contiguous content for chunk compression

I wrote a simple mini benchmark in test code ([link|https://github.com/apache/lucene/blob/5a406a5c483c7fadaf0e8a5f06732c79ad174d11/lucene/core/src/test/org/apache/lucene/codecs/lucene90/compressing/TestCompressingStoredFieldsFormat.java#L353]):
*LZ4WithPresetDict run* Capacity: 41943040 bytes, 10 iterations: Origin elapsed: 5391ms, New elapsed: 5297ms
*DeflateWithPresetDict run* Capacity: 41943040 bytes, 10 iterations: Origin elapsed: *115ms*, New elapsed: *12ms*

I also ran runStoredFieldsBenchmark with doc_limit=-1:
||Msec to index||BEST_SPEED||BEST_COMPRESSION||
|Baseline|318877.00|606288.00|
|Candidate|314442.00|604719.00|

-----------UPDATE-----------

I tried to *reuse ByteBuffersDataInput* to reduce memory copies, since it can be obtained from ByteBuffersDataOutput.toDataInput, and this reduces the complexity of the change ([PR|https://github.com/apache/lucene/pull/987]). BUT I am not sure whether we can change the Compressor interface's compress input param from byte[] to ByteBuffersDataInput. If we change this interface [like this|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/Compressor.java#L35], it increases the backport code [like this|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L274]; however, with ByteBuffersDataInput in the interface we can optimize memory copies inside each compression algorithm's code.
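For illustration, a minimal sketch of what such an interface change could look like. This is an approximation for discussion, not the PR's exact code; only the idea of passing ByteBuffersDataInput instead of a byte[] slice comes from the proposal above.
{code:java}
import java.io.Closeable;
import java.io.IOException;
import org.apache.lucene.store.ByteBuffersDataInput;
import org.apache.lucene.store.DataOutput;

// Sketch of the proposed Compressor shape (approximation, not the PR's exact code).
public abstract class Compressor implements Closeable {
  // Before: compress(byte[] bytes, int off, int len, DataOutput out)
  // forces callers like CompressingStoredFieldsWriter.flush() to first
  // materialize the buffered pages into one contiguous array via
  // ByteBuffersDataOutput.toArrayCopy() -- the copy in the stack trace above.

  // After: consume the buffered pages directly through a
  // ByteBuffersDataInput view, so no up-front contiguous copy is needed.
  public abstract void compress(ByteBuffersDataInput buffersInput, DataOutput out)
      throws IOException;
}
{code}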
Also, I found we can reduce more memory copies in *{{CompressingStoredFieldsWriter.copyOneDoc}}* ([like this|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java#L516]) and *{{CompressingTermVectorsWriter.flush}}*.

Since this commit only reduces memory copies, we should judge the improvement not just by benchmark time but also by JVM GC time, so I added StatisticsHelper into StoredFieldsBenchmark ([code|https://github.com/luyuncheng/luceneutil/commit/e77c7c7bff01bb036b1826e7ec5d46ad7ed5666d]).

So the latest commit:
# uses ByteBuffersDataInput to reduce memory copies in {{CompressingStoredFieldsWriter}} {{flush}} (see the sketch after the table below)
# uses ByteBuffersDataInput to reduce memory copies in {{CompressingTermVectorsWriter}} {{flush}}
# uses ByteBuffer to *reduce memory copies* in {{CompressingStoredFieldsWriter}} {{copyOneDoc}}
# replaces the compressor interface param from byte[] to ByteBuffersDataInput

Running runStoredFieldsBenchmark with the JVM StatisticsHelper shows:
||Msec to index||BEST_SPEED||BEST_SPEED YGC||BEST_COMPRESSION||BEST_COMPRESSION YGC||
|Baseline|317973|1176 ms (258 collections)|605492|1476 ms (264 collections)|
|Candidate|314765|1012 ms (238 collections)|601253|1175 ms (234 collections)|
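As an illustration of item 1 in the list above, here is a minimal hypothetical sketch of the flush-side change. The helper name and variables are illustrative; they mirror, but do not reproduce, the Lucene 9.x code, and Compressor refers to the interface sketched earlier.
{code:java}
import java.io.IOException;
import org.apache.lucene.store.ByteBuffersDataInput;
import org.apache.lucene.store.ByteBuffersDataOutput;
import org.apache.lucene.store.DataOutput;

// Illustrative helper, not the PR's exact code.
final class FlushSketch {
  // Before: flush() copied every buffered page into one contiguous array:
  //   byte[] content = bufferedDocs.toArrayCopy();   // full extra copy
  //   compressor.compress(content, 0, content.length, fieldsStream);
  //
  // After: wrap the already-written pages in a read-only view; no up-front copy.
  static void flushChunk(ByteBuffersDataOutput bufferedDocs,
                         Compressor compressor,
                         DataOutput fieldsStream) throws IOException {
    // toDataInput() exposes the ByteBuffer pages already held by the output.
    ByteBuffersDataInput content = bufferedDocs.toDataInput();
    compressor.compress(content, fieldsStream); // interface sketched above
  }
}
{code}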
> Using CompositeByteBuf to Reduce Memory Copy
> --------------------------------------------
>
>                 Key: LUCENE-10627
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10627
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/store
>            Reporter: LuYunCheng
>            Priority: Major