qq726658006 opened a new issue, #18275:
URL: https://github.com/apache/hudi/issues/18275
### Bug Description
**What happened:**
While ingesting data with asynchronous compaction enabled, we hit errors such as an OutOfMemoryError that caused the compaction to fail. After that failure, when we wrote again with record keys belonging to the file group touched by the failed compaction, the RowKind came out wrong: records that should have been tagged as updates were tagged as inserts. My suspicion is that the rollback of the failed compaction deleted the corresponding base file records, so those records could no longer be located and were therefore identified as inserts. I would like to confirm whether this guess is correct, and whether there is a solution or configuration that can avoid this situation.
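For reference, a minimal sketch of how the writer is set up, with the compaction-related Flink SQL options in play (the table name, schema, and values below are illustrative placeholders, not our exact job; option names are from Hudi's Flink configuration):

```sql
-- Hypothetical MOR sink with async compaction running inside the streaming job.
-- Schema and values are placeholders for illustration only.
CREATE TABLE hudi_sink (
  record_id STRING,
  payload   STRING,
  ts        TIMESTAMP(3),
  PRIMARY KEY (record_id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 's3a://<bucket>/<table-path>',
  'table.type' = 'MERGE_ON_READ',
  'compaction.async.enabled' = 'true',   -- compaction runs in the same Flink job as the writer
  'compaction.delta_commits' = '5',      -- schedule compaction every N delta commits
  'compaction.max_memory' = '100'        -- MB of spillable memory for the compaction merge
);
```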
Thanks!
Compaction errors:
1.
```
org.apache.flink.table.data.columnar.vector.heap.HeapBytesVector.reserve(HeapBytesVector.java:107)
    at org.apache.flink.table.data.columnar.vector.heap.HeapBytesVector.appendBytes(HeapBytesVector.java:79)
    at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBinary(BytesColumnReader.java:88)
    at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatch(BytesColumnReader.java:50)
    at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatch(BytesColumnReader.java:31)
    at org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:189)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextBatch(ParquetColumnarRowSplitReader.java:325)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.ensureBatch(ParquetColumnarRowSplitReader.java:296)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.reachedEnd(ParquetColumnarRowSplitReader.java:277)
    at org.apache.hudi.table.format.ParquetSplitRecordIterator.hasNext(ParquetSplitRecordIterator.java:42)
    at org.apache.hudi.common.table.read.buffer.KeyBasedFileGroupRecordBuffer.doHasNext(KeyBasedFileGroupRecordBuffer.java:147)
    at org.apache.hudi.common.table.read.buffer.FileGroupRecordBuffer.hasNext(FileGroupRecordBuffer.java:153)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:246)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:333)
    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
    at org.apache.hudi.io.FileGroupReaderBasedMergeHandle.doMerge(FileGroupReaderBasedMergeHandle.java:271)
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:164)
    at org.apache.hudi.table.action.compact.HoodieFlinkMergeOnReadTableCompactor.compact(HoodieFlinkMergeOnReadTableCompactor.java:78)
    at org.apache.hudi.sink.compact.CompactOperator.doCompaction(CompactOperator.java:192)
    at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$0(CompactOperator.java:168)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:131)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.flink.table.data.columnar.vector.heap.HeapBytesVector.reserve(HeapBytesVector.java:102)
    at org.apache.flink.table.data.columnar.vector.heap.HeapBytesVector.appendBytes(HeapBytesVector.java:79)
    at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBinary(BytesColumnReader.java:88)
    at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatch(BytesColumnReader.java:50)
    at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatch(BytesColumnReader.java:31)
    at org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:189)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextBatch(ParquetColumnarRowSplitReader.java:325)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.ensureBatch(ParquetColumnarRowSplitReader.java:296)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.reachedEnd(ParquetColumnarRowSplitReader.java:277)
    at org.apache.hudi.table.format.ParquetSplitRecordIterator.hasNext(ParquetSplitRecordIterator.java:42)
    at org.apache.hudi.common.table.read.buffer.KeyBasedFileGroupRecordBuffer.doHasNext(KeyBasedFileGroupRecordBuffer.java:147)
    at org.apache.hudi.common.table.read.buffer.FileGroupRecordBuffer.hasNext(FileGroupRecordBuffer.java:153)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:246)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:333)
    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
    at org.apache.hudi.io.FileGroupReaderBasedMergeHandle.doMerge(FileGroupReaderBasedMergeHandle.java:271)
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:164)
    at org.apache.hudi.table.action.compact.HoodieFlinkMergeOnReadTableCompactor.compact(HoodieFlinkMergeOnReadTableCompactor.java:78)
    at org.apache.hudi.sink.compact.CompactOperator.doCompaction(CompactOperator.java:192)
    at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$0(CompactOperator.java:168)
    at org.apache.hudi.sink.compact.CompactOperator$$Lambda$4645/0x00000008018cf440.run(Unknown Source)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:131)
    at org.apache.hudi.sink.utils.NonThrownExecutor$$Lambda$4657/0x00000008018d3440.run(Unknown Source)
    ... 3 more
```
2.
```
org.apache.hudi.exception.HoodieIOException: Decides whether the parquet columnar row split reader reached end exception
    at org.apache.hudi.table.format.ParquetSplitRecordIterator.hasNext(ParquetSplitRecordIterator.java:44)
    at org.apache.hudi.common.table.read.buffer.KeyBasedFileGroupRecordBuffer.doHasNext(KeyBasedFileGroupRecordBuffer.java:147)
    at org.apache.hudi.common.table.read.buffer.FileGroupRecordBuffer.hasNext(FileGroupRecordBuffer.java:153)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:246)
    at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:333)
    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
    at org.apache.hudi.io.FileGroupReaderBasedMergeHandle.doMerge(FileGroupReaderBasedMergeHandle.java:271)
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:164)
    at org.apache.hudi.table.action.compact.HoodieFlinkMergeOnReadTableCompactor.compact(HoodieFlinkMergeOnReadTableCompactor.java:78)
    at org.apache.hudi.sink.compact.CompactOperator.doCompaction(CompactOperator.java:192)
    at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$0(CompactOperator.java:168)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:131)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.InterruptedIOException: read on s3a://3363/UsCases/5/00000058-0434-488b-8b23-5a7a8efeca69_30-32-0_20260303034224078.parquet: com.amazonaws.AbortedException:
    at org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:394)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:200)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124)
    at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347)
    at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:554)
    at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
    at org.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)
    at org.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)
    at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.readNextRowGroup(ParquetColumnarRowSplitReader.java:334)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextBatch(ParquetColumnarRowSplitReader.java:319)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.ensureBatch(ParquetColumnarRowSplitReader.java:296)
    at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.reachedEnd(ParquetColumnarRowSplitReader.java:277)
    at org.apache.hudi.table.format.ParquetSplitRecordIterator.hasNext(ParquetSplitRecordIterator.java:42)
    ... 14 more
Caused by: com.amazonaws.AbortedException:
    at com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:61)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:89)
    at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$read$3(S3AInputStream.java:564)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
    ... 31 more
```
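Error 1 is a plain Java heap OOM while the compact operator reads a parquet base file into heap column vectors, and error 2 looks like a follow-on of the same failure (the S3 read was aborted). One obvious mitigation on our side is simply to give the task managers more heap and fewer slots each; a sketch of the standard Flink settings we would adjust, with illustrative values rather than a recommendation:

```yaml
# flink-conf.yaml -- values are placeholders, not our production settings
taskmanager.memory.process.size: 8g   # total memory per TaskManager process
taskmanager.numberOfTaskSlots: 2      # fewer slots per TM leaves more heap for each compaction task
```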
**What you expected:**
After a compaction failure, the RowKind of subsequent writes for the affected records should remain correct (updates should still be tagged as updates rather than inserts).
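If it helps narrow this down, these are the index-related writer options that, as far as I understand, influence how records are tagged as insert vs. update on the Flink side. This is only a fragment that would go in the WITH clause of a DDL like the one sketched above; the option names come from Hudi's Flink configuration, and the values shown are what I believe the defaults are (they may differ by version):

```sql
-- Fragment of the writer DDL's WITH clause (illustrative; values are assumed defaults)
'index.type' = 'FLINK_STATE',          -- insert-vs-update decision comes from Flink state
'index.state.ttl' = '0',               -- index state TTL in days; 0 keeps it forever
'index.bootstrap.enabled' = 'false',   -- whether to reload the index from existing files on startup
'changelog.enabled' = 'false'          -- whether -U/+U changelog row kinds are preserved end to end
```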
### Environment
**Hudi version:** 1.1.1
**Query engine:** Flink
**Relevant configs:**
### Logs and Stack Trace
_No response_