KnightChess opened a new issue, #18479: URL: https://github.com/apache/hudi/issues/18479
### Bug Description

**What happened:**

- flink: 1.16
- hudi: 0.13.1 with https://github.com/apache/hudi/pull/12967

We use https://github.com/apache/hudi/pull/12967 in our internal branch. Our records average 400 KB, while the default `write.memory.segment.page.size` is 32 KB. During flush, the following exception is thrown frequently and data fails to be written. If we set `write.memory.segment.page.size` to 500 KB, the exception no longer occurs.

```shell
Caused by: java.lang.RuntimeException: java.lang.NegativeArraySizeException: -2063597517
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:72)
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
    at org.apache.hudi.jd.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
    at org.apache.hudi.jd.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
    at org.apache.hudi.io.storage.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:175)
    at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:45)
    at org.apache.hudi.io.storage.row.LSMHoodieRowDataCreateHandle.writeRow(LSMHoodieRowDataCreateHandle.java:235)
    ... 12 more
Caused by: java.lang.NegativeArraySizeException: -2063597517
    at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:296)
    at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:266)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.doWrite(ParquetRowDataWriter.java:532)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.write(ParquetRowDataWriter.java:503)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:95)
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:70)
    ... 18 more
```

**What you expected:**

Data can still be written when `write.memory.segment.page.size` is 32 KB.

**Steps to reproduce:**

We can't reproduce the same exception, but we can reproduce a similar one:

- branch: master
- flink: 1.18
- UT: `TestWriteCopyOnWrite#testInsertWithSmallBufferSize`

env1: `write.memory.segment.page.size = 32`

```shell
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
    at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
    at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
    ... 23 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
    at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
    at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
    at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
    at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
    at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
    ... 24 more
Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
    at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
    at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
    at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
    at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
    at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
    at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
    ... 28 more
Caused by: java.lang.IndexOutOfBoundsException
    at org.apache.flink.core.memory.MemorySegment.getLong(MemorySegment.java:935)
    at org.apache.flink.table.data.binary.BinaryRowData.getTimestamp(BinaryRowData.java:351)
    at org.apache.flink.table.data.utils.JoinedRowData.getTimestamp(JoinedRowData.java:203)
    at org.apache.hudi.client.model.AbstractHoodieRowData.getTimestamp(AbstractHoodieRowData.java:129)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$Timestamp64Writer.write(ParquetRowDataWriter.java:305)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
    ... 37 more
```

env2: `write.memory.segment.page.size = 32`, with the record size in `DATA_SET_INSERT_DUPLICATES` increased

```shell
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20260408122418316
    at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:81)
    at org.apache.hudi.table.action.commit.FlinkUpsertCommitActionExecutor.execute(FlinkUpsertCommitActionExecutor.java:53)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.upsert(HoodieFlinkCopyOnWriteTable.java:113)
    at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:223)
    at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$514ba0a6$2(StreamWriteFunction.java:215)
    at org.apache.hudi.sink.StreamWriteFunction$WriteFunction.write(StreamWriteFunction.java:516)
    at org.apache.hudi.sink.StreamWriteFunction.writeRecords(StreamWriteFunction.java:445)
    at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:381)
    at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:323)
    at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:184)
    at org.apache.hudi.sink.utils.StreamWriteFunctionWrapper.invoke(StreamWriteFunctionWrapper.java:215)
    at org.apache.hudi.sink.utils.TestWriteBase$TestHarness.consume(TestWriteBase.java:191)
    at org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertWithSmallBufferSize(TestWriteCopyOnWrite.java:540)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:123)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:124)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:103)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:98)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:65)
    at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:74)
    ... 15 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
    at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:69)
    at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:44)
    at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
    ... 21 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
    at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
    at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
    ... 23 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
    at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
    at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
    at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
    at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
    at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
    ... 24 more
Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
    at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
    at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
    at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
    at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
    at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
    at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
    ... 28 more
Caused by: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
    at org.apache.flink.core.memory.MemorySegment.get(MemorySegment.java:467)
    at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:292)
    at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:255)
    at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
    at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
    ... 37 more
```

### Environment

**Hudi version:** 0.13.1/master

**Query engine:** Flink

**Relevant configs:** `write.memory.segment.page.size`

### Logs and Stack Trace

_No response_
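Both reproductions fail inside Flink's binary row access. One fails in `MemorySegment.getLong` under `BinaryRowData.getTimestamp`, and the other in `MemorySegment.get` under `BinarySegmentUtils.getBytes`. The production failure also stops once the page size (500 KB) exceeds the average record size (400 KB). With 32 KB pages, a 400 KB record needs ceil(400 / 32) = 13 pages.

The Java sketch below is a purely hypothetical model of that situation and is not Hudi or Flink code:

- plain `byte[]` arrays stand in for memory segments;
- `PagedRecordDemo`, `toPages`, `readFirstPageOnly`, and `readAcrossPages` are illustrative names only.

The sketch shows one failure mode consistent with the reports. A read that assumes a field lies entirely in one page throws an `IndexOutOfBoundsException` once the field straddles a page boundary, while a boundary-aware read succeeds. It does not establish the actual root cause.

```java
import java.util.Arrays;

/**
 * Hypothetical model of one record laid out across fixed-size pages.
 * Plain byte[] pages stand in for memory segments; this is not the Flink
 * MemorySegment API and not Hudi code.
 */
public class PagedRecordDemo {

    /** Pages needed to hold recordBytes, i.e. ceil(recordBytes / pageBytes). */
    static int pagesNeeded(int recordBytes, int pageBytes) {
        return (recordBytes + pageBytes - 1) / pageBytes;
    }

    /** Splits a record into consecutive pages of pageBytes each; the last page may be partly empty. */
    static byte[][] toPages(byte[] record, int pageBytes) {
        int n = pagesNeeded(record.length, pageBytes);
        byte[][] pages = new byte[n][pageBytes];
        for (int i = 0; i < n; i++) {
            int start = i * pageBytes;
            int len = Math.min(pageBytes, record.length - start);
            System.arraycopy(record, start, pages[i], 0, len);
        }
        return pages;
    }

    /** Naive read that assumes [offset, offset + len) lies inside the first page. */
    static byte[] readFirstPageOnly(byte[][] pages, int offset, int len) {
        byte[] out = new byte[len];
        // Throws ArrayIndexOutOfBoundsException (an IndexOutOfBoundsException) past the page end.
        System.arraycopy(pages[0], offset, out, 0, len);
        return out;
    }

    /** Boundary-aware read that copies chunk by chunk across page boundaries. */
    static byte[] readAcrossPages(byte[][] pages, int pageBytes, int offset, int len) {
        byte[] out = new byte[len];
        int copied = 0;
        while (copied < len) {
            int abs = offset + copied;     // absolute position within the record
            int page = abs / pageBytes;    // which page holds that position
            int inPage = abs % pageBytes;  // position inside that page
            int chunk = Math.min(len - copied, pageBytes - inPage);
            System.arraycopy(pages[page], inPage, out, copied, chunk);
            copied += chunk;
        }
        return out;
    }

    public static void main(String[] args) {
        int pageBytes = 32 * 1024;    // default page size cited in the issue
        int recordBytes = 400 * 1024; // average record size cited in the issue
        byte[] record = new byte[recordBytes];
        for (int i = 0; i < record.length; i++) {
            record[i] = (byte) (i % 251); // arbitrary deterministic content
        }
        byte[][] pages = toPages(record, pageBytes);
        System.out.println("pages per record: " + pages.length);

        // An 8-byte field (e.g. a long) starting 4 bytes before the first page boundary.
        int offset = pageBytes - 4;
        int len = 8;
        byte[] expected = Arrays.copyOfRange(record, offset, offset + len);
        System.out.println("boundary-aware read ok: "
                + Arrays.equals(expected, readAcrossPages(pages, pageBytes, offset, len)));
        try {
            readFirstPageOnly(pages, offset, len);
            System.out.println("first-page-only read unexpectedly succeeded");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("first-page-only read failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Running `main` prints the page count for the sizes in the issue. It then shows the boundary-aware read returning the expected bytes and the first-page-only read failing on the same field.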
