KnightChess opened a new issue, #18479:
URL: https://github.com/apache/hudi/issues/18479

   ### Bug Description
   
   **What happened:**
   flink: 1.16
   hudi: 0.13.1 with https://github.com/apache/hudi/pull/12967
   We use https://github.com/apache/hudi/pull/12967 in our internal branch. Our records average about 400 KB, while the default `write.memory.segment.page.size` is 32 KB. During flush, the writer frequently throws the exception below and data cannot be written; with `write.memory.segment.page.size` set to 500 KB, the exception no longer occurs.
   ```shell
   Caused by: java.lang.RuntimeException: java.lang.NegativeArraySizeException: -2063597517
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:72)
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
        at org.apache.hudi.jd.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
        at org.apache.hudi.jd.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
        at org.apache.hudi.io.storage.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:175)
        at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:45)
        at org.apache.hudi.io.storage.row.LSMHoodieRowDataCreateHandle.writeRow(LSMHoodieRowDataCreateHandle.java:235)
        ... 12 more
   Caused by: java.lang.NegativeArraySizeException: -2063597517
        at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:296)
        at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:266)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.doWrite(ParquetRowDataWriter.java:532)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.write(ParquetRowDataWriter.java:503)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:95)
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:70)
        ... 18 more
   ```
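For context, our reading of the Flink code (not yet confirmed): `BinaryStringData.toBytes` decodes an 8-byte offset-and-length word from the row's fixed-length part, then allocates a byte array of that length. If payload bytes get read as that word, for example because the variable-length part is mis-tracked while a large record spills across small segments, the low half can have its sign bit set, and Java's `new byte[length]` throws `NegativeArraySizeException`. A minimal Python sketch of that decoding, assuming high 32 bits = offset and low 32 bits = length, both read as signed Java ints:

```python
import struct

def decode_offset_and_len(word: int):
    """Decode an offset-and-length word the way we believe Flink's
    BinaryRowData does (assumption: high 32 bits = offset into the
    variable-length part, low 32 bits = length, both as Java ints)."""
    offset = struct.unpack(">i", struct.pack(">I", (word >> 32) & 0xFFFFFFFF))[0]
    length = struct.unpack(">i", struct.pack(">I", word & 0xFFFFFFFF))[0]
    return offset, length

# A low half of 0x85000033 is exactly the -2063597517 from the trace:
# its sign bit is set, so the subsequent array allocation blows up.
offset, length = decode_offset_and_len((16 << 32) | 0x85000033)
print(offset, length)  # 16 -2063597517
```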
   
   **What you expected:**
   Writes should still succeed when `write.memory.segment.page.size` is 32 KB.
   
   **Steps to reproduce:**
   We can't reproduce the exact same exception, but we have a similar one.
   branch: master
   flink: 1.18
   UT: TestWriteCopyOnWrite#testInsertWithSmallBufferSize
   env1: write.memory.segment.page.size = 32
   ```shell
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
        at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
        at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
        ... 23 more
   Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
        at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
        at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
        at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
        at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
        at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
        ... 24 more
   Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
        at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
        at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
        at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
        at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
        at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
        at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
        at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
        ... 28 more
   Caused by: java.lang.IndexOutOfBoundsException
        at org.apache.flink.core.memory.MemorySegment.getLong(MemorySegment.java:935)
        at org.apache.flink.table.data.binary.BinaryRowData.getTimestamp(BinaryRowData.java:351)
        at org.apache.flink.table.data.utils.JoinedRowData.getTimestamp(JoinedRowData.java:203)
        at org.apache.hudi.client.model.AbstractHoodieRowData.getTimestamp(AbstractHoodieRowData.java:129)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$Timestamp64Writer.write(ParquetRowDataWriter.java:305)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
        ... 37 more
   ```
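Here the failure comes from `MemorySegment.getLong`'s bounds check: the timestamp field's fixed-length slot resolves to a position where an 8-byte read would run past the end of the segment. A minimal sketch of that check (an assumption about Flink's semantics: the index is validated against `size - 8` before the read), with a 32 KB page as in the failing config:

```python
def get_long(segment: bytes, index: int) -> int:
    """Sketch of a bounds-checked 8-byte read, modeled on what we
    believe MemorySegment.getLong does: reject any index outside
    [0, size - 8], then read 8 bytes."""
    if index < 0 or index > len(segment) - 8:
        raise IndexError(f"index: {index}, segment size: {len(segment)}")
    return int.from_bytes(segment[index:index + 8], "little", signed=True)

seg = bytes(32 * 1024)            # one 32 KB page, as in env1
print(get_long(seg, 0))            # 0: read is fully in bounds
try:
    get_long(seg, 32 * 1024 - 4)   # 8-byte read overhangs the page end
except IndexError as e:
    print("IndexError:", e)
```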
   env2: write.memory.segment.page.size = 32, increase the `DATA_SET_INSERT_DUPLICATES` record size
   ```shell
   org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20260408122418316
        at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:81)
        at org.apache.hudi.table.action.commit.FlinkUpsertCommitActionExecutor.execute(FlinkUpsertCommitActionExecutor.java:53)
        at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.upsert(HoodieFlinkCopyOnWriteTable.java:113)
        at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:223)
        at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$514ba0a6$2(StreamWriteFunction.java:215)
        at org.apache.hudi.sink.StreamWriteFunction$WriteFunction.write(StreamWriteFunction.java:516)
        at org.apache.hudi.sink.StreamWriteFunction.writeRecords(StreamWriteFunction.java:445)
        at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:381)
        at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:323)
        at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:184)
        at org.apache.hudi.sink.utils.StreamWriteFunctionWrapper.invoke(StreamWriteFunctionWrapper.java:215)
        at org.apache.hudi.sink.utils.TestWriteBase$TestHarness.consume(TestWriteBase.java:191)
        at org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertWithSmallBufferSize(TestWriteCopyOnWrite.java:540)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
   Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
        at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:123)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:124)
        at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:103)
        at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:98)
        at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:65)
        at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:74)
        ... 15 more
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
        at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:69)
        at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:44)
        at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
        ... 21 more
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
        at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
        at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
        ... 23 more
   Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
        at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
        at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
        at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
        at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
        at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
        ... 24 more
   Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
        at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
        at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
        at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
        at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
        at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
        at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
        at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
        ... 28 more
   Caused by: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
        at org.apache.flink.core.memory.MemorySegment.get(MemorySegment.java:467)
        at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:292)
        at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:255)
        at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
        at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
        ... 37 more
   ```
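One detail that may help triage (our observation, not verified against the code): the `pos`, `length`, and `index` values in this trace all decode to printable ASCII letters when viewed as big-endian 32-bit words. That looks less like a pointer drifting slightly out of range and more like string payload bytes being interpreted as an offset-and-length word:

```python
import struct

# The values from the IndexOutOfBoundsException above, reinterpreted
# as big-endian 4-byte sequences.
for value in (1734698613, 1936089412, 1734698597):
    print(value, "->", struct.pack(">i", value))
# 1734698613 -> b'gefu'
# 1936089412 -> b'sfaD'
# 1734698597 -> b'gefe'
```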
   
   
   
   
   ### Environment
   
   **Hudi version:** 0.13.1/master
   **Query engine:** Flink
   **Relevant configs:** write.memory.segment.page.size
   
   
   ### Logs and Stack Trace
   
   _No response_

