cshuo commented on issue #18479:
URL: https://github.com/apache/hudi/issues/18479#issuecomment-4204599865
I think that's mainly a limitation of Flink's `BinaryRowData` itself; see the statement here:

```
Fixed-length part will certainly fall into a MemorySegment, which will speed up the read and write of field. During the write phase, if the target memory segment has less space than fixed length part size, we will skip the space. So the number of fields in a single Row cannot exceed the capacity of a single MemorySegment, if there are too many fields, we suggest that user set a bigger pageSize of MemorySegment.
```

https://github.com/apache/flink/blob/5dcf72d251798ef09a157f118e38b12fc37a579e/flink-table/flink-table-common/src/main/java/org/apache/flink/table/data/binary/BinaryRowData.java#L55-L59

For the reproducing test1, the test data has 5 fields, so the fixed-length part is 48 bytes ([calculation method](https://github.com/apache/flink/blob/5dcf72d251798ef09a157f118e38b12fc37a579e/flink-table/flink-table-common/src/main/java/org/apache/flink/table/data/binary/BinaryRowData.java#L132-L134)). An unexpected exception will therefore occur if the page size is set to 32, which is smaller than the fixed-length part of one record.

> we use https://github.com/apache/hudi/pull/12967 in our inner branch, our record is 400kb avg size, the default write.memory.segment.page.size is 32kb.

Could you also check the size of the fixed-length part of the record in your production case, and make sure the root cause is the same?

For the solution, I think we could add validation logic while compiling the pipeline and throw an exception with an explicit error message indicating the need to increase the page size to accommodate the large record.
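To make the 48-byte figure concrete, here is a minimal sketch that re-derives the same arithmetic as the linked Flink calculation (this is an illustration only, not the Flink API itself):

```java
public class FixedPartSizeDemo {

    // Mirrors BinaryRowData's fixed-length-part layout: a null-bit set
    // (8 header bits + 1 bit per field, rounded up to 8-byte words),
    // followed by one 8-byte slot per field.
    static int calculateBitSetWidthInBytes(int arity) {
        return ((arity + 63 + 8) / 64) * 8;
    }

    static int calculateFixPartSizeInBytes(int arity) {
        return calculateBitSetWidthInBytes(arity) + 8 * arity;
    }

    public static void main(String[] args) {
        int arity = 5; // number of fields in the reproducing test1
        int fixedPart = calculateFixPartSizeInBytes(arity);
        // 8 (null-bit set) + 5 * 8 (field slots) = 48 bytes
        System.out.println("fixed-length part = " + fixedPart + " bytes");
        // A 32-byte page cannot hold the 48-byte fixed-length part, hence the failure.
    }
}
```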

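As for the proposed validation, something along these lines could run while the pipeline is compiled. The class and method names here are hypothetical and only illustrate the intent; the option name `write.memory.segment.page.size` is the one quoted above:

```java
import org.apache.flink.table.data.binary.BinaryRowData;
import org.apache.flink.table.types.logical.RowType;

/** Hypothetical pre-flight check; class and method names are illustrative only. */
public final class PageSizeValidator {

    /**
     * Fails fast when the configured memory segment page size cannot hold
     * the fixed-length part of a single BinaryRowData for the given row type.
     */
    public static void validate(RowType rowType, int pageSizeBytes) {
        int fixedPart = BinaryRowData.calculateFixPartSizeInBytes(rowType.getFieldCount());
        if (fixedPart > pageSizeBytes) {
            throw new IllegalArgumentException(
                "The fixed-length part of one record (" + fixedPart + " bytes, "
                    + rowType.getFieldCount() + " fields) exceeds the configured memory "
                    + "segment page size (" + pageSizeBytes + " bytes). Please increase "
                    + "'write.memory.segment.page.size' to accommodate the record.");
        }
    }
}
```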