cshuo commented on issue #18479:
URL: https://github.com/apache/hudi/issues/18479#issuecomment-4204599865

I think that's mainly because of a limitation of Flink's `BinaryRowData` itself; see the statement here:
```
Fixed-length part will certainly fall into a MemorySegment, which will speed up the read and
write of field. During the write phase, if the target memory segment has less space than fixed
length part size, we will skip the space. So the number of fields in a single Row cannot exceed
the capacity of a single MemorySegment, if there are too many fields, we suggest that user set a
bigger pageSize of MemorySegment.
```
   
https://github.com/apache/flink/blob/5dcf72d251798ef09a157f118e38b12fc37a579e/flink-table/flink-table-common/src/main/java/org/apache/flink/table/data/binary/BinaryRowData.java#L55-L59
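
The quoted rule can be illustrated with a small, simplified model. `PagedFixedPartModel` and `segmentsNeeded` are hypothetical names, the model tracks only fixed-length parts, and it is not Flink's actual serializer code. Each fixed-length part must fit inside one page. When the current page has too little room left, the remainder is skipped. If the fixed-length part is larger than a whole page, skipping can never help:

```java
// Simplified, hypothetical model of the quoted "skip" rule; not Flink's actual
// serializer. Only fixed-length parts are tracked, and each one must fit
// entirely inside a single page (memory segment).
final class PagedFixedPartModel {

    /** Returns how many pages are used to write {@code records} fixed-length parts. */
    static int segmentsNeeded(int records, int fixedPartSize, int pageSize) {
        if (fixedPartSize > pageSize) {
            // Even an empty page is too small, so skipping ahead cannot help.
            throw new IllegalStateException("fixed-length part (" + fixedPartSize
                + " bytes) exceeds page size (" + pageSize + " bytes)");
        }
        int segments = 0;
        int remaining = 0; // no page allocated yet
        for (int i = 0; i < records; i++) {
            if (remaining < fixedPartSize) {
                // Skip whatever is left of the current page and start a new one.
                segments++;
                remaining = pageSize;
            }
            remaining -= fixedPartSize;
        }
        return segments;
    }

    public static void main(String[] args) {
        // Two 48-byte parts fit per 128-byte page; the last 32 bytes are skipped.
        System.out.println(segmentsNeeded(5, 48, 128)); // prints 3
        try {
            segmentsNeeded(1, 48, 32); // page smaller than one fixed-length part
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```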
   
For reproducing test1, the test data has 5 fields, so the fixed-length part is 48 bytes ([calculation method](https://github.com/apache/flink/blob/5dcf72d251798ef09a157f118e38b12fc37a579e/flink-table/flink-table-common/src/main/java/org/apache/flink/table/data/binary/BinaryRowData.java#L132-L134)). An unexpected exception will therefore be thrown if the page size is set to 32, which is smaller than the fixed-length part of a single record.
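
The 48-byte figure can be checked with a standalone re-implementation of the arithmetic in the linked method. This assumes `HEADER_SIZE_IN_BITS = 8`, as in the linked Flink source. `FixedPartSize` is a hypothetical helper name, not a Flink class:

```java
// Re-implements the arithmetic of BinaryRowData#calculateBitSetWidthInBytes and
// BinaryRowData#calculateFixPartSizeInBytes from the linked Flink source, so the
// example runs without a Flink dependency.
final class FixedPartSize {

    // Assumed to match BinaryRowData.HEADER_SIZE_IN_BITS in the linked commit.
    private static final int HEADER_SIZE_IN_BITS = 8;

    /** Header bits plus one null bit per field, rounded up to whole 8-byte words. */
    static int bitSetWidthInBytes(int arity) {
        return ((arity + 63 + HEADER_SIZE_IN_BITS) / 64) * 8;
    }

    /** Bit set plus one 8-byte slot per field. */
    static int fixPartSizeInBytes(int arity) {
        return bitSetWidthInBytes(arity) + 8 * arity;
    }

    public static void main(String[] args) {
        // 5 fields: 8-byte bit set + 5 * 8 bytes of field slots
        System.out.println(fixPartSizeInBytes(5)); // prints 48
    }
}
```

Under this formula, the fixed-length part grows with the number of fields rather than with the record's total size, because large variable-length values such as long strings are stored in the variable-length part. For example, a 32768-byte (32 KB) page can hold the fixed-length part of a row with at most 4032 fields. The total record size alone (such as the 400 KB average quoted below) therefore does not determine whether this limit is hit.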
   
> we use https://github.com/apache/hudi/pull/12967 in our inner branch, our record is 400kb avg size, the default write.memory.segment.page.size is 32kb.
   
Could you also check the size of the fixed-length part of the records in your production case, to make sure the root cause is the same?
   
As for the solution, I think we could add validation logic when compiling the pipeline that throws an exception with an explicit error message, indicating that the page size must be increased to accommodate large records.
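
Here is a minimal sketch of such a check. It assumes that the field count of the written row type and the configured page size (taken to be in bytes) are known when the pipeline is compiled. `PageSizeValidator`, `validate`, and the message text are hypothetical, not existing Hudi or Flink APIs. The size formula is re-implemented from the linked `BinaryRowData` method:

```java
// Hypothetical fail-fast check; names and message are illustrative only and
// are not existing Hudi or Flink APIs.
final class PageSizeValidator {

    // Same arithmetic as BinaryRowData#calculateFixPartSizeInBytes,
    // assuming an 8-bit row header as in the linked Flink source.
    static int fixPartSizeInBytes(int arity) {
        int bitSetWidth = ((arity + 63 + 8) / 64) * 8;
        return bitSetWidth + 8 * arity;
    }

    /** Throws if one row's fixed-length part cannot fit into a single page. */
    static void validate(int fieldCount, long pageSizeInBytes) {
        int required = fixPartSizeInBytes(fieldCount);
        if (required > pageSizeInBytes) {
            throw new IllegalArgumentException(String.format(
                "The fixed-length part of a row with %d fields needs %d bytes, but "
                    + "write.memory.segment.page.size is %d bytes. Increase the page "
                    + "size to at least %d bytes.",
                fieldCount, required, pageSizeInBytes, required));
        }
    }

    public static void main(String[] args) {
        validate(5, 32 * 1024); // passes: 48 bytes fit into a 32 KB page
        try {
            validate(5, 32); // fails: 48 bytes do not fit into 32 bytes
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running the check while the pipeline is built turns the unexpected runtime exception into an actionable message. Where exactly to hook it in is left open here.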
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
