ifxchris commented on issue #1994:
URL: https://github.com/apache/iceberg-python/issues/1994#issuecomment-2897769050
Hi all,
we are experiencing the same issue, but in a more severe form:
~~~
# Estimate the average row size and derive how many rows fit into one
# target-sized file.
avg_row_size_bytes = tbl.nbytes / tbl.num_rows
target_rows_per_file = target_file_size // avg_row_size_bytes
# max_chunksize becomes 0 when a single row exceeds target_file_size.
batches = tbl.to_batches(max_chunksize=target_rows_per_file)
~~~
Our data is loaded from a parquet file with the following row-group metadata:
`Row group 0: count: 1 4.163 MB records start: 4 total(compressed): 4.163 MB total(uncompressed): 40.739 MB`
So the table contains only a single record.
According to `tbl.nbytes`, it takes up around 600 MB in memory.
Since this single record is larger than the 512 MB target file size, `target_rows_per_file`
is calculated as zero by the floor division.
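To make the arithmetic concrete, a short worked example with the numbers from above (the 512 MB figure is my assumption that the default `write.target-file-size-bytes` is in effect):

~~~
avg_row_size_bytes = 600 * 1024**2 / 1   # ~600 MB in memory, num_rows == 1
target_file_size = 512 * 1024**2         # assumed 512 MB default target
target_rows_per_file = target_file_size // avg_row_size_bytes
print(target_rows_per_file)              # 0.0 -> used as max_chunksize
~~~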
As a result, `max_chunksize` is set to 0 and pyiceberg crashes because of it.
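For illustration, here is a minimal sketch of a guard that would avoid the zero chunk size. The function name `bin_pack_batches` is hypothetical and this is not the actual pyiceberg code path, just the same sizing logic with a floor of one row per batch:

~~~
import pyarrow as pa

def bin_pack_batches(tbl: pa.Table, target_file_size: int) -> list[pa.RecordBatch]:
    """Split a table into batches of roughly target_file_size bytes each."""
    avg_row_size_bytes = tbl.nbytes / tbl.num_rows
    # Floor division yields 0 when a single row is larger than the target
    # file size; clamp to at least 1 so to_batches() gets a valid chunk size.
    target_rows_per_file = max(1, int(target_file_size // avg_row_size_bytes))
    return tbl.to_batches(max_chunksize=target_rows_per_file)
~~~

With a floor of one row per batch, an oversized record still produces a batch (and hence a data file) larger than the target, but the write no longer fails on a zero `max_chunksize`.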