[ https://issues.apache.org/jira/browse/ARROW-17583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598471#comment-17598471 ]
Antoine Pitrou commented on ARROW-17583:
----------------------------------------

Your diagnosis seems right. Would you want to submit a PR?

> [Python] File write visitor throws exception on large parquet file
> ------------------------------------------------------------------
>
>                 Key: ARROW-17583
>                 URL: https://issues.apache.org/jira/browse/ARROW-17583
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>            Reporter: Joost Hoozemans
>            Priority: Minor
>
> When writing a large parquet file (e.g. 5 GB) using pyarrow.dataset, it throws an exception:
>
> Traceback (most recent call last):
>   File "pyarrow/_dataset_parquet.pyx", line 165, in pyarrow._dataset_parquet.ParquetFileFormat._finish_write
>   File "pyarrow/_dataset.pyx", line 2695, in pyarrow._dataset.WrittenFile.__init__
> OverflowError: value too large to convert to int
> Exception ignored in: 'pyarrow._dataset._filesystemdataset_write_visitor'
>
> The file is written successfully, though. It seems related to this issue:
> https://issues.apache.org/jira/browse/ARROW-16761
>
> I would guess the problem is that the Python field is an int while the C++ code returns an int64_t:
> https://github.com/apache/arrow/pull/13338/files#diff-4f2eb12337651b45bab2b03abe2552dd7fc9958b1fbbeb09a2a488804b097109R164

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
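The suspected root cause (a Cython field declared as a C `int` while the C++ side returns an `int64_t` file size) can be illustrated with a small stdlib-only sketch; this is not pyarrow code, just a demonstration of why a ~5 GB byte count cannot be converted to a 32-bit C int:

```python
import ctypes

# A 5 GiB file size, as in the bug report.
size = 5 * 1024**3  # 5368709120 bytes

# A C int (int32) tops out at 2**31 - 1 (~2 GiB), so this size
# does not fit -- Cython raises OverflowError on the conversion,
# while ctypes silently wraps modulo 2**32.
assert size > 2**31 - 1
wrapped = ctypes.c_int(size).value    # wrapped, wrong value
correct = ctypes.c_longlong(size).value  # int64_t holds it fine

print(wrapped)  # 1073741824 (wrong)
print(correct)  # 5368709120 (correct)
```

Widening the Python-side field to a 64-bit type (matching the C++ `int64_t` return) would avoid the overflow.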