[ https://issues.apache.org/jira/browse/ARROW-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Szucs resolved ARROW-16240.
-------------------------------------
    Resolution: Fixed

Issue resolved by pull request 12955
[https://github.com/apache/arrow/pull/12955]

> [Python] Support row_group_size/chunk_size keyword in pq.write_to_dataset with use_legacy_dataset=False
> --------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16240
>                 URL: https://issues.apache.org/jira/browse/ARROW-16240
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: Python
>            Reporter: Alenka Frim
>            Assignee: Alenka Frim
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The {{pq.write_to_dataset}} (legacy implementation) supports the {{row_group_size}}/{{chunk_size}} keyword to specify the row group size of the written parquet files.
>
> Now that we made {{use_legacy_dataset=False}} the default, this keyword doesn't work anymore. This is because {{dataset.write_dataset(..)}} doesn't support the parquet {{row_group_size}} keyword: the {{ParquetFileWriteOptions}} class doesn't support this keyword.
>
> On the parquet side, this is also the only keyword that is not passed to the {{ParquetWriter}} init (and thus to parquet's {{WriterProperties}} or {{ArrowWriterProperties}}), but to the actual {{write_table}} call. In C++ this can be seen at https://github.com/apache/arrow/blob/76d064c729f5e2287bf2a2d5e02d1fb192ae5738/cpp/src/parquet/arrow/writer.h#L62-L71
>
> See discussion: [https://github.com/apache/arrow/pull/12811#discussion_r845304218]
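
For reference, a minimal sketch of the call this fix targets (assuming pyarrow 8.0.0; the table contents, dataset path, and the chosen row_group_size value are illustrative only):

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

# Small example table; the "year" column is used for partitioning.
table = pa.table({
    "year": [2020, 2020, 2021, 2021],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# With the fix, the row group size of the written parquet files can again be
# controlled when writing through the new (non-legacy) dataset implementation.
pq.write_to_dataset(
    table,
    "dataset_root",
    partition_cols=["year"],
    use_legacy_dataset=False,
    row_group_size=2,  # chunk_size is accepted as an alias
)
{code}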