[ https://issues.apache.org/jira/browse/ARROW-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Molina updated ARROW-12315:
--------------------------------------
    Fix Version/s:     (was: 5.0.0)
                   6.0.0

> [R] add max_partitions argument to write_dataset()
> --------------------------------------------------
>
>                 Key: ARROW-12315
>                 URL: https://issues.apache.org/jira/browse/ARROW-12315
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>    Affects Versions: 3.0.0
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Assignee: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The Python docs show that we can pass, say, 1025 partitions:
> https://arrow.apache.org/docs/_modules/pyarrow/dataset.html
> In R this argument doesn't exist; it would be good to add it for arrow
> v4.0.0.
> This is useful, for example, with international trade datasets:
> {code:r}
> # d = UN COMTRADE - World's bilateral flows 2019
> # 13,050,535 x 22 data.frame
> d %>%
>   group_by(Year, `Reporter ISO`, `Partner ISO`) %>%
>   write_dataset("parquet", hive_style = F)
> Error: Invalid: Fragment would be written into 12808 partitions. This exceeds
> the maximum of 1024
> {code}


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
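
As a sketch of what the requested R-side usage could look like (assuming the new argument is exposed as `max_partitions` on `write_dataset()`, mirroring the pyarrow parameter; `d` is the COMTRADE data frame from the example above, and the value 13000L is an illustrative choice large enough for its 12808 partitions):

{code:r}
library(arrow)
library(dplyr)

# Hypothetical call once max_partitions is available in R:
# raise the cap above pyarrow's default of 1024 so the grouped
# write no longer fails with "Fragment would be written into
# 12808 partitions".
d %>%
  group_by(Year, `Reporter ISO`, `Partner ISO`) %>%
  write_dataset("parquet", hive_style = FALSE, max_partitions = 13000L)
{code}

Keeping the name and default (1024) aligned with pyarrow's `write_dataset()` would make the two bindings behave consistently.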