[ https://issues.apache.org/jira/browse/ARROW-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Keane updated ARROW-12373:
-----------------------------------
    Description:
from https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_base.cc#L377:

{{if (groups.batches.size() > static_cast<size_t>(state.write_options.max_partitions)) {}}}

We cast {{state.write_options.max_partitions}} to an unsigned integer without checking it. So if a negative number is supplied, there is no error.

We should either:
* check for negatives (and 0?) and error appropriately
* document that negative numbers will result in some large positive number of partitions being the max.

This came up in working on ARROW-12315, and from that branch, one would run the following to see the behavior:

```
library(dplyr)
library(arrow)
dir.create("mydir")
mtcars %>%
  group_by(cyl) %>%
  write_dataset("mydir", format = "parquet", max_partitions = -1)
```

  was:
this shouldn't happen, please note the *-*3

```
library(dplyr)
library(arrow)
library(testthat)
try(dir.create("mydir"))
expect_error(
  mtcars %>%
    group_by(cyl) %>%
    write_dataset("mydir", format = "parquet", max_partitions = -3)
)
Error: ``%>%`(...)` did not throw an error.
```

> [C++] max_partitions < 0 is accepted with no error
> --------------------------------------------------
>
>                 Key: ARROW-12373
>                 URL: https://issues.apache.org/jira/browse/ARROW-12373
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 3.0.0
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>              Labels: bug
>
> from
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_base.cc#L377:
> {{if (groups.batches.size() > static_cast<size_t>(state.write_options.max_partitions)) {}}}
> We cast {{state.write_options.max_partitions}} to an unsigned integer without
> checking it. So if a negative number is supplied, there is no error.
> We should either:
> * check for negatives (and 0?) and error appropriately
> * document that negative numbers will result in some large positive number
> of partitions being the max.
> This came up in working on ARROW-12315, and from that branch, one would run
> the following to see the behavior:
> ```
> library(dplyr)
> library(arrow)
> dir.create("mydir")
> mtcars %>%
>   group_by(cyl) %>%
>   write_dataset("mydir", format = "parquet", max_partitions = -1)
> ```