pitrou commented on a change in pull request #9130:
URL: https://github.com/apache/arrow/pull/9130#discussion_r555279313
########## File path: cpp/src/arrow/dataset/partition.cc ##########
@@ -562,6 +569,8 @@ inline Result<std::shared_ptr<Array>> CountsToOffsets(
 // since no Writers accept a selection vector.
 class StructDictionary {
  public:
+  static constexpr int32_t kMaxGroups = std::numeric_limits<int16_t>::max();

Review comment:
Then it's definitely worth having a reasonably small, configurable limit (such as 100). I suspect it's easy to end up with Arrow creating a million files if you make a mistake in choosing your partition columns.

This is an automated message from the Apache Git Service.
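For illustration, here is a minimal sketch of the kind of configurable limit the comment asks for, written in standard C++ only. `GroupLimitOptions`, `max_groups`, and `LimitedGroupEncoder` are hypothetical names, not part of Arrow's API. The default of 100 is taken from the comment's example, and the real `StructDictionary` may enforce its limit differently. The encoder assigns a dense group id to each distinct partition key and refuses to create a new group once the configured limit is reached.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical options struct (illustrative, not Arrow's API).
// The default of 100 follows the value suggested in the review comment.
struct GroupLimitOptions {
  int32_t max_groups = 100;
};

// Hypothetical encoder: maps each distinct partition key to a dense group id
// and stops accepting new keys once `max_groups` groups exist.
class LimitedGroupEncoder {
 public:
  explicit LimitedGroupEncoder(GroupLimitOptions options) : options_(options) {
    assert(options_.max_groups > 0);  // a non-positive limit makes no sense
  }

  // Returns the group id for `key`, or std::nullopt if creating a new group
  // would exceed the configured limit (e.g. a badly chosen partition column).
  std::optional<int32_t> Encode(const std::string& key) {
    auto it = ids_.find(key);
    if (it != ids_.end()) return it->second;  // key already has a group
    if (num_groups() >= options_.max_groups) {
      return std::nullopt;  // refuse to create yet another partition
    }
    const int32_t id = num_groups();
    ids_.emplace(key, id);
    return id;
  }

  int32_t num_groups() const { return static_cast<int32_t>(ids_.size()); }

 private:
  GroupLimitOptions options_;
  std::unordered_map<std::string, int32_t> ids_;
};
```

A caller that receives `std::nullopt` could abort the write with an error naming the limit, instead of silently producing thousands of files; users who really need more partitions would raise `max_groups` explicitly.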