neopaf opened a new issue, #15138: URL: https://github.com/apache/arrow/issues/15138
### Describe the usage question you have. Please include as many useful details as possible.

As I understand from https://cloudxlab.com/blog/bucketing-clustered-by-and-cluster-by/, Hive supports a `CLUSTERED BY (cols)` approach in which data are split into `0000<bucket>` bucket files, in addition to `PARTITIONED BY (cols)`. In Arrow, `arrow::dataset` can be configured to write to different partitions as described in [the C++ dataset docs](https://arrow.apache.org/docs/cpp/dataset.html). However, I can't seem to find a way to write data split into such bucket files. I guess I need to define an additional custom partitioning, but I can't find a relevant hash function. Has anybody already walked that path and can share some insights (if not code)?

### Component(s)

C++
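For readers landing on this issue: Hive-style bucketing boils down to `bucket = hash(key) % num_buckets`, with the writer routing each row group to its bucket's file. Below is a minimal, Arrow-free sketch of that idea in Python; `bucket_of` and `split_into_buckets` are hypothetical helper names, not Arrow APIs, and CRC32 stands in for whatever stable hash one chooses (Hive itself uses its own hash, so this will not be byte-compatible with Hive buckets).

```python
import zlib
from collections import defaultdict

def bucket_of(value: str, num_buckets: int) -> int:
    # Use a deterministic hash (CRC32) so the same key always lands in the
    # same bucket across runs; Python's built-in hash() is salted per process
    # and is NOT suitable for on-disk bucketing.
    return zlib.crc32(value.encode("utf-8")) % num_buckets

def split_into_buckets(rows, key, num_buckets):
    # Group rows by bucket id; each group would then be written out as its
    # own bucket file (e.g. bucket_00000, bucket_00001, ...). With a dataset
    # writer, the bucket id could instead be materialized as a column and
    # used as an ordinary partition key.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return buckets
```

With `arrow::dataset`, the analogous workaround would be to compute such a bucket-id column before writing and partition on it, which yields one directory per bucket rather than Hive's flat `0000<bucket>` file naming.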