Jefffrey commented on issue #5383: URL: https://github.com/apache/arrow-datafusion/issues/5383#issuecomment-1446029283
> When it creates multiple files, is it creating one per "partition"? Or one per worker?

It seems to be one per partition. In my example above, I can bump the partition count to 20, which will output 20 files.

You make good points regarding Spark, especially about how its partitioning behaviour can be painful for smaller datasets.

Yeah, it would be good to expose options for writing CSVs (and potentially JSON as well), as the current API is quite limiting in that it doesn't allow this configuration. Having an API like the one you suggested could be a good starting point.
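To illustrate the "one file per partition" behaviour described above, here is a minimal standard-library sketch. It does not use DataFusion and is not its implementation; `round_robin`, `write_partitions`, and the `part-N.csv` naming are hypothetical names chosen for this example. It only shows why raising the partition count to 20 would yield 20 output files.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Distribute rows round-robin across `partitions` buckets.
/// (Hypothetical helper; stands in for a repartitioning step.)
fn round_robin(rows: &[String], partitions: usize) -> Vec<Vec<String>> {
    assert!(partitions > 0, "partition count must be positive");
    let mut buckets: Vec<Vec<String>> = vec![Vec::new(); partitions];
    for (i, row) in rows.iter().enumerate() {
        buckets[i % partitions].push(row.clone());
    }
    buckets
}

/// Write one CSV file per partition into `dir` and return the paths written.
/// (Hypothetical helper; the file naming is an assumption for illustration.)
fn write_partitions(
    dir: &Path,
    header: &str,
    parts: &[Vec<String>],
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(dir)?;
    let mut paths = Vec::with_capacity(parts.len());
    for (i, rows) in parts.iter().enumerate() {
        let path = dir.join(format!("part-{}.csv", i));
        let mut file = File::create(&path)?;
        // Every output file carries its own header line.
        writeln!(file, "{}", header)?;
        for row in rows {
            writeln!(file, "{}", row)?;
        }
        paths.push(path);
    }
    Ok(paths)
}
```

Calling `round_robin(&rows, 20)` and passing the result to `write_partitions` produces 20 files regardless of how small `rows` is, which is the same pain point raised about small datasets. An option for merging partitions before the write would be one way to get a single file.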