Jefffrey commented on issue #5383:
URL: https://github.com/apache/arrow-datafusion/issues/5383#issuecomment-1446029283

   > When it creates multiple files, is it creating one per "partition"? Or one 
per worker?
   
   It seems to be one file per partition. In my example above, bumping the 
partition count to 20 produces 20 output files.
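
   To illustrate the one-file-per-partition behaviour, here is a minimal 
standalone sketch. It uses only the Rust standard library and does not call 
DataFusion; the function names (`round_robin`, `write_partitions`), the 
`part-{i}.csv` naming and the round-robin distribution are all assumptions 
made for illustration, not what DataFusion does internally.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Distribute rows round-robin across `partitions` buckets (hypothetical helper).
fn round_robin(rows: &[String], partitions: usize) -> Vec<Vec<String>> {
    assert!(partitions > 0, "need at least one partition");
    let mut buckets: Vec<Vec<String>> = vec![Vec::new(); partitions];
    for (i, row) in rows.iter().enumerate() {
        buckets[i % partitions].push(row.clone());
    }
    buckets
}

/// Write one CSV file per partition into `dir` and return the paths written.
/// The file naming scheme is an assumption for this sketch.
fn write_partitions(
    dir: &Path,
    header: &str,
    buckets: &[Vec<String>],
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(dir)?;
    let mut paths = Vec::new();
    for (i, bucket) in buckets.iter().enumerate() {
        let path = dir.join(format!("part-{i}.csv"));
        let mut file = File::create(&path)?;
        writeln!(file, "{header}")?;
        for row in bucket {
            writeln!(file, "{row}")?;
        }
        paths.push(path);
    }
    Ok(paths)
}

fn main() -> io::Result<()> {
    // 100 two-column rows split into 20 partitions gives 20 files.
    let rows: Vec<String> = (0..100).map(|i| format!("{i},{}", i * 2)).collect();
    let dir = std::env::temp_dir().join("partition_sketch");
    let paths = write_partitions(&dir, "a,b", &round_robin(&rows, 20))?;
    println!("wrote {} files", paths.len());
    Ok(())
}
```

   The number of output files follows the partition count rather than the 
data size, which is why small datasets can end up spread over many tiny files.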
   
   You have good points regarding Spark, especially about how its partitioning 
behaviour can be painful for smaller datasets.
   
   Yeah, it would be good to expose options for writing CSV (and potentially 
JSON) files, as the current API is quite limiting: it doesn't allow this to be 
configured. An API like the one you suggested could be a good starting point.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
