[GitHub] [arrow-datafusion] Jefffrey commented on issue #5383: The output of write_csv and write_json methods is confusing.

via GitHub Mon, 27 Feb 2023 01:56:32 -0800


Jefffrey commented on issue #5383:
URL: 
https://github.com/apache/arrow-datafusion/issues/5383#issuecomment-1446029283


   > When it creates multiple files, is it creating one per "partition"? Or one 
per worker?
   
   It seems to be one file per partition. In my example above, I can bump the 
partition count to 20, which then outputs 20 files.
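
   To illustrate the one-file-per-partition behaviour, here is a minimal 
sketch in plain Python. It does not use DataFusion at all: the function 
`write_partitioned_csv`, the `part-{index}.csv` naming, and the round-robin 
split are assumptions made for illustration only, and DataFusion's actual file 
names and partitioning scheme may differ.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned_csv(rows, header, out_dir, num_partitions):
    """Hypothetical helper: write one CSV file per partition."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Round-robin split (an assumption): row i goes to partition i % num_partitions.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    paths = []
    for index, partition in enumerate(partitions):
        path = out_dir / f"part-{index}.csv"  # illustrative file name
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(partition)
        paths.append(path)
    return paths


rows = [(i, i * i) for i in range(100)]
with tempfile.TemporaryDirectory() as tmp:
    written = write_partitioned_csv(rows, ("n", "n_squared"), tmp, 20)
    file_count = len(written)
    # Count data rows per file, excluding the header line.
    rows_per_file = [len(p.read_text().splitlines()) - 1 for p in written]
```

   With 100 rows and 20 partitions, the caller ends up with 20 small files 
rather than one, which is the surprise for smaller datasets discussed above.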
   
   You have good points regarding Spark, especially about how its partitioning 
behaviour can be painful for smaller datasets.
   
   Yeah, it would be good to expose options for writing CSVs (and potentially 
JSON as well), as the current API is quite limiting in that it doesn't allow 
this configuration. An API like the one you suggested could be a good 
starting point.


