@Reynold Xin: not really: it only works for Parquet (see partitionBy:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter),
and it requires you to have a DataFrame in the first place (for my use case the
Spark SQL interface to Avro records is more of a …
A workaround would be to have multiple passes over the RDD, with each pass
writing its own output? Or do it in a single pass with foreachPartition
(opening up multiple files per partition to write out)?
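Untested sketches of both ideas below; the RDD types and output paths are
made up, and the single-pass variant assumes executors can write directly to
the target filesystem:

// Multi-pass: one filter + save per key (fine when the key set is small).
val data: org.apache.spark.rdd.RDD[(String, String)] = ??? // your pair RDD
data.cache() // avoid recomputing the lineage on every pass
val keys = data.keys.distinct().collect()
for (k <- keys) {
  data.filter { case (key, _) => key == k }
      .values
      .saveAsTextFile(s"/output/by-key/$k") // hypothetical path
}

// Single-pass: keep one writer per key inside each partition.
import java.io.PrintWriter
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

data.foreachPartition { iter =>
  val fs = FileSystem.get(new Configuration())
  val writers = scala.collection.mutable.Map.empty[String, PrintWriter]
  try {
    iter.foreach { case (k, v) =>
      val w = writers.getOrElseUpdate(k, new PrintWriter(fs.create(
        new Path(s"/output/by-key/$k/part-${java.util.UUID.randomUUID}"))))
      w.println(v)
    }
  } finally {
    writers.values.foreach(_.close())
  }
}

The first version scans the data once per key, so caching matters; the second
trades that for potentially many open files per task.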
-Abhishek-
On Aug 14, 2015, at 7:56 AM, Silas Davis si...@silasdavis.net wrote:
Would it be right …
This is already supported with the new partitioned data sources in
DataFrame/SQL, right?
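For reference, that DataFrame route looks roughly like this in Spark 1.4+
(source, column, and output names are hypothetical):

val df = sqlContext.read.json("/input/events") // any DataFrame source
df.write
  .partitionBy("country")    // one sub-directory per distinct value
  .parquet("/output/events") // e.g. /output/events/country=US/part-...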
On Fri, Aug 14, 2015 at 8:04 AM, Alex Angelini alex.angel...@shopify.com wrote:
Speaking about Shopify's deployment, this would be a really nice-to-have
feature. We would like to write data to folders …
See: https://issues.apache.org/jira/browse/SPARK-3533
Feel free to comment there and make a case if you think the issue should be
reopened.
Nick
On Fri, Aug 14, 2015 at 11:11 AM Abhishek R. Singh abhis...@tetrationanalytics.com wrote:
A workaround would be to have multiple passes over the RDD …
*tl;dr: Hadoop and Cascading provide ways of writing tuples to multiple
output files based on key, but the plain RDD interface doesn't seem to, and
it should.*
I have been looking into ways to write to multiple outputs in Spark. It
seems like a feature that is somewhat missing from Spark. …
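For concreteness, the Hadoop mechanism the tl;dr alludes to can be driven
from a plain pair RDD through saveAsHadoopFile with a MultipleTextOutputFormat
subclass; a rough, untested sketch (class name and paths made up):

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

// Route each record to a directory named after its key, and keep the key
// out of the written lines.
class KeyPathTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"${key.toString}/$name"
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()
}

val pairs: org.apache.spark.rdd.RDD[(String, String)] = ??? // your pair RDD
pairs.saveAsHadoopFile("/output/by-key",
  classOf[String], classOf[String], classOf[KeyPathTextOutputFormat])

This writes one directory per distinct key in a single pass over the data,
but only through the old Hadoop OutputFormat API rather than a first-class
RDD method.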