Re: Efficient saveAsTextFile by key, directory for each key?

2015-04-22 Thread Arun Luthra
I ended up post-processing the result in Hive with a dynamic partition insert query to get a table partitioned by the key. Looking further, it seems that dynamic partition insert is supported in Spark SQL and works well in Spark SQL version 1.2.0: https://issues.apache.org/jira/browse/SPARK-3007
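For reference, a dynamic partition insert is usually a pair of Hive settings followed by an `INSERT OVERWRITE ... PARTITION (col) SELECT ...` in which the partition column comes last in the select list. The sketch below only builds those statement strings. The table names (`my_table`, `staging`) and the helper object are hypothetical and are not from the thread.

```scala
// Hypothetical helper: builds the HiveQL for a dynamic partition insert.
object DynamicPartitionInsert {
  // Enable dynamic partitioning. "nonstrict" allows every partition column
  // to be dynamic (strict mode requires at least one static partition value).
  val settings: Seq[String] = Seq(
    "SET hive.exec.dynamic.partition=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict"
  )

  // Hive fills dynamic partition columns from the trailing SELECT columns,
  // so the partition columns are appended after the data columns.
  def insert(target: String, source: String,
             dataCols: Seq[String], partCols: Seq[String]): String = {
    val select = (dataCols ++ partCols).mkString(", ")
    s"INSERT OVERWRITE TABLE $target PARTITION (${partCols.mkString(", ")}) SELECT $select FROM $source"
  }
}

object DynamicPartitionInsertExample extends App {
  // Print the statements for a hypothetical staging table holding MyData rows.
  val stmts = DynamicPartitionInsert.settings :+
    DynamicPartitionInsert.insert("my_table", "staging", Seq("val0", "val1"), Seq("directory_name"))
  stmts.foreach(println)
}
```

In Spark 1.2 each of these strings could be passed in turn to `HiveContext.sql`; the same statements also work unchanged from the Hive CLI.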

Efficient saveAsTextFile by key, directory for each key?

2015-04-21 Thread Arun Luthra
Is there an efficient way to save an RDD with saveAsTextFile in such a way that the data gets shuffled into separate directories according to a key? (My end goal is to wrap the result in a multi-partitioned Hive table.) Suppose you have: case class MyData(val0: Int, val1: String, directory_name:
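To make the target layout concrete, here is a single-machine sketch (plain Scala, no Spark) that groups records by key and writes each group to `<base>/directory_name=<value>/part-00000`. This is the `key=value` directory layout that a partitioned Hive table expects. It assumes `directory_name` is a `String`, since the definition above is cut off, and the `PartitionedWriter` object is hypothetical. It illustrates the output shape only and is not a distributed Spark API.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Assumption: the original definition is truncated after `directory_name:`;
// here it is taken to be a String holding the partition value.
case class MyData(val0: Int, val1: String, directory_name: String)

// Hypothetical helper illustrating a Hive-style partition directory layout.
object PartitionedWriter {
  // Writes each group to <base>/directory_name=<key>/part-00000 and returns
  // a map from key to the file written for it.
  def writeByKey(records: Seq[MyData], base: Path): Map[String, Path] =
    records.groupBy(_.directory_name).map { case (key, rows) =>
      val dir = base.resolve(s"directory_name=$key")
      Files.createDirectories(dir)
      // The partition value lives in the path, so only the data columns are
      // written, tab-separated, one record per line.
      val text = rows.map(r => s"${r.val0}\t${r.val1}").mkString("", "\n", "\n")
      val file = dir.resolve("part-00000")
      Files.write(file, text.getBytes(StandardCharsets.UTF_8))
      key -> file
    }
}
```

A Hive table declared with `PARTITIONED BY (directory_name STRING)` could then have each directory registered as a partition. In Spark itself the grouping would need to happen per key across the cluster rather than in one process.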