Re: partitionBy with partitioned column in output?

2018-02-26 Thread Alex Nastetsky
Yeah, was just discussing this with a co-worker and came to the same conclusion -- need to essentially create a copy of the partition column. Thanks. Hacky, but it works. Seems counter-intuitive that Spark would remove the column from the output... should at least give you an option to keep it.
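
For reference, a minimal sketch of that workaround, assuming Spark 2.x and a local SparkSession. The object name, the copy column name "foo_part" and the output path "json-out" are made up for illustration. It partitions by the copy rather than the original, so the records keep the original column name:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical example object; not from the original thread.
object PartitionByKeepColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitionBy-keep-column")
      .master("local[*]") // assumption: local run, adjust for a real cluster
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 10), (2, 20)).toDF("foo", "bar")

    // partitionBy drops the partition column from the data files and encodes
    // it in the directory names, so partition by a copy ("foo_part") and let
    // the original "foo" stay inside each record.
    df.withColumn("foo_part", col("foo"))
      .write
      .partitionBy("foo_part")
      .json("json-out") // fails if "json-out" already exists (default save mode)

    spark.stop()
  }
}
```

The output directories are then named after foo_part (one per distinct value), while each JSON record still carries both foo and bar, so a tool that does not do partition discovery can read the files directly.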

Re: partitionBy with partitioned column in output?

2018-02-26 Thread naresh Goud
Does this help?

sc.parallelize(List((1,10),(2,20))).toDF("foo","bar").withColumn("foo_copy", $"foo").write.partitionBy("foo").json("json-out")

On Mon, Feb 26, 2018 at 4:28 PM, Alex Nastetsky wrote:
> Is there a way to make outputs created with "partitionBy" to

partitionBy with partitioned column in output?

2018-02-26 Thread Alex Nastetsky
Is there a way to make outputs created with "partitionBy" contain the partitioned column? When reading the output with Spark or Hive or similar, it's less of an issue because those tools know how to perform partition discovery. But if I were to load the output into an external data warehouse or