Yeah, I was just discussing this with a co-worker and we came to the same
conclusion -- we essentially need to create a copy of the partition column.
Thanks.
Hacky, but it works. Seems counter-intuitive that Spark would remove the
column from the output... should at least give you an option to keep it.
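For anyone finding this thread later, here is a minimal sketch of the copy-the-column workaround. It assumes a local SparkSession and uses a made-up column name, "foo_part", for the duplicate; the sample data and the "json-out" path are only illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KeepPartitionColumn {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("keep-partition-column")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 10), (2, 20)).toDF("foo", "bar")

    // "foo_part" is a hypothetical name for the duplicate column.
    // Spark moves the partition column into the directory name, so we
    // partition by the copy and the original "foo" stays in each record.
    df.withColumn("foo_part", col("foo"))
      .write
      .partitionBy("foo_part")
      .json("json-out")

    spark.stop()
  }
}
```

With this, the directories are named after foo_part while each JSON record still carries foo and bar.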
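Does this help?
// Assumes spark-shell (sc and spark.implicits._ already in scope).
sc.parallelize(List((1, 10), (2, 20)))
  .map { case (foo, bar) => (foo, foo, bar) }  // duplicate foo before naming the columns
  .toDF("foo_part", "foo", "bar")
  .write.partitionBy("foo_part").json("json-out")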
On Mon, Feb 26, 2018 at 4:28 PM, Alex Nastetsky wrote:
> Is there a way to make outputs created with "partitionBy" to