Re: spark partitionBy with partitioned column in json output

Jay Mon, 04 Jun 2018 19:45:02 -0700

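The partitionBy clause creates Hive-style partition folders (one directory
per value, e.g. bucket=B01/actionType=A1) so that you can point a Hive
partitioned table at the data. The partition values are encoded in the
directory names, which is why those columns no longer appear inside the
JSON records themselves.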


What are you using partitionBy for? What is the use case?
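
If the goal is to keep the partition columns inside each JSON record as
well, one possible workaround is to partition on duplicated columns, so the
originals stay in the records. This is only a sketch under assumptions: the
names bucket_part, actionType_part and KeepPartitionColumns are made up for
illustration, and the local master setting is just for trying it out.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical object name, used only for this illustration.
object KeepPartitionColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("keep-partition-columns")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()

    val df = spark.read.json("actions.json")

    // partitionBy moves its columns out of the records and into the
    // directory path, so partition on copies (hypothetical names) and
    // leave the original bucket/actionType columns in the data.
    val withCopies = df
      .withColumn("bucket_part", col("bucket"))
      .withColumn("actionType_part", col("actionType"))

    withCopies.write
      .format("json")
      .mode("append")
      .partitionBy("bucket_part", "actionType_part")
      .save("output.json")

    spark.stop()
  }
}
```

Note that if the output is only ever read back with Spark, the copies may
not be needed: spark.read.json("output.json") on the partitioned directory
recovers bucket and actionType from the folder names through partition
discovery.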

On Mon 4 Jun, 2018, 4:59 PM purna pradeep, <purna2prad...@gmail.com> wrote:

> I'm reading the JSON below in Spark
>
>     {"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"}
>     {"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"}
>     {"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"}
>
>     val df=spark.read.json("actions.json").toDF()
>
> Now I'm writing the same data to JSON output as below
>
>     df.write.format("json").mode("append")
>       .partitionBy("bucket", "actionType").save("output.json")
>
>
> and the output.json is as below
>
>     {"preaction":"NULL","postaction":"NULL"}
>
> The bucket and actionType columns are missing from the JSON output; I need
> the partitionBy columns in the output as well
>
>
