Re: spark partitionBy with partitioned column in json output

2018-06-05 Thread Elior Malul
I had the same issue myself. I was surprised at first as well, but I found it useful: the amount of data saved for each partition has decreased. When I load the data from each partition, I add the partition columns back with the lit function before I merge the frames from the different partitions. On
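The reload step described above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the object name, the helper names, and the assumption that the partition values are known up front and non-empty are all hypothetical. It relies on the fact that reading a single partition folder directly does not bring the partition column back, so it is re-added with lit. unionByName requires Spark 2.3 or later.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object ReloadPartitions {
  // Reads one partition folder (e.g. <basePath>/bucket=B01) and re-adds the
  // partition column as a constant, since it is absent from the JSON records.
  def loadPartition(spark: SparkSession, basePath: String,
                    column: String, value: String): DataFrame =
    spark.read
      .json(s"$basePath/$column=$value")
      .withColumn(column, lit(value))

  // Loads every listed partition and merges the frames by column name.
  // Assumes `values` is non-empty; reduce throws on an empty Seq.
  def loadAll(spark: SparkSession, basePath: String,
              column: String, values: Seq[String]): DataFrame =
    values
      .map(v => loadPartition(spark, basePath, column, v))
      .reduce(_ unionByName _)
}
```

A call such as ReloadPartitions.loadAll(spark, "/tmp/partitioned_json", "bucket", Seq("B01", "B02", "B03")) would return one frame with the bucket column restored; the path here is a placeholder.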

Re: spark partitionBy with partitioned column in json output

2018-06-04 Thread Jay
The partitionBy clause is used to create Hive-style folders so that you can point a Hive partitioned table at the data. What are you using partitionBy for? What is the use case?

On Mon 4 Jun, 2018, 4:59 PM purna pradeep wrote:
> im reading below json in spark
>
> {"bucket": "B01",
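To illustrate the folder layout mentioned above: when Spark reads the base directory of a partitioned output, it discovers folders named like bucket=B01 and exposes the value as a regular column again. The sketch below assumes a hypothetical base path that was previously written with partitionBy("bucket"); the object and function names are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadPartitionedJson {
  // Reads the base directory; partition discovery turns folder names such as
  // bucket=B01 into a `bucket` column on the resulting frame.
  def readPartitioned(spark: SparkSession, basePath: String): DataFrame =
    spark.read.json(basePath)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("read-partitioned-json")
      .getOrCreate()

    // Hypothetical path, assumed to contain bucket=B01/, bucket=B02/, ...
    val df = readPartitioned(spark, "/tmp/partitioned_json")
    df.printSchema()
    // Filtering on the partition column lets Spark skip the other folders.
    df.filter(df("bucket") === "B01").show()

    spark.stop()
  }
}
```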

Re: spark partitionBy with partitioned column in json output

2018-06-04 Thread Lalwani, Jayesh
Purna,

This behavior is by design. If you provide partitionBy, Spark removes the partition columns from the data files and encodes their values in the folder names instead.

From: purna pradeep
Date: Monday, June 4, 2018 at 8:00 PM
To: "user@spark.apache.org"
Subject: spark partitionBy with partitioned column in json output

im reading below jso
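If the partition column must still appear inside each JSON record, one common workaround is to partition on a duplicate of the column, so Spark drops the copy and keeps the original. The following is a minimal sketch under that assumption, not something proposed in the thread: the object name, the "_part" suffix, and the output path are hypothetical, and the sample rows mirror the JSON from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object KeepPartitionColumn {
  // Partitions by a copy of `column`; the copy moves into the folder name
  // while the original column stays inside every JSON record.
  def writeKeepingColumn(df: DataFrame, column: String, path: String): Unit = {
    val partitionCol = s"${column}_part" // hypothetical name for the duplicate
    df.withColumn(partitionCol, col(column))
      .write
      .mode("overwrite")
      .partitionBy(partitionCol)
      .json(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("keep-partition-column")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("B01", "A1", "NULL", "NULL"),
      ("B02", "A2", "NULL", "NULL"),
      ("B03", "A3", "NULL", "NULL")
    ).toDF("bucket", "actionType", "preaction", "postaction")

    // Produces folders such as /tmp/partitioned_json/bucket_part=B01/
    writeKeepingColumn(df, "bucket", "/tmp/partitioned_json") // hypothetical path
    spark.stop()
  }
}
```

The trade-off is that the value is stored twice, once in the folder name and once in every record, which gives up the size reduction that dropping the column provides.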

spark partitionBy with partitioned column in json output

2018-06-04 Thread purna pradeep
I'm reading the JSON below in Spark:

{"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"}

val