I think you could use a Python UDF to turn the properties struct into a JSON string:

import simplejson
import pyspark.sql.functions

def to_json(row):
    return simplejson.dumps(row.asDict(recursive=True))

to_json_udf = pyspark.sql.functions.udf(to_json)

df.select("col_1", "col_2",
          to_json_udf(df.properties)).write.format("com.databricks.spark.csv").save()
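
For example, a minimal sketch of what the output looks like (the toy data,
column values, and sqlContext usage are just for illustration, shaped to
match the schema in your message):

from pyspark.sql import Row

# toy DataFrame with the same shape as your schema
toy = sqlContext.createDataFrame([
    Row(col_1="a", col_2="b",
        properties=Row(prop_1="x", prop_2="y", prop3="z"))
])

toy.select("col_1", "col_2", to_json_udf(toy.properties)).show(truncate=False)
# the third column should hold a JSON string like:
# {"prop_1": "x", "prop_2": "y", "prop3": "z"}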


On Tue, Nov 24, 2015 at 7:36 AM,  <ross.cramb...@thomsonreuters.com> wrote:
> I am generating a set of tables in pyspark SQL from a JSON source dataset. I 
> am writing those tables to disk as CSVs using 
> df.write.format("com.databricks.spark.csv").save(…). I have a schema like:
>
> root
>  |-- col_1: string (nullable = true)
>  |-- col_2: string (nullable = true)
>  |-- col_3: timestamp (nullable = true)
> ...
>  |-- properties: struct (nullable = true)
>  |    |-- prop_1: string (nullable = true)
>  |    |-- prop_2: string (nullable = true)
>  |    |-- prop3: string (nullable = true)
> …
>
> Currently I am dropping the properties section when I write to CSV, but I 
> would like to write it as a JSON column. How can I go about this? My final 
> result would be a CSV with col_1, col_2, col_3 as usual but the ‘properties’ 
> column would contain formatted JSON objects.
>
> Thanks
