[ https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-20470.
----------------------------------
    Resolution: Invalid

This is because {{recursive}} should be set to {{True}} in {{asDict}}. With given data stored in {{tmp.json}},

{code}
import json
from pyspark.sql.types import Row

df = spark.read.json("tmp.json", wholeFile=True)
df.printSchema()
rdd = df.rdd.map(lambda row: json.dumps(row.asDict(recursive=True), indent=2))
print rdd.collect()[0]
{code}

prints as below:

{code}
root
 |-- feature: string (nullable = true)
 |-- histogram: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- start: double (nullable = true)
 |    |    |-- width: double (nullable = true)
 |    |    |-- y: double (nullable = true)

{
  "feature": "feature_id_001",
  "histogram": [
    {
      "y": 968.0,
      "start": 1.9796095151877942,
      "width": 0.1564485056196041
    },
    {
      "y": 892.0,
      "start": 2.1360580208073983,
      "width": 0.1564485056196041
    },
    {
      "y": 814.0,
      "start": 2.2925065264270024,
      "width": 0.15644850561960366
    },
    {
      "y": 690.0,
      "start": 2.448955032046606,
      "width": 0.1564485056196041
    }
  ]
}
{code}

> Invalid json converting RDD row with Array of struct to json
> ------------------------------------------------------------
>
>                 Key: SPARK-20470
>                 URL: https://issues.apache.org/jira/browse/SPARK-20470
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.6.3
>            Reporter: Philip Adetiloye
>
> Trying to convert an RDD in pyspark containing Array of struct doesn't
> generate the right json. It looks trivial but can't get a good json out.
> I read the json below into a dataframe:
> {code}
> {
>   "feature": "feature_id_001",
>   "histogram": [
>     {
>       "start": 1.9796095151877942,
>       "y": 968.0,
>       "width": 0.1564485056196041
>     },
>     {
>       "start": 2.1360580208073983,
>       "y": 892.0,
>       "width": 0.1564485056196041
>     },
>     {
>       "start": 2.2925065264270024,
>       "y": 814.0,
>       "width": 0.15644850561960366
>     },
>     {
>       "start": 2.448955032046606,
>       "y": 690.0,
>       "width": 0.1564485056196041
>     }]
> }
> {code}
> Df schema looks good
> {code}
> root
>  |-- feature: string (nullable = true)
>  |-- histogram: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- start: double (nullable = true)
>  |    |    |-- width: double (nullable = true)
>  |    |    |-- y: double (nullable = true)
> {code}
> Need to convert each row to json now and save to HBase
> {code}
> rdd1 = rdd.map(lambda row: Row(x = json.dumps(row.asDict())))
> {code}
> Output JSON (Wrong)
> {code}
> {
>   "feature": "feature_id_001",
>   "histogram": [
>     [
>       1.9796095151877942,
>       968.0,
>       0.1564485056196041
>     ],
>     [
>       2.1360580208073983,
>       892.0,
>       0.1564485056196041
>     ],
>     [
>       2.2925065264270024,
>       814.0,
>       0.15644850561960366
>     ],
>     [
>       2.448955032046606,
>       690.0,
>       0.1564485056196041
>     ]
> }
> {code}
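
For readers wondering why the nested structs come out as bare arrays when {{recursive}} is left at its default: a PySpark {{Row}} is a tuple subclass, and {{json.dumps}} writes any tuple as a JSON array. The sketch below reproduces that effect without a Spark session. Note the assumptions: {{Bin}}, {{Record}} and {{as_dict}} are hypothetical stand-ins built on {{namedtuple}}, not PySpark's own classes or its actual {{asDict}} implementation, and the field order is chosen only for illustration.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for PySpark Row objects. Like Row, a namedtuple is a
# tuple subclass, so json.dumps treats an unconverted instance as an array.
Bin = namedtuple("Bin", ["start", "width", "y"])
Record = namedtuple("Record", ["feature", "histogram"])


def as_dict(row, recursive=False):
    """Illustrative imitation of asDict: nested rows are converted only
    when recursive=True; otherwise just the top level becomes a dict."""
    def convert(value):
        if hasattr(value, "_fields"):      # a nested row-like tuple
            return as_dict(value, recursive=True)
        if isinstance(value, list):        # an array column
            return [convert(v) for v in value]
        return value

    if recursive:
        return {name: convert(value) for name, value in zip(row._fields, row)}
    return dict(zip(row._fields, row))


record = Record(
    "feature_id_001",
    [Bin(1.9796095151877942, 0.1564485056196041, 968.0)],
)

# Round-trip through JSON text to see what a consumer would actually parse.
shallow = json.loads(json.dumps(as_dict(record)))
deep = json.loads(json.dumps(as_dict(record, recursive=True)))

print(shallow["histogram"][0])  # nested struct flattened to a list of values
print(deep["histogram"][0])     # nested struct kept as an object with field names
```

The shallow conversion loses the field names of every histogram element, which matches the shape of the reported output; converting nested rows first preserves them.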