[jira] [Updated] (SPARK-20470) Invalid json converting RDD row with Array of struct to json
[ https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-20470:
    Component/s: SQL

> Invalid json converting RDD row with Array of struct to json
>
>                 Key: SPARK-20470
>                 URL: https://issues.apache.org/jira/browse/SPARK-20470
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.6.3
>            Reporter: Philip Adetiloye
>
> Converting a PySpark RDD whose rows contain an array of structs does not
> produce the right JSON. It looks trivial, but I can't get good JSON out.
> I read the JSON below into a DataFrame:
> {code}
> {
>   "feature": "feature_id_001",
>   "histogram": [
>     {
>       "start": 1.9796095151877942,
>       "y": 968.0,
>       "width": 0.1564485056196041
>     },
>     {
>       "start": 2.1360580208073983,
>       "y": 892.0,
>       "width": 0.1564485056196041
>     },
>     {
>       "start": 2.2925065264270024,
>       "y": 814.0,
>       "width": 0.15644850561960366
>     },
>     {
>       "start": 2.448955032046606,
>       "y": 690.0,
>       "width": 0.1564485056196041
>     }]
> }
> {code}
> The DataFrame schema looks good:
> {code}
> root
>  |-- feature: string (nullable = true)
>  |-- histogram: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- start: double (nullable = true)
>  |    |    |-- width: double (nullable = true)
>  |    |    |-- y: double (nullable = true)
> {code}
> I now need to convert each row to JSON and save it to HBase:
> {code}
> rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))
> {code}
> Output JSON (wrong): each histogram struct comes out as a bare array, so the
> field names are lost:
> {code}
> {
>   "feature": "feature_id_001",
>   "histogram": [
>     [
>       1.9796095151877942,
>       968.0,
>       0.1564485056196041
>     ],
>     [
>       2.1360580208073983,
>       892.0,
>       0.1564485056196041
>     ],
>     [
>       2.2925065264270024,
>       814.0,
>       0.15644850561960366
>     ],
>     [
>       2.448955032046606,
>       690.0,
>       0.1564485056196041
>     ]
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
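
The symptom reported above (structs serialized as bare arrays) is consistent with how Python's json module treats tuples. The sketch below reproduces that effect without Spark. It rests on an assumption: a nested pyspark.sql.Row behaves like a tuple subclass, so a namedtuple is used as a stand-in. The names Bin, record and to_plain are hypothetical and do not come from the issue.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a nested Row: like Row, a namedtuple is a tuple subclass.
Bin = namedtuple("Bin", ["start", "width", "y"])

# Shape comparable to a top-level dict whose nested structs were not converted.
record = {
    "feature": "feature_id_001",
    "histogram": [
        Bin(start=1.9796095151877942, width=0.1564485056196041, y=968.0),
    ],
}

# json.dumps encodes any tuple (subclasses included) as a JSON array,
# so the field names of the nested struct are dropped.
lossy = json.dumps(record)


def to_plain(value):
    """Recursively convert namedtuple-like values to dicts so field names survive."""
    if isinstance(value, tuple) and hasattr(value, "_asdict"):
        return {key: to_plain(item) for key, item in value._asdict().items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    return value


# After the recursive conversion, each histogram entry is a JSON object again.
fixed = json.dumps(to_plain(record))

print(lossy)
print(fixed)
```

In PySpark itself, Row.asDict accepts a recursive argument that defaults to False, so row.asDict(recursive=True) may be enough to get the same effect, and DataFrame.toJSON() is another route worth trying. Neither has been checked against 1.6.3 here.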