[jira] [Updated] (SPARK-20470) Invalid json converting RDD row with Array of struct to json

2017-04-26 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20470:

Component/s: SQL

> Invalid json converting RDD row with Array of struct to json
> 
>
> Key: SPARK-20470
> URL: https://issues.apache.org/jira/browse/SPARK-20470
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.6.3
> Reporter: Philip Adetiloye
>
> Converting a PySpark RDD whose rows contain an array of structs to JSON 
> doesn't produce the right JSON. It looks trivial, but I can't get valid JSON out.
> I read the JSON below into a DataFrame:
> {code}
> {
>   "feature": "feature_id_001",
>   "histogram": [
> {
>   "start": 1.9796095151877942,
>   "y": 968.0,
>   "width": 0.1564485056196041
> },
> {
>   "start": 2.1360580208073983,
>   "y": 892.0,
>   "width": 0.1564485056196041
> },
> {
>   "start": 2.2925065264270024,
>   "y": 814.0,
>   "width": 0.15644850561960366
> },
> {
>   "start": 2.448955032046606,
>   "y": 690.0,
>   "width": 0.1564485056196041
> }]
> }
> {code}
> The DataFrame schema looks good:
> {code}
>  root
>   |-- feature: string (nullable = true)
>   |-- histogram: array (nullable = true)
>   |    |-- element: struct (containsNull = true)
>   |    |    |-- start: double (nullable = true)
>   |    |    |-- width: double (nullable = true)
>   |    |    |-- y: double (nullable = true)
> {code}
> I now need to convert each row to JSON and save it to HBase:
> {code}
> rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))
> {code}
> Output JSON (Wrong)
> {code}
> {
>   "feature": "feature_id_001",
>   "histogram": [
> [
>   1.9796095151877942,
>   968.0,
>   0.1564485056196041
> ],
> [
>   2.1360580208073983,
>   892.0,
>   0.1564485056196041
> ],
> [
>   2.2925065264270024,
>   814.0,
>   0.15644850561960366
> ],
> [
>   2.448955032046606,
>   690.0,
>   0.1564485056196041
> ]
> }
> {code}
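
A likely explanation for the output above, offered as a sketch and not confirmed anywhere in this thread: Row.asDict() is not recursive by default, so each nested struct stays a Row, which is a tuple subclass, and json.dumps writes any tuple as a JSON array. The example below reproduces that effect in plain Python. It uses namedtuple as a hypothetical stand-in for pyspark.sql.Row, and the names Bin, Record, and to_plain are illustrative only. In PySpark itself, row.asDict(recursive=True) should perform the equivalent recursive conversion.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for pyspark.sql.Row: like Row, a namedtuple is a
# tuple subclass with named fields.
Bin = namedtuple("Bin", ["start", "width", "y"])
Record = namedtuple("Record", ["feature", "histogram"])


def to_plain(value):
    """Recursively turn named tuples into dicts so json.dumps keeps field names."""
    if isinstance(value, tuple) and hasattr(value, "_asdict"):
        return {k: to_plain(v) for k, v in value._asdict().items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    return value


row = Record(
    feature="feature_id_001",
    histogram=[
        Bin(start=1.9796095151877942, width=0.1564485056196041, y=968.0),
        Bin(start=2.1360580208073983, width=0.1564485056196041, y=892.0),
    ],
)

# Shallow conversion: only the top level becomes a dict, so each nested
# struct is still a tuple and is written as a JSON array.
shallow = json.dumps(dict(row._asdict()))

# Recursive conversion: nested structs become JSON objects with field names.
deep = json.dumps(to_plain(row))

print(shallow)
print(deep)
```

The shallow result has the same shape as the reported output (arrays in place of objects), while the recursive result keeps the "start", "width", and "y" keys for every histogram entry.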



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20470) Invalid json converting RDD row with Array of struct to json

2017-04-26 Thread Philip Adetiloye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Adetiloye updated SPARK-20470:
-
Description: 
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{code}
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}
{code}

Df schema looks good 

{code}
 root
  |-- feature: string (nullable = true)
  |-- histogram: array (nullable = true)
  |    |-- element: struct (containsNull = true)
  |    |    |-- start: double (nullable = true)
  |    |    |-- width: double (nullable = true)
  |    |    |-- y: double (nullable = true)
{code}

Need to convert each row to json now and save to HBase 
{code}
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))
{code}

Output JSON (Wrong)
{code}
{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}
{code}


  was:
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{code}
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}
{code}

Df schema looks good 

{code}
 root
  |-- feature: string (nullable = true)
  |-- histogram: array (nullable = true)
  |    |-- element: struct (containsNull = true)
  |    |    |-- start: double (nullable = true)
  |    |    |-- width: double (nullable = true)
  |    |    |-- y: double (nullable = true)
{code}
Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)
{code}
{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}
{code}




[jira] [Updated] (SPARK-20470) Invalid json converting RDD row with Array of struct to json

2017-04-26 Thread Philip Adetiloye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Adetiloye updated SPARK-20470:
-
Description: 
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{code}
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}
{code}

Df schema looks good 

{code}
 root
  |-- feature: string (nullable = true)
  |-- histogram: array (nullable = true)
  |    |-- element: struct (containsNull = true)
  |    |    |-- start: double (nullable = true)
  |    |    |-- width: double (nullable = true)
  |    |    |-- y: double (nullable = true)
{code}
Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)
{code}
{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}
{code}


  was:
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}

Df schema looks good 

{code}
 root
  |-- feature: string (nullable = true)
  |-- histogram: array (nullable = true)
  |    |-- element: struct (containsNull = true)
  |    |    |-- start: double (nullable = true)
  |    |    |-- width: double (nullable = true)
  |    |    |-- y: double (nullable = true)
{code}
Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)

{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}





[jira] [Updated] (SPARK-20470) Invalid json converting RDD row with Array of struct to json

2017-04-26 Thread Philip Adetiloye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Adetiloye updated SPARK-20470:
-
Description: 
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}

Df schema looks good 

{code}
 root
  |-- feature: string (nullable = true)
  |-- histogram: array (nullable = true)
  |    |-- element: struct (containsNull = true)
  |    |    |-- start: double (nullable = true)
  |    |    |-- width: double (nullable = true)
  |    |    |-- y: double (nullable = true)
{code}
Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)

{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}



  was:
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}

Df schema looks good 

 root
  |-- feature: string (nullable = true)
  |-- histogram: array (nullable = true)
  |    |-- element: struct (containsNull = true)
  |    |    |-- start: double (nullable = true)
  |    |    |-- width: double (nullable = true)
  |    |    |-- y: double (nullable = true)

Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)

{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}





[jira] [Updated] (SPARK-20470) Invalid json converting RDD row with Array of struct to json

2017-04-26 Thread Philip Adetiloye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Adetiloye updated SPARK-20470:
-
Description: 
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}

Df schema looks good 

 root
  |-- feature: string (nullable = true)
  |-- histogram: array (nullable = true)
  |    |-- element: struct (containsNull = true)
  |    |    |-- start: double (nullable = true)
  |    |    |-- width: double (nullable = true)
  |    |    |-- y: double (nullable = true)

Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)

{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}



  was:
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}

Df schema looks good 

# root
#  |-- feature: string (nullable = true)
#  |-- histogram: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- start: double (nullable = true)
#  |    |    |-- width: double (nullable = true)
#  |    |    |-- y: double (nullable = true)

Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)

{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}





[jira] [Updated] (SPARK-20470) Invalid json converting RDD row with Array of struct to json

2017-04-26 Thread Philip Adetiloye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Adetiloye updated SPARK-20470:
-
Description: 
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.

I read the json below into a dataframe:
{
  "feature": "feature_id_001",
  "histogram": [
{
  "start": 1.9796095151877942,
  "y": 968.0,
  "width": 0.1564485056196041
},
{
  "start": 2.1360580208073983,
  "y": 892.0,
  "width": 0.1564485056196041
},
{
  "start": 2.2925065264270024,
  "y": 814.0,
  "width": 0.15644850561960366
},
{
  "start": 2.448955032046606,
  "y": 690.0,
  "width": 0.1564485056196041
}]
}

Df schema looks good 

# root
#  |-- feature: string (nullable = true)
#  |-- histogram: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- start: double (nullable = true)
#  |    |    |-- width: double (nullable = true)
#  |    |    |-- y: double (nullable = true)

Need to convert each row to json now and save to HBase 
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))

Output JSON (Wrong)

{
  "feature": "feature_id_001",
  "histogram": [
[
  1.9796095151877942,
  968.0,
  0.1564485056196041
],
[
  2.1360580208073983,
  892.0,
  0.1564485056196041
],
[
  2.2925065264270024,
  814.0,
  0.15644850561960366
],
[
  2.448955032046606,
  690.0,
  0.1564485056196041
]
}



  was:
Trying to convert an RDD in pyspark containing Array of struct doesn't generate 
the right json. It looks trivial but can't get a good json out.


rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))




