Miguel Cabrera created SPARK-18180:
--------------------------------------

             Summary: pyspark.sql.Row does not serialize well to json
                 Key: SPARK-18180
                 URL: https://issues.apache.org/jira/browse/SPARK-18180
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.0.1
         Environment: HDP 2.3.4, Spark 2.0.1
            Reporter: Miguel Cabrera


{{Row}} does not serialize to JSON automatically. Although it is dict-like in 
Python, {{Row}} subclasses {{tuple}}, so the {{json}} module serializes it as a 
plain list and the field names are lost.

{noformat}
from pyspark.sql import Row
import json

r = Row(field1='hello', field2='world')
json.dumps(r)
{noformat}

Results:
{noformat}
'["hello", "world"]'
{noformat}

Expected:

{noformat}
'{"field1": "hello", "field2": "world"}'
{noformat}

The workaround is to call the {{asDict()}} method on the {{Row}}. However, this 
makes custom serialization of nested objects really painful, as the caller has 
to be aware that it is serializing a {{Row}} object. In particular, with 
SPARK-17695 you cannot serialize DataFrames easily if you have some empty or 
null fields, so you have to customize the serialization process.
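
For reference, a minimal sketch of the workaround (assuming {{asDict(recursive=True)}} is available): the {{Row}} has to be converted to a dict explicitly before it reaches {{json.dumps}}, because {{json}} already knows how to encode tuples and therefore never hands the {{Row}} to a custom encoder's {{default()}} hook.

{noformat}
from pyspark.sql import Row
import json

r = Row(field1='hello', field2=Row(inner='world'))

# json treats Row as a plain tuple, so it must be converted up front;
# recursive=True also unwraps nested Row values.
print(json.dumps(r.asDict(recursive=True)))
# {"field1": "hello", "field2": {"inner": "world"}}
{noformat}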






