I've reported this in SPARK-7506
<https://issues.apache.org/jira/browse/SPARK-7506>.

On Thu, May 7, 2015 at 6:58 PM Nicholas Chammas <nicholas.cham...@gmail.com>
wrote:

> Renaming fields to get around SPARK-2775
> <https://issues.apache.org/jira/browse/SPARK-2775>.
>
> I’m doing this clunky thing:
>
>    1. Convert a DataFrame’s schema to JSON, and then a Python dictionary.
>    2. Replace the problematic characters in the schema field names.
>    3. Convert the resulting dictionary back into JSON and then back into a
>       DataFrame schema.
>    4. Apply the new schema with the fixed field names.
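
The renaming part of the steps above can be sketched in plain Python on the schema dictionary. This is only an illustration, not the code used in the thread: `clean_name` and `rename_fields` are hypothetical helpers, and the assumption that a "problematic" character is anything other than a letter, digit, or underscore is made up for the example.

```python
import json
import re


def clean_name(name):
    # Assumption for illustration: replace every character that is not a
    # letter, digit, or underscore with an underscore.
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


def rename_fields(node):
    """Return a copy of a schema dictionary with cleaned field names."""
    if not isinstance(node, dict):
        # Primitive types appear as plain strings such as "string".
        return node
    kind = node.get("type")
    if kind == "struct":
        fields = [
            dict(f, name=clean_name(f["name"]), type=rename_fields(f["type"]))
            for f in node["fields"]
        ]
        return dict(node, fields=fields)
    if kind == "array":
        return dict(node, elementType=rename_fields(node["elementType"]))
    if kind == "map":
        return dict(
            node,
            keyType=rename_fields(node["keyType"]),
            valueType=rename_fields(node["valueType"]),
        )
    return node


# Step 1: schema JSON (as produced by schema.json()) to a Python dictionary.
schema_json = (
    '{"fields":[{"metadata":{},"name":"user.name",'
    '"nullable":true,"type":"string"}],"type":"struct"}'
)
schema_dict = json.loads(schema_json)

# Step 2: replace the problematic characters in the field names.
fixed_dict = rename_fields(schema_dict)

# Step 3 (first half): back to a JSON string.
fixed_json = json.dumps(fixed_dict)
```

With PySpark available, the fixed dictionary could presumably be handed to `StructType.fromJson()` and the result applied with something like `sqlContext.createDataFrame(df.rdd, new_schema)`; that last step is not exercised here.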
>
> Nick
>
> On Thu, May 7, 2015 at 6:51 PM Reynold Xin <r...@databricks.com> wrote:
>
>> What's the use case?
>>
>> I'm wondering if we should even expose fromJSON. I think it's more a bug
>> than a feature.
>>
>>
>> On Thu, May 7, 2015 at 1:55 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Observe, my fellow Sparkophiles (Spark 1.3.1):
>>>
>>> >>> json_rdd = sqlContext.jsonRDD(sc.parallelize(['{"name": "Nick"}']))
>>> >>> json_rdd.schema
>>> StructType(List(StructField(name,StringType,true)))
>>> >>> type(json_rdd.schema)
>>> <class 'pyspark.sql.types.StructType'>
>>> >>> json_rdd.schema.json()
>>> '{"fields":[{"metadata":{},"name":"name","nullable":true,"type":"string"}],"type":"struct"}'
>>> >>> pyspark.sql.types.StructType.fromJson(json_rdd.schema.json())
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "/Applications/apache-spark/spark-1.3.1-bin-hadoop2.4/python/pyspark/sql/types.py", line 346, in fromJson
>>>     return StructType([StructField.fromJson(f) for f in json["fields"]])
>>> TypeError: string indices must be integers, not str
>>> >>> import json
>>> >>> pyspark.sql.types.StructType.fromJson(json.loads(json_rdd.schema.json()))
>>> StructType(List(StructField(name,StringType,true)))
>>> >>>
>>>
>>> So fromJson() doesn’t actually expect JSON, which is a string. It
>>> expects a dictionary.
>>>
>>> Did I misunderstand something, or is this method poorly named?
>>>
>>> My impression was that the combination of toJson() and fromJson() would
>>> return the original input as-is, but it looks like we needed the added
>>> step of json.loads() to make things work.
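
The string-versus-dictionary distinction behind that traceback can be reproduced without Spark. The sketch below is only illustrative: `as_schema_dict` is a hypothetical helper, not part of PySpark, that accepts either form and always returns the dictionary that `fromJson()` appears to want.

```python
import json


def as_schema_dict(schema):
    """Return the schema as a dict, parsing it first if it is a JSON string."""
    if isinstance(schema, str):
        return json.loads(schema)
    if isinstance(schema, dict):
        return schema
    raise TypeError(
        "expected a JSON string or a dict, got %s" % type(schema).__name__
    )


# The same JSON string that json_rdd.schema.json() returned in the session.
schema_json = (
    '{"fields":[{"metadata":{},"name":"name",'
    '"nullable":true,"type":"string"}],"type":"struct"}'
)

# Indexing the raw string with a key fails, which is what happens inside
# fromJson() when it evaluates json["fields"] on a str.
try:
    schema_json["fields"]
    string_indexing_failed = False
except TypeError:
    string_indexing_failed = True

# After parsing, the same lookup works.
schema_dict = as_schema_dict(schema_json)
field_names = [f["name"] for f in schema_dict["fields"]]
```

Passing `as_schema_dict(json_rdd.schema.json())` to `StructType.fromJson()` would be equivalent to the `json.loads()` call shown in the session.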
>>>
>>> Nick
>>>
>>
>>
