[jira] [Updated] (SPARK-40820) Creating StructType from Json

Anthony Wainer Cachay Guivin (Jira) Sun, 22 Oct 2023 06:06:08 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-40820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anthony Wainer Cachay Guivin updated SPARK-40820:
-------------------------------------------------
          Component/s: Spark Core
    Affects Version/s: 3.5.0
                           (was: 3.3.0)
          Description: 
To create a StructType from a Python dictionary, you use 
[StructType.fromJson|https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L792]
 or, in Scala, 
[DataType.fromJson|https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala#L158C7-L158C15]

A schema can be created as in the code below, but the JSON must explicitly 
include nullable and metadata. This is inconsistent, because both values 
already have defaults within the DataType classes.

{code:python}
schema = {
     "name": "name", "type": "string" 
}

StructField.fromJson(schema)
{code}

Python Error:

{code:python}
from pyspark.sql.types import StructField

schema = {
     "name": "c1", "type": "string" 
}

StructField.fromJson(schema)

>>
Traceback (most recent call last):
  File "code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pyspark/sql/types.py", line 583, in fromJson
    json["nullable"],
KeyError: 'nullable'
{code}

Scala Error:
{code:scala}
import org.apache.spark.sql.types.DataType

// "nullable" and "metadata" are omitted from the field on purpose
val schema = """
        |{
        |    "type": "struct",
        |    "fields": [
        |        {
        |            "name": "c1",
        |            "type": "string"
        |        }
        |    ]
        |}
        |""".stripMargin

DataType.fromJson(schema)

>>
Failed to convert the JSON string '{"name":"c1","type":"string"}' to a field.
java.lang.IllegalArgumentException: Failed to convert the JSON string '{"name":"c1","type":"string"}' to a field.
    at org.apache.spark.sql.types.DataType$.parseStructField(DataType.scala:268)
    at org.apache.spark.sql.types.DataType$.$anonfun$parseDataType$1(DataType.scala:225)
{code}
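
Until the parsers accept the shorter form, one possible workaround is to fill in the missing keys before calling fromJson. The sketch below is only an illustration: the helper names (fill_type, fill_field, fill_schema_json) are hypothetical and not part of Spark, and the defaults (nullable = True, empty metadata) are an assumption chosen to mirror the StructField constructor defaults.

```python
import json
from typing import Any, Dict


def fill_type(data_type: Any) -> Any:
    """Recursively fill field defaults inside a data type (name or dict)."""
    if not isinstance(data_type, dict):
        return data_type  # primitive type name such as "string"
    result = dict(data_type)
    kind = result.get("type")
    if kind == "struct":
        result["fields"] = [fill_field(f) for f in result.get("fields", [])]
    elif kind == "array":
        result["elementType"] = fill_type(result["elementType"])
    elif kind == "map":
        result["keyType"] = fill_type(result["keyType"])
        result["valueType"] = fill_type(result["valueType"])
    return result


def fill_field(field: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a field dict with 'nullable' and 'metadata' present."""
    result = dict(field)
    result["type"] = fill_type(result["type"])
    # Assumed defaults: keep any value the caller already provided.
    result.setdefault("nullable", True)
    result.setdefault("metadata", {})
    return result


def fill_schema_json(schema_json: str) -> str:
    """Same as fill_type, but for a schema given as a JSON string."""
    return json.dumps(fill_type(json.loads(schema_json)))
```

With PySpark installed, the dict returned by fill_field could then be passed to StructField.fromJson, and the string returned by fill_schema_json could be passed to DataType.fromJson in Scala.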

  was:
To create a StructType from a Python dictionary, you use the 
[StructType.fromJson|https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L569-L571]
 method.

A schema can be created as in the code below, but the JSON must explicitly 
include nullable and metadata. This is inconsistent, because both values 
already have defaults within the DataType classes.
{code:python}
json = {
            "name": "name",
            "type": "string"
        }

StructField.fromJson(json)
{code}
Error:
{code:python}
from pyspark.sql.types import StructField
json = {
            "name": "name",
            "type": "string"
        }
StructField.fromJson(json)

>>
Traceback (most recent call last):
  File "code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pyspark/sql/types.py", line 583, in fromJson
    json["nullable"],
KeyError: 'nullable'
{code}
 

Proposed coding solution:

Instead of indexing into the dictionary, it would be better to use 
.get
{code:python}
def fromJson(cls, json: Dict[str, Any]) -> "StructField":
        return StructField(
            json["name"],
            _parse_datatype_json_value(json["type"]),
            json.get("nullable"),
            json.get("metadata"),
        )
{code}
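
One detail worth spelling out: dict.get without a second argument returns None when the key is absent, so a parser built this way would probably want explicit fallback values. The following is a minimal sketch of that idea using a hypothetical stand-in class (FieldSketch is not the PySpark StructField, and the type value is stored as-is rather than parsed). The defaults shown are an assumption based on the StructField constructor.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FieldSketch:
    """Hypothetical stand-in for StructField, for illustration only."""

    name: str
    data_type: Any
    nullable: bool = True
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, obj: Dict[str, Any]) -> "FieldSketch":
        return cls(
            obj["name"],  # still required: a field must have a name
            obj["type"],  # still required; real code would parse the type
            obj.get("nullable", True),  # assumed default when the key is absent
            obj.get("metadata") or {},  # absent or None becomes empty metadata
        )


minimal = FieldSketch.from_json({"name": "c1", "type": "string"})
```

Here name and type keep raising KeyError when missing, which seems reasonable since a field has no sensible default for either.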
 


> Creating StructType from Json
> -----------------------------
>
>                 Key: SPARK-40820
>                 URL: https://issues.apache.org/jira/browse/SPARK-40820
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Anthony Wainer Cachay Guivin
>            Priority: Minor
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
