[ 
https://issues.apache.org/jira/browse/SPARK-52355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-52355:
-----------------------------------
    Labels: pull-request-available  (was: )

> VariantVal schema improperly inferred as struct<metadata:binary,value:binary>
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-52355
>                 URL: https://issues.apache.org/jira/browse/SPARK-52355
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Austin Warner
>            Priority: Minor
>              Labels: pull-request-available
>
> When a DataFrame is created from VariantVal objects built locally in Python 
> and no column types are given, the column type is improperly inferred as a 
> struct with metadata and value fields instead of variant.
>  
> {quote}{{>>> from pyspark.sql.types import VariantVal}}
> {{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)], 
> schema=['value'])}}
> {{>>> df.printSchema()}}
> {{root}}
> {{|-- value: struct (nullable = true)}}
> {{| |-- metadata: binary (nullable = true)}}
> {{| |-- value: binary (nullable = true)}}
> {{>>> df.collect()}}
> {{[Row(value=Row(metadata=bytearray(b'\x01\x00\x00'), 
> value=bytearray(b'\x03\x01\x00\x02\x0c\x01')))]}}
> {quote}
> When the schema is passed explicitly, everything works as intended:
> {quote}{{>>> from pyspark.sql.types import VariantVal}}
> {{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)], 
> schema='value variant')}}
> {{>>> df.printSchema()}}
> {{root}}
> {{|-- value: variant (nullable = true)}}
> {{>>> df.collect()}}
> {{[Row(value=VariantVal(bytearray(b'\x03\x01\x00\x02\x0c\x01'), 
> bytearray(b'\x01\x00\x00')))]}}
> {{>>> df.collect()[0].value.toJson()}}
> {{'[1]'}}
> {quote}
> This appears to be because the 
> [{{pyspark.sql.types._infer_type}}|https://github.com/apache/spark/blob/e3321aa44ea255365222c491657b709ef41dc460/python/pyspark/sql/types.py#L2178-L2322]
>  function does not include a case for VariantVal objects.
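
For illustration only, the likely mechanism can be modeled without Spark. The sketch below assumes that type inference falls back to treating any object with a __dict__ as a struct of its sorted attributes, which would explain the metadata/value field order seen above. FakeVariantVal, infer_type, and handle_variant are hypothetical stand-ins, not PySpark's code, and the real fix may look different.

```python
# Toy model of schema inference. Every name here is a hypothetical stand-in,
# not PySpark's actual implementation.


class FakeVariantVal:
    """Stand-in for pyspark.sql.types.VariantVal: two binary attributes."""

    def __init__(self, value, metadata):
        self.value = value
        self.metadata = metadata


def infer_type(obj, handle_variant):
    """Return a DDL-like type string for obj."""
    if obj is None:
        return "void"
    # The case the report says is missing: recognize the variant wrapper
    # before any generic fallback gets a chance to look at its attributes.
    if handle_variant and isinstance(obj, FakeVariantVal):
        return "variant"
    if isinstance(obj, bool):
        return "boolean"
    if isinstance(obj, int):
        return "bigint"
    if isinstance(obj, (bytes, bytearray)):
        return "binary"
    if isinstance(obj, str):
        return "string"
    # Assumed fallback: any other object with a __dict__ becomes a struct
    # whose fields are its attributes in sorted order.
    if hasattr(obj, "__dict__"):
        fields = sorted(vars(obj).items())
        inner = ",".join(
            f"{name}:{infer_type(val, handle_variant)}" for name, val in fields
        )
        return f"struct<{inner}>"
    raise TypeError(f"not supported type: {type(obj)}")


v = FakeVariantVal(bytearray(b"\x03\x01\x00\x02\x0c\x01"), bytearray(b"\x01\x00\x00"))
print(infer_type(v, handle_variant=False))  # struct<metadata:binary,value:binary>
print(infer_type(v, handle_variant=True))   # variant
```

With the explicit check disabled, the toy function reproduces the struct shape from the report; with it enabled, the object is reported as variant. Placing the check ahead of the generic fallback is the design point, since the fallback would otherwise match first.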



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

