[ https://issues.apache.org/jira/browse/SPARK-52355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-52355:
-----------------------------------
    Labels: pull-request-available  (was: )

> VariantVal schema improperly inferred as struct<metadata:binary,value:binary>
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-52355
>                 URL: https://issues.apache.org/jira/browse/SPARK-52355
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Austin Warner
>            Priority: Minor
>              Labels: pull-request-available
>
> When creating VariantVal objects locally in Python, the schema is improperly
> inferred as a struct with metadata and value fields.
>
> {quote}{{>>> from pyspark.sql.types import VariantVal}}
> {{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)],
> schema=['value'])}}
> {{>>> df.printSchema()}}
> {{root}}
> {{|-- value: struct (nullable = true)}}
> {{| |-- metadata: binary (nullable = true)}}
> {{| |-- value: binary (nullable = true)}}
> {{>>> df.collect()}}
> {{[Row(value=Row(metadata=bytearray(b'\x01\x00\x00'),
> value=bytearray(b'\x03\x01\x00\x02\x0c\x01')))]}}
> {quote}
> When the schema is passed explicitly, everything works as intended:
> {quote}{{>>> from pyspark.sql.types import VariantVal}}
> {{>>> df = spark.createDataFrame([(VariantVal.parseJson("[1]"),)],
> schema='value variant')}}
> {{>>> df.printSchema()}}
> {{root}}
> {{|-- value: variant (nullable = true)}}
> {{>>> df.collect()}}
> {{[Row(value=VariantVal(bytearray(b'\x03\x01\x00\x02\x0c\x01'),
> bytearray(b'\x01\x00\x00')))]}}
> {{>>> df.collect()[0].value.toJson()}}
> {{'[1]'}}
> {quote}
> This appears to be because the
> [{{pyspark.sql.types._infer_type}}|https://github.com/apache/spark/blob/e3321aa44ea255365222c491657b709ef41dc460/python/pyspark/sql/types.py#L2178-L2322]
> function does not include a case for VariantVal objects.
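
The suspected cause can be illustrated with a small, self-contained sketch that does not use PySpark. It assumes that a type-inference function with no dedicated VariantVal case falls back to treating an arbitrary object as a struct built from its attributes, which is consistent with the struct<metadata:binary,value:binary> schema in the report but is not confirmed by it. The names FakeVariantVal and infer_type_sketch are hypothetical and are not part of PySpark.

```python
class FakeVariantVal:
    """Hypothetical stand-in for VariantVal: an object with two binary attributes."""

    def __init__(self, value, metadata):
        self.value = value
        self.metadata = metadata


def infer_type_sketch(obj, variant_aware=False):
    """Toy type inference that returns a DDL-like type string.

    With variant_aware=False there is no dedicated case for FakeVariantVal,
    so the object reaches the generic attribute-based fallback (assumed here).
    """
    # The dedicated case the report says is missing.
    if variant_aware and isinstance(obj, FakeVariantVal):
        return "variant"
    if isinstance(obj, (bytes, bytearray)):
        return "binary"
    if isinstance(obj, bool):
        return "boolean"
    if isinstance(obj, int):
        return "bigint"
    if isinstance(obj, str):
        return "string"
    # Generic fallback: treat the object's attributes as struct fields,
    # sorted by name.
    if hasattr(obj, "__dict__"):
        fields = sorted(vars(obj).items())
        inner = ",".join(
            f"{name}:{infer_type_sketch(val, variant_aware)}" for name, val in fields
        )
        return f"struct<{inner}>"
    raise TypeError(f"not supported type: {type(obj)}")


v = FakeVariantVal(bytearray(b"\x03\x01\x00\x02\x0c\x01"), bytearray(b"\x01\x00\x00"))
print(infer_type_sketch(v))                      # struct<metadata:binary,value:binary>
print(infer_type_sketch(v, variant_aware=True))  # variant
```

In this sketch, adding the isinstance check ahead of the generic fallback is enough to produce the variant type. Whether the actual fix takes this shape depends on the linked pull request.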