Hien Luu created SPARK-27027:
--------------------------------

             Summary: from_avro function does not deserialize the Avro record of a struct column type correctly
                 Key: SPARK-27027
                 URL: https://issues.apache.org/jira/browse/SPARK-27027
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Hien Luu

The from_avro function produces wrong output for a struct field. See the output at the bottom of the description.

=====================================================

import org.apache.spark.sql.types._
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions._

spark.version

val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25), (3, "Josh Duke", 50)).toDF("id", "name", "age")

val dfStruct = df.withColumn("value", struct("name","age"))

dfStruct.show
dfStruct.printSchema

val dfKV = dfStruct.select(to_avro('id).as("key"), to_avro('value).as("value"))

val expectedSchema = StructType(Seq(StructField("name", StringType, true),StructField("age", IntegerType, false)))

val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString

val avroTypeStr = s"""
 |{
 | "type": "int",
 | "name": "key"
 |}
""".stripMargin

dfKV.select(from_avro('key, avroTypeStr)).show
dfKV.select(from_avro('value, avroTypeStruct)).show

// Output of the last statement, which is not correct: all three rows show the last input record
+---------------------------------------------+
|from_avro(value, struct<name:string,age:int>)|
+---------------------------------------------+
|                              [Josh Duke, 50]|
|                              [Josh Duke, 50]|
|                              [Josh Duke, 50]|
+---------------------------------------------+