Philip Adetiloye created SPARK-43201:
-------------------------------------
Summary: Inconsistency between from_avro and from_json function
Key: SPARK-43201
URL: https://issues.apache.org/jira/browse/SPARK-43201
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Philip Adetiloye

Spark's from_avro function does not allow the schema to come from a DataFrame column; it only takes a String schema:

{code:java}
def from_avro(col: Column, jsonFormatSchema: String): Column
{code}

This makes it impossible to deserialize rows of Avro records with different schemas, since only one schema string can be passed in externally. Here is what I would expect:

{code:java}
def from_avro(col: Column, jsonFormatSchema: Column): Column
{code}

Code example:

{code:java}
import org.apache.spark.sql.avro.functions.from_avro

val avroSchema1 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
val avroSchema2 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""

val df = Seq(
  (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
  (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
).toDF("binaryData", "schema")

val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))
parsed.show()
// Output:
// +-------------+
// |   parsedData|
// +-------------+
// |[apple1, 1.0]|
// |[apple2, 2.0]|
// +-------------+
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
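For comparison, the inconsistency named in the title: from_json in org.apache.spark.sql.functions already has an overload that takes the schema as a Column (typically the output of schema_of_json or a DDL string literal), while from_avro has no such overload. A minimal sketch of the existing from_json behavior, assuming an active SparkSession with spark.implicits._ in scope (the column names are illustrative):

{code:java}
import org.apache.spark.sql.functions.{from_json, schema_of_json}

// from_json accepts the schema as a Column, e.g. one produced by schema_of_json:
val jsonDf = Seq("""{"str1":"apple1","str2":"1.0"}""").toDF("jsonData")

val parsedJson = jsonDf.select(
  from_json($"jsonData", schema_of_json("""{"str1":"a","str2":"1.0"}""")).as("parsedData")
)

// from_avro offers no equivalent Column-based overload; it only accepts a String schema.
{code}

Adding the proposed from_avro(col: Column, jsonFormatSchema: Column) overload would bring the two functions' signatures in line.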