[GitHub] [spark] Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#discussion_r353590959

## File path: docs/sql-data-sources-avro.md ##
@@ -240,6 +240,14 @@ Data source options of Avro can be set via:
function from_avro
+
+writerSchema

Review comment:
I would stick to `writerSchema`, mostly because this is also the term used in Avro itself: https://avro.apache.org/docs/1.9.1/api/java/org/apache/avro/hadoop/io/AvroValueDeserializer.html

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#discussion_r335344093

## File path: docs/sql-data-sources-avro.md ##
@@ -240,6 +240,14 @@ Data source options of Avro can be set via:
function from_avro
+
+writerSchema
+None
+Optional Avro schema (in JSON format) that was used to serialize the data. This should be set if the schema provided
+ for deserialization is compatible with - but not the same as - the one used to originally convert the data to Avro.
+

Review comment:
Would it be possible to link to the Confluent documentation? They have an excellent document on schema compatibility and evolution: https://docs.confluent.io/current/schema-registry/avro.html
[GitHub] [spark] Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#discussion_r335344326

## File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala ##
@@ -153,4 +153,45 @@ class AvroFunctionsSuite extends QueryTest with SharedSparkSession {
 assert(df.collect().map(_.get(0)) === Seq(Row("one"), Row("two"), Row("three"), Row("four")))
 }
 }
+
+ test("SPARK-27506: roundtrip in to_avro and from_avro with different compatible schemas") {

Review comment:
I would also add a test with an incompatible schema, for example, changing a `string` to an `int`.
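
To illustrate why changing a `string` to an `int` makes a useful incompatible case: the Avro specification's schema resolution rules allow only a small set of primitive promotions from the writer's type to the reader's type. The sketch below encodes that table in plain Python. It is only an illustration under stated assumptions: `PROMOTIONS` and `primitive_readable` are hypothetical names, not part of Spark or Avro, and the check covers primitive types only (no records, unions, defaults, or aliases).

```python
# Primitive promotions permitted by the Avro specification's schema
# resolution rules: writer type -> reader types it may be promoted to.
# Hypothetical helper for illustration; not Spark's or Avro's implementation.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_readable(writer_type: str, reader_type: str) -> bool:
    """Return True if a value written as writer_type can be read as reader_type."""
    if writer_type == reader_type:
        return True
    return reader_type in PROMOTIONS.get(writer_type, set())


if __name__ == "__main__":
    print(primitive_readable("int", "long"))    # True: int may be promoted to long
    print(primitive_readable("string", "int"))  # False: no such promotion exists
```

Under these rules a field written as `string` cannot be resolved against a reader field declared as `int`, so a test along the lines suggested above would be expected to fail during deserialization rather than silently produce data.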