[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21984

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r208138182

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

There are two kinds of compatibility:
1. Files written by the old Avro data source can be read by the new Avro data source.
2. Files written by the new Avro data source can be read by the old Avro data source.

I think we should focus on 1) and ignore 2).
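The first kind of compatibility in the comment above can be sketched roughly as follows. This is an illustration only, not the code in the pull request: `DateCompat`, `writeOld`, `writeNew`, and `readDays` are hypothetical names, and `MillisPerDay` is assumed to match `DateTimeUtils.MILLIS_PER_DAY` (the number of milliseconds in 24 hours). The reader accepts both the old `Long` encoding and the new `Int` encoding; using floor division for the old values is a choice made here, and it makes no difference for values the old writer produced, since those are exact multiples of a day.

```scala
// Hypothetical sketch; not the actual AvroSerializer/AvroDeserializer code.
object DateCompat {
  // Assumed to equal DateTimeUtils.MILLIS_PER_DAY: milliseconds in 24 hours.
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Old write path (the removed line in the diff): days scaled to milliseconds, stored as a Long.
  def writeOld(days: Int): Long = days * MillisPerDay

  // New write path (the added line in the diff): days stored directly as an Int.
  def writeNew(days: Int): Int = days

  // A reader that accepts either encoding and returns days since the epoch.
  def readDays(value: Any): Int = value match {
    case days: Int    => days                                       // written by the new data source
    case millis: Long => Math.floorDiv(millis, MillisPerDay).toInt  // written by the old data source
    case other =>
      throw new IllegalArgumentException(s"Unexpected Avro value for a date column: $other")
  }
}
```

With such a reader, files from either writer yield the same day count, which is case 1). An old reader that expects a `Long` would not understand the new `Int` values, which is case 2), the one proposed to be ignored.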
[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207746389

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -100,6 +103,8 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
               s"Cannot convert Avro logical type ${other} to Catalyst Timestamp type.")
           }
+      // Before we upgrade Avro to 1.8 for logical type support, spark-avo converts Long to Date.
--- End diff --

typo: spark-avo -> spark-avro.
[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207724930

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

I don't think it is a behavior change. The only concern is the case where an Avro file with a date-type column is written with this built-in package and then read by a third-party one with a user-specified schema. That case should be trivial, and we can ignore it.
[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207700882

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

Does this cause a behaviour change?
[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207537985

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

For the write path, let's drop the previous conversion to `Long`.
[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...
GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/21984

    [SPARK-24772][SQL] Avro: support logical date type

    ## What changes were proposed in this pull request?

    Support Avro logical date type:
    https://avro.apache.org/docs/1.8.2/spec.html#Date

    ## How was this patch tested?

    Unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark avro_date

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21984.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21984

commit 16e03572b47b26702232a3e012fb3566cfdfae79
Author: Gengliang Wang
Date:   2018-08-03T12:03:21Z

    support logical date type
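For context, the Avro specification linked in the description defines the `date` logical type as an annotation on an Avro `int` that stores the number of days since the Unix epoch (1 January 1970). The snippet below is a small sketch of that encoding using only `java.time`; `AvroDateExample`, `toAvroDate`, `fromAvroDate`, and `dateSchemaJson` are hypothetical names chosen for illustration and are not part of Spark or of this pull request.

```scala
import java.time.LocalDate

// Hypothetical helper, for illustration only.
object AvroDateExample {
  // Schema fragment for a date field, following the Avro spec's date logical type.
  val dateSchemaJson: String = """{"type": "int", "logicalType": "date"}"""

  // Encode a calendar date as days since 1970-01-01; fails if the value does not fit in an Int.
  def toAvroDate(date: LocalDate): Int = Math.toIntExact(date.toEpochDay)

  // Decode days since 1970-01-01 back into a calendar date.
  def fromAvroDate(days: Int): LocalDate = LocalDate.ofEpochDay(days.toLong)
}
```

For example, `AvroDateExample.toAvroDate(LocalDate.of(2018, 8, 3))` yields `17746`, and `fromAvroDate(17746)` returns the same date.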