[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21984


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21984#discussion_r208138182
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

There are two kinds of compatibility:
1. files written by the old Avro data source can be read by the new Avro data source
2. files written by the new Avro data source can be read by the old Avro data source

I think we should focus on 1) and ignore 2).
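
As a rough illustration of case 1, a reader that accepts both encodings might look like the sketch below. This is not the code in this PR: `DateCompat`, `toEpochDays`, and `MillisPerDay` are hypothetical names, and the sketch assumes old files hold dates as `long` milliseconds (days multiplied by milliseconds per day, as in the removed line of the diff) while new files hold `int` days.

```scala
// Hypothetical helper, not part of AvroDeserializer.
// Normalizes either on-disk encoding to days since 1970-01-01.
object DateCompat {
  // Assumed to match DateTimeUtils.MILLIS_PER_DAY (milliseconds in 24 hours).
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  def toEpochDays(value: Any): Int = value match {
    // New encoding: Avro int, already days since the epoch.
    case days: Int => days
    // Old encoding: Avro long holding days * MillisPerDay.
    // floorDiv is this sketch's choice; for exact multiples it equals plain division.
    case millis: Long => Math.toIntExact(Math.floorDiv(millis, MillisPerDay))
    case other =>
      throw new IllegalArgumentException(s"Unsupported date value: $other")
  }
}
```

Case 2 is not covered by anything like this: an old reader that expects a `long` would see an `int` column it does not know how to treat as a date.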


---




[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21984#discussion_r207746389
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -100,6 +103,8 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
   s"Cannot convert Avro logical type ${other} to Catalyst Timestamp type.")
   }
 
+  // Before we upgrade Avro to 1.8 for logical type support, spark-avo converts Long to Date.
--- End diff --

typo: spark-avo -> spark-avro.


---




[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-04 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21984#discussion_r207724930
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

I don't think it is a behavior change. The only concern is the case where an
Avro file with a date type column is written with this built-in package and
then read by the third-party one with a user-specified schema. That should be
a corner case, and we can ignore it.


---




[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21984#discussion_r207700882
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

Does this cause a behaviour change?


---




[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-03 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21984#discussion_r207537985
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case BinaryType =>
         (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
       case DateType =>
-        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
+        (getter, ordinal) => getter.getInt(ordinal)
--- End diff --

For the write path, let's drop the previous conversion to `Long`
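
To make the difference concrete, here is a minimal sketch of the value written for a date before and after this change. `DateWriteSketch`, `oldWriteValue`, and `newWriteValue` are names made up for illustration; the real code uses `DateTimeUtils.MILLIS_PER_DAY`, which this sketch assumes is the number of milliseconds in 24 hours.

```scala
// Illustration only; not the AvroSerializer code.
object DateWriteSketch {
  // Assumed value of DateTimeUtils.MILLIS_PER_DAY.
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Before: the Int day count was widened to a Long of milliseconds.
  def oldWriteValue(days: Int): Long = days * MillisPerDay

  // After: the Int day count is written as-is.
  def newWriteValue(days: Int): Int = days
}
```

With an input of 1 day, the old path writes `86400000L` and the new path writes `1`.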


---




[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

2018-08-03 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/21984

[SPARK-24772][SQL] Avro: support logical date type

## What changes were proposed in this pull request?

Support Avro logical date type:
https://avro.apache.org/docs/1.8.2/spec.html#Date
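
For reference, the spec defines the date logical type as an annotation on an Avro `int` that stores the number of days from the Unix epoch (1 January 1970). A small sketch of what that looks like follows. The record name `Event` and field name `d` are made up, and the conversion helpers use only `java.time`, not Spark or Avro APIs.

```scala
import java.time.LocalDate

// Illustration only; names are hypothetical.
object AvroDateSketch {
  // Example schema with one date field.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "Event",
      |  "fields": [
      |    {"name": "d", "type": {"type": "int", "logicalType": "date"}}
      |  ]
      |}""".stripMargin

  // Days since 1970-01-01, which is what the int value holds.
  def toAvroDate(date: LocalDate): Int = Math.toIntExact(date.toEpochDay)

  // Inverse conversion back to a calendar date.
  def fromAvroDate(days: Int): LocalDate = LocalDate.ofEpochDay(days.toLong)
}
```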

## How was this patch tested?

Unit test 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark avro_date

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21984.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21984


commit 16e03572b47b26702232a3e012fb3566cfdfae79
Author: Gengliang Wang 
Date:   2018-08-03T12:03:21Z

support logical date type




---
