[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

GitBox Tue, 16 Nov 2021 22:17:26 -0800


HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r750909587




##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       Hm, there's something wrong here. We map Spark's `DayTimeIntervalType` to Java (Scala)'s 
[`java.time.Duration`](https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html), 
but here we map it to Arrow's `IntervalType`, which represents a calendar 
interval rather than an elapsed amount of time (see also 
https://github.com/apache/arrow/blob/master/format/Schema.fbs).
   
   I think we should map it to Arrow's `DurationType` instead (the counterpart of Python's 
[`datetime.timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)).
 I am working on SPARK-37277 to support this in the Arrow conversion in PySpark, but 
this has become a blocker for me. I am preparing a PR to change this, but please let 
me know if you guys have different thoughts.
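
   To illustrate the mismatch, here is a minimal sketch that uses only `java.time` and no Arrow classes. It assumes a day-time interval value is carried as a count of microseconds (which is how a duration-style mapping would carry it) and compares that with a (days, milliseconds) pair, the layout Arrow documents for its day-time interval. The class name `IntervalMappingSketch` and all helper methods are hypothetical and exist only for this illustration; this is not the code in the PR.

```java
import java.time.Duration;

// Hypothetical illustration only: none of these helpers exist in Spark or Arrow.
final class IntervalMappingSketch {

    // Duration-style encoding: one signed count of microseconds (assumed unit).
    static long toMicros(Duration d) {
        long fromSeconds = Math.multiplyExact(d.getSeconds(), 1_000_000L);
        return Math.addExact(fromSeconds, d.getNano() / 1_000L);
    }

    static Duration fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanoAdjustment = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Duration.ofSeconds(seconds, nanoAdjustment);
    }

    // Interval-style encoding: a (days, milliseconds) pair of 32-bit integers.
    // Anything finer than a millisecond is truncated here.
    static int[] toDayMillis(Duration d) {
        long days = d.toDays();
        long millis = d.minusDays(days).toMillis();
        return new int[] {Math.toIntExact(days), Math.toIntExact(millis)};
    }

    static Duration fromDayMillis(int[] dayMillis) {
        return Duration.ofDays(dayMillis[0]).plusMillis(dayMillis[1]);
    }

    public static void main(String[] args) {
        // 1 day, 2 hours and 3456 microseconds.
        Duration d = Duration.ofDays(1).plusHours(2).plusNanos(3_456_000L);
        System.out.println("micros round trip:     " + fromMicros(toMicros(d)).equals(d));
        System.out.println("day/millis round trip: " + fromDayMillis(toDayMillis(d)).equals(d));
    }
}
```

   In this sketch the microsecond encoding round-trips the `Duration` exactly, while the (days, milliseconds) pair drops the sub-millisecond part, which is one concrete reason a duration-style type looks like the closer fit for `DayTimeIntervalType`.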




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



