[GitHub] spark issue #22913: [SPARK-25902][SQL] Add support for dates with millisecon...

2018-11-09 Thread javierluraschi
Github user javierluraschi commented on the issue:

https://github.com/apache/spark/pull/22913
  
Yes, I believe that's the case. This change 
https://github.com/apache/arrow/pull/2887 maps the R bindings from `POSIXct` to 
`timestamp` instead of `date`, so it looks like there is no need for this one. I'll 
close it and send a new PR if we find out something else is missing.

Thanks everyone for your feedback; while this ended up not being a change in 
Spark, it did help the Arrow project.

Best.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22913: [SPARK-25902][SQL] Add support for dates with mil...

2018-11-09 Thread javierluraschi
Github user javierluraschi closed the pull request at:

https://github.com/apache/spark/pull/22913


---




[GitHub] spark pull request #22913: [SPARK-25902][SQL] Add support for dates with mil...

2018-11-05 Thread javierluraschi
Github user javierluraschi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22913#discussion_r230833894
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala ---
@@ -71,6 +71,7 @@ object ArrowUtils {
 case d: ArrowType.Decimal => DecimalType(d.getPrecision, d.getScale)
 case date: ArrowType.Date if date.getUnit == DateUnit.DAY => DateType
 case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => TimestampType
+case date: ArrowType.Date if date.getUnit == DateUnit.MILLISECOND => TimestampType
--- End diff --

Right... let me keep just the original PR commit for now. So yes, we could 
map to date, but we would lose the time component, so the best mapping we have 
is to `timestamp`.

Let me check back in the `arrow` project as to why `POSIXct` is being 
mapped to `Date` rather than `TimeStamp` (see 
[arrow/r/src/array.cpp#L461](https://github.com/apache/arrow/blob/3a1dd3feb9ca09f92168f46fcfc06c01305df3ec/r/src/array.cpp#L461));
 if we can change that, then I would agree we don't need this change.
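
The trade-off discussed above (mapping to `date` would drop the time-of-day,
so `timestamp` is the better target) can be shown with a minimal sketch. The
enum, method, and returned type names below are simplified stand-ins for
illustration, not the actual Arrow or Spark classes:

```java
// Sketch of the mapping choice: an Arrow Date with unit DAY carries no
// time-of-day and can map to Spark's DateType, but a Date with unit
// MILLISECOND does carry time-of-day, so mapping it to DateType would
// lose information -- hence it maps to TimestampType instead.
public class DateMappingSketch {
    enum DateUnit { DAY, MILLISECOND }

    static String sparkTypeFor(DateUnit unit) {
        switch (unit) {
            case DAY:         return "DateType";       // day precision is enough
            case MILLISECOND: return "TimestampType";  // keep the time component
            default: throw new UnsupportedOperationException("Unsupported unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(sparkTypeFor(DateUnit.DAY));          // DateType
        System.out.println(sparkTypeFor(DateUnit.MILLISECOND));  // TimestampType
    }
}
```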


---




[GitHub] spark pull request #22913: [SPARK-25902][SQL] Add support for dates with mil...

2018-11-04 Thread javierluraschi
Github user javierluraschi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22913#discussion_r230607581
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala ---
@@ -71,6 +71,7 @@ object ArrowUtils {
 case d: ArrowType.Decimal => DecimalType(d.getPrecision, d.getScale)
 case date: ArrowType.Date if date.getUnit == DateUnit.DAY => DateType
 case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => TimestampType
+case date: ArrowType.Date if date.getUnit == DateUnit.MILLISECOND => TimestampType
--- End diff --

Good catch, thanks. Yes, this should be mapped to `Date` in the Arrow 
schema, not `TimeStamp`.

To give more background, Arrow Dates can have a unit of `DateUnit.DAY` or 
`DateUnit.MILLISECOND` (see 
[arrow/vector/types/DateUnit.java#L21-L22](https://github.com/apache/arrow/blob/73d379f4631cd3013371f60876a52615171e6c3b/java/vector/src/main/java/org/apache/arrow/vector/types/DateUnit.java#L21-L22)).
 Currently, if a date with milliseconds is passed, the conversion simply fails; 
therefore, this change does not affect other type conversions, and it is fine to 
map all Arrow dates to Spark dates since all cases are now properly handled.


---




[GitHub] spark pull request #22913: [SPARK-25902][SQL] Add support for dates with mil...

2018-10-31 Thread javierluraschi
GitHub user javierluraschi opened a pull request:

https://github.com/apache/spark/pull/22913

[SPARK-25902][SQL] Add support for dates with milliseconds in Apache Arrow 
bindings

Currently, the Apache Arrow bindings for Java only support `Date` with the 
unit set to `DateUnit.DAY`; see 
[ArrowUtils.scala#L72](https://github.com/apache/spark/blob/8c2edf46d0f89e5ec54968218d89f30a3f8190bc/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala#L72).

However, the Spark Arrow bindings for `R` are adding support to map 
`POSIXct` to `Date` using `DateUnit.MILLISECOND`, see 
https://github.com/rstudio/sparklyr/issues/1733.

This PR does not change existing functionality; rather, it adds support for 
the `DateUnit.MILLISECOND` unit in `Date` to avoid hitting a 
`java.lang.UnsupportedOperationException: Unsupported data type: 
Date(MILLISECOND)` exception while loading an Arrow stream into Spark.
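
As background on why the new case maps to `TimestampType`: Spark represents
`TimestampType` values internally as microseconds since the epoch, while an
Arrow `Date(MILLISECOND)` value counts milliseconds since the epoch, so the
values need a unit conversion on the way in. A minimal sketch of that
arithmetic (illustrative only, not the actual Spark conversion code):

```java
// Sketch: converting an epoch-milliseconds value (as carried by an Arrow
// Date with unit MILLISECOND) to the epoch-microseconds representation
// Spark uses for TimestampType values.
public class MillisToMicrosSketch {
    static long millisToMicros(long epochMillis) {
        // multiplyExact guards against silent overflow for extreme values
        return Math.multiplyExact(epochMillis, 1000L);
    }

    public static void main(String[] args) {
        long millis = 1541376000123L; // some epoch-millis value with sub-second precision
        System.out.println(millisToMicros(millis)); // 1541376000123000
    }
}
```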

This patch was tested with https://github.com/rstudio/sparklyr/issues/1733, 
by building Spark from source and then:

```r
devtools::install_github("apache/arrow", subdir = "r")
devtools::install_github("rstudio/sparklyr", ref = "feature/arrow")
Sys.setenv("SPARK_HOME_VERSION" = "2.3.2")

library(sparklyr)
library(arrow)
sc <- spark_connect(master = "local", spark_home = "")

dates <- data.frame(dates = c(
  as.POSIXlt(Sys.time(), "GMT"),
  as.POSIXlt(Sys.time(), "EST"))
)

dates_tbl <- sdf_copy_to(sc, dates, overwrite = T)
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/javierluraschi/spark arrow-date-millis

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22913.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22913


commit f809942de6c241cc7b499c19d0250185ebe26122
Author: Javier Luraschi 
Date:   2018-10-31T18:44:50Z

add support for date milliseconds in arrow bindings




---
