[GitHub] spark pull request #23000: [SPARK-26002][SQL] Fix day of year calculation fo...

2018-11-20 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/23000#discussion_r234992152
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
   }
 
+  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
+    TimeZone.setDefault(TimeZoneUTC)
+    val c = Calendar.getInstance(TimeZoneUTC)
+    c.set(Calendar.MILLISECOND, 0)
+    (1000 to 1600 by 100).foreach { year =>
+      // January 1 is the 1st day of year.
+      c.set(year, 0, 1, 0, 0, 0)
+      assert(getYear(getInUTCDays(c.getTimeInMillis)) === year)
+      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 1)
+      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 1)
+
+      // March 1 is the 61st day of the year as they are leap years. It is true for
+      // even the multiples of 100 as before 1582-10-4 the Julian calendar leap year calculation
+      // is used in which every multiples of 4 are leap years
+      c.set(year, 2, 1, 0, 0, 0)
+      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 61)
+      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 3)
+
+      // For non-leap years:
+      c.set(year + 1, 2, 1, 0, 0, 0)
+      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 60)
+    }
--- End diff --

The last two (1600-01-01 and 1600-03-01) are already tested, as 1600 is included in `(1000 to 1600 by 100)`.

I have added a new check for 1582-10-03.

But I would not add an assert for 1582-10-14 without knowing what the correct value really is.

I have checked PostgreSQL, but there the 10-day gap is not handled at all: from `SELECT EXTRACT(DOY FROM TIMESTAMP '1582-10-03 00:00:00');` to `SELECT EXTRACT(DOY FROM TIMESTAMP '1582-10-16 00:00:00');` the results are consecutive, day by day, from 276 to 289.
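
The consecutive values reported above are what a proleptic Gregorian calendar (one with no 1582 cutover) would produce. As a rough illustration only, `java.time.LocalDate` uses the proleptic ISO calendar and can reproduce the same sequence. The object and method names below are hypothetical and are not part of the Spark patch or of PostgreSQL.

```scala
import java.time.LocalDate

// Hypothetical helper for illustration; not part of the Spark patch.
object ProlepticDayOfYear {
  // Day of year in the proleptic ISO calendar, which has no Julian/Gregorian cutover.
  def dayOfYear(year: Int, month: Int, day: Int): Int =
    LocalDate.of(year, month, day).getDayOfYear

  // Day-of-year values for 1582-10-03 through 1582-10-16, one per calendar day.
  def octoberRange: List[Int] =
    (3 to 16).map(d => dayOfYear(1582, 10, d)).toList
}
```

Under this calendar every date from 1582-10-05 to 1582-10-14 exists, so there is no gap to skip.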
  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




2018-11-20 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/23000#discussion_r234986530
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
   }
 
+  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
+    TimeZone.setDefault(TimeZoneUTC)
--- End diff --

Thanks! Good catch! It is actually not needed any more.






2018-11-19 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/23000#discussion_r234819827
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
   }
 
+  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
+    TimeZone.setDefault(TimeZoneUTC)
--- End diff --

Just curious. Do you need to put back the old default when the test is 
over? Or does that not matter here?
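
For reference, one common way to avoid leaking a changed default into other tests is to restore it in a `finally` block. This is only a minimal sketch: the object and the `withDefaultTimeZone` helper below are hypothetical names written for this illustration, not something taken from the patch.

```scala
import java.util.TimeZone

// Hypothetical test helper for illustration; not taken from the patch.
object TimeZoneTestUtil {
  // Runs `body` with the JVM default time zone set to `tz`,
  // then restores the previous default even if `body` throws.
  def withDefaultTimeZone[T](tz: TimeZone)(body: => T): T = {
    val original = TimeZone.getDefault
    TimeZone.setDefault(tz)
    try body
    finally TimeZone.setDefault(original)
  }
}
```

A test body wrapped as `TimeZoneTestUtil.withDefaultTimeZone(TimeZone.getTimeZone("UTC")) { ... }` would then leave the default untouched for later tests.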






2018-11-19 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/23000#discussion_r234807971
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
   }
 
+  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
+    TimeZone.setDefault(TimeZoneUTC)
+    val c = Calendar.getInstance(TimeZoneUTC)
+    c.set(Calendar.MILLISECOND, 0)
+    (1000 to 1600 by 100).foreach { year =>
+      // January 1 is the 1st day of year.
+      c.set(year, 0, 1, 0, 0, 0)
+      assert(getYear(getInUTCDays(c.getTimeInMillis)) === year)
+      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 1)
+      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 1)
+
+      // March 1 is the 61st day of the year as they are leap years. It is true for
+      // even the multiples of 100 as before 1582-10-4 the Julian calendar leap year calculation
+      // is used in which every multiples of 4 are leap years
+      c.set(year, 2, 1, 0, 0, 0)
+      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 61)
+      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 3)
+
+      // For non-leap years:
+      c.set(year + 1, 2, 1, 0, 0, 0)
+      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 60)
+    }
--- End diff --

This is good, but I think it's worth adding checks for a couple of special cases:

* 1582-10-3
* 1582-10-14 (though I guess the meaning of "dayInYear" is not so clear in this case)
* 1600-01-01
* 1600-03-01

I think they'll all be OK after your change, but it's good to have a check.
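
To see what a reference implementation does around these dates, the hybrid Julian/Gregorian `java.util.GregorianCalendar` can be queried directly. The sketch below is only an illustration, assuming the JDK's default cutover; `CutoverCheck` and its methods are hypothetical names and are not Spark code. The JDK documentation states that 1582-10-04 is followed by 1582-10-15, so 1582-10-14 does not exist in that calendar.

```scala
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Hypothetical helper for illustration; not part of Spark.
object CutoverCheck {
  private val utc = TimeZone.getTimeZone("UTC")

  // Builds a hybrid Julian/Gregorian calendar at midnight UTC on the given date.
  // `month` is zero-based, as in java.util.Calendar (e.g. Calendar.OCTOBER).
  def calendarFor(year: Int, month: Int, day: Int): GregorianCalendar = {
    val c = new GregorianCalendar(utc)
    c.clear()
    c.set(year, month, day, 0, 0, 0)
    c
  }

  // Day of year as reported by the hybrid calendar.
  def dayOfYear(year: Int, month: Int, day: Int): Int =
    calendarFor(year, month, day).get(Calendar.DAY_OF_YEAR)

  // The (year, month, day) that follows 1582-10-04, with a one-based month.
  def dayAfterLastJulianDay: (Int, Int, Int) = {
    val c = calendarFor(1582, Calendar.OCTOBER, 4)
    c.add(Calendar.DAY_OF_MONTH, 1)
    (c.get(Calendar.YEAR), c.get(Calendar.MONTH) + 1, c.get(Calendar.DAY_OF_MONTH))
  }
}
```

Calling `CutoverCheck.dayOfYear(1582, Calendar.OCTOBER, 15)` shows how the JDK numbers the first Gregorian day, which is one possible reference for the ambiguous case.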






2018-11-10 Thread attilapiros
GitHub user attilapiros opened a pull request:

https://github.com/apache/spark/pull/23000

[SPARK-26002][SQL] Fix day of year calculation for Julian calendar days

## What changes were proposed in this pull request?

Fixing leap year calculations for date operators (year/month/dayOfYear) where the Julian calendar is used (before 1582-10-04). In the Julian calendar every year that is a multiple of 4 is a leap year (there is no exception for years that are multiples of 100).
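
The difference between the two leap-year rules can be sketched as follows. This is only an illustration of the rules described above; `LeapYearRules` and its methods are hypothetical names and are not the code changed by this pull request.

```scala
// Hypothetical helpers for illustration; not the code changed by this PR.
object LeapYearRules {
  // Julian rule: every year divisible by 4 is a leap year.
  def isJulianLeapYear(year: Int): Boolean = year % 4 == 0

  // Gregorian rule: century years are leap years only when divisible by 400.
  def isGregorianLeapYear(year: Int): Boolean =
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0

  // Day of year of March 1: January (31) + February (28 or 29) + 1.
  def marchFirstDayOfYear(isLeap: Boolean): Int =
    31 + (if (isLeap) 29 else 28) + 1
}
```

For example, 1500 is a leap year under the Julian rule but not under the Gregorian rule, so March 1 of 1500 is day 61 in the Julian calendar.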

## How was this patch tested?

With a unit test ("SPARK-26002: correct day of year calculations for Julian calendar years") which focuses on these corner cases.

Manually:

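```
scala> sql("select year('1500-01-01')").show()

+------------------------------+
|year(CAST(1500-01-01 AS DATE))|
+------------------------------+
|                          1500|
+------------------------------+

scala> sql("select dayOfYear('1100-01-01')").show()

+-----------------------------------+
|dayofyear(CAST(1100-01-01 AS DATE))|
+-----------------------------------+
|                                  1|
+-----------------------------------+
```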


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/attilapiros/spark julianOffByDays

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23000.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23000


commit f2bc0b26184c02b6019893c601eb479db9419689
Author: “attilapiros” 
Date:   2018-11-10T13:28:01Z

Fix day of year calculation for Julian calendar days



