[GitHub] spark pull request #23000: [SPARK-26002][SQL] Fix day of year calculation fo...
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23000#discussion_r234992152

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
    @@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
         assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
       }

    +  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
    +    TimeZone.setDefault(TimeZoneUTC)
    +    val c = Calendar.getInstance(TimeZoneUTC)
    +    c.set(Calendar.MILLISECOND, 0)
    +    (1000 to 1600 by 100).foreach { year =>
    +      // January 1 is the 1st day of the year.
    +      c.set(year, 0, 1, 0, 0, 0)
    +      assert(getYear(getInUTCDays(c.getTimeInMillis)) === year)
    +      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 1)
    +      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 1)
    +
    +      // March 1 is the 61st day of the year, as these are leap years. This is true
    +      // even for multiples of 100, because before 1582-10-04 the Julian calendar
    +      // leap year calculation is used, in which every multiple of 4 is a leap year.
    +      c.set(year, 2, 1, 0, 0, 0)
    +      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 61)
    +      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 3)
    +
    +      // For non-leap years:
    +      c.set(year + 1, 2, 1, 0, 0, 0)
    +      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 60)
    +    }
    --- End diff --

    The last two (1600-01-01 and 1600-03-01) are already tested, as 1600 is included in `(1000 to 1600 by 100)`. I have added a new check for 1582-10-03. But I would not add an assert for 1582-10-14 without knowing that it is really the correct value. I have checked PostgreSQL, but there this 10-day gap is not handled at all: going day by day from `SELECT EXTRACT(DOY FROM TIMESTAMP '1582-10-03 00:00:00');` to `SELECT EXTRACT(DOY FROM TIMESTAMP '1582-10-16 00:00:00');` yields consecutive values from 276 to 289.
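For comparison, `java.util.GregorianCalendar` does model the 10-day cutover gap that PostgreSQL ignores. A minimal sketch (assuming the default cutover date of 1582-10-15; these are the hybrid Julian/Gregorian calendar's values, not Spark's own `DateTimeUtils`):

```scala
import java.util.{Calendar, GregorianCalendar, TimeZone}

object CutoverGapSketch {
  def main(args: Array[String]): Unit = {
    val c = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    c.clear()
    // 1582-10-04 is the last Julian day; the next calendar day is 1582-10-15.
    c.set(1582, Calendar.OCTOBER, 4)
    val doyBefore = c.get(Calendar.DAY_OF_YEAR)
    // Adding one day jumps straight over the 10 dropped days.
    c.add(Calendar.DAY_OF_MONTH, 1)
    println(s"day-of-year $doyBefore is followed by " +
      s"${c.get(Calendar.MONTH) + 1}-${c.get(Calendar.DAY_OF_MONTH)}")
  }
}
```

This prints that day 277 (1582-10-04) is immediately followed by 10-15, unlike the PostgreSQL behaviour quoted above, where the DOY values run consecutively across the gap.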
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23000#discussion_r234986530

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
    @@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
         assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
       }

    +  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
    +    TimeZone.setDefault(TimeZoneUTC)
    --- End diff --

    Thanks! Good catch! It is actually not needed any more.
Github user bersprockets commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23000#discussion_r234819827

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
    @@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
         assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
       }

    +  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
    +    TimeZone.setDefault(TimeZoneUTC)
    --- End diff --

    Just curious. Do you need to put back the old default when the test is over? Or does that not matter here?
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23000#discussion_r234807971

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
    @@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
         assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
       }

    +  test("SPARK-26002: correct day of year calculations for Julian calendar years") {
    +    TimeZone.setDefault(TimeZoneUTC)
    +    val c = Calendar.getInstance(TimeZoneUTC)
    +    c.set(Calendar.MILLISECOND, 0)
    +    (1000 to 1600 by 100).foreach { year =>
    +      // January 1 is the 1st day of the year.
    +      c.set(year, 0, 1, 0, 0, 0)
    +      assert(getYear(getInUTCDays(c.getTimeInMillis)) === year)
    +      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 1)
    +      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 1)
    +
    +      // March 1 is the 61st day of the year, as these are leap years. This is true
    +      // even for multiples of 100, because before 1582-10-04 the Julian calendar
    +      // leap year calculation is used, in which every multiple of 4 is a leap year.
    +      c.set(year, 2, 1, 0, 0, 0)
    +      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 61)
    +      assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 3)
    +
    +      // For non-leap years:
    +      c.set(year + 1, 2, 1, 0, 0, 0)
    +      assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 60)
    +    }
    --- End diff --

    This is good, but I think it's worth adding checks for a couple of special cases:

    * 1582-10-03
    * 1582-10-14 (though I guess the meaning of "dayInYear" is not so clear in this case)
    * 1600-01-01
    * 1600-03-01

    I think they'll all be OK after your change, but it is good to have a check.
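As an outside sanity check, `java.util.GregorianCalendar` (the same hybrid Julian/Gregorian calendar the fix targets) agrees on the 1600 cases: 1600 is divisible by 400, so it is a leap year under both rules and March 1 falls on day 61. A sketch, not part of the proposed test:

```scala
import java.util.{Calendar, GregorianCalendar, TimeZone}

object LeapYear1600Sketch {
  def main(args: Array[String]): Unit = {
    val c = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    c.clear()
    c.set(1600, Calendar.JANUARY, 1)
    println(c.get(Calendar.DAY_OF_YEAR)) // 1

    // 1600 % 400 == 0, so February has 29 days and March 1 is day 31 + 29 + 1 = 61.
    c.set(1600, Calendar.MARCH, 1)
    println(c.get(Calendar.DAY_OF_YEAR)) // 61
  }
}
```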
GitHub user attilapiros opened a pull request:

    https://github.com/apache/spark/pull/23000

    [SPARK-26002][SQL] Fix day of year calculation for Julian calendar days

## What changes were proposed in this pull request?

Fixing leap year calculations for date operators (year/month/dayOfYear) where the Julian calendar is used (before 1582-10-04). In the Julian calendar, every year that is a multiple of 4 is a leap year (there is no extra exception for years that are multiples of 100).

## How was this patch tested?

With a unit test ("SPARK-26002: correct day of year calculations for Julian calendar years") which focuses on these corner cases.

Manually:
```
scala> sql("select year('1500-01-01')").show()
+------------------------------+
|year(CAST(1500-01-01 AS DATE))|
+------------------------------+
|                          1500|
+------------------------------+

scala> sql("select dayOfYear('1100-01-01')").show()
+-----------------------------------+
|dayofyear(CAST(1100-01-01 AS DATE))|
+-----------------------------------+
|                                  1|
+-----------------------------------+
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/attilapiros/spark julianOffByDays

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23000.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23000

----
commit f2bc0b26184c02b6019893c601eb479db9419689
Author: attilapiros
Date:   2018-11-10T13:28:01Z

    Fix day of year calculation for Julian calendar days
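The difference between the two leap-year rules described above can be sketched as follows (hypothetical helper names for illustration, not the actual `DateTimeUtils` code):

```scala
object LeapRuleSketch {
  // Julian rule (in effect before 1582-10-04): every multiple of 4 is a leap year.
  def isJulianLeapYear(year: Int): Boolean = year % 4 == 0

  // Gregorian rule: multiples of 100 are leap years only if also divisible by 400.
  def isGregorianLeapYear(year: Int): Boolean =
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

  def main(args: Array[String]): Unit = {
    // 1100 and 1500 fall before the cutover, so the Julian rule applies and they
    // are leap years, even though the Gregorian rule would reject them.
    Seq(1100, 1500, 1600).foreach { y =>
      println(s"$y: julian=${isJulianLeapYear(y)} gregorian=${isGregorianLeapYear(y)}")
    }
  }
}
```

Applying the Gregorian rule to pre-cutover years is exactly the off-by-days bug the PR fixes: treating 1100 or 1500 as a non-leap year shifts every date after February 28 of that year.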