[jira] [Updated] (HIVE-13948) Incorrect timezone handling in Writable results in wrong dates in queries

2016-06-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13948:

       Resolution: Fixed
    Fix Version/s: 2.0.2
                   2.1.0
                   1.2.2
                   1.3.0
           Status: Resolved  (was: Patch Available)

Committed to 1 branches.
[~jcamachorodriguez] FYI, another one went into 2.1. Please let me know if the 
RC is out (it doesn't look like it); if it is, I can change the fix version to 
2.1.1.

> Incorrect timezone handling in Writable results in wrong dates in queries
> -------------------------------------------------------------------------
>
>                 Key: HIVE-13948
>                 URL: https://issues.apache.org/jira/browse/HIVE-13948
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Blocker
>             Fix For: 1.3.0, 1.2.2, 2.1.0, 2.0.2
>
>         Attachments: HIVE-13948.patch, HIVE-13948.patch
>
>
> Modifying TestDateWritable to cover 200 years, adding all timezones to the 
> set, and making it accumulate errors results in the following set of 
> failures (I bet many are duplicates under different zone names, but there 
> are enough).
> This ONLY logs errors where the YMD date mismatches. There are many more 
> where YMD is the same but the time mismatches; those are omitted for brevity.
> Queries as simple as "select date(...);" reproduce the error (if the Java tz 
> is set to a problematic tz).
> I was investigating a case for a specific date, and it seems like the 
> conversion from dates to ms, namely the offset calculation that takes the 
> offset at UTC midnight and the offset at an arbitrary time derived from 
> that, is completely bogus; it's not clear why it would work.
> I think we either need to derive the date from UTC and then create a local 
> date from YMD if needed (in many cases, e.g. toString for sinks, it would 
> not be needed at all), and/or add a lookup table for the timezone in use 
> (for popular dates, e.g. 1900-present, it would be 40k-odd entries, although 
> the price of building it is another question).
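
The first option in the paragraph above, deriving the date from UTC, could be sketched as follows. This is only an illustration under assumptions: it uses java.time (Java 8+), which the issue does not mention, and the class and method names are hypothetical rather than taken from the HIVE-13948 patch. `LocalDate.ofEpochDay` maps a days-since-epoch count to year-month-day without consulting any timezone, so the YMD cannot shift with the JVM zone. The sketch also counts the days from 1900-01-01 to the date of the log, which is the number of entries a per-day lookup table for one timezone would need.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.TimeZone;

// Hypothetical sketch, not the HIVE-13948 patch: treat the stored value as
// days since 1970-01-01 in UTC and never apply a local offset to get YMD.
final class UtcDateSketch {

    // Days since epoch -> calendar date; no timezone is consulted.
    static LocalDate daysToDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // Calendar date -> days since epoch; the exact inverse of daysToDate.
    static int dateToDays(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Number of per-day entries a lookup table covering [from, to) would need.
    static long tableEntries(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    public static void main(String[] args) {
        // A zone that appears in the mismatch log; it must not affect the result.
        TimeZone.setDefault(TimeZone.getTimeZone("America/Argentina/Buenos_Aires"));
        LocalDate date = LocalDate.of(1968, 10, 6);
        int days = dateToDays(date);
        System.out.println(days + " -> " + daysToDate(days));
        System.out.println("entries since 1900: "
            + tableEntries(LocalDate.of(1900, 1, 1), LocalDate.of(2016, 6, 4)));
    }
}
```

The table count comes out at 42523, consistent with the "40k-odd entries" estimate. A local date would only be needed where a caller asks for local-midnight milliseconds.
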
> Format: tz: expected != actual
> {noformat}
> 2016-06-04T18:33:57,499 ERROR [main[]]: io.TestDateWritable 
> (TestDateWritable.java:testDaylightSavingsTime(234)) - 
> DATE MISMATCH:
> Africa/Abidjan: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Accra: 1918-01-01 00:00:52 != 1918-12-31 23:59:08
> Africa/Bamako: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Banjul: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Bissau: 1912-01-01 00:02:20 != 1912-12-31 23:57:40
> Africa/Bissau: 1975-01-01 01:00:00 != 1975-12-31 23:00:00
> Africa/Casablanca: 1913-10-26 00:30:20 != 1913-10-25 23:29:40
> Africa/Ceuta: 1901-01-01 00:21:16 != 1901-12-31 23:38:44
> Africa/Conakry: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Dakar: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/El_Aaiun: 1976-04-14 01:00:00 != 1976-04-13 23:00:00
> Africa/Freetown: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Lome: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Monrovia: 1972-05-01 00:44:30 != 1972-04-30 23:15:30
> Africa/Nouakchott: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Ouagadougou: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Sao_Tome: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Timbuktu: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> America/Anguilla: 1912-03-02 00:06:04 != 1912-03-01 23:53:56
> America/Antigua: 1951-01-01 01:00:00 != 1951-12-31 23:00:00
> America/Araguaina: 1914-01-01 00:12:48 != 1914-12-31 23:47:12
> America/Araguaina: 1932-10-03 01:00:00 != 1932-10-02 23:00:00
> America/Araguaina: 1949-12-01 01:00:00 != 1949-11-30 23:00:00
> America/Argentina/Buenos_Aires: 1920-05-01 00:16:48 != 1920-04-30 23:43:12
> America/Argentina/Buenos_Aires: 1930-12-01 01:00:00 != 1930-11-30 23:00:00
> America/Argentina/Buenos_Aires: 1931-10-15 01:00:00 != 1931-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1932-11-01 01:00:00 != 1932-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1933-11-01 01:00:00 != 1933-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1934-11-01 01:00:00 != 1934-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1935-11-01 01:00:00 != 1935-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1936-11-01 01:00:00 != 1936-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1937-11-01 01:00:00 != 1937-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1938-11-01 01:00:00 != 1938-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1939-11-01 01:00:00 != 1939-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1940-07-01 01:00:00 != 1940-06-30 23:00:00
> America/Argentina/Buenos_Aires: 1941-10-15 01:00:00 != 1941-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1943-10-15 01:00:00 != 1943-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1946-10-01 01:00:00 != 1946-09-30 23:00:00
> America/Argentina/Buenos_Aires: 1963-12-15 01:00:00 != 196

[jira] [Updated] (HIVE-13948) Incorrect timezone handling in Writable results in wrong dates in queries

2016-06-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13948:

Attachment: HIVE-13948.patch

Minor fixes, mostly to comments. The patch seems to work end-to-end and fixes 
the problematic queries.
The q files need to be run in specific timezones to reproduce the original 
issue (I was setting it via 
JAVA_TOOL_OPTIONS="-Duser.timezone=Canada/Eastern ..."), so no q files are 
added.
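
For reference, the dependence on the JVM timezone can be seen without q files. The sketch below is an illustration under assumptions, not part of the patch: it uses `TimeZone.setDefault` as an in-process stand-in for `-Duser.timezone`, and the class and method names are hypothetical. `java.sql.Date.valueOf` interprets a YMD string at local midnight in the default zone, so the same date maps to different epoch milliseconds depending on that setting.

```java
import java.sql.Date;
import java.util.TimeZone;

// Hypothetical helper: shows that YMD -> millis depends on the JVM default zone.
final class DefaultZoneSketch {

    // Millis of local midnight for ymd ("yyyy-mm-dd") when the default zone is zoneId.
    static long localMidnightMillis(String ymd, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return Date.valueOf(ymd).getTime();
        } finally {
            TimeZone.setDefault(saved); // restore so other code is unaffected
        }
    }

    public static void main(String[] args) {
        long utc = localMidnightMillis("2016-06-04", "UTC");
        long eastern = localMidnightMillis("2016-06-04", "Canada/Eastern");
        // Canada/Eastern is on daylight time (UTC-4) in June, so its local
        // midnight falls 4 hours after UTC midnight.
        System.out.println((eastern - utc) / 3_600_000L + " hours apart");
    }
}
```

Any code that converts between days and milliseconds has to account for this shift correctly for every date, which is where the mismatches in the issue come from.
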


[jira] [Updated] (HIVE-13948) Incorrect timezone handling in Writable results in wrong dates in queries

2016-06-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13948:

Attachment: (was: HIVE-13948.patch)


[jira] [Updated] (HIVE-13948) Incorrect timezone handling in Writable results in wrong dates in queries

2016-06-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13948:

Attachment: HIVE-13948.patch


[jira] [Updated] (HIVE-13948) Incorrect timezone handling in Writable results in wrong dates in queries

2016-06-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13948:

Status: Patch Available  (was: Open)

[~jdere] [~gopalv] [~ashutoshc] can someone take a look?


[jira] [Updated] (HIVE-13948) Incorrect timezone handling in Writable results in wrong dates in queries

2016-06-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13948:

Attachment: HIVE-13948.patch

A patch.


[jira] [Updated] (HIVE-13948) Incorrect timezone handling in Writable results in wrong dates in queries

2016-06-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13948:

Summary: Incorrect timezone handling in Writable results in wrong dates in 
queries  (was: Incorrect timezone handling in Writable results in wrong dates)

> Incorrect timezone handling in Writable results in wrong dates in queries
> -------------------------------------------------------------------------
>
>                 Key: HIVE-13948
>                 URL: https://issues.apache.org/jira/browse/HIVE-13948
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Blocker
>
> Modifying TestDateWritable to cover 200 years, adding all timezones to the 
> set, and making it accumulate errors results in the following set of 
> failures (I bet many are duplicates under different zone names, but there 
> are enough).
> This ONLY logs errors where the YMD date mismatches. There are many more 
> where YMD is the same but the time mismatches; those are omitted for brevity.
> I was investigating a case for a specific date, and it seems like the 
> conversion from dates to ms, namely the offset calculation that takes the 
> offset at UTC midnight and the offset at an arbitrary time derived from 
> that, is completely bogus; it's not clear why it would work.
> I think we either need to derive the date from UTC and then create a local 
> date from YMD if needed, and/or add a lookup table for the timezone in use 
> (for popular dates, e.g. 1900-present, it would be 40k-odd entries, although 
> the price of building it is another question).
> Format: tz: expected != actual
> {noformat}
> 2016-06-04T18:33:57,499 ERROR [main[]]: io.TestDateWritable 
> (TestDateWritable.java:testDaylightSavingsTime(234)) - 
> DATE MISMATCH:
> Africa/Abidjan: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Accra: 1918-01-01 00:00:52 != 1918-12-31 23:59:08
> Africa/Bamako: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Banjul: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Bissau: 1912-01-01 00:02:20 != 1912-12-31 23:57:40
> Africa/Bissau: 1975-01-01 01:00:00 != 1975-12-31 23:00:00
> Africa/Casablanca: 1913-10-26 00:30:20 != 1913-10-25 23:29:40
> Africa/Ceuta: 1901-01-01 00:21:16 != 1901-12-31 23:38:44
> Africa/Conakry: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Dakar: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/El_Aaiun: 1976-04-14 01:00:00 != 1976-04-13 23:00:00
> Africa/Freetown: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Lome: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Monrovia: 1972-05-01 00:44:30 != 1972-04-30 23:15:30
> Africa/Nouakchott: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Ouagadougou: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Sao_Tome: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> Africa/Timbuktu: 1912-01-01 00:16:08 != 1912-12-31 23:43:52
> America/Anguilla: 1912-03-02 00:06:04 != 1912-03-01 23:53:56
> America/Antigua: 1951-01-01 01:00:00 != 1951-12-31 23:00:00
> America/Araguaina: 1914-01-01 00:12:48 != 1914-12-31 23:47:12
> America/Araguaina: 1932-10-03 01:00:00 != 1932-10-02 23:00:00
> America/Araguaina: 1949-12-01 01:00:00 != 1949-11-30 23:00:00
> America/Argentina/Buenos_Aires: 1920-05-01 00:16:48 != 1920-04-30 23:43:12
> America/Argentina/Buenos_Aires: 1930-12-01 01:00:00 != 1930-11-30 23:00:00
> America/Argentina/Buenos_Aires: 1931-10-15 01:00:00 != 1931-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1932-11-01 01:00:00 != 1932-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1933-11-01 01:00:00 != 1933-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1934-11-01 01:00:00 != 1934-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1935-11-01 01:00:00 != 1935-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1936-11-01 01:00:00 != 1936-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1937-11-01 01:00:00 != 1937-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1938-11-01 01:00:00 != 1938-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1939-11-01 01:00:00 != 1939-10-31 23:00:00
> America/Argentina/Buenos_Aires: 1940-07-01 01:00:00 != 1940-06-30 23:00:00
> America/Argentina/Buenos_Aires: 1941-10-15 01:00:00 != 1941-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1943-10-15 01:00:00 != 1943-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1946-10-01 01:00:00 != 1946-09-30 23:00:00
> America/Argentina/Buenos_Aires: 1963-12-15 01:00:00 != 1963-12-14 23:00:00
> America/Argentina/Buenos_Aires: 1964-10-15 01:00:00 != 1964-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1965-10-15 01:00:00 != 1965-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1966-10-15 01:00:00 != 1966-10-14 23:00:00
> America/Argentina/Buenos_Aires: 1967-10-01 01:00:00 != 1967-09-30 23:00:00
> America/Argentina/Buenos_Aires: 1968-10-06 01:00:00 != 1968-10-05 23:00:00
> America/Argentina/Buenos_Aires: 1969-10-05 01:00:00 != 1969-10-04 23:00:00
> America/Argentina/Buenos_Aires: 1974-01-