Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-02-02 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/
---

(Updated Feb. 2, 2016, 7:10 p.m.)


Review request for hive, Ryan Blue, Mohammad Islam, Reuben Kuhnert, and Szehon 
Ho.


Changes
---

I forgot to add data/files/hive2.0_int96_timestamp-bug.parq


Bugs: HIVE-12767
https://issues.apache.org/jira/browse/HIVE-12767


Repository: hive-git


Description
---

The following exit criteria are addressed in this patch:

* Hive will read Parquet MR int96 timestamp data and adjust values using a time 
zone from a table property, if set, or using the local time zone if it is 
absent. No adjustment will be applied to data written by Impala.

* Hive will write Parquet int96 timestamps using a time zone adjustment from 
the same table property, if set, or using the local time zone if it is absent. 
This keeps the data in the table consistent.

* New tables created by Hive will set the table property to UTC if the global 
option to set the property for new tables is enabled.
  * Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
the property unless the global setting to do so is enabled.
  * Tables created using CREATE TABLE LIKE <existing table> will copy the 
property from the source table.

To set the timezone table property, use this:
  create table tbl1 (ts timestamp) stored as parquet tblproperties 
('parquet.mr.int96.write.zone'='PST');
  
To set UTC as the default timezone table property on newly created tables, use this: 
  set parquet.mr.int96.enable.utc.write.zone=true;
  create table tbl2 (ts timestamp) stored as parquet;
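
For an existing Parquet table, the property can presumably also be set 
afterwards with standard Hive DDL (an illustration only, not one of the 
patch's examples; it assumes readers pick up the updated property on the next 
query):
  alter table tbl1 set tblproperties ('parquet.mr.int96.write.zone'='UTC');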


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
bfd88f82ee86740f60baed122ca050dd542d637c 
  data/files/hive2.0_int96_timestamp-bug.parq PRE-CREATION 
  data/files/impala_int96_timestamp.parq PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
2e4591313f38d9bd1bfe1b54968c835fdcd53717 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
bfb48a987ce89a373f3da63c9162546c6eda43a9 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
ec0dd818f688ab92feb46be4fb6040ede5ac756a 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 53f3b72b790d87a75a7cd1d77d8f011c29c41188 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 74a1a82047613189678716f765bfaa9ac39b7618 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
aace48ee7d145d199163286d21e4ee7694140d6f 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
69272dc41dbc5fe29ab4c98e730b591c28f3a297 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
8c880c39823a9279e32ee191d9f250ea0014888f 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 
70491390ba2b90f32ef9963be7b19e57672241f3 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e 
  ql/src/test/queries/clientnegative/parquet_int96_timestamp_errors.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION 
  ql/src/test/results/clientnegative/parquet_int96_timestamp_errors.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/41821/diff/


Testing
---

Added unit and q-tests:
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q


Thanks,

Sergio Pena



Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-01-28 Thread Sergio Pena


> On Jan. 22, 2016, 11:15 p.m., Ryan Blue wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java,
> >  line 53
> > 
> >
> > This is a little odd. Normally, I would consider GMT/UTC to be the 
> > calendar that applies no adjustment. But what is happening in this code is 
> > that the UTC millisecond value is being constructed using values in that 
> > zone and then a new timestamp is created from it that is in the local zone. 
> > So local->local produces no adjustment, while GMT->local does.
> > 
> > I was thinking that GMT would always be used to create the 
> > milliseconds, then the calendar would be used to adjust the value by some 
> > amount. UTC by 0, EST by -5, and PST by -8.
> > 
> > I think it may be easier to have a method that does a single conversion 
> > to UTC in milliseconds. Then, adjust the value using a static offset like 
> > the one from `TimeZone.getOffset(utc_ms)`. That would result in a bit less 
> > work done here and I think would make the adjustment easier to reason about.

I can fix this in a follow-up JIRA to keep the current code as-is without 
introducing extra bugs or tests.


> On Jan. 22, 2016, 11:15 p.m., Ryan Blue wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java,
> >  line 232
> > 
> >
> > It looks like this tests that the value written is the result of 
> > `getNanoTime(Timestamp,Calendar).toBinary()`, but doesn't test what that 
> > value is or should be. That is probably the intent, but I wanted to make 
> > sure.

It is testing that the Binary form of the Timestamp passed to addBinary() is 
the same as the one found in the ArrayWritable.

This is the ArrayWritable, the one that Hive makes when writing to Parquet:

  ArrayWritable hiveRecord = createGroup(
      createTimestamp(Timestamp.valueOf("2016-01-01 01:01:01"))
  );

DataWritableWriter then gets the Timestamp value and converts it to a Binary 
form. Here we convert the same Timestamp to Binary to check that the value 
passed to addBinary() matches:

  startMessage();
    startField("ts", 0);
      addBinary(NanoTimeUtils.getNanoTime(
          Timestamp.valueOf("2016-01-01 01:01:01"),
          Calendar.getInstance(TimeZone.getTimeZone("CST"))).toBinary());
    endField("ts", 0);
  endMessage();


> On Jan. 22, 2016, 11:15 p.m., Ryan Blue wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java,
> >  line 214
> > 
> >
> > This is a great test for round trip, but I think we should also have a 
> > test with a set of corrupt values from a file written with 5.5 in 
> > America/Los_Angeles (since that's convenient).

I added the negative tests to parquet_int96_timestamp_errors.q


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/#review115731
---


On Jan. 13, 2016, 8:36 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41821/
> ---
> 
> (Updated Jan. 13, 2016, 8:36 p.m.)
> 
> 
> Review request for hive, Ryan Blue, Mohammad Islam, Reuben Kuhnert, and 
> Szehon Ho.
> 
> 
> Bugs: HIVE-12767
> https://issues.apache.org/jira/browse/HIVE-12767
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The following exit criteria are addressed in this patch:
> 
> * Hive will read Parquet MR int96 timestamp data and adjust values using a 
> time zone from a table property, if set, or using the local time zone if it 
> is absent. No adjustment will be applied to data written by Impala.
> 
> * Hive will write Parquet int96 timestamps using a time zone adjustment from 
> the same table property, if set, or using the local time zone if it is 
> absent. This keeps the data in the table consistent.
> 
> * New tables created by Hive will set the table property to UTC if the global 
> option to set the property for new tables is enabled.
>   * Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
> the property unless the global setting to do so is enabled.
>   * Tables created using CREATE TABLE LIKE <existing table> will copy the 
> property from the source table.
> 
> To set the timezone table property, use this:
>   create table tbl1 (ts timestamp) stored as parquet tblproperties 
> ('parquet.mr.int96.write.zone'='PST');
>   
> To set UTC as the default timezone table property on newly created tables, 
> use this: 
>   set parquet.mr.int96.enable.utc.write.zone=true;
>   create table tbl2 (ts timestamp) stored as parquet;

Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-01-28 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/
---

(Updated Jan. 28, 2016, 9:01 p.m.)


Review request for hive, Ryan Blue, Mohammad Islam, Reuben Kuhnert, and Szehon 
Ho.


Bugs: HIVE-12767
https://issues.apache.org/jira/browse/HIVE-12767


Repository: hive-git


Description
---

The following exit criteria are addressed in this patch:

* Hive will read Parquet MR int96 timestamp data and adjust values using a time 
zone from a table property, if set, or using the local time zone if it is 
absent. No adjustment will be applied to data written by Impala.

* Hive will write Parquet int96 timestamps using a time zone adjustment from 
the same table property, if set, or using the local time zone if it is absent. 
This keeps the data in the table consistent.

* New tables created by Hive will set the table property to UTC if the global 
option to set the property for new tables is enabled.
  * Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
the property unless the global setting to do so is enabled.
  * Tables created using CREATE TABLE LIKE <existing table> will copy the 
property from the source table.

To set the timezone table property, use this:
  create table tbl1 (ts timestamp) stored as parquet tblproperties 
('parquet.mr.int96.write.zone'='PST');
  
To set UTC as the default timezone table property on newly created tables, use this: 
  set parquet.mr.int96.enable.utc.write.zone=true;
  create table tbl2 (ts timestamp) stored as parquet;


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
bfd88f82ee86740f60baed122ca050dd542d637c 
  data/files/impala_int96_timestamp.parq PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
2e4591313f38d9bd1bfe1b54968c835fdcd53717 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
bfb48a987ce89a373f3da63c9162546c6eda43a9 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
ec0dd818f688ab92feb46be4fb6040ede5ac756a 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 53f3b72b790d87a75a7cd1d77d8f011c29c41188 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 74a1a82047613189678716f765bfaa9ac39b7618 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
aace48ee7d145d199163286d21e4ee7694140d6f 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
69272dc41dbc5fe29ab4c98e730b591c28f3a297 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
8c880c39823a9279e32ee191d9f250ea0014888f 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 
70491390ba2b90f32ef9963be7b19e57672241f3 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e 
  ql/src/test/queries/clientnegative/parquet_int96_timestamp_errors.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION 
  ql/src/test/results/clientnegative/parquet_int96_timestamp_errors.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/41821/diff/


Testing
---

Added unit and q-tests:
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q


Thanks,

Sergio Pena



Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-01-22 Thread Ryan Blue

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/#review115731
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
(line 139)


One of the test cases listed in the scope doc is an unknown zone ID, like 
"Antarctica/South_Pole", that isn't valid. In that case the table should not 
be readable because the correct zone offset can't be determined.

`TimeZone.getTimeZone(String)` will return GMT when the zone id isn't 
recognized, so I'm concerned that this will do the wrong thing and use GMT in 
some cases where the zone should be rejected.

See 
https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html#getTimeZone(java.lang.String)
You might need to make a set of available IDs using 
https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html#getAvailableIDs()
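
A minimal sketch of that kind of validation (a hypothetical helper, not the 
patch's code): reject zone IDs the JVM does not list as available instead of 
letting TimeZone.getTimeZone() silently fall back to GMT.

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;
  import java.util.TimeZone;

  public final class WriteZoneValidator {
    // Zone IDs the JVM actually knows about; anything else would be mapped to GMT.
    private static final Set<String> KNOWN_IDS =
        new HashSet<>(Arrays.asList(TimeZone.getAvailableIDs()));

    // Returns the zone for a known ID, otherwise fails instead of defaulting to GMT.
    public static TimeZone parseWriteZone(String id) {
      if (id == null || !KNOWN_IDS.contains(id)) {
        throw new IllegalArgumentException("Unknown write time zone: " + id);
      }
      return TimeZone.getTimeZone(id);
    }
  }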



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
(line 168)


It looks like this removes support for 
`HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION.varname`. I think we 
should maintain support for that configuration setting since customers may 
already be using it and we don't want to change behavior. If it is set to true, 
it should cause the calendar to use UTC (so no adjustment is made). However, 
the new table property should override the Hive setting. So I think the check 
should be in the Strings.isNullOrEmpty case.
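
A rough sketch of that precedence (names are illustrative, not the patch's 
actual converter code): the table property wins when present, and the legacy 
skip-conversion flag only matters when the property is null or empty.

  import java.util.Calendar;
  import java.util.TimeZone;
  import com.google.common.base.Strings;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hive.conf.HiveConf;

  public final class TimestampZoneSketch {
    static Calendar chooseCalendar(Configuration conf, String tableZone) {
      if (!Strings.isNullOrEmpty(tableZone)) {
        // The new table property overrides the old Hive setting.
        return Calendar.getInstance(TimeZone.getTimeZone(tableZone));
      }
      // No table property: honor the legacy skip-conversion flag,
      // where "skip" means treat the values as UTC (no adjustment).
      boolean skip = HiveConf.getBoolVar(conf,
          HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION);
      return Calendar.getInstance(
          skip ? TimeZone.getTimeZone("UTC") : TimeZone.getDefault());
    }
  }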



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 (line 57)


Nit: We use UTC elsewhere. I believe that they are identical for our 
purposes, but I want to reduce confusion and would recommend UTC over GMT.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 (line 211)


The file metadata is also passed in [via 
InitContext](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java#L173).
 If the file has the write zone property set, then it should be validated 
against the configured write zone (if set). When present, the file's value 
should override the table value and there should be a warning if the table 
value doesn't match the file value. That covers cases where files are moved 
from one table to another.
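
One possible shape for that check (hypothetical method and metadata key, not 
the patch's code): prefer the zone recorded in the file footer and warn when 
the table-level value disagrees.

  import java.util.Map;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public final class WriteZoneResolution {
    private static final Logger LOG = LoggerFactory.getLogger(WriteZoneResolution.class);
    // Assumed key; the actual footer key used by the patch may differ.
    private static final String WRITE_ZONE_KEY = "parquet.mr.int96.write.zone";

    static String resolveWriteZone(Map<String, String> fileMetadata, String tableZone) {
      String fileZone = fileMetadata == null ? null : fileMetadata.get(WRITE_ZONE_KEY);
      if (fileZone == null) {
        return tableZone;  // no file-level value, fall back to the table property
      }
      if (tableZone != null && !fileZone.equals(tableZone)) {
        LOG.warn("Parquet file write zone {} does not match table property {}; using the file value",
            fileZone, tableZone);
      }
      return fileZone;  // the file's value overrides the table value
    }
  }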



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 (line 161)


Shouldn't this use UTC, which will apply an offset of 0?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 (line 165)


When will the zone property be set before this method? Is this how the 
table property is passed in?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
(line 21)


I think calling this the default write zone is not quite correct. It is the 
default zone when the Hive config property is set, but the current zone is the 
default otherwise. What about calling this `PARQUET_INT96_NO_ADJUSTMENT_ZONE`?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
(line 53)


This is a little odd. Normally, I would consider GMT/UTC to be the calendar 
that applies no adjustment. But what is happening in this code is that the UTC 
millisecond value is being constructed using values in that zone and then a new 
timestamp is created from it that is in the local zone. So local->local 
produces no adjustment, while GMT->local does.

I was thinking that GMT would always be used to create the milliseconds, 
then the calendar would be used to adjust the value by some amount. UTC by 0, 
EST by -5, and PST by -8.

I think it may be easier to have a method that does a single conversion to 
UTC in milliseconds. Then, adjust the value using a static offset like the one 
from `TimeZone.getOffset(utc_ms)`. That would result in a bit less work done 
here and I think would make the adjustment easier to reason about.
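
A minimal sketch of that suggestion (illustrative only, not the patch's 
NanoTimeUtils code; sub-millisecond nanos are dropped for brevity, and the 
sign of the adjustment depends on the read vs. write direction):

  import java.sql.Timestamp;
  import java.util.TimeZone;

  public final class Int96OffsetSketch {
    // Julian day number of the Unix epoch (1970-01-01).
    private static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    private static final long MILLIS_PER_DAY = 24L * 60L * 60L * 1000L;
    private static final long NANOS_PER_MILLI = 1_000_000L;

    static Timestamp toTimestamp(int julianDay, long timeOfDayNanos, TimeZone zone) {
      // 1. Single conversion of the int96 parts to a UTC millisecond value.
      long utcMillis = (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
          + timeOfDayNanos / NANOS_PER_MILLI;
      // 2. Apply the zone's static offset at that instant (UTC => 0, PST => -8h, ...).
      return new Timestamp(utcMillis + zone.getOffset(utcMillis));
    }
  }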



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 (line 35)


What is the purpose of this property? Is this just a way to pass the time 
zone? Why not use the property used elsewhere?



ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 
(line 227)


It looks like this tests that the value written is the result of 
`getNanoTime(Timestamp,Calendar).toBinary()`, but doesn't test what that value 
is or should be. That is probably the intent, but I wanted to make sure.

Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-01-13 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/
---

(Updated Jan. 13, 2016, 8:36 p.m.)


Review request for hive, Ryan Blue, Mohammad Islam, Reuben Kuhnert, and Szehon 
Ho.


Bugs: HIVE-12767
https://issues.apache.org/jira/browse/HIVE-12767


Repository: hive-git


Description
---

The following exit criteria are addressed in this patch:

* Hive will read Parquet MR int96 timestamp data and adjust values using a time 
zone from a table property, if set, or using the local time zone if it is 
absent. No adjustment will be applied to data written by Impala.

* Hive will write Parquet int96 timestamps using a time zone adjustment from 
the same table property, if set, or using the local time zone if it is absent. 
This keeps the data in the table consistent.

* New tables created by Hive will set the table property to UTC if the global 
option to set the property for new tables is enabled.
  * Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
the property unless the global setting to do so is enabled.
  * Tables created using CREATE TABLE LIKE <existing table> will copy the 
property from the source table.

To set the timezone table property, use this:
  create table tbl1 (ts timestamp) stored as parquet tblproperties 
('parquet.mr.int96.write.zone'='PST');
  
To set UTC as the default timezone table property on newly created tables, use this: 
  set parquet.mr.int96.enable.utc.write.zone=true;
  create table tbl2 (ts timestamp) stored as parquet;


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
1bcdc5f49e1a4a0f357842e88cf5fd359685b5ef 
  data/files/impala_int96_timestamp.parq PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
deec8bba45c130c5dfdc482522c0825a71af9d2c 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
bfb48a987ce89a373f3da63c9162546c6eda43a9 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
ec0dd818f688ab92feb46be4fb6040ede5ac756a 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 53f3b72b790d87a75a7cd1d77d8f011c29c41188 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 74a1a82047613189678716f765bfaa9ac39b7618 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
aace48ee7d145d199163286d21e4ee7694140d6f 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
69272dc41dbc5fe29ab4c98e730b591c28f3a297 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 
70491390ba2b90f32ef9963be7b19e57672241f3 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/41821/diff/


Testing
---

Added unit and q-tests:
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q


Thanks,

Sergio Pena



Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-01-02 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/
---

(Updated Jan. 3, 2016, 12:36 a.m.)


Review request for hive, Ryan Blue, Mohammad Islam, Reuben Kuhnert, and Szehon 
Ho.


Changes
---

Rebase latest changes from master.


Summary (updated)
-

HIVE-12767: Implement table property to address Parquet int96 timestamp bug


Bugs: HIVE-12767
https://issues.apache.org/jira/browse/HIVE-12767


Repository: hive-git


Description
---

The following exit criteria are addressed in this patch:

* Hive will read Parquet MR int96 timestamp data and adjust values using a time 
zone from a table property, if set, or using the local time zone if it is 
absent. No adjustment will be applied to data written by Impala.

* Hive will write Parquet int96 timestamps using a time zone adjustment from 
the same table property, if set, or using the local time zone if it is absent. 
This keeps the data in the table consistent.

* New tables created by Hive will set the table property to UTC if the global 
option to set the property for new tables is enabled.
  * Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
the property unless the global setting to do so is enabled.
  * Tables created using CREATE TABLE LIKE <existing table> will copy the 
property from the source table.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
591c0ab097f1f384e287876fc25fd24c09006dfb 
  data/files/impala_int96_timestamp.parq PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
ac0ecd9c31c0e84b8c4628c86b0fdab7d1842f28 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
bfb48a987ce89a373f3da63c9162546c6eda43a9 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
ec0dd818f688ab92feb46be4fb6040ede5ac756a 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 53f3b72b790d87a75a7cd1d77d8f011c29c41188 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 74a1a82047613189678716f765bfaa9ac39b7618 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
aace48ee7d145d199163286d21e4ee7694140d6f 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
69272dc41dbc5fe29ab4c98e730b591c28f3a297 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 
70491390ba2b90f32ef9963be7b19e57672241f3 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/41821/diff/


Testing
---

Added unit and q-tests:
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q


Thanks,

Sergio Pena



Re: Review Request 41821: HIVE-12767: Implement table property to address Parquet int96 timestamp bug

2016-01-02 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41821/
---

(Updated Jan. 3, 2016, 12:38 a.m.)


Review request for hive, Ryan Blue, Mohammad Islam, Reuben Kuhnert, and Szehon 
Ho.


Bugs: HIVE-12767
https://issues.apache.org/jira/browse/HIVE-12767


Repository: hive-git


Description (updated)
---

The following exit criteria are addressed in this patch:

* Hive will read Parquet MR int96 timestamp data and adjust values using a time 
zone from a table property, if set, or using the local time zone if it is 
absent. No adjustment will be applied to data written by Impala.

* Hive will write Parquet int96 timestamps using a time zone adjustment from 
the same table property, if set, or using the local time zone if it is absent. 
This keeps the data in the table consistent.

* New tables created by Hive will set the table property to UTC if the global 
option to set the property for new tables is enabled.
  * Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
the property unless the global setting to do so is enabled.
  * Tables created using CREATE TABLE LIKE <existing table> will copy the 
property from the source table.

To set the timezone table property, use this:
  create table tbl1 (ts timestamp) stored as parquet tblproperties 
('parquet.mr.int96.write.zone'='PST');
  
To set UTC as the default timezone table property on newly created tables, use this: 
  set parquet.mr.int96.enable.utc.write.zone=true;
  create table tbl2 (ts timestamp) stored as parquet;


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
591c0ab097f1f384e287876fc25fd24c09006dfb 
  data/files/impala_int96_timestamp.parq PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
ac0ecd9c31c0e84b8c4628c86b0fdab7d1842f28 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
bfb48a987ce89a373f3da63c9162546c6eda43a9 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
ec0dd818f688ab92feb46be4fb6040ede5ac756a 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 53f3b72b790d87a75a7cd1d77d8f011c29c41188 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 74a1a82047613189678716f765bfaa9ac39b7618 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
aace48ee7d145d199163286d21e4ee7694140d6f 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
69272dc41dbc5fe29ab4c98e730b591c28f3a297 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 
70491390ba2b90f32ef9963be7b19e57672241f3 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/41821/diff/


Testing
---

Added unit and q-tests:
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestParquetTimestampConverter.java
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q


Thanks,

Sergio Pena