Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-08 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review174264
---


Ship it!




Ship It!

- cheng xu


On May 4, 2017, 6:19 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated May 4, 2017, 6:19 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 13750cdc34711d22f2adf2f483a6773ad05fb8d2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
> 6413c5add6db2e8c9298285b15dba33ee74379a8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  c81499a91c84af3ba33f335506c1c44e7085f13d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java
>  bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
>  1e10dbf18742524982606f1e6c6d447d683b2dc3 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/5/
> 
> 
> Testing
> ---
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> - implicit and explicit timestamp conversions in insert clause
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-08 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review174184
---


Ship it!




Ship It!

- Sergio Pena


On May 4, 2017, 10:19 a.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated May 4, 2017, 10:19 a.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 13750cdc34711d22f2adf2f483a6773ad05fb8d2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
> 6413c5add6db2e8c9298285b15dba33ee74379a8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  c81499a91c84af3ba33f335506c1c44e7085f13d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java
>  bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
>  1e10dbf18742524982606f1e6c6d447d683b2dc3 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/5/
> 
> 
> Testing
> ---
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> - implicit and explicit timestamp conversions in insert clause
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-04 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/
---

(Updated May 4, 2017, 10:19 a.m.)


Review request for hive, Sergio Pena and Zoltan Ivanfi.


Changes
---

Updated based on the comments. Thanks Sergio Pena, Vihang Karajgaonkar, Cheng 
Xu for the reviews.


Bugs: HIVE-16469
https://issues.apache.org/jira/browse/HIVE-16469


Repository: hive-git


Description
---

HIVE-16469: Parquet timestamp table property is not always taken into account


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
13750cdc34711d22f2adf2f483a6773ad05fb8d2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 
f4fadbb61bf45f62945700284c0b050f0984b696 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
6413c5add6db2e8c9298285b15dba33ee74379a8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
b339cc4347eea143dca2f6d98f9aaafdc427 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 
c81499a91c84af3ba33f335506c1c44e7085f13d 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java 
bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
 1e10dbf18742524982606f1e6c6d447d683b2dc3 
  ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
6eadd1b0a3313cbba7a798890b802baae302749e 
  ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
PRE-CREATION 
  
ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
b9a3664458a83f1856e4bc59eba5d56665df61cc 
  ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/58501/diff/5/

Changes: https://reviews.apache.org/r/58501/diff/4-5/


Testing
---

Added qtests for the following cases:
- order by clause
- selfjoin
- calling UDFs with the timestamp values
- where clause with a constant cast as timestamp
- test for HoS
- implicit and explicit timestamp conversions in insert clause

Tested manually but no qtests:
- join between 3 tables all parquet but with different/no timezone property
- subselect in from/where clauses
- exists / union / no exists


Thanks,

Barna Zsombor Klara



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-04 Thread Barna Zsombor Klara


> On May 3, 2017, 5 p.m., Vihang Karajgaonkar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
> > Lines 115-120 (patched)
> > 
> >
> > Should logs here be warning?

I'm not expecting the exceptions, but good point a warning is probably better.


- Barna Zsombor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review173747
---


On May 3, 2017, 12:59 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated May 3, 2017, 12:59 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 13750cdc34711d22f2adf2f483a6773ad05fb8d2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
> 6413c5add6db2e8c9298285b15dba33ee74379a8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  c81499a91c84af3ba33f335506c1c44e7085f13d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java
>  bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
>  1e10dbf18742524982606f1e6c6d447d683b2dc3 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/4/
> 
> 
> Testing
> ---
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> - implicit and explicit timestamp conversions in insert clause
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-04 Thread Barna Zsombor Klara


> On May 4, 2017, 1:53 a.m., cheng xu wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
> > Lines 372 (patched)
> > 
> >
> > Can we check the format type to see whether it's Parquet format?

Ahh very good point, thanks for spotting this. We shouldn't polute other tables 
with our property.


> On May 4, 2017, 1:53 a.m., cheng xu wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
> > Line 181 (original), 181 (patched)
> > 
> >
> > Why not passing in the default value here when 
> > PARQUET_INT96_WRITE_ZONE_PROPERTY is not set?

I would prefer not to use the default value here. I want to make sure the 
RecordReader cannot be used without having checked the TimeZone property, so 
the default value is only set in the 
ParquetTableUtils#setParquetTimeZoneIfAbsent where I have the full list of 
table properties. If we use the default value in a case where we forgot to 
check the table property, we may end up reading incorrect timestamp values 
silently from the table.


- Barna Zsombor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review173857
---


On May 3, 2017, 12:59 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated May 3, 2017, 12:59 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 13750cdc34711d22f2adf2f483a6773ad05fb8d2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
> 6413c5add6db2e8c9298285b15dba33ee74379a8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  c81499a91c84af3ba33f335506c1c44e7085f13d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java
>  bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
>  1e10dbf18742524982606f1e6c6d447d683b2dc3 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/4/
> 
> 
> Testing
> ---
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> - implicit and explicit timestamp conversions in insert clause
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-03 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review173857
---




ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
Lines 372 (patched)


Can we check the format type to see whether it's Parquet format?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
Lines 101 (patched)


Can we rename this method like propagateParquetTimeZoneTablePorperty? 
Ususally method with prefix "check" should not have side-effect.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
Line 181 (original), 181 (patched)


Why not passing in the default value here when 
PARQUET_INT96_WRITE_ZONE_PROPERTY is not set?


- cheng xu


On May 3, 2017, 8:59 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated May 3, 2017, 8:59 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 13750cdc34711d22f2adf2f483a6773ad05fb8d2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
> 6413c5add6db2e8c9298285b15dba33ee74379a8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  c81499a91c84af3ba33f335506c1c44e7085f13d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java
>  bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
>  1e10dbf18742524982606f1e6c6d447d683b2dc3 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/4/
> 
> 
> Testing
> ---
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> - implicit and explicit timestamp conversions in insert clause
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-03 Thread Vihang Karajgaonkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review173747
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
Lines 115-120 (patched)


Should logs here be warning?


- Vihang Karajgaonkar


On May 3, 2017, 12:59 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated May 3, 2017, 12:59 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 13750cdc34711d22f2adf2f483a6773ad05fb8d2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
> 6413c5add6db2e8c9298285b15dba33ee74379a8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  c81499a91c84af3ba33f335506c1c44e7085f13d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java
>  bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
>  1e10dbf18742524982606f1e6c6d447d683b2dc3 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/4/
> 
> 
> Testing
> ---
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> - implicit and explicit timestamp conversions in insert clause
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-03 Thread Barna Zsombor Klara


> On May 2, 2017, 5:27 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
> > Line 72 (original), 72 (patched)
> > 
> >
> > How does this work? I don't understand this change.

The user.timezone system property is used to set the default timezone of the 
JVM. If this is set on the HS2 instance then we need to propagate it to the 
child VM spawned by a local task or timestamps read by the local task will be 
incorrect.


> On May 2, 2017, 5:27 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
> > Line 181 (original), 181 (patched)
> > 
> >
> > Is this compatible with old parquet tables? if the property is not set, 
> > then the validateTimeZonemight fail, right? If so, do we want to fail 
> > reading tables that do not have a property set?
> > 
> > Something else to consider, if a user sets a timezone improperly in a 
> > different tool or something  happened that we got an invalid timezone, 
> > then do we want to fail when reading those files? Just  wondering this 
> > scenario, no need to fix it right away.

At this point the timezone property had to be set by 
ParquetTableUtils#setParquetTimeZoneIfAbsent either from the table properties 
or using the default value TimeZone#getDefault. The core problem is that I 
found it very difficult to make sure that  execution path will check the 
table property.
- The FetchOperator works when we have a local task, but the 
MapRedParquetInputFormat does not (MapWork is null). 
- The FetchOperator will not work with a complex query or an order by clause, 
but the InputFormat should work in this case. 
- For statistics gathering only the StatNoJobTask is executed.
I wanted to make sure that if we have an execution path I forgot about, then we 
should rather fail than to read incorrect timestamp values silently.
Similarly in my opinion if the timezone value is incorrect (because it was set 
by another tool) then we should fail instead of reading illadjusted values.


> On May 2, 2017, 5:27 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java
> > Lines 35 (patched)
> > 
> >
> > Why is Map used instead of Map? Aren't all table 
> > properties key, value string pairs?
> > 
> > Also, the ensureTablePropertySet() name seems not related to what we 
> > want to do. I thought it was going to throw an exception if the property 
> > was not set, but it is setting the value on the JobConf. Should we use a 
> > different name, such as setParquetTimeZoneIfNotSet(),  
> > setParquetTimeZoneIfAbsent() or something like that helps us understand 
> > quickly without looking at the javadoc.

We are calling this method with Properties objects (i.e. from the 
FetchOperator) and using Map objects (i.e. from the 
StatsNoJobTask) and the common ancestor for these two is the Map. While it 
is true that the table properties can only be Strings so the Properties should 
only contain String pairs I wanted to avoid the explicit cast.


- Barna Zsombor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review173610
---


On May 3, 2017, 12:59 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated May 3, 2017, 12:59 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 13750cdc34711d22f2adf2f483a6773ad05fb8d2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> 

Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-03 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/
---

(Updated May 3, 2017, 12:59 p.m.)


Review request for hive, Sergio Pena and Zoltan Ivanfi.


Changes
---

Updated based on comments.


Bugs: HIVE-16469
https://issues.apache.org/jira/browse/HIVE-16469


Repository: hive-git


Description
---

HIVE-16469: Parquet timestamp table property is not always taken into account


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
757b7fc0eaa39c956014aa446ab1b07fc4abf8d3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
13750cdc34711d22f2adf2f483a6773ad05fb8d2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 
f4fadbb61bf45f62945700284c0b050f0984b696 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
6413c5add6db2e8c9298285b15dba33ee74379a8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
b339cc4347eea143dca2f6d98f9aaafdc427 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 
c81499a91c84af3ba33f335506c1c44e7085f13d 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java 
bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java
 1e10dbf18742524982606f1e6c6d447d683b2dc3 
  ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
6eadd1b0a3313cbba7a798890b802baae302749e 
  ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
PRE-CREATION 
  
ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
b9a3664458a83f1856e4bc59eba5d56665df61cc 
  ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/58501/diff/4/

Changes: https://reviews.apache.org/r/58501/diff/3-4/


Testing
---

Added qtests for the following cases:
- order by clause
- selfjoin
- calling UDFs with the timestamp values
- where clause with a constant cast as timestamp
- test for HoS
- implicit and explicit timestamp conversions in insert clause

Tested manually but no qtests:
- join between 3 tables all parquet but with different/no timezone property
- subselect in from/where clauses
- exists / union / no exists


Thanks,

Barna Zsombor Klara



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-05-02 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review173610
---




ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
Lines 333 (patched)


If getNextSplits() already sets the table property if it is not set, why 
are we doing it again here?



ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java
Lines 262 (patched)


I've seen this check a few times on the code. Shouldn't be good to create a 
static method that wraps this check? like 
ParquetHiveSerDe.isParquetTable(table)?



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
Line 72 (original), 72 (patched)


How does this work? I don't understand this change.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
Lines 72 (patched)


Can this new code be wrapped in another method?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
Line 181 (original), 181 (patched)


Is this compatible with old parquet tables? if the property is not set, 
then the validateTimeZonemight fail, right? If so, do we want to fail 
reading tables that do not have a property set?

Something else to consider, if a user sets a timezone improperly in a 
different tool or something  happened that we got an invalid timezone, then 
do we want to fail when reading those files? Just  wondering this scenario, 
no need to fix it right away.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java
Lines 32 (patched)


Could you write the information about parameters?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java
Lines 35 (patched)


Why is Map used instead of Map? Aren't all table 
properties key, value string pairs?

Also, the ensureTablePropertySet() name seems not related to what we want 
to do. I thought it was going to throw an exception if the property was not 
set, but it is setting the value on the JobConf. Should we use a different 
name, such as setParquetTimeZoneIfNotSet(),  setParquetTimeZoneIfAbsent() 
or something like that helps us understand quickly without looking at the 
javadoc.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java
Lines 168 (patched)


Shouldn't we throw an IllegalArgumentException here? The same for line 174?

I think it makes more sense to use the above exception when arguments are 
not valid.



ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
Line 24 (original), 24 (patched)


Keep the standard here. Let's import each class instead of all.


- Sergio Pena


On April 20, 2017, 2:11 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated April 20, 2017, 2:11 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 917e565f28b2c9aaea18033ea3b6b20fa41fcd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 004bb2f60299a0635b8f9ca7649ead00b8e16d08 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  

Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-04-20 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/
---

(Updated April 20, 2017, 2:11 p.m.)


Review request for hive, Sergio Pena and Zoltan Ivanfi.


Changes
---

Fixed failing tests.


Bugs: HIVE-16469
https://issues.apache.org/jira/browse/HIVE-16469


Repository: hive-git


Description
---

HIVE-16469: Parquet timestamp table property is not always taken into account


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
917e565f28b2c9aaea18033ea3b6b20fa41fcd0a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
004bb2f60299a0635b8f9ca7649ead00b8e16d08 
  ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 
f4fadbb61bf45f62945700284c0b050f0984b696 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
b339cc4347eea143dca2f6d98f9aaafdc427 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 
c81499a91c84af3ba33f335506c1c44e7085f13d 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java 
bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
  ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
6eadd1b0a3313cbba7a798890b802baae302749e 
  ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
PRE-CREATION 
  
ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
b9a3664458a83f1856e4bc59eba5d56665df61cc 
  ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/58501/diff/3/

Changes: https://reviews.apache.org/r/58501/diff/2-3/


Testing
---

Added qtests for the following cases:
- order by clause
- selfjoin
- calling UDFs with the timestamp values
- where clause with a constant cast as timestamp
- test for HoS
- implicit and explicit timestamp conversions in insert clause

Tested manually but no qtests:
- join between 3 tables all parquet but with different/no timezone property
- subselect in from/where clauses
- exists / union / no exists


Thanks,

Barna Zsombor Klara



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-04-19 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/
---

(Updated April 19, 2017, 3:37 p.m.)


Review request for hive, Sergio Pena and Zoltan Ivanfi.


Changes
---

Added another qtest and fixed typos.


Bugs: HIVE-16469
https://issues.apache.org/jira/browse/HIVE-16469


Repository: hive-git


Description
---

HIVE-16469: Parquet timestamp table property is not always taken into account


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
917e565f28b2c9aaea18033ea3b6b20fa41fcd0a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
004bb2f60299a0635b8f9ca7649ead00b8e16d08 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 
f4fadbb61bf45f62945700284c0b050f0984b696 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
b339cc4347eea143dca2f6d98f9aaafdc427 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
  ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
6eadd1b0a3313cbba7a798890b802baae302749e 
  ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
PRE-CREATION 
  
ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
b9a3664458a83f1856e4bc59eba5d56665df61cc 
  ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/58501/diff/2/

Changes: https://reviews.apache.org/r/58501/diff/1-2/


Testing (updated)
---

Added qtests for the following cases:
- order by clause
- selfjoin
- calling UDFs with the timestamp values
- where clause with a constant cast as timestamp
- test for HoS
- implicit and explicit timestamp conversions in insert clause

Tested manually but no qtests:
- join between 3 tables all parquet but with different/no timezone property
- subselect in from/where clauses
- exists / union / no exists


Thanks,

Barna Zsombor Klara



Re: Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-04-18 Thread Zoltan Ivanfi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review172193
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
Lines 86 (patched)


Nit: You should put a space before "because" as well, both here and 4 lines 
below.


- Zoltan Ivanfi


On April 18, 2017, 1:14 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> ---
> 
> (Updated April 18, 2017, 1:14 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
> https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 917e565f28b2c9aaea18033ea3b6b20fa41fcd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 004bb2f60299a0635b8f9ca7649ead00b8e16d08 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aaafdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/1/
> 
> 
> Testing
> ---
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Review Request 58501: HIVE-16469: Parquet timestamp table property is not always taken into account

2017-04-18 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/
---

Review request for hive and Sergio Pena.


Bugs: HIVE-16469
https://issues.apache.org/jira/browse/HIVE-16469


Repository: hive-git


Description
---

HIVE-16469: Parquet timestamp table property is not always taken into account


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
917e565f28b2c9aaea18033ea3b6b20fa41fcd0a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
004bb2f60299a0635b8f9ca7649ead00b8e16d08 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 
f4fadbb61bf45f62945700284c0b050f0984b696 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
b339cc4347eea143dca2f6d98f9aaafdc427 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
  ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
6eadd1b0a3313cbba7a798890b802baae302749e 
  ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
PRE-CREATION 
  
ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
b9a3664458a83f1856e4bc59eba5d56665df61cc 
  ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/58501/diff/1/


Testing
---

Added qtests for the following cases:
- order by clause
- selfjoin
- calling UDFs with the timestamp values
- where clause with a constant cast as timestamp
- test for HoS

Tested manually but no qtests:
- join between 3 tables all parquet but with different/no timezone property
- subselect in from/where clauses
- exists / union / no exists


Thanks,

Barna Zsombor Klara