----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/58501/#review173610 -----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java Lines 333 (patched) <https://reviews.apache.org/r/58501/#comment246597> If getNextSplits() already sets the table property if it is not set, why are we doing it again here? ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java Lines 262 (patched) <https://reviews.apache.org/r/58501/#comment246598> I've seen this check a few times on the code. Shouldn't be good to create a static method that wraps this check? like ParquetHiveSerDe.isParquetTable(table)? ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java Line 72 (original), 72 (patched) <https://reviews.apache.org/r/58501/#comment246603> How does this work? I don't understand this change. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java Lines 72 (patched) <https://reviews.apache.org/r/58501/#comment246601> Can this new code be wrapped in another method? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java Line 181 (original), 181 (patched) <https://reviews.apache.org/r/58501/#comment246600> Is this compatible with old parquet tables? if the property is not set, then the validateTimeZone might fail, right? If so, do we want to fail reading tables that do not have a property set? Something else to consider, if a user sets a timezone improperly in a different tool or something happened that we got an invalid timezone, then do we want to fail when reading those files? Just wondering this scenario, no need to fix it right away. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java Lines 32 (patched) <https://reviews.apache.org/r/58501/#comment246595> Could you write the information about parameters? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java Lines 35 (patched) <https://reviews.apache.org/r/58501/#comment246596> Why is Map<?, ?> used instead of Map<String, String>? Aren't all table properties key, value string pairs? Also, the ensureTablePropertySet() name seems not related to what we want to do. I thought it was going to throw an exception if the property was not set, but it is setting the value on the JobConf. Should we use a different name, such as setParquetTimeZoneIfNotSet(), setParquetTimeZoneIfAbsent() or something like that helps us understand quickly without looking at the javadoc. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java Lines 168 (patched) <https://reviews.apache.org/r/58501/#comment246594> Shouldn't we throw an IllegalArgumentException here? The same for line 174? I think it makes more sense to use the above exception when arguments are not valid. ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java Line 24 (original), 24 (patched) <https://reviews.apache.org/r/58501/#comment246599> Keep the standard here. Let's import each class instead of all. - Sergio Pena On April 20, 2017, 2:11 p.m., Barna Zsombor Klara wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/58501/ > ----------------------------------------------------------- > > (Updated April 20, 2017, 2:11 p.m.) > > > Review request for hive, Sergio Pena and Zoltan Ivanfi. > > > Bugs: HIVE-16469 > https://issues.apache.org/jira/browse/HIVE-16469 > > > Repository: hive-git > > > Description > ------- > > HIVE-16469: Parquet timestamp table property is not always taken into account > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java > 917e565f28b2c9aaea18033ea3b6b20fa41fcd0a > ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java > 004bb2f60299a0635b8f9ca7649ead00b8e16d08 > ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java > 9c3a664b9aea2d6e050ffe2d7626127827dbc52a > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java > 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java > f4fadbb61bf45f62945700284c0b050f0984b696 > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java > 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java > b339cc4347eea143dca2f6d98f9aa7777afdc427 > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java > dbd6fb3d0bc8c753abf86e99b52377617f248b5a > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java > c81499a91c84af3ba33f335506c1c44e7085f13d > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java > bf363f32a3ac0a4d790e2925d802c6e210adfb4b > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java > f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f > ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q > PRE-CREATION > ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q > PRE-CREATION > ql/src/test/queries/clientpositive/parquet_int96_timestamp.q > 6eadd1b0a3313cbba7a798890b802baae302749e > > ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out > PRE-CREATION > > ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out > PRE-CREATION > ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out > b9a3664458a83f1856e4bc59eba5d56665df61cc > ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out > PRE-CREATION > > > Diff: https://reviews.apache.org/r/58501/diff/3/ > > > Testing > ------- > > Added qtests for the following cases: > - order by clause > - selfjoin > - calling UDFs with the timestamp values > - where clause with a constant cast as timestamp > - test for HoS > - implicit and explicit timestamp conversions in insert clause > > Tested manually but no qtests: > - join between 3 tables all parquet but with different/no timezone property > - subselect in from/where clauses > - exists / union / no exists > > > Thanks, > > Barna Zsombor Klara > >