-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58501/#review173610
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
Lines 333 (patched)
<https://reviews.apache.org/r/58501/#comment246597>

    If getNextSplits() already sets the table property if it is not set, why 
are we doing it again here?



ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java
Lines 262 (patched)
<https://reviews.apache.org/r/58501/#comment246598>

    I've seen this check a few times on the code. Shouldn't be good to create a 
static method that wraps this check? like 
ParquetHiveSerDe.isParquetTable(table)?



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
Line 72 (original), 72 (patched)
<https://reviews.apache.org/r/58501/#comment246603>

    How does this work? I don't understand this change.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
Lines 72 (patched)
<https://reviews.apache.org/r/58501/#comment246601>

    Can this new code be wrapped in another method?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
Line 181 (original), 181 (patched)
<https://reviews.apache.org/r/58501/#comment246600>

    Is this compatible with old parquet tables? if the property is not set, 
then the validateTimeZone    might fail, right? If so, do we want to fail 
reading tables that do not have a property set?
    
    Something else to consider, if a user sets a timezone improperly in a 
different tool or something      happened that we got an invalid timezone, then 
do we want to fail when reading those files? Just      wondering this scenario, 
no need to fix it right away.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java
Lines 32 (patched)
<https://reviews.apache.org/r/58501/#comment246595>

    Could you write the information about parameters?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java
Lines 35 (patched)
<https://reviews.apache.org/r/58501/#comment246596>

    Why is Map<?, ?> used instead of Map<String, String>? Aren't all table 
properties key, value string pairs?
    
    Also, the ensureTablePropertySet() name seems not related to what we want 
to do. I thought it was going to throw an exception if the property was not 
set, but it is setting the value on the JobConf. Should we use a different 
name, such as setParquetTimeZoneIfNotSet(),      setParquetTimeZoneIfAbsent() 
or something like that helps us understand quickly without looking at the 
javadoc.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java
Lines 168 (patched)
<https://reviews.apache.org/r/58501/#comment246594>

    Shouldn't we throw an IllegalArgumentException here? The same for line 174?
    
    I think it makes more sense to use the above exception when arguments are 
not valid.



ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
Line 24 (original), 24 (patched)
<https://reviews.apache.org/r/58501/#comment246599>

    Keep the standard here. Let's import each class instead of all.


- Sergio Pena


On April 20, 2017, 2:11 p.m., Barna Zsombor Klara wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58501/
> -----------------------------------------------------------
> 
> (Updated April 20, 2017, 2:11 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Zoltan Ivanfi.
> 
> 
> Bugs: HIVE-16469
>     https://issues.apache.org/jira/browse/HIVE-16469
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16469: Parquet timestamp table property is not always taken into account
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 917e565f28b2c9aaea18033ea3b6b20fa41fcd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 
> 004bb2f60299a0635b8f9ca7649ead00b8e16d08 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/StatsNoJobTask.java 
> 9c3a664b9aea2d6e050ffe2d7626127827dbc52a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
> 1bd4db7805689ae1f91921ffbb5ff7da59f4bf60 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
>  f4fadbb61bf45f62945700284c0b050f0984b696 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 
> 2954601ce5bb25905cdb29ca0ca4551c2ca12b95 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java 
> b339cc4347eea143dca2f6d98f9aa7777afdc427 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
> dbd6fb3d0bc8c753abf86e99b52377617f248b5a 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java
>  c81499a91c84af3ba33f335506c1c44e7085f13d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRowGroupFilter.java
>  bf363f32a3ac0a4d790e2925d802c6e210adfb4b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  f2d79cf9d215e9a6e2a5e88cfc78378be860fd1f 
>   ql/src/test/queries/clientnegative/parquet_int96_alter_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/parquet_int96_create_invalid_timezone.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q 
> 6eadd1b0a3313cbba7a798890b802baae302749e 
>   
> ql/src/test/results/clientnegative/parquet_int96_alter_invalid_timezone.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/parquet_int96_create_invalid_timezone.q.out
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out 
> b9a3664458a83f1856e4bc59eba5d56665df61cc 
>   ql/src/test/results/clientpositive/spark/parquet_int96_timestamp.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58501/diff/3/
> 
> 
> Testing
> -------
> 
> Added qtests for the following cases:
> - order by clause
> - selfjoin
> - calling UDFs with the timestamp values
> - where clause with a constant cast as timestamp
> - test for HoS
> - implicit and explicit timestamp conversions in insert clause
> 
> Tested manually but no qtests:
> - join between 3 tables all parquet but with different/no timezone property
> - subselect in from/where clauses
> - exists / union / no exists
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>

Reply via email to