Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

Vihang Karajgaonkar via Review Board Mon, 12 Feb 2018 16:50:53 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review197313
-----------------------------------------------------------



Fix it, then Ship it!




Overall the patch looks good to me. Some comments below.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
Line 65 (original), 69 (patched)
<https://reviews.apache.org/r/65478/#comment277453>

    can we rename this member to something more easily understandable like 
dictionaryBasedReader? The name of variable suggests its Dictionary but in fact 
its a DataColumnReader



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
Lines 57 (patched)
<https://reviews.apache.org/r/65478/#comment277459>

    nit, some of the methods are missing the @Override



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
Line 156 (original), 159 (patched)
<https://reviews.apache.org/r/65478/#comment277456>

    why do we need to remove this check and return 1 or 0?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
Line 175 (original), 179 (patched)
<https://reviews.apache.org/r/65478/#comment277461>

    this method has duplicate code, from 
VectorizedPrimitiveColumnReader#decodeDictionaryIds. Can we refactor it to 
remove duplicate code. I think its okay if we do it in a separate change.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
Lines 89 (patched)
<https://reviews.apache.org/r/65478/#comment277462>

    previously, this reader didn't support INTERVAL_DAY_TIME. Does this mean 
that it supports this hive type? If yes, shouldn't this change in 
decodeDictionaryIds method too?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
Lines 253-254 (original), 255-256 (patched)
<https://reviews.apache.org/r/65478/#comment277465>

    before the patch, the code assumes that the ColumnReader's type matches 
with what the hive's type is. So these two lines made sense. But after the 
patch, it is possible that the underlying parquet type does not have 
decimalMetadata so it these lines will throw a NPE. Instead of throwing NPE, we 
should check if decimalMetadata is set and throw IOException here saying that 
underlying type cannot be converted to decimal. What do you think?
    
    Also, may be we should add a TODO here to use default precision and scale 
if its not available for the future.


- Vihang Karajgaonkar


On Feb. 12, 2018, 8:25 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2018, 8:25 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
>  c36640d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
>  39689f1 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
>  3e5d831 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/4/
> 
> 
> Testing
> -------
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>

Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

Reply via email to