----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65478/#review197313 -----------------------------------------------------------
Fix it, then Ship it! Overall the patch looks good to me. Some comments below. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java Line 65 (original), 69 (patched) <https://reviews.apache.org/r/65478/#comment277453> can we rename this member to something more easily understandable like dictionaryBasedReader? The name of variable suggests its Dictionary but in fact its a DataColumnReader ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java Lines 57 (patched) <https://reviews.apache.org/r/65478/#comment277459> nit, some of the methods are missing the @Override ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java Line 156 (original), 159 (patched) <https://reviews.apache.org/r/65478/#comment277456> why do we need to remove this check and return 1 or 0? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java Line 175 (original), 179 (patched) <https://reviews.apache.org/r/65478/#comment277461> this method has duplicate code, from VectorizedPrimitiveColumnReader#decodeDictionaryIds. Can we refactor it to remove duplicate code. I think its okay if we do it in a separate change. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java Lines 89 (patched) <https://reviews.apache.org/r/65478/#comment277462> previously, this reader didn't support INTERVAL_DAY_TIME. Does this mean that it supports this hive type? If yes, shouldn't this change in decodeDictionaryIds method too? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java Lines 253-254 (original), 255-256 (patched) <https://reviews.apache.org/r/65478/#comment277465> before the patch, the code assumes that the ColumnReader's type matches with what the hive's type is. So these two lines made sense. But after the patch, it is possible that the underlying parquet type does not have decimalMetadata so it these lines will throw a NPE. Instead of throwing NPE, we should check if decimalMetadata is set and throw IOException here saying that underlying type cannot be converted to decimal. What do you think? Also, may be we should add a TODO here to use default precision and scale if its not available for the future. - Vihang Karajgaonkar On Feb. 12, 2018, 8:25 a.m., cheng xu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65478/ > ----------------------------------------------------------- > > (Updated Feb. 12, 2018, 8:25 a.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > ------- > > VectorizedParquetReader throws an exception when trying to reading from a > parquet table on which new columns are added. > > > Diffs > ----- > > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java > 907a9b8 > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java > c36640d > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java > 08ac57b > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java > 39689f1 > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java > 9e414dc > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java > 3e5d831 > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java > 5d3ebd6 > > ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q > PRE-CREATION > > ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q > PRE-CREATION > ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out > PRE-CREATION > > > Diff: https://reviews.apache.org/r/65478/diff/4/ > > > Testing > ------- > > Newly added UT passed and qtest passed locally. > > > Thanks, > > cheng xu > >