Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 15: Code-Review-2

(1 comment)

I was reading through this again to think about rebasing on it and I think I 
spotted a serious bug. I'll -2 for now to prevent merging.

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
I think there's a bug here with nested collections in files with multiple row 
groups: some CollectionColumnReaders don't get Reset() for each row group. 
AFAIK we don't have any test files like this.

CollectionColumnReader::Reset() needs to be called on each 
CollectionColumnReader to reset some internal state, which is done via 
InitColumns(), but if the CollectionColumnReader (or an ancestor) is not in 
dict_filterable_columns_ or non_dict_filterable_columns_, then this doesn't 
happen.
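
To make the failure mode concrete, here is a minimal standalone sketch of what 
I think is going on. It is not the scanner code: FakeCollectionReader, 
InitColumnsSketch() and StaleRowsAtSecondRowGroup() are hypothetical names, a 
single counter stands in for the reader's internal state, and I'm assuming the 
per-row-group reset only reaches readers that appear in one of the two lists.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CollectionColumnReader. The counter models
// internal state that is only valid for the current row group.
struct FakeCollectionReader {
  int rows_consumed = 0;
  void ReadRowGroup(int rows) {
    assert(rows >= 0);
    rows_consumed += rows;
  }
  void Reset() { rows_consumed = 0; }
};

// Assumption: mirrors the described behaviour, where only readers reachable
// from the dict-filterable / non-dict-filterable lists get Reset().
void InitColumnsSketch(const std::vector<FakeCollectionReader*>& listed_readers) {
  for (FakeCollectionReader* reader : listed_readers) reader->Reset();
}

// Simulates a file with two row groups and returns the state the nested
// reader carries into the second row group (0 means it was reset).
int StaleRowsAtSecondRowGroup(bool nested_reader_is_listed) {
  FakeCollectionReader top;
  FakeCollectionReader nested;
  std::vector<FakeCollectionReader*> listed = {&top};
  if (nested_reader_is_listed) listed.push_back(&nested);

  // Row group 1: everything starts clean, so nothing looks wrong yet.
  InitColumnsSketch(listed);
  top.ReadRowGroup(100);
  nested.ReadRowGroup(100);

  // Row group 2: an unlisted reader keeps its state from row group 1.
  InitColumnsSketch(listed);
  return nested.rows_consumed;
}
```

With the nested reader listed, the function returns 0. With it unlisted, the 
state left over from the first row group survives into the second. A file with 
a single row group never hits the second InitColumnsSketch() call, which would 
be consistent with the existing test files not catching this.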



-- 
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 15
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Jan 2018 00:14:55 +0000
Gerrit-HasComments: Yes
