Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 15: Code-Review-2

(1 comment)

I was reading through this again to think about rebasing on it and I think I spotted a serious bug. I'll -2 for now to prevent merging.

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
I think there's a bug here with nested collections in files with multiple row groups, where the CollectionColumnReaders don't get Reset() for each row group. AFAIK we don't have any test files like this.

CollectionColumnReader::Reset() needs to be called on each CollectionColumnReader to reset some internal state. That is done via InitColumns(), but if the CollectionColumnReader (or an ancestor) is not in dict_filterable_columns_ or non_dict_filterable_columns_, then this doesn't happen.


--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 15
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Jan 2018 00:14:55 +0000
Gerrit-HasComments: Yes
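
Illustration of the failure mode described in the review comment: the sketch below is a toy model, not Impala code. ToyCollectionReader, ToyScanner, CountStaleStarts, and the index-based column lists are all hypothetical names invented for this example. It assumes only what the comment states: per-row-group state is cleared by Reset(), Reset() is reached through an InitColumns()-style call on a column list, and a reader that appears in neither list is never reset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a collection column reader. This is NOT Impala's
// CollectionColumnReader; it only models "state that Reset() must clear".
struct ToyCollectionReader {
  int pos_in_row_group = 0;  // per-row-group state

  void Reset() { pos_in_row_group = 0; }
  void ReadValues(int n) { pos_in_row_group += n; }
};

// Hypothetical stand-in for the scanner. Readers are referenced by index.
struct ToyScanner {
  std::vector<ToyCollectionReader> readers;
  std::vector<std::size_t> dict_filterable;      // assumed subset of readers
  std::vector<std::size_t> non_dict_filterable;  // assumed subset of readers

  // Resets only the readers named in `subset`, mirroring the idea that
  // Reset() is reached via an InitColumns()-style call on a column list.
  void InitColumns(const std::vector<std::size_t>& subset) {
    for (std::size_t idx : subset) {
      assert(idx < readers.size());
      readers[idx].Reset();
    }
  }
};

// Simulates scanning `num_row_groups` row groups and returns how many times a
// reader started a row group with leftover state from the previous one.
int CountStaleStarts(bool reset_all_readers, int num_row_groups) {
  ToyScanner scanner;
  scanner.readers.resize(3);
  scanner.dict_filterable = {0};
  scanner.non_dict_filterable = {1};
  // Reader 2 is deliberately in neither list.
  const std::vector<std::size_t> all = {0, 1, 2};

  int stale_starts = 0;
  for (int rg = 0; rg < num_row_groups; ++rg) {
    if (reset_all_readers) {
      scanner.InitColumns(all);
    } else {
      scanner.InitColumns(scanner.dict_filterable);
      scanner.InitColumns(scanner.non_dict_filterable);
    }
    for (ToyCollectionReader& reader : scanner.readers) {
      if (reader.pos_in_row_group != 0) ++stale_starts;
      reader.ReadValues(10);
    }
  }
  return stale_starts;
}
```

In this toy model a single row group never shows the problem, while a second row group does, which is consistent with the observation that a multi-row-group test file with nested collections would be needed to catch it. Whether the real fix should reset every reader or add the missing readers to one of the lists is not something this sketch can answer.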