Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
......................................................................

Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

PS8, Line 65: ProcessSplit() will issue the files' scan ranges
:   // and those ranges will need scanner threads, so no files are marked completed yet.
> hmm, is that stale now? i guess technically not since this now happens in G

Good catch. I think this is misleading. Changed to GetNextInternal()


http://gerrit.cloudera.org:8080/#/c/6527/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS8, Line 133: ProcessSplit
> what's the deal with making this non-pure? oh, I guess (most) scanners now

Happy to address this, but let's discuss approaches first. Options:
* add a new virtual function that is a no-op for all scanners except parquet where we do a runtime filter check
* move the runtime filter stats and related functions like HdfsParquetScanner::CheckFiltersEffectiveness() into HdfsScanner and just do the runtime filter check for all scanners even though they are useless for non-parquet
* other ideas?


--
To view, visit http://gerrit.cloudera.org:8080/6527

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
Gerrit-HasComments: Yes
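
For reference, the first option from the hdfs-scanner.h comment (a virtual hook that is a no-op everywhere except Parquet) could look roughly like the sketch below. This is not the Impala code: ScannerBase, SequenceLikeScanner, ParquetLikeScanner, BeforeScan() and filter_checks() are hypothetical names used only to illustrate the shape of the idea, and the "filter check" is reduced to a counter.

```cpp
// Hypothetical stand-in for a scanner base class; NOT Impala's HdfsScanner.
class ScannerBase {
 public:
  virtual ~ScannerBase() = default;

  // Shared, non-virtual driver: every scanner goes through the same steps,
  // so the base class no longer needs a pure-virtual ProcessSplit().
  void ProcessSplit() {
    BeforeScan();
    ScanRows();
  }

  // Number of runtime filter checks performed (illustration only).
  int filter_checks() const { return filter_checks_; }

 protected:
  // Option 1: a hook that does nothing by default.
  virtual void BeforeScan() {}

  // Format-specific scanning stays pure virtual.
  virtual void ScanRows() = 0;

  int filter_checks_ = 0;
};

// Hypothetical Avro/RC/Seq-style scanner: inherits the no-op hook.
class SequenceLikeScanner : public ScannerBase {
 protected:
  void ScanRows() override {}
};

// Hypothetical Parquet-style scanner: the only one overriding the hook.
class ParquetLikeScanner : public ScannerBase {
 protected:
  void BeforeScan() override { ++filter_checks_; }  // stands in for a runtime filter check
  void ScanRows() override {}
};
```

With this shape, calling ProcessSplit() on a ParquetLikeScanner runs the check once, while a SequenceLikeScanner skips it at the cost of one virtual call. The second option would instead move the counter and the check into ScannerBase::ProcessSplit() unconditionally, which removes the hook but runs a check that non-Parquet scanners cannot benefit from.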