Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 11:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@644
PS8, Line 644:         // {min: 10, max: 20}, and query is 'select * from T 
where A = 8'.
> This is not the only case when it can happen. Another example is multi key
Thanks, yeah, so basically it can happen anytime when you have predicates 
against different columns, and the separate row ranges that pass the predicates 
don't have a common subset. I extended the comment.


http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@618
PS11, Line 618:     COUNTER_ADD(num_row_groups_counter_, 1);
> optional: This loop grew really large, > 100 lines, doesn't even fit to my
This change is already quite huge, I would rather not add unrelated 
refactorings.


http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/parquet-page-index.h
File be/src/exec/parquet/parquet-page-index.h:

http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/parquet-page-index.h@39
PS11, Line 39:   /// It reads the raw bytes of the whole page index and stores 
it in an
             :   /// internal buffer.
             :   /// It doesn't expect that the Page index in a particular 
layout, it only
             :   /// expects that the whole page index layed out continuously 
in the file.
             :   /// It needs to be called before the serialization methods.
             :   Status ReadAll();
> Can you mention that this reads the page index for all row groups? For me t
Changed this function to only read the indexes belonging to a single row group. 
Updated the comment.


http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/parquet-page-index.h@61
PS11, Line 61:   /// Common helper for deserialization
> nit: other member comments have . at the end
Done


http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/parquet-page-index.h@65
PS11, Line 65:   /// The scanner that created this object
> nit: other member comments have . at the end
Done


http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/parquet-page-index.cc
File be/src/exec/parquet/parquet-page-index.cc:

http://gerrit.cloudera.org:8080/#/c/12065/11/be/src/exec/parquet/parquet-page-index.cc@81
PS11, Line 81: page_index_buffer_.TryAllocate(size)
> Sorry for the wrong assumptions, I got lost somewhere in HdfsParquetScanner
Made this change, now we only read the indexes of a single row group, and after 
evaluating it we release the allocated buffer.



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Apr 2019 12:50:07 +0000
Gerrit-HasComments: Yes

Reply via email to