Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13582 )

Change subject: IMPALA-8617: Add support for lz4 in parquet
......................................................................


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@97
PS6, Line 97: const string& column_name
style nit: take this by value, and then initialize the member variable as 
column_name_(std::move(column_name))


http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@400
PS6, Line 400:   const string& column_name_;
nit: it's dangerous (and anti-style-guide) to store a reference as a member 
variable like this -- whoever passed in this string might well mutate or 
destroy the string leaving a dangling reference here. Typically we lean towards 
using pointers for cases like this, and documenting clearly in the constructor 
that there's a lifetime requirement for a passed-in raw pointer. In this case, 
though, it's not a hot path at all so I think you should just make this a 
'string' vs adding lifetime dependencies


http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@773
PS6, Line 773: Parquet file '$0' column '$1' hit an error
nit: change this to something like "Error writing partquet file '$0' column 
'$1': $2" (so it's more obvious this is on the _write_ side rather than the 
read side


http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@903
PS6, Line 903:       return Status(Substitute("Parquet file '$0' column '$1' 
hit an error. $2",
same


http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@997
PS6, Line 997:   // Map parquet codecs to Impala codecs. Its important to do 
the above check before
             :   // we do any mapping.
             :   parquet::CompressionCodec::type parquet_codec = 
ConvertImpalaToParquetCodec(codec);
             :   codec = ConvertParquetToImpalaCodec(parquet_codec);
this code isn't very intuitive to me. What's going on here? We're mapping the 
impala codec to parquet, and then back to impala? Is this serving some kind of 
canonicalization? If so, can you add some comment explaining why we do this 
back-and-forth round-trip?


http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/exec/parquet/parquet-column-readers.cc@1346
PS6, Line 1346: Parquet file '$0' column '$1' hit an error
nit: think this reads better as "Error decompressing parquet file '$0' column 
'$1': $2"

Is the offset of the page readily available? If not, no need to plumb it 
through a bunch of code, but if we can get it easily, I think that would aid in 
debugging if we ever hit this error


http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/util/compress.cc
File be/src/util/compress.cc:

http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/util/compress.cc@349
PS6, Line 349:   int64_t length = -1;
nit: we tend towards the "early return" style instead of the "one return" 
style. You can get rid of this local variable and just change the assignments 
below on L355 and L360 to return statements, and then get rid of the 'else' on 
line 356 and 361, for a net reduction of several lines of code


http://gerrit.cloudera.org:8080/#/c/13582/6/be/src/util/compress.cc@383
PS6, Line 383:     CHECK(size > 0)
You can do:

CHECK_GT(size, 0) << "LZ4_compress_default() failed"

which will end up including the 'size' value in the failure.

That said, maybe we're better off with an if statement and return a bad Status 
here indicating a runtime error, since it seems you do have error checking at 
the call site, etc?



--
To view, visit http://gerrit.cloudera.org:8080/13582
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia6850a39ef3f1e0e7ba48e08eef1d4f7cbb74d0c
Gerrit-Change-Number: 13582
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 13 Jun 2019 17:36:52 +0000
Gerrit-HasComments: Yes

Reply via email to