Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13396 )

Change subject: IMPALA-8450: Add support for zstd and lz4 in parquet
......................................................................


Patch Set 6:

(9 comments)

I generally agree with Csaba's comments. I had a few additional ones.

http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/exec/parquet/parquet-common.cc
File be/src/exec/parquet/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/exec/parquet/parquet-common.cc@28
PS6, Line 28:   THdfsCompression::SNAPPY, // BROTLI
Is this just a placeholder? Confusing since they're not equivalent. Can you add 
a comment to that effect, or, probably better, just add it to the enum.


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/exec/parquet/parquet-common.cc@41
PS6, Line 41:   parquet::CompressionCodec::SNAPPY, // BZIP2
Can you leave a comment that it's just a placeholder since BZIP2 isn't a valid 
parquet codec.


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/exec/parquet/parquet-common.cc@46
PS6, Line 46:   parquet::CompressionCodec::SNAPPY, // ZLIB
ZLIB <=> DEFLATE actually (not sure why we ended up with the same names).


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/service/query-options.cc@759
PS6, Line 759:           return Status(Substitute("Invalid ZSTD compression 
level '$0'. Valid values"
I do wonder if we'd be better off including the compression level in 
COMPRESSION_CODEC, e.g. "zstd:3", simply so that the validation is cleaner. 
This will get confusing if we get into a state where different ranges are valid 
for different codecs - we would have to do some of the validation later.

In some ways it's also nice to avoid adding another query option.

TQueryOptions is internal-only, so it would be safe to replace the current 
compression_codec field with a struct for the result of parsing.


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/codec.h
File be/src/util/codec.h:

http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/codec.h@63
PS6, Line 63:    private:
            :     friend class Codec;
            :     friend class HdfsParquetTableWriter;
            :     friend void TestCompression(int, int, int, 
THdfsCompression::type);
> I would prefer to make the members public. If you want to keep private memb
+1


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/compress.cc
File be/src/util/compress.cc:

http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/compress.cc@34
PS6, Line 34: #include "util/debug-util.h"
Please put this with the block on l30-31

common/names.h is a "special" header that imports things into the namespace 
based on what system headers were included. So it needs to go after all the 
previous includes.


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/compress.cc@338
PS6, Line 338:     return Status("ZSTD_compress failed with: "
Please use Substitute() instead of string concatenation. That should handle 
conversion of parameters for you.

Actually, it would be better to add an error code and template to 
common/thrift/generate_error_codes.py. That's a best practice that we've been 
pretty bad about following in the code.


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/decompress-test.cc
File be/src/util/decompress-test.cc:

http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/decompress-test.cc@483
PS6, Line 483:   const int clevel = (rand() % ZSTD_maxCLevel()) + 1;
We had been trying to avoid using rand() in tests because it makes them 
non-reproducible. In this case maybe not a big deal since you're logging the 
generated number.

In other test we've used RandTestUtil::SeedRng() and a test-local RNG for this 
- e.g. see buffer-pool-test.cc. I think what you have here is fine in practice, 
but maybe use a test-local RNG like std::mt19937 and 
std::uniform_int_distribution to avoid the prohibition on rand().


http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/decompress.cc
File be/src/util/decompress.cc:

http://gerrit.cloudera.org:8080/#/c/13396/6/be/src/util/decompress.cc@35
PS6, Line 35: #include "util/debug-util.h"
Move to the previous block of includes.



--
To view, visit http://gerrit.cloudera.org:8080/13396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98c6dcf3d0a873380e4fa4cf03eb7e924e4ee768
Gerrit-Change-Number: 13396
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 May 2019 23:02:24 +0000
Gerrit-HasComments: Yes

Reply via email to