[kudu-CR] webserver: improve SSL certificate handling
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5015 to look at the new patch set (#3). Change subject: webserver: improve SSL certificate handling .. webserver: improve SSL certificate handling * Allows users to specify a separate location for PEM private-key file using --webserver_private_key_file (previously required private-key and cert. to be in same file). * Allows users to specify a shell command to run to get the password for the webserver's private-key file using --webserver_private_key_password_cmd The separate configuration of cert and key is apparently more commonly used, so this simplifies deployment. The use of a script to provide the password for the private key makes it easier to integrate with external credential providers. In order to test this, I reconfigured our thirdparty curl build to include HTTPS support. This mirrors the change made in Impala by IMPALA-2051 (8bbe45393c7). Change-Id: I4b508cebbe6f31556e6d5a5fba5e5e9fb44cf1b9 --- M src/kudu/rpc/rpc-test-base.h M src/kudu/security/CMakeLists.txt A src/kudu/security/test/test_certs.cc A src/kudu/security/test/test_certs.h M src/kudu/server/CMakeLists.txt M src/kudu/server/webserver-test.cc M src/kudu/server/webserver.cc M src/kudu/server/webserver_options.cc M src/kudu/server/webserver_options.h M src/kudu/util/curl_util.cc M src/kudu/util/curl_util.h M thirdparty/build-definitions.sh 12 files changed, 292 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/5015/3 -- To view, visit http://gerrit.cloudera.org:8080/5015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4b508cebbe6f31556e6d5a5fba5e5e9fb44cf1b9 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] webserver: improve SSL certificate handling
Todd Lipcon has posted comments on this change. Change subject: webserver: improve SSL certificate handling .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/5015/2/src/kudu/security/test/test_certs.h File src/kudu/security/test/test_certs.h: Line 28: Status CreateTestSSLCerts(const std::string& dir, > Could you use instead or replace usage of CreateSSLServerCert/CreateSSLPriv This patch adds a TODO there -- the issue is that these certs use a password, but the RPC TLS implementation doesn't yet support using a password. I'll work on that next. http://gerrit.cloudera.org:8080/#/c/5015/2/src/kudu/server/webserver.cc File src/kudu/server/webserver.cc: Line 158: Status s = Subprocess::Call(argv, "" /* stdin */, _password, ); > Adar and I traced through the other day and found that we never shelled out Yea, given it's close to startup I'd say it's safe. It's also the accepted way for people to tie in key/password management stuff in all of the other Hadoop ecosystem components, so we don't have much of a choice. (given this is close to startup I don't think it's worth preforking a separate "forker" process and communicating by a pipe or anything) Line 166: } else { > Is this else attached to the correct if? It would make more sense to me on Done PS2, Line 197: SimpleItoa > consider using std::to_string Done http://gerrit.cloudera.org:8080/#/c/5015/2/src/kudu/util/curl_util.cc File src/kudu/util/curl_util.cc: Line 72: RETURN_NOT_OK(TranslateError(curl_easy_setopt(curl_, CURLOPT_SSL_VERIFYPEER, 0))); > This might be simpler without the if block, and setting the value to veify_ Done -- To view, visit http://gerrit.cloudera.org:8080/5015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4b508cebbe6f31556e6d5a5fba5e5e9fb44cf1b9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: replace gscoped ptr with unique ptr
Todd Lipcon has abandoned this change. Change subject: WIP: replace gscoped_ptr with unique_ptr .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/2248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I21b809da9f9b87ce8fcb1412e45b4336f413a020 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. KUDU-1600 (part 1): bump to CFile version 2 This is preparatory yak-shaving work for KUDU-1600, an effort to prevent double-compression of inherently-compressed encodings such as bitshuffle. In order to enable or disable compression on a per-block basis, we need to change the header of the compressed blocks to indicate whether that block was compressed. Doing so, however, is obviously an incompatible change that earlier versions of Kudu would be confused by. This is not itself a blocker: we haven't guaranteed yet that we will always support downgrade between successive versions of Kudu. However, if someone _does_ downgrade, we would prefer not to crash in confusing ways or silently return corrupt data. Instead, we would like to give an error message that clearly traces back to an incompatible file format. At first glance, it would seem that the route to make such an incompatible change would be to bump the 'major_version' field present in the CFileHeaderPB. However, it turns out, despite this field existing since the very first ever commit to Kudu, we never implemented any code that actually _reads_ the field. So, bumping the major version would not actually prevent an incompatible (earlier) Kudu server from reading a file that it can't understand. Woops! So, in order to make the new CFiles incompatible with the old, the only solution is to actually bump the CFile magic string itself. This commit changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying to read such a file will now result in a "bad magic" error message. It's still not 100% obvious to the average user, but it's better than a weird error about mismatched compression lengths or a crash. While bumping the magic string solves the immediate issue at hand, this is also a good opportunity to fix the structural problem and add a better mechanism to make such changes in the future. Rather than implement the original plan of major/minor version, this patch takes a "feature flags" approach modeled after the 'ext' family of file systems. The advantages of feature flags over a linear 'version number' scheme are: - simplifies non-linear backporting of features to minor releases. For example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this makes it possible to backport Y without X to Kudu 1.8. - allows granular enablement of new features. For example, imagine that we introduce an experimental feature (eg an optimization enabled by a flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that flag. We would then avoid writing the flag into any cfiles, and we would allow compatible downgrade back to 1.2. - allows distinguishing behavior between incompatible and compatible changes I manually validated that, if I wrote data with a new server, and tried to read it with an old one, it gave me an error with "bad magic" as expected. Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Reviewed-on: http://gerrit.cloudera.org:8080/5678 Reviewed-by: Alexey SerbinTested-by: Kudu Jenkins --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile.proto M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tools/kudu-tool-test.cc 7 files changed, 81 insertions(+), 24 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] tablet: make time-based flushing configurable
Todd Lipcon has uploaded a new patch set (#2). Change subject: tablet: make time-based flushing configurable .. tablet: make time-based flushing configurable We currently prioritize flushes for any in-memory data which has been sitting longer than 2 minutes. This patch changes the time threshold to be a new experimental flag rather than a hard-coded constant. I found this useful to tweak when I wanted to run some experiments regarding the on-disk size of data. With this flag, I was able to set the threshold low to ensure that all of my inserted data was quickly flushed to disk rather than sitting around in MRS where I couldn't measure it. Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87 --- M src/kudu/tablet/tablet_peer_mm_ops.cc 1 file changed, 7 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5690/2 -- To view, visit http://gerrit.cloudera.org:8080/5690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5679 to look at the new patch set (#4). Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. KUDU-1600 (part 2): store uncompressed blocks when codec can't compress This changes the compressed block header for CFile v2 so that it only includes the uncompressed block length, and not a redundant copy of the compressed block length (which is already known by the length of the block itself). In addition, this enables a special behavior if the stored uncompressed block length is exactly equal to the length of the compressed data. This signifies that the data has not been compressed and that the reader should not execute the configured codec. On the write side, we always execute the codec, and in the case that the resulting compression ratio is not good enough, we just store the uncompressed data as above. By default, the compression ratio is 0.9x (i.e at least 10% space reduction). It can be configured by an experimental flag. Initially, the JIRA had described doing this based on a policy rather than based on the actual output of the codec. However, during development I realized that, even with encodings like BIT_SHUFFLE, there are actually some patterns that can benefit from a second pass of LZ4. And even if the second pass is not effective, it isn't too expensive to try a second pass on the write side anyway. See [1] for a discussion of a dataset that benefits from multiple passes of compression. Along the way, made a few changes to how the compressed block decoder is implemented: * Use strings::Substitute instead of StringPrintf (easier to read) * Cleaned up unnecessary header includes * Made the CompressedBlockDecoder a short-lifetime stack object rather than an allocated member of CFileReader. This avoids an allocation and indirection, and additionally allows us to put more state inside the object itself without worrying about thread safety, etc. * The CompressedBlockDecoder is now stateful, making for an easier-to-use API. [1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f --- M src/kudu/cfile/block_compression.cc M src/kudu/cfile/block_compression.h M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/compression-test.cc 6 files changed, 216 insertions(+), 115 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/4 -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] tablet: make time-based flushing configurable
Todd Lipcon has posted comments on this change. Change subject: tablet: make time-based flushing configurable .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5690/1/src/kudu/tablet/tablet_peer_mm_ops.cc File src/kudu/tablet/tablet_peer_mm_ops.cc: Line 38: DEFINE_int32(flush_mrs_time_threshold_secs, 2 * 60, > Would be nice if the name was more consistent with flush_threshold_mb. So m Done -- To view, visit http://gerrit.cloudera.org:8080/5690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Todd Lipcon has posted comments on this change. Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/5679/3/src/kudu/cfile/block_compression.cc File src/kudu/cfile/block_compression.cc: Line 152: // In CFile v2, we ensure that compressed size <= uncompressed size. > I found this statement a little confusing w.r.t. the actual behavior in Com Done -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tpch: improve encodings and compression
Todd Lipcon has posted comments on this change. Change subject: tpch: improve encodings and compression .. Patch Set 1: Nah, I'd rather keep the old perf data because it's useful to see whether it goes up or down after this patch :) -- To view, visit http://gerrit.cloudera.org:8080/5689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Alexey Serbin has posted comments on this change. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS1, Line 80: 2_t len = > I don't think it's really problematic -- it'll just feed the empty block to That sounds good to me -- thank you for the clarification. http://gerrit.cloudera.org:8080/#/c/5678/2/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS2, Line 71: const char kMagicStringV1[] = "kuducfil"; : const char kMagicStringV2[] = "kuducfl2"; : const int kMagicLength = 8; > meh... I think it's easy enough to look at them and see that they're the sa I was elaborating on the compile-time check that Adar mentioned in PS1 comment. :) If there is decent run-time test coverage, then it's safe, of course. -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tpch: improve encodings and compression
Jean-Daniel Cryans has posted comments on this change. Change subject: tpch: improve encodings and compression .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] tpch: dont clear data when loading is disabled
Jean-Daniel Cryans has posted comments on this change. Change subject: tpch: dont clear data when loading is disabled .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5691 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] tpch: dont clear data when loading is disabled
Adar Dembo has posted comments on this change. Change subject: tpch: dont clear data when loading is disabled .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5691 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] tpch: improve encodings and compression
Adar Dembo has posted comments on this change. Change subject: tpch: improve encodings and compression .. Patch Set 1: Code-Review+2 No point in deleting the old performance data, right? Since scan performance is likely the same. -- To view, visit http://gerrit.cloudera.org:8080/5689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] tablet: make time-based flushing configurable
Adar Dembo has posted comments on this change. Change subject: tablet: make time-based flushing configurable .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5690/1/src/kudu/tablet/tablet_peer_mm_ops.cc File src/kudu/tablet/tablet_peer_mm_ops.cc: Line 38: DEFINE_int32(flush_mrs_time_threshold_secs, 2 * 60, Would be nice if the name was more consistent with flush_threshold_mb. So maybe make this flush_threshold_secs? Or maybe make the other one flush_mrs_threshold_mb and this one flush_mrs_threshold_secs? -- To view, visit http://gerrit.cloudera.org:8080/5690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Adar Dembo has posted comments on this change. Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5679/3/src/kudu/cfile/block_compression.cc File src/kudu/cfile/block_compression.cc: Line 152: // In CFile v2, we ensure that compressed size <= uncompressed size. I found this statement a little confusing w.r.t. the actual behavior in CompressedBlockBuilder::Compress() (the part that elides compression if compressed_size == uncompressed_size). Perhaps: "// CFile V2 guarantees that compressed size <= uncompressed size (though, as per the file format, if the two are equal the file data was not compressed)" ? -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1751: Change default encodings
Dan Burkert has posted comments on this change. Change subject: KUDU-1751: Change default encodings .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Doc review for 1.2
Dan Burkert has posted comments on this change. Change subject: Doc review for 1.2 .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5676 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tpch: use better encodings, compression in schema
Todd Lipcon has abandoned this change. Change subject: tpch: use better encodings, compression in schema .. Abandoned superceded by https://gerrit.cloudera.org/#/c/5689/1 -- To view, visit http://gerrit.cloudera.org:8080/952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Iccdf743dc5e2f311274d3f5450beee508c2a9524 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Internal Jenkins
[kudu-CR] Doc review for 1.2
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5676 to look at the new patch set (#2). Change subject: Doc review for 1.2 .. Doc review for 1.2 I made a pass through all of the documentation and made improvements as necessary for 1.2: * Specify that the start/stop instructions are deb/rpm specific * Updated the contribution C++ style guide to stop talking about gscoped_ptr * Removed limitation that Spark is only for Spark 1 (now we support Spark 2) * Fixed some branding in index.adoc * Some small tweaks and improvements to the glossary of concepts The larger changes are in the Impala integration. I tried to update the syntax and examples to match the Impala-Kudu release from 11/22/2016 (the most recent release at the time of this writing and at the expected release date for Kudu 1.2). Unfortunately, the syntax and capabilities have changed again between that release and the upcoming Impala 2.8 release. Rather than document the unreleased syntax, I went with the soon-to-be-removed syntax. Similarly, I left in all of the instructions about how to install the Impala-Kudu fork, even though that fork has now been merged back into the main Impala release train. When Impala 2.8 is available, we will have to do another pass here and re-publish docs. Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb --- M docs/administration.adoc M docs/contributing.adoc M docs/developing.adoc M docs/index.adoc M docs/kudu_impala_integration.adoc 5 files changed, 90 insertions(+), 190 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5676/2 -- To view, visit http://gerrit.cloudera.org:8080/5676 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-1751: change default BINARY encoding to DICTIONARY
Todd Lipcon has abandoned this change. Change subject: WIP: KUDU-1751: change default BINARY encoding to DICTIONARY .. Abandoned Rolled into https://gerrit.cloudera.org/#/c/5169/ -- To view, visit http://gerrit.cloudera.org:8080/5170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ie2776ed91a8d5e3ccf3ee024e338f9577e50e522 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] Doc review for 1.2
Todd Lipcon has posted comments on this change. Change subject: Doc review for 1.2 .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5676/1/docs/kudu_impala_integration.adoc File docs/kudu_impala_integration.adoc: PS1, Line 448: DISTRIBUTE BY RANGE (_id) > I believe that's a misconception double checked, I can do: [vd0342.halxg.cloudera.com:21000] > create table todd_test (a int primary key) partition by range(a) ( partition values < 100, partition 100 <= values) stored as kudu; PS1, Line 482: '\<', '\>', `>=`, or `IN` > oh, yea, BETWEEN should get pushed Done PS1, Line 941: 8 > ah, didn't know that. Done -- To view, visit http://gerrit.cloudera.org:8080/5676 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Hello Jean-Daniel Cryans, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5678 to look at the new patch set (#3). Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. KUDU-1600 (part 1): bump to CFile version 2 This is preparatory yak-shaving work for KUDU-1600, an effort to prevent double-compression of inherently-compressed encodings such as bitshuffle. In order to enable or disable compression on a per-block basis, we need to change the header of the compressed blocks to indicate whether that block was compressed. Doing so, however, is obviously an incompatible change that earlier versions of Kudu would be confused by. This is not itself a blocker: we haven't guaranteed yet that we will always support downgrade between successive versions of Kudu. However, if someone _does_ downgrade, we would prefer not to crash in confusing ways or silently return corrupt data. Instead, we would like to give an error message that clearly traces back to an incompatible file format. At first glance, it would seem that the route to make such an incompatible change would be to bump the 'major_version' field present in the CFileHeaderPB. However, it turns out, despite this field existing since the very first ever commit to Kudu, we never implemented any code that actually _reads_ the field. So, bumping the major version would not actually prevent an incompatible (earlier) Kudu server from reading a file that it can't understand. Woops! So, in order to make the new CFiles incompatible with the old, the only solution is to actually bump the CFile magic string itself. This commit changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying to read such a file will now result in a "bad magic" error message. It's still not 100% obvious to the average user, but it's better than a weird error about mismatched compression lengths or a crash. While bumping the magic string solves the immediate issue at hand, this is also a good opportunity to fix the structural problem and add a better mechanism to make such changes in the future. Rather than implement the original plan of major/minor version, this patch takes a "feature flags" approach modeled after the 'ext' family of file systems. The advantages of feature flags over a linear 'version number' scheme are: - simplifies non-linear backporting of features to minor releases. For example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this makes it possible to backport Y without X to Kudu 1.8. - allows granular enablement of new features. For example, imagine that we introduce an experimental feature (eg an optimization enabled by a flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that flag. We would then avoid writing the flag into any cfiles, and we would allow compatible downgrade back to 1.2. - allows distinguishing behavior between incompatible and compatible changes I manually validated that, if I wrote data with a new server, and tried to read it with an old one, it gave me an error with "bad magic" as expected. Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile.proto M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tools/kudu-tool-test.cc 7 files changed, 81 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/5678/3 -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] tablet: make time-based flushing configurable
Hello Jean-Daniel Cryans, Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/5690 to review the following change. Change subject: tablet: make time-based flushing configurable .. tablet: make time-based flushing configurable We currently prioritize flushes for any in-memory data which has been sitting longer than 2 minutes. This patch changes the time threshold to be a new experimental flag rather than a hard-coded constant. I found this useful to tweak when I wanted to run some experiments regarding the on-disk size of data. With this flag, I was able to set the threshold low to ensure that all of my inserted data was quickly flushed to disk rather than sitting around in MRS where I couldn't measure it. Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87 --- M src/kudu/tablet/tablet_peer_mm_ops.cc 1 file changed, 7 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5690/1 -- To view, visit http://gerrit.cloudera.org:8080/5690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans
[kudu-CR] tpch: improve encodings and compression
Hello Jean-Daniel Cryans, Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/5689 to review the following change. Change subject: tpch: improve encodings and compression .. tpch: improve encodings and compression Previously all of the columns had been hard-coded to 'PLAIN' encoding. This is no longer our default, nor would we recommend it for the types of data used in the TPCH dataset. This switches to default encodings everywhere, and also enables LZ compression on the "Comment" column. The reduction in data size is as follows: original: size: 993MB median scan time for TPCH1 query: 0.8685 sec with LZ4 'comment': size: 901MB (1.1x compression vs original) scan time: unaffected (query does not read comment column) with LZ4 'comment' and new encodings: size: 342MB (2.9x compression vs original) median scan time: 0.8488 sec Per the above, the on-disk size is reduced by almost 3x and the scan performance is improved by a couple percent (perhaps within the realm of measurement error). This workload is small enough to be fully RAM-resident, but in a larger dataset which is disk-bound on reads, the space reduction should yield a corresponding improvement in scan performance. Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b --- M src/kudu/benchmarks/tpch/tpch-schemas.h 1 file changed, 9 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/5689/1 -- To view, visit http://gerrit.cloudera.org:8080/5689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans
[kudu-CR] tpch: dont clear data when loading is disabled
Hello Jean-Daniel Cryans, Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/5691 to review the following change. Change subject: tpch: dont clear data when loading is disabled .. tpch: dont clear data when loading is disabled Previously, the benchmark would clear the minicluster data on every startup, even if 'tpch_load_data' was disabled. With this change, we only clear the minicluster when we are loading fresh data. Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a --- M src/kudu/benchmarks/tpch/tpch_real_world.cc 1 file changed, 5 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/5691/1 -- To view, visit http://gerrit.cloudera.org:8080/5691 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Hello Jean-Daniel Cryans, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5679 to look at the new patch set (#3). Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. KUDU-1600 (part 2): store uncompressed blocks when codec can't compress This changes the compressed block header for CFile v2 so that it only includes the uncompressed block length, and not a redundant copy of the compressed block length (which is already known by the length of the block itself). In addition, this enables a special behavior if the stored uncompressed block length is exactly equal to the length of the compressed data. This signifies that the data has not been compressed and that the reader should not execute the configured codec. On the write side, we always execute the codec, and in the case that the resulting compression ratio is not good enough, we just store the uncompressed data as above. By default, the compression ratio is 0.9x (i.e at least 10% space reduction). It can be configured by an experimental flag. Initially, the JIRA had described doing this based on a policy rather than based on the actual output of the codec. However, during development I realized that, even with encodings like BIT_SHUFFLE, there are actually some patterns that can benefit from a second pass of LZ4. And even if the second pass is not effective, it isn't too expensive to try a second pass on the write side anyway. See [1] for a discussion of a dataset that benefits from multiple passes of compression. Along the way, made a few changes to how the compressed block decoder is implemented: * Use strings::Substitute instead of StringPrintf (easier to read) * Cleaned up unnecessary header includes * Made the CompressedBlockDecoder a short-lifetime stack object rather than an allocated member of CFileReader. This avoids an allocation and indirection, and additionally allows us to put more state inside the object itself without worrying about thread safety, etc. * The CompressedBlockDecoder is now stateful, making for an easier-to-use API. [1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f --- M src/kudu/cfile/block_compression.cc M src/kudu/cfile/block_compression.h M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/compression-test.cc 6 files changed, 214 insertions(+), 115 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/3 -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Todd Lipcon has posted comments on this change. Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.cc File src/kudu/cfile/block_compression.cc: Line 137: // Check if the on-disk data size matches with the buffer > Nit: terminate with period. Done Line 144: } else { > Shouldn't we explicitly test for == 2? Otherwise if/when we add cfile v3, o I think if we were to make another change to this format, we would not bump the cfile version number, but instead we'd introduce it as an 'incompatible_feature' flag in the footer. Put another way, I dont expect us to ever have a new cfile version number. PS2, Line 149: uncompressed_size_ > Isn't this -1 at this point? woops. I wonder how this worked... perhaps due to unsigned integer nonsense. Will move it back where it belongs. Line 172: DCHECK_GE(uncompressed_size_, 0); > If you call uncompressed_size() below, you won't need this. yea, though I think calling your own public accessors is a bit ugly because it obscures what's going on (if I saw that I might think something's getting calculated, for example) http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.h File src/kudu/cfile/block_compression.h: Line 43: // CFile version 2 > Should include the bit in here too. Done Line 61: // Sets "*result" to the compressed version of the "data". > Update to data_slices. Done http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/compression-test.cc File src/kudu/cfile/compression-test.cc: Line 77: class TestCompression : public CFileTestBase { > Curious: what prompted the changes to this test? (a) switched from uint32/GROUP_VARINT to INT32/BIT_SHUFFLE since users can no longer specify uint32 columns in the first place and GROUP_VARINT encoding is deprecated. (b) added the 'random' data generation with PLAIN encoding to ensure that some blocks we created were non-compressible and exercise the new code path. Will add a comment. -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Todd Lipcon has posted comments on this change. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS1, Line 80: 2_t len = > Is the case of parsed_len == 0 stopped representing a corrupted block? I u I don't think it's really problematic -- it'll just feed the empty block to the underlying encoding's decoder, and that will then notice that it's too short and fire a more specific Corruption error. PS1, Line 144: AndParseHeader()); > nit: consider to be more explicit here: Done http://gerrit.cloudera.org:8080/#/c/5678/2/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS2, Line 71: const char kMagicStringV1[] = "kuducfil"; : const char kMagicStringV2[] = "kuducfl2"; : const int kMagicLength = 8; > What about: meh... I think it's easy enough to look at them and see that they're the same length, and with the constant right above it, it's pretty clear that they are supposed to have the same one. I like compile_asserts for things like checking the sizeof() a particular struct, where you are afraid someone might make some well-meaning change which breaks some previous assumption on sizes, but here it's visible what's going on and if anyone made such a change it would break pretty immediately when you ran any test. -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] docs: KUDU-1767. Document possible client op reordering
Alexey Serbin has posted comments on this change. Change subject: docs: KUDU-1767. Document possible client op reordering .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5464/4//COMMIT_MSG Commit Message: PS4, Line 7: docs: KUDU-1767. Document possible client op reordering > Well, we are documenting KUDU-1767. Are you saying it's confusing? It was a little bit confusing to me because I thought the Jira item mention marks the code which fixes/resolves the mentioned issue. That's what I've seen so far. However, that's not the rule, right? So, probably it's ok to keep it there then. That's why it was a 'nit' :) -- To view, visit http://gerrit.cloudera.org:8080/5464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Adar Dembo has posted comments on this change. Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.cc File src/kudu/cfile/block_compression.cc: Line 137: // Check if the on-disk data size matches with the buffer Nit: terminate with period. Line 144: } else { Shouldn't we explicitly test for == 2? Otherwise if/when we add cfile v3, old deployments will assume the compression header looks like a v2 compression header. PS2, Line 149: uncompressed_size_ Isn't this -1 at this point? Line 172: DCHECK_GE(uncompressed_size_, 0); If you call uncompressed_size() below, you won't need this. http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.h File src/kudu/cfile/block_compression.h: Line 43: // CFile version 2 Should include the bit in here too. Line 61: // Sets "*result" to the compressed version of the "data". Update to data_slices. http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/compression-test.cc File src/kudu/cfile/compression-test.cc: Line 77: class TestCompression : public CFileTestBase { Curious: what prompted the changes to this test? -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] docs: KUDU-1767. Document possible client op reordering
Hello Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5464 to look at the new patch set (#5). Change subject: docs: KUDU-1767. Document possible client op reordering .. docs: KUDU-1767. Document possible client op reordering This patch documents the possibility of client reordering when using concurrent flush modes in both the C++ and Java clients. Also adds a note to the "Transaction Semantics" section of the user guide. Tested the docs rendering by building the user guide and the C++ and Java API docs locally. Also updated some old issue URLs to point to the ASF JIRA instance. Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224 --- M docs/transaction_semantics.adoc M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduSession.java M java/kudu-client/src/main/java/org/apache/kudu/client/SessionConfiguration.java M src/kudu/client/client.h 5 files changed, 147 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/5464/5 -- To view, visit http://gerrit.cloudera.org:8080/5464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] docs: KUDU-1767. Document possible client op reordering
Mike Percy has posted comments on this change. Change subject: docs: KUDU-1767. Document possible client op reordering .. Patch Set 4: (10 comments) http://gerrit.cloudera.org:8080/#/c/5464/4//COMMIT_MSG Commit Message: PS4, Line 7: docs: KUDU-1767. Document possible client op reordering > nit: it seems KUDU-1767 is not exactly about documenting write operations r Well, we are documenting KUDU-1767. Are you saying it's confusing? http://gerrit.cloudera.org:8080/#/c/5464/3/docs/transaction_semantics.adoc File docs/transaction_semantics.adoc: PS3, Line 217: In `AUTO_ > nit: noticeable Done http://gerrit.cloudera.org:8080/#/c/5464/4/docs/transaction_semantics.adoc File docs/transaction_semantics.adoc: PS4, Line 220: noticable > nit: noticeable Done http://gerrit.cloudera.org:8080/#/c/5464/4/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java: PS4, Line 53: read > guess we should remove this too Done http://gerrit.cloudera.org:8080/#/c/5464/4/java/kudu-client/src/main/java/org/apache/kudu/client/SessionConfiguration.java File java/kudu-client/src/main/java/org/apache/kudu/client/SessionConfiguration.java: PS4, Line 57: flush : * simultaneously > nit: may be, stop defining flush via flush and replace with thanks for the good suggestion, done PS4, Line 165: FlushMode#AUTO_FLUSH_BACKGROUND > IIRC this is also true for MANUAL_FLUSH mode as well. Done http://gerrit.cloudera.org:8080/#/c/5464/4/src/kudu/client/client.h File src/kudu/client/client.h: PS4, Line 1227: to Kudu > nit: consider dropping that Done PS4, Line 1227: This : /// is because the buffers may flush concurrently > nit: consider describing what it means in terms of write operations, may be Done PS4, Line 1242: to Kudu > nit: consider dropping Done PS4, Line 1243: This is because the buffers may : /// flush concurrently > nit: ditto with explaining that in terms of write operations sent to the se Done -- To view, visit http://gerrit.cloudera.org:8080/5464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Alexey Serbin has posted comments on this change. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS1, Line 80: 2_t len = Is the case of parsed_len == 0 stopped representing a corrupted block? I understand that comparison like 'if (*parsed_len < 0)' does not make sense since parsed_len is uint32_t, but what about the boundary case? PS1, Line 144: AndParseHeader()); nit: consider to be more explicit here: if (PREDICT_FALSE(footer_->incompatible_features() != 0)) { ... } http://gerrit.cloudera.org:8080/#/c/5678/2/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS2, Line 71: const char kMagicStringV1[] = "kuducfil"; : const char kMagicStringV2[] = "kuducfl2"; : const int kMagicLength = 8; What about: const size_t kMagicLength = 8; const char kMagicStringV1[] = "kuducfil"; const char kMagicStringV2[] = "kuducfl2"; static_assert(sizeof(kMagicStringV1) == kMagicLength + 1, "wrong magic length"); static_assert(sizeof(kMagicStringV2) == kMagicLength + 1, "wrong magic length"); -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1751: Change default encodings
Adar Dembo has posted comments on this change. Change subject: KUDU-1751: Change default encodings .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Adar Dembo has posted comments on this change. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1751: Change default encodings
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5169 to look at the new patch set (#8). Change subject: KUDU-1751: Change default encodings .. KUDU-1751: Change default encodings * Change numeric (int/float/double) encodings to BIT_SHUFFLE BIT_SHUFFLE is a better default than PLAIN since it's much more compact and generally performs better. * Change BINARY encodings to DICT_ENCODING This is the default in Parquet, and we've seen that it's a common reason that Kudu performs poorly. This automatically falls back to non-dict-encoded for high-cardinality data. Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/type_encodings.cc M src/kudu/tablet/diskrowset-test.cc 3 files changed, 14 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/8 -- To view, visit http://gerrit.cloudera.org:8080/5169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5678 to look at the new patch set (#2). Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. KUDU-1600 (part 1): bump to CFile version 2 This is preparatory yak-shaving work for KUDU-1600, an effort to prevent double-compression of inherently-compressed encodings such as bitshuffle. In order to enable or disable compression on a per-block basis, we need to change the header of the compressed blocks to indicate whether that block was compressed. Doing so, however, is obviously an incompatible change that earlier versions of Kudu would be confused by. This is not itself a blocker: we haven't guaranteed yet that we will always support downgrade between successive versions of Kudu. However, if someone _does_ downgrade, we would prefer not to crash in confusing ways or silently return corrupt data. Instead, we would like to give an error message that clearly traces back to an incompatible file format. At first glance, it would seem that the route to make such an incompatible change would be to bump the 'major_version' field present in the CFileHeaderPB. However, it turns out, despite this field existing since the very first ever commit to Kudu, we never implemented any code that actually _reads_ the field. So, bumping the major version would not actually prevent an incompatible (earlier) Kudu server from reading a file that it can't understand. Woops! So, in order to make the new CFiles incompatible with the old, the only solution is to actually bump the CFile magic string itself. This commit changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying to read such a file will now result in a "bad magic" error message. It's still not 100% obvious to the average user, but it's better than a weird error about mismatched compression lengths or a crash. While bumping the magic string solves the immediate issue at hand, this is also a good opportunity to fix the structural problem and add a better mechanism to make such changes in the future. Rather than implement the original plan of major/minor version, this patch takes a "feature flags" approach modeled after the 'ext' family of file systems. The advantages of feature flags over a linear 'version number' scheme are: - simplifies non-linear backporting of features to minor releases. For example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this makes it possible to backport Y without X to Kudu 1.8. - allows granular enablement of new features. For example, imagine that we introduce an experimental feature (eg an optimization enabled by a flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that flag. We would then avoid writing the flag into any cfiles, and we would allow compatible downgrade back to 1.2. - allows distinguishing behavior between incompatible and compatible changes Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile.proto M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tools/kudu-tool-test.cc 7 files changed, 81 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/5678/2 -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5679 to look at the new patch set (#2). Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. KUDU-1600 (part 2): store uncompressed blocks when codec can't compress This changes the compressed block header for CFile v2 so that it only includes the uncompressed block length, and not a redundant copy of the compressed block length (which is already known by the length of the block itself). In addition, this enables a special behavior if the stored uncompressed block length is exactly equal to the length of the compressed data. This signifies that the data has not been compressed and that the reader should not execute the configured codec. On the write side, we always execute the codec, and in the case that the resulting compression ratio is not good enough, we just store the uncompressed data as above. By default, the compression ratio is 0.9x (i.e at least 10% space reduction). It can be configured by an experimental flag. Initially, the JIRA had described doing this based on a policy rather than based on the actual output of the codec. However, during development I realized that, even with encodings like BIT_SHUFFLE, there are actually some patterns that can benefit from a second pass of LZ4. And even if the second pass is not effective, it isn't too expensive to try a second pass on the write side anyway. See [1] for a discussion of a dataset that benefits from multiple passes of compression. Along the way, made a few changes to how the compressed block decoder is implemented: * Use strings::Substitute instead of StringPrintf (easier to read) * Cleaned up unnecessary header includes * Made the CompressedBlockDecoder a short-lifetime stack object rather than an allocated member of CFileReader. This avoids an allocation and indirection, and additionally allows us to put more state inside the object itself without worrying about thread safety, etc. * The CompressedBlockDecoder is now stateful, making for an easier-to-use API. [1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f --- M src/kudu/cfile/block_compression.cc M src/kudu/cfile/block_compression.h M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/compression-test.cc 6 files changed, 200 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/2 -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Todd Lipcon has posted comments on this change. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/5678/1/docs/design-docs/cfile.md File docs/design-docs/cfile.md: Line 30: : the string 'kuducfl2' > Maybe we should have both here, with a note about how they enable two "vers Done http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile.proto File src/kudu/cfile/cfile.proto: Line 33: message CFileHeaderPB { > Hmm, but don't we need to retain major/minor version in the definition in o Right, if we now open and read a file that has these fields, they'll just be skipped by the new reader. I removed them to save a few bytes of memory since every CFileReader instantiates and retains a header. Given that the values never had any meaning, I don't find it particularly sad that we won't see the output in the various 'dump' tools. http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS1, Line 79: strlen(kMagicStringV2)) > Would be nice to have a compile-time assert that kMagicStringV1 and kMagicS we can't do a compile assert because strlen() cannot be evaluated at compile time. I'll introduce a new constant Line 81: return Status::Corruption("invalid data size"); > Not a huge fan of functions returning a bad status but having also written Done -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1751: Change default INT and BINARY encodings
Dan Burkert has posted comments on this change. Change subject: KUDU-1751: Change default INT and BINARY encodings .. Patch Set 7: (3 comments) Test failures look legit. http://gerrit.cloudera.org:8080/#/c/5169/7//COMMIT_MSG Commit Message: Line 7: KUDU-1751: Change default INT and BINARY encodings I think it's fair to say 'Change default encodings', since the default for every type is changing. PS7, Line 9: i encodings PS7, Line 9: INT The floating point types are also changing. -- To view, visit http://gerrit.cloudera.org:8080/5169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Adar Dembo has posted comments on this change. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/5678/1/docs/design-docs/cfile.md File docs/design-docs/cfile.md: Line 30: : the string 'kuducfl2' Maybe we should have both here, with a note about how they enable two "versions" of cfile? http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile.proto File src/kudu/cfile/cfile.proto: Line 33: message CFileHeaderPB { Hmm, but don't we need to retain major/minor version in the definition in order to read older cfile headers? Or is the point that, there was never any code to read them, so there's no point in defining them? Is that true across the board? Even for the CLI tool, which might dump a DebugString() of the PB? http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS1, Line 79: strlen(kMagicStringV2)) Would be nice to have a compile-time assert that kMagicStringV1 and kMagicStringV2 are the same length. Or to use the right one here dependign on the cfile_version. Line 81: return Status::Corruption("invalid data size"); Not a huge fan of functions returning a bad status but having also written to output parameters. Can you fix that? -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] Doc review for 1.2
Todd Lipcon has posted comments on this change. Change subject: Doc review for 1.2 .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/5676/1/docs/kudu_impala_integration.adoc File docs/kudu_impala_integration.adoc: PS1, Line 376: DISTRIBUTE BY HASH INTO (id) 16 BUCKETS > You mean I'm documenting the 2.7 syntax since 2.8 isn't released yet. But you're right it should say (id) PS1, Line 413: DISTRIBUTE BY HASH(name) INTO 8 BUCKETS > Also new syntax: same (2.7 syntax) PS1, Line 448: DISTRIBUTE BY RANGE (_id) > PARTITION BY I believe that's a misconception PS1, Line 449: SPLIT ROWS((1439560049342), : (1439566253755), : (1439572458168), : (1439578662581), : (1439584866994), : (1439591071407)) > Now the syntax is different with 'PARTITION x <= VALUES < y'. I'm not sure 2.7 syntax PS1, Line 482: '\<', '\>', `>=`, or `IN` > Does this apply for BETWEEN also, since that acts the same as a couple of c oh, yea, BETWEEN should get pushed PS1, Line 724: re-enable when documenting : // Impala 2.8. > Aren't we documenting Impala 2.8 now or is this some intermediate stage? these docs will release with Kudu 1.2, which will release prior to Impala 2.8 being available. We'll update them again for Impala 2.8 in a couple weeks PS1, Line 941: 8 > I thought it was still true. You have to alter the TBLPROPERTIES kudu.table ah, didn't know that. -- To view, visit http://gerrit.cloudera.org:8080/5676 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Doc review for 1.2
Dan Burkert has posted comments on this change. Change subject: Doc review for 1.2 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5676/1/docs/kudu_impala_integration.adoc File docs/kudu_impala_integration.adoc: PS1, Line 376: DISTRIBUTE BY HASH INTO (id) 16 BUCKETS DISTRIBUTE BY HASH (id) PARTITIONS 16 -- To view, visit http://gerrit.cloudera.org:8080/5676 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [kudu-jepsen] Kudu Jepsen tests
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5492 to look at the new patch set (#19). Change subject: [kudu-jepsen] Kudu Jepsen tests .. [kudu-jepsen] Kudu Jepsen tests This patch contains David's code for the initial kudu-jepsen tests as it was before KUDU-798 was resolved (i.e. as it was when it was failing) and additional updates/fixes: * Extra nemeses for the read/write register linearizability test * Run multiple test scenarios in the scope of the register test * Starting up master server: wait for the catalog manager * Other unsorted fixes for more robust operation The clojure code is integrated into the Kudu maven build and is compiled along with the other projects in a separate 'jepsen' profile. The patch also adds functionality to run the kudu-jepsen tests from the clojure-maven-plugin. The test uses the build machine as the Jepsen control node, running the control logic and the Kudu Java client there. All Jepsen control operations on the DB nodes (i.e. Kudu master and tserver nodes) are run via SSH. The private SSH key should be set prior to running the test: 1. The public part of the SSH key should be added into the 'authorized_keys' file for the root user on all cluster nodes. 2. The private part of the SSH key should be provided to the test either by: * adding the key into the SSH agent on the control node * specifying the path to the key via 'sshKeyPath' property Having the Kudu cluster provisioned and SSH keys deployed, to run the tests against the cluster with master node m0 and tserver nodes {t0..t4}, build the Kudu project from sources and then execute the following in the $KUDU_HOME/java/kudu-jepsen directory: mvn clojure:run -DmasterNodes=m0 -DtserverNodes="t0,t1,t2,t3,t4" after bulding the top-level project with mvn clean compile test-compile -Pjepsen Restrictions: 1. Kudu nodes should run recent Debian/Ubuntu Linux distro (that's due to the internal Jepsen's restrictions). 2. The 'kudu-master', 'kudu-tserver' and 'kudu' binaries in the $KUDU_HOME/build/latest/bin should be built for the OS/distro running on the Kudu cluster nodes. Change-Id: I590c6e78840304b3131666c7037ff9a08dc77dea --- A java/kudu-jepsen/.gitignore A java/kudu-jepsen/README.adoc A java/kudu-jepsen/pom.xml A java/kudu-jepsen/resources/kudu.flags A java/kudu-jepsen/resources/ntp.conf.common A java/kudu-jepsen/resources/ntp.conf.server A java/kudu-jepsen/src/main/clojure/jepsen/kudu.clj A java/kudu-jepsen/src/main/clojure/jepsen/kudu/client.clj A java/kudu-jepsen/src/main/clojure/jepsen/kudu/nemesis.clj A java/kudu-jepsen/src/main/clojure/jepsen/kudu/register.clj A java/kudu-jepsen/src/main/clojure/jepsen/kudu/table.clj A java/kudu-jepsen/src/main/clojure/jepsen/kudu/util.clj A java/kudu-jepsen/src/test/clojure/jepsen/kudu_test.clj A java/kudu-jepsen/src/utils/kudu_test_runner.clj M java/pom.xml 15 files changed, 1,544 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5492/19 -- To view, visit http://gerrit.cloudera.org:8080/5492 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I590c6e78840304b3131666c7037ff9a08dc77dea Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1771. Java client "connection refused" errors logged as "connection reset"
Jun He has uploaded a new change for review. http://gerrit.cloudera.org:8080/5680 Change subject: KUDU-1771. Java client "connection refused" errors logged as "connection reset" .. KUDU-1771. Java client "connection refused" errors logged as "connection reset" This commit changes cleanup() to accepts a string input. It allows a caller to customize its error message based on the caller site context. Change-Id: If9eb40f11757ae76bc430b3f513b96592068d6e2 --- M java/kudu-client/src/main/java/org/apache/kudu/client/TabletClient.java 1 file changed, 8 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/5680/1 -- To view, visit http://gerrit.cloudera.org:8080/5680 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If9eb40f11757ae76bc430b3f513b96592068d6e2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jun He
[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress
Hello Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/5679 to review the following change. Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't compress .. KUDU-1600 (part 2): store uncompressed blocks when codec can't compress This changes the compressed block header for CFile v2 so that it only includes the uncompressed block length, and not a redundant copy of the compressed block length (which is already known by the length of the block itself). In addition, this enables a special behavior if the stored uncompressed block length is exactly equal to the length of the compressed data. This signifies that the data has not been compressed and that the reader should not execute the configured codec. On the write side, we always execute the codec, and in the case that the resulting compression ratio is not good enough, we just store the uncompressed data as above. By default, the compression ratio is 0.9x (i.e at least 10% space reduction). It can be configured by an experimental flag. Initially, the JIRA had described doing this based on a policy rather than based on the actual output of the codec. However, during development I realized that, even with encodings like BIT_SHUFFLE, there are actually some patterns that can benefit from a second pass of LZ4. And even if the second pass is not effective, it isn't too expensive to try a second pass on the write side anyway. See [1] for a discussion of a dataset that benefits from multiple passes of compression. Along the way, made a few changes to how the compressed block decoder is implemented: * Use strings::Substitute instead of StringPrintf (easier to read) * Cleaned up unnecessary header includes * Made the CompressedBlockDecoder a short-lifetime stack object rather than an allocated member of CFileReader. This avoids an allocation and indirection, and additionally allows us to put more state inside the object itself without worrying about thread safety, etc. * The CompressedBlockDecoder is now stateful, making for an easier-to-use API. [1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f --- M src/kudu/cfile/block_compression.cc M src/kudu/cfile/block_compression.h M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/compression-test.cc 6 files changed, 200 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/1 -- To view, visit http://gerrit.cloudera.org:8080/5679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo
[kudu-CR] KUDU-1751: Change default INT and BINARY encodings
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5169 to look at the new patch set (#7). Change subject: KUDU-1751: Change default INT and BINARY encodings .. KUDU-1751: Change default INT and BINARY encodings * Change INT incodings to BIT_SHUFFLE BIT_SHUFFLE is a better default than PLAIN since it's much more compact and generally performs better. * Change BINARY encodings to DICT_ENCODING This is the default in Parquet, and we've seen that it's a common reason that Kudu performs poorly. This automatically falls back to non-dict-encoded for high-cardinality data. Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/type_encodings.cc 2 files changed, 12 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/7 -- To view, visit http://gerrit.cloudera.org:8080/5169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2
Hello Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/5678 to review the following change. Change subject: KUDU-1600 (part 1): bump to CFile version 2 .. KUDU-1600 (part 1): bump to CFile version 2 This is preparatory yak-shaving work for KUDU-1600, an effort to prevent double-compression of inherently-compressed encodings such as bitshuffle. In order to enable or disable compression on a per-block basis, we need to change the header of the compressed blocks to indicate whether that block was compressed. Doing so, however, is obviously an incompatible change that earlier versions of Kudu would be confused by. This is not itself a blocker: we haven't guaranteed yet that we will always support downgrade between successive versions of Kudu. However, if someone _does_ downgrade, we would prefer not to crash in confusing ways or silently return corrupt data. Instead, we would like to give an error message that clearly traces back to an incompatible file format. At first glance, it would seem that the route to make such an incompatible change would be to bump the 'major_version' field present in the CFileHeaderPB. However, it turns out, despite this field existing since the very first ever commit to Kudu, we never implemented any code that actually _reads_ the field. So, bumping the major version would not actually prevent an incompatible (earlier) Kudu server from reading a file that it can't understand. Woops! So, in order to make the new CFiles incompatible with the old, the only solution is to actually bump the CFile magic string itself. This commit changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying to read such a file will now result in a "bad magic" error message. It's still not 100% obvious to the average user, but it's better than a weird error about mismatched compression lengths or a crash. While bumping the magic string solves the immediate issue at hand, this is also a good opportunity to fix the structural problem and add a better mechanism to make such changes in the future. Rather than implement the original plan of major/minor version, this patch takes a "feature flags" approach modeled after the 'ext' family of file systems. The advantages of feature flags over a linear 'version number' scheme are: - simplifies non-linear backporting of features to minor releases. For example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this makes it possible to backport Y without X to Kudu 1.8. - allows granular enablement of new features. For example, imagine that we introduce an experimental feature (eg an optimization enabled by a flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that flag. We would then avoid writing the flag into any cfiles, and we would allow compatible downgrade back to 1.2. - allows distinguishing behavior between incompatible and compatible changes Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile.proto M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h 6 files changed, 63 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/5678/1 -- To view, visit http://gerrit.cloudera.org:8080/5678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo
[kudu-CR] [java client] KUDU-1643 Prune hash partitions based on IN-list predicates
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5677 to look at the new patch set (#2). Change subject: [java client] KUDU-1643 Prune hash partitions based on IN-list predicates .. [java client] KUDU-1643 Prune hash partitions based on IN-list predicates I change Integer to List so that it can store bitset of hash values. It will search all combinations of in-list column value and calculate the hash values. Change-Id: I8a793c23ff00d19b3d3d062bb222d2c725a93724 --- M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java 2 files changed, 181 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/5677/2 -- To view, visit http://gerrit.cloudera.org:8080/5677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a793c23ff00d19b3d3d062bb222d2c725a93724 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Haijie HongGerrit-Reviewer: Kudu Jenkins