[kudu-CR] webserver: improve SSL certificate handling

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5015

to look at the new patch set (#3).

Change subject: webserver: improve SSL certificate handling
..

webserver: improve SSL certificate handling

* Allows users to specify a separate location for PEM private-key file
  using --webserver_private_key_file (previously required private-key
  and cert. to be in same file).

* Allows users to specify a shell command to run to get the password
  for the webserver's private-key file using
  --webserver_private_key_password_cmd

The separate configuration of cert and key is apparently more commonly
used, so this simplifies deployment. The use of a script to provide the
password for the private key makes it easier to integrate with external
credential providers.

In order to test this, I reconfigured our thirdparty curl build to include
HTTPS support.

This mirrors the change made in Impala by IMPALA-2051 (8bbe45393c7).

Change-Id: I4b508cebbe6f31556e6d5a5fba5e5e9fb44cf1b9
---
M src/kudu/rpc/rpc-test-base.h
M src/kudu/security/CMakeLists.txt
A src/kudu/security/test/test_certs.cc
A src/kudu/security/test/test_certs.h
M src/kudu/server/CMakeLists.txt
M src/kudu/server/webserver-test.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver_options.cc
M src/kudu/server/webserver_options.h
M src/kudu/util/curl_util.cc
M src/kudu/util/curl_util.h
M thirdparty/build-definitions.sh
12 files changed, 292 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/5015/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b508cebbe6f31556e6d5a5fba5e5e9fb44cf1b9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] webserver: improve SSL certificate handling

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: webserver: improve SSL certificate handling
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5015/2/src/kudu/security/test/test_certs.h
File src/kudu/security/test/test_certs.h:

Line 28: Status CreateTestSSLCerts(const std::string& dir,
> Could you use instead or replace usage of CreateSSLServerCert/CreateSSLPriv
This patch adds a TODO there -- the issue is that these certs use a password, 
but the RPC TLS implementation doesn't yet support using a password. I'll work 
on that next.


http://gerrit.cloudera.org:8080/#/c/5015/2/src/kudu/server/webserver.cc
File src/kudu/server/webserver.cc:

Line 158: Status s = Subprocess::Call(argv, "" /* stdin */, 
_password, );
> Adar and I traced through the other day and found that we never shelled out
Yea, given it's close to startup I'd say it's safe. It's also the accepted way 
for people to tie in key/password management stuff in all of the other Hadoop 
ecosystem components, so we don't have much of a choice. (given this is close 
to startup I don't think it's worth preforking a separate "forker" process and 
communicating by a pipe or anything)


Line 166: } else {
> Is this else attached to the correct if?  It would make more sense to me on
Done


PS2, Line 197: SimpleItoa
> consider using std::to_string
Done


http://gerrit.cloudera.org:8080/#/c/5015/2/src/kudu/util/curl_util.cc
File src/kudu/util/curl_util.cc:

Line 72: RETURN_NOT_OK(TranslateError(curl_easy_setopt(curl_, 
CURLOPT_SSL_VERIFYPEER, 0)));
> This might be simpler without the if block, and setting the value to veify_
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4b508cebbe6f31556e6d5a5fba5e5e9fb44cf1b9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: replace gscoped ptr with unique ptr

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has abandoned this change.

Change subject: WIP: replace gscoped_ptr with unique_ptr
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/2248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I21b809da9f9b87ce8fcb1412e45b4336f413a020
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..


KUDU-1600 (part 1): bump to CFile version 2

This is preparatory yak-shaving work for KUDU-1600, an effort to prevent
double-compression of inherently-compressed encodings such as
bitshuffle. In order to enable or disable compression on a per-block
basis, we need to change the header of the compressed blocks to indicate
whether that block was compressed. Doing so, however, is obviously an
incompatible change that earlier versions of Kudu would be confused by.

This is not itself a blocker: we haven't guaranteed yet that we will
always support downgrade between successive versions of Kudu. However,
if someone _does_ downgrade, we would prefer not to crash in confusing
ways or silently return corrupt data. Instead, we would like to give an
error message that clearly traces back to an incompatible file format.

At first glance, it would seem that the route to make such an
incompatible change would be to bump the 'major_version' field present
in the CFileHeaderPB. However, it turns out, despite this field existing
since the very first ever commit to Kudu, we never implemented any code
that actually _reads_ the field. So, bumping the major version would not
actually prevent an incompatible (earlier) Kudu server from reading a
file that it can't understand. Woops!

So, in order to make the new CFiles incompatible with the old, the only
solution is to actually bump the CFile magic string itself. This commit
changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying
to read such a file will now result in a "bad magic" error message. It's
still not 100% obvious to the average user, but it's better than a weird
error about mismatched compression lengths or a crash.

While bumping the magic string solves the immediate issue at hand, this
is also a good opportunity to fix the structural problem and add a
better mechanism to make such changes in the future.

Rather than implement the original plan of major/minor version, this
patch takes a "feature flags" approach modeled after the 'ext' family of
file systems. The advantages of feature flags over a linear 'version
number' scheme are:

- simplifies non-linear backporting of features to minor releases. For
  example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this
  makes it possible to backport Y without X to Kudu 1.8.
- allows granular enablement of new features. For example, imagine that
  we introduce an experimental feature (eg an optimization enabled by a
  flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that
  flag. We would then avoid writing the flag into any cfiles, and we
  would allow compatible downgrade back to 1.2.
- allows distinguishing behavior between incompatible and compatible
  changes

I manually validated that, if I wrote data with a new server, and tried
to read it with an old one, it gave me an error with "bad magic" as
expected.

Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Reviewed-on: http://gerrit.cloudera.org:8080/5678
Reviewed-by: Alexey Serbin 
Tested-by: Kudu Jenkins
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/tools/kudu-tool-test.cc
7 files changed, 81 insertions(+), 24 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] tablet: make time-based flushing configurable

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded a new patch set (#2).

Change subject: tablet: make time-based flushing configurable
..

tablet: make time-based flushing configurable

We currently prioritize flushes for any in-memory data which has been
sitting longer than 2 minutes. This patch changes the time threshold to
be a new experimental flag rather than a hard-coded constant.

I found this useful to tweak when I wanted to run some experiments
regarding the on-disk size of data. With this flag, I was able to set
the threshold low to ensure that all of my inserted data was quickly
flushed to disk rather than sitting around in MRS where I couldn't
measure it.

Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87
---
M src/kudu/tablet/tablet_peer_mm_ops.cc
1 file changed, 7 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5690/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5679

to look at the new patch set (#4).

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 216 insertions(+), 115 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] tablet: make time-based flushing configurable

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: tablet: make time-based flushing configurable
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5690/1/src/kudu/tablet/tablet_peer_mm_ops.cc
File src/kudu/tablet/tablet_peer_mm_ops.cc:

Line 38: DEFINE_int32(flush_mrs_time_threshold_secs, 2 * 60,
> Would be nice if the name was more consistent with flush_threshold_mb. So m
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5679/3/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 152:   // In CFile v2, we ensure that compressed size <= uncompressed size.
> I found this statement a little confusing w.r.t. the actual behavior in Com
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: tpch: improve encodings and compression
..


Patch Set 1:

Nah, I'd rather keep the old perf data because it's useful to see whether it 
goes up or down after this patch :)

-- 
To view, visit http://gerrit.cloudera.org:8080/5689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS1, Line 80: 2_t len = 
> I don't think it's really problematic -- it'll just feed the empty block to
That sounds good to me -- thank you for the clarification.


http://gerrit.cloudera.org:8080/#/c/5678/2/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

PS2, Line 71: const char kMagicStringV1[] = "kuducfil";
: const char kMagicStringV2[] = "kuducfl2";
: const int kMagicLength = 8;
> meh... I think it's easy enough to look at them and see that they're the sa
I was elaborating on the compile-time check that Adar mentioned in PS1 comment. 
:)

If there is decent run-time test coverage, then it's safe, of course.


-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: tpch: improve encodings and compression
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] tpch: dont clear data when loading is disabled

2017-01-11 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: tpch: dont clear data when loading is disabled
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5691
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] tpch: dont clear data when loading is disabled

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: tpch: dont clear data when loading is disabled
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5691
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: tpch: improve encodings and compression
..


Patch Set 1: Code-Review+2

No point in deleting the old performance data, right? Since scan performance is 
likely the same.

-- 
To view, visit http://gerrit.cloudera.org:8080/5689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] tablet: make time-based flushing configurable

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: tablet: make time-based flushing configurable
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5690/1/src/kudu/tablet/tablet_peer_mm_ops.cc
File src/kudu/tablet/tablet_peer_mm_ops.cc:

Line 38: DEFINE_int32(flush_mrs_time_threshold_secs, 2 * 60,
Would be nice if the name was more consistent with flush_threshold_mb. So maybe 
make this flush_threshold_secs? Or maybe make the other one 
flush_mrs_threshold_mb and this one flush_mrs_threshold_secs?


-- 
To view, visit http://gerrit.cloudera.org:8080/5690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5679/3/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 152:   // In CFile v2, we ensure that compressed size <= uncompressed size.
I found this statement a little confusing w.r.t. the actual behavior in 
CompressedBlockBuilder::Compress() (the part that elides compression if 
compressed_size == uncompressed_size). 

Perhaps: "// CFile V2 guarantees that compressed size <= uncompressed size 
(though, as per the file format, if the two are equal the file data was not 
compressed)" ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1751: Change default encodings

2017-01-11 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1751: Change default encodings
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Doc review for 1.2

2017-01-11 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: Doc review for 1.2
..


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] tpch: use better encodings, compression in schema

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has abandoned this change.

Change subject: tpch: use better encodings, compression in schema
..


Abandoned

superceded by https://gerrit.cloudera.org/#/c/5689/1

-- 
To view, visit http://gerrit.cloudera.org:8080/952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Iccdf743dc5e2f311274d3f5450beee508c2a9524
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Internal Jenkins


[kudu-CR] Doc review for 1.2

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5676

to look at the new patch set (#2).

Change subject: Doc review for 1.2
..

Doc review for 1.2

I made a pass through all of the documentation and made improvements as
necessary for 1.2:

* Specify that the start/stop instructions are deb/rpm specific
* Updated the contribution C++ style guide to stop talking about
  gscoped_ptr
* Removed limitation that Spark is only for Spark 1 (now we support
  Spark 2)
* Fixed some branding in index.adoc
* Some small tweaks and improvements to the glossary of concepts

The larger changes are in the Impala integration. I tried to update the
syntax and examples to match the Impala-Kudu release from 11/22/2016
(the most recent release at the time of this writing and at the expected
release date for Kudu 1.2). Unfortunately, the syntax and capabilities
have changed again between that release and the upcoming Impala 2.8
release. Rather than document the unreleased syntax, I went with the
soon-to-be-removed syntax.

Similarly, I left in all of the instructions about how to install the
Impala-Kudu fork, even though that fork has now been merged back into
the main Impala release train. When Impala 2.8 is available, we will
have to do another pass here and re-publish docs.

Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb
---
M docs/administration.adoc
M docs/contributing.adoc
M docs/developing.adoc
M docs/index.adoc
M docs/kudu_impala_integration.adoc
5 files changed, 90 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/5676/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-1751: change default BINARY encoding to DICTIONARY

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has abandoned this change.

Change subject: WIP: KUDU-1751: change default BINARY encoding to DICTIONARY
..


Abandoned

Rolled into 
https://gerrit.cloudera.org/#/c/5169/

-- 
To view, visit http://gerrit.cloudera.org:8080/5170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ie2776ed91a8d5e3ccf3ee024e338f9577e50e522
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] Doc review for 1.2

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Doc review for 1.2
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5676/1/docs/kudu_impala_integration.adoc
File docs/kudu_impala_integration.adoc:

PS1, Line 448: DISTRIBUTE BY RANGE (_id)
> I believe that's a misconception
double checked, I can do:

[vd0342.halxg.cloudera.com:21000] > create table todd_test (a int primary key) 
partition by range(a) ( partition values < 100, partition 100 <= values) stored 
as kudu;


PS1, Line 482: '\<', '\>', `>=`, or `IN`
> oh, yea, BETWEEN should get pushed
Done


PS1, Line 941: 8
> ah, didn't know that.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Jean-Daniel Cryans, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5678

to look at the new patch set (#3).

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..

KUDU-1600 (part 1): bump to CFile version 2

This is preparatory yak-shaving work for KUDU-1600, an effort to prevent
double-compression of inherently-compressed encodings such as
bitshuffle. In order to enable or disable compression on a per-block
basis, we need to change the header of the compressed blocks to indicate
whether that block was compressed. Doing so, however, is obviously an
incompatible change that earlier versions of Kudu would be confused by.

This is not itself a blocker: we haven't guaranteed yet that we will
always support downgrade between successive versions of Kudu. However,
if someone _does_ downgrade, we would prefer not to crash in confusing
ways or silently return corrupt data. Instead, we would like to give an
error message that clearly traces back to an incompatible file format.

At first glance, it would seem that the route to make such an
incompatible change would be to bump the 'major_version' field present
in the CFileHeaderPB. However, it turns out, despite this field existing
since the very first ever commit to Kudu, we never implemented any code
that actually _reads_ the field. So, bumping the major version would not
actually prevent an incompatible (earlier) Kudu server from reading a
file that it can't understand. Woops!

So, in order to make the new CFiles incompatible with the old, the only
solution is to actually bump the CFile magic string itself. This commit
changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying
to read such a file will now result in a "bad magic" error message. It's
still not 100% obvious to the average user, but it's better than a weird
error about mismatched compression lengths or a crash.

While bumping the magic string solves the immediate issue at hand, this
is also a good opportunity to fix the structural problem and add a
better mechanism to make such changes in the future.

Rather than implement the original plan of major/minor version, this
patch takes a "feature flags" approach modeled after the 'ext' family of
file systems. The advantages of feature flags over a linear 'version
number' scheme are:

- simplifies non-linear backporting of features to minor releases. For
  example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this
  makes it possible to backport Y without X to Kudu 1.8.
- allows granular enablement of new features. For example, imagine that
  we introduce an experimental feature (eg an optimization enabled by a
  flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that
  flag. We would then avoid writing the flag into any cfiles, and we
  would allow compatible downgrade back to 1.2.
- allows distinguishing behavior between incompatible and compatible
  changes

I manually validated that, if I wrote data with a new server, and tried
to read it with an old one, it gave me an error with "bad magic" as
expected.

Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/tools/kudu-tool-test.cc
7 files changed, 81 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/5678/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] tablet: make time-based flushing configurable

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Jean-Daniel Cryans, Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5690

to review the following change.

Change subject: tablet: make time-based flushing configurable
..

tablet: make time-based flushing configurable

We currently prioritize flushes for any in-memory data which has been
sitting longer than 2 minutes. This patch changes the time threshold to
be a new experimental flag rather than a hard-coded constant.

I found this useful to tweak when I wanted to run some experiments
regarding the on-disk size of data. With this flag, I was able to set
the threshold low to ensure that all of my inserted data was quickly
flushed to disk rather than sitting around in MRS where I couldn't
measure it.

Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87
---
M src/kudu/tablet/tablet_peer_mm_ops.cc
1 file changed, 7 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5690/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic9a546b82e5b5db76473455df1eab2769c443c87
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Jean-Daniel Cryans, Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5689

to review the following change.

Change subject: tpch: improve encodings and compression
..

tpch: improve encodings and compression

Previously all of the columns had been hard-coded to 'PLAIN' encoding.
This is no longer our default, nor would we recommend it for the types
of data used in the TPCH dataset.

This switches to default encodings everywhere, and also enables LZ
compression on the "Comment" column.

The reduction in data size is as follows:

original:
  size: 993MB
  median scan time for TPCH1 query: 0.8685 sec

with LZ4 'comment':
  size: 901MB (1.1x compression vs original)
  scan time: unaffected (query does not read comment column)

with LZ4 'comment' and new encodings:
  size: 342MB (2.9x compression vs original)
  median scan time: 0.8488 sec

Per the above, the on-disk size is reduced by almost 3x and the scan
performance is improved by a couple percent (perhaps within the realm of
measurement error). This workload is small enough to be fully
RAM-resident, but in a larger dataset which is disk-bound on reads, the
space reduction should yield a corresponding improvement in scan performance.

Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
---
M src/kudu/benchmarks/tpch/tpch-schemas.h
1 file changed, 9 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/5689/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 


[kudu-CR] tpch: dont clear data when loading is disabled

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Jean-Daniel Cryans, Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5691

to review the following change.

Change subject: tpch: dont clear data when loading is disabled
..

tpch: dont clear data when loading is disabled

Previously, the benchmark would clear the minicluster data on every
startup, even if 'tpch_load_data' was disabled. With this change, we
only clear the minicluster when we are loading fresh data.

Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a
---
M src/kudu/benchmarks/tpch/tpch_real_world.cc
1 file changed, 5 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/5691/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5691
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieb20f008dd6c5d530e0ba0a6aad21b377e3cc37a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Jean-Daniel Cryans, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5679

to look at the new patch set (#3).

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 214 insertions(+), 115 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 137: // Check if the on-disk data size matches with the buffer
> Nit: terminate with period.
Done


Line 144:   } else {
> Shouldn't we explicitly test for == 2? Otherwise if/when we add cfile v3, o
I think if we were to make another change to this format, we would not bump the 
cfile version number, but instead we'd introduce it as an 
'incompatible_feature' flag in the footer. Put another way, I dont expect us to 
ever have a new cfile version number.


PS2, Line 149: uncompressed_size_
> Isn't this -1 at this point?
woops. I wonder how this worked... perhaps due to unsigned integer nonsense. 
Will move it back where it belongs.


Line 172:   DCHECK_GE(uncompressed_size_, 0);
> If you call uncompressed_size() below, you won't need this.
yea, though I think calling your own public accessors is a bit ugly because it 
obscures what's going on (if I saw that I might think something's getting 
calculated, for example)


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.h
File src/kudu/cfile/block_compression.h:

Line 43: // CFile version 2
> Should include the  bit in here too.
Done


Line 61:   // Sets "*result" to the compressed version of the "data".
> Update to data_slices.
Done


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/compression-test.cc
File src/kudu/cfile/compression-test.cc:

Line 77: class TestCompression : public CFileTestBase {
> Curious: what prompted the changes to this test?
(a) switched from uint32/GROUP_VARINT to INT32/BIT_SHUFFLE since users can no 
longer specify uint32 columns in the first place and GROUP_VARINT encoding is 
deprecated.
(b) added the 'random' data generation with PLAIN encoding to ensure that some 
blocks we created were non-compressible and exercise the new code path. Will 
add a comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS1, Line 80: 2_t len = 
> Is the case of parsed_len == 0 stopped representing a corrupted block?  I u
I don't think it's really problematic -- it'll just feed the empty block to the 
underlying encoding's decoder, and that will then notice that it's too short 
and fire a more specific Corruption error.


PS1, Line 144: AndParseHeader());
> nit: consider to be more explicit here:
Done


http://gerrit.cloudera.org:8080/#/c/5678/2/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

PS2, Line 71: const char kMagicStringV1[] = "kuducfil";
: const char kMagicStringV2[] = "kuducfl2";
: const int kMagicLength = 8;
> What about:
meh... I think it's easy enough to look at them and see that they're the same 
length, and with the constant right above it, it's pretty clear that they are 
supposed to have the same one.

I like compile_asserts for things like checking the sizeof() a particular 
struct, where you are afraid someone might make some well-meaning change which 
breaks some previous assumption on sizes, but here it's visible what's going on 
and if anyone made such a change it would break pretty immediately when you ran 
any test.


-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: KUDU-1767. Document possible client op reordering

2017-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: docs: KUDU-1767. Document possible client op reordering
..


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5464/4//COMMIT_MSG
Commit Message:

PS4, Line 7: docs: KUDU-1767. Document possible client op reordering
> Well, we are documenting KUDU-1767. Are you saying it's confusing?
It was a little bit confusing to me because I thought the Jira item mention 
marks the code which fixes/resolves the mentioned issue.  That's what I've seen 
so far.

However, that's not the rule, right?  So, probably it's ok to keep it there 
then.  That's why it was a 'nit' :)


-- 
To view, visit http://gerrit.cloudera.org:8080/5464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.cc
File src/kudu/cfile/block_compression.cc:

Line 137: // Check if the on-disk data size matches with the buffer
Nit: terminate with period.


Line 144:   } else {
Shouldn't we explicitly test for == 2? Otherwise if/when we add cfile v3, old 
deployments will assume the compression header looks like a v2 compression 
header.


PS2, Line 149: uncompressed_size_
Isn't this -1 at this point?


Line 172:   DCHECK_GE(uncompressed_size_, 0);
If you call uncompressed_size() below, you won't need this.


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/block_compression.h
File src/kudu/cfile/block_compression.h:

Line 43: // CFile version 2
Should include the  bit in here too.


Line 61:   // Sets "*result" to the compressed version of the "data".
Update to data_slices.


http://gerrit.cloudera.org:8080/#/c/5679/2/src/kudu/cfile/compression-test.cc
File src/kudu/cfile/compression-test.cc:

Line 77: class TestCompression : public CFileTestBase {
Curious: what prompted the changes to this test?


-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] docs: KUDU-1767. Document possible client op reordering

2017-01-11 Thread Mike Percy (Code Review)
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5464

to look at the new patch set (#5).

Change subject: docs: KUDU-1767. Document possible client op reordering
..

docs: KUDU-1767. Document possible client op reordering

This patch documents the possibility of client reordering when using
concurrent flush modes in both the C++ and Java clients.

Also adds a note to the "Transaction Semantics" section of the user
guide.

Tested the docs rendering by building the user guide and the C++ and
Java API docs locally.

Also updated some old issue URLs to point to the ASF JIRA instance.

Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224
---
M docs/transaction_semantics.adoc
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduSession.java
M 
java/kudu-client/src/main/java/org/apache/kudu/client/SessionConfiguration.java
M src/kudu/client/client.h
5 files changed, 147 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/5464/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] docs: KUDU-1767. Document possible client op reordering

2017-01-11 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: docs: KUDU-1767. Document possible client op reordering
..


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/5464/4//COMMIT_MSG
Commit Message:

PS4, Line 7: docs: KUDU-1767. Document possible client op reordering
> nit: it seems KUDU-1767 is not exactly about documenting write operations r
Well, we are documenting KUDU-1767. Are you saying it's confusing?


http://gerrit.cloudera.org:8080/#/c/5464/3/docs/transaction_semantics.adoc
File docs/transaction_semantics.adoc:

PS3, Line 217: In `AUTO_
> nit: noticeable
Done


http://gerrit.cloudera.org:8080/#/c/5464/4/docs/transaction_semantics.adoc
File docs/transaction_semantics.adoc:

PS4, Line 220: noticable
> nit: noticeable
Done


http://gerrit.cloudera.org:8080/#/c/5464/4/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java:

PS4, Line 53: read
> guess we should remove this too
Done


http://gerrit.cloudera.org:8080/#/c/5464/4/java/kudu-client/src/main/java/org/apache/kudu/client/SessionConfiguration.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/SessionConfiguration.java:

PS4, Line 57: flush
:  * simultaneously
> nit: may be, stop defining flush via flush and replace with
thanks for the good suggestion, done


PS4, Line 165: FlushMode#AUTO_FLUSH_BACKGROUND
> IIRC this is also true for MANUAL_FLUSH mode as well.
Done


http://gerrit.cloudera.org:8080/#/c/5464/4/src/kudu/client/client.h
File src/kudu/client/client.h:

PS4, Line 1227:  to Kudu
> nit: consider dropping that
Done


PS4, Line 1227: This
  : ///   is because the buffers may flush concurrently
> nit: consider describing what it means in terms of write operations, may be
Done


PS4, Line 1242: to Kudu
> nit: consider dropping
Done


PS4, Line 1243: This is because the buffers may
  : ///   flush concurrently
> nit: ditto with explaining that in terms of write operations sent to the se
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65215d833c65e54fcf080d61adc5f6ed3d303224
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS1, Line 80: 2_t len = 
Is the case of parsed_len == 0 stopped representing a corrupted block?  I 
understand that comparison like 'if (*parsed_len < 0)' does not make sense 
since parsed_len is uint32_t, but what about the boundary case?


PS1, Line 144: AndParseHeader());
nit: consider to be more explicit here:
  if (PREDICT_FALSE(footer_->incompatible_features() != 0)) {
...
  }


http://gerrit.cloudera.org:8080/#/c/5678/2/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

PS2, Line 71: const char kMagicStringV1[] = "kuducfil";
: const char kMagicStringV2[] = "kuducfl2";
: const int kMagicLength = 8;
What about:

const size_t kMagicLength = 8;
const char kMagicStringV1[] = "kuducfil";
const char kMagicStringV2[] = "kuducfl2";

static_assert(sizeof(kMagicStringV1) == kMagicLength + 1, "wrong magic length");
static_assert(sizeof(kMagicStringV2) == kMagicLength + 1, "wrong magic length");


-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1751: Change default encodings

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1751: Change default encodings
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1751: Change default encodings

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5169

to look at the new patch set (#8).

Change subject: KUDU-1751: Change default encodings
..

KUDU-1751: Change default encodings

* Change numeric (int/float/double) encodings to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

* Change BINARY encodings to DICT_ENCODING

This is the default in Parquet, and we've seen that it's a common reason
that Kudu performs poorly. This automatically falls back to
non-dict-encoded for high-cardinality data.

Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/type_encodings.cc
M src/kudu/tablet/diskrowset-test.cc
3 files changed, 14 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5678

to look at the new patch set (#2).

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..

KUDU-1600 (part 1): bump to CFile version 2

This is preparatory yak-shaving work for KUDU-1600, an effort to prevent
double-compression of inherently-compressed encodings such as
bitshuffle. In order to enable or disable compression on a per-block
basis, we need to change the header of the compressed blocks to indicate
whether that block was compressed. Doing so, however, is obviously an
incompatible change that earlier versions of Kudu would be confused by.

This is not itself a blocker: we haven't guaranteed yet that we will
always support downgrade between successive versions of Kudu. However,
if someone _does_ downgrade, we would prefer not to crash in confusing
ways or silently return corrupt data. Instead, we would like to give an
error message that clearly traces back to an incompatible file format.

At first glance, it would seem that the route to make such an
incompatible change would be to bump the 'major_version' field present
in the CFileHeaderPB. However, it turns out, despite this field existing
since the very first ever commit to Kudu, we never implemented any code
that actually _reads_ the field. So, bumping the major version would not
actually prevent an incompatible (earlier) Kudu server from reading a
file that it can't understand. Woops!

So, in order to make the new CFiles incompatible with the old, the only
solution is to actually bump the CFile magic string itself. This commit
changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying
to read such a file will now result in a "bad magic" error message. It's
still not 100% obvious to the average user, but it's better than a weird
error about mismatched compression lengths or a crash.

While bumping the magic string solves the immediate issue at hand, this
is also a good opportunity to fix the structural problem and add a
better mechanism to make such changes in the future.

Rather than implement the original plan of major/minor version, this
patch takes a "feature flags" approach modeled after the 'ext' family of
file systems. The advantages of feature flags over a linear 'version
number' scheme are:

- simplifies non-linear backporting of features to minor releases. For
  example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this
  makes it possible to backport Y without X to Kudu 1.8.
- allows granular enablement of new features. For example, imagine that
  we introduce an experimental feature (eg an optimization enabled by a
  flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that
  flag. We would then avoid writing the flag into any cfiles, and we
  would allow compatible downgrade back to 1.2.
- allows distinguishing behavior between incompatible and compatible
  changes

Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/tools/kudu-tool-test.cc
7 files changed, 81 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/5678/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5679

to look at the new patch set (#2).

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 200 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5678/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 30: : the string 'kuducfl2'
> Maybe we should have both here, with a note about how they enable two "vers
Done


http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile.proto
File src/kudu/cfile/cfile.proto:

Line 33: message CFileHeaderPB {
> Hmm, but don't we need to retain major/minor version in the definition in o
Right, if we now open and read a file that has these fields, they'll just be 
skipped by the new reader. I removed them to save a few bytes of memory since 
every CFileReader instantiates and retains a header.

Given that the values never had any meaning, I don't find it particularly sad 
that we won't see the output in the various 'dump' tools.


http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS1, Line 79:  strlen(kMagicStringV2))
> Would be nice to have a compile-time assert that kMagicStringV1 and kMagicS
we can't do a compile assert because strlen() cannot be evaluated at compile 
time. I'll introduce a new constant


Line 81: return Status::Corruption("invalid data size");
> Not a huge fan of functions returning a bad status but having also written 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1751: Change default INT and BINARY encodings

2017-01-11 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1751: Change default INT and BINARY encodings
..


Patch Set 7:

(3 comments)

Test failures look legit.

http://gerrit.cloudera.org:8080/#/c/5169/7//COMMIT_MSG
Commit Message:

Line 7: KUDU-1751: Change default INT and BINARY encodings
I think it's fair to say 'Change default encodings', since the default for 
every type is changing.


PS7, Line 9: i
encodings


PS7, Line 9: INT
The floating point types are also changing.


-- 
To view, visit http://gerrit.cloudera.org:8080/5169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5678/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 30: : the string 'kuducfl2'
Maybe we should have both here, with a note about how they enable two 
"versions" of cfile?


http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile.proto
File src/kudu/cfile/cfile.proto:

Line 33: message CFileHeaderPB {
Hmm, but don't we need to retain major/minor version in the definition in order 
to read older cfile headers? Or is the point that, there was never any code to 
read them, so there's no point in defining them? Is that true across the board? 
Even for the CLI tool, which might dump a DebugString() of the PB?


http://gerrit.cloudera.org:8080/#/c/5678/1/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS1, Line 79:  strlen(kMagicStringV2))
Would be nice to have a compile-time assert that kMagicStringV1 and 
kMagicStringV2 are the same length. Or to use the right one here dependign on 
the cfile_version.


Line 81: return Status::Corruption("invalid data size");
Not a huge fan of functions returning a bad status but having also written to 
output parameters. Can you fix that?


-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] Doc review for 1.2

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Doc review for 1.2
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5676/1/docs/kudu_impala_integration.adoc
File docs/kudu_impala_integration.adoc:

PS1, Line 376: DISTRIBUTE BY HASH INTO (id) 16 BUCKETS
> You mean
I'm documenting the 2.7 syntax since 2.8 isn't released yet. But you're right 
it should say (id)


PS1, Line 413: DISTRIBUTE BY HASH(name) INTO 8 BUCKETS
> Also new syntax:
same (2.7 syntax)


PS1, Line 448: DISTRIBUTE BY RANGE (_id)
> PARTITION BY
I believe that's a misconception


PS1, Line 449: SPLIT ROWS((1439560049342),
 :  (1439566253755),
 :  (1439572458168),
 :  (1439578662581),
 :  (1439584866994),
 :  (1439591071407))
> Now the syntax is different with 'PARTITION x <= VALUES < y'. I'm not sure 
2.7 syntax


PS1, Line 482: '\<', '\>', `>=`, or `IN`
> Does this apply for BETWEEN also, since that acts the same as a couple of c
oh, yea, BETWEEN should get pushed


PS1, Line 724: re-enable when documenting
 : // Impala 2.8.
> Aren't we documenting Impala 2.8 now or is this some intermediate stage?
these docs will release with Kudu 1.2, which will release prior to Impala 2.8 
being available. We'll update them again for Impala 2.8 in a couple weeks


PS1, Line 941: 8
> I thought it was still true. You have to alter the TBLPROPERTIES kudu.table
ah, didn't know that.


-- 
To view, visit http://gerrit.cloudera.org:8080/5676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Doc review for 1.2

2017-01-11 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: Doc review for 1.2
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5676/1/docs/kudu_impala_integration.adoc
File docs/kudu_impala_integration.adoc:

PS1, Line 376: DISTRIBUTE BY HASH INTO (id) 16 BUCKETS
DISTRIBUTE BY HASH (id) PARTITIONS 16


-- 
To view, visit http://gerrit.cloudera.org:8080/5676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I64ad4925ba4402ef3733a0ab62da91f1049848fb
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] [kudu-jepsen] Kudu Jepsen tests

2017-01-11 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5492

to look at the new patch set (#19).

Change subject: [kudu-jepsen] Kudu Jepsen tests
..

[kudu-jepsen] Kudu Jepsen tests

This patch contains David's code for the initial kudu-jepsen tests
as it was before KUDU-798 was resolved (i.e. as it was when it was failing)
and additional updates/fixes:
  * Extra nemeses for the read/write register linearizability test
  * Run multiple test scenarios in the scope of the register test
  * Starting up master server: wait for the catalog manager
  * Other unsorted fixes for more robust operation

The clojure code is integrated into the Kudu maven build and is compiled
along with the other projects in a separate 'jepsen' profile.

The patch also adds functionality to run the kudu-jepsen tests
from the clojure-maven-plugin.  The test uses the build machine
as the Jepsen control node, running the control logic and the Kudu
Java client there.

All Jepsen control operations on the DB nodes (i.e. Kudu master and
tserver nodes) are run via SSH.  The private SSH key should be set prior
to running the test:

  1. The public part of the SSH key should be added into the
 'authorized_keys' file for the root user on all cluster nodes.

  2. The private part of the SSH key should be provided to the test
 either by:
   * adding the key into the SSH agent on the control node
   * specifying the path to the key via 'sshKeyPath' property

Having the Kudu cluster provisioned and SSH keys deployed, to run
the tests against the cluster with master node m0 and tserver nodes
{t0..t4}, build the Kudu project from sources and then execute
the following in the $KUDU_HOME/java/kudu-jepsen directory:

  mvn clojure:run -DmasterNodes=m0 -DtserverNodes="t0,t1,t2,t3,t4"

after bulding the top-level project with

  mvn clean compile test-compile -Pjepsen

Restrictions:
  1. Kudu nodes should run recent Debian/Ubuntu Linux distro
 (that's due to the internal Jepsen's restrictions).

  2. The 'kudu-master', 'kudu-tserver' and 'kudu' binaries in
 the $KUDU_HOME/build/latest/bin should be built for the OS/distro
 running on the Kudu cluster nodes.

Change-Id: I590c6e78840304b3131666c7037ff9a08dc77dea
---
A java/kudu-jepsen/.gitignore
A java/kudu-jepsen/README.adoc
A java/kudu-jepsen/pom.xml
A java/kudu-jepsen/resources/kudu.flags
A java/kudu-jepsen/resources/ntp.conf.common
A java/kudu-jepsen/resources/ntp.conf.server
A java/kudu-jepsen/src/main/clojure/jepsen/kudu.clj
A java/kudu-jepsen/src/main/clojure/jepsen/kudu/client.clj
A java/kudu-jepsen/src/main/clojure/jepsen/kudu/nemesis.clj
A java/kudu-jepsen/src/main/clojure/jepsen/kudu/register.clj
A java/kudu-jepsen/src/main/clojure/jepsen/kudu/table.clj
A java/kudu-jepsen/src/main/clojure/jepsen/kudu/util.clj
A java/kudu-jepsen/src/test/clojure/jepsen/kudu_test.clj
A java/kudu-jepsen/src/utils/kudu_test_runner.clj
M java/pom.xml
15 files changed, 1,544 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5492/19
-- 
To view, visit http://gerrit.cloudera.org:8080/5492
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I590c6e78840304b3131666c7037ff9a08dc77dea
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1771. Java client "connection refused" errors logged as "connection reset"

2017-01-11 Thread Jun He (Code Review)
Jun He has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5680

Change subject: KUDU-1771. Java client "connection refused" errors logged as 
"connection reset"
..

KUDU-1771. Java client "connection refused" errors logged as
"connection reset"

This commit changes cleanup() to accepts a string input.
It allows a caller to customize its error message based
on the caller site context.

Change-Id: If9eb40f11757ae76bc430b3f513b96592068d6e2
---
M java/kudu-client/src/main/java/org/apache/kudu/client/TabletClient.java
1 file changed, 8 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/5680/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If9eb40f11757ae76bc430b3f513b96592068d6e2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jun He 


[kudu-CR] KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5679

to review the following change.

Change subject: KUDU-1600 (part 2): store uncompressed blocks when codec can't 
compress
..

KUDU-1600 (part 2): store uncompressed blocks when codec can't compress

This changes the compressed block header for CFile v2 so that it only
includes the uncompressed block length, and not a redundant copy of the
compressed block length (which is already known by the length of the
block itself).

In addition, this enables a special behavior if the stored uncompressed
block length is exactly equal to the length of the compressed data. This
signifies that the data has not been compressed and that the reader
should not execute the configured codec.

On the write side, we always execute the codec, and in the case that the
resulting compression ratio is not good enough, we just store the
uncompressed data as above. By default, the compression ratio is 0.9x
(i.e at least 10% space reduction). It can be configured by an
experimental flag.

Initially, the JIRA had described doing this based on a policy rather
than based on the actual output of the codec. However, during
development I realized that, even with encodings like BIT_SHUFFLE, there
are actually some patterns that can benefit from a second pass of LZ4.
And even if the second pass is not effective, it isn't too expensive to
try a second pass on the write side anyway. See [1] for a discussion of
a dataset that benefits from multiple passes of compression.

Along the way, made a few changes to how the compressed block decoder is
implemented:

* Use strings::Substitute instead of StringPrintf (easier to read)
* Cleaned up unnecessary header includes
* Made the CompressedBlockDecoder a short-lifetime stack object rather
  than an allocated member of CFileReader. This avoids an allocation and
  indirection, and additionally allows us to put more state inside the
  object itself without worrying about thread safety, etc.
* The CompressedBlockDecoder is now stateful, making for an
  easier-to-use API.

[1] https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
---
M src/kudu/cfile/block_compression.cc
M src/kudu/cfile/block_compression.h
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/compression-test.cc
6 files changed, 200 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/5679/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8306f1f2139ebf7500a83ee46b4ccd6c7b5e137f
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 


[kudu-CR] KUDU-1751: Change default INT and BINARY encodings

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5169

to look at the new patch set (#7).

Change subject: KUDU-1751: Change default INT and BINARY encodings
..

KUDU-1751: Change default INT and BINARY encodings

* Change INT incodings to BIT_SHUFFLE

BIT_SHUFFLE is a better default than PLAIN since it's much more compact
and generally performs better.

* Change BINARY encodings to DICT_ENCODING

This is the default in Parquet, and we've seen that it's a common reason
that Kudu performs poorly. This automatically falls back to
non-dict-encoded for high-cardinality data.

Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/type_encodings.cc
2 files changed, 12 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/5169/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32db89337026eb6be1ff450a6cb2b2862f7a
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1600 (part 1): bump to CFile version 2

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5678

to review the following change.

Change subject: KUDU-1600 (part 1): bump to CFile version 2
..

KUDU-1600 (part 1): bump to CFile version 2

This is preparatory yak-shaving work for KUDU-1600, an effort to prevent
double-compression of inherently-compressed encodings such as
bitshuffle. In order to enable or disable compression on a per-block
basis, we need to change the header of the compressed blocks to indicate
whether that block was compressed. Doing so, however, is obviously an
incompatible change that earlier versions of Kudu would be confused by.

This is not itself a blocker: we haven't guaranteed yet that we will
always support downgrade between successive versions of Kudu. However,
if someone _does_ downgrade, we would prefer not to crash in confusing
ways or silently return corrupt data. Instead, we would like to give an
error message that clearly traces back to an incompatible file format.

At first glance, it would seem that the route to make such an
incompatible change would be to bump the 'major_version' field present
in the CFileHeaderPB. However, it turns out, despite this field existing
since the very first ever commit to Kudu, we never implemented any code
that actually _reads_ the field. So, bumping the major version would not
actually prevent an incompatible (earlier) Kudu server from reading a
file that it can't understand. Woops!

So, in order to make the new CFiles incompatible with the old, the only
solution is to actually bump the CFile magic string itself. This commit
changes it from 'kuducfil' to 'kuducfl2'. An incompatible server trying
to read such a file will now result in a "bad magic" error message. It's
still not 100% obvious to the average user, but it's better than a weird
error about mismatched compression lengths or a crash.

While bumping the magic string solves the immediate issue at hand, this
is also a good opportunity to fix the structural problem and add a
better mechanism to make such changes in the future.

Rather than implement the original plan of major/minor version, this
patch takes a "feature flags" approach modeled after the 'ext' family of
file systems. The advantages of feature flags over a linear 'version
number' scheme are:

- simplifies non-linear backporting of features to minor releases. For
  example, if Kudu 2.0 adds feature X, and 2.1 adds feature Y, this
  makes it possible to backport Y without X to Kudu 1.8.
- allows granular enablement of new features. For example, imagine that
  we introduce an experimental feature (eg an optimization enabled by a
  flag) in Kudu 1.3, and a user upgrades to 1.3 without enabling that
  flag. We would then avoid writing the flag into any cfiles, and we
  would allow compatible downgrade back to 1.2.
- allows distinguishing behavior between incompatible and compatible
  changes

Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
6 files changed, 63 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/5678/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I13999de3d406fb184cea3e6e8d52ffd80b8d8ee3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 


[kudu-CR] [java client] KUDU-1643 Prune hash partitions based on IN-list predicates

2017-01-11 Thread Haijie Hong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5677

to look at the new patch set (#2).

Change subject: [java client] KUDU-1643 Prune hash partitions based on IN-list 
predicates
..

[java client] KUDU-1643 Prune hash partitions based on IN-list predicates

I change Integer to List so that it can store bitset of
hash values.
It will search all combinations of in-list column value and calculate the hash
values.

Change-Id: I8a793c23ff00d19b3d3d062bb222d2c725a93724
---
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
2 files changed, 181 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/5677/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a793c23ff00d19b3d3d062bb222d2c725a93724
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong 
Gerrit-Reviewer: Kudu Jenkins