Hello Will Berkeley, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8982

to look at the new patch set (#2).

Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected
......................................................................

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for
KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I
found that the on-disk deltafiles written are about 3x larger than
expected. The culprit is an optimization in the CFile value index which
is turned off for deltafiles. The optimization truncates large keys
after the first unique byte between sequential values. The deltafile
values, in the case of this integration test, include the small
DeltaKey and the 8KiB updated value. As a result, the BTree interior
nodes are completely filled by only ~4 values (32KiB cblock size by
default). This makes the BTree far less effective, and means that the
full updated data is written many times. We expect that fixing this will
improve performance for update-heavy workloads with large values (for
example, YCSB).
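
To make the effect concrete, here is a minimal sketch of the kind of
truncation described above, together with the rough arithmetic behind
the "~4 values" figure. The function names are hypothetical and this is
not the actual CFile code; per-entry overhead in the index block is
ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Kudu's implementation): keep the bytes of `cur`
// up to and including the first byte at which it differs from `prev`.
std::string TruncateAfterFirstUniqueByte(const std::string& prev,
                                         const std::string& cur) {
  const size_t n = std::min(prev.size(), cur.size());
  size_t i = 0;
  while (i < n && prev[i] == cur[i]) {
    ++i;
  }
  // `i` is the index of the first differing byte, or `n` if one string is a
  // prefix of the other. Keep that byte too, if `cur` has one there.
  return cur.substr(0, std::min(i + 1, cur.size()));
}

// Rough number of index entries that fit in one block, ignoring any
// per-entry overhead (an assumption made to keep the arithmetic simple).
size_t EntriesPerBlock(size_t block_size, size_t entry_size) {
  assert(entry_size > 0);
  return block_size / entry_size;
}
```

Under these assumptions, EntriesPerBlock(32 * 1024, 8 * 1024) is 4,
while an 18-byte entry gives 32768 / 18 = 1820 entries per block.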

Unfortunately, fixing the issue is not quite as simple as enabling the
optimization for deltafiles: in the normal course of seeking through
deltafiles during a scan, we deserialize the value index keys into a
DeltaKey, and if the keys are truncated this deserialization step can
fail. Instead, this patch adds overridable value index key encoding to
CFileWriter, and the deltafile writer overrides it to encode only the
DeltaKey, which is usually very short and at most ~18 bytes.
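
The shape of the override can be sketched roughly as follows. This is
an illustration only: SketchWriter, SetIndexKeyEncoder, and
EncodeDeltaKeyOnly are hypothetical names, not the CFileWriter API in
the patch, and the fixed 12-byte key prefix is a simplifying assumption
(the real DeltaKey encoding is variable-length).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a file writer with a pluggable index key encoder.
class SketchWriter {
 public:
  using IndexKeyEncoder = std::function<std::string(const std::string&)>;

  // By default the index key is the full value.
  SketchWriter() : encoder_([](const std::string& v) { return v; }) {}

  // Lets a caller (e.g. a deltafile writer) replace the encoding.
  void SetIndexKeyEncoder(IndexKeyEncoder encoder) {
    encoder_ = std::move(encoder);
  }

  // In this sketch, one index key is recorded per data block, derived from
  // the first value in that block.
  void AppendBlockStart(const std::string& first_value) {
    index_keys_.push_back(encoder_(first_value));
  }

  const std::vector<std::string>& index_keys() const { return index_keys_; }

 private:
  IndexKeyEncoder encoder_;
  std::vector<std::string> index_keys_;
};

// Assumed layout for illustration: a fixed-size key prefix followed by the
// update payload. The real DeltaKey is not fixed-size.
constexpr size_t kSketchDeltaKeySize = 12;

// Encodes only the key prefix, so the index key stays a complete,
// deserializable key rather than an arbitrary truncation.
std::string EncodeDeltaKeyOnly(const std::string& value) {
  return value.substr(0, std::min(value.size(), kSketchDeltaKeySize));
}
```

With the encoder installed, a value made of a 12-byte key and an 8KiB
payload contributes only the 12 key bytes to the index, and those bytes
can still be decoded as a whole key.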

Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
---
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/tablet/deltafile.cc
3 files changed, 40 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/2
--
To view, visit http://gerrit.cloudera.org:8080/8982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
Gerrit-Change-Number: 8982
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
