Adar Dembo has submitted this change and it was merged.

Change subject: tpch: improve encodings and compression
......................................................................


tpch: improve encodings and compression

Previously, all of the columns were hard-coded to 'PLAIN' encoding.
PLAIN is no longer our default, nor would we recommend it for the types
of data used in the TPC-H dataset.

This switches to default encodings everywhere, and also enables LZ4
compression on the "comment" column.
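Current Kudu documentation lists dictionary encoding as the default for string columns, which is one reason PLAIN is a poor fit for low-cardinality TPC-H columns such as `l_shipinstruct`. The following is a minimal, self-contained sketch of the idea, not Kudu's implementation: the names `DictEncode`, `DictDecode`, `PlainBytes`, `DictBytes` and `BitsPerCode`, and the size model (a 4-byte length prefix per plain value, bit-packed dictionary codes), are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative dictionary-encoded string column (not Kudu's on-disk format).
// Each distinct value is stored once; each row stores a small integer code.
struct DictEncodedColumn {
  std::vector<std::string> dictionary;  // distinct values, first-seen order
  std::vector<std::uint32_t> codes;     // one code per row
};

DictEncodedColumn DictEncode(const std::vector<std::string>& values) {
  DictEncodedColumn out;
  std::unordered_map<std::string, std::uint32_t> index;
  for (const auto& v : values) {
    auto it = index.find(v);
    if (it == index.end()) {
      // New value: append it to the dictionary and assign the next code.
      std::uint32_t code = static_cast<std::uint32_t>(out.dictionary.size());
      index.emplace(v, code);
      out.dictionary.push_back(v);
      out.codes.push_back(code);
    } else {
      out.codes.push_back(it->second);
    }
  }
  return out;
}

std::vector<std::string> DictDecode(const DictEncodedColumn& col) {
  std::vector<std::string> out;
  out.reserve(col.codes.size());
  for (std::uint32_t c : col.codes) {
    assert(c < col.dictionary.size());
    out.push_back(col.dictionary[c]);
  }
  return out;
}

// Hypothetical "plain" layout: every value's bytes plus a 4-byte length prefix.
std::size_t PlainBytes(const std::vector<std::string>& values) {
  std::size_t total = 0;
  for (const auto& v : values) total += 4 + v.size();
  return total;
}

// Minimum bit width needed to represent codes 0..dict_size-1 (at least 1).
unsigned BitsPerCode(std::size_t dict_size) {
  unsigned bits = 1;
  while ((std::size_t{1} << bits) < dict_size) ++bits;
  return bits;
}

// Dictionary stored once in the plain layout, plus bit-packed codes.
std::size_t DictBytes(const DictEncodedColumn& col) {
  std::size_t code_bits = col.codes.size() * BitsPerCode(col.dictionary.size());
  return PlainBytes(col.dictionary) + (code_bits + 7) / 8;
}
```

For 1,000 rows cycling through the four TPC-H `l_shipinstruct` values, this model gives 16,000 bytes for the plain layout versus 314 bytes for the dictionary layout. Real encoders add headers and other details, so treat this only as an illustration of why repeated values compress so well.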

The effect on data size and scan time is as follows:

original:
  size: 993MB
  median scan time for the TPCH1 query: 0.8685 sec

with LZ4 'comment':
  size: 901MB (1.1x compression vs original)
  scan time: unaffected (the query does not read the comment column)

with LZ4 'comment' and new encodings:
  size: 342MB (2.9x compression vs original)
  median scan time: 0.8488 sec
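The compression ratios and the scan-time change quoted above can be checked with a small worked calculation. This is a sketch; the helper names `CompressionRatio` and `PercentFaster` are illustrative only.

```cpp
#include <cassert>

// Compression ratio as used above: original size divided by new size.
double CompressionRatio(double original_mb, double new_mb) {
  assert(new_mb > 0.0);
  return original_mb / new_mb;
}

// Reduction in median scan time, as a percentage of the original time.
double PercentFaster(double original_sec, double new_sec) {
  assert(original_sec > 0.0);
  return (original_sec - new_sec) / original_sec * 100.0;
}
```

Plugging in the measured values gives about 1.10x (993MB / 901MB) and 2.90x (993MB / 342MB), and a median scan time roughly 2.3% lower (0.8685 sec vs 0.8488 sec).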

Per the above, the on-disk size is reduced by almost 3x and the median
scan time improves by about 2% (possibly within measurement error). This
workload is small enough to be fully RAM-resident, but on a larger
dataset whose reads are disk-bound, the space reduction should yield a
corresponding improvement in scan performance.
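The disk-bound argument can be made concrete with a back-of-envelope model in which scan time is simply bytes read divided by sequential disk bandwidth. This is a sketch under stated assumptions: `DiskBoundScanSeconds` is a hypothetical helper, the 100 MB/s bandwidth is a made-up figure, and using whole-table sizes as "bytes read" is a simplification, since TPCH1 only reads some of the columns.

```cpp
#include <cassert>

// Idealized scan limited purely by sequential disk bandwidth:
// time = bytes read / bandwidth. This ignores CPU decode and decompression
// cost, which can grow when data is more heavily encoded and compressed.
double DiskBoundScanSeconds(double bytes_read_mb, double disk_mb_per_sec) {
  assert(disk_mb_per_sec > 0.0);
  return bytes_read_mb / disk_mb_per_sec;
}
```

With those assumptions, reading 993MB takes about 9.93 sec and reading 342MB about 3.42 sec, so the speedup tracks the ~2.9x size ratio. In practice, decode cost and the subset of columns a query touches will move the real number.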

Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Reviewed-on: http://gerrit.cloudera.org:8080/5689
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <a...@cloudera.com>
---
M src/kudu/benchmarks/tpch/tpch-schemas.h
1 file changed, 9 insertions(+), 8 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5689

Gerrit-MessageType: merged
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
