[kudu-CR] tpch: improve encodings and compression

2017-01-12 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: tpch: improve encodings and compression
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] tpch: improve encodings and compression

2017-01-12 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged.

Change subject: tpch: improve encodings and compression
..


tpch: improve encodings and compression

Previously all of the columns had been hard-coded to 'PLAIN' encoding.
This is no longer our default, nor would we recommend it for the types
of data used in the TPCH dataset.

This switches to default encodings everywhere, and also enables LZ4
compression on the "Comment" column.

The reduction in data size is as follows:

original:
  size: 993MB
  median scan time for TPCH1 query: 0.8685 sec

with LZ4 'comment':
  size: 901MB (1.1x compression vs original)
  scan time: unaffected (query does not read comment column)

with LZ4 'comment' and new encodings:
  size: 342MB (2.9x compression vs original)
  median scan time: 0.8488 sec

Per the above, the on-disk size is reduced by almost 3x and the median
scan time improves by about 2% (perhaps within the realm of measurement
error). This workload is small enough to be fully RAM-resident, but in a
larger dataset that is disk-bound on reads, the space reduction should
yield a corresponding improvement in scan performance.
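
As a quick sanity check on the figures quoted above, the ratios can be
recomputed from the raw sizes and scan times. This is only a sketch: the
constant and function names below are hypothetical and are not part of the
patch, and the inputs are simply the numbers from this commit message.

```python
# Figures copied from the commit message (sizes in MB, times in seconds).
ORIGINAL_MB = 993
LZ4_COMMENT_MB = 901
LZ4_AND_ENCODINGS_MB = 342
ORIGINAL_SCAN_SEC = 0.8685
NEW_SCAN_SEC = 0.8488


def compression_ratio(before_mb: float, after_mb: float) -> float:
    """Return how many times smaller the data is after the change."""
    return before_mb / after_mb


def percent_faster(before_sec: float, after_sec: float) -> float:
    """Return the reduction in scan time as a percentage of the original."""
    return (before_sec - after_sec) / before_sec * 100.0


if __name__ == "__main__":
    lz4_only = compression_ratio(ORIGINAL_MB, LZ4_COMMENT_MB)
    lz4_and_enc = compression_ratio(ORIGINAL_MB, LZ4_AND_ENCODINGS_MB)
    speedup = percent_faster(ORIGINAL_SCAN_SEC, NEW_SCAN_SEC)
    print(f"LZ4 'comment' only:          {lz4_only:.1f}x")
    print(f"LZ4 'comment' + encodings:   {lz4_and_enc:.1f}x")
    print(f"median scan time reduction:  {speedup:.1f}%")
```

The size ratios are measured against the original 993MB table, so they
describe the combined effect of each configuration rather than the
incremental gain of each step.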

Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Reviewed-on: http://gerrit.cloudera.org:8080/5689
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/benchmarks/tpch/tpch-schemas.h
1 file changed, 9 insertions(+), 8 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified




Gerrit-MessageType: merged
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: tpch: improve encodings and compression
..


Patch Set 1:

Nah, I'd rather keep the old perf data because it's useful to see whether it 
goes up or down after this patch :)


Gerrit-MessageType: comment
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: tpch: improve encodings and compression
..


Patch Set 1: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: tpch: improve encodings and compression
..


Patch Set 1: Code-Review+2

No point in deleting the old performance data, right? Since scan performance is 
likely the same.


Gerrit-MessageType: comment
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] tpch: improve encodings and compression

2017-01-11 Thread Todd Lipcon (Code Review)
Hello Jean-Daniel Cryans, Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5689

to review the following change.

Change subject: tpch: improve encodings and compression
..

tpch: improve encodings and compression

Previously all of the columns had been hard-coded to 'PLAIN' encoding.
This is no longer our default, nor would we recommend it for the types
of data used in the TPCH dataset.

This switches to default encodings everywhere, and also enables LZ4
compression on the "Comment" column.

The reduction in data size is as follows:

original:
  size: 993MB
  median scan time for TPCH1 query: 0.8685 sec

with LZ4 'comment':
  size: 901MB (1.1x compression vs original)
  scan time: unaffected (query does not read comment column)

with LZ4 'comment' and new encodings:
  size: 342MB (2.9x compression vs original)
  median scan time: 0.8488 sec

Per the above, the on-disk size is reduced by almost 3x and the median
scan time improves by about 2% (perhaps within the realm of measurement
error). This workload is small enough to be fully RAM-resident, but in a
larger dataset that is disk-bound on reads, the space reduction should
yield a corresponding improvement in scan performance.

Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
---
M src/kudu/benchmarks/tpch/tpch-schemas.h
1 file changed, 9 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/5689/1

Gerrit-MessageType: newchange
Gerrit-Change-Id: I168eb1c4ff619556f6879a20fe335a6158d0e81b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jean-Daniel Cryans