Todd Lipcon has submitted this change and it was merged.

Change subject: Enable compression and smaller block size for composite key index
......................................................................
Enable compression and smaller block size for composite key index

After running the time series workload on d2106 for a couple months, I found a couple interesting things:

- The composite key index (aka "ad hoc index") was taking 6.4 bytes per row (vs 5.23 *bits* for the actual data). Compressing it with 'lzop' on that dataset gained a 6.2x savings. Thus, this patch changes this index to be compressed using LZ4 by default, which should save space. On the tpch lineitem table, it saved about 15%. The performance cost should be fairly minimal -- we always random-access the index blocks, and in the case of a cache miss, the cost of decompression is tiny compared to the cost of the resulting disk seek.

- Once we reached ~12B rows, the system degenerated into a seeky mess. Looking at tracing revealed that we spent a lot of time reading composite indexes, indicating they weren't fitting well in the cache. I theorize that making these index blocks smaller should decrease the amount of excess data that gets pulled into the cache when we read them. Given that these blocks are always random-accessed and never scanned, using small block sizes makes intuitive sense.

Eventually, both of these options should be table properties, but it was easier to just set better defaults for now as a quick improvement.
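To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It is not part of the patch. The per-row sizes, the 6.2x ratio (measured with lzop, not LZ4) and the ~12B row count come from the message above. The helper names and the two block sizes (32 KiB and 4 KiB) are hypothetical, chosen only for illustration, and the "excess" model simply assumes each random lookup needs one index entry out of the block it reads.

```python
# Figures quoted in the commit message.
ROWS = 12e9                  # "~12B rows"
INDEX_BYTES_PER_ROW = 6.4    # composite key index, uncompressed
DATA_BITS_PER_ROW = 5.23     # actual data
LZOP_RATIO = 6.2             # measured with lzop; LZ4 may differ

GB = 1e9  # decimal gigabytes


def index_gb(rows, bytes_per_row=INDEX_BYTES_PER_ROW):
    """Uncompressed composite key index size, in GB."""
    return rows * bytes_per_row / GB


def data_gb(rows, bits_per_row=DATA_BITS_PER_ROW):
    """Size of the actual column data, in GB."""
    return rows * bits_per_row / 8 / GB


def excess_bytes_per_lookup(block_size, entry_bytes=INDEX_BYTES_PER_ROW):
    """Bytes read into the cache beyond the one entry a lookup needs.

    Simplified model: one random lookup reads one whole block.
    """
    return block_size - entry_bytes


if __name__ == "__main__":
    raw = index_gb(ROWS)
    print(f"index, uncompressed:  {raw:.1f} GB")
    print(f"index at 6.2x ratio:  {raw / LZOP_RATIO:.1f} GB")
    print(f"actual data:          {data_gb(ROWS):.1f} GB")
    # Hypothetical block sizes, for illustration only.
    for block_size in (32 * 1024, 4 * 1024):
        excess = excess_bytes_per_lookup(block_size)
        print(f"block {block_size:>6} B: {excess:.1f} excess bytes per lookup")
```

Under these assumptions the uncompressed index is close to ten times the size of the data it indexes, which is consistent with it not fitting well in the cache.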
Change-Id: I2b7bfc7a4961c764d262524292ec56e3969af728
Reviewed-on: http://gerrit.cloudera.org:8080/953
Reviewed-by: Jean-Daniel Cryans
Tested-by: Kudu Jenkins
---
M src/kudu/tablet/diskrowset.cc
1 file changed, 6 insertions(+), 0 deletions(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2b7bfc7a4961c764d262524292ec56e3969af728
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
