[ 
https://issues.apache.org/jira/browse/KUDU-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826771#comment-15826771
 ] 

Todd Lipcon commented on KUDU-1835:
-----------------------------------

As a data point, here's a little script I ran on a WAL dir from an internal 
production workload. On most segments LZOP gets 9-10x compression and gzip 
gets 14-15x (the first two segments come in lower, at 7-8x and 11x):

{code}
# for x in /data/1/kudu/tablet/wal/wals/c9d36f087779437a812036db75d7e006/wal* ; do
    raw_size=$(stat -c '%s' "$x")
    gzip_size=$(gzip -c < "$x" | wc -c)
    lzop_size=$(lzop -c < "$x" | wc -c)
    # columns: raw bytes, lzop bytes (ratio), gzip bytes (ratio)
    echo "$raw_size $lzop_size ($((raw_size/lzop_size))x) $gzip_size ($((raw_size/gzip_size))x)"
  done
67914806 8822979 (7x) 5801108 (11x)
69050539 8587242 (8x) 5786937 (11x)
67752983 6745962 (10x) 4591334 (14x)
68524538 6452417 (10x) 4316684 (15x)
69306281 6805018 (10x) 4548035 (15x)
67832665 7254455 (9x) 4826115 (14x)
67112269 7164280 (9x) 4765893 (14x)
67334182 7105344 (9x) 4802748 (14x)
67744136 6938754 (9x) 4799502 (14x)
67980985 7152674 (9x) 4740059 (14x)
68014865 7076908 (9x) 4699722 (14x)
69000245 7183600 (9x) 4772002 (14x)
{code}
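
The per-file ratios above come from shell integer division, so they are truncated (67914806/8822979 is printed as 7x, not 7.7x). As a rough sketch, the output lines could be piped through a small helper that sums the byte counts and reports one aggregate ratio per codec to one decimal place. The function name `summarize_ratios` and the file name `wal_sizes.txt` are hypothetical and not part of the original comment; the helper assumes the same `raw lzop (Nx) gzip (Nx)` column layout as the output above.

```shell
# Hypothetical helper (not from the original comment): reads lines of the
# form "raw lzop (Nx) gzip (Nx)" on stdin and prints the aggregate
# compression ratio for each codec, computed as total raw / total compressed.
summarize_ratios() {
  awk '
    # Field 1 = raw bytes, field 2 = lzop bytes, field 4 = gzip bytes.
    NF >= 5 { raw += $1; lzop += $2; gzip += $4 }
    END {
      # Skip the report if no usable rows were seen (avoids dividing by zero).
      if (lzop > 0 && gzip > 0)
        printf "lzop %.1fx gzip %.1fx\n", raw / lzop, raw / gzip
    }'
}
```

Assuming the loop's output was saved to `wal_sizes.txt`, the helper would be run as `summarize_ratios < wal_sizes.txt`. Summing bytes before dividing weights each segment by its size, which is usually what matters when estimating total disk savings.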

> Support compression of the WAL
> ------------------------------
>
>                 Key: KUDU-1835
>                 URL: https://issues.apache.org/jira/browse/KUDU-1835
>             Project: Kudu
>          Issue Type: Improvement
>          Components: log, perf
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In some workloads, particularly those which get good compression ratios on the 
> underlying data (e.g. via dictionary coding), the WAL becomes a big bottleneck 
> for write performance. In addition, the large size of WALs can often mean 
> that old WALs get GCed rapidly, causing lagging replicas to get evicted 
> after only a temporary bout of slowness. Making WALs smaller would mean that 
> we can retain more history without using more disk space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
