[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229806#comment-15229806 ] Branimir Lambov commented on CASSANDRA-4338: Switching to byte buffers, and making the direct/on-heap choice that makes the most sense for each subclass or compressor, was implemented as part of CASSANDRA-8709, which is included in 2.2. This issue is now obsolete, unless we want to backport the patch to 2.1.

> Experiment with direct buffer in SequentialWriter
>
> Key: CASSANDRA-4338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4338
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Jonathan Ellis
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: performance
> Fix For: 2.1.x
>
> Attachments: 4338-gc.tar.gz, 4338.benchmark.png, 4338.benchmark.snappycompressor.png, 4338.single_node.read.png, 4338.single_node.write.png, gc-4338-patched.png, gc-trunk-me.png, gc-trunk.png, gc-with-patch-me.png
>
> Using a direct buffer instead of a heap-based byte[] should let us avoid a copy into native memory when we flush the buffer.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
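The idea of letting each writer subclass or compressor pick the buffer kind it works best with can be sketched as below. This is not the CASSANDRA-8709 code: `BufferKind`, `Compressor`, `preferredBufferKind()` and `bufferFor()` are hypothetical names, and the policy that an uncompressed writer gets an off-heap buffer is an assumption made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; none of these names are taken from Cassandra's code base.
public class BufferChoiceSketch {
    enum BufferKind {
        ON_HEAP {
            @Override ByteBuffer allocate(int size) { return ByteBuffer.allocate(size); }
        },
        OFF_HEAP {
            @Override ByteBuffer allocate(int size) { return ByteBuffer.allocateDirect(size); }
        };

        abstract ByteBuffer allocate(int size);
    }

    interface Compressor {
        // The buffer kind this compressor can consume without an extra copy.
        BufferKind preferredBufferKind();
    }

    // Stand-in for a native compressor that can read direct buffers in place.
    static final Compressor NATIVE_COMPRESSOR = () -> BufferKind.OFF_HEAP;
    // Stand-in for a pure-Java compressor that needs a backing byte[].
    static final Compressor JAVA_COMPRESSOR = () -> BufferKind.ON_HEAP;

    // null means "no compression": assumed here to favour a direct buffer for the flush.
    static ByteBuffer bufferFor(Compressor compressor, int size) {
        BufferKind kind = compressor == null ? BufferKind.OFF_HEAP : compressor.preferredBufferKind();
        return kind.allocate(size);
    }

    public static void main(String[] args) {
        System.out.println(bufferFor(NATIVE_COMPRESSOR, 64).isDirect()); // true
        System.out.println(bufferFor(JAVA_COMPRESSOR, 64).isDirect());   // false
        System.out.println(bufferFor(null, 64).isDirect());              // true
    }
}
```

Keeping the decision on the compressor means a writer never has to know whether its compressor is native or pure Java.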
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228440#comment-15228440 ] Sylvain Lebresne commented on CASSANDRA-4338: [~blambov] Can you comment on Jonathan's question above?
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682261#comment-14682261 ] Jonathan Ellis commented on CASSANDRA-4338: Is this obsoleted by CASSANDRA-9500?
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796473#comment-13796473 ] Marcus Eriksson commented on CASSANDRA-4338: Reads should have exactly the same performance; nothing has been touched there. I want to do the same experiment for RAR/CRAR (RandomAccessReader/CompressedRandomAccessReader: reading into a direct BB and decompressing off-heap), and will do that soon, I hope.
(Assignee: Marcus Eriksson, Fix For: 2.1)
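The read side of that experiment starts with filling a direct buffer straight from the file channel, so the bytes never pass through a heap byte[]. A minimal sketch using only JDK calls; `readChunk` is a hypothetical helper, and the off-heap decompression step is left out because it depends on the compressor.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectReadSketch {
    // Reads up to `length` bytes starting at `position` into a new direct buffer.
    // A real reader would allocate one buffer per reader and reuse it.
    static ByteBuffer readChunk(FileChannel channel, long position, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(length);
        while (buffer.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            int n = channel.read(buffer, position + buffer.position());
            if (n < 0) break; // hit end of file before the chunk was complete
        }
        buffer.flip(); // make the bytes just read available to the consumer
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunk", ".db");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer chunk = readChunk(ch, 3, 4);
            byte[] out = new byte[chunk.remaining()];
            chunk.get(out);
            System.out.println(new String(out, StandardCharsets.US_ASCII)); // 3456
        } finally {
            Files.delete(file);
        }
    }
}
```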
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792686#comment-13792686 ] Ryan McGuire commented on CASSANDRA-4338: Hmm, reading from a single node may not have very high statistical significance: [data from second attempt|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.4338.CompressedSequentialWriter.single_node.2.json&metric=interval_op_rate&operation=stress-read&smoothing=4]
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792732#comment-13792732 ] Jonathan Ellis commented on CASSANDRA-4338: Maybe we need those stress improvements [~benedict] was working on.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792787#comment-13792787 ] Benedict commented on CASSANDRA-4338:
I haven't deliberately tested my patch on writes, but I wouldn't expect as dramatic an improvement in consistency once I/O enters the picture. It might well make some difference, though.

For the read run, I'm not sure what happened there on Marcus's branch. It looks to me like (maybe) some of the stress workers get ahead and finish first, leaving the cache less polluted for the remaining workers. An inconsistent worker count was the cause of persistent drops in performance in my read tests (but here it could explain peaks). If so, my patch will fix that, though you could also try running with a lower thread count to confirm. If you want to try my patch (which keeps the same thread count throughout), any of the linked repos in ticket [CASSANDRA-4718|https://issues.apache.org/jira/browse/CASSANDRA-4718] will do.

Btw, have we considered benchmarking these snappy changes for messaging service connections? They might well reduce the software side of the network overhead, although not as dramatically. I do see most of the connection CPU being used in snappy native arrayCopy.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791289#comment-13791289 ] Marcus Eriksson commented on CASSANDRA-4338: Ok, thanks. Doesn't look like a big difference, then. I kind of like that the big dips in performance (probably caused by GC) are basically gone, though.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791547#comment-13791547 ] Jonathan Ellis commented on CASSANDRA-4338: Ryan, can you also test on a single node? If the single-node improvements are still swamped by the network overhead... but if we can reduce that with some of the other efforts going on (CASSANDRA-1632, CASSANDRA-4718) then local performance will matter more. But if Ryan doesn't see much difference on a single node either then we should figure out what the environment difference is.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791548#comment-13791548 ] Jonathan Ellis commented on CASSANDRA-4338: I also did some quick looking for a ByteBuffer-capable Checksum implementation. hadoop-common has a NativeCrc32 (using the new Intel instructions, I think), but only for verifying checksums, not generating them. Adler32 gets {{update(ByteBuffer)}}... in JDK 8.
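For reference, this is what the JDK 8 `Adler32.update(ByteBuffer)` route looks like on a direct buffer, next to the pre-JDK 8 alternative of copying the bytes onto the heap first. The helper names are hypothetical; the two paths should always produce the same checksum.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class DirectChecksumSketch {
    // JDK 8+: checksum the remaining bytes of any buffer, direct or heap, without a copy.
    static long adler32(ByteBuffer buffer) {
        Adler32 checksum = new Adler32();
        checksum.update(buffer.duplicate()); // duplicate() leaves the caller's position untouched
        return checksum.getValue();
    }

    // Pre-JDK 8 route: copy the bytes onto the heap, then use update(byte[], int, int).
    static long adler32ViaHeapCopy(ByteBuffer buffer) {
        byte[] copy = new byte[buffer.remaining()];
        buffer.duplicate().get(copy);
        Adler32 checksum = new Adler32();
        checksum.update(copy, 0, copy.length);
        return checksum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        System.out.println(Long.toHexString(adler32(direct)));            // 11e60398
        System.out.println(adler32(direct) == adler32ViaHeapCopy(direct)); // true
    }
}
```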
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791713#comment-13791713 ] Ryan McGuire commented on CASSANDRA-4338: On a single node: !4338.single_node.write.png! The read was weird; I don't know what that spike is. I'm rerunning this to see if it happens again: !4338.single_node.read.png!
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790208#comment-13790208 ] Marcus Eriksson commented on CASSANDRA-4338: Yep, that looks very similar. Did you run it with -I SnappyCompressor? I've rebased and (force) pushed to https://github.com/krummas/cassandra/tree/marcuse/4338 to get the latency stuff in.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790383#comment-13790383 ] Ryan McGuire commented on CASSANDRA-4338:
{quote}
did you run it with -I SnappyCompressor ?
{quote}
No, I missed that option. I'll rerun with it, and also rerun to get latency metrics.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790573#comment-13790573 ] Ryan McGuire commented on CASSANDRA-4338: With SnappyCompressor: !4338.benchmark.snappycompressor.png!
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789713#comment-13789713 ] Ryan McGuire commented on CASSANDRA-4338: Trunk is working again, so I have a baseline now: !4338.benchmark.png! [data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.4338.CompressedSequentialWriter.json&metric=interval_op_rate&operation=stress-write&smoothing=3]
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789722#comment-13789722 ] Ryan McGuire commented on CASSANDRA-4338: And I notice that the marcuse/4338 line still doesn't have latency metrics. If you'd like me to re-run for those stats, I can; I just need to rebase off of CASSANDRA-6153 (or rewrite my tool to use a known-good cassandra-stress; right now it just takes the one from the same branch it's testing).
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788367#comment-13788367 ] Ryan McGuire commented on CASSANDRA-4338: I started to run a benchmark for this, but I found CASSANDRA-6153 and CASSANDRA-6154 standing in my way. [Here's the data|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.4338.CompressedSequentialWriter.json&metric=interval_op_rate&operation=stress-write&smoothing=4] for my test with [~krummas]'s patch, but it's missing any sort of baseline because of the bugs above.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783895#comment-13783895 ] Jonathan Ellis commented on CASSANDRA-4338: Promising!
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783883#comment-13783883 ] Marcus Eriksson commented on CASSANDRA-4338:
So, I got the CompressedSequentialWriter working; code is pushed to GitHub: https://github.com/krummas/cassandra/commits/marcuse/4338. It uses snappy's direct ByteBuffer support and a custom Adler32 I wrote that can checksum direct byte buffers (code here: https://github.com/krummas/adler32; it probably only builds on Linux, as I did not spend much time on it).

Micro benchmarks look great, with almost no GC at all in the patched version (the benchmark is left in main(...) in CompressedSequentialWriter.java):

h2. Trunk
!gc-trunk-me.png!

h2. Patched
!gc-with-patch-me.png!

Proper single-node stress benchmarks look good as well:

h2. Trunk
{noformat}
total,interval_op_rate,interval_key_rate,latency,95th,99.9th,elapsed_time
394141,39414,39414,0.0,0.0,0.0,10
1078321,68418,68418,0.0,0.0,0.0,20
1726219,64789,64789,0.0,0.0,0.0,30
2327295,60107,60107,0.0,0.0,0.0,40
2928533,60123,60123,0.0,0.0,0.0,50
3533878,60534,60534,0.0,0.0,0.0,60
3602168,6829,6829,0.0,0.0,0.0,70
3967820,36565,36565,0.0,0.0,0.0,80
4647217,67939,67939,0.0,0.0,0.0,91
5248142,60092,60092,0.0,0.0,0.0,101
5930662,68252,68252,0.0,0.0,0.0,111
6417903,48724,48724,0.0,0.0,0.0,121
6952933,53503,53503,0.0,0.0,0.0,131
7221662,26872,26872,0.0,0.0,0.0,141
7221662,0,0,0.0,0.0,0.0,151
7221662,0,0,0.0,0.0,0.0,161
7221662,0,0,0.0,0.0,0.0,172
7221662,0,0,0.0,0.0,0.0,182
7221662,0,0,0.0,0.0,0.0,192
7509240,28757,28757,0.0,0.0,0.0,202
7780984,27174,27174,0.0,0.0,0.0,212
7780984,0,0,0.0,0.0,0.0,222
7780984,0,0,0.0,0.0,0.0,232
7780984,0,0,0.0,0.0,0.0,242
8414140,63315,63315,0.0,0.0,0.0,252
8968246,55410,55410,0.0,0.0,0.0,263
9669857,70161,70161,0.0,0.0,0.0,273
10236467,56661,56661,0.0,0.0,0.0,283
10774593,53812,53812,0.0,0.0,0.0,293
10824657,5006,5006,0.0,0.0,0.0,303
11165174,34051,34051,0.0,0.0,0.0,313
11165174,0,0,0.0,0.0,0.0,323
11165174,0,0,0.0,0.0,0.0,333
11165174,0,0,0.0,0.0,0.0,343
11304248,13907,13907,0.0,0.0,0.0,354
11927380,62313,62313,0.0,0.0,0.0,364
12526960,59958,59958,0.0,0.0,0.0,374
13234647,70768,70768,0.0,0.0,0.0,384
13792652,55800,55800,0.0,0.0,0.0,394
14329718,53706,53706,0.0,0.0,0.0,404
14512350,18263,18263,0.0,0.0,0.0,414
14512929,57,57,0.0,0.0,0.0,424
14710476,19754,19754,0.0,0.0,0.0,434
14710476,0,0,0.0,0.0,0.0,445
14710476,0,0,0.0,0.0,0.0,455
15061043,35056,35056,0.0,0.0,0.0,465
15760509,69946,69946,0.0,0.0,0.0,475
16461318,70080,70080,0.0,0.0,0.0,485
17126749,66543,66543,0.0,0.0,0.0,495
17708154,58140,58140,0.0,0.0,0.0,505
18226801,51864,51864,0.0,0.0,0.0,515
18226801,0,0,0.0,0.0,0.0,526
18227225,42,42,0.0,0.0,0.0,536
18858228,63100,63100,0.0,0.0,0.0,546
19459047,60081,60081,0.0,0.0,0.0,556
19988583,52953,52953,0.0,0.0,0.0,566
2000,1141,1141,0.0,0.0,0.0,567
Averages from the middle 80% of values:
interval_op_rate : 34003
interval_key_rate : 34003
latency median: 0.0
latency 95th percentile : 0.0
latency 99.9th percentile : 0.0
Total operation time : 00:09:27
END
{noformat}

h2. Patched version
{noformat}
total,interval_op_rate,interval_key_rate,latency,95th,99.9th,elapsed_time
398380,39838,39838,0.0,0.0,0.0,10
1090332,69195,69195,0.0,0.0,0.0,20
1756859,66652,66652,0.0,0.0,0.0,30
2408330,65147,65147,0.0,0.0,0.0,40
3021314,61298,61298,0.0,0.0,0.0,50
3602221,58090,58090,0.0,0.0,0.0,60
3602221,0,0,0.0,0.0,0.0,70
4086404,48418,48418,0.0,0.0,0.0,80
4670997,58459,58459,0.0,0.0,0.0,91
5328657,65766,65766,0.0,0.0,0.0,101
5950535,62187,62187,0.0,0.0,0.0,111
6544475,59394,59394,0.0,0.0,0.0,121
7163644,61916,61916,0.0,0.0,0.0,131
7307634,14399,14399,0.0,0.0,0.0,141
7331684,2405,2405,0.0,0.0,0.0,151
7989707,65802,65802,0.0,0.0,0.0,161
8653302,66359,66359,0.0,0.0,0.0,172
9273188,61988,61988,0.0,0.0,0.0,182
9935986,66279,66279,0.0,0.0,0.0,192
10489010,55302,55302,0.0,0.0,0.0,202
10909996,42098,42098,0.0,0.0,0.0,212
10962871,5287,5287,0.0,0.0,0.0,222
11274293,31142,31142,0.0,0.0,0.0,232
11274293,0,0,0.0,0.0,0.0,242
11274293,0,0,0.0,0.0,0.0,252
11297105,2281,2281,0.0,0.0,0.0,263
11946842,64973,64973,0.0,0.0,0.0,273
12509283,56244,56244,0.0,0.0,0.0,283
13205933,69665,69665,0.0,0.0,0.0,293
13809534,60360,60360,0.0,0.0,0.0,303
14334735,52520,52520,0.0,0.0,0.0,313
14615255,28052,28052,0.0,0.0,0.0,323
14615958,70,70,0.0,0.0,0.0,333
14841997,22603,22603,0.0,0.0,0.0,343
14841997,0,0,0.0,0.0,0.0,354
14841997,0,0,0.0,0.0,0.0,364
15262968,42097,42097,0.0,0.0,0.0,374
15943731,68076,68076,0.0,0.0,0.0,384
16619205,67547,67547,0.0,0.0,0.0,394
17197417,57821,57821,0.0,0.0,0.0,404
17776353,57893,57893,0.0,0.0,0.0,414
18235461,45910,45910,0.0,0.0,0.0,424
18267460,3199,3199,0.0,0.0,0.0,434
18592152,32469,32469,0.0,0.0,0.0,445
18732480,14032,14032,0.0,0.0,0.0,455
19328150,59567,59567,0.0,0.0,0.0,465
19930114,60196,60196,0.0,0.0,0.0,475
2000,6988,6988,0.0,0.0,0.0,479
Averages
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778871#comment-13778871 ] Marcus Eriksson commented on CASSANDRA-4338:
So, using a direct ByteBuffer in SequentialWriter generates a lot less garbage in my micro benchmarks (will post patch and graphs later), mostly by not having to copy the incoming byte array, and instead just pushing the data to a direct BB. It is also a bit faster (~5%), maybe just because of less GC.

Making it work with CompressedSequentialWriter is not as easy, since we would then need to either use a standard byte[] buffer and compress that before pushing it off-heap/to disk, or copy to the heap, compress, and then push it back. Neither would be any improvement.

But then I found out that snappy can compress a direct byte buffer without copying anything to the heap: https://github.com/xerial/snappy-java/blob/develop/src/main/java/org/xerial/snappy/Snappy.java#L126

The problem is that LZ4 does not support that (yet?): https://github.com/jpountz/lz4-java/issues/9

Hadoop seems to ship its own native code to solve this problem: https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/
Also related: https://issues.apache.org/jira/browse/HADOOP-8148

I will experiment with making this work with snappy and see how much we can gain by doing it.
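The uncompressed case can be sketched as a writer that stages bytes in one direct buffer, allocated once, and flushes it to a FileChannel. In OpenJDK, writing a heap buffer to a channel is staged through a temporary direct buffer, so flushing a direct buffer skips that extra copy. `DirectBufferedWriter` is a hypothetical, heavily simplified class, not Cassandra's SequentialWriter, and it leaves out compression, checksums and fsync.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical simplified writer; not Cassandra's SequentialWriter.
public class DirectBufferedWriter implements Closeable {
    private final FileChannel channel;
    private final ByteBuffer buffer; // off-heap staging area, allocated once per writer

    public DirectBufferedWriter(Path path, int bufferSize) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        this.buffer = ByteBuffer.allocateDirect(bufferSize);
    }

    public void write(byte[] src, int offset, int length) throws IOException {
        while (length > 0) {
            if (!buffer.hasRemaining()) flush();
            int n = Math.min(length, buffer.remaining());
            buffer.put(src, offset, n); // caller's byte[] copied straight into native memory
            offset += n;
            length -= n;
        }
    }

    public void flush() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) channel.write(buffer); // direct buffer: no temporary copy
        buffer.clear();
    }

    @Override
    public void close() throws IOException {
        try {
            flush();
        } finally {
            channel.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("seq", ".db");
        byte[] data = "hello, direct buffer".getBytes(StandardCharsets.US_ASCII);
        try (DirectBufferedWriter w = new DirectBufferedWriter(file, 4)) {
            w.write(data, 0, data.length);
        }
        System.out.println(Files.size(file)); // 20
        Files.delete(file);
    }
}
```

The buffer is deliberately small in `main` so that several flushes happen; a real writer would use something on the order of a compression chunk.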
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778902#comment-13778902 ] Jonathan Ellis commented on CASSANDRA-4338: /throws up the [~jpountz] signal
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779098#comment-13779098 ] Adrien Grand commented on CASSANDRA-4338: Interesting, I was wondering whether people actually need to compress from/to byte buffers. Now that I know that some do, I can try to move this issue forward.
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774798#comment-13774798 ] Jonathan Ellis commented on CASSANDRA-4338: Also relevant: Radim said he got a large improvement from mmap-based writes in CASSANDRA-5473.
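For comparison with the buffered-write approach, an mmap-based write in plain Java looks roughly like this. It is only an illustration of the technique with JDK APIs, not the CASSANDRA-5473 code; `writeMapped` is a hypothetical helper and the file is sized explicitly rather than relying on `map()` to extend it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MmapWriteSketch {
    // Writes data to path through a memory mapping instead of FileChannel.write().
    static void writeMapped(Path path, byte[] data) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            // Size the file explicitly (this also truncates any longer old content).
            file.setLength(data.length);
            FileChannel channel = file.getChannel();
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            map.put(data); // plain memory stores into the page cache, no write() syscall per chunk
            map.force();   // ask the OS to write the dirty pages to the device now
        } // closing the file does not invalidate the mapping; it lives until the buffer is GC'd
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap", ".db");
        file.toFile().deleteOnExit();
        writeMapped(file, "mapped write".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII)); // mapped write
    }
}
```

One trade-off worth keeping in mind: the mapping is only released when the MappedByteBuffer is collected, which brings back the same kind of GC-managed native resource discussed elsewhere in this ticket.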
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629663#comment-13629663 ] Jonathan Ellis commented on CASSANDRA-4338: Relevant: http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html
(Assignee: Aleksey Yeschenko)
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439010#comment-13439010 ] Jonathan Ellis commented on CASSANDRA-4338:

Any difference in CPU usage with the direct buffer patch? If we're not maxing out the CPU, then it won't necessarily run faster even if it's more efficient.

> Experiment with direct buffer in SequentialWriter
>
> Key: CASSANDRA-4338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4338
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Yuki Morishita
> Priority: Minor
> Fix For: 1.2.0
>
> Attachments: 4338-gc.tar.gz, gc-4338-patched.png, gc-trunk.png
>
> Using a direct buffer instead of a heap-based byte[] should let us avoid a
> copy into native memory when we flush the buffer.
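One way to check whether a write path is CPU-bound is to compare a thread's CPU time with its wall-clock time. The sketch below is illustrative only: `CpuVsWall` is a hypothetical helper, and it measures just the calling thread, so work done on other threads (or by the kernel on behalf of I/O) is not included.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical measurement helper: runs a task on the current thread and
// reports its CPU time next to its wall-clock time.
final class CpuVsWall {
    static final class Sample {
        final long cpuNanos;
        final long wallNanos;

        Sample(long cpuNanos, long wallNanos) {
            this.cpuNanos = cpuNanos;
            this.wallNanos = wallNanos;
        }

        // Fraction of wall-clock time this thread spent on the CPU; values well
        // below 1.0 mean the task was mostly waiting, e.g. on I/O.
        double cpuFraction() {
            return wallNanos == 0 ? 0.0 : (double) cpuNanos / wallNanos;
        }
    }

    static Sample measure(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long cpuStart = mx.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = mx.getCurrentThreadCpuTime() - cpuStart;
        return new Sample(cpu, wall);
    }
}
```

If the fraction is low both with and without the patch, a lower CPU cost per flush would show up as reduced CPU usage rather than as higher throughput.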
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401848#comment-13401848 ] Jonathan Ellis commented on CASSANDRA-4338:

I'd vote for:
- test with LCS, with and without compression (maybe even reduce the sstable size to 1MB to really stress sstable creation)
- enable GC logging and count promotion failures, so we have quantitative data (if we see zero both ways, we may need a more complex test)
- if instead we see nonzero promotion failures both ways, at about the same rate, we might need to look at using our cleaner hack to free the direct buffers, or use a buffer based on FreeableMemory, to avoid the PhantomReference crap that DirectBuffer normally inflicts on GC

> Experiment with direct buffer in SequentialWriter
>
> Key: CASSANDRA-4338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4338
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Yuki Morishita
> Priority: Minor
> Fix For: 1.2
>
> Attachments: gc-4338-patched.png, gc-trunk.png
>
> Using a direct buffer instead of a heap-based byte[] should let us avoid a
> copy into native memory when we flush the buffer.
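For the "count promotion failures" step, a small log scanner is enough once GC logging is on (on HotSpot JVMs of that era, e.g. `-XX:+PrintGCDetails -Xloggc:<file>`). This sketch assumes the log marks each event with the text "promotion failed", as ParNew/CMS logs typically do; the marker should be checked against the actual JVM and collector. `PromotionFailureCounter` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: counts promotion-failure markers in a GC log file so
// trunk and patched runs can be compared with numbers instead of graphs.
final class PromotionFailureCounter {
    // Assumed marker text; verify against the log format of the JVM in use.
    static final String MARKER = "promotion failed";

    static long count(Path gcLog) throws IOException {
        try (Stream<String> lines = Files.lines(gcLog)) {
            return lines.mapToLong(PromotionFailureCounter::occurrences).sum();
        }
    }

    // Counts non-overlapping occurrences of MARKER in one log line.
    private static long occurrences(String line) {
        long n = 0;
        for (int i = line.indexOf(MARKER); i >= 0; i = line.indexOf(MARKER, i + MARKER.length())) {
            n++;
        }
        return n;
    }
}
```

Running the same LCS workload against trunk and the patched build, then comparing the two counts, gives the quantitative comparison the comment asks for.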
[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter
[ https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295771#comment-13295771 ] Jonathan Ellis commented on CASSANDRA-4338:

Using direct buffers for RAR and CRAR (RandomAccessReader and CompressedRandomAccessReader) may also help avoid heap fragmentation.

> Experiment with direct buffer in SequentialWriter
>
> Key: CASSANDRA-4338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4338
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Yuki Morishita
> Priority: Minor
> Fix For: 1.2
>
> Using a direct buffer instead of a heap-based byte[] should let us avoid a
> copy into native memory when we flush the buffer.
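On the read side, the same idea means each reader owns one direct buffer and refills it with positional reads, so no large `byte[]` is allocated on the Java heap per reader. This is a minimal sketch under that assumption, not Cassandra's RandomAccessReader; `DirectBufferedReader` and `readAt` are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical reader: one off-heap buffer per reader, refilled on demand.
final class DirectBufferedReader implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer;

    DirectBufferedReader(Path path, int bufferSize) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
        this.buffer = ByteBuffer.allocateDirect(bufferSize); // allocated once, reused
    }

    // Fills the buffer with bytes starting at the given file offset and returns
    // it ready for reading; the contents are only valid until the next call.
    ByteBuffer readAt(long position) throws IOException {
        buffer.clear();
        long pos = position;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, pos); // positional read; channel position untouched
            if (n < 0) break;                  // end of file
            pos += n;
        }
        buffer.flip();
        return buffer;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Because the buffer lives off-heap and is reused, it does not add large short-lived arrays to the heap, which is the fragmentation concern raised above.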