[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945607#comment-13945607
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


bq. In practice, moving the WILLNEED into the getSegment() call is dangerous as 
the segment is used past the initial 64Kb, and if we rely on ourselves only for 
read-ahead this could result in very substandard performance for larger rows. 
We also probably want to only WILLNEED the actual size of the buffer we expect 
to read for compressed files.

Yes, this is only a PoC to see whether the scheme works for platters. Just a
couple of things: for optimal performance we need information from the index
about the size of the row, so that we can mark as SEQUENTIAL a) the whole row,
if the row is smaller than the indexing threshold, or b) portions of the row on
the index boundaries. The original one-page WILLNEED (very conservative) is
there to make sure that the read can quickly grab the first portion of the
buffer while the extended read-ahead prefetches everything else. This still
works for big rows, because we are forced to read the header of the row first
(the key at least), and then we seek() to the position indicated by the column
index, where we want to hint that we are going to read that portion of the row.
Large rows therefore suffer more from the fact that we have to over-buffer than
from WILLNEED. I wish we had a useful mmap'ed buffer implementation, so that
madvise calls as frequent as our current fadvise calls would no longer be
required...
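
The two cases above, (a) a row smaller than the indexing threshold and (b) a
larger row hinted in portions, can be sketched roughly as follows. This is only
an illustration in Python, not code from the patch: `sequential_ranges` and
`index_threshold` are hypothetical names, the 64KB threshold is an assumed
value, and real column index blocks end on column boundaries rather than at
fixed byte offsets.

```python
def sequential_ranges(row_start, row_size, index_threshold=64 * 1024):
    """Return (offset, length) ranges to mark SEQUENTIAL for one row.

    Hypothetical helper: index blocks are assumed to be fixed-size, which is
    a simplification of how a column index records its boundaries.
    """
    # Case (a): the row is smaller than the threshold, hint it as a whole.
    if row_size < index_threshold:
        return [(row_start, row_size)]

    # Case (b): hint the row in portions, one per assumed index block.
    ranges = []
    offset = row_start
    end = row_start + row_size
    while offset < end:
        length = min(index_threshold, end - offset)
        ranges.append((offset, length))
        offset += length
    return ranges
```

A reader would then apply the SEQUENTIAL hint only to the range that contains
the position it is about to seek() to, instead of to the whole row.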

There is a way to solve the cold-cache problem for the parts of the data from
the original SSTables that have been read before; I did some work with
mincore() previously and can revisit it if needed. The problem we are trying to
solve by dropping the cache for memtable and compacted SSTables (on
memory-restricted and/or slow-I/O systems) is that keeping the page cache for
the old files creates more jitter and slows down warmup of the newly created
SSTable.

 

 Reads have a slow ramp up in speed
 --

 Key: CASSANDRA-6746
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ryan McGuire
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 
 6746-patched.png, 6746.blockdev_setra.full.png, 
 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz, 
 6746.buffered_io_tweaks.write-flush-compact-mixed.png, 
 6746.buffered_io_tweaks.write-read-flush-compact.png, 6746.txt, 
 buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, 
 cassandra-2.1-bdplab-trial-fincore.tar.bz2


 On a physical four-node cluster I am doing a big write and then a big read. 
 The read takes a long time to ramp up to respectable speeds.
 !2.1_vs_2.0_read.png!
 [See data 
 here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1]



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-24 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945879#comment-13945879
 ] 

Benedict commented on CASSANDRA-6746:
-

I assumed that the idea is to turn off read-ahead (using FADV_RANDOM) and to 
always manage it ourselves. Which may well be an ok idea - but will require 
some other changes. I must admit I don't quite follow all of what you're 
saying: since we always buffer (except for mmapped files) 64K, WILLNEEDing the 
entire block seems perfectly acceptable since we have to read it anyway (it 
doesn't matter where the column occurs, unless it happens to be in a different 
chunk).

I must admit I remain a little concerned in general about the WILLNEED flag 
and its implications for the treatment of the page after its first use. Without 
clearer definitions of the semantics, I would feel more comfortable dropping 
all uses of WILLNEED from the codebase, since we can probably do without them 
after CASSANDRA-6916. But I don't feel strongly about this.

bq. is keeping page cache for the old files creates more jitter and slows down 
warmup of the newly created SSTable.

CASSANDRA-6916 should solve this problem.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945950#comment-13945950
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


The idea is pretty simple: we know SSTable access is generally random while 
locating the data, but once the data is located it is all sequential. So what 
the patch tries to do is exploit both properties. It sets the whole file to 
FADV_RANDOM (which doesn't disable read-ahead but limits it substantially); 
after that, for every getSegment(position) that lands on a key or column index 
boundary, it marks the first 64KB as FADV_SEQUENTIAL to extend read-ahead for 
that portion of the file, and FADV_WILLNEEDs only the first page, because 
everything else is going to be prefetched by the extended read-ahead window 
for that 64KB block, so we don't have to be very aggressive. Ideally we would 
FADV_SEQUENTIAL the whole row, but we can't really do that; even better would 
be to do it on mmap'ed files, as it is only required once per row or column 
block.
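
A rough sketch of the hinting sequence described above, written in Python
rather than taken from the patch. `advice_plan` and `apply_plan` are
hypothetical helpers, the 4096-byte page size is an assumption, and `positions`
stands in for the offsets passed to getSegment(). How a given kernel honours
per-range hints is platform-specific, so this shows the call pattern only, not
the resulting I/O behaviour.

```python
import os

PAGE_SIZE = 4096          # assumed page size
SEGMENT_SIZE = 64 * 1024  # the 64KB block mentioned in the comment


def advice_plan(file_size, positions):
    """Return (advice_name, offset, length) tuples for the described scheme."""
    # Whole file is treated as random access first.
    plan = [("RANDOM", 0, file_size)]
    for pos in positions:
        if not 0 <= pos < file_size:
            continue  # ignore offsets outside the file
        length = min(SEGMENT_SIZE, file_size - pos)
        # Extend read-ahead for the block that is about to be read...
        plan.append(("SEQUENTIAL", pos, length))
        # ...and ask for only the first page up front.
        plan.append(("WILLNEED", pos, min(PAGE_SIZE, length)))
    return plan


def apply_plan(fd, plan):
    """Issue the hints if os.posix_fadvise exists; return the calls made."""
    if not hasattr(os, "posix_fadvise"):
        return 0  # platform without posix_fadvise: nothing to do
    for name, offset, length in plan:
        advice = getattr(os, "POSIX_FADV_" + name)
        os.posix_fadvise(fd, offset, length, advice)
    return len(plan)
```

Keeping the plan separate from the system calls makes it easy to inspect which
ranges would be hinted before trying the scheme on a real SSTable file.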


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946062#comment-13946062
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


[~enigmacurry] I almost forgot: can you please run one more test in both modes 
(read after write, and mixed) with the buffered-io patch plus 
preheat_kernel_page_cache set to true in the yaml?


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-24 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946072#comment-13946072
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

Sure. Do you still want the flush+compact after the write?


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946115#comment-13946115
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


[~enigmacurry]  No, let's not do that this time, thanks!


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-24 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946183#comment-13946183
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

[~xedin] Here are the results for preheat_kernel_page_cache: true

[mixed 
read/write|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.write-mixed.preheat_kernel_page_cache.json&metric=op_rate&operation=mixed&smoothing=1&xmin=0&xmax=573.1&ymin=0&ymax=97223.5]

[solo 
read|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.write-read.preheat_kernel_page_cache.json&metric=op_rate&operation=read&smoothing=1&xmin=0&xmax=565.62&ymin=0&ymax=79165.9]


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-23 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944580#comment-13944580
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


[~enigmacurry] Yes, it would not eliminate it completely, just shorten the 
duration and speed up the initial warmup, but this drop in operation rate is 
worrisome. Can you check whether something JVM-related, or something on the 
Cassandra side, is happening at the same time as the drop in op rate?


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-23 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944654#comment-13944654
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

I ran a mixed read / write workload on a number of branches.

[You can see the results 
here|http://localhost:8000/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.json]

That chart is a bit messy, so you need to click the colored squares to only see 
results for a few branches at a time. 

The branches tested:
 * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD
 * cassandra-2.1 HEAD
 * cassandra-2.0 HEAD with JNA
 * cassandra-2.1 HEAD without JNA

Similar to the buffered-io-tweaks run I did for solo-reads, it looks to improve 
things here as well. However, even in mixed workloads, simply disabling JNA is 
still working better. I cannot currently test cassandra-2.1 without JNA because 
of CASSANDRA-6575 which I have just now reopened.
 


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-23 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944690#comment-13944690
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


[~enigmacurry] Thanks for the results, this looks promising, although one 
question remains for me: why is there that dip for the buffered-io patch? It 
might be related to the last compaction combining 4 sstables into one... Can 
you please do the following experiment: write the data, force a flush + major 
compaction, and once all compactions complete run the test with the 
buffered-io-tweaks patch, to see whether that dip in the middle of the run is 
actually caused by compaction replacing the pre-heated file set with a 
completely cold file?


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-23 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944760#comment-13944760
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

@benedict, can you confirm whether my stress options look all right for mixed 
mode, or do you have a better suggestion?


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-20 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942588#comment-13942588
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

Unfortunately, I currently don't have access to any bare metal machines with 
SSDs. 


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-20 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942596#comment-13942596
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


[~enigmacurry] Thanks for the update. Personally I can wait for some time with 
this; meanwhile I have a couple of ideas on how to improve the situation, so 
maybe I will submit a patch to test right when you have the machines :)


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938223#comment-13938223
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


[~enigmacurry] Are the drives SSDs?


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-17 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938233#comment-13938233
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

No:

{code}
 hdparm -I /dev/sdb | head

/dev/sdb:

ATA device, with non-removable media
Model Number:   WDC WD5003ABYX-18WERA0  
Serial Number:  WD-WMAYP2797667
Firmware Revision:  01.01S02
Transport:  Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
Standards:
Supported: 8 7 6 5 
{code}


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938239#comment-13938239
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


I guess I should have been clearer about this one: we actually want readahead 
to be set to a bigger number or left at the default. That would speed up the 
sequential part of the read significantly, because the kernel's adaptive 
readahead logic would kick in, detect that the file is being read 
sequentially, and do the right thing. 

The test I asked you to do should have uncovered the effect of readahead on 
our reads, which is pretty significant but quite the opposite on HDD vs. SSD; 
still, this gives us a pretty good picture.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-17 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938244#comment-13938244
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

The default on these machines was 256, and indeed the 4096 setting I tried did 
speed it up.
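
For a sense of scale, the two settings can be converted to sizes. This is a
small Python sketch that assumes the numbers above are `blockdev --setra`
values, which are expressed in 512-byte sectors; the attachment names suggest
blockdev was used, but the thread does not state the unit. `readahead_kib` is
a hypothetical helper name.

```python
SECTOR_BYTES = 512  # assumed unit of the readahead setting


def readahead_kib(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a readahead setting given in sectors to KiB."""
    return sectors * sector_bytes // 1024


default_kib = readahead_kib(256)   # 128 KiB
tuned_kib = readahead_kib(4096)    # 2048 KiB, i.e. 2 MiB
```

Under that assumption the default window covers two of the 64KB buffers
discussed earlier in the thread, while the 4096 setting covers thirty-two.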


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-17 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938246#comment-13938246
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

I can also try this on ssd instances.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938253#comment-13938253
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


Would be awesome if you could try it out on the SSD machines too!


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-16 Thread Edward Capriolo (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937024#comment-13937024
]

Edward Capriolo commented on CASSANDRA-6746:

{quote}
FWIW, I'd be okay with dropping mmap mode entirely since compression has been
the default for almost two years now.
{quote}
Are you implying that because compression is the default no one uses
uncompressed tables any more? If so, I disagree. In cases with small rows (1-10
columns), compression can hurt your performance. I am assuming that
deserializing compressed 4 KB blocks when rows are small creates much more
young-gen garbage and ends up being a bottleneck. Several operators have told
me they do not use compression on high-read column families.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-15 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936159#comment-13936159
 ] 

Benedict commented on CASSANDRA-6746:
-

Probably a good idea since we use standard IO for compressed files, which is 
the default. We should probably do more performance testing with this setup 
anyway; stress defaults to compression off, like it always used to.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-15 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936184#comment-13936184
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

FWIW, I'd be okay with dropping mmap mode entirely since compression has been 
the default for almost two years now.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-14 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935429#comment-13935429
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

I think compaction and new flushes are distinct scenarios.

# On compaction, the current default of DONTNEED most data, but WILLNEED 
partitions that are hot in the key cache, seems like the Right Thing To Do for 
datasets that are larger than memory (i.e., almost all production datasets)
# On flush, we default to DONTNEED, which is probably equivalent to no advice 
for larger-than-memory datasets but harms us unnecessarily on smaller datasets

So while I'd be okay with changing the default on flush, I'm not okay with 
changing it for compaction.  Right now those are not distinguished by 
SSTableWriter so it would be a bit more work.  But, that wouldn't be sufficient 
to make our test here look good because nothing will be hot yet when compaction 
first kicks in.

So at the end of the day, we need CCM to explicitly request non-default 
behavior either way.
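
The two defaults described above can be sketched in Java roughly as follows. This is only an illustration of the decision being discussed, not Cassandra's SSTableWriter code: the class, enum, and method names are hypothetical, and the key cache is stood in for by a plain set of partition keys.

```java
import java.util.Set;

/** Hypothetical sketch of the page-cache advice defaults described in the comment. */
final class CacheAdvicePolicy {
    enum WriteKind { FLUSH, COMPACTION }
    enum Advice { WILLNEED, DONTNEED }

    /**
     * Returns the advice for data written for one partition.
     * hotKeys stands in for "partitions that are hot in the key cache" (an assumption).
     */
    static Advice adviceFor(WriteKind kind, String partitionKey, Set<String> hotKeys) {
        if (kind == WriteKind.COMPACTION) {
            // Compaction: keep hot partitions cached, drop everything else.
            return hotKeys.contains(partitionKey) ? Advice.WILLNEED : Advice.DONTNEED;
        }
        // Flush: the current default drops the newly written data.
        return Advice.DONTNEED;
    }
}
```

Changing only the flush default would mean changing only the last branch, which is why the writer would first need to know which kind of write it is performing.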


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935792#comment-13935792
 ] 

Benedict commented on CASSANDRA-6746:
-

I will do some empirical testing so we have some data to work with. It seems to 
me that trickle flushing would still be better than this, although we could 
still DONTNEED after trickle sync for compaction. WILLNEEDing a large file 
_after flush_ is potentially even worse behaviour, though, as if the DONTNEED 
has been obeyed (or they've fallen out of cache due to not being read during 
flush - which is probably likely during a large flush) we're just proactively 
inducing a period of high intensity random seeks for data that would naturally 
be read in anyway if they are needed, and otherwise would not.

That said, it might be easier to just pick an approach (the one you suggest is 
certainly better than what we currently do), and then deliver iterative 
replacement, as it solves all of the above problems.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-14 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935801#comment-13935801
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

Let me know if you have any patches you want me to benchmark on the cluster. 
It's fairly push-button for me.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-14 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935844#comment-13935844
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


[~iamaleksey] asked me to look at this ticket, so here is what I have: this 
very much looks like readahead is killing performance on random access with an 
empty page cache. Unfortunately, we don't set POSIX_FADV_RANDOM, as there is no 
way to do it from Java with mmap'ed files... So [~enigmacurry], what you can do 
to check whether it's actually the reason: get the current readahead window 
with blockdev --getra <device> to see how big it currently is (it should be 
something like 256, which is 128KB), then set it to something lower or disable 
it entirely with blockdev --setra <readahead-size-in-512-byte-blocks> <device>. 
Note (!) that both commands work in multiples of 512-byte blocks. After 
that is done, try running your test again to see if the read performance 
increases more rapidly. The situation you are triggering is kind of an edge 
case, caused by us trying to be smart about discarding things that we 
no longer need in memory. So when writes are done in isolation, it's natural 
that reads are going to suffer the consequences, but on the bright side, by 
DONTNEED'ing all of the compacted sstables and newly flushed ones we create 
more room for dirty data, which smooths the fsync trickle effect.
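
As a quick check of the units mentioned above: blockdev reports and accepts readahead in 512-byte blocks, so a value of 256 corresponds to 128KB. A minimal Java sketch of that conversion, where the class and method names are hypothetical and not part of Cassandra:

```java
/** Hypothetical helper for converting blockdev readahead values. */
final class ReadAheadUnits {
    // blockdev --getra / --setra count in 512-byte blocks.
    static final int BLOCK_BYTES = 512;

    /** Readahead window in KB for a value reported by blockdev --getra. */
    static long blocksToKb(long blocks) {
        return blocks * BLOCK_BYTES / 1024;
    }

    /** Value to pass to blockdev --setra for a desired window in KB. */
    static long kbToBlocks(long kb) {
        return kb * 1024 / BLOCK_BYTES;
    }
}
```

For example, shrinking the window to 16KB would mean passing 32 to --setra, and 0 disables readahead.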


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935903#comment-13935903
 ] 

Benedict commented on CASSANDRA-6746:
-

[~xedin] judging by how severe the slow down is you're probably correct, 
however that only changes the shape of the slope (should make it a considerably 
longer slow period, but much less slow), it doesn't eliminate the effect. We 
should be managing our cache behaviour better either way.

bq. with DONTNEED'ing all of the compacted sstables and newly flushed ones we 
create more room for dirty data which smoothens fsync trickle effect.

Not sure exactly what you're referring to here; fsync trickle is off by 
default, and dirty data will always evict clean data. IF we trickle fsync, 
DONTNEED may well help to reduce cache churn by causing us to recycle the same 
approximate allotment of memory for flushing, which is what I'm suggesting - 
but we don't currently, so all DONTNEED achieves is trashing the buffer as we 
approach fully flushed, it doesn't make room for flushing.

I realised I actually have fincore data Ryan collected for me from these runs 
that demonstrates large files being flushed with DONTNEED still retain a 
majority of their data in page cache during the flush, presumably for exactly 
the reason given of racing-ahead writes. Then lose it all completely:

{code}
File                                             Size         Cached   Perc
Keyspace1-Standard1-tmp-ka-109-Data.db    391,446,528    230,858,752  58.98
Keyspace1-Standard1-tmp-ka-109-Data.db  1,062,600,704    470,552,576  44.28
Keyspace1-Standard1-tmp-ka-109-Data.db  1,467,744,256    681,713,664  46.45
Keyspace1-Standard1-tmp-ka-109-Data.db  2,068,840,448    860,418,048  41.59
Keyspace1-Standard1-tmp-ka-109-Data.db  2,402,760,276  1,026,052,096  42.70
Keyspace1-Standard1-ka-109-Data.db      2,402,760,276    113,348,608   4.72
{code}


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935904#comment-13935904
 ] 

Benedict commented on CASSANDRA-6746:
-

Also, on POSIX_FADV_RANDOM: I experimented with setting this using the CLibrary 
calls we use for DONTNEED etc., and, interestingly, found that performance 
actually degraded on my machine for a stress read workload.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-14 Thread Pavel Yaskevich (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935922#comment-13935922
]

Pavel Yaskevich commented on CASSANDRA-6746:

bq. judging by how severe the slow down is you're probably correct, however
that only changes the shape of the slope (should make it a considerably longer
slow period, but much less slow), it doesn't eliminate the effect. We should be
managing our cache behaviour better either way.

Well, I wasn't implying that changing read-ahead on the blockdev would
eliminate the effect described here, only that it would smooth the effect of
the cold page cache. What I asked Ryan to do is just to check whether that's
the case, since it pretty much looks like it is.

bq. Not sure exactly what you're referring to here; fsync trickle is off by
default, and dirty data will always evict clean data. IF we trickle fsync,
DONTNEED may well help to reduce cache churn by causing us to recycle the same
approximate allotment of memory for flushing, which is what I'm suggesting -
but we don't currently, so all DONTNEED achieves is trashing the buffer as we
approach fully flushed, it doesn't make room for flushing.

What I mean here: with fsync trickle enabled, DONTNEED reduces the amount of
housekeeping the kernel has to do for the page cache, especially in a
write-heavy use case like this test.

bq. Also, on POSIX_FADV_RANDOM: I experimented with setting this using
CLibrary we use for DONTNEED etc, and found performance actually degraded on my
machine for a stress read workload interestingly.

I'm not sure how you are doing that; can you elaborate? It's only possible to
set POSIX_FADV_RANDOM when disk_access_mode is standard in the yaml. By default
we use mmap, which requires us to do madvise for RANDOM instead of
fadvise, so your slowdown is kind of expected there.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935923#comment-13935923
 ] 

Benedict commented on CASSANDRA-6746:
-

bq. I'm not sure how are you doing that, can you elaborate? it's only possible 
to set POSIX_FADV_RANDOM when disk_access_mode is standard in yaml, by 
default we use mmap which requires us to do madvise for RANDOM instead of 
fadvise, so your slowdown is kind of expected there.

Good point, this is exactly what I was doing.

Sounds like we're agreeing with the other stuff already :-)


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-14 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935926#comment-13935926
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:


So the better test would be to switch to standard and do POSIX_FADV_RANDOM on 
the actual file descriptor used for reading data when the SSTableReader is 
opened, and check if that makes any difference for you vs. disk_access_mode 
standard without POSIX_FADV_RANDOM.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-13 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933089#comment-13933089
 ] 

Benedict commented on CASSANDRA-6746:
-

bq. I get that, but the reason this was introduced in the first place was 
because the default behavior of (at least some) Linux kernels was to evict 
other data in favor of the newly flushed.

Do you have a reference for that discussion? I couldn't find it searching JIRA.

Whilst my research can't rule this out as a possibility, it seems as though it 
would be unlikely. The age of the recently written data would be low, certainly 
lower than any hot data, so that once it is actually synced to disk it is 
likely to be in the inactive_clean list and free for reclaim.

It's possible that the non-trickle-fsync default interplays badly with this, 
with us permitting the entire sstable to hit the page cache and evict 
everything else whilst the OS catches up. But without that scenario I would be 
really surprised to see this behaviour of keeping written once pages over hotly 
read data.

Either way, in the scenario that we are compacting hot data (probably more 
likely, since amount of compaction performed to data should decline with age, 
so we'll be mostly compacting younger data) the current behaviour is the worst 
possible scenario, with the apparently still going strong 2.6 (and possibly 
later) kernels definitely trashing the hot cache. So I think unless we detect 
the kernel version and set the default based on the known better behaviour of 
DONTNEED, it seems this is the better default to me. But we could perhaps 
change the defaults for trickle fsync as well (say, set it to true and 100MB by 
default) so that the OS has plenty of opportunity to reclaim the pages we're 
writing if it needs to.
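
To make the trickle fsync idea concrete, here is a minimal Java sketch of a writer that forces data to disk every time a fixed number of bytes has been written, so that dirty pages never accumulate beyond that interval. It is an illustration only, not Cassandra's writer: the class and method names are hypothetical, it writes zeros instead of real sstable data, and it does not issue any DONTNEED advice (which would need a native call).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of trickle fsync: sync after every intervalBytes written. */
final class TrickleFsyncSketch {
    /**
     * Writes totalBytes of zeros in chunkBytes pieces and calls force() whenever
     * at least intervalBytes have been written since the last sync.
     * Returns the number of force() calls made, including a final one for any tail.
     */
    static int writeWithTrickleSync(Path path, long totalBytes, int chunkBytes,
                                    long intervalBytes) throws IOException {
        int syncs = 0;
        long sinceSync = 0;
        long written = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer chunk = ByteBuffer.allocate(chunkBytes);
            while (written < totalBytes) {
                chunk.clear();
                chunk.limit((int) Math.min(chunkBytes, totalBytes - written));
                while (chunk.hasRemaining()) {
                    int n = ch.write(chunk);
                    written += n;
                    sinceSync += n;
                }
                if (sinceSync >= intervalBytes) {
                    ch.force(false); // flush file data (not metadata) to the device
                    syncs++;
                    sinceSync = 0;
                }
            }
            if (sinceSync > 0) {
                ch.force(false); // sync whatever is left after the last interval
                syncs++;
            }
        }
        return syncs;
    }
}
```

With the 100MB interval suggested above, a multi-gigabyte flush would sync a few dozen times rather than leaving the whole file dirty in the page cache until the end.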


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-13 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933539#comment-13933539
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

CASSANDRA-2635


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-13 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933665#comment-13933665
 ] 

Benedict commented on CASSANDRA-6746:
-

It seems like all of the discussion on the prior ticket CASSANDRA-1470 that 
introduced it circles around how to implement it, and not on whether 
implementing it is actually necessary. It seems to be taken as read that it is, 
but I'm not totally convinced by that.

Either way, probably the best long term solution that would definitely work is 
to perform the incremental replacement I previously suggested, as this would 
allow us to DONTNEED the old sstables incrementally, thereby saving at minimum 
as much memory churn as we can save optimally with this approach, and then 
leave the new pages to the OS to decide what to do with. If they're not hot the 
newly freed memory from the old tables should give plenty enough room for the 
regular ageing algorithm to kick in and ensure they're selected for eviction in 
preference to anything that is in use; it also bounds how much of the system 
memory can churn, which is currently unbounded (although large tricklefsync 
would achieve this also).


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-13 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933805#comment-13933805
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

The 2635 issue description is from a production cluster.  (We've applied this 
patch locally in order to turn off page skipping...  It's better than completely 
disabling DONTNEED because the cache skipping does make sense and has no 
relevant (that I can see) detrimental effects in some cases, like when dumping 
caches.)

This replaced an approach of DONTNEEDing the old sstables, which would crater 
read requests since old sstables will still be in use during compaction before 
the new ones are completely live.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-13 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933823#comment-13933823
 ] 

Benedict commented on CASSANDRA-6746:
-

But that's the opposite issue, surely? They found disabling DONTNEED was good - 
for exactly the same reason, that their OS was obeying it unequivocally - not 
that _enabling_ it (for the opposite situation, flushing) was beneficial.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-12 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932078#comment-13932078
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

I'm not sure that's a good idea.  Certainly for larger-than-memory datasets, I 
don't think we want to blow away hot cache on every flush.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-12 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932258#comment-13932258
 ] 

Benedict commented on CASSANDRA-6746:
-

What makes you say we'll blow away our hot cache on flush?

Note that the variable is poorly named, and in fact only provides 
populateOnFlush = !skipIOCache, i.e. if true we do not explicitly DONTNEED, 
but we do not WILLNEED either. We could potentially make it a tri-state value 
(false, true, null) where true forces a WILLNEED on flush, so that the name 
actually makes sense.
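
One possible reading of that three-valued setting, sketched in Java. The mapping of false to DONTNEED and null to "no advice" is an assumption drawn from the description above, and the class and enum names are hypothetical rather than taken from the patch.

```java
/** Hypothetical sketch of a three-valued populateOnFlush setting. */
final class PopulateOnFlush {
    enum Advice { DONTNEED, NONE, WILLNEED }

    /**
     * null  -> give the OS no advice about the flushed file
     * true  -> explicitly WILLNEED the flushed file
     * false -> explicitly DONTNEED the flushed file
     */
    static Advice adviceOnFlush(Boolean populateOnFlush) {
        if (populateOnFlush == null) {
            return Advice.NONE;
        }
        return populateOnFlush ? Advice.WILLNEED : Advice.DONTNEED;
    }
}
```

Under this reading, the existing two-valued behaviour corresponds to false and null, and true is the new case.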



[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-12 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932837#comment-13932837
 ] 

Ryan McGuire commented on CASSANDRA-6746:
-

Oh, and in particular [~benedict], you may be happy to know that was run with 
the new stress.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-03-12 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932862#comment-13932862
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

bq. if true we do not explicitly DONTNEED, but we do not WILLNEED either

I get that, but the reason this was introduced in the first place was that 
the default behavior of (at least some) Linux kernels was to evict other data 
in favor of the newly flushed data.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-02-28 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915658#comment-13915658
 ] 

Benedict commented on CASSANDRA-6746:
-


2.0:
INFO [main] 2014-02-27 18:54:12,724 CLibrary.java (line 63) JNA not found. Native methods will be disabled.
2.1:
INFO  [main] 2014-02-27 19:51:10,440 CLibrary.java:117 - JNA mlockall successful

The result being that we do not skip IO cache in 2.0, whereas in 2.1 we do. 
Assuming the OS actually listens to the DONTNEED command, this results in an 
empty page cache in 2.1 even if the OS could make room for the data. This is 
OS dependent: whilst testing on my own box, I found my OS would keep the pages 
cached anyway. On searching I found that this behaviour was modified in the 
Linux kernel sometime around 2010/11, as referenced 
[here|http://lists.samba.org/archive/rsync/2010-November/025827.html]. It's 
not clear which kernel version the change first made it into, but clearly it 
is not in the build on this cluster and is in the one on my laptop.

Note this is confirmed by fincore on the data files: in 2.0 all files remain 
100% cached at all times; in 2.1 they drop to 0% cached immediately after 
compaction completes.

I'm not sure what the correct response to this is. Largely this is simply 
behaving as expected, except that issuing a DONTNEED when we probably DO need 
is not a great idea. The rationale of course is that if we're compacting 
stale data we don't want to pollute the page cache; but if we're compacting 
live data we will actively destroy the page cache when the OS listens 
stringently to the DONTNEED (which in this case it apparently does even though 
it has plenty of room to ignore us). Unless we can be smarter about issuing 
these commands, I think issuing them isn't actually such a great idea, at least 
not on kernel versions that elicit this behaviour. However I'm not convinced 
they make sense on newer kernel versions either, as live reads going to files 
that are about to be discarded could still leave us with an empty buffer cache, 
as the new pages are evicted in advance of becoming the live versions. In this 
scenario simply letting the kernel keep whatever pages it wants is probably 
best, so there's no sudden performance cliff. Moving to incremental opening of 
the new sstables might be a better solution, though, so that reads transfer 
progressively, always to data that is already in buffer, keeping the 
transition smooth.



[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

[
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915949#comment-13915949
]

Jonathan Ellis commented on CASSANDRA-6746:
---

bq. The result being that we do not skip IO cache in 2.0.

Well, we do if JNA is installed -- the difference is that we don't ship JNA out
of the box. :)

bq. if we're compacting live data we will actively destroy the page cache when
the OS listens stringently to the DONTNEED

The behavior on flush and compaction is actually slightly different:

* on flush, we actively DONTNEED unless populate_io_cache_on_flush is enabled
[false by default]
* on compact, we WILLNEED partitions that are hot in the key cache, unless
compaction_preheat_key_cache is disabled [true by default]. Nothing is
WONTNEEDed.
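
As a rough illustration of the two rules just listed, here is a minimal Java 
sketch. It is not Cassandra's actual code: the class, enum and method names 
are hypothetical, and only the two option names and their defaults come from 
the list above. It models the behaviour as described in this comment, which 
the next reply in the thread disputes for the compaction case.

```java
import java.util.Optional;

// Hypothetical sketch: which page-cache hint, if any, is issued for a newly
// written sstable, following the flush/compaction rules described above.
final class CacheHintSketch {
    // The two fadvise-style hints discussed in this thread.
    enum Advice { DONTNEED, WILLNEED }

    // Flush: DONTNEED unless populate_io_cache_on_flush is enabled
    // (false by default). Enabling it issues no hint at all.
    static Optional<Advice> onFlush(boolean populateIoCacheOnFlush) {
        return populateIoCacheOnFlush ? Optional.empty() : Optional.of(Advice.DONTNEED);
    }

    // Compaction: WILLNEED for partitions that are hot in the key cache,
    // unless compaction_preheat_key_cache is disabled (true by default).
    // No DONTNEED is issued on this path in this model.
    static Optional<Advice> onCompaction(boolean preheatKeyCache, boolean hotInKeyCache) {
        return (preheatKeyCache && hotInKeyCache)
                ? Optional.of(Advice.WILLNEED)
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println("flush, default settings:       " + onFlush(false));
        System.out.println("flush, populate enabled:       " + onFlush(true));
        System.out.println("compaction, hot partition:     " + onCompaction(true, true));
        System.out.println("compaction, cold partition:    " + onCompaction(true, false));
        System.out.println("compaction, preheat disabled:  " + onCompaction(false, true));
    }
}
```

Note that with populate_io_cache_on_flush enabled the sketch returns no hint 
rather than WILLNEED, which is why the option name is misleading.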

We should probably create tables for short tests with
populate_io_cache_on_flush enabled. /cc [~enigmacurry] [~mshuler]


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-02-28 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915953#comment-13915953
 ] 

Benedict commented on CASSANDRA-6746:
-

bq. Nothing is WONTNEEDed.

On compaction we DONTNEED the replacement sstable.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915966#comment-13915966
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

bq. On compaction we DONTNEED the replacement sstable.

No, we don't.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915965#comment-13915965
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

bq.  on flush, we actively DONTNEED unless populate_io_cache_on_flush is enabled

Incidentally, I'm inclined to think that we should make the default leave it 
alone and change the option to be normal/dontneed/willneed instead.
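
A minimal Java sketch of what such a three-valued option might look like, 
assuming an unset value means "leave it alone". The class and method names 
are hypothetical and no such setting is confirmed by this thread; the mapping 
from the existing boolean follows the flush rule quoted above (false means 
DONTNEED, true means no hint).

```java
import java.util.Locale;

// Hypothetical sketch of a normal/dontneed/willneed flush setting.
final class FlushCacheModeSketch {
    enum Mode { NORMAL, DONTNEED, WILLNEED }

    // Parse a configured value; null or blank falls back to NORMAL,
    // i.e. issue no hint and let the kernel decide.
    // Unknown values throw IllegalArgumentException via Enum.valueOf.
    static Mode parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Mode.NORMAL;
        }
        return Mode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Map the existing boolean onto the new modes: enabling
    // populate_io_cache_on_flush only suppresses DONTNEED,
    // it never issues WILLNEED.
    static Mode fromLegacyFlag(boolean populateIoCacheOnFlush) {
        return populateIoCacheOnFlush ? Mode.NORMAL : Mode.DONTNEED;
    }

    public static void main(String[] args) {
        System.out.println(parse(null));
        System.out.println(parse("willneed"));
        System.out.println(fromLegacyFlag(false));
        System.out.println(fromLegacyFlag(true));
    }
}
```

The point of the mapping is that neither value of the current boolean reaches 
WILLNEED, so a third state is needed before "populate" describes anything the 
code actually does.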


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915981#comment-13915981
 ] 

Jonathan Ellis commented on CASSANDRA-6746:
---

Looks like this was the original intent back in CASSANDRA-2635.  During review 
the option name was changed to _on_flush which is inaccurate.


[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed