[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351109#comment-15351109 ]

Aleksey Yeschenko commented on CASSANDRA-7075:
----------------------------------------------

bq. It's probably not worth pursuing this with the SEDA architecture. Lots of effort for little gain (now that we have CL compression).

Well, we do have a patch, so we might as well get some benchmark numbers instead of speculating, before just giving up on this.

bq. But we might need to do CL-segment-per-thread for TPC

Right. Eventually. Not sure how it's relevant here, though.

> Add the ability to automatically distribute your commitlogs across all data volumes
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-7075
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7075
> Project: Cassandra
> Issue Type: New Feature
> Components: Local Write-Read Paths
> Reporter: Tupshin Harper
> Priority: Minor
> Labels: performance
> Fix For: 3.x
>
>
> given the prevalence of ssds (no need to separate commitlog and data), and
> improved jbod support, along with CASSANDRA-3578, it seems like we should
> have an option to have one commitlog per data volume, to even the load. i've
> been seeing more and more cases where there isn't an obvious "extra" volume
> to put the commitlog on, and sticking it on only one of the jbodded ssd
> volumes leads to IO imbalance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350577#comment-14350577 ]

Benedict commented on CASSANDRA-7075:
-------------------------------------

It's also worth considering whether we should simply round-robin on segment advance, rather than on each mutation. If we wanted to improve batch commit log latency, we could also round-robin on sync (so the next batch may potentially start committing before the prior batch has completed).

Since this is going into 3.0, and we're hoping for CASSANDRA-6696 to land then also, it might alternatively make sense to tie the two together, and write CL entries to the disk that will ultimately own the sstables as well. This might reduce the cognitive burden of recovering from a disk failure, since there would be no risk of partial updates being replayed for an owned range and queries being served from it. No consideration would then need to be given to these partial failure scenarios.

> Add the ability to automatically distribute your commitlogs across all data volumes
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-7075
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7075
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Tupshin Harper
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: performance
> Fix For: 3.0
>
>
> given the prevalence of ssds (no need to separate commitlog and data), and
> improved jbod support, along with CASSANDRA-3578, it seems like we should
> have an option to have one commitlog per data volume, to even the load. i've
> been seeing more and more cases where there isn't an obvious "extra" volume
> to put the commitlog on, and sticking it on only one of the jbodded ssd
> volumes leads to IO imbalance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
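Editor's illustration of the "round-robin on segment advance" idea from the comment above. This is only a minimal sketch, not code from the patch or from Cassandra: the class name `RoundRobinSegmentAllocator`, the directory names, and the 100-byte segment size are all assumptions made for the example. The point it shows is that the target volume changes only when the active segment is full, so individual mutations never pay for a volume switch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: the target volume changes only when a segment fills up. */
final class RoundRobinSegmentAllocator {
    private final List<String> volumes; // commit log directories, assumed one per drive
    private final int segmentSize;      // bytes per segment (illustrative value)
    private int volumeIndex = 0;        // volume holding the active segment
    private long segmentId = 0;         // id of the active segment
    private int used = 0;               // bytes already reserved in the active segment

    RoundRobinSegmentAllocator(List<String> volumes, int segmentSize) {
        if (volumes.isEmpty())
            throw new IllegalArgumentException("need at least one volume");
        this.volumes = new ArrayList<>(volumes);
        this.segmentSize = segmentSize;
    }

    /** Reserves space for one mutation and returns the segment it lands in. */
    synchronized String allocate(int mutationSize) {
        if (mutationSize > segmentSize)
            throw new IllegalArgumentException("mutation larger than a segment");
        if (used + mutationSize > segmentSize) {
            // Segment advance: the only point at which the next volume is chosen.
            volumeIndex = (volumeIndex + 1) % volumes.size();
            segmentId++;
            used = 0;
        }
        used += mutationSize;
        return volumes.get(volumeIndex) + "/segment-" + segmentId;
    }

    public static void main(String[] args) {
        RoundRobinSegmentAllocator allocator =
            new RoundRobinSegmentAllocator(Arrays.asList("/disk1/cl", "/disk2/cl"), 100);
        for (int i = 0; i < 5; i++)
            System.out.println(allocator.allocate(40));
    }
}
```

With two volumes and 40-byte mutations, two mutations fit in each segment, so the allocator alternates between the drives every second mutation rather than on every write.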
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350565#comment-14350565 ]

T Jake Luciani commented on CASSANDRA-7075:
-------------------------------------------

Might want to hold off till CASSANDRA-8771? Should make this much simpler
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350560#comment-14350560 ]

Benedict commented on CASSANDRA-7075:
-------------------------------------

So, before I dive into this too closely, it's worth noting that C* actually has no requirement that replay happens in order. There was such a requirement for counters prior to 2.1, but it has been lifted, and given our eventually consistent nature I don't see us ever rolling back the clock here. So if we can significantly simplify the code by dropping the in-order replay requirement, I suggest we do so.
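Editor's illustration of why replay order need not matter. This is a deliberately simplified model, not Cassandra's storage engine: it assumes every mutation is a single cell write carrying a timestamp, and that conflicts are resolved by keeping the higher timestamp, with ties broken by comparing values (the tie-break is an assumption of this sketch). All names (`OrderIndependentReplay`, `Mutation`, `replay`) are hypothetical. Under those assumptions, applying the same log in any order yields the same final state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: timestamp-based reconciliation makes replay order irrelevant. */
final class OrderIndependentReplay {
    /** A simplified mutation: one cell write with a write timestamp. */
    static final class Mutation {
        final String key;
        final String value;
        final long timestamp;

        Mutation(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** True if a should replace b: higher timestamp wins, the value breaks ties. */
    private static boolean wins(Mutation a, Mutation b) {
        if (a.timestamp != b.timestamp)
            return a.timestamp > b.timestamp;
        return a.value.compareTo(b.value) > 0;
    }

    /** Replays a log in the given order and returns the resulting key/value state. */
    static Map<String, String> replay(List<Mutation> log) {
        Map<String, Mutation> cells = new HashMap<>();
        for (Mutation m : log)
            cells.merge(m.key, m, (existing, incoming) -> wins(incoming, existing) ? incoming : existing);
        Map<String, String> state = new HashMap<>();
        cells.forEach((key, cell) -> state.put(key, cell.value));
        return state;
    }

    public static void main(String[] args) {
        List<Mutation> log = Arrays.asList(
            new Mutation("k1", "a", 1), new Mutation("k1", "b", 2), new Mutation("k2", "c", 1));
        List<Mutation> reversed = new ArrayList<>(log);
        Collections.reverse(reversed);
        // Both replays end with the same state, regardless of order.
        System.out.println(replay(log).equals(replay(reversed)));
    }
}
```

Because the merge function is commutative and idempotent in this model, segments written to different volumes could be replayed in whatever order they are found.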
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218639#comment-14218639 ]

Jason Brown commented on CASSANDRA-7075:
----------------------------------------

Sure, will do.
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218596#comment-14218596 ]

Jonathan Ellis commented on CASSANDRA-7075:
-------------------------------------------

Do you have time to review, [~jasobrown]?
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218069#comment-14218069 ]

Branimir Lambov commented on CASSANDRA-7075:
--------------------------------------------

First draft of the multi-volume commit log can be found [here|https://github.com/blambov/cassandra/compare/7075-commitlog-volumes-2]. This is still a work in progress, but while I'm looking at ways to properly test everything, I'd be interested in some opinions on where to take this next.

To be able to spread the load between drives, the new implementation switches 'volumes' on every sync request. Each volume has its own writing thread (which in the compressed case will also be doing the compression); the segment management thread, which handles creating and recycling segments, remains shared for now.

Each volume writes in its own CommitLogSegment, so in effect we may write some mutations in one segment, switch to the segment on the other drive, then switch back to writing in the first, which means that the order of mutations is no longer defined first by the segment ID. To deal with this I exposed the concept of a 'section', which existed before as the set of mutations between two sync markers, and gave the section an ID which now replaces the segment ID in ReplayPositions.

Every time we start writing to a volume, a new section with a fresh ID is created. Every time we switch volumes, a write for the old section is scheduled and either the volume is put back at the end of a queue of ready-to-use volumes (if the segment is not exhausted or there is an available reserve segment) or the management thread is woken to prepare a new segment and put the volume back in the queue when one is ready.

Because of the new ordering, commit log replay now has to be able to sort and operate on the level of sections (for new logs) as well as on the level of segments (for legacy logs).
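Editor's illustration of the section scheme described above. This is a single-threaded sketch under stated assumptions, not the code in the linked branch: the names `SectionOrderingSketch`, `Section`, `onSync`, `sectionsOn` and `replay` are hypothetical, volumes are chosen by plain round-robin, and mutations are plain strings. It shows a fresh section ID being handed out on every volume switch, and replay restoring write order by sorting sections on that ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: section ids, not segment ids, define the order of mutations. */
final class SectionOrderingSketch {
    /** The mutations written to one volume between two volume switches. */
    static final class Section {
        final long id;
        final String volume;
        final List<String> mutations = new ArrayList<>();

        Section(long id, String volume) {
            this.id = id;
            this.volume = volume;
        }
    }

    private final List<String> volumes;
    private final List<Section> sections = new ArrayList<>();
    private long nextSectionId = 0;
    private int volumeIndex = -1;
    private Section current;

    SectionOrderingSketch(List<String> volumes) {
        if (volumes.isEmpty())
            throw new IllegalArgumentException("need at least one volume");
        this.volumes = new ArrayList<>(volumes);
        onSync(); // open the first section
    }

    /** On a sync request: move to the next volume and open a section with a fresh id. */
    void onSync() {
        volumeIndex = (volumeIndex + 1) % volumes.size();
        current = new Section(nextSectionId++, volumes.get(volumeIndex));
        sections.add(current);
    }

    void write(String mutation) {
        current.mutations.add(mutation);
    }

    /** Sections as they would be discovered when reading a single volume. */
    List<Section> sectionsOn(String volume) {
        List<Section> found = new ArrayList<>();
        for (Section s : sections)
            if (s.volume.equals(volume))
                found.add(s);
        return found;
    }

    /** Replay: sort everything that was discovered by section id, then emit mutations. */
    static List<String> replay(List<Section> discovered) {
        List<Section> sorted = new ArrayList<>(discovered);
        sorted.sort(Comparator.comparingLong((Section s) -> s.id));
        List<String> ordered = new ArrayList<>();
        for (Section s : sorted)
            ordered.addAll(s.mutations);
        return ordered;
    }

    public static void main(String[] args) {
        SectionOrderingSketch log = new SectionOrderingSketch(Arrays.asList("disk1", "disk2"));
        log.write("m1");
        log.onSync();
        log.write("m2");
        log.onSync();
        log.write("m3");
        // Reading volume by volume interleaves sections; sorting restores write order.
        List<Section> discovered = new ArrayList<>(log.sectionsOn("disk1"));
        discovered.addAll(log.sectionsOn("disk2"));
        System.out.println(replay(discovered));
    }
}
```

Reading the two volumes one after the other yields sections 0 and 2 from the first drive followed by section 1 from the second; sorting by section ID is what puts them back in write order.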
The machinery is refactored a little to permit this, and the new code is also used to select a non-conflicting section ID at start.

For full flexibility, commit log volumes are configured separately from data volumes. If necessary, multiple volumes can be assigned to the same drive. With archiving it's not clear where archived logs should be restored, so I created an option to specify that as well (with a default of sending them to the first CL volume).

The current code has more locking than I'd like, most importantly in CLSM.advanceVolume(), which is called every time a disk synchronization is requested (also when a segment is full, but that happens much less frequently). There is a noticeable impact on performance; I need more performance testing in various configurations to quantify it.

I can see three ways to continue from here:
# Leave the locking as it is, which permits flexibility in the ordering of volumes in the queue. This can be made use of by making queuedVolumes a priority queue, ordered, e.g., by expected sync finish time. Such an ordering will be able to handle heterogeneous situations (e.g. SSDs + HDDs; more importantly, uneven distribution of requests from other parts of the code on the drives) very well. I think this option will result in the least complex code and the highest flexibility of the solution.
# Do not permit reordering of volumes in the queue, which lets section IDs be assigned on queue entry rather than exit; with a little more work, switching to a new section from the queue can be made a single compare-and-swap. In this option the load necessarily has to be spread evenly between the specified CL volumes (not necessarily between the drives, as a user may still give multiple directories on the same drive). With a single CL volume, and possibly in homogeneous scenarios, this option should result in the best performance.
# As above, but put sections in the queue only when the previous sync for the volume has completed. This option can use the drives' performance most efficiently, but it needs another queuing layer to be able to properly deal with situations where all drives are busy and mutations are still incoming.

I'm leaning towards (1) for the flexibility, but that may be a performance regression in the single-volume case. Is it worth investing the time to try out two or all three options?
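Editor's illustration of option (1) above. This is a minimal sketch, not the patch: only the names `queuedVolumes` and `advanceVolume` are taken from the comment, while `VolumeQueueSketch`, `Volume`, the `expectedSyncFinish` field and the idea of passing the estimate in as a parameter are assumptions made for the example. It keeps the coarse lock (`synchronized`) and uses a `java.util.PriorityQueue` so that the ready volume whose sync is expected to finish first is picked next.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical sketch of option (1): ready volumes ordered by expected sync finish time. */
final class VolumeQueueSketch {
    static final class Volume {
        final String path;
        long expectedSyncFinish; // estimate in arbitrary time units (assumed to be supplied by the caller)

        Volume(String path, long expectedSyncFinish) {
            this.path = path;
            this.expectedSyncFinish = expectedSyncFinish;
        }
    }

    private final PriorityQueue<Volume> queuedVolumes =
        new PriorityQueue<>(Comparator.comparingLong((Volume v) -> v.expectedSyncFinish));
    private Volume active;

    VolumeQueueSketch(Volume first, Volume... others) {
        active = first;
        for (Volume v : others)
            queuedVolumes.add(v);
    }

    /**
     * Called on every sync request. Records when the sync of the volume being left
     * is expected to finish, then switches to the queued volume that should be free first.
     */
    synchronized Volume advanceVolume(long expectedSyncFinishOfOld) {
        Volume old = active;
        // Update the estimate before re-queueing; a queued element's key is never mutated.
        old.expectedSyncFinish = expectedSyncFinishOfOld;
        Volume next = queuedVolumes.poll();
        if (next == null)
            return active; // single-volume configuration: nothing to switch to
        queuedVolumes.add(old);
        active = next;
        return active;
    }

    public static void main(String[] args) {
        VolumeQueueSketch clsm = new VolumeQueueSketch(
            new Volume("ssd1", 0), new Volume("ssd2", 5), new Volume("hdd", 50));
        long[] estimates = {10, 20, 60, 70};
        for (long estimate : estimates)
            System.out.println(clsm.advanceVolume(estimate).path);
    }
}
```

In the usage above the slow drive is selected only once both fast drives have later expected finish times, which is the heterogeneous SSD + HDD behaviour the option describes. The cost is that every switch goes through the lock, which is the single-volume regression the comment worries about.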
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144179#comment-14144179 ]

Jonathan Ellis commented on CASSANDRA-7075:
-------------------------------------------

Thanks, Gregory!
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144164#comment-14144164 ]

Gregory Burd commented on CASSANDRA-7075:
-----------------------------------------

The paper "Aether: A Scalable Approach to Logging" (http://pandis.net/resources/vldb10aether.pdf) has a great many insights into how and when an ARIES/WAL can be optimized. I know that's a bit different from the commitlog/ss-table/memtable used in Cassandra, but there are many ideas which overlap and might carry over or at least inspire you.
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102128#comment-14102128 ]

Benedict commented on CASSANDRA-7075:
-------------------------------------

[~blambov]: since you're looking at CASSANDRA-6809, it makes sense to address this afterwards, once your fingerprints are on the CL. Worth bearing in mind for your work there.