[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436059#comment-13436059 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

Your instincts were better than mine: combining compaction and flush I/O into a single executor was a mistake. We could band-aid it by adding some kind of semaphore mechanism to make sure we always leave at least one thread free for flushing, but this still won't let us max out on flushing temporarily at the expense of compaction without introducing extremely complicated preemption logic. So, color me convinced that we need to keep separate executors for flush and compaction.

Additionally, the more I think about it, the less I think the DBT abstraction is what we want here. Or at a higher level: I don't think we want to be that strict about one thread per disk. Which was my fault in the first place, sorry! If we instead just follow the above disk prioritization logic, we'll still get effectively thread-per-disk until disks start to run out of space. But having a (standard) flexible pool of threads means that we generalize much better to SSDs, where having substantially more threads than disks makes sense (since compaction becomes CPU-bound).

So I think we can simplify our approach a lot, perhaps by having a global Directory state that tracks space remaining and how many I/O tasks are running on each, which we can use when handing out flush and compaction targets. The executor architecture won't need to change. (We may want to introduce a DirectoryBoundRunnable abstraction, whose run method encapsulates updating the I/O task count and free space after running the flush/compaction, but without trying it I'm not sure if that actually works as imagined.)
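[Editorial sketch, not part of the original comment: one way the "global Directory state" plus DirectoryBoundRunnable idea above could look. All names (DirectoryState, DirectoryBoundRunnable, field names) are hypothetical; Cassandra's actual classes may differ, and the comment itself notes the abstraction is untested.]

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-directory state: estimated free space plus the number of
// I/O tasks currently running against this data directory.
class DirectoryState
{
    final String path;
    final AtomicLong estimatedFreeBytes;
    final AtomicInteger runningTasks = new AtomicInteger(0);

    DirectoryState(String path, long freeBytes)
    {
        this.path = path;
        this.estimatedFreeBytes = new AtomicLong(freeBytes);
    }
}

// Hypothetical wrapper whose run() does the bookkeeping around a flush or
// compaction, so target selection can see both space and task counts.
class DirectoryBoundRunnable implements Runnable
{
    private final DirectoryState directory;
    private final Runnable task;          // the flush or compaction itself
    private final long expectedWriteSize;

    DirectoryBoundRunnable(DirectoryState directory, Runnable task, long expectedWriteSize)
    {
        this.directory = directory;
        this.task = task;
        this.expectedWriteSize = expectedWriteSize;
        // Reserve the space up front so concurrent target selection sees it.
        directory.estimatedFreeBytes.addAndGet(-expectedWriteSize);
    }

    public void run()
    {
        directory.runningTasks.incrementAndGet();
        try
        {
            task.run();
        }
        finally
        {
            directory.runningTasks.decrementAndGet();
            // A real implementation would re-read actual free space here.
        }
    }
}
```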
Per-disk I/O queues
-------------------

                Key: CASSANDRA-4292
                URL: https://issues.apache.org/jira/browse/CASSANDRA-4292
            Project: Cassandra
         Issue Type: Improvement
         Components: Core
           Reporter: Jonathan Ellis
           Assignee: Yuki Morishita
            Fix For: 1.2
        Attachments: 4292.txt, 4292-v2.txt, 4292-v3.txt

As noted in CASSANDRA-809, we have a certain number of flush (and compaction) threads, which mix and match disk volumes indiscriminately. It may be worth creating a tight thread-disk affinity, to prevent unnecessary conflict at that level. OTOH, as SSDs become more prevalent this becomes a non-issue. Unclear how much pain this actually causes in practice in the meantime.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435532#comment-13435532 ]

Yuki Morishita commented on CASSANDRA-4292:
-------------------------------------------

I ran tests against patched and trunk with a modified stress tool, writing to 3 CFs with leveled compaction. The node consists of 6 spinning disks, and C* uses those as data directories. Although I see a difference in disk usage (the patched version distributes load evenly among disks), there is still no difference in performance for either writes or compaction. It seems that memtable flushing is sometimes blocked when a long-running compaction has already started, causing GC pressure on the patched node. Looks like I need to find a way to avoid queuing up memtable flush tasks.
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427392#comment-13427392 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

v3 looks good enough to do some performance testing to see if it's worth polishing more. :)

bq. Can we use CopyOnWriteArrayList

Nit: looking at this again, it should probably actually be an ImmutableList.
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425036#comment-13425036 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

- We need to use a single DiskWriter for both compaction and flushing, or we lose most of the benefits here. One solution: rename CompactionManager to IOManager, and use that. Another could be to move it into StorageService.
- compactionexecutor needs to be cleaned up since it's no longer serving the executor role. Again, cleanup could be straightforward if we morph CM into IOManager (and merge CompactionExecutor + DiskWriter). It could be nice to get the kind of progress reporting on flushes that we now have on compaction.
- DiskWriter: can we use CopyOnWriteArrayList instead of a synchronized block?
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423951#comment-13423951 ]

Yuki Morishita commented on CASSANDRA-4292:
-------------------------------------------

Here's the code for choosing a disk from the attached patch.

{code}
// DiskWriter.java
private ExecutorService selectExecutor(DiskBoundTask task)
{
    // sort by available disk space
    SortedSet<DiskBoundTaskExecutor> executors;
    synchronized (perDiskTaskExecutors)
    {
        executors = ImmutableSortedSet.copyOf(perDiskTaskExecutors);
    }

    // if there is a disk with sufficient space and no activity running on it, then use it
    for (DiskBoundTaskExecutor executor : executors)
    {
        long spaceAvailable = executor.getEstimatedAvailableSpace();
        if (task.getExpectedWriteSize() < spaceAvailable && executor.getActiveCount() == 0)
            return executor;
    }

    // if not, use the one that has the largest free space
    if (task.getExpectedWriteSize() < executors.first().getEstimatedAvailableSpace())
        return executors.first();
    else
        return task.recalculateWriteSize() ? selectExecutor(task) : null; // retry if needed
}
{code}

Before choosing a disk, we sort by available disk space, but then choose the one that 1) fits the new sstable and 2) has zero tasks. If we cannot find one, then 3) we choose the one with the largest free space. So I think the above code works as you described.
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423953#comment-13423953 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

Hmm, I may have been looking at the wrong patch. Will reinspect.
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424149#comment-13424149 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

Can you rebase post-CASSANDRA-2116?
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422574#comment-13422574 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

bq. Directory is chosen based on available space in both queue and disk.

We still want to prioritize disks that have no tasks yet, since iops are a bigger bottleneck than space, in general. So specifically, we want to prioritize in order of:

# enough space for the new sstable (boolean)
# zero tasks (boolean)
# total free space (long)

We may want to test changing #2 to ordering by task count... both have pros and cons.
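[Editorial sketch, not part of the original comment: the three-level ordering above expressed as a Java comparator. DiskCandidate and its fields are hypothetical stand-ins for whatever per-disk state the patch tracks.]

```java
import java.util.Comparator;

// Hypothetical per-disk snapshot; field names are illustrative only.
class DiskCandidate
{
    final long freeBytes;
    final int activeTasks;

    DiskCandidate(long freeBytes, int activeTasks)
    {
        this.freeBytes = freeBytes;
        this.activeTasks = activeTasks;
    }
}

class DiskPriority
{
    // Best-candidate-first ordering for a write of expectedSize bytes:
    // 1) enough space for the new sstable (boolean),
    // 2) zero running tasks (boolean),
    // 3) most total free space (long).
    static Comparator<DiskCandidate> forWrite(long expectedSize)
    {
        return Comparator
            .comparing((DiskCandidate d) -> d.freeBytes >= expectedSize)
            .thenComparing(d -> d.activeTasks == 0)
            .thenComparingLong(d -> d.freeBytes)
            .reversed(); // natural order puts the best candidate last; reverse it
    }
}
```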
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420904#comment-13420904 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

Looks reasonable to me so far. A couple of points:

- we'll want to prefer (1) disks that have no current writes, then (2) disks with the least projected data (including the estimated size of currently active writes)
- compaction should use this executor as well

Nit: probably cleaner to use a Map for the new getLocationForDisk method.
[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues
[ https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294697#comment-13294697 ]

Jonathan Ellis commented on CASSANDRA-4292:
-------------------------------------------

We'll also want to reserve space for in-progress writes; currently we just use the raw free space as reported by the OS, which means that when disks are close to evenly matched we're highly likely to stack multiple new sstables on the same one instead of spreading them out.
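[Editorial sketch, not part of the original comment: reserving space for in-progress writes amounts to subtracting the expected sizes of still-running writes from the OS-reported free space. DataDirectory and the reserve/release helpers are hypothetical; a real implementation would query the OS via java.io.File.getUsableSpace() rather than take a fixed number.]

```java
import java.util.concurrent.atomic.AtomicLong;

// Estimated free space = OS-reported free space minus bytes reserved by
// in-flight writes, so two new sstables are not both sent to the same
// nearly-tied disk.
class DataDirectory
{
    private final long osReportedFree;       // stand-in for File.getUsableSpace()
    private final AtomicLong reservedBytes = new AtomicLong(0);

    DataDirectory(long osReportedFree)
    {
        this.osReportedFree = osReportedFree;
    }

    // Called when a flush/compaction targeting this disk starts.
    void reserve(long expectedWriteSize)
    {
        reservedBytes.addAndGet(expectedWriteSize);
    }

    // Called when the write finishes (the OS free-space figure now reflects it).
    void release(long expectedWriteSize)
    {
        reservedBytes.addAndGet(-expectedWriteSize);
    }

    long getEstimatedAvailableSpace()
    {
        return osReportedFree - reservedBytes.get();
    }
}
```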