[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216873#comment-14216873 ]
Alan Boudreault commented on CASSANDRA-7386:
--------------------------------------------

devs, I've tested this issue with and without the patch and analysed the disk usage in 3 scenarios. The patch works well and fixes important issues related to multiple data directories. I'm sharing the results with the graphs (attached below).

For all my tests, I was able to reproduce the issues using multiple directories. There is no need to *hammer* the node with compaction and repair; I simply limited concurrent_compactors and compaction_throughput_mb_per_sec to slow things down, which keeps a disk busy during disk selection.

h4. Test 1
* 2 disks of the same size
* Goal: stress the server to fill all disks

h5. Result - No Patch
Only one disk is filled; the other never is. cassandra-stress crashed with a WriteTimeoutException while the second disk remained at ~20% disk usage.

h5. Result - With Patch
Success. Both disks are filled at approximately the same speed.

h4. Test 2
* 5 disks total, all the same size
* 2 disks initially filled at ~20%
* 3 disks added later
* Goal: stress the server to fill all disks

h5. Result - No Patch
* The first 2 disks aren't used at the beginning, since they are already at 20% disk usage. (That's OK.)
* Some new data is written.
* 2 of the newly added disks receive the initial data; once they reach 20% disk usage, all 4 disks are filled at approximately the same speed.
* The last disk, which is running a compaction, is almost never used and remains at 15% disk usage when cassandra-stress crashes with write timeouts.

h5. Result - With Patch
Success. All disks were filled at approximately the same speed. I noticed that Cassandra doesn't wait until all 3 newly added disks reach 20% before re-using disks 1 and 2, but it keeps things balanced and reduces the difference over the course of the run.

h4. Test 3
* 5 disks total
* 4 disks of 2G
* 1 disk of 10G (5x larger than the others)
* Goal: stress the server to fill all disks

h5. Result - No Patch
* Disk #5 (10G) is used initially, then an internal compaction starts on it.
* All 4 other disks are completely filled and disk 5 is never used again. cassandra-stress crashes with write timeouts while disk 5 remains at 15% disk usage with more than 8G of free space.

h5. Result - With Patch
Success. All 5 disks are filled at approximately the same speed. See the result images attached below.

> JBOD threshold to prevent unbalanced disk utilization
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7386
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Lohfink
>            Assignee: Robert Stupp
>            Priority: Minor
>             Fix For: 2.1.3
>
>         Attachments: 7386-2.0-v3.txt, 7386-2.1-v3.txt, 7386-v1.patch, 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png, patch_2_1_branch_proto.diff, sstable-count-second-run.png, test1_no_patch.jpg, test1_with_patch.jpg, test2_no_patch.jpg, test2_with_patch.jpg, test3_no_patch.jpg, test3_with_patch.jpg
>
>
> Currently, disks are picked first by number of current tasks, then by free space. This helps with performance but can lead to large differences in utilization in some (unlikely but possible) scenarios. I've seen 55% vs. 10%, and heard reports of 90% vs. 10% on IRC, with both LCS and STCS (although my suspicion is that STCS makes it worse, since it is harder to keep balanced).
> I propose the algorithm change a little to have some maximum range of utilization within which it will pick by free space over load (acknowledging it can be slower). So if disk A is 30% full and disk B is 5% full, it will never pick A over B until things balance out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
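For illustration, the threshold rule proposed in the description could be sketched as follows. This is a minimal sketch, not Cassandra's actual implementation: the Disk class, the field names, and the 10% utilization range are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    # Hypothetical model of one JBOD data directory; not Cassandra's real class.
    name: str
    capacity: int      # bytes
    used: int          # bytes
    active_tasks: int  # compactions/flushes currently writing here

    @property
    def utilization(self) -> float:
        return self.used / self.capacity

    @property
    def free(self) -> int:
        return self.capacity - self.used

def pick_disk(disks, max_util_range=0.10):
    """Pick by task count, then free space -- unless a disk is more than
    max_util_range fuller than the emptiest disk, in which case it is
    excluded so utilization converges over time."""
    min_util = min(d.utilization for d in disks)
    balanced = [d for d in disks if d.utilization - min_util <= max_util_range]
    # Within the balanced set, keep the current heuristic:
    # fewest active tasks first, then most free space.
    return min(balanced, key=lambda d: (d.active_tasks, -d.free))

disks = [
    Disk("a", 100, 30, 0),  # 30% full, idle
    Disk("b", 100, 5, 2),   # 5% full, busy compacting
]
# With the threshold, disk "a" (30% full) is excluded until "b" catches up,
# even though "a" has fewer active tasks.
print(pick_disk(disks).name)  # -> "b"
```

Without the threshold, the busy-but-empty disk would lose to the idle-but-full one every time, which is exactly the runaway imbalance seen in the tests above.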