[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219318#comment-14219318 ]
Alan Boudreault edited comment on CASSANDRA-7386 at 11/20/14 12:23 PM:
-----------------------------------------------------------------------

Devs, this is the result of my regression test without and with the patch. Note: the compaction concurrency is set to 4 and the throughput is unlimited.

h4. Test
* 12 disks with a total of 2G of size.
* Goal: run the following command to fill the disks: cassandra-stress WRITE n=2000000 -col size=FIXED\(1000\) -mode native prepared cql3 -schema keyspace=r1

h5. Result - No Patch
[^test_regression_no_patch.jpg]
All disks are filled in ~420 seconds. cassandra-stress crashed with write timeouts at around n=650000.

h5. Result - With Patch
[^test_regression_with_patch.jpg]
cassandra-stress finished all its work (~13 minutes, n=2000000) and all disks are under 60% of disk usage.

Any idea what's going on? Am I doing something wrong in my test case?
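As a rough back-of-envelope check of the test above (assuming one 1000-byte value per row and ignoring replication, column overhead, and SSTable metadata), the stress run writes on the order of the disks' total capacity, which is consistent with the unpatched run filling them:

```python
# Approximate raw data volume for the stress run (assumptions: 2,000,000
# rows, one fixed 1000-byte value each; overhead and replication ignored).
rows = 2_000_000
value_bytes = 1000
raw_bytes = rows * value_bytes
print(f"raw data written: ~{raw_bytes / 1e9:.1f} GB")  # → ~2.0 GB
```

This is only a lower bound; compaction temporarily needs additional space, which is why an unbalanced disk can run out well before the raw volume suggests.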
> JBOD threshold to prevent unbalanced disk utilization
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7386
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Lohfink
>            Assignee: Robert Stupp
>            Priority: Minor
>             Fix For: 2.1.3
>
>         Attachments: 7386-2.0-v3.txt, 7386-2.0-v4.txt, 7386-2.0-v5.txt, 7386-2.1-v3.txt, 7386-2.1-v4.txt, 7386-2.1-v5.txt, 7386-v1.patch, 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png, patch_2_1_branch_proto.diff, sstable-count-second-run.png, test1_no_patch.jpg, test1_with_patch.jpg, test2_no_patch.jpg, test2_with_patch.jpg, test3_no_patch.jpg, test3_with_patch.jpg, test_regression_no_patch.jpg, test_regression_with_patch.jpg
>
> Currently the disks are picked first by number of current tasks, then by free space. This helps with performance but can lead to large differences in utilization in some (unlikely but possible) scenarios. I've seen 55% vs. 10%, and heard reports of 90% vs. 10% on IRC, with both LCS and STCS (although my suspicion is that STCS makes it worse, since it is harder to keep balanced).
> I propose the algorithm change a little to have some maximum range of utilization beyond which it will pick by free space over load (acknowledging it can be slower). So if disk A is 30% full and disk B is 5% full, it will never pick A over B until the utilization balances out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
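The proposed rule could be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the `Disk` type, the `pick` method, and the 5% `MAX_UTILIZATION_SPREAD` threshold are all assumed names and values for the sake of the example.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposed selection rule: keep the current
// performance-oriented order (fewest pending tasks, then free space), but
// fall back to picking purely by free space whenever utilization across
// the candidate disks diverges by more than a threshold.
public class DiskSelector
{
    // Assumed threshold (5%); the ticket leaves the exact value open.
    static final double MAX_UTILIZATION_SPREAD = 0.05;

    public static class Disk
    {
        final String path;
        final int pendingTasks;
        final long used;
        final long capacity;

        public Disk(String path, int pendingTasks, long used, long capacity)
        {
            this.path = path;
            this.pendingTasks = pendingTasks;
            this.used = used;
            this.capacity = capacity;
        }

        double utilization() { return (double) used / capacity; }
        long free() { return capacity - used; }
    }

    public static Disk pick(List<Disk> disks)
    {
        double min = disks.stream().mapToDouble(Disk::utilization).min().orElse(0);
        double max = disks.stream().mapToDouble(Disk::utilization).max().orElse(0);

        if (max - min > MAX_UTILIZATION_SPREAD)
            // Imbalanced: pick strictly by free space, so a 30%-full disk is
            // never chosen over a 5%-full one until they balance out.
            return disks.stream().max(Comparator.comparingLong(Disk::free)).get();

        // Balanced: fewest pending tasks first, then most free space.
        return disks.stream()
                    .min(Comparator.comparingInt((Disk d) -> d.pendingTasks)
                                   .thenComparing(Comparator.comparingLong(Disk::free).reversed()))
                    .get();
    }
}
```

With the ticket's example (disk A at 30%, disk B at 5%), the 25% spread exceeds the threshold, so B wins regardless of pending task counts; once the spread falls back under the threshold, the task-count ordering resumes.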