[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Stupp updated CASSANDRA-7386:
------------------------------------
    Attachment: 7386v2.diff
                Mappe1.ods

Here's a working version of the patch. It adds new metrics to each data directory:
* {{readTasks}} counts the read requests
* {{writeTasks}} counts the write requests
* {{writeValue*}} exposes the "write value" for each data directory as mean and one/five/fifteen-minute rates

The data directory with the highest "write value" is chosen for new sstables. The "write value" is calculated using the formula {{freeRatio / weightedRate}}, where {{freeRatio = availableBytes / totalBytes}} and {{weightedRate = writeRate + readRate / 2}}. The "divide by 2" was chosen arbitrarily, since not every read operation hits the disks.

{{readRate}} is taken from {{SSTableReader.incrementReadCount()}}, but I had to add calls to {{incrementReadCount()}} in some classes in the code. I did not add it to {{RandomAccessReader}} or {{SegmentedFile}}, because this patch should not influence performance too much.

I did not experiment much with the formula, but I created a spreadsheet ({{Mappe1.ods}}) that shows the "write value" in a matrix of freeRatio vs. weightedRate.

I've run {{cassandra-stress}} against a (single-node, single-data-directory) C* instance and saw that the writeValue behaves as expected. But that's only half the battle. The patch has to be verified in a real, production-like cluster: the "write value" needs to be compared with {{iostat}}, {{df}}, etc. Is there any possibility to do that?

> JBOD threshold to prevent unbalanced disk utilization
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7386
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Priority: Minor
>     Attachments: 7386-v1.patch, 7386v2.diff, Mappe1.ods, patch_2_1_branch_proto.diff
>
>
> Currently the disks are picked first by number of current tasks, then by free space.
> This helps with performance but can lead to large differences in utilization in some (unlikely but possible) scenarios. I've seen 55% vs. 10%, and heard reports of 90% vs. 10% on IRC, with both LCS and STCS (although my suspicion is that STCS makes it worse, since it is harder to keep balanced).
> I propose the algorithm change a little to have some maximum range of utilization where it will pick by free space over load (acknowledging it can be slower). So if disk A is 30% full and disk B is 5% full, it will never pick A over B until they balance out.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
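The scoring heuristic described in the comment above can be sketched in a few lines of Java. This is a minimal illustration only: the {{DataDirectory}} class, its fields, and {{pickDirectory}} are hypothetical names, not the classes the patch actually touches (which pull rates from Cassandra's metrics registry).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class WriteValueSketch {
    // Hypothetical stand-in for a data directory plus its metrics.
    static final class DataDirectory {
        final String path;
        final long availableBytes, totalBytes;
        final double writeRate, readRate; // requests/sec, e.g. one-minute rates

        DataDirectory(String path, long availableBytes, long totalBytes,
                      double writeRate, double readRate) {
            this.path = path;
            this.availableBytes = availableBytes;
            this.totalBytes = totalBytes;
            this.writeRate = writeRate;
            this.readRate = readRate;
        }

        // writeValue    = freeRatio / weightedRate
        // freeRatio     = availableBytes / totalBytes
        // weightedRate  = writeRate + readRate / 2
        // (reads weighted by 1/2 because not every read hits the disks)
        double writeValue() {
            double freeRatio = (double) availableBytes / totalBytes;
            double weightedRate = writeRate + readRate / 2.0;
            return freeRatio / weightedRate;
        }
    }

    // New sstables go to the directory with the highest write value.
    static DataDirectory pickDirectory(List<DataDirectory> dirs) {
        return dirs.stream()
                   .max(Comparator.comparingDouble(DataDirectory::writeValue))
                   .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        // /data2 is both emptier and less loaded, so it should win.
        List<DataDirectory> dirs = Arrays.asList(
            new DataDirectory("/data1", 100L << 30, 1000L << 30, 50.0, 20.0),
            new DataDirectory("/data2", 600L << 30, 1000L << 30, 10.0, 4.0));
        System.out.println(pickDirectory(dirs).path);
    }
}
```

Note the sketch does not guard against a zero {{weightedRate}} (an idle directory), which the attached {{Mappe1.ods}} matrix would expose as a degenerate corner of the freeRatio-vs-weightedRate grid.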