[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-7386:
------------------------------------

    Attachment: 7386v2.diff
                Mappe1.ods

Here's a working version of the patch.

It adds new metrics to each data directory:
* {{readTasks}} counts the read requests
* {{writeTasks}} counts the write requests
* {{writeValue*}} exposes the "write value" for each data directory as mean and 
one/five/fifteen-minute rates
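For illustration only, here is a minimal sketch of how such per-directory meters could be registered with the Codahale metrics library; the class and metric names are placeholders and do not necessarily match the identifiers used in 7386v2.diff or the exact metrics API on the C* branch.

{code:java}
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

// Hypothetical holder for per-data-directory metrics (names are illustrative only).
public class DataDirectoryMetrics
{
    public final Meter readTasks;   // counts read requests hitting this directory
    public final Meter writeTasks;  // counts write requests hitting this directory

    public DataDirectoryMetrics(MetricRegistry registry, String directory)
    {
        readTasks  = registry.meter(MetricRegistry.name("DataDirectory", directory, "ReadTasks"));
        writeTasks = registry.meter(MetricRegistry.name("DataDirectory", directory, "WriteTasks"));
    }
}
{code}

A {{Meter}} already exposes mean and one/five/fifteen-minute rates, which would map naturally onto the {{writeValue*}} variants.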

The data directory with the highest "write value" is chosen for new sstables.

"Write value" is calculated using the formula:
{{freeRatio / weightedRate}} where {{freeRatio = availableBytes / totalBytes}} 
and {{weightedRate = writeRate + readRate / 2}}. "divide by 2" has been 
randomly chosen since not every read operation hits the disks.
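To make the formula concrete, here is a small, self-contained sketch of the calculation (illustrative only; in the patch the rates come from the per-directory meters):

{code:java}
// Sketch of the "write value" described above (illustrative, not the patch code).
//   freeRatio    = availableBytes / totalBytes
//   weightedRate = writeRate + readRate / 2   (reads weighted by 0.5)
//   writeValue   = freeRatio / weightedRate   (highest value wins)
public static double writeValue(long availableBytes, long totalBytes,
                                double writeRate, double readRate)
{
    double freeRatio = (double) availableBytes / totalBytes;
    double weightedRate = writeRate + readRate / 2d;
    // an idle directory (rate 0) is always preferred; avoid division by zero
    return weightedRate == 0d ? Double.POSITIVE_INFINITY : freeRatio / weightedRate;
}
{code}

For example, a directory that is 80% free with a weighted rate of 100 ops/s scores 0.008, while one that is 20% free at the same rate scores 0.002, so the emptier directory is picked for new sstables.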

{{readRate}} is derived from {{SSTableReader.incrementReadCount()}}, but I had to 
add calls to {{incrementReadCount()}} in a few places in the code. I did not add 
it to {{RandomAccessReader}} or {{SegmentedFile}} because this patch should not 
influence performance too much.
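As a rough illustration of what such a hook looks like conceptually (hypothetical code; the real hook points and field names in the patch differ):

{code:java}
// Hypothetical: mark the per-directory read meter whenever an sstable's
// read count is incremented, so readRate can feed the "write value".
public void incrementReadCount()
{
    readCount.incrementAndGet();          // (hypothetical) existing per-sstable counter
    directoryMetrics.readTasks.mark();    // new: per-data-directory read rate
}
{code}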

I did not experiment much with the formula, but I created a spreadsheet 
({{Mappe1.ods}}) that shows the "write value" in a matrix of freeRatio vs. 
weightedRate.

I've run {{cassandra-stress}} against a (single-node, single-data-directory) C* 
instance and saw that the writeValue metrics behave as expected.

But that's only half the battle. The patch has to be verified on a real, 
production-like cluster: the "write value" needs to be compared with {{iostat}}, 
{{df}}, etc. Is there any possibility to do that?

> JBOD threshold to prevent unbalanced disk utilization
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7386
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Priority: Minor
>         Attachments: 7386-v1.patch, 7386v2.diff, Mappe1.ods, 
> patch_2_1_branch_proto.diff
>
>
> Currently the disks are picked first by number of current tasks, then by free 
> space.  This helps with performance but can lead to large differences in 
> utilization in some (unlikely but possible) scenarios.  I've seen 55% vs. 10% 
> and heard reports of 90% vs. 10% on IRC.  This happens with both LCS and STCS 
> (although my suspicion is that STCS makes it worse, since it is harder to keep 
> balanced).
> I propose changing the algorithm a little to have some maximum range of 
> utilization within which it will pick by free space over load (acknowledging 
> it can be slower).  So if disk A is 30% full and disk B is 5% full it will 
> never pick A over B until they balance out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
