[jira] [Commented] (CASSANDRA-6758) Measure data consistency in the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287089#comment-16287089 ]

Jeff Jirsa commented on CASSANDRA-6758:
---

I think this is done. In 4.0 we have CASSANDRA-11503 (nodetool reports repaired/unrepaired state by sstables), CASSANDRA-13774 (repaired/unrepaired by bytes), CASSANDRA-13289 (track an ideal consistency level beyond what acks the write), and CASSANDRA-13257 (repair preview). That covers the intent of this ticket, so I propose we close it as wontfix, since it's effectively done by those others.

> Measure data consistency in the cluster
> ---
>
> Key: CASSANDRA-6758
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6758
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jimmy Mårdell
> Priority: Minor
> Labels: proposed-wontfix
>
> Running multi-DC Cassandra can be a challenge, as the cluster easily tends to get out of sync. We have been thinking it would be nice to measure how out of sync a cluster is and expose those metrics somehow.
> One idea would be to run just the first half of the repair process and output the result of the differencer. If you use the Random or Murmur3 partitioner, it should be enough to calculate the merkle tree over a small subset of the ring, as the result can be extrapolated.
> This could be exposed in nodetool, either as a separate command or perhaps as a dry-run flag to repair?
> I'm not sure about the output format. I think it would be nice to have one value ("% consistent"?) within a DC, and also one value for every pair of DCs, perhaps?

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
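The extrapolation described in the ticket can be sketched roughly as follows. This is a hypothetical Python illustration, not Cassandra code: the differencer's per-range match/mismatch results are stubbed as a dict, a random subset of ranges is sampled, and the match fraction is scaled up to a "% consistent" figure. With a random or murmur3 partitioner, keys are spread uniformly over the ring, which is what makes extrapolating from a sample plausible.

```python
import random

def estimate_consistency(ranges_compared, sample_fraction=0.05, seed=42):
    """Hypothetical sketch: estimate '% consistent' by comparing
    merkle-tree results over a random subset of token ranges.

    `ranges_compared` maps a token range to True (replicas matched)
    or False (replicas mismatched), as a differencer would report.
    """
    rng = random.Random(seed)
    all_ranges = list(ranges_compared)
    sample = rng.sample(all_ranges, max(1, int(len(all_ranges) * sample_fraction)))
    matched = sum(1 for r in sample if ranges_compared[r])
    return 100.0 * matched / len(sample)

# Toy data: 1000 ranges, 3% of them out of sync.
ranges = {i: (i % 100 >= 3) for i in range(1000)}
print(f"~{estimate_consistency(ranges):.0f}% consistent")
```

The sampled estimate converges on the true figure as the sample fraction grows, which is the trade-off the ticket proposes: a cheap approximate number instead of a full-ring merkle computation.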
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910212#comment-13910212 ]

Benedict commented on CASSANDRA-6758:
-

This doesn't seem like a bad idea at all. The only problem I can see is that the first half of the repair process is actually one of the more expensive actions a cluster can perform, as the entire cluster needs to walk all of its data to compute its merkle trees. I wonder if it would be possible to calculate and save an abbreviated merkle tree when writing each sstable, that could be combined cheaply to give this answer.
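For context, a minimal sketch of the merkle-tree comparison that repair performs (hypothetical Python, not the actual Cassandra implementation): every partition must be hashed to build the leaves, which is why this phase has to walk all the data.

```python
import hashlib

def merkle_tree(partition_hashes):
    """Minimal sketch of the tree repair builds: leaves are hashes of
    partitions in token order; each inner node hashes its two children.
    Assumes len(partition_hashes) is a power of two for simplicity.
    """
    level = list(partition_hashes)
    while len(level) > 1:
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]  # root hash: equal roots mean replicas agree

a = [hashlib.sha256(str(k).encode()).digest() for k in range(8)]
b = list(a)
print(merkle_tree(a) == merkle_tree(b))   # identical data: True
b[3] = hashlib.sha256(b"stale").digest()  # one stale partition
print(merkle_tree(a) == merkle_tree(b))   # roots now differ: False
```

Building the leaves requires hashing every partition on every replica, so the cost scales with the full dataset even when only the root comparison is wanted.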
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910214#comment-13910214 ]

Benedict commented on CASSANDRA-6758:
-

bq. I wonder if it would be possible to calculate and save an abbreviated merkle tree when writing each sstable, that could be combined cheaply to give this answer.

Hmm. Thinking about it for just a few seconds more, this is highly unlikely to be workable, since we would be hashing multiple versions of a partition key. It might be workable for datasets that are append-only, but I'm not sure it's a worthwhile optimisation for that case alone.
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910293#comment-13910293 ]

Jonathan Ellis commented on CASSANDRA-6758:
---

That is (one reason) why we took a different approach with CASSANDRA-5351.
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910298#comment-13910298 ]

Jimmy Mårdell commented on CASSANDRA-6758:
--

Right. I realize this ticket is almost irrelevant in 2.1, but that's still a long way away (at least if you follow DSE). This would be some kind of mitigation until then, and should preferably be done in 1.2.
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910811#comment-13910811 ]

sankalp kohli commented on CASSANDRA-6758:
--

If you are relying on extrapolation anyway, you can look at how many ranges did not match during repair; this information is logged, so it will give you a similar answer. Also, since a tree range/leaf can cover multiple rows, you can estimate the rows per leaf by taking the number of rows per instance and dividing by the 32k leaves of the Merkle tree.
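The back-of-the-envelope estimate above can be written out as follows (the row count is hypothetical):

```python
def rows_per_leaf(rows_on_instance, leaves=32 * 1024):
    """The merkle tree has a fixed number of leaves (32k here), so one
    leaf covers roughly rows_on_instance / leaves rows; a single leaf
    mismatch therefore implies streaming that many rows, not one."""
    return rows_on_instance / leaves

# E.g. an instance holding 100M rows: each leaf covers ~3052 rows,
# so even one mismatched leaf repairs thousands of rows.
print(round(rows_per_leaf(100_000_000)))  # 3052
```

This granularity is also why a small number of mismatched leaves can still trigger substantial streaming, which connects to the overstreaming concern raised below in the thread.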
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910830#comment-13910830 ]

Jimmy Mårdell commented on CASSANDRA-6758:
--

A range is not the same as a leaf, is it? If two leaves with the same parent mismatch, it's still only one range (I think?). So it's hard to know from the logs how much was out of sync.

We've had problems in the past with overstreaming causing serious performance problems. Had we known the cluster was that out of sync, we might have taken some extra measures before running the repair. With subrange repairs, and CASSANDRA-6713, perhaps this will no longer be an issue.
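The leaf-vs-range distinction can be shown with a small sketch (hypothetical, not the actual Differencer logic): adjacent mismatched leaves collapse into a single contiguous range, so counting logged ranges understates the number of mismatched leaves.

```python
def merge_leaves_to_ranges(mismatched_leaves):
    """Collapse sorted mismatched leaf indices into contiguous ranges,
    the way adjacent differing subtrees surface as a single range."""
    ranges = []
    for leaf in sorted(mismatched_leaves):
        if ranges and ranges[-1][1] == leaf - 1:
            ranges[-1][1] = leaf          # extend the current range
        else:
            ranges.append([leaf, leaf])   # start a new range
    return [tuple(r) for r in ranges]

# Five mismatched leaves, but only two ranges would appear in the log:
print(merge_leaves_to_ranges([4, 5, 6, 7, 42]))  # [(4, 7), (42, 42)]
```

So a log line reporting "2 ranges out of sync" could mean anywhere from two leaves to a large contiguous run of them, which is exactly the ambiguity the comment points out.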
[ https://issues.apache.org/jira/browse/CASSANDRA-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911092#comment-13911092 ]

sankalp kohli commented on CASSANDRA-6758:
--

Yes, if there is a mismatch at an inner node of the tree, it will log that. Maybe we can sum the ranges which do not match in the Differencer in 1.2.

Regarding performance problems with a lot of streaming: I think we should pause the streams if Cassandra detects that a lot of data is being transferred, causing the disk to fill up or L0 to grow. I had created this JIRA for that: https://issues.apache.org/jira/browse/CASSANDRA-6752 This will also make such problems easier to operate around, as you don't need to do sub-range repairs.