[jira] [Assigned] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd reassigned CASSANDRA-11748: - Assignee: (was: Nirmal Singh KPS) > Schema version mismatch may lead to Cassandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* nodes of different scales of deployment (2G ~ 5G) >Reporter: Michael Fong >Priority: Urgent > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times that a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the outline of our rolling upgrade process: > 1. Update schema on a node, and wait until all nodes are in schema version > agreement - via nodetool describecluster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the restarted node has a different > schema version. > 4. All nodes in the cluster start to rapidly exchange schema information, and any > node could run into OOM. > The following is the system.log that occurred in one of our 2-node cluster test > beds > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > Node 2 keeps submitting the migration task over 100 times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... (over 100 times) > -- > On the other hand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migration tasks afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line 41) Received migration request from > /192.168.88.34. > ... (over 100 times) > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line > 127) submitting migration task for /192.168.88.34 > ... (over 50 times) > On a side note, we have over 200 column families defined in the Cassandra > database, which may be related to this amount of RPC traffic. > P.S.2 The over-requested schema migration tasks will eventually have > InternalResponseStage perform the schema merge operation. Since this operation > requires a compaction for each merge, it is much slower to consume; thus, the > back-pressure of incoming schema migration content objects consumes all of > the heap space and ultimately ends up in OOM! 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
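To make the failure mode in the report above concrete, here is a minimal sketch of the problematic pattern it describes: every gossip event that shows an endpoint with a different schema version submits another migration task, and nothing de-duplicates pulls that are already in flight for the same version. The class and method names are hypothetical; this is not the actual MigrationManager code.

{code:java}
import java.net.InetAddress;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only: one schema pull per gossip event, with no
// de-duplication of in-flight requests for the same schema version.
public class NaiveSchemaPuller
{
    private final ExecutorService migrationStage = Executors.newSingleThreadExecutor();

    // Invoked from the gossip stage for every onChange/onAlive/onRestart event.
    public void maybeScheduleSchemaPull(InetAddress endpoint, UUID theirVersion, UUID ourVersion)
    {
        if (theirVersion.equals(ourVersion))
            return;

        // A restarting node sees many such events in a short window, so the same large
        // serialized schema is requested from many endpoints at once; the responses
        // queue up on the heap faster than the single-threaded merge can drain them.
        migrationStage.submit(() -> requestSchemaFrom(endpoint));
    }

    private void requestSchemaFrom(InetAddress endpoint)
    {
        // send a migration request message and merge the mutations in the response
    }
}
{code}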
[jira] [Comment Edited] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663994#comment-16663994 ] Matt Byrd edited comment on CASSANDRA-11748 at 10/25/18 4:46 PM: - I think it would be great to try and fix these related issues in the 4.0 timeframe. I'd be keen on trying the above outlined approach, I'll have a go at sketching it out in a PR to see what folks think. To reiterate what I believe to be fundamental problem: The way we tee up a schema pull whenever a relevant gossip event shows a node with a different schema version, results in far too many superfluous pulls for the same schema contents. When there are sufficient endpoints and a sufficiently large schema doing so can lead to the instance OOMing. The above proposed solution solves this by decoupling the schema pulls from the incoming gossip messages and instead using gossip to update the nodes view of which other nodes have which schema version and then having a thread periodically check and attempt to resolve any inconsistencies. There are some details to flesh out and I think an important part will be to ensure we have tests to demonstrate the issues and demonstrate we've fixed them. I'm hoping that we can perhaps leverage [CASSANDRA-14821|https://issues.apache.org/jira/browse/CASSANDRA-14821] to do so. Though we may want to augment this with dtests or something else. Let me know if you have any thoughts on the above approach, perhaps a sketch in code will help better illuminate it and help flush out potential problems. [~iamaleksey] / [~spo...@gmail.com] / [~michael.fong] / [~jjirsa] was (Author: mbyrd): I think it would be great to try and fix these related issues in the 4.0 timeframe. I'd be keen on trying the above outlined approach, I'll have a go at sketching it out in a PR to see what folks think. To reiterate what I believe to be fundamental problem: The way we tee up a schema pull whenever a relevant gossip event shows a node with a different schema version, results in far too many superfluous pulls for the same schema contents. When there are sufficient endpoints and a sufficiently large schema doing so can lead to the instance OOMing. The above proposed solution solves this by decoupling the schema pulls from the incoming gossip messages and instead using gossip to update the nodes view of which other nodes have which schema version and then having a thread periodically check and attempt to resolve any inconsistencies. There are some details to flesh out and I think an important part will be to ensure we have tests to demonstrate the issues and demonstrate we've fixed them. I'm hoping that we can perhaps leverage [CASSANDRA-14821|https://issues.apache.org/jira/browse/CASSANDRA-14821] to do so. Though we may want to augment this with dtests or something else. Let me know if you have any thoughts on the above approach, perhaps a sketch in code will help better illuminate it and help flush out potential problems. [~iamaleksey][~spo...@gmail.com][~michael.fong][~jjirsa] > Schema version mismatch may leads to Casandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. 
> CentOS 6.6 > Occurred in different C* node of different scale of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times when a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the simple guideline of our rolling upgrade process > 1. Update schema on a node, and wait until all nodes to be in schema version > agreemnt - via nodetool describeclulster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the the restarted node has different > schema version. > 4. All nodes in cluster start to rapidly exchange schema information, and any > of node could run into OOM. > The following is the system.log that occur in one of our 2-node cluster test > bed > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663994#comment-16663994 ] Matt Byrd commented on CASSANDRA-11748: --- I think it would be great to try and fix these related issues in the 4.0 timeframe. I'd be keen on trying the above outlined approach, I'll have a go at sketching it out in a PR to see what folks think. To reiterate what I believe to be fundamental problem: The way we tee up a schema pull whenever a relevant gossip event shows a node with a different schema version, results in far too many superfluous pulls for the same schema contents. When there are sufficient endpoints and a sufficiently large schema doing so can lead to the instance OOMing. The above proposed solution solves this by decoupling the schema pulls from the incoming gossip messages and instead using gossip to update the nodes view of which other nodes have which schema version and then having a thread periodically check and attempt to resolve any inconsistencies. There are some details to flesh out and I think an important part will be to ensure we have tests to demonstrate the issues and demonstrate we've fixed them. I'm hoping that we can perhaps leverage [CASSANDRA-14821|https://issues.apache.org/jira/browse/CASSANDRA-14821] to do so. Though we may want to augment this with dtests or something else. Let me know if you have any thoughts on the above approach, perhaps a sketch in code will help better illuminate it and help flush out potential problems. [~iamaleksey][~spo...@gmail.com][~michael.fong][~jjirsa] > Schema version mismatch may leads to Casandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* node of different scale of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times when a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the simple guideline of our rolling upgrade process > 1. Update schema on a node, and wait until all nodes to be in schema version > agreemnt - via nodetool describeclulster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the the restarted node has different > schema version. > 4. All nodes in cluster start to rapidly exchange schema information, and any > of node could run into OOM. > The following is the system.log that occur in one of our 2-node cluster test > bed > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > The node2 keeps submitting the migration task over 100+ times to the other > node. 
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... ( over 100+ times) > -- > On the otherhand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migrationTask afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line 41) Received migration request from > /192.168.88.34. > …… ( over 100+ times) > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line > 127) submitting migration task for /192.168.88.34 > . (over 50+ times) > On the side note, we have over 200+ column families defined in Cassandra > database, which may related to this amount of rpc traffic. > P.S.2 The over requested schema migration task will eventually have > InternalResponseStage performing schema merge operation. Since this operation > requires a
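The following is a rough sketch of the approach outlined in the comment above: gossip events only record which endpoint is advertising which schema version, and a single periodic task later inspects that map and schedules at most one pull per distinct version that differs from our own. All names here are assumptions for illustration, not the committed implementation.

{code:java}
import java.net.InetAddress;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: decouple schema pulls from gossip by recording reported versions
// in a map and resolving differences from a periodic task.
public class EndpointSchemaVersionTracker
{
    private final Map<InetAddress, UUID> reportedVersions = new ConcurrentHashMap<>();
    private final ScheduledExecutorService resolver = Executors.newSingleThreadScheduledExecutor();
    private volatile UUID localVersion = UUID.randomUUID(); // placeholder for the node's own schema version

    public EndpointSchemaVersionTracker(long intervalMillis)
    {
        resolver.scheduleWithFixedDelay(this::resolveDifferences, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called from the gossip stage: cheap bookkeeping only, no migration task is submitted here.
    public void onSchemaVersionReported(InetAddress endpoint, UUID version)
    {
        reportedVersions.put(endpoint, version);
    }

    // Called whenever our own schema version changes.
    public void onLocalSchemaChanged(UUID newVersion)
    {
        this.localVersion = newVersion;
    }

    // Runs periodically: request each distinct foreign version at most once per round,
    // no matter how many endpoints (or gossip events) advertised it.
    private void resolveDifferences()
    {
        Set<UUID> requested = new HashSet<>();
        for (Map.Entry<InetAddress, UUID> entry : reportedVersions.entrySet())
        {
            UUID theirs = entry.getValue();
            if (!theirs.equals(localVersion) && requested.add(theirs))
                requestSchemaFrom(entry.getKey());
        }
    }

    private void requestSchemaFrom(InetAddress endpoint)
    {
        // send a migration request to this endpoint and merge the response
    }
}
{code}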
[jira] [Created] (CASSANDRA-14531) Only include data owned by the node in totals for repaired, un-repaired and pending repair.
Matt Byrd created CASSANDRA-14531: - Summary: Only include data owned by the node in totals for repaired, un-repaired and pending repair. Key: CASSANDRA-14531 URL: https://issues.apache.org/jira/browse/CASSANDRA-14531 Project: Cassandra Issue Type: Improvement Components: Metrics, Repair Reporter: Matt Byrd Fix For: 4.x If there is data which is left over from a topology change and is not yet cleaned up, it will be included in the total for the BytesRepaired, BytesUnrepaired or BytesPendingRepair metrics. This can distort the total and lead to misleading metrics (albeit potentially short-lived). As an operator, if you wanted to keep track of percent repaired, you might not have an accurate idea of the relevant percent repaired under such conditions. I propose we only include sstables owned by the node in the totals for BytesRepaired, BytesUnrepaired, BytesPendingRepair and PercentRepaired. It feels more logical to only emit metrics like repaired/un-repaired for data which can actually be repaired. When an SSTable is partially owned by the node, we can compute the size which falls within the token range by binary searching the index for the uncompressed offsets. Finally, we can also emit a metric which consists of all the data which is not owned by the node. This might also be helpful for operators to discover whether there is data which is not owned by the node and hence a need to run cleanup. One slight complication is that with a large number of sstables and a reasonable number of vnodes, computing these values now becomes a bit expensive. There is probably a way of keeping some of these metrics updated online rather than re-computing them periodically, though this might be a bit fiddly. Alternatively, using something like an interval tree or some other data structure might be enough to ensure it performs sufficiently well and doesn't add undue overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
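As a rough illustration of the index-based size computation mentioned above: if the sstable index is viewed as arrays of tokens and uncompressed offsets sorted by token, the bytes falling inside one owned range (left, right] can be estimated by binary searching the offsets at both range bounds. The class and parameter names below are assumptions for illustration; this is not Cassandra's actual index API.

{code:java}
// Hypothetical sketch: estimate how many uncompressed bytes of an sstable fall inside
// one owned token range, given index entries sorted by token.
public class OwnedBytesEstimator
{
    // tokens[i] is the token of the i-th index entry, offsets[i] its uncompressed offset,
    // totalLen the total uncompressed length of the sstable's data component.
    public static long bytesInRange(long[] tokens, long[] offsets, long totalLen,
                                    long rangeLeftExclusive, long rangeRightInclusive)
    {
        long start = offsetOfFirstTokenGreaterThan(tokens, offsets, totalLen, rangeLeftExclusive);
        long end = offsetOfFirstTokenGreaterThan(tokens, offsets, totalLen, rangeRightInclusive);
        return Math.max(0L, end - start);
    }

    // Binary search: offset of the first entry whose token is strictly greater than target,
    // or the end of the data file if no such entry exists.
    private static long offsetOfFirstTokenGreaterThan(long[] tokens, long[] offsets, long totalLen, long target)
    {
        int lo = 0, hi = tokens.length;
        while (lo < hi)
        {
            int mid = (lo + hi) >>> 1;
            if (tokens[mid] <= target)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo == tokens.length ? totalLen : offsets[lo];
    }
}
{code}

Summing bytesInRange over each locally owned range would give the owned portion of a partially owned sstable; subtracting that from the total would give the "not owned by the node" metric mentioned above.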
[jira] [Assigned] (CASSANDRA-14531) Only include data owned by the node in totals for repaired, un-repaired and pending repair.
[ https://issues.apache.org/jira/browse/CASSANDRA-14531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd reassigned CASSANDRA-14531: - Assignee: Matt Byrd > Only include data owned by the node in totals for repaired, un-repaired and > pending repair. > --- > > Key: CASSANDRA-14531 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14531 > Project: Cassandra > Issue Type: Improvement > Components: Metrics, Repair >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > If there is data which is left over from a topology change and is not yet > cleaned up, it will be included in the total for BytesRepaired, > BytesUnrepaired or BytesPendingRepair metrics. > This can distort the total and lead to misleading metrics (albeit > potentially short-lived). > As an operator if you wanted to keep track of percent repaired, you might > not have an accurate idea of the relevant percent repaired under such > conditions. > I propose we only include sstables owned by the node in the totals for > BytesRepaired, BytesUnrepaired, BytesPendingRepair and PercentRepaired. It > feels more logical to only emit metrics like repaired/un-repaired for data > which can actually be repaired. > When an SStable is partially owned by the node, we can compute the size which > falls within the token-range by binary searching the index for the > uncompressed offsets. > We can finally also emit a metric which consists of all the data which is > not owned by the node. > This might also be helpful for operators to discover whether there is data > which is not owned by the node and hence the need to run cleanup. > On slight complication is that with a large number of sstables and a > reasonable number of vnodes, computing these values now becomes a bit > expensive. There is probably a way of keeping some of these metrics updated > online rather than re-computing periodically, though this might be a bit > fiddly. Alternately using things like the interval tree or some other > data-structure might be enough to ensure it performs sufficiently and doesn't > add undue overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13557) allow different NUMACTL_ARGS to be passed in
[ https://issues.apache.org/jira/browse/CASSANDRA-13557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129276#comment-16129276 ] Matt Byrd commented on CASSANDRA-13557: --- for reference the commit is actually here I believe: [af20226dcadc6f15e245b3c786233d783d77b914|https://github.com/apache/cassandra/commit/af20226dcadc6f15e245b3c786233d783d77b914] > allow different NUMACTL_ARGS to be passed in > > > Key: CASSANDRA-13557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13557 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 3.0.15, 3.11.1, 4.0 > > > Currently in bin/cassandra the following is hardcoded: > NUMACTL_ARGS="--interleave=all" > Ideally users of cassandra/bin could pass in a different set of NUMACTL_ARGS > if they wanted to say bind the process to a socket for cpu/memory reasons, > rather than having to comment out/modify this line in the deployed > cassandra/bin. e.g as described in: > https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html > This could be done by just having the default be set to "--interleave=all" > but pickup any value which has already been set for the variable NUMACTL_ARGS. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129137#comment-16129137 ] Matt Byrd commented on CASSANDRA-11748: --- Hi [~mcfongtw], Hopefully I'm interpreting your comments correctly; I believe they are further analysis of the particular problem rather than suggestions for improvement? Firstly, I agree that having an unbounded number of concurrent migration tasks is the root of the problem (along with the other pre-condition of having a suitably large schema and somehow missing a schema update, either being down or being on another major version from where the change took place): {quote} 1. Have migration checks and requests fired asynchronously and finally stack up the all message at the receiver end merge the schema one-by-one at {code} Schema.instance.mergeSchemaAndAnnounceVersion() {code} {quote} I would rather not try to de-dupe the schema mutations at the receiver end (this might help reduce how much is retained on the heap, but ultimately it doesn't get at the heart of the problem). {quote} 2. Send the receiver the complete copy of schema, instead of delta copy of schema out of diff between two nodes. {quote} Sending the whole copy of the schema came into play here: https://issues.apache.org/jira/browse/CASSANDRA-1391. I believe reverting this behaviour is probably out of scope of any 3.0 update, but perhaps for a future patch we can negotiate the delta rather than sending the whole schema. This would be a good improvement, but I don't think it's strictly necessary for solving this particular problem. {quote} 3. Last but not least, the most mysterious problem that leads to OOM and we could not figure out why back then, is that there are hundreds of migration task all fired nearly simultaneously, within 2 s. The number of rpcs does not match with the nodes in cluster, but is close to number of second taken for the node to reboot. {quote} It's possible there is something else going on in addition here, although one thing that I've observed (as mentioned above) is that due to all the heap pressure from the large mutations sent concurrently, the node itself can pause for several seconds and hence both be marked as DOWN by the remote nodes and mark those remote nodes DOWN itself, followed by then marking them UP and doing another schema pull as a result. This spiral often results in many more migration tasks than are necessary, before either OOMing out or finally applying the required schema change. If you still have your logs, you could check roughly how many UP messages for other endpoints occurred on a problematic instance and compare that to the number of migration tasks. At any rate, I believe either rate limiting the migration tasks (globally or per schema version) or coming up with an alternative mechanism which serialises the schema pulls should address the problem. I'll take a look at a proposition by [~iamaleksey] to pass the information about schema versions into a map and move the actual triggering of pull requests onto a frequently run periodic task, which reads this map and decides on an appropriate course of action to resolve the schema difference. (This way we can collect all this information arriving asynchronously and, for example, de-duplicate repeated calls for the same schema version for different endpoints.) 
I think the main advantage of such an approach (as opposed to limiting the number of migration tasks by schema version) is that it removes the possibility of ending up with a stale schema due to the limiting; however, it's worth noting that doing the limit per schema version and expiring the limits already goes a long way to reducing this possibility. I'll try and dig up that version of the patch for reference/comparison. [~iamaleksey], [~spod], please let me know if there is anything in particular about the way you want this to behave, or if you feel I've misrepresented the idea in any way. One further thing that did occur to me was that trying to balance avoiding superfluous schema pulls against ensuring we converge as quickly as possible might necessitate some degree of parallelism. For example, if we pick a node to pull schema from and it's partitioned off, so we don't hear back for a while (or ever), we probably want to be proactively scheduling a pull from elsewhere to avoid waiting too long for a timeout. I'm sure there will be some other details to work out too, but I think the general approach makes sense. > Schema version mismatch may lead to Cassandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug >
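A minimal sketch of the per-schema-version limiting with expiry referred to in the comment above; the class name, limit, and expiry window are illustrative assumptions rather than the actual patch.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of rate limiting schema pulls per schema version, with the counters
// expiring so that a genuinely new version is not starved for long.
public class PerVersionPullLimiter
{
    private static final int MAX_PULLS_PER_VERSION = 3;  // assumed limit
    private static final long WINDOW_MILLIS = 60_000;    // assumed expiry window

    private static final class Window
    {
        final long startedAt = System.currentTimeMillis();
        int count;
    }

    private final Map<UUID, Window> windows = new HashMap<>();

    // Returns true if a migration task for this schema version may be submitted now.
    public synchronized boolean tryAcquire(UUID schemaVersion)
    {
        Window w = windows.get(schemaVersion);
        if (w == null || System.currentTimeMillis() - w.startedAt > WINDOW_MILLIS)
        {
            w = new Window();
            windows.put(schemaVersion, w);
        }
        if (w.count >= MAX_PULLS_PER_VERSION)
            return false; // enough pulls for this version are already in flight or recent
        w.count++;
        return true;
    }
}
{code}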
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122194#comment-16122194 ] Matt Byrd commented on CASSANDRA-11748: --- {quote} But we should at least take the schema Ids and/or endpoints into account as well. It just doesn't make sense to queue 50 requests for the same schema Id and potentially drop requests for a different schema afterwards. {quote} Yes, I did also have a patch with an expiring map of schema-version to counter and was limiting it per schema version, but decided to keep it simple, since the single limit sufficed for a particular scenario. Less relevant, but it also provides some protection in the rather strange case that there are actually lots of different schema versions in the cluster. I could resurrect the schema version patch, but it sounds like we're considering a slightly different approach. {quote} Schedule that pull with a delay instead, give the new node a chance to pull the new schema from one of the nodes in the cluster. It'll most likely converge by the time the delay has passed, so we'd just abort the request if schema versions now match. {quote} Once a node has been up for MIGRATION_DELAY_IN_MS and doesn't have an empty schema, it will always schedule the task to pull schema with a delay of MIGRATION_DELAY_IN_MS and then do a further check within the task itself to see if the schema versions still differ before asking for schema. Admittedly, this problem does still exist if two nodes start up at the same time: they may pull from each other. I suppose we're going to schedule a pull from a newer node too; then, assuming we successively merge the schema together, we hopefully end up at the final desired state? Although in the interim I suppose it's possible a node might come into play with a slightly older schema, but I suppose that can just happen whenever a DOWN node comes up with out-of-date schema. It's also possible that if the node is so overwhelmed by the reverse problem, it won't have made it to the correct schema version within MIGRATION_DELAY_IN_MS and hence will start sending its old schema back at all the other nodes in the cluster; fortunately the sending happens on the migration stage, so it is single threaded and less likely to cause OOMs. > Schema version mismatch may lead to Cassandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* nodes of different scales of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times that a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the outline of our rolling upgrade process: > 1. Update schema on a node, and wait until all nodes are in schema version > agreement - via nodetool describecluster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the restarted node has a different > schema version. > 4. All nodes in the cluster start to rapidly exchange schema information, and any > node could run into OOM. 
> The following is the system.log that occur in one of our 2-node cluster test > bed > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > The node2 keeps submitting the migration task over 100+ times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... ( over 100+ times) > -- > On the otherhand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migrationTask afterwards: > INFO [RequestResponseStage:3] 2016-04-19
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117330#comment-16117330 ] Matt Byrd commented on CASSANDRA-11748: --- Hey [~iamaleksey], I know that as part of https://issues.apache.org/jira/browse/CASSANDRA-10699 and related JIRAs you are intent on reworking schema quite a bit. I'm trying to determine whether the migration limit patches linked above will still be necessary in addition to the changes you're making. It sounded like the serialised schema itself might become a bit cheaper (reducing the heap cost of sending the big serialised mutation); however, the fundamental PULL model of getting schema changes on startup wouldn't change. I.e. if you are a node in a large cluster and have been down whilst a schema change occurs, when you start up you will still ask for schema from all the other nodes as they appear in your view, and only stop asking when you've successfully applied the schema. I suppose if this is the case, then we probably still need the above linked patch for trunk - what do you think? Thanks, Matt > Schema version mismatch may lead to Cassandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* nodes of different scales of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times that a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the outline of our rolling upgrade process: > 1. Update schema on a node, and wait until all nodes are in schema version > agreement - via nodetool describecluster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the restarted node has a different > schema version. > 4. All nodes in the cluster start to rapidly exchange schema information, and any > node could run into OOM. > The following is the system.log that occurred in one of our 2-node cluster test > beds > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > Node 2 keeps submitting the migration task over 100 times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... (over 100 times) > -- > On the other hand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migration tasks afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... 
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line 41) Received migration request from > /192.168.88.34. > …… ( over 100+ times) > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line > 127) submitting migration task for /192.168.88.34 > . (over 50+ times) > On the side note, we have over 200+ column families defined in Cassandra > database, which may related to this amount of rpc traffic. > P.S.2 The over requested schema migration task will eventually have > InternalResponseStage performing schema merge operation. Since this operation > requires a compaction for each merge and is much slower to consume. Thus, the > back-pressure of incoming schema migration content objects consumes all of > the heap space and ultimately ends up OOM! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8076) Expose an mbean method to poll for repair job status
[ https://issues.apache.org/jira/browse/CASSANDRA-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111857#comment-16111857 ] Matt Byrd commented on CASSANDRA-8076: -- Looks like this issue was trying to address a similar problem to the one described in https://issues.apache.org/jira/browse/CASSANDRA-13480. In the patch there I added a method which allows one to get the parent repair status of a given repair: https://github.com/apache/cassandra/commit/20d5ce8b9b587be2f0b7bc5765254e8dc6e0bd3b which is sort of similar to the method mentioned here. Additionally, nodetool now also checks for this status when notifications are lost, and periodically, so we don't hang indefinitely when notifications are lost. Those using JMX directly can do something analogous if they desire. [~yukim] Would you mind taking a quick look at CASSANDRA-13480 and, if appropriate, closing this as a duplicate of that? > Expose an mbean method to poll for repair job status > > > Key: CASSANDRA-8076 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8076 > Project: Cassandra > Issue Type: Improvement >Reporter: Philip S Doctor >Assignee: Yuki Morishita > Attachments: 8076-2.0.txt > > > Given the int reply-id from forceRepairAsync, allow a client to request the > status of this ID via jmx. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-13480: -- Status: Ready to Commit (was: Patch Available) > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Labels: repair > Fix For: 4.x > > > When a JMX lost notification occurs, sometimes the lost notification in > question is the notification which lets RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in the nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > JMX to receive all the relevant notifications we're interested in, so we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost, and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via JMX directly can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach. I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios, > but in this test we don't even send that many notifications, so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > JMX as far as I can tell. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
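The following is a rough sketch of the client-side behaviour the description above proposes, using the standard JMX notification API. The queryMissedRepairEvents call and the class name are hypothetical stand-ins for the new endpoint added by the patch; the polling interval is an assumption.

{code:java}
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: on NOTIFS_LOST, and periodically as a safety net, re-fetch the
// repair events we may have missed instead of waiting forever for COMPLETE/ERROR.
public class RepairProgressListener implements NotificationListener
{
    private final String repairTag; // scopes the replay to one particular repair
    private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();

    public RepairProgressListener(String repairTag)
    {
        this.repairTag = repairTag;
        // Poll periodically in case the NOTIFS_LOST notification is itself lost.
        poller.scheduleWithFixedDelay(this::replayMissedEvents, 30, 30, TimeUnit.SECONDS);
    }

    @Override
    public void handleNotification(Notification notification, Object handback)
    {
        if (JMXConnectionNotification.NOTIFS_LOST.equals(notification.getType()))
        {
            replayMissedEvents();
            return;
        }
        // ... normal progress handling: track COMPLETE / ERROR and unblock the caller ...
    }

    private void replayMissedEvents()
    {
        // Fetch the repair progress events recorded on the server for this tag and replay them.
        for (Notification n : queryMissedRepairEvents(repairTag))
            handleNotification(n, null);
    }

    private List<Notification> queryMissedRepairEvents(String tag)
    {
        // Hypothetical: would invoke the server-side JMX operation added by the patch.
        return Collections.emptyList();
    }
}
{code}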
[jira] [Updated] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-13480: -- Reviewer: Chris Lohfink (was: Blake Eggleston) > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Labels: repair > Fix For: 4.x > > > When a Jmx lost notification occurs, sometimes the lost notification in > question is the notification which let's RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, If on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > Jmx to receive all the relevant notifications we're interested in, we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via jmx directly, can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13557) allow different NUMACTL_ARGS to be passed in
[ https://issues.apache.org/jira/browse/CASSANDRA-13557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056599#comment-16056599 ] Matt Byrd commented on CASSANDRA-13557: --- |3.0|3.11|Trunk| |[branch|https://github.com/Jollyplum/cassandra/tree/13557]|[branch|https://github.com/Jollyplum/cassandra/tree/13557-3.11]|[branch|https://github.com/Jollyplum/cassandra/tree/13557]| |[testall|https://circleci.com/gh/Jollyplum/cassandra/19#tests/containers/3]|[testall|https://circleci.com/gh/Jollyplum/cassandra/20]|[testall|https://circleci.com/gh/Jollyplum/cassandra/6]| > allow different NUMACTL_ARGS to be passed in > > > Key: CASSANDRA-13557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13557 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > Currently in bin/cassandra the following is hardcoded: > NUMACTL_ARGS="--interleave=all" > Ideally users of cassandra/bin could pass in a different set of NUMACTL_ARGS > if they wanted to say bind the process to a socket for cpu/memory reasons, > rather than having to comment out/modify this line in the deployed > cassandra/bin. e.g as described in: > https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html > This could be done by just having the default be set to "--interleave=all" > but pickup any value which has already been set for the variable NUMACTL_ARGS. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054926#comment-16054926 ] Matt Byrd commented on CASSANDRA-11748: --- |3.0|3.11|Trunk| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]|[branch|https://github.com/Jollyplum/cassandra/tree/11748-3.11]|[branch|https://github.com/Jollyplum/cassandra/tree/11748]| |[dtest|]|[dtest|]|[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/93/testReport/]| |[testall|https://circleci.com/gh/Jollyplum/cassandra/15]|[testall|https://circleci.com/gh/Jollyplum/cassandra/16]|[testall|https://circleci.com/gh/Jollyplum/cassandra/17]| > Schema version mismatch may leads to Casandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* node of different scale of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times when a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the simple guideline of our rolling upgrade process > 1. Update schema on a node, and wait until all nodes to be in schema version > agreemnt - via nodetool describeclulster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the the restarted node has different > schema version. > 4. All nodes in cluster start to rapidly exchange schema information, and any > of node could run into OOM. > The following is the system.log that occur in one of our 2-node cluster test > bed > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > The node2 keeps submitting the migration task over 100+ times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... ( over 100+ times) > -- > On the otherhand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migrationTask afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line 41) Received migration request from > /192.168.88.34. > …… ( over 100+ times) > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line > 127) submitting migration task for /192.168.88.34 > . 
(over 50+ times) > On the side note, we have over 200+ column families defined in Cassandra > database, which may related to this amount of rpc traffic. > P.S.2 The over requested schema migration task will eventually have > InternalResponseStage performing schema merge operation. Since this operation > requires a compaction for each merge and is much slower to consume. Thus, the > back-pressure of incoming schema migration content objects consumes all of > the heap space and ultimately ends up OOM! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054909#comment-16054909 ] Matt Byrd edited comment on CASSANDRA-13480 at 6/19/17 11:11 PM: - ||Trunk||| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]| |[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/98/]| |[testall|https://circleci.com/gh/Jollyplum/cassandra/14]| was (Author: mbyrd): ||Trunk||| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]| |[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/98/]||[testall|https://circleci.com/gh/Jollyplum/cassandra/14]| > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > When a Jmx lost notification occurs, sometimes the lost notification in > question is the notification which let's RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, If on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > Jmx to receive all the relevant notifications we're interested in, we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via jmx directly, can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-13480: -- Reviewer: Blake Eggleston Reproduced In: 3.0.13, 2.1.16, 4.x (was: 2.1.16, 3.0.13, 4.x) Status: Patch Available (was: Open) > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > When a Jmx lost notification occurs, sometimes the lost notification in > question is the notification which let's RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, If on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > Jmx to receive all the relevant notifications we're interested in, we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via jmx directly, can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054909#comment-16054909 ] Matt Byrd edited comment on CASSANDRA-13480 at 6/19/17 11:03 PM: - ||Trunk||| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]| |[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/98/]||[testall|https://circleci.com/gh/Jollyplum/cassandra/14]| was (Author: mbyrd): ||Trunk||| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]| |[testall|https://circleci.com/gh/Jollyplum/cassandra/14]| |[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/98/]| > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > When a Jmx lost notification occurs, sometimes the lost notification in > question is the notification which let's RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, If on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > Jmx to receive all the relevant notifications we're interested in, we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via jmx directly, can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054909#comment-16054909 ] Matt Byrd edited comment on CASSANDRA-13480 at 6/19/17 11:02 PM: - ||Trunk||| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]| |[testall|https://circleci.com/gh/Jollyplum/cassandra/14]| |[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/98/]| was (Author: mbyrd): ||Trunk||| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]|[testall|https://circleci.com/gh/Jollyplum/cassandra/14]|[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/98/]| > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > When a Jmx lost notification occurs, sometimes the lost notification in > question is the notification which let's RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, If on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > Jmx to receive all the relevant notifications we're interested in, we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via jmx directly, can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054909#comment-16054909 ] Matt Byrd commented on CASSANDRA-13480: --- ||Trunk||| |[branch|https://github.com/Jollyplum/cassandra/tree/13480]|[testall|https://circleci.com/gh/Jollyplum/cassandra/14]|[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/98/]| > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > When a Jmx lost notification occurs, sometimes the lost notification in > question is the notification which let's RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, If on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > Jmx to receive all the relevant notifications we're interested in, we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via jmx directly, can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
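For readers less familiar with the JMX mechanics described in CASSANDRA-13480, the client-side idea is roughly the following sketch: on a NOTIFS_LOST notification, fetch whatever was missed from a replay endpoint and run it through the normal handler. The {{RepairReplayMBean}} interface and its {{getNotificationsSince}} method are illustrative assumptions, not the actual API added by the patch.
{code:java}
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;

// Hypothetical replay endpoint: returns notifications for a repair tag newer than a sequence number.
interface RepairReplayMBean
{
    List<Notification> getNotificationsSince(String tag, long lastSequenceNumber);
}

class RepairNotificationListener implements NotificationListener
{
    private final RepairReplayMBean replay; // JMX proxy to the hypothetical replay endpoint
    private final String repairTag;         // tag scoping notifications to this repair run
    private long lastSeen = -1;             // highest sequence number processed so far

    RepairNotificationListener(RepairReplayMBean replay, String repairTag)
    {
        this.replay = replay;
        this.repairTag = repairTag;
    }

    @Override
    public void handleNotification(Notification notification, Object handback)
    {
        if (JMXConnectionNotification.NOTIFS_LOST.equals(notification.getType()))
        {
            // Some notifications were dropped: replay anything not yet processed for this repair.
            for (Notification missed : replay.getNotificationsSince(repairTag, lastSeen))
                process(missed);
            return;
        }
        process(notification);
    }

    private void process(Notification notification)
    {
        lastSeen = Math.max(lastSeen, notification.getSequenceNumber());
        // ... dispatch COMPLETE / ERROR progress events to the repair runner here ...
    }
}
{code}
The same {{getNotificationsSince}} call can also be made from a periodic poll, which covers the case where the NOTIFS_LOST notification is itself lost.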
[jira] [Commented] (CASSANDRA-13570) allow sub-range repairs (specifying -et -st) for a preview of repaired data
[ https://issues.apache.org/jira/browse/CASSANDRA-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046826#comment-16046826 ] Matt Byrd commented on CASSANDRA-13570: --- Done, thanks > allow sub-range repairs (specifying -et -st) for a preview of repaired data > --- > > Key: CASSANDRA-13570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13570 > Project: Cassandra > Issue Type: Improvement >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.0 > > > I don't see any inherent reason for restricting preview repairs of repaired > data to not allow specifying start and end tokens. > The restriction seems to be coming from the fact that incremental=true in > RepairOption, which is the case but it's not truly an incremental repair > since we're only previewing. > {code:java} > if (option.isIncremental() && !option.isGlobal()) > { > throw new IllegalArgumentException("Incremental repairs cannot be > run against a subset of tokens or ranges"); > } > {code} > It would be helpful to allow this, so that operators could sequence a sweep > over the entirety of the token-space in a more gradual fashion. > Also it might help in examining which portions of the token-space differ. > Can anyone see any reasons for not allowing this? > I.e just changing the above to something like: > {code:java} > if (option.isIncremental() && !option.getPreviewKind().isPreview() && > !option.isGlobal()) > { > throw new IllegalArgumentException("Incremental repairs cannot > be run against a subset of tokens or ranges"); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13569) Schedule schema pulls just once per endpoint
[ https://issues.apache.org/jira/browse/CASSANDRA-13569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044683#comment-16044683 ] Matt Byrd commented on CASSANDRA-13569: --- [~spod] Yes avoiding multiple schema migrations in flight per endpoint seems like a strict improvement. Maybe the issue on CASSANDRA-11748 can be addressed there separately or perhaps if CASSANDRA-10699 reworks the mechanism CASSANDRA-11748 will no longer be a problem. > Schedule schema pulls just once per endpoint > > > Key: CASSANDRA-13569 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13569 > Project: Cassandra > Issue Type: Improvement > Components: Distributed Metadata >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski > Fix For: 3.0.x, 3.11.x, 4.x > > > Schema mismatches detected through gossip will get resolved by calling > {{MigrationManager.maybeScheduleSchemaPull}}. This method may decide to > schedule execution of {{MigrationTask}}, but only after using a > {{MIGRATION_DELAY_IN_MS = 6}} delay (for reasons unclear to me). > Meanwhile, as long as the migration task hasn't been executed, we'll continue > to have schema mismatches reported by gossip and will have corresponding > {{maybeScheduleSchemaPull}} calls, which will schedule further tasks with the > mentioned delay. Some local testing shows that dozens of tasks for the same > endpoint will eventually be executed and causing the same, stormy behavior > for this very endpoints. > My proposal would be to simply not schedule new tasks for the same endpoint, > in case we still have pending tasks waiting for execution after > {{MIGRATION_DELAY_IN_MS}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13569) Schedule schema pulls just once per endpoint
[ https://issues.apache.org/jira/browse/CASSANDRA-13569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035393#comment-16035393 ] Matt Byrd commented on CASSANDRA-13569: --- Sure n.p [~spo...@gmail.com] Yes, so adding jitter in MIGRATION_DELAY_IN_MS could help when we're past: {code:java} | runtimeMXBean.getUptime() < MIGRATION_DELAY_IN_MS) {code} However it doesn't help on startup. Initially in trying to solve CASSANDRA-11748, I did also think about adding random the delay for even this branch (where we've only been up a short amount of time). This just didn't seem that straightforward to do and also guarantee that we wouldn't hit the problem described in CASSANDRA-11748. How do you know what is enough random delay? what if you actually delay getting the schema legitimately? I suppose the concerns in this ticket are similar but not exactly the same as CASSANDRA-11748, though I admit that rate limiting the number of schema pulls per endpoint to one at a time seems sensible and might possibly help a bit with CASSANDRA-11748. The schema is being pulled repeatedly from the same instances in CASSANDRA-11748, but I'm not sure rate limiting alone as described above will definitely solve it, perhaps it will make it less likely to OOM, but we're still going to have a lot of incoming serialised schemas from lots of nodes and we're still left with this sort of rough limit to scalability of "number of nodes * size of serialised schema" (albeit perhaps with a different threshold). Maybe some upcoming changes in CASSANDRA-10699 and related tickets may make the problem CASSANDRA-11748 even less likely, since part of the problem is that we're sending the entire serialised schema inside a mutation, which can end up being quite large if you have lots of tables or lots of columns in lots of tables. Also, for reference I believe the migration delay was added in the following ticket, in order to give a schema alteration sufficient time to propagate from the node where it changed, and not have a migration task race with this change and pull the whole schema instead of receive the delta: https://issues.apache.org/jira/browse/CASSANDRA-5025 > Schedule schema pulls just once per endpoint > > > Key: CASSANDRA-13569 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13569 > Project: Cassandra > Issue Type: Improvement > Components: Distributed Metadata >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski > Fix For: 3.0.x, 3.11.x, 4.x > > > Schema mismatches detected through gossip will get resolved by calling > {{MigrationManager.maybeScheduleSchemaPull}}. This method may decide to > schedule execution of {{MigrationTask}}, but only after using a > {{MIGRATION_DELAY_IN_MS = 6}} delay (for reasons unclear to me). > Meanwhile, as long as the migration task hasn't been executed, we'll continue > to have schema mismatches reported by gossip and will have corresponding > {{maybeScheduleSchemaPull}} calls, which will schedule further tasks with the > mentioned delay. Some local testing shows that dozens of tasks for the same > endpoint will eventually be executed and causing the same, stormy behavior > for this very endpoints. > My proposal would be to simply not schedule new tasks for the same endpoint, > in case we still have pending tasks waiting for execution after > {{MIGRATION_DELAY_IN_MS}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
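To make the two mitigations discussed here concrete, a rough sketch of combining a per-endpoint in-flight guard with optional jitter on the delay follows. The class name, the jitter constant and the 60-second base delay are illustrative; this is not the patch for either ticket.
{code:java}
import java.net.InetAddress;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: at most one scheduled schema pull per endpoint, plus jitter on the delay.
class SchemaPullScheduler
{
    private static final long MIGRATION_DELAY_IN_MS = 60_000; // base delay; 60 s used here for illustration
    private static final long MAX_JITTER_IN_MS = 10_000;      // assumed value, not a real constant

    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final Set<InetAddress> pendingPulls = ConcurrentHashMap.newKeySet();

    void maybeScheduleSchemaPull(InetAddress endpoint, UUID theirVersion)
    {
        // Further mismatch reports for this endpoint are ignored until the pending pull has run.
        if (!pendingPulls.add(endpoint))
            return;

        long delay = MIGRATION_DELAY_IN_MS + ThreadLocalRandom.current().nextLong(MAX_JITTER_IN_MS);
        executor.schedule(() -> {
            try
            {
                pullSchemaFrom(endpoint, theirVersion);
            }
            finally
            {
                pendingPulls.remove(endpoint);
            }
        }, delay, TimeUnit.MILLISECONDS);
    }

    private void pullSchemaFrom(InetAddress endpoint, UUID theirVersion)
    {
        // ... submit the actual MigrationTask for this endpoint ...
    }
}
{code}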
[jira] [Comment Edited] (CASSANDRA-10699) Make schema alterations strongly consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035130#comment-16035130 ] Matt Byrd edited comment on CASSANDRA-10699 at 6/2/17 5:56 PM: --- In particular I'm interested in how to avoid concurrent schema changes causing problems. [~iamaleksey] Is the plan to use Paxos to linearise the schema changes? btw the original assignee change was not intentional. was (Author: mbyrd): In particular I'm interested in how to avoid concurrent schema changes causing problems. [~iamaleksey] Is the plan to use Paxos to linearise the schema changes? > Make schema alterations strongly consistent > --- > > Key: CASSANDRA-10699 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10699 > Project: Cassandra > Issue Type: Sub-task >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko > Fix For: 4.0 > > > Schema changes do not necessarily commute. This has been the case before > CASSANDRA-5202, but now is particularly problematic. > We should employ a strongly consistent protocol instead of relying on > marshalling {{Mutation}} objects with schema changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-10699) Make schema alterations strongly consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd reassigned CASSANDRA-10699: - Assignee: Aleksey Yeschenko (was: Matt Byrd) > Make schema alterations strongly consistent > --- > > Key: CASSANDRA-10699 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10699 > Project: Cassandra > Issue Type: Sub-task >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko > Fix For: 4.0 > > > Schema changes do not necessarily commute. This has been the case before > CASSANDRA-5202, but now is particularly problematic. > We should employ a strongly consistent protocol instead of relying on > marshalling {{Mutation}} objects with schema changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10699) Make schema alterations strongly consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035130#comment-16035130 ] Matt Byrd commented on CASSANDRA-10699: --- In particular I'm interested in how to avoid concurrent schema changes causing problems. [~iamaleksey] Is the plan to use Paxos to linearise the schema changes? > Make schema alterations strongly consistent > --- > > Key: CASSANDRA-10699 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10699 > Project: Cassandra > Issue Type: Sub-task >Reporter: Aleksey Yeschenko >Assignee: Matt Byrd > Fix For: 4.0 > > > Schema changes do not necessarily commute. This has been the case before > CASSANDRA-5202, but now is particularly problematic. > We should employ a strongly consistent protocol instead of relying on > marshalling {{Mutation}} objects with schema changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-10699) Make schema alterations strongly consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd reassigned CASSANDRA-10699: - Assignee: Matt Byrd > Make schema alterations strongly consistent > --- > > Key: CASSANDRA-10699 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10699 > Project: Cassandra > Issue Type: Sub-task >Reporter: Aleksey Yeschenko >Assignee: Matt Byrd > Fix For: 4.0 > > > Schema changes do not necessarily commute. This has been the case before > CASSANDRA-5202, but now is particularly problematic. > We should employ a strongly consistent protocol instead of relying on > marshalling {{Mutation}} objects with schema changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13570) allow sub-range repairs (specifying -et -st) for a preview of repaired data
Matt Byrd created CASSANDRA-13570: - Summary: allow sub-range repairs (specifying -et -st) for a preview of repaired data Key: CASSANDRA-13570 URL: https://issues.apache.org/jira/browse/CASSANDRA-13570 Project: Cassandra Issue Type: Improvement Reporter: Matt Byrd Assignee: Matt Byrd Priority: Minor Fix For: 4.x I don't see any inherent reason for restricting preview repairs of repaired data to not allow specifying start and end tokens. The restriction seems to be coming from the fact that incremental=true in RepairOption, which is the case but it's not truly an incremental repair since we're only previewing. {code:java} if (option.isIncremental() && !option.isGlobal()) { throw new IllegalArgumentException("Incremental repairs cannot be run against a subset of tokens or ranges"); } {code} It would be helpful to allow this, so that operators could sequence a sweep over the entirety of the token-space in a more gradual fashion. Also it might help in examining which portions of the token-space differ. Can anyone see any reasons for not allowing this? I.e just changing the above to something like: {code:java} if (option.isIncremental() && !option.getPreviewKind().isPreview() && !option.isGlobal()) { throw new IllegalArgumentException("Incremental repairs cannot be run against a subset of tokens or ranges"); } {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034971#comment-16034971 ] Matt Byrd commented on CASSANDRA-11748: --- So I believe the crux of this problem is: on startup, if our schema differs from that of nodes on the same messaging-service version as ourselves, we pull the schema from each such node when marking it as UP. With a large cluster and a large schema we end up pulling many copies of the serialised schema onto the heap, causing heap pressure and eventually OOMs. To make matters worse, the resulting GC pauses seem to cause the other nodes to be marked DOWN and then UP again, pulling the schema once more. As a result the instance OOMs on startup; with a large enough schema and cluster this is probably deterministic. This can happen when a node has been down for a while and missed a schema change, or if the given upgrade path results in a schema version change which is somehow not reflected quickly enough locally, so that maybeScheduleSchemaPull runs and decides to pull it remotely. When you start up and see hundreds of nodes all with the same schema version that you need, it doesn't make much sense to pull it from every single one of them; if instead we just limit the number of schema migration tasks in flight, we can limit or stop this behaviour from occurring. I've got a patch which does just this and fixes a dtest reproduction I've written. I had some other variants that limited the number of in-flight tasks per schema version, for example, but a straightforward limit seemed sufficient. Admittedly I'm not certain that the upgrade problem still exists, but starting a node without the latest schema should still cause this problem. There is an expiry on the limit, to avoid getting stuck in a state where the in-flight counter isn't decremented properly (which during testing I found can occur whenever a message fails to even be sent, so neither the failure nor the success callback is ever called). I'll attach some links shortly. > Schema version mismatch may leads to Casandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17. > CentOS 6.6 > Occurred in different C* node of different scale of deployment (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times when a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is the simple guideline of our rolling upgrade process > 1. Update schema on a node, and wait until all nodes to be in schema version > agreemnt - via nodetool describeclulster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the the restarted node has different > schema version. > 4. All nodes in cluster start to rapidly exchange schema information, and any > of node could run into OOM.
> The following is the system.log that occur in one of our 2-node cluster test > bed > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > The node2 keeps submitting the migration task over 100+ times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ... ( over 100+ times) > -- > On the otherhand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migrationTask afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line
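A minimal sketch of the in-flight limit with expiry that the comment above describes might look like the following; the class name, the bound and the expiry value are assumptions for illustration, not the actual patch.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: bound the number of outstanding schema pull (migration) requests,
// reclaiming permits after a timeout in case a request's callbacks never fire
// (e.g. the message failed to even be sent).
class MigrationTaskLimiter
{
    private static final int MAX_IN_FLIGHT = 8;           // assumed bound
    private static final long PERMIT_EXPIRY_MS = 60_000;  // assumed expiry

    private final ConcurrentMap<Long, Long> inFlight = new ConcurrentHashMap<>(); // permit id -> start time
    private final AtomicLong ids = new AtomicLong();

    synchronized Long tryAcquire()
    {
        long now = System.currentTimeMillis();
        // Reclaim permits whose callbacks never ran.
        inFlight.values().removeIf(start -> now - start > PERMIT_EXPIRY_MS);
        if (inFlight.size() >= MAX_IN_FLIGHT)
            return null; // skip this pull; gossip will report the mismatch again later
        long id = ids.incrementAndGet();
        inFlight.put(id, now);
        return id;
    }

    void release(Long permitId)
    {
        if (permitId != null)
            inFlight.remove(permitId);
    }
}
{code}
A caller would acquire a permit before submitting a migration task and release it from both the success and the failure callback.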
[jira] [Created] (CASSANDRA-13557) allow different NUMACTL_ARGS to be passed in
Matt Byrd created CASSANDRA-13557: - Summary: allow different NUMACTL_ARGS to be passed in Key: CASSANDRA-13557 URL: https://issues.apache.org/jira/browse/CASSANDRA-13557 Project: Cassandra Issue Type: Improvement Components: Configuration Reporter: Matt Byrd Assignee: Matt Byrd Priority: Minor Fix For: 4.x Currently in bin/cassandra the following is hardcoded: NUMACTL_ARGS="--interleave=all" Ideally users of cassandra/bin could pass in a different set of NUMACTL_ARGS if they wanted to, say, bind the process to a socket for cpu/memory reasons, rather than having to comment out/modify this line in the deployed cassandra/bin, e.g. as described in: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html This could be done by just having the default be set to "--interleave=all" but picking up any value which has already been set for the variable NUMACTL_ARGS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989556#comment-15989556 ] Matt Byrd commented on CASSANDRA-13480: --- So the patch I have currently also caches the notifications for repairs for a limited time on the co-ordinator, it was initially targeting a release where we didn't yet have the repair history tables. I suppose there is a concern that caching these notifications could under some circumstances cause unwanted extra heap usage. (Similarly to the notifications buffer, although at least here we're only caching a subset that we care more about) So using the repair history tables instead and exposing this information by imx seems like a reasonable alternative. There are perhaps a couple of kinks to work out, but I'll have a go at adapting the patch that I have to work in this way. For one we only have the cmd id int sent back to the nodetool process (rather than the parent session id which the internal table is partition keyed off) We could either keep track of the cmd id int -> parent session uuid in the co-ordinator, either in memory cached to expire or in another internal table, or we could parse the uuid out of the notification sent for the start of the parent repair. Parsing the message is a bit brittle though and not full proof in theory (we could miss that notification also). Ideally I suppose running a repair could return and communicate on the basis of the parent session uuid rather than the int cmd id, but this is a pretty major overhaul and has all sorts of compatibility questions. > nodetool repair can hang forever if we lose the notification for the repair > completing/failing > -- > > Key: CASSANDRA-13480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 4.x > > > When a Jmx lost notification occurs, sometimes the lost notification in > question is the notification which let's RepairRunner know that the repair is > finished (ProgressEventType.COMPLETE or even ERROR for that matter). > This results in nodetool process running the repair hanging forever. > I have a test which reproduces the issue here: > https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test > To fix this, If on receiving a notification that notifications have been lost > (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via > Jmx to receive all the relevant notifications we're interested in, we can > replay those we missed and avoid this scenario. > It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself > might be lost and so for good measure I have made RepairRunner poll > periodically to see if there were any notifications that had been sent but we > didn't receive (scoped just to the particular tag for the given repair). > Users who don't use nodetool but go via jmx directly, can still use this new > endpoint and implement similar behaviour in their clients as desired. > I'm also expiring the notifications which have been kept on the server side. > Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. 
> It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
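As a rough illustration of the cmd-id bookkeeping discussed in the comment above (mapping the int command id nodetool sees back to the parent repair session), an expiring in-memory registry could look like this; all names and the retention period are assumptions, not the actual implementation.
{code:java}
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: remember which parent repair session a nodetool cmd id belongs to,
// dropping entries after a fixed retention window so the map cannot grow without bound.
class RepairCommandRegistry
{
    private static final long RETENTION_MS = 60 * 60 * 1000L; // assumed retention window

    private static final class Entry
    {
        final UUID parentSession;
        final long createdAt;

        Entry(UUID parentSession, long createdAt)
        {
            this.parentSession = parentSession;
            this.createdAt = createdAt;
        }
    }

    private final Map<Integer, Entry> byCmdId = new ConcurrentHashMap<>();

    void register(int cmdId, UUID parentSession)
    {
        evictExpired();
        byCmdId.put(cmdId, new Entry(parentSession, System.currentTimeMillis()));
    }

    UUID parentSessionFor(int cmdId)
    {
        Entry entry = byCmdId.get(cmdId);
        return entry == null ? null : entry.parentSession;
    }

    private void evictExpired()
    {
        long cutoff = System.currentTimeMillis() - RETENTION_MS;
        byCmdId.values().removeIf(e -> e.createdAt < cutoff);
    }
}
{code}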
[jira] [Created] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
Matt Byrd created CASSANDRA-13480: - Summary: nodetool repair can hang forever if we lose the notification for the repair completing/failing Key: CASSANDRA-13480 URL: https://issues.apache.org/jira/browse/CASSANDRA-13480 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Matt Byrd Assignee: Matt Byrd Priority: Minor Fix For: 4.x When a JMX lost-notification event occurs, sometimes the lost notification in question is the one which lets RepairRunner know that the repair is finished (ProgressEventType.COMPLETE, or even ERROR for that matter). This results in the nodetool process running the repair hanging forever. I have a test which reproduces the issue here: https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test To fix this: if, on receiving a notification that notifications have been lost (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via JMX for all the relevant notifications we're interested in, we can replay those we missed and avoid this scenario. It's also possible that the JMXConnectionNotification.NOTIFS_LOST itself might be lost, so for good measure I have made RepairRunner poll periodically to see if there were any notifications that had been sent but we didn't receive (scoped just to the particular tag for the given repair). Users who don't use nodetool but go via JMX directly can still use this new endpoint and implement similar behaviour in their clients as desired. I'm also expiring the notifications which have been kept on the server side. Please let me know if you've any questions or can think of a different approach. I also tried setting: JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" but this didn't fix the test. I suppose it might help under certain scenarios, but in this test we don't even send that many notifications, so I'm not surprised it doesn't fix it. It seems like getting lost notifications is always a potential problem with JMX as far as I can tell. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.
[ https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968247#comment-15968247 ] Matt Byrd commented on CASSANDRA-13307: --- Hey [~michaelsembwever] Did you still want me to take a look? sounds like the failures can be explained by flakiness? > The specification of protocol version in cqlsh means the python driver > doesn't automatically downgrade protocol version. > > > Key: CASSANDRA-13307 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13307 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Labels: doc-impacting > Fix For: 3.11.x > > > Hi, > Looks like we've regressed on the issue described in: > https://issues.apache.org/jira/browse/CASSANDRA-9467 > In that we're no longer able to connect from newer cqlsh versions > (e.g trunk) to older versions of Cassandra with a lower version of the > protocol (e.g 2.1 with protocol version 3) > The problem seems to be that we're relying on the ability for the client to > automatically downgrade protocol version implemented in Cassandra here: > https://issues.apache.org/jira/browse/CASSANDRA-12838 > and utilised in the python client here: > https://datastax-oss.atlassian.net/browse/PYTHON-240 > The problem however comes when we implemented: > https://datastax-oss.atlassian.net/browse/PYTHON-537 > "Don't downgrade protocol version if explicitly set" > (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of > fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534) > Since we do explicitly specify the protocol version in the bin/cqlsh.py. > I've got a patch which just adds an option to explicitly specify the protocol > version (for those who want to do that) and then otherwise defaults to not > setting the protocol version, i.e using the protocol version from the client > which we ship, which should by default be the same protocol as the server. > Then it should downgrade gracefully as was intended. > Let me know if that seems reasonable. > Thanks, > Matt -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.
[ https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947667#comment-15947667 ] Matt Byrd commented on CASSANDRA-13307: --- Hey [~tjake] are you at all keen to review? or shall I see if someone else can? Thanks > The specification of protocol version in cqlsh means the python driver > doesn't automatically downgrade protocol version. > > > Key: CASSANDRA-13307 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13307 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 3.11.x > > > Hi, > Looks like we've regressed on the issue described in: > https://issues.apache.org/jira/browse/CASSANDRA-9467 > In that we're no longer able to connect from newer cqlsh versions > (e.g trunk) to older versions of Cassandra with a lower version of the > protocol (e.g 2.1 with protocol version 3) > The problem seems to be that we're relying on the ability for the client to > automatically downgrade protocol version implemented in Cassandra here: > https://issues.apache.org/jira/browse/CASSANDRA-12838 > and utilised in the python client here: > https://datastax-oss.atlassian.net/browse/PYTHON-240 > The problem however comes when we implemented: > https://datastax-oss.atlassian.net/browse/PYTHON-537 > "Don't downgrade protocol version if explicitly set" > (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of > fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534) > Since we do explicitly specify the protocol version in the bin/cqlsh.py. > I've got a patch which just adds an option to explicitly specify the protocol > version (for those who want to do that) and then otherwise defaults to not > setting the protocol version, i.e using the protocol version from the client > which we ship, which should by default be the same protocol as the server. > Then it should downgrade gracefully as was intended. > Let me know if that seems reasonable. > Thanks, > Matt -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.
[ https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-13307: -- Status: Patch Available (was: Open) https://github.com/Jollyplum/cassandra/commit/b52b27810bf0d3bb9caafe21fde6120cf53c7382 https://github.com/apache/cassandra/pull/96 https://github.com/Jollyplum/cassandra/tree/13307 > The specification of protocol version in cqlsh means the python driver > doesn't automatically downgrade protocol version. > > > Key: CASSANDRA-13307 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13307 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Matt Byrd >Assignee: Matt Byrd >Priority: Minor > Fix For: 3.11.x > > > Hi, > Looks like we've regressed on the issue described in: > https://issues.apache.org/jira/browse/CASSANDRA-9467 > In that we're no longer able to connect from newer cqlsh versions > (e.g trunk) to older versions of Cassandra with a lower version of the > protocol (e.g 2.1 with protocol version 3) > The problem seems to be that we're relying on the ability for the client to > automatically downgrade protocol version implemented in Cassandra here: > https://issues.apache.org/jira/browse/CASSANDRA-12838 > and utilised in the python client here: > https://datastax-oss.atlassian.net/browse/PYTHON-240 > The problem however comes when we implemented: > https://datastax-oss.atlassian.net/browse/PYTHON-537 > "Don't downgrade protocol version if explicitly set" > (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of > fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534) > Since we do explicitly specify the protocol version in the bin/cqlsh.py. > I've got a patch which just adds an option to explicitly specify the protocol > version (for those who want to do that) and then otherwise defaults to not > setting the protocol version, i.e using the protocol version from the client > which we ship, which should by default be the same protocol as the server. > Then it should downgrade gracefully as was intended. > Let me know if that seems reasonable. > Thanks, > Matt -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.
Matt Byrd created CASSANDRA-13307: - Summary: The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version. Key: CASSANDRA-13307 URL: https://issues.apache.org/jira/browse/CASSANDRA-13307 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Matt Byrd Assignee: Matt Byrd Priority: Minor Hi, Looks like we've regressed on the issue described in: https://issues.apache.org/jira/browse/CASSANDRA-9467 In that we're no longer able to connect from newer cqlsh versions (e.g trunk) to older versions of Cassandra with a lower version of the protocol (e.g 2.1 with protocol version 3) The problem seems to be that we're relying on the ability for the client to automatically downgrade protocol version implemented in Cassandra here: https://issues.apache.org/jira/browse/CASSANDRA-12838 and utilised in the python client here: https://datastax-oss.atlassian.net/browse/PYTHON-240 The problem however comes when we implemented: https://datastax-oss.atlassian.net/browse/PYTHON-537 "Don't downgrade protocol version if explicitly set" (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534) Since we do explicitly specify the protocol version in the bin/cqlsh.py. I've got a patch which just adds an option to explicitly specify the protocol version (for those who want to do that) and then otherwise defaults to not setting the protocol version, i.e using the protocol version from the client which we ship, which should by default be the same protocol as the server. Then it should downgrade gracefully as was intended. Let me know if that seems reasonable. Thanks, Matt -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299242#comment-14299242 ] Matt Byrd commented on CASSANDRA-7688: -- So I suppose the reason for suggesting exposing the same call via cql, was that at least abstractly it was clear what this meant. I concede that plumbing all this through might not be straightforward. The problem with putting it in a system table is, what exactly do you put there? The current computation is a somewhat expensive on demand computation that is generally done relatively rarely. Was your intent to just periodically execute this function and dump the results into system tables? Or did you have something different in mind? Add data sizing to a system table - Key: CASSANDRA-7688 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688 Project: Cassandra Issue Type: New Feature Reporter: Jeremiah Jordan Assignee: Aleksey Yeschenko Fix For: 2.1.3 Currently you can't implement something similar to describe_splits_ex purely from the a native protocol driver. https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily getting ownership information to a client in the java-driver. But you still need the data sizing part to get splits of a given size. We should add the sizing information to a system table so that native clients can get to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8052) OOMs from allocating large arrays when deserializing (e.g probably corrupted EstimatedHistogram data)
Matt Byrd created CASSANDRA-8052: Summary: OOMs from allocating large arrays when deserializing (e.g probably corrupted EstimatedHistogram data) Key: CASSANDRA-8052 URL: https://issues.apache.org/jira/browse/CASSANDRA-8052 Project: Cassandra Issue Type: Bug Components: Core Environment: linux Reporter: Matt Byrd We've seen nodes with what are presumably corrupted sstables repeatedly OOM on attempted startup with a message such as: {code} java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:266) at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:292) at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:282) at org.apache.cassandra.io.sstable.SSTableReader.openMetadata(SSTableReader.java:234) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:194) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157) at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {code} It's probably not a coincidence that it's throwing an exception here since this seems to be the first byte of the file read. Presumably the correct operational process is just to replace the node, however I was wondering if generally we might want to validate lengths when we deserialise things? This could avoid allocating large byte buffers and causing unpredictable OOMs, and instead throw an exception to be handled as appropriate. In this particular instance, there is no need for an unduly large size for the estimated histogram. Admittedly things are slightly different in 2.1, though I suspect a similar thing might have happened with: {code} int numComponents = in.readInt(); // read toc Map<MetadataType, Integer> toc = new HashMap<>(numComponents); {code} Doing a find-usages of DataInputStream.readInt() reveals quite a few places where an int is read in and then an ArrayList, array or map of that size is created. In some cases this size might validly vary over a Java int, or be in a performance-critical or delicate piece of code where one doesn't want such checks. Also there are other checksums and mechanisms at play which make some input less likely to be corrupted. However, is it maybe worth a pass over instances of this type of input, to try and avoid such cases where it makes sense? Perhaps there are less likely but worse failure modes present and hidden? E.g. if the deserialisation happens to be for a message sent to some or all nodes, say. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
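To illustrate the sort of validation being suggested, here is a hedged sketch of reading a length and sanity-checking it before allocating; the bound is an arbitrary illustrative parameter, not a value taken from the codebase.
{code:java}
import java.io.DataInput;
import java.io.IOException;

// Illustrative sketch: reject implausible lengths up front, so a corrupted length field
// produces a recoverable IOException rather than a huge allocation and an OOM.
final class SafeDeserializer
{
    private SafeDeserializer() {}

    static long[] readLongArray(DataInput in, int maxExpectedSize) throws IOException
    {
        int size = in.readInt();
        if (size < 0 || size > maxExpectedSize)
            throw new IOException("Suspicious serialized length " + size
                                  + " (max allowed " + maxExpectedSize + "); data is likely corrupted");

        long[] values = new long[size];
        for (int i = 0; i < size; i++)
            values[i] = in.readLong();
        return values;
    }
}
{code}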
[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091315#comment-14091315 ] Matt Byrd commented on CASSANDRA-7688: -- Originally I was just thinking of exposing the same method available in thrift via some CQL syntax, i.e. essentially from StorageProxy: public List<Pair<Range<Token>, Long>> getSplits(String keyspaceName, String cfName, Range<Token> range, int keysPerSplit, CFMetaData metadata) This in turn actually operates on the index intervals in memory, getting appropriately sized splits given the samples taken. Can you please elaborate on what the idea is behind storing this info in a system table? It would seem that you would need to keep doing the above computation or something similar and write the result to a system table. I would have thought it’d be easier to just expose the StorageProxy call via CQL? Add data sizing to a system table - Key: CASSANDRA-7688 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688 Project: Cassandra Issue Type: New Feature Reporter: Jeremiah Jordan Fix For: 2.1.1 Currently you can't implement something similar to describe_splits_ex purely from a native protocol driver. https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily getting ownership information to a client in the java-driver. But you still need the data sizing part to get splits of a given size. We should add the sizing information to a system table so that native clients can get to it. -- This message was sent by Atlassian JIRA (v6.2#6252)
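For what the "periodically execute this function and dump the results into a system table" reading would amount to, a very rough sketch follows; every name here is hypothetical, and no claim is made that the feature was implemented this way.
{code:java}
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: periodically recompute split sizes for the tables of interest
// and hand them to something that persists them where native-protocol clients can read them.
class SizeEstimateRefresher implements Runnable
{
    interface SplitSource { List<long[]> splitsFor(String keyspace, String table); }          // hypothetical
    interface SplitStore  { void save(String keyspace, String table, List<long[]> splits); }  // hypothetical

    private final SplitSource source;
    private final SplitStore store;
    private final List<String[]> tables; // {keyspace, table} pairs to refresh

    SizeEstimateRefresher(SplitSource source, SplitStore store, List<String[]> tables)
    {
        this.source = source;
        this.store = store;
        this.tables = tables;
    }

    @Override
    public void run()
    {
        for (String[] kt : tables)
            store.save(kt[0], kt[1], source.splitsFor(kt[0], kt[1]));
    }

    static void scheduleEvery(long minutes, SizeEstimateRefresher refresher)
    {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(refresher, minutes, minutes, TimeUnit.MINUTES);
    }
}
{code}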
[jira] [Commented] (CASSANDRA-7543) Assertion error when compacting large row with map//list field or range tombstone
[ https://issues.apache.org/jira/browse/CASSANDRA-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064110#comment-14064110 ] Matt Byrd commented on CASSANDRA-7543: -- Thanks for looking at this. With the attached patch my repro script no longer reproduces the problem. It might also be nice to include the value of openedMarkerSize in the debug log line for the dataSize, if only to avoid confusion when debugging, however it's not strictly necessary. Assertion error when compacting large row with map//list field or range tombstone - Key: CASSANDRA-7543 URL: https://issues.apache.org/jira/browse/CASSANDRA-7543 Project: Cassandra Issue Type: Bug Components: Core Environment: linux Reporter: Matt Byrd Assignee: Yuki Morishita Labels: compaction, map Fix For: 1.2.19 Attachments: 0001-add-rangetombstone-test.patch, 0002-fix-rangetomebstone-not-included-in-LCR-size-calc.patch Hi, So in a couple of clusters we're hitting this problem when compacting large rows with a schema which contains the map data-type. Here is an example of the error: {code} java.lang.AssertionError: incorrect row data size 87776427 written to /cassandra/X/Y/X-Y-tmp-ic-2381-Data.db; correct is 87845952 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:163) org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) {code} I have a python script which reproduces the problem, by just writing lots of data to a single partition key with a schema that contains the map data-type. I added some debug logging and found that the difference in bytes seen in the reproduction (255) was due to the following pieces of data being written: {code} DEBUG [CompactionExecutor:3] 2014-07-13 00:38:42,891 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: [java.nio.HeapByteBuffer[pos=0 lim=34 cap=34], java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]](deletedAt=1405237116014999, localDeletion=1405237116) startPosition: 262476 endPosition: 262561 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:43,007 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@3e5b5939 startPosition: 328157 endPosition: 328242 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:44,159 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@fc3299b startPosition: 984105 endPosition: 984190 diff: 85 {code} So looking at the code you can see that there are extra range tombstones written on the column index border (in ColumnIndex where tombstoneTracker.writeOpenedMarker is called) which aren't accounted for in LazilyCompactedRow.columnSerializedSize. This is where the difference comes from in the assertion error, so the solution is just to account for this data. 
I have a patch which does just this, by keeping track of the extra data written out via tombstoneTracker.writeOpenedMarker in ColumnIndex and adding it back to the dataSize in LazilyCompactedRow.java, where it serialises out the row size. After applying the patch the reproduction stops producing the AssertionError. I know this is not a problem in 2.0+ because of single-pass compaction, however there are lots of 1.2 clusters out there still which might run into this. Please let me know if you've any questions. Thanks, Matt -- This message was sent by Atlassian JIRA (v6.2#6252)
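The accounting fix described above is simple enough to sketch: tally the bytes emitted each time an open-tombstone marker is re-written at a column-index boundary, and add that tally back when the row's data size is serialised. The names below are illustrative, not the actual 1.2 patch.
{code:java}
// Illustrative sketch of the accounting idea (not the actual 1.2 patch).
class OpenedMarkerAccounting
{
    private long extraBytes = 0;

    // Called from the column-index builder wherever tombstoneTracker.writeOpenedMarker(...)
    // repeats an open range tombstone at an index-block boundary.
    void onOpenedMarkerWritten(long startPosition, long endPosition)
    {
        extraBytes += endPosition - startPosition;
    }

    // Consulted when the lazily compacted row serialises its total data size, so the size
    // written in the header matches the bytes actually emitted for the row.
    long extraSerializedBytes()
    {
        return extraBytes;
    }
}
{code}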
[jira] [Created] (CASSANDRA-7543) Assertion error when compacting large row with map//list field or range tombstone
Matt Byrd created CASSANDRA-7543: Summary: Assertion error when compacting large row with map//list field or range tombstone Key: CASSANDRA-7543 URL: https://issues.apache.org/jira/browse/CASSANDRA-7543 Project: Cassandra Issue Type: Bug Components: Core Environment: linux cassandra 1.2.16 Reporter: Matt Byrd Hi, So in a couple of clusters we're hitting this problem when compacting large rows with a schema which contains the map data-type. Here is an example of the error: {code} java.lang.AssertionError: incorrect row data size 87776427 written to /cassandra/X/Y/X-Y-tmp-ic-2381-Data.db; correct is 87845952 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:163) org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) {code} I have a python script which reproduces the problem, by just writing lots of data to a single partition key with a schema that contains the map data-type. I added some debug logging and found that the difference in bytes seen in the reproduction (255) was due to the following pieces of data being written: {code} DEBUG [CompactionExecutor:3] 2014-07-13 00:38:42,891 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: [java.nio.HeapByteBuffer[pos=0 lim=34 cap=34], java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]](deletedAt=1405237116014999, localDeletion=1405237116) startPosition: 262476 endPosition: 262561 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:43,007 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@3e5b5939 startPosition: 328157 endPosition: 328242 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:44,159 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@fc3299b startPosition: 984105 endPosition: 984190 diff: 85 {code} So looking at the code you can see that there are extra range tombstones written on the column index border (in ColumnIndex where tombstoneTracker.writeOpenedMarker is called) which aren't accounted for in LazilyCompactedRow.columnSerializedSize. This is where the difference comes from in the assertion error, so the solution is just to account for this data. I have a patch which does just this, by keeping track of the extra data written out via tombstoneTracker.writeOpenedMarker in ColumnIndex and adding it back to the dataSize in LazilyCompactedRow.java, where it serialises out the row size. After applying the patch the reproduction stops producing the AssertionError. I know this is not a problem in 2.0 + because of singe pass compaction, however there are lots of 1.2 clusters out there still which might run into this. Please let me know if you've any questions. Thanks, Matt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7543) Assertion error when compacting large row with map//list field or range tombstone
[ https://issues.apache.org/jira/browse/CASSANDRA-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061463#comment-14061463 ] Matt Byrd commented on CASSANDRA-7543: -- I believe this is the same issue, which wasn't fixed in 1.2.x, the recommendation being to move to 2.0 where the error was unlikely to occur. Assertion error when compacting large row with map//list field or range tombstone - Key: CASSANDRA-7543 URL: https://issues.apache.org/jira/browse/CASSANDRA-7543 Project: Cassandra Issue Type: Bug Components: Core Environment: linux cassandra 1.2.16 Reporter: Matt Byrd Labels: compaction, map Hi, So in a couple of clusters we're hitting this problem when compacting large rows with a schema which contains the map data-type. Here is an example of the error: {code} java.lang.AssertionError: incorrect row data size 87776427 written to /cassandra/X/Y/X-Y-tmp-ic-2381-Data.db; correct is 87845952 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:163) org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) {code} I have a python script which reproduces the problem, by just writing lots of data to a single partition key with a schema that contains the map data-type. I added some debug logging and found that the difference in bytes seen in the reproduction (255) was due to the following pieces of data being written: {code} DEBUG [CompactionExecutor:3] 2014-07-13 00:38:42,891 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: [java.nio.HeapByteBuffer[pos=0 lim=34 cap=34], java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]](deletedAt=1405237116014999, localDeletion=1405237116) startPosition: 262476 endPosition: 262561 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:43,007 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@3e5b5939 startPosition: 328157 endPosition: 328242 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:44,159 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@fc3299b startPosition: 984105 endPosition: 984190 diff: 85 {code} So looking at the code you can see that there are extra range tombstones written on the column index border (in ColumnIndex where tombstoneTracker.writeOpenedMarker is called) which aren't accounted for in LazilyCompactedRow.columnSerializedSize. This is where the difference comes from in the assertion error, so the solution is just to account for this data. I have a patch which does just this, by keeping track of the extra data written out via tombstoneTracker.writeOpenedMarker in ColumnIndex and adding it back to the dataSize in LazilyCompactedRow.java, where it serialises out the row size. After applying the patch the reproduction stops producing the AssertionError. 
I know this is not a problem in 2.0+ because of single-pass compaction, however there are lots of 1.2 clusters out there still which might run into this. Please let me know if you've any questions. Thanks, Matt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7543) Assertion error when compacting large row with map//list field or range tombstone
[ https://issues.apache.org/jira/browse/CASSANDRA-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-7543: - Environment: linux (was: linux cassandra 1.2.16) Assertion error when compacting large row with map//list field or range tombstone - Key: CASSANDRA-7543 URL: https://issues.apache.org/jira/browse/CASSANDRA-7543 Project: Cassandra Issue Type: Bug Components: Core Environment: linux Reporter: Matt Byrd Labels: compaction, map Hi, So in a couple of clusters we're hitting this problem when compacting large rows with a schema which contains the map data-type. Here is an example of the error: {code} java.lang.AssertionError: incorrect row data size 87776427 written to /cassandra/X/Y/X-Y-tmp-ic-2381-Data.db; correct is 87845952 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162) org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:163) org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208) {code} I have a python script which reproduces the problem, by just writing lots of data to a single partition key with a schema that contains the map data-type. I added some debug logging and found that the difference in bytes seen in the reproduction (255) was due to the following pieces of data being written: {code} DEBUG [CompactionExecutor:3] 2014-07-13 00:38:42,891 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: [java.nio.HeapByteBuffer[pos=0 lim=34 cap=34], java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]](deletedAt=1405237116014999, localDeletion=1405237116) startPosition: 262476 endPosition: 262561 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:43,007 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@3e5b5939 startPosition: 328157 endPosition: 328242 diff: 85 DEBUG [CompactionExecutor:3] 2014-07-13 00:38:44,159 ColumnIndex.java (line 168) DATASIZE writeOpenedMarker columnIndex: org.apache.cassandra.db.ColumnIndex$Builder@6678a9d0 firstColumn: org.apache.cassandra.db.Column@fc3299b startPosition: 984105 endPosition: 984190 diff: 85 {code} So looking at the code you can see that there are extra range tombstones written on the column index border (in ColumnIndex where tombstoneTracker.writeOpenedMarker is called) which aren't accounted for in LazilyCompactedRow.columnSerializedSize. This is where the difference comes from in the assertion error, so the solution is just to account for this data. I have a patch which does just this, by keeping track of the extra data written out via tombstoneTracker.writeOpenedMarker in ColumnIndex and adding it back to the dataSize in LazilyCompactedRow.java, where it serialises out the row size. After applying the patch the reproduction stops producing the AssertionError. I know this is not a problem in 2.0 + because of singe pass compaction, however there are lots of 1.2 clusters out there still which might run into this. Please let me know if you've any questions. 
Thanks, Matt -- This message was sent by Atlassian JIRA (v6.2#6252)
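As a minimal sketch of the accounting idea described in CASSANDRA-7543 above: tally the bytes emitted for the re-opened range-tombstone markers separately and fold them back into the expected row size. The class and method names below are hypothetical illustrations, not the actual 1.2 patch to ColumnIndex/LazilyCompactedRow.
{code}
// Hypothetical sketch of the size-accounting fix described above; names are
// illustrative, not the actual Cassandra 1.2 patch.
public final class RowSizeAccounting
{
    private long columnSerializedSize; // what the compacted-row path already tracks
    private long openedMarkerSize;     // extra bytes written when an open range
                                       // tombstone marker is re-emitted at an
                                       // index block boundary

    public void addColumn(long serializedBytes)
    {
        columnSerializedSize += serializedBytes;
    }

    // Called once per re-emitted open marker, with the bytes actually written.
    public void addOpenedMarker(long serializedBytes)
    {
        openedMarkerSize += serializedBytes;
    }

    // The value the writer's "incorrect row data size" assertion should be
    // compared against: plain column data plus the duplicated open markers.
    public long expectedDataSize()
    {
        return columnSerializedSize + openedMarkerSize;
    }
}
{code}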
[jira] [Commented] (CASSANDRA-7533) Let MAX_OUTSTANDING_REPLAY_COUNT be configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061506#comment-14061506 ] Matt Byrd commented on CASSANDRA-7533:
--
Just to add a bit more context, we had a single instance of Cassandra get fairly stuck replaying commitlogs. It was burning through 2000%+ CPU for over four hours with no end in sight, so we killed it, removed the commit logs, brought it back up and ran repair (this was in QA, thankfully).
The problem can easily be reproduced by writing 100,000 CQL rows (range deletes) to the same partition key, then stopping Cassandra and starting it again. I admit this is somewhat of an anti-pattern, but it is still quite a dramatic effect from not very much data. The problem exercised here is that:
1. We contend in the memtable to do this insert in a CAS loop.
2. The work done in this loop becomes ever more expensive, as RangeTombstoneList.dataSize is iterated over to compute the size.
Point 2 is effectively fixed in 2.1 with the off-heap allocation work, where the dataSize calculation effectively becomes online.
To resolve this problem in 2.0 you could also keep this tally of dataSize online, or perhaps start keeping it online once the list is big enough to cause a problem. Doing this seemed to help a lot, but far simpler was just reducing the concurrency of commitlog replay, which can be achieved by lowering MAX_OUTSTANDING_REPLAY_COUNT (in our case setting it to 1 seemed to help).
Thanks, Matt

Let MAX_OUTSTANDING_REPLAY_COUNT be configurable
Key: CASSANDRA-7533
URL: https://issues.apache.org/jira/browse/CASSANDRA-7533
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jeremiah Jordan
Assignee: Yuki Morishita
Priority: Minor
Fix For: 2.0.10

There are some workloads where commit log replay will run into contention issues with multiple things updating the same partition. Through some testing it was found that lowering MAX_OUTSTANDING_REPLAY_COUNT in CommitLogReplayer.java can help with this issue. The calculations added in CASSANDRA-6655 are one such place where things get bottlenecked.
-- This message was sent by Atlassian JIRA (v6.2#6252)
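For readers unfamiliar with how a hard-coded constant like MAX_OUTSTANDING_REPLAY_COUNT is typically made configurable, the common Java pattern is a system property read with a default, as sketched below. The property name is made up for illustration; whatever name the actual patch for CASSANDRA-7533 introduces (if any) is defined in CommitLogReplayer.java itself.
{code}
// Illustrative sketch only: exposing a formerly hard-coded replay-concurrency
// constant via a JVM system property. The property name is hypothetical.
public class ReplayConcurrency
{
    // Override at startup with -Dexample.max_outstanding_replay_count=1
    // (the default of 1024 here is just a placeholder).
    public static final int MAX_OUTSTANDING_REPLAY_COUNT =
            Integer.getInteger("example.max_outstanding_replay_count", 1024);

    public static void main(String[] args)
    {
        System.out.println("max outstanding replay count = " + MAX_OUTSTANDING_REPLAY_COUNT);
    }
}
{code}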
[jira] [Updated] (CASSANDRA-5345) Potential problem with GarbageCollectorMXBean
[ https://issues.apache.org/jira/browse/CASSANDRA-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-5345: - Reproduced In: 1.0.7
[jira] [Updated] (CASSANDRA-5345) Potential problem with GarbageCollectorMXBean
[ https://issues.apache.org/jira/browse/CASSANDRA-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-5345: - Reproduced In: (was: 1.0.7) Since Version: 1.0.7
[jira] [Updated] (CASSANDRA-5345) Potential problem with GarbageCollectorMXBean
[ https://issues.apache.org/jira/browse/CASSANDRA-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Byrd updated CASSANDRA-5345: - Priority: Major (was: Trivial)
[jira] [Commented] (CASSANDRA-5345) Potential problem with GarbageCollectorMXBean
[ https://issues.apache.org/jira/browse/CASSANDRA-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946228#comment-13946228 ] Matt Byrd commented on CASSANDRA-5345:
--
The cluster in question was running on 1.0.7; however, the code in question has remained static since well before that and doesn't look to have changed since (though admittedly the problem could somehow be being caused elsewhere, the JVM maybe?). I've upped the priority to major.
Have you been able to reproduce, or seen the problem anywhere else? Any further details about how your environment is set up and how you deploy may also help those trying to reproduce. Some common but perhaps coincidental things about the two occurrences:
1. Virtual machines (though not both AWS).
2. Multi-DC; I wouldn't have thought this would be relevant, but Arya does seem to see the problem after removing a DC.
3. Slightly old JVM versions.
I no longer have access to the cluster where we saw this previously, but let me know if I can help in any other way.
[jira] [Created] (CASSANDRA-6797) compaction and scrub data directories race on startup
Matt Byrd created CASSANDRA-6797:

Summary: compaction and scrub data directories race on startup
Key: CASSANDRA-6797
URL: https://issues.apache.org/jira/browse/CASSANDRA-6797
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: macos (and linux)
Reporter: Matt Byrd
Priority: Minor

Hi,
On doing a rolling restart of a 2.0.5 cluster in several environments I'm seeing the following error:
{code}
INFO [CompactionExecutor:1] 2014-03-03 17:11:07,549 CompactionTask.java (line 115) Compacting [SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-13-Data.db'), SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-15-Data.db'), SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-16-Data.db'), SSTableReader(path='/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-14-Data.db')]
INFO [CompactionExecutor:1] 2014-03-03 17:11:07,557 ColumnFamilyStore.java (line 254) Initializing system_traces.sessions
INFO [CompactionExecutor:1] 2014-03-03 17:11:07,560 ColumnFamilyStore.java (line 254) Initializing system_traces.events
WARN [main] 2014-03-03 17:11:07,608 ColumnFamilyStore.java (line 473) Removing orphans for /Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-13: [CompressionInfo.db, Filter.db, Index.db, TOC.txt, Summary.db, Data.db, Statistics.db]
ERROR [main] 2014-03-03 17:11:07,609 CassandraDaemon.java (line 479) Exception encountered during startup
java.lang.AssertionError: attempted to delete non-existing file system-local-jb-13-CompressionInfo.db
at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:111)
at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:106)
at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:476)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:462)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:552)
INFO [CompactionExecutor:1] 2014-03-03 17:11:07,612 CompactionTask.java (line 275) Compacted 4 sstables to [/Users/Matthew/.ccm/compaction_race/node1/data/system/local/system-local-jb-17,]. 10,963 bytes to 5,572 (~50% of original) in 57ms = 0.093226MB/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
{code}
This seems like a potential race, since compactions are occurring whilst the existing data directories are being scrubbed. Probably an in-progress compaction looks like an incomplete one, so an attempt is made to scrub it whilst it is still in progress. On the attempt to delete in scrubDataDirectories we discover that the file no longer exists, presumably because it has now been compacted away. This then causes an assertion error and the node fails to start up.
Here is a ccm script which just stops and starts a 3 node 2.0.5 cluster repeatedly. It seems to fairly reliably reproduce the problem, in less than ten iterations:
{code}
#!/bin/bash
ccm create compaction_race -v 2.0.5
ccm populate -n 3
ccm start
for i in $(seq 0 1000); do
  echo $i
  ccm stop
  ccm start
  grep ERR ~/.ccm/compaction_race/*/logs/system.log
done
{code}
Someone else should probably confirm that this is what is going wrong; however, if it is, the solution might be as simple as disabling autocompactions slightly earlier in CassandraDaemon.setup.
Or alternatively, if there isn't a good reason why we first scrub the system tables and then scrub all keyspaces (including the system keyspace), you could perhaps scrub solely the non-system keyspaces on the second scrub. Please let me know if there is anything else I can provide.
Thanks, Matt
-- This message was sent by Atlassian JIRA (v6.2#6252)
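Purely as an illustration of one further mitigation that follows from the CASSANDRA-6797 description above (tolerating files that vanish between listing a data directory and deleting from it during startup scrub), here is a minimal sketch. The class and method are hypothetical; this is not the fix Cassandra actually adopted.
{code}
import java.io.File;

// Hypothetical sketch: make the startup-scrub delete tolerant of files that a
// concurrent compaction has already removed, instead of asserting.
public final class StartupScrubUtil
{
    public static void deleteIfPresent(File component)
    {
        if (!component.exists())
        {
            // The file may have been compacted away after the directory was
            // listed; during startup scrub this is benign, so just note it.
            System.out.println("Skipping already-removed file: " + component);
            return;
        }
        if (!component.delete())
            throw new RuntimeException("Failed to delete " + component);
    }
}
{code}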
[jira] [Commented] (CASSANDRA-6797) compaction and scrub data directories race on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918938#comment-13918938 ] Matt Byrd commented on CASSANDRA-6797:
--
I think this may be the same or a similar issue, but since the repro is more complicated and the environment windows, I thought I'd file this ticket also.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6797) compaction and scrub data directories race on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918938#comment-13918938 ] Matt Byrd edited comment on CASSANDRA-6797 at 3/4/14 5:18 AM:
--
I think CASSANDRA-6795 may be the same or a similar issue, but since the reproduction is more complicated and the environment is windows, I thought I'd file this ticket also.
was (Author: mbyrd): I think this may be the same or a similar issue, but since the repro is more complicated and the environment windows, I thought I'd file this ticket also.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-5345) Potential problem with GarbageCollectorMXBean
Matt Byrd created CASSANDRA-5345:

Summary: Potential problem with GarbageCollectorMXBean
Key: CASSANDRA-5345
URL: https://issues.apache.org/jira/browse/CASSANDRA-5345
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.7
Environment: JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_30; typical 6 node, 2 availability zone, multi-DC cluster on linux VMs with mx4j-tools.jar and jna.jar both on the path. Default configuration bar token setup (equispaced), a sensible cassandra-topology.properties file and use of said snitch.
Reporter: Matt Byrd
Priority: Trivial

I am not certain this is definitely a bug, but I thought it might be worth posting to see if someone with more JVM/JMX knowledge could disprove my reasoning. Apologies if I've failed to understand something.
We've seen an intermittent problem where there is an uncaught exception in the scheduled task of logging gc results in GcInspector.java:
{code}
...
ERROR [ScheduledTasks:1] 2013-03-08 01:09:06,335 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ScheduledTasks:1,5,main]
java.lang.reflect.UndeclaredThrowableException
at $Proxy0.getName(Unknown Source)
at org.apache.cassandra.service.GCInspector.logGCResults(GCInspector.java:95)
at org.apache.cassandra.service.GCInspector.access$000(GCInspector.java:41)
at org.apache.cassandra.service.GCInspector$1.run(GCInspector.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: javax.management.InstanceNotFoundException: java.lang:name=ParNew,type=GarbageCollector
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:662)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
at com.sun.jmx.mbeanserver.MXBeanProxy$GetHandler.invoke(MXBeanProxy.java:106)
at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:148)
at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:248)
... 13 more
...
{code}
I think the problem may be caused by the following reasoning: in GcInspector we populate a list of mxbeans when the GcInspector instance is instantiated:
{code}
...
List<GarbageCollectorMXBean> beans = new ArrayList<GarbageCollectorMXBean>();
MBeanServer server = ManagementFactory.getPlatformMBeanServer();
try
{
    ObjectName gcName = new ObjectName(ManagementFactory.GARBAGE_COLLECTOR_MXBEAN_DOMAIN_TYPE + ",*");
    for (ObjectName name : server.queryNames(gcName, null))
    {
        GarbageCollectorMXBean gc = ManagementFactory.newPlatformMXBeanProxy(server, name.getCanonicalName(), GarbageCollectorMXBean.class);
        beans.add(gc);
    }
}
catch (Exception e)
{
    throw new RuntimeException(e);
}
...
{code}
Cassandra then periodically calls:
{code}
...
private void logGCResults()
{
    for (GarbageCollectorMXBean gc : beans)
    {
        Long previousTotal = gctimes.get(gc.getName());
...
{code}
The Oracle javadocs seem to suggest that these beans could disappear at any time (I'm not sure why, when, or how this might happen). See getGarbageCollectorMXBeans at http://docs.oracle.com/javase/6/docs/api/:
{code}
...
public static List<GarbageCollectorMXBean> getGarbageCollectorMXBeans()
Returns a list of GarbageCollectorMXBean objects in the Java virtual machine. The Java virtual machine may have one or more GarbageCollectorMXBean objects. It may add or remove GarbageCollectorMXBean during execution.
Returns: a list of GarbageCollectorMXBean objects.
...
{code}
Correct me if I'm wrong, but do you think this might be causing the problem? That somehow the JVM decides to remove the GarbageCollectorMXBean temporarily or
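One defensive variant of the pattern quoted above, sketched under the assumption that the cached MXBean proxies are indeed the problem, is to re-query the platform GC beans on every poll instead of caching proxies at startup, so beans the JVM adds or removes are simply picked up or dropped. This is only an illustration, not the GCInspector code as shipped or the fix eventually applied.
{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: periodic GC logging that re-queries the bean list on
// each poll rather than holding proxies created once at startup.
public final class GcPoller
{
    public static void logGcResults()
    {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            // getCollectionCount/getCollectionTime are part of the standard
            // java.lang.management MXBean API.
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                               + " time=" + gc.getCollectionTime() + "ms");
        }
    }

    public static void main(String[] args)
    {
        logGcResults();
    }
}
{code}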