[jira] Commented: (CASSANDRA-1609) Cluster restart re-adds removed tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920426#action_12920426 ]
Nick Bailey commented on CASSANDRA-1609:
* handleStateLeft wasn't actually removing the endpoint from gossip before. Was that an oversight or intended?
* You removed a check from handleStateLeft to make sure the token was a member before removing it. Intentional?
Cluster restart re-adds removed tokens -- Key: CASSANDRA-1609 URL: https://issues.apache.org/jira/browse/CASSANDRA-1609 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Jonathan Ellis Fix For: 0.7.0 Attachments: 1609.txt
After a cluster restart, one of our nodes began reporting tokens that had been removed a good while ago (a week or more) in its nodetool ring output. This probably has something to do with our change to persist the ring in CASSANDRA-1518 and the removetoken changes in CASSANDRA-1216. The node didn't actually gossip the removed tokens, so they showed up in TMD but not in gossip. Additionally, all nodes began reporting a node that had been removed maybe an hour ago.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CASSANDRA-1605) RemoveToken waits for dead nodes
RemoveToken waits for dead nodes Key: CASSANDRA-1605 URL: https://issues.apache.org/jira/browse/CASSANDRA-1605 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Fix For: 0.7.0 RemoveToken will wait for replication confirmation from nodes that are down. It should only wait for live nodes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1605) RemoveToken waits for dead nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1605: --- Attachment: 0001-Use-failure-detector-when-detecting-new-nodes.patch Updated to use the failure detector when determining who to wait for. RemoveToken waits for dead nodes Key: CASSANDRA-1605 URL: https://issues.apache.org/jira/browse/CASSANDRA-1605 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Use-failure-detector-when-detecting-new-nodes.patch RemoveToken will wait for replication confirmation from nodes that are down. It should only wait for live nodes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
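The fix described above (only wait for live nodes) can be pictured with a small Java sketch. This is not the attached patch: the class name, the method name, and the use of plain strings for endpoints are all hypothetical, and the `isAlive` predicate merely stands in for whatever the failure detector reports.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not Cassandra's actual code.
class LiveNodeFilter {
    /**
     * Returns the endpoints whose replication confirmation should be awaited:
     * only those the liveness check (assumed to wrap a failure detector) reports as up.
     */
    static Set<String> endpointsToWaitFor(List<String> newlyResponsible, Predicate<String> isAlive) {
        Set<String> waitFor = new LinkedHashSet<>();
        for (String endpoint : newlyResponsible) {
            if (isAlive.test(endpoint)) {
                waitFor.add(endpoint);
            }
        }
        return waitFor;
    }

    public static void main(String[] args) {
        Set<String> down = Set.of("10.0.0.2"); // made-up addresses
        Set<String> waitFor = endpointsToWaitFor(
                List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"),
                endpoint -> !down.contains(endpoint));
        System.out.println(waitFor); // the down node is not waited for
    }
}
```

Filtering before waiting means a dead node can never hold up the operation, at the cost that its data has to be brought up to date some other way once it returns.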
[jira] Commented: (CASSANDRA-1586) Explicitly expose ongoing per-node compaction tasks and est. % done
[ https://issues.apache.org/jira/browse/CASSANDRA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918706#action_12918706 ] Nick Bailey commented on CASSANDRA-1586: Isn't this the same as CASSANDRA-1516? Explicitly expose ongoing per-node compaction tasks and est. % done --- Key: CASSANDRA-1586 URL: https://issues.apache.org/jira/browse/CASSANDRA-1586 Project: Cassandra Issue Type: Wish Components: Core Reporter: paul cannon Priority: Minor CompactionManagerMBean exports the number of bytes compacted and the total number to be compacted during a given job, but I don't believe it says what sort of compaction it's doing (anti-compaction, validation compaction, index build, or an SSTable build after receiving an incoming streamed file). Is it possible to have multiple threads going in the compaction manager? (It doesn't look like it, but I don't see any explicit maximumPoolSize setting.) If so, cassandra should also expose all ongoing compaction tasks and progress for each individually.
[jira] Commented: (CASSANDRA-1589) Explicitly expose whether a node is bootstrapping
[ https://issues.apache.org/jira/browse/CASSANDRA-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918719#action_12918719 ] Nick Bailey commented on CASSANDRA-1589: Bootstrapping nodes go into bootstrapping mode. I believe if you call nodetool streams it outputs the mode, which will say whether it is bootstrapping or not. There might be another command that specifies that as well. Can you see if that's what you want? Explicitly expose whether a node is bootstrapping - Key: CASSANDRA-1589 URL: https://issues.apache.org/jira/browse/CASSANDRA-1589 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: paul cannon Priority: Minor This is mostly addressed by CASSANDRA-1489, but is there some way to tell whether a node is currently bootstrapping or not?
[jira] Updated: (CASSANDRA-1573) StreamOut fails to start an empty stream
[ https://issues.apache.org/jira/browse/CASSANDRA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1573: --- Priority: Blocker (was: Major) StreamOut fails to start an empty stream Key: CASSANDRA-1573 URL: https://issues.apache.org/jira/browse/CASSANDRA-1573 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Blocker Fix For: 0.7.0 Attachments: 0001-Remove-empty-list-check-from-StreamOut.patch StreamOut only starts a stream if there are actually files to transfer. This means callbacks will never get called for streams that don't actually have anything to transfer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1573) StreamOut fails to start an empty stream
[ https://issues.apache.org/jira/browse/CASSANDRA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1573: --- Attachment: 0001-Remove-empty-list-check-from-StreamOut.patch There is an empty check in begin() anyway so no need for the check in StreamOut. StreamOut fails to start an empty stream Key: CASSANDRA-1573 URL: https://issues.apache.org/jira/browse/CASSANDRA-1573 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Remove-empty-list-check-from-StreamOut.patch StreamOut only starts a stream if there are actually files to transfer. This means callbacks will never get called for streams that don't actually have anything to transfer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
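As a rough illustration of the behaviour the patch aims for, the Java sketch below always starts the session and lets the callback fire even when the file list is empty. The class `StreamOutSketch`, its fields, and the `Runnable` callback are invented for this example and are not Cassandra's real streaming classes.

```java
import java.util.List;

// Illustrative only; names are hypothetical, not Cassandra's API.
class StreamOutSketch {
    private final List<String> files;
    private final Runnable onComplete;
    private int transferred = 0;

    StreamOutSketch(List<String> files, Runnable onComplete) {
        this.files = files;
        this.onComplete = onComplete;
    }

    /** Always begins the session, so the callback runs for empty streams too. */
    void begin() {
        for (String file : files) {
            transferred++; // stand-in for actually streaming the file
        }
        onComplete.run(); // reached for empty and non-empty sessions alike
    }

    int transferred() {
        return transferred;
    }

    public static void main(String[] args) {
        new StreamOutSketch(List.of(), () -> System.out.println("empty stream finished")).begin();
    }
}
```

The point of the design is that callers waiting on the callback never have to special-case "nothing to transfer".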
[jira] Created: (CASSANDRA-1574) Bootstrapping is broken
Bootstrapping is broken --- Key: CASSANDRA-1574 URL: https://issues.apache.org/jira/browse/CASSANDRA-1574 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Nick Bailey Fix For: 0.7.0 Bootstrap doesn't block for streaming requests, which means nodetool move isn't blocking. More importantly, bootstrap fails to call finishBootstrapping if no stream requests are ever made. This means it's impossible to perform moves if you have no keyspaces.
[jira] Updated: (CASSANDRA-1574) Bootstrapping is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1574: --- Attachment: 0001-Updated-bootstrapping-to-block.patch Updated to block with a latch. If nothing is going to be streamed, waiting on the latch will return immediately and finish the bootstrap. Bootstrapping is broken --- Key: CASSANDRA-1574 URL: https://issues.apache.org/jira/browse/CASSANDRA-1574 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Updated-bootstrapping-to-block.patch Bootstrap doesn't block for streaming requests, which means nodetool move isn't blocking. More importantly, bootstrap fails to call finishBootstrapping if no stream requests are ever made. This means it's impossible to perform moves if you have no keyspaces.
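The latch behaviour mentioned in this update can be sketched with `java.util.concurrent.CountDownLatch`. The class and method names below are hypothetical, the threads only simulate streams completing, and the timeout is an addition for the example rather than something taken from the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the code in the attached patch.
class BootstrapLatchSketch {
    /**
     * Waits until every stream request has completed, then reports whether
     * bootstrap could be finished. With zero requests the latch starts at 0,
     * so await() returns true without blocking.
     */
    static boolean bootstrap(int streamRequests, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(streamRequests);
        for (int i = 0; i < streamRequests; i++) {
            // Stand-in for a stream whose completion callback counts the latch down.
            new Thread(latch::countDown).start();
        }
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("no keyspaces, finished: " + bootstrap(0, 10));
        System.out.println("three streams, finished: " + bootstrap(3, 5000));
    }
}
```

Because a latch created with a count of zero is already open, the "no stream requests" case needs no separate code path.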
[jira] Updated: (CASSANDRA-1574) Bootstrapping is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1574: --- Priority: Blocker (was: Major) Assignee: Jonathan Ellis (was: Nick Bailey) Bootstrapping is broken --- Key: CASSANDRA-1574 URL: https://issues.apache.org/jira/browse/CASSANDRA-1574 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Jonathan Ellis Priority: Blocker Fix For: 0.7.0 Attachments: 0001-Updated-bootstrapping-to-block.patch Bootstrap doesn't block for streaming requests, which means nodetool move isn't blocking. More importantly, bootstrap fails to call finishBootstrapping if no stream requests are ever made. This means it's impossible to perform moves if you have no keyspaces.
[jira] Updated: (CASSANDRA-1573) StreamOut fails to start an empty stream
[ https://issues.apache.org/jira/browse/CASSANDRA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1573: --- Assignee: Jonathan Ellis (was: Nick Bailey) StreamOut fails to start an empty stream Key: CASSANDRA-1573 URL: https://issues.apache.org/jira/browse/CASSANDRA-1573 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Jonathan Ellis Priority: Blocker Fix For: 0.7.0 Attachments: 0001-Remove-empty-list-check-from-StreamOut.patch StreamOut only starts a stream if there are actually files to transfer. This means callbacks will never get called for streams that don't actually have anything to transfer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1573) StreamOut fails to start an empty stream
[ https://issues.apache.org/jira/browse/CASSANDRA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917792#action_12917792 ] Nick Bailey commented on CASSANDRA-1573: Looks good to me. StreamOut fails to start an empty stream Key: CASSANDRA-1573 URL: https://issues.apache.org/jira/browse/CASSANDRA-1573 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Jonathan Ellis Priority: Blocker Fix For: 0.7.0 Attachments: 0001-Remove-empty-list-check-from-StreamOut.patch, 1573-v2.txt StreamOut only starts a stream if there are actually files to transfer. This means callbacks will never get called for streams that don't actually have anything to transfer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1574) Bootstrapping is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917791#action_12917791 ] Nick Bailey commented on CASSANDRA-1574: You have a comment in StorageService that spells "finished" as "finishec". Besides that, it looks good to me. Bootstrapping is broken --- Key: CASSANDRA-1574 URL: https://issues.apache.org/jira/browse/CASSANDRA-1574 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Nick Bailey Assignee: Jonathan Ellis Priority: Blocker Fix For: 0.7.0 Attachments: 0001-Updated-bootstrapping-to-block.patch, 1574-v2.txt Bootstrap doesn't block for streaming requests, which means nodetool move isn't blocking. More importantly, bootstrap fails to call finishBootstrapping if no stream requests are ever made. This means it's impossible to perform moves if you have no keyspaces.
[jira] Commented: (CASSANDRA-1516) Explicitly expose ongoing per-node tasks (like compactions, repair, etc) and est. % done
[ https://issues.apache.org/jira/browse/CASSANDRA-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915414#action_12915414 ] Nick Bailey commented on CASSANDRA-1516: For compaction, it seems like it would be most useful to add a 'compaction id'. The current progress calls just indicate the progress of the current column family being compacted. It would be useful to have each compaction triggered by the same call have the same id. That would allow us to tell the difference between the compactions triggered by a forceCompaction call vs. a repair call. Explicitly expose ongoing per-node tasks (like compactions, repair, etc) and est. % done Key: CASSANDRA-1516 URL: https://issues.apache.org/jira/browse/CASSANDRA-1516 Project: Cassandra Issue Type: Wish Components: Core Reporter: paul cannon Priority: Minor Fix For: 0.7.0 Attachments: 0001-Mbean-for-tracking-progress-on-a-pending-drain.patch Original Estimate: 5h Remaining Estimate: 5h so major compaction, read-only compaction, repair, drain, and bootstrap are the on-node tasks I'm pretty sure make sense to be exposed in this way. snapshotting might make sense too. some of these can be detected using heuristics on the number of tasks alive in different stages, but it would be preferable if it could be made more clear. the interface should allow multiple tasks and respective percentages to be ongoing at the same time, especially since some tasks (like repair) implicitly kick off other sub-tasks (compaction) and both of them would normally show up at the same time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1516) Explicitly expose ongoing per-node tasks (like compactions, repair, etc) and est. % done
[ https://issues.apache.org/jira/browse/CASSANDRA-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914187#action_12914187 ] Nick Bailey commented on CASSANDRA-1516: bq. drain we can address by adding progress to CFS.flushWriter and a drained N out of M method to StorageService By adding progress do you just mean number of threads queued+active vs number of threads scheduled? Is that enough insight or do we need to include amount of data converted from memtables to sstables? The problem with number of threads approach is we use a fixed size blocking queue. Once that fills up calls to submit threads just block so we don't have an accurate view of what is left to execute. Explicitly expose ongoing per-node tasks (like compactions, repair, etc) and est. % done Key: CASSANDRA-1516 URL: https://issues.apache.org/jira/browse/CASSANDRA-1516 Project: Cassandra Issue Type: Wish Components: Core Reporter: paul cannon Priority: Minor Fix For: 0.7.0 Original Estimate: 5h Remaining Estimate: 5h so major compaction, read-only compaction, repair, drain, and bootstrap are the on-node tasks I'm pretty sure make sense to be exposed in this way. snapshotting might make sense too. some of these can be detected using heuristics on the number of tasks alive in different stages, but it would be preferable if it could be made more clear. the interface should allow multiple tasks and respective percentages to be ongoing at the same time, especially since some tasks (like repair) implicitly kick off other sub-tasks (compaction) and both of them would normally show up at the same time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
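The concern about the fixed-size queue can be shown with a toy Java example. It uses a plain `ArrayBlockingQueue` rather than the actual flush executor, and the class and method names are made up; the only point is that the number of items visible in a bounded queue is capped at its capacity, however much work is really outstanding.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy illustration; not the flush writer itself.
class BoundedQueueVisibility {
    /**
     * Tries to enqueue `tasks` items into a queue of the given capacity and
     * returns how many are visible in the queue afterwards.
     */
    static int visiblePending(int capacity, int tasks) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < tasks; i++) {
            // offer() returns false once the queue is full. A blocking put()
            // would instead make the submitter wait here, so the remaining
            // work would exist only in the blocked caller, not in the queue.
            queue.offer(i);
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("visible with 10 tasks, capacity 4: " + visiblePending(4, 10));
    }
}
```

A progress figure derived from queue depth alone would therefore under-report the remaining work whenever the queue is full.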
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0001-Add-callbacks-to-streaming.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0002-Modify-removeToken-to-be-similar-to-decommission.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0003-Fixes-to-old-tests.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Bailey updated CASSANDRA-1216: --- Attachment: 0001-Modify-removeToken-to-be-similar-to-decommission.patch 0002-Additional-tests-for-removeToken.patch
Patches:
* 0001
** Modifies the removeToken operation to follow a pattern of NORMAL-REMOVING-LEFT, rather than the current pattern of a coordinator node setting its own status to a special cased version of NORMAL.
** Fixes a small bug in StreamHeader serialization
** Adds the ability to either get the status of a remove operation taking place or force a remove operation to finish immediately
* 0002
** Tests for removing tokens
** Move shared code for creating a ring to Util class
Removal Process:
* Normal Case
*# Coordinator sets status of failed node to REMOVING
*# Coordinator blocks on confirmation from other nodes
*# Any newly responsible nodes stream data
*# Newly responsible nodes send confirmation once all data has streamed
*# Coordinator updates status of failed node to LEFT
*# Done
* Failure Cases
** Coordinator failure
*** If the coordinator fails the remove operation will need to be retried
*** This can be done on any node in the cluster.
** Newly responsible node failure
*** If a newly responsible node fails but comes back up, it should see the REMOVING status in gossip and restart the operation
*** If a newly responsible node fails permanently or a streaming operation fails and the node stays up, the coordinator will block forever while waiting for confirmation. The best solution is to force the remove operation to complete and then run repair on the failed node.
removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
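The NORMAL-REMOVING-LEFT pattern from the removal process in this update can be sketched as a small state machine. This is a simplified, single-threaded model with hypothetical names (`RemovalCoordinatorSketch`, `confirm`, `forceRemoval`); it does not reproduce the gossip or streaming code in the patches.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the coordinator's view; names are hypothetical.
class RemovalCoordinatorSketch {
    enum Status { NORMAL, REMOVING, LEFT }

    private Status status = Status.NORMAL;
    private final Set<String> awaiting = new HashSet<>();

    /** Marks the failed node REMOVING and records who must confirm. */
    void startRemoval(Set<String> newlyResponsible) {
        status = Status.REMOVING;
        awaiting.addAll(newlyResponsible);
        finishIfDone();
    }

    /** Called when a newly responsible node reports that streaming finished. */
    void confirm(String endpoint) {
        awaiting.remove(endpoint);
        finishIfDone();
    }

    /** Escape hatch for the "blocks forever" failure case: stop waiting. */
    void forceRemoval() {
        awaiting.clear();
        finishIfDone();
    }

    private void finishIfDone() {
        if (status == Status.REMOVING && awaiting.isEmpty()) {
            status = Status.LEFT;
        }
    }

    Status status() {
        return status;
    }
}
```

In this model the failed node only reaches LEFT once every awaited node has confirmed, or once an operator forces completion, after which repair would still be needed on the node that never confirmed.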
[jira] Commented: (CASSANDRA-1449) consistent nodetool blocking behavior
[ https://issues.apache.org/jira/browse/CASSANDRA-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912758#action_12912758 ] Nick Bailey commented on CASSANDRA-1449: This patch also doesn't cover removeToken. The current implementation only blocks until the current node has replicated any data it needs. I can add blocking behavior in CASSANDRA-1216 although I had planned on making it non-blocking. If it is blocking there is a chance something fails and it blocks forever. consistent nodetool blocking behavior - Key: CASSANDRA-1449 URL: https://issues.apache.org/jira/browse/CASSANDRA-1449 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Brandon Williams Assignee: Nirmal Ranganathan Priority: Minor Fix For: 0.7.0 Attachments: 0001-Updated-all-non-blocking-Nodetool-commands-to-blocki.patch Some operations in nodetool block. Some don't. We should choose a behavior and enforce it, preferably blocking. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1489) Expose progress made on PendingStreams through JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1489: --- Attachment: (was: 0001-Progress-added-to-incoming-streams.patch) Expose progress made on PendingStreams through JMX -- Key: CASSANDRA-1489 URL: https://issues.apache.org/jira/browse/CASSANDRA-1489 Project: Cassandra Issue Type: New Feature Components: Tools Affects Versions: 0.7.0 Reporter: paul cannon Assignee: Nick Bailey Priority: Minor Fix For: 0.7.0 Original Estimate: 2h Remaining Estimate: 2h So I thought originally that the pairs of numbers shown after filenames returned from org.apache.cassandra.service:type=StreamingService's getOutgoingFiles/getIncomingFiles calls were (octetsSent, totalSizeInOctets). Now I know that they are ranges, and that there can be several of them with one file, and that makes sense, but there doesn't seem to be any way to query the progress made on a PendingFile stream. We would very much like to be able to inspect that from the outside, in whatever way makes sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
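One way the requested progress figure could be derived from the ranges is sketched below in Java. It assumes, without confirmation from the source, that each pair is a (start, end) byte offset into the file and that the number of bytes transferred so far is tracked separately; the class and method names are invented for the example.

```java
// Hypothetical calculation; the range semantics here are an assumption.
class StreamProgressSketch {
    /** Sums the lengths of all (start, end) ranges for one pending file. */
    static long totalBytes(long[][] ranges) {
        long total = 0;
        for (long[] range : ranges) {
            total += range[1] - range[0];
        }
        return total;
    }

    /** Integer percentage of the ranged bytes that have been transferred. */
    static int percentDone(long[][] ranges, long bytesTransferred) {
        long total = totalBytes(ranges);
        if (total == 0) {
            return 100; // nothing to send counts as complete
        }
        long clamped = Math.max(0, Math.min(bytesTransferred, total));
        return (int) (clamped * 100 / total);
    }

    public static void main(String[] args) {
        long[][] ranges = {{0, 100}, {200, 300}};
        System.out.println(percentDone(ranges, 50) + "% of " + totalBytes(ranges) + " bytes");
    }
}
```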
[jira] Updated: (CASSANDRA-1489) Expose progress made on PendingStreams through JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1489: --- Attachment: (was: 0002-Progess-added-to-outgoing-streams.patch) Expose progress made on PendingStreams through JMX -- Key: CASSANDRA-1489 URL: https://issues.apache.org/jira/browse/CASSANDRA-1489 Project: Cassandra Issue Type: New Feature Components: Tools Affects Versions: 0.7.0 Reporter: paul cannon Assignee: Nick Bailey Priority: Minor Fix For: 0.7.0 Original Estimate: 2h Remaining Estimate: 2h So I thought originally that the pairs of numbers shown after filenames returned from org.apache.cassandra.service:type=StreamingService's getOutgoingFiles/getIncomingFiles calls were (octetsSent, totalSizeInOctets). Now I know that they are ranges, and that there can be several of them with one file, and that makes sense, but there doesn't seem to be any way to query the progress made on a PendingFile stream. We would very much like to be able to inspect that from the outside, in whatever way makes sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1489) Expose progress made on PendingStreams through JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1489: --- Attachment: (was: 0003-Moved-FileStreamTask-to-streaming.patch) Expose progress made on PendingStreams through JMX -- Key: CASSANDRA-1489 URL: https://issues.apache.org/jira/browse/CASSANDRA-1489 Project: Cassandra Issue Type: New Feature Components: Tools Affects Versions: 0.7.0 Reporter: paul cannon Assignee: Nick Bailey Priority: Minor Fix For: 0.7.0 Original Estimate: 2h Remaining Estimate: 2h So I thought originally that the pairs of numbers shown after filenames returned from org.apache.cassandra.service:type=StreamingService's getOutgoingFiles/getIncomingFiles calls were (octetsSent, totalSizeInOctets). Now I know that they are ranges, and that there can be several of them with one file, and that makes sense, but there doesn't seem to be any way to query the progress made on a PendingFile stream. We would very much like to be able to inspect that from the outside, in whatever way makes sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: (was: 0001-Add-callbacks-to-streaming.patch) Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: (was: 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch) Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: 0001-Add-callbacks-to-streaming.patch 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch Decommission uses the callback for StreamOut. Updated with the change to remove bootstrap sources from StorageService Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: (was: 0001-Add-callbacks-to-streaming.patch) Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Attachments: 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: (was: 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch) Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1506) combine initiated and requested streaming paths
[ https://issues.apache.org/jira/browse/CASSANDRA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909899#action_12909899 ] Nick Bailey commented on CASSANDRA-1506: The comment for the FileStatus constructor is inaccurate now. Besides that, the changes look good to me. combine initiated and requested streaming paths - Key: CASSANDRA-1506 URL: https://issues.apache.org/jira/browse/CASSANDRA-1506 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Minor Fix For: 0.7 beta 2 Attachments: 1506.txt make the source the only one responsible for handing out files, proceeding to the next as they are acked.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: 0001-Update-callbacks-for-streaming.patch 0002-Modify-bootstrap-to-use-streaming-callbacks.patch Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Attachments: 0001-Update-callbacks-for-streaming.patch, 0002-Modify-bootstrap-to-use-streaming-callbacks.patch Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Attachments: 0001-Update-callbacks-for-streaming.patch, 0002-Modify-bootstrap-to-use-streaming-callbacks.patch Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
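A minimal sketch of the idea in the CASSANDRA-1503 description above: instead of inferring completion by removing bootstrap sources, the bootstrap waits on a callback fired once per finished stream. Every name here (BootstrapCompletion, onStreamComplete, awaitFinished) is hypothetical and is not taken from the attached patches or from the Cassandra code base.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; these names are assumptions, not Cassandra APIs.
class BootstrapCompletion {
    private final Set<String> pendingSources = ConcurrentHashMap.newKeySet();
    private final CountDownLatch done;

    BootstrapCompletion(Set<String> sources) {
        pendingSources.addAll(sources);
        done = new CountDownLatch(sources.size());
    }

    // Callback the (assumed) streaming layer invokes when one source finishes.
    void onStreamComplete(String source) {
        // Count down only once per source, even if the callback fires twice.
        if (pendingSources.remove(source)) {
            done.countDown();
        }
    }

    // Returns true once every source has reported completion,
    // false if the timeout elapses or the thread is interrupted first.
    boolean awaitFinished(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape the bootstrapping thread only calls awaitFinished, and the completion condition lives in one place rather than being a side effect of mutating the source list.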
[jira] Commented: (CASSANDRA-1504) clean up streaming, part VII
[ https://issues.apache.org/jira/browse/CASSANDRA-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909469#action_12909469 ] Nick Bailey commented on CASSANDRA-1504: * streamManagers should probably be renamed to streamSessions in Stream(In/Out)Session * StreamInSession uses a ConcurrentHashMap and StreamOutSession uses a NonBlockingHashMap * Both session objects are pretty similar. Is fix streaming VIII going to be combining them? Should we just do that now? * StreamInSession ** files get added to the activeStreams set by IncomingStreamReader but never removed ** You removed activeStreams from getSources and getIncomingFiles. Personally I think those should be included, but if not then there is no reason to keep activeStreams around. clean up streaming, part VII Key: CASSANDRA-1504 URL: https://issues.apache.org/jira/browse/CASSANDRA-1504 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Minor Fix For: 0.7 beta 2 Attachments: 0001-rename-Manager-Session.txt, 0002-avoid-exposing-StreamContext-outside-the-Session-manag.txt, 0003-replace-StreamContext-with-Pair.txt, 0004-replace-List-Map-with-LinkedHashMap.txt, 0005-move-FileStatusHandler-methods-to-IncomingStreamReader.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1504) clean up streaming, part VII
[ https://issues.apache.org/jira/browse/CASSANDRA-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909488#action_12909488 ] Nick Bailey commented on CASSANDRA-1504: +1 LGTM clean up streaming, part VII Key: CASSANDRA-1504 URL: https://issues.apache.org/jira/browse/CASSANDRA-1504 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Minor Fix For: 0.7 beta 2 Attachments: 0001-rename-Manager-Session.txt, 0002-avoid-exposing-StreamContext-outside-the-Session-manag.txt, 0003-replace-StreamContext-with-Pair.txt, 0004-replace-List-Map-with-LinkedHashMap.txt, 0005-move-FileStatusHandler-methods-to-IncomingStreamReader.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: (was: 0001-Update-callbacks-for-streaming.patch) Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: (was: 0002-Modify-bootstrap-to-use-streaming-callbacks.patch) Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1503) Modify bootstrap to use streaming callbacks
[ https://issues.apache.org/jira/browse/CASSANDRA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1503: --- Attachment: 0001-Add-callbacks-to-streaming.patch 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch Fixed that and rebased to include the changes from CASSANDRA-1504 Modify bootstrap to use streaming callbacks --- Key: CASSANDRA-1503 URL: https://issues.apache.org/jira/browse/CASSANDRA-1503 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-Bootstrap-to-use-streaming-callbacks.patch Bootstrapping uses a weird method of removing bootstrap sources to determine if a bootstrap is finished. This should just use a stream callback. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CASSANDRA-1489) Expose progress made on PendingStreams through JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey reassigned CASSANDRA-1489: -- Assignee: Nick Bailey (was: Nirmal Ranganathan) Expose progress made on PendingStreams through JMX -- Key: CASSANDRA-1489 URL: https://issues.apache.org/jira/browse/CASSANDRA-1489 Project: Cassandra Issue Type: New Feature Components: Tools Affects Versions: 0.7.0 Reporter: paul cannon Assignee: Nick Bailey Priority: Minor Fix For: 0.7.0 Attachments: 0001-Progress-added-to-incoming-streams.patch, 0002-Progess-added-to-outgoing-streams.patch, 0003-Moved-FileStreamTask-to-streaming.patch Original Estimate: 2h Remaining Estimate: 2h So I thought originally that the pairs of numbers shown after filenames returned from org.apache.cassandra.service:type=StreamingService's getOutgoingFiles/getIncomingFiles calls were (octetsSent, totalSizeInOctets). Now I know that they are ranges, and that there can be several of them with one file, and that makes sense, but there doesn't seem to be any way to query the progress made on a PendingFile stream. We would very much like to be able to inspect that from the outside, in whatever way makes sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1489) Expose progress made on PendingStreams through JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908071#action_12908071 ] Nick Bailey commented on CASSANDRA-1489: It doesn't seem like this should be too hard from either the sending or receiving side. Both have a list of pending files they are expecting to send/receive. Granularity can be easily readded to the read/write commands. After each read or write the streamcontext or pending file just needs to be updated and then can be presented in jmx. Expose progress made on PendingStreams through JMX -- Key: CASSANDRA-1489 URL: https://issues.apache.org/jira/browse/CASSANDRA-1489 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: paul cannon Priority: Minor Fix For: 0.7.0 Original Estimate: 2h Remaining Estimate: 2h So I thought originally that the pairs of numbers shown after filenames returned from org.apache.cassandra.service:type=StreamingService's getOutgoingFiles/getIncomingFiles calls were (octetsSent, totalSizeInOctets). Now I know that they are ranges, and that there can be several of them with one file, and that makes sense, but there doesn't seem to be any way to query the progress made on a PendingFile stream. We would very much like to be able to inspect that from the outside, in whatever way makes sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1489) Expose progress made on PendingStreams through JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1489: --- Attachment: 0001-Progress-added-to-incoming-streams.patch 0002-Progess-added-to-outgoing-streams.patch 0003-Moved-FileStreamTask-to-streaming.patch I think this pretty much accomplishes what you want. The output is similar to: {noformat} Receiving from: Sending to: 127.0.0.1: /var/folders/Jf/JfRRRNwbE68TqmlnMrMc+3OrLLI/-Tmp-/Keyspace14260283256390104155Standard1/Keyspace1/Standard1-e-0-Data.db/[(0,143), (289,435)] progress=289/289 /var/folders/Jf/JfRRRNwbE68TqmlnMrMc+3OrLLI/-Tmp-/Keyspace14260283256390104155Standard1/Keyspace1/Standard1-e-0-Data.db/[(0,143), (289,435)] progress=289/289 {noformat} Expose progress made on PendingStreams through JMX -- Key: CASSANDRA-1489 URL: https://issues.apache.org/jira/browse/CASSANDRA-1489 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: paul cannon Priority: Minor Fix For: 0.7.0 Attachments: 0001-Progress-added-to-incoming-streams.patch, 0002-Progess-added-to-outgoing-streams.patch, 0003-Moved-FileStreamTask-to-streaming.patch Original Estimate: 2h Remaining Estimate: 2h So I thought originally that the pairs of numbers shown after filenames returned from org.apache.cassandra.service:type=StreamingService's getOutgoingFiles/getIncomingFiles calls were (octetsSent, totalSizeInOctets). Now I know that they are ranges, and that there can be several of them with one file, and that makes sense, but there doesn't seem to be any way to query the progress made on a PendingFile stream. We would very much like to be able to inspect that from the outside, in whatever way makes sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1489) Expose progress made on PendingStreams through JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908203#action_12908203 ] Nick Bailey commented on CASSANDRA-1489: The current patch doesn't break up the sending and receiving any so the progress will jump randomly. We could set a limit on the number of bytes we try to read or write at a time but I'm not sure if there is much overhead with each call. Expose progress made on PendingStreams through JMX -- Key: CASSANDRA-1489 URL: https://issues.apache.org/jira/browse/CASSANDRA-1489 Project: Cassandra Issue Type: New Feature Components: Tools Affects Versions: 0.7.0 Reporter: paul cannon Priority: Minor Fix For: 0.7.0 Attachments: 0001-Progress-added-to-incoming-streams.patch, 0002-Progess-added-to-outgoing-streams.patch, 0003-Moved-FileStreamTask-to-streaming.patch Original Estimate: 2h Remaining Estimate: 2h So I thought originally that the pairs of numbers shown after filenames returned from org.apache.cassandra.service:type=StreamingService's getOutgoingFiles/getIncomingFiles calls were (octetsSent, totalSizeInOctets). Now I know that they are ranges, and that there can be several of them with one file, and that makes sense, but there doesn't seem to be any way to query the progress made on a PendingFile stream. We would very much like to be able to inspect that from the outside, in whatever way makes sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
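The trade-off in the comment above (capping the number of bytes per read or write so that progress advances in small steps) can be sketched as follows. This is an illustration under assumptions, not the attached patch: ChunkedTransfer, its chunkSize parameter, and describe() are hypothetical names, and plain java.io streams stand in for whatever the real streaming code uses. The progress=done/total format mirrors the sample output quoted earlier in this ticket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the code from the CASSANDRA-1489 patches.
class ChunkedTransfer {
    private final AtomicLong bytesDone = new AtomicLong();
    private final long totalBytes;
    private final int chunkSize;

    ChunkedTransfer(long totalBytes, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.totalBytes = totalBytes;
        this.chunkSize = chunkSize;
    }

    // Copies in bounded chunks so the counter never jumps by more than chunkSize.
    void copy(InputStream in, OutputStream out) {
        byte[] buf = new byte[chunkSize];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                // Updated after every write; safe to read from a monitoring thread.
                bytesDone.addAndGet(n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long getProgress() {
        return bytesDone.get();
    }

    // Text a JMX attribute getter could return.
    String describe() {
        return "progress=" + bytesDone.get() + "/" + totalBytes;
    }
}
```

A smaller chunkSize gives smoother progress at the cost of more read/write calls, which is the overhead question raised in the comment; the sketch does not attempt to measure it.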
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903538#action_12903538 ] Nick Bailey commented on CASSANDRA-1216: After some more thinking I think there are two problems here. * The timeout for waiting on a stream to complete - An arbitrary timeout here is not the right way to do this. What we really need is the concept of stream progress. We should be able to verify that a stream is progressing or not and based on that retry it. CASSANDRA-1438 kind of relates to this problem and could be modified to implement this. * The timeout waiting for nodes to confirm replication - Ideally there would be no timeout here. The problem though is if a node that should be grabbing data goes down permanently, removeToken will wait forever. I think it's reasonable to have some sort of timeout in this case. A log message/error can indicate which machines were being waited on for replication. An administrator should know if that machine went down or is still streaming. That will determine if repair needs to be run. The alternative to this I guess would be periodically waking up and checking that the nodes we are waiting on are still alive. That wouldn't be particularly hard to implement. I don't think returning immediately from the call is the right approach. That is part of the reason why this ticket was created. In the case that replication fails somewhere, there is no feedback to the user. At least timing out eventually provides information about which machines we think failed to replicate data. As for multiple remove calls and the coordinator going down, I think there should be a 'force' option in the case the coordinator goes down and you believe the rest of the nodes completed the operation. To prevent multiple calls to removeToken there should just be a check to make sure the coordinator is dead before another call can be performed. So besides those few changes above, I think we should either implement this partway with a timeout for stream replication or postpone completion here until we add the concept of stream progress. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
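One way the "periodically waking up and checking that the nodes we are waiting on are still alive" alternative from the comment above could look, combined with an overall timeout. This is a sketch under assumptions: ReplicationWaiter and its methods are hypothetical names, endpoints are plain strings, and the isAlive predicate stands in for whatever liveness source (for example a failure detector) the real code would consult.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; not the removeToken implementation.
class ReplicationWaiter {
    private final Set<String> pending = new HashSet<>();
    private final Predicate<String> isAlive; // assumed liveness check

    ReplicationWaiter(Set<String> endpoints, Predicate<String> isAlive) {
        this.pending.addAll(endpoints);
        this.isAlive = isAlive;
    }

    // Called when an endpoint confirms that it finished re-replicating.
    synchronized void confirm(String endpoint) {
        pending.remove(endpoint);
        notifyAll();
    }

    // Returns the endpoints that never confirmed (dead or timed out).
    // An empty set means every endpoint confirmed.
    synchronized Set<String> await(long checkIntervalMillis, long timeoutMillis) {
        if (checkIntervalMillis <= 0) {
            throw new IllegalArgumentException("checkIntervalMillis must be positive");
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        Set<String> failed = new HashSet<>();
        while (true) {
            // Stop waiting on endpoints that are no longer alive.
            pending.removeIf(e -> {
                if (!isAlive.test(e)) {
                    failed.add(e);
                    return true;
                }
                return false;
            });
            if (pending.isEmpty()) {
                return failed;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                failed.addAll(pending); // timed out while still alive
                return failed;
            }
            try {
                wait(Math.min(checkIntervalMillis, remaining));
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                failed.addAll(pending);
                return failed;
            }
        }
    }
}
```

The returned set is what a log message or error could report, so an administrator can tell which machines to check before deciding whether repair is needed.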
[jira] Created: (CASSANDRA-1438) Stream*Manager doesn't clean up broken Streams
Stream*Manager doesn't clean up broken Streams -- Key: CASSANDRA-1438 URL: https://issues.apache.org/jira/browse/CASSANDRA-1438 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Fix For: 0.7 beta 2 StreamInManager and StreamOutManager only remove stream contexts/managers when a stream completes successfully. Any broken streams will cause objects to hang around and never get garbage collected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
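The leak described in CASSANDRA-1438 above comes from unregistering a stream only on the success path. A minimal sketch of the usual remedy, removing the entry in a finally block so failed streams are released too, is shown below. StreamRegistry and its methods are hypothetical and do not reflect the actual StreamInManager/StreamOutManager code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the StreamInManager/StreamOutManager code.
class StreamRegistry {
    private final Map<String, Runnable> sessions = new ConcurrentHashMap<>();

    // Runs a transfer and always unregisters it, whether it succeeds or fails.
    boolean run(String sessionId, Runnable transfer) {
        sessions.put(sessionId, transfer);
        try {
            transfer.run();
            return true;
        } catch (RuntimeException e) {
            // A broken stream is reported as failed instead of being left registered.
            return false;
        } finally {
            sessions.remove(sessionId);
        }
    }

    int activeCount() {
        return sessions.size();
    }
}
```

After a failing transfer, activeCount() is back to its previous value, so nothing keeps the session object reachable.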
[jira] Commented: (CASSANDRA-1364) Consolidate cassandra commands in bin/
[ https://issues.apache.org/jira/browse/CASSANDRA-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902483#action_12902483 ] Nick Bailey commented on CASSANDRA-1364: The problem that led to creating this ticket was actually packaging. Since cassandra gets started as a service the init.d script sets up the environment variables to point to the correct places. The various commands in bin don't have these set up however so they fail to work. The solution was to patch all the commands. I was hoping with this ticket to have a more elegant solution but even if there is just a shared setup script that would make the patch simpler. It seems weird to have a setup script in bin though. The rest of the scripts are actually useful commands and I'm not sure a setup script like that should be installed to /usr/bin. Consolidate cassandra commands in bin/ -- Key: CASSANDRA-1364 URL: https://issues.apache.org/jira/browse/CASSANDRA-1364 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.6.3 Reporter: Nick Bailey Priority: Minor Fix For: 0.7.0 Pretty much every script in bin has the same first 30 lines or so. We need to remove some of the duplication here. This could be accomplished by consolidating some commands into a single script or adding an initializer script they all call. I think I prefer consolidating at least some of the commands. For example the *tool commands could easily be one cassandra-tool command. It may even be possible to incorporate most of them into the cassandra script and have different commands for starting a node or using the tools. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CASSANDRA-1427) Optimize loadbalance/move for moves within the current range
[ https://issues.apache.org/jira/browse/CASSANDRA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey reassigned CASSANDRA-1427: -- Assignee: Nick Bailey Optimize loadbalance/move for moves within the current range Key: CASSANDRA-1427 URL: https://issues.apache.org/jira/browse/CASSANDRA-1427 Project: Cassandra Issue Type: Sub-task Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Fix For: 0.8 Currently our move/loadbalance operations only implement case 2 of the Ruhl algorithm described at https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079. We should add functionality to optimize moves that take/give ranges to a node's direct neighbors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902509#action_12902509 ] Nick Bailey commented on CASSANDRA-1216: Re: timeouts Yes I'm just not sure how to approach determining the right values for these. Depends mostly on the amount of data and network bandwidth. Re: RemoveTest Yeah. The message sink in the test immediately responds to the stream request saying there are no files to stream. This makes the StreamInManager think the data didn't exist remotely. Doing it that way seems much easier than trying to make the test actually stream something. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902523#action_12902523 ] Nick Bailey commented on CASSANDRA-1216: I believe the only consequences of calling removeToken on another node when the coordinator goes down would be that the entire operation would be repeated. So any data that was transferred before would be transferred again. I think this is the right behavior since there is no way of knowing what was transferred before the coordinator went down. It might be useful to add a 'force' option though. If the coordinator goes down and the token gets stuck in a REMOVING state you may want to force removal rather than redoing the entire operation. It should be possible to remove the timeout so that removeToken blocks until the transfer is completely finished. The code for streaming in the remote data blocks until all streams are complete and the code for sending a confirmation to the coordinator will keep retrying until it is received or the coordinator dies. I think this would work if a check was added so that you can only call removeToken a second time if the coordinator is down. It wouldn't handle two calls that occurred before the state made its way through gossip though. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 1 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1427) Optimize loadbalance/move for moves within the current range
[ https://issues.apache.org/jira/browse/CASSANDRA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902546#action_12902546 ] Nick Bailey commented on CASSANDRA-1427: Currently a move is just decommission then bootstrap. This means that if loadbalance is called, the token the node is going to move to isn't calculated until the node has fully left the ring. It might make sense for this ticket to implement this as a special case for when a token to move to is actually specified. Any loadbalancer implemented in CASSANDRA-1418 will need to calculate the token it plans to move to before the move operation takes place. Optimize loadbalance/move for moves within the current range Key: CASSANDRA-1427 URL: https://issues.apache.org/jira/browse/CASSANDRA-1427 Project: Cassandra Issue Type: Sub-task Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Assignee: Nick Bailey Fix For: 0.8 Currently our move/loadbalance operations only implement case 2 of the Ruhl algorithm described at https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079. We should add functionality to optimize moves that take/give ranges to a node's direct neighbors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-786) RPM Packages
[ https://issues.apache.org/jira/browse/CASSANDRA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-786: -- Attachment: (was: apache-cassandra.spec) RPM Packages Key: CASSANDRA-786 URL: https://issues.apache.org/jira/browse/CASSANDRA-786 Project: Cassandra Issue Type: Improvement Components: Contrib Affects Versions: 0.6.3 Reporter: Daniel Lundin Assignee: Nick Bailey Priority: Minor Fix For: 0.7.0 Attachments: 768-update-spec-for-trunk.diff, 786-adjust-jars.patch, cassandra.spec, cassandra.spec RPM packages (and debs of course) would be nice, especially now that cassandra is maturing and gaining more interest. Lowering the threshold for getting cassandra running and getting started is also important. I think the RabbitMQ project has an admirable Download and install experience, not to mention the rather cute 2 min guarantee. Definitely a good inspiration. I've been studying Cloudera's Hadoop packages, which are very nice, and really appreciate the separate packages for configuration. This allows easy deployment of node configuration to a cluster. I'll have a spec file for building RHEL5 / CentOS packages ready for review and attached here in a bit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-786) RPM Packages
[ https://issues.apache.org/jira/browse/CASSANDRA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-786: -- Attachment: apache-cassandra.spec The previous spec file would remove the cassandra user and the configuration alternative when upgrading the package. RPM Packages Key: CASSANDRA-786 URL: https://issues.apache.org/jira/browse/CASSANDRA-786 Project: Cassandra Issue Type: Improvement Components: Contrib Affects Versions: 0.6.3 Reporter: Daniel Lundin Assignee: Nick Bailey Priority: Minor Fix For: 0.7.0 Attachments: 768-update-spec-for-trunk.diff, 786-adjust-jars.patch, apache-cassandra.spec, cassandra.spec, cassandra.spec RPM packages (and debs of course) would be nice, especially now that cassandra is maturing and gaining more interest. Lowering the threshold for getting cassandra running and getting started is also important. I think the RabbitMQ project has an admirable Download and install experience, not to mention the rather cute 2 min guarantee. Definitely a good inspiration. I've been studying Cloudera's Hadoop packages, which are very nice, and really appreciate the separate packages for configuration. This allows easy deployment of node configuration to a cluster. I'll have a spec file for building RHEL5 / CentOS packages ready for review and attached here in a bit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CASSANDRA-1364) Consolidate cassandra commands in bin/
[ https://issues.apache.org/jira/browse/CASSANDRA-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey reassigned CASSANDRA-1364: -- Assignee: (was: Nick Bailey) Consolidate cassandra commands in bin/ -- Key: CASSANDRA-1364 URL: https://issues.apache.org/jira/browse/CASSANDRA-1364 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.6.3 Reporter: Nick Bailey Priority: Minor Fix For: 0.7.0 Pretty much every script in bin has the same first 30 lines or so. We need to remove some of the duplication here. This could be accomplished by consolidating some commands into a single script or adding an initializer script they all call. I think I prefer consolidating at least some of the commands. For example the *tool commands could easily be one cassandra-tool command. It may even be possible to incorporate most of them into the cassandra script and have different commands for starting a node or using the tools. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CASSANDRA-1427) Optimize loadbalance/move for moves within the current range
Optimize loadbalance/move for moves within the current range Key: CASSANDRA-1427 URL: https://issues.apache.org/jira/browse/CASSANDRA-1427 Project: Cassandra Issue Type: Sub-task Components: Core Affects Versions: 0.7 beta 1 Reporter: Nick Bailey Fix For: 0.8 Currently our move/loadbalance operations only implement case 2 of the Ruhl algorithm described at https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079. We should add functionality to optimize moves that take/give ranges to a node's direct neighbors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0003-Additional-unit-tests-for-removeToken.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0001-Modify-removeToken-to-be-similar-to-decommission.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0002-Fixes-to-old-tests.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: 0004-Additional-tests-for-removeToken.patch removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: 0001-Add-callbacks-to-streaming.patch 0002-Modify-removeToken-to-be-similar-to-decommission.patch 0003-Fixes-to-old-tests.patch removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12901589#action_12901589 ] Nick Bailey commented on CASSANDRA-1216:
bq. (minor nit) I wish there were a way to assert that tmd.getLeavingNodes() actually has nodes in it.
This is what tmd.isLeaving() does
bq. testStartRemoving should assert preconditions before calling ss.onChange (it also makes the same assertion twice).
I'm not sure what preconditions you mean. I added an assertion to make sure there are no endpoints already leaving.
bq. SS.removeToken() shouldn't throw a RuntimeException,
Do you think the UnsupportedOperationExceptions should be removed as well? These existed previously.
I modified the callback support for streaming so that the code should wait for all streams to finish before confirming. I also added a reply to the ReplicationFinishedHandler so the IAsyncResult will be updated. Thoughts? The timeout values for waiting on the latches still need to be updated.
removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Add-callbacks-to-streaming.patch, 0002-Modify-removeToken-to-be-similar-to-decommission.patch, 0003-Fixes-to-old-tests.patch, 0004-Additional-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
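The idea of waiting for all streams to finish before confirming, with a timeout on the latch, can be sketched roughly as follows. This is an illustration only and not the code in the attached patches: StreamCompletionTracker and its methods are hypothetical names, and the only real API used is java.util.concurrent.CountDownLatch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not taken from the patch: counts per-stream completion
// callbacks and lets the caller block until all of them have arrived.
final class StreamCompletionTracker {
    private final CountDownLatch pending;

    StreamCompletionTracker(int streamCount) {
        this.pending = new CountDownLatch(streamCount);
    }

    /** Callback to invoke once for each stream that has actually finished. */
    void onStreamComplete() {
        pending.countDown();
    }

    /**
     * Waits until every stream has called back or the timeout expires.
     * Returns true only if all streams finished in time.
     */
    boolean awaitAll(long timeout, TimeUnit unit) {
        try {
            return pending.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        StreamCompletionTracker tracker = new StreamCompletionTracker(2);
        tracker.onStreamComplete();
        // Only one of two streams has finished, so this times out.
        System.out.println(tracker.awaitAll(10, TimeUnit.MILLISECONDS));
        tracker.onStreamComplete();
        // Both streams are done; replication could now be confirmed.
        System.out.println(tracker.awaitAll(10, TimeUnit.MILLISECONDS));
    }
}
```

Returning a boolean rather than blocking forever is what makes the timeout value matter: a caller can report the failure instead of silently confirming, which is the situation the ticket description complains about.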
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1283#action_1283 ] Nick Bailey commented on CASSANDRA-1216: Yeah I wasn't really understanding that streaming/messaging code at all. The current StreamOut implementation has a callback concept however. I think this should be moved into the StreamContext object and then both StreamOut and StreamIn can perform callbacks on actual stream completion. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1366) utf8 error in DEBUG output in CommitLog.java
[ https://issues.apache.org/jira/browse/CASSANDRA-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899170#action_12899170 ] Nick Bailey commented on CASSANDRA-1366: Jeremy's log indicates this is happening when replaying the LocationInfo CF so I think this is a different but similar problem.
utf8 error in DEBUG output in CommitLog.java Key: CASSANDRA-1366 URL: https://issues.apache.org/jira/browse/CASSANDRA-1366 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna Assignee: Nick Bailey Fix For: 0.7 beta 2
Looks like the bug Johan saw a while back where debug output was throwing a UTF8 error has manifested itself in CommitLog.java on line 279.
INFO 18:32:40,951 Replaying /var/lib/cassandra/commitlog/CommitLog-1281058340642.log
DEBUG 18:32:40,953 Replaying /var/lib/cassandra/commitlog/CommitLog-1281058340642.log starting at 276
DEBUG 18:32:40,953 Reading mutation at 276
DEBUG 18:32:40,956 replaying mutation for system...@77fe4169: {ColumnFamily(LocationInfo [B:false:1...@1281058340821,])}
DEBUG 18:32:40,965 Reading mutation at 424
INFO 18:32:40,966 Finished reading /var/lib/cassandra/commitlog/CommitLog-1281058340642.log
ERROR 18:32:40,967 Exception encountered during startup.
org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes [-64, -88, 101, 51]
    at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:43)
    at org.apache.cassandra.db.Column.getString(Column.java:247)
    at org.apache.cassandra.db.marshal.AbstractType.getColumnsString(AbstractType.java:85)
    at org.apache.cassandra.db.ColumnFamily.toString(ColumnFamily.java:379)
    at org.apache.commons.lang.ObjectUtils.toString(ObjectUtils.java:241)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3073)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3133)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:279)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:174)
    at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:120)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:90)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
Exception encountered during startup.
org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes [-64, -88, 101, 51]
    at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:43)
    at org.apache.cassandra.db.Column.getString(Column.java:247)
    at org.apache.cassandra.db.marshal.AbstractType.getColumnsString(AbstractType.java:85)
    at org.apache.cassandra.db.ColumnFamily.toString(ColumnFamily.java:379)
    at org.apache.commons.lang.ObjectUtils.toString(ObjectUtils.java:241)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3073)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3133)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:279)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:174)
    at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:120)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:90)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
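The trace shows a debug toString() call failing because a column name is not valid UTF-8. Read as unsigned values, the reported bytes [-64, -88, 101, 51] are 192, 168, 101, 51, which looks like a raw IPv4 address rather than text, although that reading is an inference and not something the ticket states. One way debug output could avoid throwing on such names is to decode strictly and fall back to hex. The sketch below only illustrates that idea; SafeColumnNames and describe are hypothetical names, and this is not the fix applied in Cassandra.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Cassandra code: renders a byte[] name for log
// output without ever throwing on malformed UTF-8.
final class SafeColumnNames {

    /** Returns the UTF-8 text if the bytes decode cleanly, otherwise a hex dump. */
    static String describe(byte[] name) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(name))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: fall back to a representation that cannot fail.
            StringBuilder hex = new StringBuilder("0x");
            for (byte b : name) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        }
    }

    public static void main(String[] args) {
        // The bytes from the stack trace above; 0xC0 is never a valid UTF-8 lead byte.
        System.out.println(describe(new byte[] {-64, -88, 101, 51})); // 0xc0a86533
        System.out.println(describe(new byte[] {'T', 'o', 'k', 'e', 'n'})); // Token
    }
}
```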
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0001-Modify-removeToken-to-be-similar-to-decommission.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: 0001-Modify-removeToken-to-be-similar-to-decommission.patch 0002-Fixes-to-old-tests.patch 0003-Additional-unit-tests-for-removeToken.patch Rebased. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0002-Fixes-to-old-tests.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0003-Additional-unit-tests-for-removeToken.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898442#action_12898442 ] Nick Bailey commented on CASSANDRA-1216: Re-rebased. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: 0003-Additional-unit-tests-for-removeToken.patch removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 beta 2 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CASSANDRA-1366) utf8 error in DEBUG output in CommitLog.java
[ https://issues.apache.org/jira/browse/CASSANDRA-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey reassigned CASSANDRA-1366: -- Assignee: Nick Bailey
utf8 error in DEBUG output in CommitLog.java Key: CASSANDRA-1366 URL: https://issues.apache.org/jira/browse/CASSANDRA-1366 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna Assignee: Nick Bailey Fix For: 0.7 beta 1
Looks like the bug Johan saw a while back where debug output was throwing a UTF8 error has manifested itself in CommitLog.java on line 279.
INFO 18:32:40,951 Replaying /var/lib/cassandra/commitlog/CommitLog-1281058340642.log
DEBUG 18:32:40,953 Replaying /var/lib/cassandra/commitlog/CommitLog-1281058340642.log starting at 276
DEBUG 18:32:40,953 Reading mutation at 276
DEBUG 18:32:40,956 replaying mutation for system...@77fe4169: {ColumnFamily(LocationInfo [B:false:1...@1281058340821,])}
DEBUG 18:32:40,965 Reading mutation at 424
INFO 18:32:40,966 Finished reading /var/lib/cassandra/commitlog/CommitLog-1281058340642.log
ERROR 18:32:40,967 Exception encountered during startup.
org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes [-64, -88, 101, 51]
    at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:43)
    at org.apache.cassandra.db.Column.getString(Column.java:247)
    at org.apache.cassandra.db.marshal.AbstractType.getColumnsString(AbstractType.java:85)
    at org.apache.cassandra.db.ColumnFamily.toString(ColumnFamily.java:379)
    at org.apache.commons.lang.ObjectUtils.toString(ObjectUtils.java:241)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3073)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3133)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:279)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:174)
    at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:120)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:90)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
Exception encountered during startup.
org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes [-64, -88, 101, 51]
    at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:43)
    at org.apache.cassandra.db.Column.getString(Column.java:247)
    at org.apache.cassandra.db.marshal.AbstractType.getColumnsString(AbstractType.java:85)
    at org.apache.cassandra.db.ColumnFamily.toString(ColumnFamily.java:379)
    at org.apache.commons.lang.ObjectUtils.toString(ObjectUtils.java:241)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3073)
    at org.apache.commons.lang.StringUtils.join(StringUtils.java:3133)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:279)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:174)
    at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:120)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:90)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1322) allow configuring Pig without cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1322: --- Attachment: (was: 0001-Read-pig-configuration-from-environment-variables.patch) allow configuring Pig without cassandra.yaml Key: CASSANDRA-1322 URL: https://issues.apache.org/jira/browse/CASSANDRA-1322 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.6.3 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Read-pig-configuration-from-environment-variables.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1322) allow configuring Pig without cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1322: --- Attachment: 0001-Read-pig-configuration-from-environment-variables.patch Removed ability to use cassandra.yaml allow configuring Pig without cassandra.yaml Key: CASSANDRA-1322 URL: https://issues.apache.org/jira/browse/CASSANDRA-1322 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.6.3 Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Read-pig-configuration-from-environment-variables.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1126) Allow loading of cassandra.yaml from a location given on the commandline
[ https://issues.apache.org/jira/browse/CASSANDRA-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895730#action_12895730 ] Nick Bailey commented on CASSANDRA-1126: The patch in CASSANDRA-1347 modifies cassandra.in.sh to only modify CASSANDRA_CONF if it is not already set. Not exactly the goal of the title of this ticket but allows for the same result. Personally I don't really see a need for being able to rename the configuration files. Allow loading of cassandra.yaml from a location given on the commandline Key: CASSANDRA-1126 URL: https://issues.apache.org/jira/browse/CASSANDRA-1126 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Erik Onnen Priority: Trivial Fix For: 0.7.0 Attachments: DatabaseDescriptor.java.2.patch, DatabaseDescriptor.java.patch As a convenience, predominantly for testing but also for some levels of automated ops, it would be helpful to allow cassandra.yaml to be specified explicitly during startup as opposed to always reading it from the classpath which cannot be altered at runtime (not easily anyway). Sample patch attached that reads -D property cassandra.conf and gives it preference over any entry on the classpath. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1126) Allow loading of cassandra.yaml from a location given on the commandline
[ https://issues.apache.org/jira/browse/CASSANDRA-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895736#action_12895736 ] Nick Bailey commented on CASSANDRA-1126: Hmm good point. It would still be possible to do it with that patch though. You'd just need a directory for each conf and to wrap the call to cassandra with something that sets the conf variable. Not quite as easy as the command line I suppose. Allow loading of cassandra.yaml from a location given on the commandline Key: CASSANDRA-1126 URL: https://issues.apache.org/jira/browse/CASSANDRA-1126 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Erik Onnen Priority: Trivial Fix For: 0.7.0 Attachments: DatabaseDescriptor.java.2.patch, DatabaseDescriptor.java.patch As a convenience, predominantly for testing but also for some levels of automated ops, it would be helpful to allow cassandra.yaml to be specified explicitly during startup as opposed to always reading it from the classpath which cannot be altered at runtime (not easily anyway). Sample patch attached that reads -D property cassandra.conf and gives it preference over any entry on the classpath. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1126) Allow loading of cassandra.yaml from a location given on the commandline
[ https://issues.apache.org/jira/browse/CASSANDRA-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895738#action_12895738 ] Nick Bailey commented on CASSANDRA-1126: Actually you could just do: CASSANDRA_CONF='path/to/conf' bin/cassandra for each call. Not really that different from -Dcassandra.conf. Allow loading of cassandra.yaml from a location given on the commandline Key: CASSANDRA-1126 URL: https://issues.apache.org/jira/browse/CASSANDRA-1126 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7 beta 1 Reporter: Erik Onnen Priority: Trivial Fix For: 0.7.0 Attachments: DatabaseDescriptor.java.2.patch, DatabaseDescriptor.java.patch As a convenience, predominantly for testing but also for some levels of automated ops, it would be helpful to allow cassandra.yaml to be specified explicitly during startup as opposed to always reading it from the classpath which cannot be altered at runtime (not easily anyway). Sample patch attached that reads -D property cassandra.conf and gives it preference over any entry on the classpath. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
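The lookup order described in the ticket, where a -D property named cassandra.conf takes preference over any classpath entry, might look roughly like the sketch below. It is not the attached DatabaseDescriptor patch: ConfigLocator, resolve, and locate are hypothetical names, and the only facts taken from the ticket are the property name cassandra.conf and the file name cassandra.yaml.

```java
import java.net.URL;

// Hypothetical helper, not the attached patch: picks the configuration
// location, preferring an explicit setting over the classpath.
final class ConfigLocator {

    /** Pure decision logic: the explicit path wins, then the classpath location, else fail. */
    static String resolve(String explicitPath, String classpathLocation) {
        if (explicitPath != null && !explicitPath.isEmpty()) {
            return explicitPath;
        }
        if (classpathLocation != null) {
            return classpathLocation;
        }
        throw new IllegalStateException("cassandra.yaml not found via -Dcassandra.conf or the classpath");
    }

    /** Wires the decision logic to the JVM: system property first, then classpath lookup. */
    static String locate() {
        URL onClasspath = ConfigLocator.class.getClassLoader().getResource("cassandra.yaml");
        return resolve(System.getProperty("cassandra.conf"),
                       onClasspath == null ? null : onClasspath.toString());
    }

    public static void main(String[] args) {
        // Run with, for example: java -Dcassandra.conf=/path/to/cassandra.yaml ConfigLocator
        System.out.println(resolve("/path/to/cassandra.yaml", "file:/on/classpath/cassandra.yaml"));
    }
}
```

Keeping the decision in a method that takes plain strings makes the precedence rule easy to test without touching system properties or the classpath.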
[jira] Created: (CASSANDRA-1364) Consolidate cassandra commands in bin/
Consolidate cassandra commands in bin/ -- Key: CASSANDRA-1364 URL: https://issues.apache.org/jira/browse/CASSANDRA-1364 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.6.3 Reporter: Nick Bailey Priority: Minor Fix For: 0.7.0 Pretty much every script in bin has the same first 30 lines or so. We need to remove some of the duplication here. This could be accomplished by consolidating some commands into a single script or adding an initializer script they all call. I think I prefer consolidating at least some of the commands. For example the *tool commands could easily be one cassandra-tool command. It may even be possible to incorporate most of them into the cassandra script and have different commands for starting a node or using the tools. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1347) Move JVM_OPTS from cassandra.in.sh to the conf directory
[ https://issues.apache.org/jira/browse/CASSANDRA-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1347: --- Attachment: configuration-improvements.txt Moved jvm options to conf/cassandra-env.sh. I also made a few minor changes that make packaging and the init.d script in contrib work well together. Assuming this change is approved I don't see why debian/cassandra.in.sh needs to exist anymore. A debian specific jvm configuration could exist in conf or contrib perhaps. Move JVM_OPTS from cassandra.in.sh to the conf directory Key: CASSANDRA-1347 URL: https://issues.apache.org/jira/browse/CASSANDRA-1347 Project: Cassandra Issue Type: Improvement Components: Packaging Affects Versions: 0.6.3 Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 0.7.0 Attachments: configuration-improvements.txt Configuring the jvm options from the cassandra.in.sh script doesn't really make sense from a packaging perspective. These should exist in the conf directory and be overridden by deployment specific packages. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (CASSANDRA-786) RPM Packages
[ https://issues.apache.org/jira/browse/CASSANDRA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey reopened CASSANDRA-786: --- Assignee: (was: Peter Halliday) Reopening because nothing ever got committed to trunk. I think we should at least have a spec file in contrib for people to use as a basis if they aren't using riptano or another solution. Also I made a few changes to the last spec file that was attached. RPM Packages Key: CASSANDRA-786 URL: https://issues.apache.org/jira/browse/CASSANDRA-786 Project: Cassandra Issue Type: Improvement Components: Contrib Reporter: Daniel Lundin Priority: Minor Attachments: 768-update-spec-for-trunk.diff, 786-adjust-jars.patch, cassandra.spec, cassandra.spec RPM packages (and debs of course) would be nice, especially now that cassandra is maturing and gaining more interest. Lowering the threshold for getting cassandra running and getting started is also important. I think the RabbitMQ project has an admirable Download and install experience, not to mention the rather cute 2 min guarantee. Definitely a good inspiration. I've been studying Cloudera's Hadoop packages, which are very nice, and really appreciate the separate packages for configuration. This allows easy deployment of node configuration to a cluster. I'll have a spec file for building RHEL5 / CentOS packages ready for review and attached here in a bit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-786) RPM Packages
[ https://issues.apache.org/jira/browse/CASSANDRA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-786: -- Attachment: apache-cassandra.spec
Some changes to the last attached spec file:
* I think the package name should be apache-cassandra as it differentiates an rpm from something that might be provided by Riptano or similar.
* Version updated to 0.70
* The package ant-nodeps is a dependency
* The build step should perform ant clean and indicate the release flag in order for the jar to build correctly.
* I modified how configuration is handled. The package installs the basic configuration at /usr/share/cassandra/default.conf and /etc/cassandra/default.conf. Then alternatives are set up to point /etc/cassandra/conf here with a priority of 0. This allows deployment specific packages to install a custom alternative with a higher priority.
* The spec file should default to using the cassandra.in.sh in conf.
* Ticket CASSANDRA-1347 addresses making cassandra.in.sh not actually perform jvm configuration. This allows someone to do all configuration by using an alternative /etc/hadoop/conf directory.
* Modified the log files and conf files to give read access to the world.
RPM Packages Key: CASSANDRA-786 URL: https://issues.apache.org/jira/browse/CASSANDRA-786 Project: Cassandra Issue Type: Improvement Components: Contrib Reporter: Daniel Lundin Priority: Minor Attachments: 768-update-spec-for-trunk.diff, 786-adjust-jars.patch, apache-cassandra.spec, cassandra.spec, cassandra.spec
RPM packages (and debs of course) would be nice, especially now that cassandra is maturing and gaining more interest. Lowering the threshold for getting cassandra running and getting started is also important. I think the RabbitMQ project has an admirable Download and install experience, not to mention the rather cute 2 min guarantee. Definitely a good inspiration.
I've been studying Cloudera's Hadoop packages, which are very nice, and really appreciate the separate packages for configuration. This allows easy deployment of node configuration to a cluster. I'll have a spec file for building RHEL5 / CentOS packages ready for review and attached here in a bit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-786) RPM Packages
[ https://issues.apache.org/jira/browse/CASSANDRA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894103#action_12894103 ] Nick Bailey commented on CASSANDRA-786: --- Any reason why this hasn't made it into trunk yet? RPM Packages Key: CASSANDRA-786 URL: https://issues.apache.org/jira/browse/CASSANDRA-786 Project: Cassandra Issue Type: Improvement Components: Contrib Reporter: Daniel Lundin Assignee: Peter Halliday Priority: Minor Attachments: 768-update-spec-for-trunk.diff, 786-adjust-jars.patch, cassandra.spec, cassandra.spec RPM packages (and debs of course) would be nice, especially now that cassandra is maturing and gaining more interest. Lowering the threshold for getting cassandra running and getting started is also important. I think the RabbitMQ project has an admirable Download and install experience, not to mention the rather cute 2 min guarantee. Definitely a good inspiration. I've been studying Cloudera's Hadoop packages, which are very nice, and really appreciate the separate packages for configuration. This allows easy deployment of node configuration to a cluster. I'll have a spec file for building RHEL5 / CentOS packages ready for review and attached here in a bit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CASSANDRA-1322) allow configuring Pig without cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey reassigned CASSANDRA-1322: -- Assignee: Nick Bailey allow configuring Pig without cassandra.yaml Key: CASSANDRA-1322 URL: https://issues.apache.org/jira/browse/CASSANDRA-1322 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1291) nodetool ring prints incorrect IP address
[ https://issues.apache.org/jira/browse/CASSANDRA-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893349#action_12893349 ] Nick Bailey commented on CASSANDRA-1291: I believe Jignesh's change is almost certainly the problem. It's not consistently reproducible because it depends on the hash value of the endpoint. Jignesh, you should modify RackUnawareStrategy as well, then attach and submit the patch. nodetool ring prints incorrect IP address - Key: CASSANDRA-1291 URL: https://issues.apache.org/jira/browse/CASSANDRA-1291 Project: Cassandra Issue Type: Bug Affects Versions: 0.7 beta 1 Reporter: Brandon Williams Fix For: 0.7 beta 1 Nodetool's ring output on trunk seems to duplicate an IP address instead of printing them all. To reproduce: spin up a 3 node cluster and create a keyspace, examine nodetool ring. One IP address will show up twice, though the tokens are correctly displayed for the three machines. This is a cosmetic error: if you call getLiveNodes via JMX you can see all three IPs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0001-Modify-removeToken-command-to-make-it-similar-to-dec.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0002-Fixes-to-old-tests.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: 0001-Modify-removeToken-to-be-similar-to-decommission.patch 0002-Fixes-to-old-tests.patch 0003-Additional-unit-tests-for-removeToken.patch
Some fixes and tests added. There is one thing that still needs to be fixed.
* Currently the call to removeToken blocks either:
** until all nodes confirm that they have replicated the data for the dead node,
** or until a timeout is reached.
* I'm not sure what the timeout for this should be. Additionally, when nodes throughout the ring attempt to replicate data there should be a similar timeout before they give up on a source and retry.
* Also, clients may time out before the timeout is even reached or all the data is replicated. I'm not sure how the user will be able to determine whether the remove finished correctly or repair should be run.
removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
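The blocking behaviour described in the comment above (wait until every node confirms, or give up after a timeout) can be sketched in Java roughly as follows. This is only an illustration under assumptions: the class ReplicationWaiter, its method names, and the use of plain strings for node identities are hypothetical and are not taken from the attached patches.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks which nodes still owe a replication confirmation. */
final class ReplicationWaiter {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch latch;

    ReplicationWaiter(Set<String> expectedNodes) {
        pending.addAll(expectedNodes);
        latch = new CountDownLatch(expectedNodes.size());
    }

    /** Called when a node reports that it has finished re-replicating. */
    void confirm(String node) {
        // Count down once per node; ignore duplicates and nodes we never waited on.
        if (pending.remove(node)) {
            latch.countDown();
        }
    }

    /** Blocks until every node has confirmed or the timeout elapses; true means all confirmed. */
    boolean awaitAll(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Snapshot of the nodes that have not confirmed yet, so a caller can report them. */
    Set<String> stillPending() {
        return new HashSet<>(pending);
    }
}
```

A false return from awaitAll together with a non-empty stillPending() would give the caller something concrete to report, which touches on the open question of how a user can tell whether the removal finished or repair should be run. The timeout value itself is left to the caller here, since the comment leaves it undecided.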
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: 0001-Modify-removeToken-to-be-similar-to-decommission.patch Updated 0001 patch. It was missing a class before. Oops. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: (was: 0001-Modify-removeToken-to-be-similar-to-decommission.patch) removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7.0 Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1291) nodetool ring prints incorrect IP address
[ https://issues.apache.org/jira/browse/CASSANDRA-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892978#action_12892978 ] Nick Bailey commented on CASSANDRA-1291: Do you have any moves taking place when you see this error? Looks like we include pending ranges in the output. Pending ranges are stored in a HashMultimap, which uses a HashSet to store the collection of values for each key. Since HashSet doesn't guarantee order, that could be the issue here. nodetool ring prints incorrect IP address - Key: CASSANDRA-1291 URL: https://issues.apache.org/jira/browse/CASSANDRA-1291 Project: Cassandra Issue Type: Bug Affects Versions: 0.7 beta 1 Reporter: Brandon Williams Fix For: 0.7 beta 1 Nodetool's ring output on trunk seems to duplicate an IP address instead of printing them all. To reproduce: spin up a 3 node cluster and create a keyspace, examine nodetool ring. One IP address will show up twice, though the tokens are correctly displayed for the three machines. This is a cosmetic error: if you call getLiveNodes via JMX you can see all three IPs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
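To illustrate the ordering concern raised in the comment above: a HashSet iterates in an order derived from hash values, so code that relies on "the first element" can behave differently from one set of endpoints to another. One possible way to make output independent of that order is to copy the set into a sorted list before using it. The sketch below assumes endpoints are represented as plain strings, and the class and method names are hypothetical; it is not the fix that was applied to nodetool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not Cassandra code. */
final class EndpointOrdering {
    /** Returns the endpoints in a fixed order, however the input set happens to iterate. */
    static List<String> stableOrder(Set<String> endpoints) {
        List<String> sorted = new ArrayList<>(endpoints);
        // Lexicographic string order: not numeric IP order, but the same on every run.
        Collections.sort(sorted);
        return sorted;
    }
}
```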
[jira] Updated: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1216: --- Attachment: 0001-Modify-removeToken-command-to-make-it-similar-to-dec.patch 0002-Fixes-to-old-tests.patch
* 0001 - changes to make removeToken behave similarly to decommission
* 0002 - fixes to existing tests, since the state for STATE_LEFT changed
I am still working on some good unit tests for these changes, but these are the changes so far. The new process for removeToken is basically the one outlined above. One change is that, instead of using a STATE_REMOVED state, it seemed like tokens that are removed should just go into STATE_LEFT, similar to nodes that are decommissioned. One thing I'm not sure of is the timeout values for waiting for replications to stream and for waiting for replication notifications. Currently they are set arbitrarily in that patch. We need to determine good values for these.
removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 Attachments: 0001-Modify-removeToken-command-to-make-it-similar-to-dec.patch, 0002-Fixes-to-old-tests.patch this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885653#action_12885653 ] Nick Bailey commented on CASSANDRA-1216:
It seems like this should follow a pattern similar to decommissioning a node.
* If nodeA has removeToken called on it, it becomes responsible for nodeB, the node to remove.
* nodeA sets the MOVE_STATE of nodeB to STATE_REMOVING.
* This is gossiped throughout the ring.
* Nodes see this change and fetch any ranges they are becoming responsible for.
** After this is complete, they will need to notify nodeA somehow that this is complete.
* Once nodeA sees that all replications have finished, it changes the state of nodeB to STATE_REMOVED.
* All nodes then remove nodeB from their ring.
removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
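The state progression proposed in the comment above can be modelled as a small state machine from nodeA's point of view. The following is a minimal single-threaded sketch under assumptions: gossip and streaming are left out entirely, and the class RemovalCoordinator, its enum, and its method names are hypothetical stand-ins for the STATE_REMOVING and STATE_REMOVED values mentioned in the comment.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of nodeA's bookkeeping while nodeB is being removed. */
final class RemovalCoordinator {
    enum State { NORMAL, REMOVING, REMOVED }

    private State state = State.NORMAL;
    private final Set<String> awaiting = new HashSet<>();

    /** nodeA marks nodeB as REMOVING and records which nodes must fetch its ranges. */
    void beginRemoval(Set<String> nodesTakingOverRanges) {
        if (state != State.NORMAL) {
            throw new IllegalStateException("removal already started");
        }
        awaiting.addAll(nodesTakingOverRanges);
        // With nobody to wait for, the removal can complete immediately.
        state = awaiting.isEmpty() ? State.REMOVED : State.REMOVING;
    }

    /** A node notifies nodeA that it has fetched the ranges it is becoming responsible for. */
    void replicationFinished(String node) {
        if (state != State.REMOVING) {
            return; // late or unexpected notification; nothing to do
        }
        awaiting.remove(node);
        if (awaiting.isEmpty()) {
            state = State.REMOVED; // only now is it safe to drop nodeB from the ring
        }
    }

    State state() {
        return state;
    }
}
```

The point of the design is that nodeB stays visible in an intermediate state until every notification has arrived, so a failure during re-replication leaves evidence behind instead of silently dropping the node.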
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885702#action_12885702 ] Nick Bailey commented on CASSANDRA-1216: A side effect of this approach may be that you would need to call removeToken on a node that had seen the token previously. removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey reassigned CASSANDRA-1216: -- Assignee: Nick Bailey removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
[ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884736#action_12884736 ] Nick Bailey commented on CASSANDRA-1216: So clearly that solution would fail if the node that is attempting to retrieve the data fails. Perhaps a better solution is simply not removing the node until replication is done, perhaps by marking it with a new state? removetoken drops node from ring before re-replicating its data is finished --- Key: CASSANDRA-1216 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Nick Bailey Fix For: 0.7 this means that if something goes wrong during the re-replication (e.g. a source node is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart the process (other than the Big Hammer of running repair) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1196) Invalid UTF-8 data should cause exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1196: --- Attachment: 0001-Initial-fixes-to-utf-decoding.patch 0002-Unit-test-for-UTF8-fixes.patch Added a decode method to FBUtilities and changed partitioners to throw RuntimeExceptions on decoding failures. Invalid UTF-8 data should cause exceptions -- Key: CASSANDRA-1196 URL: https://issues.apache.org/jira/browse/CASSANDRA-1196 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Assignee: Nick Bailey Priority: Minor Fix For: 0.7 Attachments: 0001-Initial-fixes-to-utf-decoding.patch, 0002-Unit-test-for-UTF8-fixes.patch Our current method for decoding UTF-8 data in OrderPreservingPartitioner and CollatingOrderPreservingPartitioner will silently decode invalid UTF-8 data. This may also be a problem in UTF8Type. Instead, we should probably throw an exception, since bad UTF-8 data means either user error or corruption. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
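For context on why invalid data is decoded silently: in Java, new String(bytes, "UTF-8") replaces malformed sequences with the replacement character U+FFFD instead of failing. A strict decoder can be built from the standard CharsetDecoder API by asking it to report errors. The sketch below is an assumption about what such a decode method might look like; the class name StrictUtf8 is hypothetical and this is not the code from the attached FBUtilities patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical strict decoder, shown only to illustrate the idea. */
final class StrictUtf8 {
    /** Decodes bytes as UTF-8 and throws if the input is not valid UTF-8. */
    static String decode(byte[] bytes) {
        // A fresh decoder per call: CharsetDecoder instances are not thread-safe.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Unchecked, matching the RuntimeException behaviour described in the comment.
            throw new RuntimeException("invalid UTF-8 input", e);
        }
    }
}
```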
[jira] Updated: (CASSANDRA-1196) Invalid UTF-8 data should cause exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1196: --- Attachment: 0002-Unit-test-for-UTF8-fixes.patch Updated test case. Invalid UTF-8 data should cause exceptions -- Key: CASSANDRA-1196 URL: https://issues.apache.org/jira/browse/CASSANDRA-1196 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Assignee: Nick Bailey Priority: Minor Fix For: 0.7 Attachments: 0001-Initial-fixes-to-utf-decoding.patch, 0002-Unit-test-for-UTF8-fixes.patch Our current method for decoding UTF-8 data in OrderPreservingPartitioner and CollatingOrderPreservingPartitioner will silently decode invalid UTF-8 data. This may also be a problem in UTF8Type. Instead, we should probably throw an exception, since bad UTF-8 data means either user error or corruption. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1196) Invalid UTF-8 data should cause exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1196: --- Attachment: (was: 0002-Unit-test-for-UTF8-fixes.patch) Invalid UTF-8 data should cause exceptions -- Key: CASSANDRA-1196 URL: https://issues.apache.org/jira/browse/CASSANDRA-1196 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Assignee: Nick Bailey Priority: Minor Fix For: 0.7 Attachments: 0001-Initial-fixes-to-utf-decoding.patch Our current method for decoding UTF-8 data in OrderPreservingPartitioner and CollatingOrderPreservingPartitioner will silently decode invalid UTF-8 data. This may also be a problem in UTF8Type. Instead, we should probably throw an exception, since bad UTF-8 data means either user error or corruption. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1232) UTF8Type.compare() is slow and dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1232: --- Attachment: 0001-Fixes-to-UTF8Type-compare-and-getString-methods.patch Updated to extend BytesType and use a decode method from FBUtilities to catch encoding problems. This patch uses the patches provided by [CASSANDRA-1196|https://issues.apache.org/jira/browse/CASSANDRA-1196] UTF8Type.compare() is slow and dangerous Key: CASSANDRA-1232 URL: https://issues.apache.org/jira/browse/CASSANDRA-1232 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Folke Behrens Assignee: Nick Bailey Fix For: 0.6.4 Attachments: 0001-Fixes-to-UTF8Type-compare-and-getString-methods.patch UTF8Type converts both byte arrays into Strings and then compares them. This is unnecessary and slow because UTF-8 encoded Strings are already directly comparable. Higher codepoints yield higher initial and subsequent bytes. One can safely use BytesType.compare() for UTF-8. Maybe UTF8Type should be a subclass only overriding getString(). BTW, It's also dangerous to ignore invalid byte sequences. At this point the byte array should contain valid UTF-8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
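The issue description's claim that UTF-8 encoded strings are directly comparable holds when the bytes are compared as unsigned values, which then yields code point order. Java's byte type is signed, so each byte has to be masked before comparing. The sketch below illustrates that comparison; the class name Utf8Bytes is hypothetical and this is not the code of BytesType.compare().

```java
/** Hypothetical helper illustrating byte-wise comparison of UTF-8 data. */
final class Utf8Bytes {
    /** Compares two byte arrays lexicographically, treating each byte as unsigned (0..255). */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255; without this, bytes >= 0x80 would sort before ASCII.
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // Common prefix is identical: the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }
}
```

One caveat worth noting: code point order is not always the same as String.compareTo, which compares UTF-16 code units, so the two can disagree for characters outside the Basic Multilingual Plane.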
[jira] Commented: (CASSANDRA-1232) UTF8Type.compare() is slow and dangerous
[ https://issues.apache.org/jira/browse/CASSANDRA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883978#action_12883978 ] Nick Bailey commented on CASSANDRA-1232: Hmm I changed the method to explicitly decode as UTF8. Would we prefer to assume UTF8 at this point? UTF8Type.compare() is slow and dangerous Key: CASSANDRA-1232 URL: https://issues.apache.org/jira/browse/CASSANDRA-1232 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Folke Behrens Assignee: Nick Bailey Fix For: 0.6.4 Attachments: 0001-Fixes-to-UTF8Type-compare-and-getString-methods.patch UTF8Type converts both byte arrays into Strings and then compares them. This is unnecessary and slow because UTF-8 encoded Strings are already directly comparable. Higher codepoints yield higher initial and subsequent bytes. One can safely use BytesType.compare() for UTF-8. Maybe UTF8Type should be a subclass only overriding getString(). BTW, It's also dangerous to ignore invalid byte sequences. At this point the byte array should contain valid UTF-8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1196) Invalid UTF-8 data should cause exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1196: --- Attachment: (was: 0002-Unit-test-for-UTF8-fixes.patch) Invalid UTF-8 data should cause exceptions -- Key: CASSANDRA-1196 URL: https://issues.apache.org/jira/browse/CASSANDRA-1196 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Assignee: Nick Bailey Priority: Minor Fix For: 0.7 Attachments: 0001-Initial-fixes-to-utf-decoding.patch Our current method for decoding UTF-8 data in OrderPreservingPartitioner and CollatingOrderPreservingPartitioner will silently decode invalid UTF-8 data. This may also be a problem in UTF8Type. Instead, we should probably throw an exception, since bad UTF-8 data means either user error or corruption. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1196) Invalid UTF-8 data should cause exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Bailey updated CASSANDRA-1196: --- Attachment: 0002-Better-Unit-Testing.patch Fixed unit test for decode. Invalid UTF-8 data should cause exceptions -- Key: CASSANDRA-1196 URL: https://issues.apache.org/jira/browse/CASSANDRA-1196 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Assignee: Nick Bailey Priority: Minor Fix For: 0.7 Attachments: 0001-Initial-fixes-to-utf-decoding.patch, 0002-Better-Unit-Testing.patch Our current method for decoding UTF-8 data in OrderPreservingPartitioner and CollatingOrderPreservingPartitioner will silently decode invalid UTF-8 data. This may also be a problem in UTF8Type. Instead, we should probably throw an exception, since bad UTF-8 data means either user error or corruption. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.