[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528707#comment-17528707 ]

Jan Karlsson commented on CASSANDRA-16718:
------------------------------------------

[~brandon.williams] The ticket has been quiet for a while now. What is the status on this?

> Changing listen_address with prefer_local may lead to issues
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-16718
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Config
>            Reporter: Jan Karlsson
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x
>
> Many container-based solutions function by assigning new listen_addresses
> when nodes are stopped. Changing the listen_address is usually as simple as
> turning off the node and changing the yaml file.
> However, if prefer_local is enabled, I observed that nodes were unable to
> join the cluster and fail with 'Unable to gossip with any seeds'.
> Trace shows that the changing node will try to communicate with the existing
> node but the response is never received. I assume it is because the existing
> node attempts to communicate with the local address during the shadow round.

--
This message was sent by Atlassian Jira (v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528707#comment-17528707 ]

Jan Karlsson edited comment on CASSANDRA-16718 at 4/27/22 9:57 AM:
-------------------------------------------------------------------

[~brandon.williams] The ticket has been quiet for a while now. What is the status on this? Are we waiting for a reviewer?

was (Author: jan karlsson):
[~brandon.williams] The ticket has been quiet for a while now. What is the status on this?
[jira] [Comment Edited] (CASSANDRA-17407) Validate existence of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502267#comment-17502267 ]

Jan Karlsson edited comment on CASSANDRA-17407 at 3/8/22, 7:18 AM:
-------------------------------------------------------------------

Valid point, although I am not sure how much use the average person will get out of it. Datacenter names usually have some pattern that makes it easy to spot errors. Even with 10 DCs, it would be rather easy for the user to run nodetool status and compare the lists. Although I suppose a more precise error message couldn't hurt, and the improved validation in the dtest would be good.

||Branch||Circle||dtest||
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCI|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/10/workflows/f5008a69-2798-4b03-802d-fb6d0128d68a]|[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|

edit: Fixed 3.11 link

was (Author: jan karlsson):
Valid point, although I am not sure how much use the average person will get out of it. Datacenter names usually have some pattern that makes it easy to spot errors. Even with 10 DCs, it would be rather easy for the user to run nodetool status and compare the lists. Although I suppose a more precise error message couldn't hurt, and the improved validation in the dtest would be good.

||Branch||Circle||dtest||
|[3.11|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|[CircleCI|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/10/workflows/f5008a69-2798-4b03-802d-fb6d0128d68a]|[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|

> Validate existence of DCs when repairing
> ----------------------------------------
>
>                 Key: CASSANDRA-17407
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Repair
>            Reporter: Jan Karlsson
>            Assignee: Jan Karlsson
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x
>
> With the new validation of data centers in the replication factor, it might
> be good to give similar treatment to repair.
> Currently the --in-dc flag only validates that the given list contains the
> local data center. If a list containing nonexistent data centers is given,
> the repair will pass without errors or warnings as long as the list also
> contains the local data center.
> My suggestion would be to validate all the data centers and give an error
> when a nonexistent data center is given.
[jira] [Commented] (CASSANDRA-17407) Validate existence of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502267#comment-17502267 ]

Jan Karlsson commented on CASSANDRA-17407:
------------------------------------------

Valid point, although I am not sure how much use the average person will get out of it. Datacenter names usually have some pattern that makes it easy to spot errors. Even with 10 DCs, it would be rather easy for the user to run nodetool status and compare the lists. Although I suppose a more precise error message couldn't hurt, and the improved validation in the dtest would be good.

||Branch||Circle||dtest||
|[3.11|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|[CircleCI|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/10/workflows/f5008a69-2798-4b03-802d-fb6d0128d68a]|[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|
[jira] [Comment Edited] (CASSANDRA-17407) Validate existence of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500056#comment-17500056 ]

Jan Karlsson edited comment on CASSANDRA-17407 at 3/2/22, 11:09 AM:
--------------------------------------------------------------------

Sure thing. I created a [dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1] that tests this behavior. I added a check for the validation of the local data center as well, since I couldn't find any other place where we test this.

was (Author: jan karlsson):
Sure thing. I created a [test|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1] that tests this behavior. I added a check for the validation of the local data center as well, since I couldn't find any other place where we test this.
[jira] [Commented] (CASSANDRA-17407) Validate existence of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500056#comment-17500056 ]

Jan Karlsson commented on CASSANDRA-17407:
------------------------------------------

Sure thing. I created a [test|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1] that tests this behavior. I added a check for the validation of the local data center as well, since I couldn't find any other place where we test this.
[jira] [Commented] (CASSANDRA-17407) Validate existence of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499506#comment-17499506 ]

Jan Karlsson commented on CASSANDRA-17407:
------------------------------------------

You might have a point about this feeling more like a bug than an improvement. It certainly is confusing behavior for users. I was thinking something like this for the patch:

||Patch||Test||
|[3.11\|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCi\|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/9/workflows/ca527be8-4145-4c55-bfb4-7cec670e7b4c]|

The patch applies cleanly to 4.0/trunk. I can provide patches for the other versions too if needed.
[jira] [Comment Edited] (CASSANDRA-17407) Validate existence of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499506#comment-17499506 ]

Jan Karlsson edited comment on CASSANDRA-17407 at 3/1/22, 12:38 PM:
--------------------------------------------------------------------

You might have a point about this feeling more like a bug than an improvement. It certainly is confusing behavior for users. I was thinking something like this for the patch:

||Patch||Test||
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCi|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/9/workflows/ca527be8-4145-4c55-bfb4-7cec670e7b4c]|

The patch applies cleanly to 4.0/trunk. I can provide patches for the other versions too if needed.

was (Author: jan karlsson):
You might have a point about this feeling more like a bug than an improvement. It certainly is confusing behavior for users. I was thinking something like this for the patch:

||Patch||Test||
|[3.11\|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCi\|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/9/workflows/ca527be8-4145-4c55-bfb4-7cec670e7b4c]|

The patch applies cleanly to 4.0/trunk. I can provide patches for the other versions too if needed.
[jira] [Updated] (CASSANDRA-17407) Validate existence of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Karlsson updated CASSANDRA-17407:
-------------------------------------
    Summary: Validate existence of DCs when repairing  (was: Validate existance of DCs when repairing)
[jira] [Commented] (CASSANDRA-17407) Validate existance of DCs when repairing
[ https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498940#comment-17498940 ]

Jan Karlsson commented on CASSANDRA-17407:
------------------------------------------

I'd be happy to provide a patch if we decide this is a good addition.
[jira] [Created] (CASSANDRA-17407) Validate existance of DCs when repairing
Jan Karlsson created CASSANDRA-17407:
-------------------------------------

             Summary: Validate existance of DCs when repairing
                 Key: CASSANDRA-17407
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jan Karlsson
            Assignee: Jan Karlsson

With the new validation of data centers in the replication factor, it might be good to give similar treatment to repair.

Currently the --in-dc flag only validates that the given list contains the local data center. If a list containing nonexistent data centers is given, the repair will pass without errors or warnings as long as the list also contains the local data center.

My suggestion would be to validate all the data centers and give an error when a nonexistent data center is given.
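The suggested validation can be sketched as follows. This is an illustrative Python model only (the real change would live in Cassandra's Java repair code); the function name and signature are hypothetical:

```python
def validate_repair_datacenters(requested_dcs, known_dcs, local_dc):
    """Reject a repair request that names any datacenter the cluster does
    not know about, instead of only checking for the local DC."""
    if local_dc not in requested_dcs:
        raise ValueError(f"the local data center {local_dc!r} must be part of the repair")
    unknown = sorted(set(requested_dcs) - set(known_dcs))
    if unknown:
        # This is the error the ticket asks for: today a repair with an
        # unknown DC silently passes as long as the local DC is listed.
        raise ValueError(f"data center(s) {unknown} not found")
    return True

known = {"dc1", "dc2", "dc3"}
validate_repair_datacenters({"dc1", "dc2"}, known, local_dc="dc1")  # passes

try:
    # passes silently today, because "dc1" (the local DC) is in the list
    validate_repair_datacenters({"dc1", "dc9"}, known, local_dc="dc1")
except ValueError as e:
    print(e)  # data center(s) ['dc9'] not found
```

The point of the sketch is the second check: validating every requested DC against the known set, not just the local one.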
[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416492#comment-17416492 ]

Jan Karlsson commented on CASSANDRA-16718:
------------------------------------------

Observed this by fetching the peers table before stopping node2 in my dtest. Without the patch I observed preferred_ip populated, but with the patch it is null throughout the test.
[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416038#comment-17416038 ]

Jan Karlsson commented on CASSANDRA-16718:
------------------------------------------

LGTM. Seems to fix the issue completely. However, preferred_ip is null after your patch throughout the test. Is this an intended side effect?
[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411763#comment-17411763 ]

Jan Karlsson commented on CASSANDRA-16718:
------------------------------------------

Great findings so far. Thank you for taking the time to dig into this. I agree that the old local address is persisted somewhere and therefore used by the existing node.

However, in an attempt to verify your findings I modified my test case to manually change the preferred_ip before I start the last node, so that it points to the correct address. The test still fails even with an updated preferred_ip.

My original thought was that the Gossiper was persisting this ip in endpointStateMap. During checkForEndpointCollision, the UP node will attempt to connect through the local address before this address is updated by the shadow round.
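The failure mode described above can be reduced to a toy model. This is illustrative Python only, not Cassandra's actual Gossiper; the function and its parameters are hypothetical. The seed replies to the address it has cached for the peer, so if that address is a stale local address, the restarted node never receives the shadow-round response and fails with 'Unable to gossip with any seeds':

```python
def shadow_round_succeeds(new_node_addr, cached_addr_on_seed, reachable_addrs):
    """Model one shadow-round exchange: the seed answers using the address
    it has cached for the peer (e.g. an old preferred_ip / endpointStateMap
    entry); the exchange succeeds only if that address is still reachable."""
    reply_to = cached_addr_on_seed if cached_addr_on_seed is not None else new_node_addr
    return reply_to in reachable_addrs

# No cached address: the seed replies to the new listen_address and gossip works.
print(shadow_round_succeeds("10.0.0.5", None, {"10.0.0.5"}))           # True

# Stale cached local address: the reply goes to an unreachable address.
print(shadow_round_succeeds("10.0.0.5", "192.168.1.5", {"10.0.0.5"}))  # False
```

The model also matches the observation above that fixing preferred_ip alone is not enough: if the stale address is additionally cached elsewhere (the `cached_addr_on_seed` here), the exchange still fails.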
[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392196#comment-17392196 ]

Jan Karlsson commented on CASSANDRA-16718:
------------------------------------------

Took a look at the code base. It seems quite difficult to change, as it is intertwined with the per-node message pools. Maybe someone with more experience with the networking can shed some light on the issue.
[jira] [Assigned] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Karlsson reassigned CASSANDRA-16718:
----------------------------------------
    Assignee:     (was: Jan Karlsson)
[jira] [Assigned] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Karlsson reassigned CASSANDRA-16718:
----------------------------------------
    Assignee: Jan Karlsson
[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358575#comment-17358575 ]

Jan Karlsson commented on CASSANDRA-16718:
------------------------------------------

A dtest to reproduce can be found [here|https://github.com/itskarlsson/cassandra-dtest/tree/CASSANDRA-16718]. The test passes if prefer_local is set to false.
[jira] [Updated] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
[ https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Karlsson updated CASSANDRA-16718:
-------------------------------------
    Fix Version/s: 4.0-rc1
                   4.0
                   3.11.11
                   3.0.25
[jira] [Created] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues
Jan Karlsson created CASSANDRA-16718: Summary: Changing listen_address with prefer_local may lead to issues Key: CASSANDRA-16718 URL: https://issues.apache.org/jira/browse/CASSANDRA-16718 Project: Cassandra Issue Type: Bug Components: Cluster/Gossip, Consistency/Bootstrap and Decommission Reporter: Jan Karlsson Many container-based solutions function by assigning new listen_addresses when nodes are stopped. Changing the listen_address is usually as simple as turning off the node and changing the yaml file. However, if prefer_local is enabled, I observed that nodes were unable to join the cluster and failed with 'Unable to gossip with any seeds'. Traces show that the changing node will try to communicate with the existing node but the response is never received. I assume it is because the existing node attempts to communicate with the local address during the shadow round. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
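The failure mode described in the report can be sketched abstractly: with prefer_local enabled, a peer advertises a local (internal) address via gossip, and nodes in the same datacenter prefer that address over the broadcast address. If a node comes back with a new listen_address, the existing node may keep replying to the stale internal address until gossip state is refreshed, so the shadow-round reply never arrives. The sketch below is illustrative only — the names and structure are hypothetical, not Cassandra's actual messaging code:

```python
# Hypothetical sketch of prefer_local address selection. Cassandra's
# real logic lives in the internode messaging layer; this only models
# the preference order implied by the report above.

def preferred_address(peer_state, same_dc, prefer_local):
    """Pick the address used to reach a peer.

    peer_state: dict with 'broadcast' and, optionally, 'internal'
    (the local address the peer last gossiped).
    """
    if prefer_local and same_dc and peer_state.get("internal"):
        return peer_state["internal"]
    return peer_state["broadcast"]

# Before the restart, the peer gossiped internal address 10.0.0.2.
state = {"broadcast": "203.0.113.2", "internal": "10.0.0.2"}

# The peer restarts with a new listen_address, but the existing node
# still holds the stale internal address, so its replies go to
# 10.0.0.2 and are never received by the rejoining node.
assert preferred_address(state, same_dc=True, prefer_local=True) == "10.0.0.2"

# With prefer_local disabled, traffic uses the broadcast address and
# the handshake succeeds -- consistent with the dtest above, which
# passes when prefer_local is set to false.
assert preferred_address(state, same_dc=True, prefer_local=False) == "203.0.113.2"
```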
[jira] [Commented] (CASSANDRA-16577) Node waits for schema agreement on removed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317944#comment-17317944 ] Jan Karlsson commented on CASSANDRA-16577: -- Tried to reproduce with your patch. It worked both on 4.0-rc1-SNAPSHOT and 3.11.11-SNAPSHOT. As for the code, LGTM. One nit would be to include the ignore log message into the exception instead of the warning. > Node waits for schema agreement on removed nodes > > > Key: CASSANDRA-16577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16577 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Consistency/Bootstrap and Decommission >Reporter: Jan Karlsson >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0, 3.0.x, 3.11.x, 4.0-beta > > > CASSANDRA-15158 might have introduced a bug where bootstrapping nodes wait > for schema agreement from nodes that have been removed if token allocation > for keyspace is enabled. > > It is fairly easy to reproduce with the following steps: > {noformat} > // Create 3 node cluster > ccm create test --vnodes -n 3 -s -v 3.11.10 > // Remove two nodes > ccm node2 decommission > ccm node3 decommission > ccm node2 remove > ccm node3 remove > // Create keyspace to change the schema. It works if the schema never changes. 
> ccm node1 cqlsh -x "CREATE KEYSPACE k WITH replication = {'class': > 'SimpleStrategy', 'replication_factor': 1};" > // Add allocate parameter > ccm updateconf 'allocate_tokens_for_keyspace: k' > // Add node2 again to cluster > ccm add node2 -i 127.0.0.2 -j 7200 -r 2200 > ccm node2 start{noformat} > > This will cause node2 to throw exception on startup: > {noformat} > WARN [main] 2021-04-08 14:10:53,272 StorageService.java:941 - There are > nodes in the cluster with a different schema version than us we did not > merged schemas from, our version : (a5da47ec-ffe3-3111-b2f3-325f771f1539), > outstanding versions -> endpoints : > {8e9ec79e-5ed2-3949-8ac8-794abfee3837=[/127.0.0.3]} > ERROR [main] 2021-04-08 14:10:53,274 CassandraDaemon.java:803 - Exception > encountered during startup > java.lang.RuntimeException: Didn't receive schemas for all known versions > within the timeout > at > org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:947) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:206) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:177) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1073) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:753) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:395) > [apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:633) > [apache-cassandra-3.11.10.jar:3.11.10] > at > 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:786) > [apache-cassandra-3.11.10.jar:3.11.10] > INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,279 > HintsService.java:209 - Paused hints dispatch > WARN [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 Gossiper.java:1670 > - No local state, state is in silent shutdown, or node hasn't joined, not > announcing shutdown > INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 > MessagingService.java:985 - Waiting for messaging service to quiesce > INFO [ACCEPT-/127.0.0.2] 2021-04-08 14:10:53,281 MessagingService.java:1346 > - MessagingService has terminated the accept() thread > INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,416 > HintsService.java:209 - Paused hints dispatch{noformat} > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16577) Node waits for schema agreement on removed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-16577: - Fix Version/s: 3.0.24 3.11.10 4.0-beta 4.0 > Node waits for schema agreement on removed nodes > > > Key: CASSANDRA-16577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16577 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Consistency/Bootstrap and Decommission >Reporter: Jan Karlsson >Priority: Normal > Fix For: 3.0.24, 3.11.10, 4.0, 4.0-beta > > > CASSANDRA-15158 might have introduced a bug where bootstrapping nodes wait > for schema agreement from nodes that have been removed if token allocation > for keyspace is enabled. > > It is fairly easy to reproduce with the following steps: > {noformat} > // Create 3 node cluster > ccm create test --vnodes -n 3 -s -v 3.11.10 > // Remove two nodes > ccm node2 decommission > ccm node3 decommission > ccm node2 remove > ccm node3 remove > // Create keyspace to change the schema. It works if the schema never changes. 
> ccm node1 cqlsh -x "CREATE KEYSPACE k WITH replication = {'class': > 'SimpleStrategy', 'replication_factor': 1};" > // Add allocate parameter > ccm updateconf 'allocate_tokens_for_keyspace: k' > // Add node2 again to cluster > ccm add node2 -i 127.0.0.2 -j 7200 -r 2200 > ccm node2 start{noformat} > > This will cause node2 to throw exception on startup: > {noformat} > WARN [main] 2021-04-08 14:10:53,272 StorageService.java:941 - There are > nodes in the cluster with a different schema version than us we did not > merged schemas from, our version : (a5da47ec-ffe3-3111-b2f3-325f771f1539), > outstanding versions -> endpoints : > {8e9ec79e-5ed2-3949-8ac8-794abfee3837=[/127.0.0.3]} > ERROR [main] 2021-04-08 14:10:53,274 CassandraDaemon.java:803 - Exception > encountered during startup > java.lang.RuntimeException: Didn't receive schemas for all known versions > within the timeout > at > org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:947) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:206) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:177) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1073) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:753) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) > ~[apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:395) > [apache-cassandra-3.11.10.jar:3.11.10] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:633) > [apache-cassandra-3.11.10.jar:3.11.10] > at > 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:786) > [apache-cassandra-3.11.10.jar:3.11.10] > INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,279 > HintsService.java:209 - Paused hints dispatch > WARN [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 Gossiper.java:1670 > - No local state, state is in silent shutdown, or node hasn't joined, not > announcing shutdown > INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 > MessagingService.java:985 - Waiting for messaging service to quiesce > INFO [ACCEPT-/127.0.0.2] 2021-04-08 14:10:53,281 MessagingService.java:1346 > - MessagingService has terminated the accept() thread > INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,416 > HintsService.java:209 - Paused hints dispatch{noformat} > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16577) Node waits for schema agreement on removed nodes
Jan Karlsson created CASSANDRA-16577: Summary: Node waits for schema agreement on removed nodes Key: CASSANDRA-16577 URL: https://issues.apache.org/jira/browse/CASSANDRA-16577 Project: Cassandra Issue Type: Bug Components: Cluster/Gossip, Consistency/Bootstrap and Decommission Reporter: Jan Karlsson CASSANDRA-15158 might have introduced a bug where bootstrapping nodes wait for schema agreement from nodes that have been removed if token allocation for keyspace is enabled. It is fairly easy to reproduce with the following steps: {noformat} // Create 3 node cluster ccm create test --vnodes -n 3 -s -v 3.11.10 // Remove two nodes ccm node2 decommission ccm node3 decommission ccm node2 remove ccm node3 remove // Create keyspace to change the schema. It works if the schema never changes. ccm node1 cqlsh -x "CREATE KEYSPACE k WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};" // Add allocate parameter ccm updateconf 'allocate_tokens_for_keyspace: k' // Add node2 again to cluster ccm add node2 -i 127.0.0.2 -j 7200 -r 2200 ccm node2 start{noformat} This will cause node2 to throw exception on startup: {noformat} WARN [main] 2021-04-08 14:10:53,272 StorageService.java:941 - There are nodes in the cluster with a different schema version than us we did not merged schemas from, our version : (a5da47ec-ffe3-3111-b2f3-325f771f1539), outstanding versions -> endpoints : {8e9ec79e-5ed2-3949-8ac8-794abfee3837=[/127.0.0.3]} ERROR [main] 2021-04-08 14:10:53,274 CassandraDaemon.java:803 - Exception encountered during startup java.lang.RuntimeException: Didn't receive schemas for all known versions within the timeout at org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:947) ~[apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:206) ~[apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:177) 
~[apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1073) ~[apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:753) ~[apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) ~[apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:395) [apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:633) [apache-cassandra-3.11.10.jar:3.11.10] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:786) [apache-cassandra-3.11.10.jar:3.11.10] INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,279 HintsService.java:209 - Paused hints dispatch WARN [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 Gossiper.java:1670 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 MessagingService.java:985 - Waiting for messaging service to quiesce INFO [ACCEPT-/127.0.0.2] 2021-04-08 14:10:53,281 MessagingService.java:1346 - MessagingService has terminated the accept() thread INFO [StorageServiceShutdownHook] 2021-04-08 14:10:53,416 HintsService.java:209 - Paused hints dispatch{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
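The bug reported in CASSANDRA-16577 amounts to a wait loop keyed on schema versions without checking whether the endpoints advertising them are still cluster members: the removed node 127.0.0.3 holds a stale version, so the bootstrap wait times out. A hedged sketch of the idea — hypothetical structure, not the actual StorageService.waitForSchema code:

```python
# Illustrative model of the schema-agreement wait. The function name
# and shapes are assumptions for this sketch only.

def outstanding_versions(schema_by_endpoint, our_version, members):
    """Schema versions still worth waiting for: versions that differ
    from ours AND are advertised by at least one endpoint that is
    still a cluster member. Versions held only by removed nodes must
    not block bootstrap."""
    pending = {}
    for endpoint, version in schema_by_endpoint.items():
        if version != our_version and endpoint in members:
            pending.setdefault(version, []).append(endpoint)
    return pending

ours = "a5da47ec"
seen = {"127.0.0.1": ours, "127.0.0.3": "8e9ec79e"}  # node3 was removed

# Buggy behavior, modeled: the removed node is still counted as a
# member, so its stale version keeps us waiting until the timeout.
assert outstanding_versions(seen, ours, members={"127.0.0.1", "127.0.0.3"})

# Fixed behavior: with the removed node excluded from membership,
# nothing is outstanding and token allocation can proceed.
assert not outstanding_versions(seen, ours, members={"127.0.0.1"})
```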
[jira] [Commented] (CASSANDRA-16316) Tracing continues after session completed
[ https://issues.apache.org/jira/browse/CASSANDRA-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251749#comment-17251749 ] Jan Karlsson commented on CASSANDRA-16316: -- Sure thing. Here is a [dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:16316] that tests for this issue. I tested it both with and without patch on 3.11.9 to verify. > Tracing continues after session completed > - > > Key: CASSANDRA-16316 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16316 > Project: Cassandra > Issue Type: Bug > Components: Observability/Tracing >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Normal > Fix For: 2.2.x, 3.0.x, 3.11.x > > > We saw the system_trace.events table increasing in size continuously without > any trace requests being issued. > I traced the issue back to a specific version and patch. I believe we have > removed the call to reset the trace flag in CASSANDRA-15041 which causes > tracing to continue in the thread even after it is finished with the request. > Reproduced like follows: > 1. ccm test -n 1 -v 3.11.9 > 2. Enable authentication/authorization > 3. Set permissions_update_interval_in_ms: 1000 (It works if this value is > default value. I am guessing this is because the update is done in the > calling thread) > 4. select * from some table a bunch of times until PermissionRoleCache is > refreshed > 5. Watch system_traces.events grow > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16316) Tracing continues after session completed
[ https://issues.apache.org/jira/browse/CASSANDRA-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246496#comment-17246496 ] Jan Karlsson edited comment on CASSANDRA-16316 at 12/9/20, 12:36 PM: - You are probably right about trunk. I also tried reproducing it on there without success. As for a fix, I think something simple like calling maybeResetTraceSessionWrapper should suffice. Something like [this|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:16316-3.11]. I can provide patches for the other versions if needed. was (Author: jan karlsson): You are probably right about trunk. I also tried reproducing it on there without success. As for a fix, I think something simple like calling maybeResetTraceSessionWrapper should suffice. Something like [this|[https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:16316-3.11]]. I can provide patches for the other versions if needed. > Tracing continues after session completed > - > > Key: CASSANDRA-16316 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16316 > Project: Cassandra > Issue Type: Bug > Components: Observability/Tracing >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Normal > Fix For: 2.2.x, 3.0.x, 3.11.x > > > We saw the system_trace.events table increasing in size continuously without > any trace requests being issued. > I traced the issue back to a specific version and patch. I believe we have > removed the call to reset the trace flag in CASSANDRA-15041 which causes > tracing to continue in the thread even after it is finished with the request. > Reproduced like follows: > 1. ccm test -n 1 -v 3.11.9 > 2. Enable authentication/authorization > 3. Set permissions_update_interval_in_ms: 1000 (It works if this value is > default value. I am guessing this is because the update is done in the > calling thread) > 4. select * from some table a bunch of times until PermissionRoleCache is > refreshed > 5. 
Watch system_traces.events grow > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16316) Tracing continues after session completed
[ https://issues.apache.org/jira/browse/CASSANDRA-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246496#comment-17246496 ] Jan Karlsson commented on CASSANDRA-16316: -- You are probably right about trunk. I also tried reproducing it on there without success. As for a fix, I think something simple like calling maybeResetTraceSessionWrapper should suffice. Something like [this|[https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:16316-3.11]]. I can provide patches for the other versions if needed. > Tracing continues after session completed > - > > Key: CASSANDRA-16316 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16316 > Project: Cassandra > Issue Type: Bug > Components: Observability/Tracing >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Normal > Fix For: 2.2.x, 3.0.x, 3.11.x > > > We saw the system_trace.events table increasing in size continuously without > any trace requests being issued. > I traced the issue back to a specific version and patch. I believe we have > removed the call to reset the trace flag in CASSANDRA-15041 which causes > tracing to continue in the thread even after it is finished with the request. > Reproduced like follows: > 1. ccm test -n 1 -v 3.11.9 > 2. Enable authentication/authorization > 3. Set permissions_update_interval_in_ms: 1000 (It works if this value is > default value. I am guessing this is because the update is done in the > calling thread) > 4. select * from some table a bunch of times until PermissionRoleCache is > refreshed > 5. Watch system_traces.events grow > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16316) Tracing continues after session completed
Jan Karlsson created CASSANDRA-16316: Summary: Tracing continues after session completed Key: CASSANDRA-16316 URL: https://issues.apache.org/jira/browse/CASSANDRA-16316 Project: Cassandra Issue Type: Bug Components: Observability/Tracing Reporter: Jan Karlsson Assignee: Jan Karlsson We saw the system_traces.events table increasing in size continuously without any trace requests being issued. I traced the issue back to a specific version and patch. I believe we have removed the call to reset the trace flag in CASSANDRA-15041, which causes tracing to continue in the thread even after it is finished with the request. Reproduced as follows: 1. ccm create test -n 1 -v 3.11.9 2. Enable authentication/authorization 3. Set permissions_update_interval_in_ms: 1000 (It works if this value is the default value. I am guessing this is because the update is done in the calling thread) 4. select * from some table a bunch of times until PermissionRoleCache is refreshed 5. Watch system_traces.events grow -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
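The class of bug described here — a per-thread trace flag that is set when a session begins but never reset when it completes, so a pooled thread keeps recording trace events for unrelated requests — can be shown in miniature. All names below are hypothetical; Cassandra's real state is thread-local in the tracing machinery, not this code:

```python
import threading

# Miniature model of a tracing flag on a pooled worker thread.
_state = threading.local()
events = []

def begin_session():
    _state.tracing = True

def trace(msg):
    if getattr(_state, "tracing", False):
        events.append(msg)

def handle_request(msg, reset_after=True):
    trace(msg)
    if reset_after:
        # This is the reset the regression removed: without it, the
        # flag survives on the pooled thread and later requests on the
        # same thread keep writing trace events.
        _state.tracing = False

begin_session()
handle_request("traced request", reset_after=True)
handle_request("untraced request")   # flag was reset: not recorded
assert events == ["traced request"]

begin_session()
handle_request("traced request", reset_after=False)
handle_request("leaked trace")       # bug: still recorded
assert events == ["traced request", "traced request", "leaked trace"]
```

The growing events list mirrors system_traces.events growing without any tracing being requested.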
[jira] [Commented] (CASSANDRA-14710) Use quilt to patch cassandra.in.sh in Debian packaging
[ https://issues.apache.org/jira/browse/CASSANDRA-14710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833757#comment-16833757 ] Jan Karlsson commented on CASSANDRA-14710: -- Took a look at the patch and LGTM. Seems to all apply cleanly and installs just fine in a Debian docker container. > Use quilt to patch cassandra.in.sh in Debian packaging > -- > > Key: CASSANDRA-14710 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14710 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Michael Shuler >Assignee: Michael Shuler >Priority: Normal > Fix For: 4.0 > > Attachments: CASSANDRA-14710_c.in.sh.patch.txt > > > While working on CASSANDRA-14707, I found the debian/cassandra.in.sh file is > outdated and is missing some elements from bin/cassandra.in.sh. This should > not be a separately maintained file, so let's use quilt to patch the few bits > that need to be updated on Debian package installations. > * rm debian/cassandra.in.sh > * create quilt patch for path updates needed > * update debian/cassandra.install to install our patched bin/cassandra.in.sh -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14904) SSTableloader doesn't understand listening for CQL connections on multiple ports
[ https://issues.apache.org/jira/browse/CASSANDRA-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814423#comment-16814423 ] Jan Karlsson commented on CASSANDRA-14904: -- I scraped together some time to have a look. LGTM for the most part, but I have some thoughts. I have been thinking of the use case where both native_transport_port and the native_transport_port_ssl are set. 1. With this patch, the behavior will be that we will always use the native_transport_port_ssl if both are set unless overridden by command line. I don't necessarily see a problem with that but it might not be very transparent behavior. 2. No matter what we choose to do about this behavior, a test case that tests the case of both being set would be good to add. > SSTableloader doesn't understand listening for CQL connections on multiple > ports > > > Key: CASSANDRA-14904 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14904 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Kurt Greaves >Assignee: Ian Cleasby >Priority: Low > Fix For: 4.0, 3.11.x > > > sstableloader only searches the yaml for native_transport_port, so if > native_transport_port_ssl is set and encryption is enabled sstableloader will > fail to connect as it will use the non-SSL port for the connection. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
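The behavior discussed in point 1 of the comment above can be sketched as a simple resolution order: a command-line port always wins; otherwise, if client encryption is enabled and native_transport_port_ssl is configured, the SSL port is used; otherwise the plain native port. This is a hypothetical function illustrating that order, not sstableloader's actual option-parsing code:

```python
# Hypothetical sketch of the port-resolution order under discussion;
# not the real sstableloader/LoaderOptions implementation.

def resolve_native_port(yaml, encryption_enabled, cli_port=None):
    if cli_port is not None:            # explicit override always wins
        return cli_port
    ssl_port = yaml.get("native_transport_port_ssl")
    if encryption_enabled and ssl_port is not None:
        return ssl_port                 # dedicated SSL port, if configured
    return yaml.get("native_transport_port", 9042)

conf = {"native_transport_port": 9042, "native_transport_port_ssl": 9142}

# Both ports set and encryption on: the SSL port is chosen -- the
# "not very transparent" case called out in the review comment.
assert resolve_native_port(conf, encryption_enabled=True) == 9142

# A command-line port overrides the yaml.
assert resolve_native_port(conf, encryption_enabled=True, cli_port=10000) == 10000

# Without encryption, the plain native port is used.
assert resolve_native_port(conf, encryption_enabled=False) == 9042
```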
[jira] [Updated] (CASSANDRA-10091) Integrated JMX authn & authz
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-10091: - Component/s: (was: Legacy/Observability) > Integrated JMX authn & authz > > > Key: CASSANDRA-10091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 > Project: Cassandra > Issue Type: New Feature > Components: Local/Config, Local/Startup and Shutdown >Reporter: Jan Karlsson >Assignee: Sam Tunnicliffe >Priority: Minor > Labels: doc-impacting, security > Fix For: 3.6 > > > It would be useful to authenticate with JMX through Cassandra's internal > authentication. This would reduce the overhead of keeping passwords in files > on the machine and would consolidate passwords to one location. It would also > allow the possibility to handle JMX permissions in Cassandra. > It could be done by creating our own JMX server and setting custom classes > for the authenticator and authorizer. We could then add some parameters where > the user could specify what authenticator and authorizer to use in case they > want to make their own. > This could also be done by creating a premain method which creates a jmx > server. This would give us the feature without changing the Cassandra code > itself. However I believe this would be a good feature to have in Cassandra. > I am currently working on a solution which creates a JMX server and uses a > custom authenticator and authorizer. It is currently build as a premain, > however it would be great if we could put this in Cassandra instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Component/s: (was: Legacy/Tools) > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13404: - Component/s: (was: Legacy/Streaming and Messaging) > Hostname verification for client-to-node encryption > --- > > Key: CASSANDRA-13404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13404 > Project: Cassandra > Issue Type: New Feature > Components: Messaging/Client >Reporter: Jan Karlsson >Assignee: Per Otterström >Priority: Major > Labels: security > Fix For: 4.x > > Attachments: 13404-trunk-v2.patch, 13404-trunk.txt > > > Similarly to CASSANDRA-9220, Cassandra should support hostname verification > for client-node connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8366: Component/s: (was: Legacy/Streaming and Messaging) > Repair grows data on nodes, causes load to become unbalanced > > > Key: CASSANDRA-8366 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair > Environment: 4 node cluster > 2.1.2 Cassandra > Inserts and reads are done with CQL driver >Reporter: Jan Karlsson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 2.1.5 > > Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, > results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, > results-500_2_inc_repairs.txt, > results-500_full_repair_then_inc_repairs.txt, > results-500_inc_repairs_not_parallel.txt, > run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, > run3_no_compact_before_repair.log, test.sh, testv2.sh > > > There seems to be something weird going on when repairing data. > I have a program that runs 2 hours which inserts 250 random numbers and reads > 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. > I use size-tiered compaction for my cluster. > After those 2 hours I run a repair and the load of all nodes goes up. If I > run incremental repair the load goes up alot more. I saw the load shoot up 8 > times the original size multiple times with incremental repair. (from 2G to > 16G) > with node 9 8 7 and 6 the repro procedure looked like this: > (Note that running full repair first is not a requirement to reproduce.) > {noformat} > After 2 hours of 250 reads + 250 writes per second: > UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 583.84 MB 256 ? 
b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > Repair -pr -par on all nodes sequentially > UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > after rolling restart > UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > compact all nodes sequentially > UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > restart once more > UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > {noformat} > Is there something im missing or is this strange behavior? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13404: - Component/s: Legacy/Streaming and Messaging > Hostname verification for client-to-node encryption > --- > > Key: CASSANDRA-13404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13404 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Streaming and Messaging, Messaging/Client >Reporter: Jan Karlsson >Assignee: Per Otterström >Priority: Major > Labels: security > Fix For: 4.x > > Attachments: 13404-trunk-v2.patch, 13404-trunk.txt > > > Similarly to CASSANDRA-9220, Cassandra should support hostname verification > for client-node connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
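For context on what client-to-node hostname verification amounts to at the JDK level: the TLS layer only checks the server certificate's SAN/CN against the connected hostname when endpoint identification is switched on. This sketch is illustrative only (the class and method names are made up, not part of the attached patches):

```java
import javax.net.ssl.SSLParameters;

public class HostnameVerificationDemo {
    /**
     * Returns SSLParameters with HTTPS-style endpoint identification
     * enabled, so the TLS handshake verifies the server certificate's
     * hostname. The JDK default is null, i.e. no verification.
     */
    static SSLParameters withHostnameVerification() {
        SSLParameters params = new SSLParameters();
        // "HTTPS" selects RFC 2818 hostname matching during the handshake.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(withHostnameVerification().getEndpointIdentificationAlgorithm());
    }
}
```

The parameters would then be applied to an SSLSocket or SSLEngine before the handshake starts.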
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8366: Component/s: Consistency/Repair > Repair grows data on nodes, causes load to become unbalanced > > > Key: CASSANDRA-8366 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Legacy/Streaming and Messaging > Environment: 4 node cluster > 2.1.2 Cassandra > Inserts and reads are done with CQL driver >Reporter: Jan Karlsson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 2.1.5 > > Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, > results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, > results-500_2_inc_repairs.txt, > results-500_full_repair_then_inc_repairs.txt, > results-500_inc_repairs_not_parallel.txt, > run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, > run3_no_compact_before_repair.log, test.sh, testv2.sh > > > There seems to be something weird going on when repairing data. > I have a program that runs for 2 hours, inserting 250 random numbers and reading > 250 times per second. It creates 2 keyspaces with SimpleStrategy and an RF of 3. > I use size-tiered compaction for my cluster. > After those 2 hours I run a repair and the load of all nodes goes up. If I > run incremental repair the load goes up a lot more. I saw the load shoot up to 8 > times the original size multiple times with incremental repair (from 2 GB to > 16 GB). > With nodes 9, 8, 7 and 6 the repro procedure looked like this: > (Note that running full repair first is not a requirement to reproduce.) > {noformat} > After 2 hours of 250 reads + 250 writes per second: > UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > Repair -pr -par on all nodes sequentially > UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 2.41 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.53 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 2.17 GB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > after rolling restart > UN 9 1.47 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.46 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.19 GB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > compact all nodes sequentially > UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 1.46 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 1.98 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 3.71 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > restart once more > UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.05 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > {noformat} > Is there something I'm missing or is this strange behavior? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Component/s: Legacy/Tools > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools, Tool/bulk load >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13404: - Component/s: (was: Legacy/Streaming and Messaging) Messaging/Client > Hostname verification for client-to-node encryption > --- > > Key: CASSANDRA-13404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13404 > Project: Cassandra > Issue Type: New Feature > Components: Messaging/Client >Reporter: Jan Karlsson >Assignee: Per Otterström >Priority: Major > Labels: security > Fix For: 4.x > > Attachments: 13404-trunk-v2.patch, 13404-trunk.txt > > > Similarly to CASSANDRA-9220, Cassandra should support hostname verification > for client-node connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Component/s: (was: Legacy/Tools) Tool/bulk load > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14789) Configuring nodetool from a file
[ https://issues.apache.org/jira/browse/CASSANDRA-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653632#comment-16653632 ] Jan Karlsson commented on CASSANDRA-14789: -- {{I had a look at how we can do this, but I was not impressed with the options we have. Airline does not seem to jive (no pun intended) well with going into the code to fetch defaults from a file. Overriding the different parameters in the abstract class does not seem to be too smooth. Doing it in the script calling Nodetool might be a little cleaner. Sourcing in a file could allow us to manipulate the ARGS variable by adding lines like this:}} {{JMX_PORT=7199}} {{ARGS="$ARGS -h 127.0.0.2"}} {{There are a few concerns I have with this approach. Firstly, it might have some security risks associated with it, but file permissions can help with that. Secondly, we will be practically requiring the user to provide lines of bash script. I would like to avoid that, but I am not sure how to do that without having a map of all the available options and grepping the ARGS parameter with each option.}} {{All in all, it might not be as bad, considering that this is an optional feature for more advanced use cases.}} {{This solution is definitely quite non-intrusive, but it does mean that parameters could be placed twice into the command run. It is a little iffy but it should still work.}} > Configuring nodetool from a file > > > Key: CASSANDRA-14789 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14789 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Minor > Fix For: 4.x > > > Nodetool has a lot of options that can be set. SSL can be configured through > a file[1], but most other parameters must be provided when running the > command. It would be helpful to be able to configure its parameters through a > file much like how cqlsh can be configured[2]. 
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-9090 > [2] > [https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
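The "defaults file prepended to the explicit arguments" idea discussed in the comment above can be sketched in a few lines. This is a hypothetical illustration only — the class name, the properties-file format, and the merge helper are all made up, not part of nodetool; it relies on "last option wins" parsing so that explicit flags override the file, which is exactly the "parameters could be placed twice into the command run" behaviour the comment accepts:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class NodetoolDefaults {
    /**
     * Prepends option defaults read from a properties file to the
     * explicitly supplied arguments. Explicit arguments come last, so
     * with "last one wins" option parsing they override the file.
     */
    static List<String> mergeArgs(Reader config, List<String> explicit) throws IOException {
        Properties props = new Properties();
        props.load(config);
        List<String> merged = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            merged.add("--" + key);                 // e.g. "--port"
            merged.add(props.getProperty(key));     // e.g. "7199"
        }
        merged.addAll(explicit);                    // explicit flags win
        return merged;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical ~/.nodetoolrc contents, inlined for the demo.
        Reader rc = new StringReader("host=127.0.0.2\nport=7199\n");
        System.out.println(mergeArgs(rc, List.of("--host", "127.0.0.1")));
    }
}
```

Doing the same thing in the bash wrapper, as the comment suggests, avoids touching Airline at all; the Java version merely shows where the duplicated parameters would end up.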
[jira] [Created] (CASSANDRA-14789) Configuring nodetool from a file
Jan Karlsson created CASSANDRA-14789: Summary: Configuring nodetool from a file Key: CASSANDRA-14789 URL: https://issues.apache.org/jira/browse/CASSANDRA-14789 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Jan Karlsson Assignee: Jan Karlsson Fix For: 4.x Nodetool has a lot of options that can be set. SSL can be configured through a file[1], but most other parameters must be provided when running the command. It would be helpful to be able to configure its parameters through a file much like how cqlsh can be configured[2]. [1] https://issues.apache.org/jira/browse/CASSANDRA-9090 [2] [https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Reproduced In: 3.0.15, 2.2.9 (was: 2.2.9, 4.0) Fix Version/s: (was: 4.x) > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581085#comment-16581085 ] Jan Karlsson commented on CASSANDRA-13639: -- I tried running it with both 3.0.15 and trunk. I was not able to reproduce this on latest trunk but I could get this behavior on 3.0.15. Seems the changes have fixed this issue. > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Fix For: 4.x > > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565346#comment-16565346 ] Jan Karlsson edited comment on CASSANDRA-13639 at 8/1/18 1:56 PM: -- I apologize for my long absence but I have time to look into this now. {quote}If outboundBindAny would be set to true, then the SSL Socket would be bound to any local address, which is most likely not what we want, so not sure why we would ever want to set outboundBindAny to true anyway. {quote} I actually believe the contrary. Having the SSL Socket bound to the address which is specified by your operating system's routing is precisely what we want. It seems fishy that we always pick the local address and ignore the routing of the operating system. {quote}I agree with [~spo...@gmail.com] here because I think having a cmd line parameter seems to be better. Something like {{--localOutboundAddressSSL-}} or {{-sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. {quote} I can see the point of adding a flag for the simple fact that we would not break backward compatibility, but we should also consider that picking the first interface no matter what routing is set up seems like faulty behavior. If we choose to go this route to keep backwards compatibility, we should describe this behavior in the documentation. The error I received was rather strange when I hit this issue locally on my machine and required me to dig quite deep to find the root cause. was (Author: jan karlsson): I apologize for my long absence but I have time look into this now. {quote}If outboundBindAny would be set to true, then the SSL Socket would be bound to any local address, which is most likely not what we want, so not sure why we would ever want to set outboundBindAny to true anyway. {quote} I actually believe the contrary. 
Having the SSL Socket bound to the local address which is specified by your operating system's routing is precisely what we want. It seens fishy that we always pick the local address and ignore the routing of the operating system. {quote}I agree with [~spo...@gmail.com] here because I think having a cmd line parameter seems to be better. Something like {{--localOutboundAddressSSL-}} or {{-sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. {quote} I can see the point of adding a flag for the simple fact that we would not break backward compatibility, but we should also consider that picking the first interface no matter what routing is set up seems like faulty behavior. If we choose to go this route to keep backwards compatibility, we should describe this behavior in the documentation. The error I received was rather strange when I hit this issue locally on my machine and required me to dig quite deep to find the root cause. > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Fix For: 4.x > > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. 
This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565346#comment-16565346 ] Jan Karlsson edited comment on CASSANDRA-13639 at 8/1/18 1:55 PM: -- I apologize for my long absence but I have time to look into this now. {quote}If outboundBindAny would be set to true, then the SSL Socket would be bound to any local address, which is most likely not what we want, so not sure why we would ever want to set outboundBindAny to true anyway. {quote} I actually believe the contrary. Having the SSL Socket bound to the local address which is specified by your operating system's routing is precisely what we want. It seems fishy that we always pick the local address and ignore the routing of the operating system. {quote}I agree with [~spo...@gmail.com] here because I think having a cmd line parameter seems to be better. Something like {{--localOutboundAddressSSL-}} or {{-sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. {quote} I can see the point of adding a flag for the simple fact that we would not break backward compatibility, but we should also consider that picking the first interface no matter what routing is set up seems like faulty behavior. If we choose to go this route to keep backwards compatibility, we should describe this behavior in the documentation. The error I received was rather strange when I hit this issue locally on my machine and required me to dig quite deep to find the root cause. was (Author: jan karlsson): I apologize for my long absence but I have time look into this now. {quote}If outboundBindAny would be set to true, then the SSL Socket would be bound to any local address, which is most likely not what we want, so not sure why we would ever want to set outboundBindAny to true anyway. {quote} I actually believe the contrary. 
Having the SSL Socket bound to the local address which is specified by your operating system's routing instead of always picking the local address(aka the first interface) is precisely what we want. {quote}I agree with [~spo...@gmail.com] here because I think having a cmd line parameter seems to be better. Something like {{--localOutboundAddressSSL}} or {{--sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. {quote} > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Fix For: 4.x > > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565346#comment-16565346 ] Jan Karlsson commented on CASSANDRA-13639: -- I apologize for my long absence but I have time to look into this now. {quote}If outboundBindAny would be set to true, then the SSL Socket would be bound to any local address, which is most likely not what we want, so not sure why we would ever want to set outboundBindAny to true anyway. {quote} I actually believe the contrary. Having the SSL Socket bound to the local address which is specified by your operating system's routing instead of always picking the local address (aka the first interface) is precisely what we want. {quote}I agree with [~spo...@gmail.com] here because I think having a cmd line parameter seems to be better. Something like {{--localOutboundAddressSSL}} or {{--sslLocalOutboundAddress}}, which defaults to {{FBUtilities.getLocalAddress()}}. {quote} > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Major > Fix For: 4.x > > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. 
This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092807#comment-16092807 ] Jan Karlsson commented on CASSANDRA-13639: -- If SSL is enabled, {SSTableLoader} always uses the hostname no matter how your routing is set up. If you have a second interface that you route all {SSTableLoader} traffic from, it will still pick your first network interface because it corresponds with your hostname. Thereby overriding any routing you might have set up. This screams bug to me. The correct behavior would be for {SSTableLoader} to use the normal routing of the server. I am unclear why we set the from address specifically ourselves instead of leaving it blank. I can see that it might be useful to have it as a command variable as well. However, it is quite strange to set up a 'connect from' address. {code}
if (encryptionOptions != null && encryptionOptions.internode_encryption != EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none)
{
    if (outboundBindAny)
        return SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort);
    else
        return SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort, FBUtilities.getLocalAddress(), 0);
}
{code} I am a little unclear why the code is the way it is. The method is only called with {outboundBindAny} set to false. It seems to me that calling it without the {FBUtilities} call would be the correct way of calling it. > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson > Fix For: 4.x > > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. 
Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. > I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
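The distinction being debated — letting the operating system's routing table pick the source address versus pinning it to FBUtilities.getLocalAddress() before connecting — can be demonstrated with plain JDK sockets. This is an illustrative sketch, not Cassandra code; the class name is made up, and the demo stays on loopback so both variants resolve to the same address:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class OutboundBindDemo {
    /**
     * Connects to a loopback listener twice: first letting the kernel's
     * routing choose the source address (the behaviour argued for in the
     * comments above), then binding the source address explicitly before
     * connecting (what the non-outboundBindAny branch effectively does).
     * Returns the two observed source addresses.
     */
    static String[] sourceAddresses() throws IOException {
        InetAddress loop = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loop)) {
            String[] out = new String[2];
            // Variant 1: unbound client socket; the OS picks the source.
            try (Socket routed = new Socket()) {
                routed.connect(new InetSocketAddress(loop, server.getLocalPort()), 1000);
                out[0] = routed.getLocalAddress().getHostAddress();
            }
            // Variant 2: source address pinned before connect(), which
            // overrides whatever the routing table would have chosen.
            try (Socket pinned = new Socket()) {
                pinned.bind(new InetSocketAddress(loop, 0));
                pinned.connect(new InetSocketAddress(loop, server.getLocalPort()), 1000);
                out[1] = pinned.getLocalAddress().getHostAddress();
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        String[] src = sourceAddresses();
        System.out.println("routed: " + src[0] + ", pinned: " + src[1]);
    }
}
```

On a multi-homed host the two variants can diverge: variant 2 forces traffic out of the pinned interface even when the routing table would send it elsewhere, which is exactly the misrouting described in the issue.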
[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092807#comment-16092807 ] Jan Karlsson edited comment on CASSANDRA-13639 at 7/19/17 8:54 AM: --- If SSL is enabled, {{SSTableLoader}} always uses the hostname no matter how your routing is set up. If you have a second interface that you route all {{SSTableLoader}} traffic from, it will still pick your first network interface because it corresponds with your hostname. Thereby overriding any routing you might have set up. This screams bug to me. The correct behavior would be for {{SSTableLoader}} to use the normal routing of the server. I am unclear why we set the from address specifically ourselves instead of leaving it blank. I can see that it might be useful to have it as a command variable as well. However, it is quite strange to set up a 'connect from' address. {code}
if (encryptionOptions != null && encryptionOptions.internode_encryption != EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none)
{
    if (outboundBindAny)
        return SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort);
    else
        return SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort, FBUtilities.getLocalAddress(), 0);
}
{code} I am a little unclear why the code is the way it is. The method is only called with {{outboundBindAny}} set to false. It seems to me that calling it without the {{FBUtilities}} call would be the correct way of calling it. was (Author: jan karlsson): If SSL is enabled, {SSTableLoader} always uses the hostname no matter how your routing is set up. If you have a second interface that you route all {SSTableLoader} traffic from, it will still pick your first network interface because it corresponds with your hostname. Thereby overriding any routing you might have set up. This screams bug to me. The correct behavior would be for {SSTableLoader} to use the normal routing of the server. 
I am unclear why we set the from address specifically ourself instead of leaving it blank. I can see that it might be useful to have it as a command variable as well. However it is quite strange to set up a 'connect from' address. {code} if (encryptionOptions != null && encryptionOptions.internode_encryption != EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none) { if (outboundBindAny) return SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort); else return SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort, FBUtilities.getLocalAddress(), 0); }{code} I am a little unclear of why the code is the way it is. The method is only called with {outboundBindAny} set to false. It seems to me that calling it without the {FBUtilities} call would be the correct way of calling it. > SSTableLoader always uses hostname to stream files from > --- > > Key: CASSANDRA-13639 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson > Fix For: 4.x > > Attachments: 13639-trunk > > > I stumbled upon an issue where SSTableLoader was ignoring our routing by > using the wrong interface to send the SSTables to the other nodes. Looking at > the code, it seems that we are using FBUtilities.getLocalAddress() to fetch > out the hostname, even if the yaml file specifies a different host. I am not > sure why we call this function instead of using the routing by leaving it > blank, perhaps someone could enlighten me. > This behaviour comes from the fact that we use a default created > DatabaseDescriptor which does not set the values for listenAddress and > listenInterface. This causes the aforementioned function to retrieve the > hostname at all times, even if it is not the interface used in the yaml file. 
> I propose we break out the function that handles listenAddress and > listenInterface and call it so that listenAddress or listenInterface is > getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
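To make the routing point concrete, here is a small standalone sketch (my own illustration, not Cassandra code): leaving a socket unbound lets the kernel's routing table pick the outgoing interface at {{connect()}} time, while an explicit bind to a fixed local address, which is effectively what passing {{FBUtilities.getLocalAddress()}} does, pins the source interface up front and bypasses routing.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

class BindSketch {
    // No bind() call: the local address and interface are chosen by the
    // kernel's routing table at connect() time.
    static Socket routedSocket() throws Exception {
        return new Socket();
    }

    // Binding to a fixed local address (what the extra getSocket argument
    // effectively does) pins the source interface regardless of routing.
    static Socket pinnedSocket(InetAddress local) throws Exception {
        Socket s = new Socket();
        s.bind(new InetSocketAddress(local, 0)); // ephemeral port on `local`
        return s;
    }
}
```

The second form only makes sense when the caller really wants a specific 'connect from' address; otherwise the first form honors whatever routing the operator has set up.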
[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089522#comment-16089522 ] Jan Karlsson commented on CASSANDRA-13639: -- The problem stems from the fact that SSTableLoader has its own way of reading the yaml file but still uses a default-created DatabaseDescriptor to connect by using {{FBUtilities.getLocalAddress()}}. Perhaps another solution would be to add this as a parameter to SSTableLoader. In BulkLoadConnectionFactory, after a rather strange if clause that is always false, {{SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort, FBUtilities.getLocalAddress(), 0);}} fetches the IP address from the DatabaseDescriptor, which will return null because listenAddress is not set by default on the DatabaseDescriptor object. My patch applies the listen address from the yaml file to the DatabaseDescriptor, which in turn fixes the issue.
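A minimal sketch of the fallback described above (my own illustration; the real logic lives in FBUtilities and DatabaseDescriptor): when no listen address has been populated, "the local address" degrades to whatever interface the hostname resolves to, which is exactly how the configured routing gets bypassed.

```java
import java.net.InetAddress;

class LocalAddressSketch {
    // If a listen address was configured (e.g. applied from the yaml file),
    // use it; otherwise fall back to the hostname's interface, roughly the
    // way FBUtilities.getLocalAddress() behaves when the DatabaseDescriptor
    // was default-created and listenAddress is still null.
    static InetAddress effectiveLocalAddress(InetAddress configuredListenAddress) throws Exception {
        return configuredListenAddress != null
             ? configuredListenAddress
             : InetAddress.getLocalHost(); // hostname lookup, ignores routing
    }
}
```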
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Reproduced In: 2.2.9, 4.0 (was: 2.2.9) Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Attachment: 13639-trunk Patch on trunk which resolves this issue. Verified it manually with lsof.
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Fix Version/s: 4.x
[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from
[ https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13639: - Summary: SSTableLoader always uses hostname to stream files from (was: SSTableLoader always uses hostname to stream files)
[jira] [Created] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files
Jan Karlsson created CASSANDRA-13639: Summary: SSTableLoader always uses hostname to stream files Key: CASSANDRA-13639 URL: https://issues.apache.org/jira/browse/CASSANDRA-13639 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jan Karlsson Assignee: Jan Karlsson I stumbled upon an issue where SSTableLoader was ignoring our routing by using the wrong interface to send the SSTables to the other nodes. Looking at the code, it seems that we are using FBUtilities.getLocalAddress() to fetch out the hostname, even if the yaml file specifies a different host. I am not sure why we call this function instead of using the routing by leaving it blank, perhaps someone could enlighten me. This behaviour comes from the fact that we use a default created DatabaseDescriptor which does not set the values for listenAddress and listenInterface. This causes the aforementioned function to retrieve the hostname at all times, even if it is not the interface used in the yaml file. I propose we break out the function that handles listenAddress and listenInterface and call it so that listenAddress or listenInterface is getting populated in the DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960403#comment-15960403 ] Jan Karlsson commented on CASSANDRA-13354: -- Yes, the small change LGTM. > LCS estimated compaction tasks does not take number of files into account > - > > Key: CASSANDRA-13354 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13354 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 2.2.9 >Reporter: Jan Karlsson >Assignee: Jan Karlsson > Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png > > > In LCS, the way we estimate the number of compaction tasks remaining for L0 is by > taking the size of an SSTable and multiplying it by four. This would give 4*160mb > with default settings. This calculation is used to determine whether repaired > or unrepaired data is being compacted. > Now this works well until you take repair into account. Repair streams over > many many sstables which could be smaller than the configured SSTable size > depending on your use case. In our case we are talking about many thousands > of tiny SSTables. As the number of files increases one can run into any number of > problems, including GC issues, too many open files or a plain increase in read > latency. > With the current algorithm we will choose repaired or unrepaired depending on > whichever side has more data in it, even if the repaired files outnumber the > unrepaired files by a large margin. > Similarly, our algorithm that selects compaction candidates takes up to 32 > SSTables at a time in L0; however, our estimated task calculation does not > take this number into account. These two mechanisms should be aligned with > each other. > I propose that we take the number of files in L0 into account when estimating > remaining tasks. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13404) Hostname verification for client-to-node encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958876#comment-15958876 ] Jan Karlsson commented on CASSANDRA-13404: -- It is good that you made the distinction that MitM is not something that this ticket aims to solve. Instead, this ticket allows you to bind certificates to certain hosts to make it less vulnerable. Applications which have to worry about rogue clients can use this on top of application-side authentication as an extra layer of security and have broader control over the clients that connect to their server. {quote} I think it was mentioned somewhere that reusing SSLContext instances would be preferable in the future due to performance reasons. We'd have to change the code to either return a shared or a newly created instance if we would add this feature. {quote} Could you elaborate on this? Are we not using the same SSLContext and retrieving the engine from it? > Hostname verification for client-to-node encryption > --- > > Key: CASSANDRA-13404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13404 > Project: Cassandra > Issue Type: New Feature >Reporter: Jan Karlsson >Assignee: Jan Karlsson > Fix For: 4.x > > Attachments: 13404-trunk.txt > > > Similarly to CASSANDRA-9220, Cassandra should support hostname verification > for client-to-node connections. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
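A hedged sketch of the shared-instance idea being quoted above (an assumption about the proposal, not existing Cassandra code): build one {{SSLContext}} up front and derive per-connection {{SSLEngine}}s from it, rather than re-initializing a context for every connection.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

class SharedContextSketch {
    private static SSLContext shared;

    // Build the context once; SSLContext is safe to share for engine
    // creation, so the expensive initialization happens a single time.
    static synchronized SSLContext get() throws Exception {
        if (shared == null) {
            shared = SSLContext.getInstance("TLS");
            shared.init(null, null, null); // default key/trust managers (sketch only)
        }
        return shared;
    }

    // Per-connection engine derived from the shared context.
    static SSLEngine newEngine(String peerHost, int peerPort) throws Exception {
        return get().createSSLEngine(peerHost, peerPort);
    }
}
```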
[jira] [Commented] (CASSANDRA-13404) Hostname verification for client-to-node encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956604#comment-15956604 ] Jan Karlsson commented on CASSANDRA-13404: -- {quote} To back up and add a bit more context (for myself, if anything), where do you want to add the additional hostname verification? Can you explain the specific behaviour you're looking to add? {quote} The behaviour I am trying to add is that the server validates that the client certificate is issued to the IP address/host that the client connects from. You are correct that this would require {{require_client_auth}} to be set, as this ensures that the server validates the client to begin with. Disabling {{require_client_auth}} while enabling hostname verification will actually not do anything; we won't validate anything. Do you think we should add a warning during startup that you cannot have hostname validation without requiring client validation? {quote} Further, this would require the database server to know all of the possible peers that would want to connect to it, before the process starts. {quote} Not necessarily. I take the incoming connection, extract the IP, then the identification algorithm checks whether the SAN in the certificate holds this IP address. {quote} Also, I've spoken with the netty developers, and they said netty currently does not support (in either netty 4.0 or 4.1) the ability to perform hostname verification on the server side (either openssl or jdk ssl). Thus, I'm not sure how you verified your patch behaves correctly. {quote} I used the java driver and added [Netty Options | http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/NettyOptions.html#afterBootstrapInitialized-io.netty.bootstrap.Bootstrap-] to change the local address in afterBootstrapInitialized. This allows me to change which interface I use to connect to C*. 
Then I used a certificate I had forged for a different interface and tried to connect to a node. Worked like a charm. I then applied my patch and got an exception on both the server and the client side. Lastly, I switched the IP address I connected from to the interface that was specified in the certificate, and the exceptions disappeared.
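The check described above ("extract the IP, then see whether the SAN in the certificate holds this IP address") can be sketched roughly like this; the helper name is my own, but the list-of-pairs shape mirrors what {{X509Certificate.getSubjectAlternativeNames()}} returns, and this is not the patch's actual code.

```java
import java.util.Collection;
import java.util.List;

class SanCheckSketch {
    // SAN GeneralName type 7 = iPAddress (per RFC 5280)
    static final int SAN_IP = 7;

    // `sans` has the shape returned by X509Certificate.getSubjectAlternativeNames():
    // a collection of [Integer type, String value] pairs. Returns true when the
    // certificate carries an iPAddress entry matching the peer's address.
    static boolean sanMatchesPeer(Collection<? extends List<?>> sans, String peerIp) {
        if (sans == null)
            return false; // no SAN extension: nothing to match against
        for (List<?> entry : sans) {
            if (((Integer) entry.get(0)) == SAN_IP && peerIp.equals(entry.get(1)))
                return true;
        }
        return false;
    }
}
```

The server side would feed this with the peer address taken from the accepted connection and the certificate from the completed handshake.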
[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13404: - Fix Version/s: 4.x Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13404: - Attachment: 13404-trunk.txt Should apply cleanly to trunk
[jira] [Created] (CASSANDRA-13404) Hostname verification for client-to-node encryption
Jan Karlsson created CASSANDRA-13404: Summary: Hostname verification for client-to-node encryption Key: CASSANDRA-13404 URL: https://issues.apache.org/jira/browse/CASSANDRA-13404 Project: Cassandra Issue Type: New Feature Reporter: Jan Karlsson Assignee: Jan Karlsson Similarily to CASSANDRA-9220, Cassandra should support hostname verification for client-node connections. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939107#comment-15939107 ] Jan Karlsson commented on CASSANDRA-13354: -- I did some tests simulating traffic on a 4 node cluster. 2 of the nodes were running with my patch while the other two ran without it. Steps to reproduce:
# Turn traffic on
# Turn one of the nodes off
# Wait 7 minutes
# Truncate hints on all other nodes
# Turn the node on
# Run repair on the node
As you can see, the unpatched version kept increasing as non-repaired data from ongoing traffic was prioritized. If I had more discrepancies in my data set, this would just increase to the configured FD limit or until you die from heap pressure. Repair completed at 8:11pm, but those small repaired files are not compacted, as it picks unrepaired new sstables over the small repaired sstables. However, it did show a downwards trend, as compaction was slightly faster than insertion, and would probably eventually end with the repaired files compacted. During the unpatched test, it only showed 2 pending compactions with ~22k file descriptors open/~10k sstables. At 8:33pm I disabled the traffic completely to hurry this along. SSTables in each level: [10347/4, 5, 0, 0, 0, 0, 0, 0, 0]
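The fix this ticket proposes can be sketched as a small arithmetic change (my own sketch, using the defaults mentioned in the description: 160MB sstables and compaction grabbing up to 32 L0 sstables at a time): estimate pending tasks from both total bytes and file count, and take the larger of the two.

```java
class L0EstimateSketch {
    static final int MAX_COMPACTING_L0 = 32; // candidates picked per L0 compaction

    // The current estimate only divides bytes by 4 * sstable size; also
    // dividing the file count by the per-compaction candidate limit keeps
    // thousands of tiny repaired sstables from being reported as "almost
    // no pending compactions".
    static long estimatedTasks(long l0Bytes, long l0Files, long maxSstableBytes) {
        long bySize  = (l0Bytes + 4 * maxSstableBytes - 1) / (4 * maxSstableBytes); // ceil
        long byCount = (l0Files + MAX_COMPACTING_L0 - 1) / MAX_COMPACTING_L0;       // ceil
        return Math.max(bySize, byCount);
    }
}
```

With the ~10k tiny sstables from the test above, the byte-based estimate stays near zero while the count-based estimate reports hundreds of pending tasks, which is what keeps the repaired side from being starved.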
[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13354: - Attachment: patchedTest.png
[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13354: - Attachment: unpatchedTest.png
[jira] [Comment Edited] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932799#comment-15932799 ] Jan Karlsson edited comment on CASSANDRA-13354 at 3/20/17 3:45 PM: --- Added a patch on 4.0 to fix this. Applies cleanly to other versions as well (tested 2.2.9). I have tested this in a cluster and will upload some graphs as well. Comments and suggestions welcome! was (Author: jan karlsson): Added patch on 4.0 to fix this. Should be pretty minimal work to get this to apply to other versions as well. I have tested this in a cluster and will upload some graphs as well. Comments and suggestions welcome!
[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13354: - Attachment: (was: CASSANDRA-13354)
[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13354: - Attachment: 13354-trunk.txt -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13354: - Attachment: CASSANDRA-13354 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-13354: - Status: Patch Available (was: Open) Added patch on 4.0 to fix this. Should be pretty minimal work to get this to apply to other versions as well. I have tested this in a cluster and will upload some graphs as well. Comments and suggestions welcome!
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account
Jan Karlsson created CASSANDRA-13354: Summary: LCS estimated compaction tasks does not take number of files into account Key: CASSANDRA-13354 URL: https://issues.apache.org/jira/browse/CASSANDRA-13354 Project: Cassandra Issue Type: Bug Components: Compaction Environment: Cassandra 2.2.9 Reporter: Jan Karlsson Assignee: Jan Karlsson In LCS, the way we estimate the number of compaction tasks remaining for L0 is by taking the size of an SSTable and multiplying it by four. This would give 4*160MB with default settings. This calculation is used to determine whether repaired or unrepaired data is being compacted. Now this works well until you take repair into account. Repair streams over many sstables which could be smaller than the configured SSTable size depending on your use case. In our case we are talking about many thousands of tiny SSTables. As the number of files increases one can run into any number of problems, including GC issues, too many open files or a plain increase in read latency. With the current algorithm we will choose repaired or unrepaired depending on whichever side has more data in it, even if the repaired files outnumber the unrepaired files by a large margin. Similarly, our algorithm that selects compaction candidates takes up to 32 SSTables at a time in L0; however, our estimated task calculation does not take this number into account. These two mechanisms should be aligned with each other. I propose that we take the number of files in L0 into account when estimating remaining tasks. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-10091) Integrated JMX authn & authz
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204083#comment-15204083 ] Jan Karlsson commented on CASSANDRA-10091: -- Great that you like the patch! I am really excited to get this in! We have already created some dtests for this which can be found [here|https://github.com/beobal/cassandra-dtest/commits/10091]. I could take a look at the comments next week unless you want to take this [~beobal]? > Integrated JMX authn & authz > > > Key: CASSANDRA-10091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 > Project: Cassandra > Issue Type: New Feature >Reporter: Jan Karlsson >Assignee: Sam Tunnicliffe >Priority: Minor > Fix For: 3.x > > > It would be useful to authenticate with JMX through Cassandra's internal > authentication. This would reduce the overhead of keeping passwords in files > on the machine and would consolidate passwords to one location. It would also > allow the possibility to handle JMX permissions in Cassandra. > It could be done by creating our own JMX server and setting custom classes > for the authenticator and authorizer. We could then add some parameters where > the user could specify what authenticator and authorizer to use in case they > want to make their own. > This could also be done by creating a premain method which creates a JMX > server. This would give us the feature without changing the Cassandra code > itself. However I believe this would be a good feature to have in Cassandra. > I am currently working on a solution which creates a JMX server and uses a > custom authenticator and authorizer. It is currently built as a premain, > however it would be great if we could put this in Cassandra instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
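The custom-authenticator approach described above hangs off the standard javax.management.remote.JMXAuthenticator hook. A minimal sketch, with a plain Map standing in for Cassandra's internal credential store (the class name and password source are hypothetical, not the patch's actual code):

```java
import javax.management.remote.JMXAuthenticator;
import javax.security.auth.Subject;
import java.util.Map;

// Hypothetical sketch: a JMXAuthenticator that delegates to an internal
// credential store instead of the flat jmxremote.password file. The real
// patch would wire this to Cassandra's IAuthenticator; a Map stands in here.
public class InternalJmxAuthenticator implements JMXAuthenticator {
    private final Map<String, String> credentials; // username -> password

    public InternalJmxAuthenticator(Map<String, String> credentials) {
        this.credentials = credentials;
    }

    @Override
    public Subject authenticate(Object creds) {
        String[] pair = (String[]) creds; // JMX passes {username, password}
        if (pair != null && pair.length == 2
                && pair[1].equals(credentials.get(pair[0])))
            return new Subject(); // authenticated; principal setup elided
        throw new SecurityException("Invalid JMX credentials");
    }
}
```

An instance of this class would be passed to the JMX connector server via the "jmx.remote.authenticator" environment entry.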
[jira] [Updated] (CASSANDRA-11210) Unresolved hostname in replace address
[ https://issues.apache.org/jira/browse/CASSANDRA-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-11210: - Attachment: 0001-Unresolved-hostname-leads-to-replace-being-ignored.patch > Unresolved hostname in replace address > -- > > Key: CASSANDRA-11210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11210 > Project: Cassandra > Issue Type: Bug >Reporter: sankalp kohli >Assignee: Jan Karlsson >Priority: Minor > Labels: lhf > Fix For: 2.2.6 > > Attachments: > 0001-Unresolved-hostname-leads-to-replace-being-ignored.patch > > > If you provide a hostname that cannot be resolved by DNS, it leads to > replace args being ignored. If you provide an IP which is not in the cluster, > it does the right thing and complains. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
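A sketch of the fix's intent: resolve the replace address eagerly and fail startup loudly on an unresolvable hostname, instead of silently ignoring the replace args. Class and method names are illustrative, not the attached patch's actual code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: resolve -Dcassandra.replace_address up front and
// abort on an unresolvable name, rather than swallowing the error.
public class ReplaceAddressCheck {
    static InetAddress resolveReplaceAddress(String value) {
        if (value == null)
            return null; // property not set: normal startup
        try {
            return InetAddress.getByName(value);
        } catch (UnknownHostException e) {
            // previously this path led to the replace args being ignored
            throw new RuntimeException("Unable to resolve replace_address: " + value, e);
        }
    }
}
```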
[jira] [Updated] (CASSANDRA-11210) Unresolved hostname in replace address
[ https://issues.apache.org/jira/browse/CASSANDRA-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-11210: - Fix Version/s: 2.2.6 Status: Patch Available (was: Open) This should apply cleanly to 3.0/trunk except for the Changelog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-11210) Unresolved hostname in replace address
[ https://issues.apache.org/jira/browse/CASSANDRA-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson reassigned CASSANDRA-11210: Assignee: Jan Karlsson -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10091) Align JMX authentication with internal authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169402#comment-15169402 ] Jan Karlsson edited comment on CASSANDRA-10091 at 3/1/16 2:14 PM: -- [~beobal] We need to change the StartupChecks because we are still throwing an error in [checkJMXPorts| https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L142] when we do not set cassandra.jmx.local.port. We can also use {code}#JVM_OPTS="$JVM_OPTS -Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config"{code} instead of requiring the user to add their own path. Otherwise LGTM. Dtest can be found [here|https://github.com/ejankan/cassandra-dtest/tree/10091] This Dtest needs the aforementioned changes to StartupChecks and $CASSANDRA_HOME to work. was (Author: jan karlsson): [~beobal] We need to change the StartupChecks because we are still throwing an error in [checkJMXPorts| https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L142] when we do not set cassandra.jmx.local.port. Otherwise LGTM. I am currently writing a Dtest for the authn part of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10776) Prepare of statements after table creation fail with unconfigured column family
[ https://issues.apache.org/jira/browse/CASSANDRA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173470#comment-15173470 ] Jan Karlsson edited comment on CASSANDRA-10776 at 3/1/16 9:18 AM: -- This can actually be solved client side by maintaining a lock table. Before creating a table you check, using LWT, whether the lock exists; if it does not, you acquire the lock and create the table. was (Author: jan karlsson): This can actually be solved by having a lock table, which you check with before creating a table. Use LWT to check whether the lock exists and if it does not, gain the lock and create the table. > Prepare of statements after table creation fail with unconfigured column > family > --- > > Key: CASSANDRA-10776 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10776 > Project: Cassandra > Issue Type: Bug >Reporter: Adam Dougal > > Cassandra 2.1.8 > We have multiple app instances trying to create the same table using IF NOT > EXISTS. > We check for schema agreement via the Java Driver before and after every > statement.
> After creating the table we then prepare statements and we sometimes get: > {code} > com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured > columnfamily locks > at > com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50) > ~[cassandra-driver-core-2.1.8.jar:na] > at > com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) > ~[cassandra-driver-core-2.1.8.jar:na] > at > com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:79) > ~[cassandra-driver-core-2.1.8.jar:na] > at > uk.sky.cirrus.locking.CassandraLockingMechanism.init(CassandraLockingMechanism.java:69) > ~[main/:na] > at uk.sky.cirrus.locking.Lock.acquire(Lock.java:35) [main/:na] > at uk.sky.cirrus.CqlMigratorImpl.migrate(CqlMigratorImpl.java:83) > [main/:na] > at > uk.sky.cirrus.locking.LockVerificationTest.lambda$shouldManageContentionsForSchemaMigrate$0(LockVerificationTest.java:90) > [test/:na] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60] > {code} > Looking at the server logs we get: > {code} > java.lang.RuntimeException: > org.apache.cassandra.exceptions.ConfigurationException: Column family ID > mismatch (found 90bbb372-9446-11e5-b1ca-8119a6964819; expected > 90b87f20-9446-11e5-b1ca-8119a6964819) > at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1145) > ~[main/:na] > at > org.apache.cassandra.db.DefsTables.updateColumnFamily(DefsTables.java:422) > ~[main/:na] > at > org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:295) > ~[main/:na] > at > org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194) > ~[main/:na] > at 
org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166) > ~[main/:na] > at > org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) > ~[main/:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_60] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {code} > We found this issue which is marked as resolved: > https://issues.apache.org/jira/browse/CASSANDRA-8387 > Does the IF NOT EXISTS just check the local node? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
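The lock-table idea from the comment above can be sketched as follows. On the CQL side the acquire step would be an LWT insert, e.g. INSERT INTO locks (name, owner) VALUES (?, ?) IF NOT EXISTS against a hypothetical locks table; the Java sketch below mirrors those compare-and-set semantics with an in-memory put-if-absent:

```java
// Hypothetical client-side schema lock, mirroring the CQL pattern:
//   CREATE TABLE locks (name text PRIMARY KEY, owner text);
//   INSERT INTO locks (name, owner) VALUES (?, ?) IF NOT EXISTS;  -- LWT acquire
// An atomic put-if-absent has the same "only one winner" semantics.
import java.util.concurrent.ConcurrentHashMap;

public class SchemaLock {
    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

    // true iff this owner won the LWT-style race for the named lock
    public boolean acquire(String name, String owner) {
        return locks.putIfAbsent(name, owner) == null;
    }

    // release only if we still own it, like DELETE ... IF owner = ?
    public boolean release(String name, String owner) {
        return locks.remove(name, owner);
    }
}
```

Only the instance that acquires the lock runs the CREATE TABLE; the others wait and skip the DDL, avoiding the concurrent-creation race.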
[jira] [Commented] (CASSANDRA-10776) Prepare of statements after table creation fail with unconfigured column family
[ https://issues.apache.org/jira/browse/CASSANDRA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173470#comment-15173470 ] Jan Karlsson commented on CASSANDRA-10776: -- This can actually be solved by having a lock table, which you check with before creating a table. Use LWT to check whether the lock exists and if it does not, gain the lock and create the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169402#comment-15169402 ] Jan Karlsson commented on CASSANDRA-10091: -- [~beobal] We need to change the StartupChecks because we are still throwing an error in [checkJMXPorts| https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L142] when we do not set cassandra.jmx.local.port. Otherwise LGTM. I am currently writing a Dtest for the authn part of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException
[ https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138538#comment-15138538 ] Jan Karlsson commented on CASSANDRA-8643: - This problem was on 2.1.12 and we were running full repair with -pr. > merkle tree creation fails with NoSuchElementException > -- > > Key: CASSANDRA-8643 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8643 > Project: Cassandra > Issue Type: Bug > Environment: We are running on a three-node cluster with replication factor three > (C* 2.1.1). It uses a default C* installation and STCS. >Reporter: Jan Karlsson > Fix For: 2.1.3 > > > We have a problem that we encountered during testing over the weekend. > During the tests we noticed that repairs started to fail. This error has > occurred on multiple non-coordinator nodes during repair. It also ran at least > once without producing this error. > We run repair -pr on all nodes on different days. CPU values were around 40% > and disk was 50% full. > From what I understand, the coordinator asked for merkle trees from the other > two nodes. However, one of the nodes fails to create its merkle tree. > Unfortunately we do not have a way to reproduce this problem.
> The coordinator receives: > {noformat} > 2015-01-09T17:55:57.091+0100 INFO [RepairJobTask:4] RepairJob.java:145 > [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for > censored (to [/xx.90, /xx.98, /xx.82]) > 2015-01-09T17:55:58.516+0100 INFO [AntiEntropyStage:1] > RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] > Received merkle tree for censored from /xx.90 > 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] > RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session > completed with the following error > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] > 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] > CassandraDaemon.java:153 Exception in thread > Thread[AntiEntropySessions:76,5,RMI Runtime] > java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: > [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at 
com.google.common.base.Throwables.propagate(Throwables.java:160) > ~[guava-16.0.jar:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] >at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > ~[apache-cassandra-2.1.1.jar:2.1.1] > ... 3 common frames omitte
[jira] [Commented] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException
[ https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136995#comment-15136995 ] Jan Karlsson commented on CASSANDRA-8643: - We hit it again. This time we had more time to debug the situation and we might have found the problem. It started occurring when we switched to LeveledCompactionStrategy. However, it does not occur consistently. We usually get it once every 2-3 runs. We enabled assertions and got "received out of order wrt". The problem we found is that the ranges of the tables are intersecting, but the getScanners method in LCS expects them to be non-intersecting (as all sstables in the same level should not be intersecting). It could be that during the snapshot, a compaction occurs which writes more sstables into the level. Then, when it is supplied to the repair job, it fails due to the ranges intersecting in the new and old sstables. When we tried repairing with -par, we did not hit it. It also worked with 2.2.4 (which runs -par by default).
[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134299#comment-15134299 ] Jan Karlsson commented on CASSANDRA-10091: -- [~beobal] I apologize for my long absence. How is the refactoring going? Next week I will try to find some time to write up some tests. > Align JMX authentication with internal authentication > - > > Key: CASSANDRA-10091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 > Project: Cassandra > Issue Type: New Feature >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Minor > Fix For: 3.x > > > It would be useful to authenticate with JMX through Cassandra's internal > authentication. This would reduce the overhead of keeping passwords in files > on the machine and would consolidate passwords to one location. It would also > allow the possibility to handle JMX permissions in Cassandra. > It could be done by creating our own JMX server and setting custom classes > for the authenticator and authorizer. We could then add some parameters where > the user could specify what authenticator and authorizer to use in case they > want to make their own. > This could also be done by creating a premain method which creates a jmx > server. This would give us the feature without changing the Cassandra code > itself. However I believe this would be a good feature to have in Cassandra. > I am currently working on a solution which creates a JMX server and uses a > custom authenticator and authorizer. It is currently build as a premain, > however it would be great if we could put this in Cassandra instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
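The "custom classes for the authenticator" idea from the issue description above can be sketched with the JDK's javax.management.remote.JMXAuthenticator hook. The password map here is a placeholder; a real implementation would delegate to Cassandra's IAuthenticator rather than hold credentials itself:

```java
import java.util.Collections;
import java.util.Map;
import javax.management.remote.JMXAuthenticator;
import javax.management.remote.JMXPrincipal;
import javax.security.auth.Subject;

class PasswordJmxAuthenticator implements JMXAuthenticator {
    private final Map<String, String> users; // placeholder credential store

    PasswordJmxAuthenticator(Map<String, String> users) { this.users = users; }

    @Override
    public Subject authenticate(Object credentials) {
        // JMX clients conventionally pass String[]{username, password}.
        if (!(credentials instanceof String[]))
            throw new SecurityException("Expected String[]{username, password}");
        String[] pair = (String[]) credentials;
        if (pair.length != 2 || !pair[1].equals(users.get(pair[0])))
            throw new SecurityException("Authentication failed");
        // Hand the JMX layer a read-only Subject naming the authenticated user.
        return new Subject(true,
                           Collections.singleton(new JMXPrincipal(pair[0])),
                           Collections.emptySet(), Collections.emptySet());
    }
}
```

Such an authenticator would be passed to a programmatically created connector server via the standard "jmx.remote.authenticator" environment key.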
[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037490#comment-15037490 ] Jan Karlsson commented on CASSANDRA-10091: -- I took a look at your proposal and it looks good. I like this approach on authz. You are definitely on the right track. {quote} What does CassandraLoginModule give us? I appreciate that it's the standard-ish java way to do things, but it seems to me that we could just perform the call to legacyAuthenticate directly from JMXPasswordAuthenticator::authenticate. The authenticator impl is already pretty specific, so using the more generic APIs just seems to add bloat (but I could be missing something useful here). {quote} The advantage of doing it this way is that you could use the CassandraLoginModule without the JMXPasswordAuthenticator by setting the LoginModule as a JVM parameter. It might not be that useful for our use case, but this would give us authentication without having to start up our JMX server programmatically. One could use the module with Cassandra as is. {quote} The same thing goes for CassandraPrincipal, could we just create a javax.management.remote.JMXPrincipal in the name of the AuthenticatedUser obtained from the IAuthenticator? {quote} +1. I had originally included it in case we wanted to pass some Cassandra-related information down to authz, but it does not currently seem necessary. {quote} Will MX4J work with JMXPasswordAuthenticator? {quote} I have not tried this myself, but according to [this|http://mx4j.sourceforge.net/docs/ch03s10.html] it seems to work in the same fashion. 
> Align JMX authentication with internal authentication > - > > Key: CASSANDRA-10091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 > Project: Cassandra > Issue Type: New Feature >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Minor > Fix For: 3.x > > > It would be useful to authenticate with JMX through Cassandra's internal > authentication. This would reduce the overhead of keeping passwords in files > on the machine and would consolidate passwords to one location. It would also > allow the possibility to handle JMX permissions in Cassandra. > It could be done by creating our own JMX server and setting custom classes > for the authenticator and authorizer. We could then add some parameters where > the user could specify what authenticator and authorizer to use in case they > want to make their own. > This could also be done by creating a premain method which creates a jmx > server. This would give us the feature without changing the Cassandra code > itself. However I believe this would be a good feature to have in Cassandra. > I am currently working on a solution which creates a JMX server and uses a > custom authenticator and authorizer. It is currently build as a premain, > however it would be great if we could put this in Cassandra instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
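The standalone use of the login module discussed above (setting the LoginModule as a JVM parameter, without starting a JMX server programmatically) would look roughly like the following JAAS configuration. The file name and the module class name are illustrative assumptions, not confirmed names from the patch:

```
// cassandra-jaas.config -- illustrative JAAS login configuration.
// The entry name and module class are assumptions for this sketch.
CassandraLogin {
    org.apache.cassandra.auth.CassandraLoginModule REQUIRED;
};
```

The stock JMX agent would then be pointed at it with `-Djava.security.auth.login.config=cassandra-jaas.config` and `-Dcom.sun.management.jmxremote.login.config=CassandraLogin`.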
[jira] [Commented] (CASSANDRA-10551) Investigate JMX auth using JMXMP & SASL
[ https://issues.apache.org/jira/browse/CASSANDRA-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991610#comment-14991610 ] Jan Karlsson commented on CASSANDRA-10551: -- Changing to JMXMP seems to work from an implementation standpoint. However, this will mean that current tools which are hardcoded to connect through RMI will have to be changed to function with JMXMP. I'm referring mostly to nodetool; i.e. earlier versions of nodetool will not be able to connect to the server. What is more concerning is that some 3rd party tools like jconsole seem to lack the functionality to connect with SASL profiles through JMXMP. I tried connecting with a [plain profile/mechanism|https://tools.ietf.org/html/rfc4616], but have not found a way to set a profile for jconsole. > Investigate JMX auth using JMXMP & SASL > --- > > Key: CASSANDRA-10551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10551 > Project: Cassandra > Issue Type: Improvement >Reporter: Sam Tunnicliffe >Assignee: Jan Karlsson > Fix For: 3.x > > > (broken out from CASSANDRA-10091) > We should look into whether using > [JMXMP|https://meteatamel.wordpress.com/2012/02/13/jmx-rmi-vs-jmxmp/] would > enable JMX authentication using SASL. If so, could we then define a custom > SaslServer which wraps a SaslNegotiator instance provided by the configured > IAuthenticator. > An initial look at the > [JMXMP|http://docs.oracle.com/cd/E19698-01/816-7609/6mdjrf873/] docs, > particularly section *11.4.2 SASL Provider*, suggests this might be feasible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
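For reference, the client side of the JMXMP/SASL combination discussed above differs from RMI mainly in the service URL and the connection environment. A minimal sketch follows; the actual handshake needs the optional JMXMP jar (jmxremote_optional), so this stops short of calling JMXConnectorFactory.connect:

```java
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXServiceURL;

class JmxmpClientSketch {
    // JMXMP replaces RMI's service:jmx:rmi:///jndi/... style URLs.
    static JMXServiceURL jmxmpUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:jmxmp://" + host + ":" + port);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Environment a client would pass to JMXConnectorFactory.connect();
    // "jmx.remote.profiles" selects the negotiated profiles, e.g. SASL/PLAIN.
    static Map<String, Object> saslPlainEnv() {
        Map<String, Object> env = new HashMap<>();
        env.put("jmx.remote.profiles", "SASL/PLAIN");
        return env;
    }
}
```

This illustrates the compatibility concern in the comment: a client hardcoded to build RMI-style URLs (as older nodetool versions are) cannot reach a server that only listens on a jmxmp URL.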
[jira] [Assigned] (CASSANDRA-10551) Investigate JMX auth using JMXMP & SASL
[ https://issues.apache.org/jira/browse/CASSANDRA-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson reassigned CASSANDRA-10551: Assignee: Jan Karlsson > Investigate JMX auth using JMXMP & SASL > --- > > Key: CASSANDRA-10551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10551 > Project: Cassandra > Issue Type: Improvement >Reporter: Sam Tunnicliffe >Assignee: Jan Karlsson > Fix For: 3.x > > > (broken out from CASSANDRA-10091) > We should look into whether using > [JMXMP|https://meteatamel.wordpress.com/2012/02/13/jmx-rmi-vs-jmxmp/] would > enable JMX authentication using SASL. If so, could we then define a custom > SaslServer which wraps a SaslNegotiator instance provided by the configured > IAuthenticator. > An initial look at the > [JMXMP|http://docs.oracle.com/cd/E19698-01/816-7609/6mdjrf873/] docs, > particularly section *11.4.2 SASL Provider*, suggests this might be feasible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968712#comment-14968712 ] Jan Karlsson commented on CASSANDRA-10091: -- {quote} For instance, do we actually need to enact fine grained control over nodetool at the keyspace or table level, such that a user with permissions on keyspace ks_a would be able to run nodetool status ks_a, but not to run nodetool status ks_b? I think that's overkill and not really needed by most admins. {quote} The current patch does not restrict access from nodetool in terms of which keyspace/table the command is run on. This is due to nodetool calling methods in the {{StorageProxy}}. However, if someone were to call these methods on a specific columnfamily, it would prevent that. I believe preventing users from initiating operations like major compaction on some tables but not on others is a fairly common use case. Especially when we provide so many potentially detrimental operations like {{compact}}. Unfortunately this patch does not make this distinction at the nodetool level, because you either have the permission for StorageProxy or you do not. However, it does give you the choice to make that distinction for non-nodetool users. {quote} So for example, this would enable us to grant read access to all the ColumnFamily mbeans with GRANT SELECT ON ALL MBEANS IN 'org.apache.cassandra.db:type=ColumnFamily', e.g. for running nodetool cfstats. What it doesn't permit is restricting access to a particular subset of ColumnFamily beans. {quote} Another disadvantage arises when the client application (for example, I observed jconsole doing this) sends a JMX request with a wildcard mbean. For instance, it might send something like {{java.lang:*}}, or a wildcard would be sent in when a program is trying to retrieve the names of all mbeans. 
Now the latter instance might not be so difficult to handle with your proposal, since {{queryNames}} and {{isInstanceOf}} are granted to everyone, but there might be other cases where wildcard mbeans are being passed in. We would have to handle this somehow. Otherwise applications that pass wildcard mbeans will have to have root permission. {quote} Also, I noticed one other thing regarding the MBeanServerForwarder implementation. We should create a new ClientState and log the AuthenticatedUser derived from the subject into it, which would have a couple of benefits. Firstly, the check that the user has the LOGIN privilege would be performed which isn't the case in the current patch. Second, the permissions check could include the full resource hierarchy using ensureHasPermission, rather than directly by calling the IAuthorizer::authorize. {quote} +1. Another aspect we need to remember is that currently there is no way to ascertain which mbeans are needed for a particular nodetool command or for the different tools that exist (like jconsole). We probably need to document this somewhere. > Align JMX authentication with internal authentication > - > > Key: CASSANDRA-10091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Minor > Fix For: 3.x > > > It would be useful to authenticate with JMX through Cassandra's internal > authentication. This would reduce the overhead of keeping passwords in files > on the machine and would consolidate passwords to one location. It would also > allow the possibility to handle JMX permissions in Cassandra. > It could be done by creating our own JMX server and setting custom classes > for the authenticator and authorizer. We could then add some parameters where > the user could specify what authenticator and authorizer to use in case they > want to make their own. 
> This could also be done by creating a premain method which creates a jmx > server. This would give us the feature without changing the Cassandra code > itself. However I believe this would be a good feature to have in Cassandra. > I am currently working on a solution which creates a JMX server and uses a > custom authenticator and authorizer. It is currently built as a premain, > however it would be great if we could put this in Cassandra instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
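The wildcard situation described in the comment above is detectable on the server side: pattern ObjectNames report themselves as patterns, which is what an authorizing MBeanServerForwarder would have to branch on. A small sketch:

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class WildcardCheck {
    // jconsole-style requests can carry pattern names such as "java.lang:*".
    // A permission-checking forwarder has to notice these and either expand
    // them against concrete mbeans or apply a broader (root-like) check.
    static boolean isWildcardRequest(String name) {
        try {
            return new ObjectName(name).isPattern();
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```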
[jira] [Comment Edited] (CASSANDRA-10091) Align JMX authentication with internal authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955032#comment-14955032 ] Jan Karlsson edited comment on CASSANDRA-10091 at 10/14/15 8:37 AM: Great points. Thank you for taking the time to review this. First of all, I agree completely on the use of {{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 and I only recently forward ported it. I just wanted to get it out so we can commence with the discussion. I agree that we will have to make use of the {{IAuthenticator::newSaslAuthenticator}} and we should investigate further. Also great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in your points on the subject, I cannot stress the importance of wildcards enough. It seemed like an unpleasant experience to go through countless permissions and apply them one at a time. I know this is somewhat lessened by the fact that you will only do this once per role, which can then be assigned to different users. However, calling a simple command like {{nodetool status}} will require ~4 different mbeans under the hood, while starting jconsole can only be done by adding ~10 different mbeans. Simplifying the {{JMXResource}} might be the way to go, but we should consider how much freedom we will lose by doing this. I was actually debating this very thing when I implemented it. Should I have only meta permissions, should I expose all permissions, or both? I settled on doing both to cater to every use case. The problem is that the mapping between nodetool commands and permissions is somewhat confusing. For instance, in your remapping proposal, one would have to give SELECT, DESCRIBE and EXECUTE to be able to get all information out of {{nodetool info}}. Not something one would expect from such a command. This is why these meta-permissions were born. It is simpler to give {{MBREAD}} to a user than to give {{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. 
With this solution, both variants are possible. Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if you happen to have such a use case. One could argue that this might be an uncommon use case, but I have a hard time ruling it out. However, if the consensus is that we should simplify it, which does have its advantages, then I agree with your proposal. was (Author: jan karlsson): Great points. Thank you for taking the time to review this. First of all, I agree completely on the use of {{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 and I only recently forward ported it. I just wanted to get it out so we can commence with the discussion. I agree that we will have to make use of the {{IAuthenticator::newSaslAuthenticator}} and we should investigate further. Also great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in your points on the subject, I cannot stress the importance of wildcards enough. It seemed like an unpleasant experience to go through countless permissions and apply them one at a time. I know this is somewhat lessened by the fact that you will only do this once per role, which can then be assigned to different users. However, calling a simple command like {{nodetool status}} will require ~4 different permissions under the hood, while starting jconsole can only be done by adding ~10 different permissions. Simplifying the {{JMXResource}} might be the way to go, but we should consider how much freedom we will lose by doing this. I was actually debating this very thing when I implemented it. Should I have only meta permissions, should I expose all permissions, or both? I settled on doing both to cater to every use case. The problem is that the mapping between nodetool commands and permissions is somewhat confusing. For instance, in your remapping proposal, one would have to give SELECT, DESCRIBE and EXECUTE to be able to get all information out of {{nodetool info}}. 
Not something one would expect from such a command. This is why these meta-permissions were born. It is simpler to give {{MBREAD}} to a user than to give {{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. With this solution, both variants are possible. Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if you happen to have such a use case. One could argue that this might be an uncommon use case, but I have a hard time ruling it out. However, if the consensus is that we should simplify it, which does have its advantages, then I agree with your proposal. > Align JMX authentication with internal authentication > - > > Key: CASSANDRA-10091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jan Karlsso
[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication
[ https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955032#comment-14955032 ] Jan Karlsson commented on CASSANDRA-10091: -- Great points. Thank you for taking the time to review this. First of all, I agree completely on the use of {{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 and I only recently forward ported it. I just wanted to get it out so we can commence with the discussion. I agree that we will have to make use of the {{IAuthenticator::newSaslAuthenticator}} and we should investigate further. Also great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in your points on the subject, I cannot stress the importance of wildcards enough. It seemed like an unpleasant experience to go through countless permissions and apply them one at a time. I know this is somewhat lessened by the fact that you will only do this once per role, which can then be assigned to different users. However, calling a simple command like {{nodetool status}} will require ~4 different JMXResources under the hood. Simplifying the {{JMXResource}} might be the way to go, but we should consider how much freedom we will lose by doing this. I was actually debating this very thing when I implemented it. Should I have only meta permissions, should I expose all permissions, or both? I settled on doing both to cater to every use case. The problem is that the mapping between nodetool commands and permissions is somewhat confusing. For instance, in your remapping proposal, one would have to give SELECT, DESCRIBE and EXECUTE to be able to get all information out of {{nodetool info}}. Not something one would expect from such a command. This is why these meta-permissions were born. It is simpler to give {{MBREAD}} to a user than to give {{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. With this solution, both variants are possible. 
Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if you happen to have such a use case. One could argue that this might be an uncommon use case, but I have a hard time ruling it out. However, if the consensus is that we should simplify it, which does have its advantages, then I agree with your proposal. > Align JMX authentication with internal authentication > - > > Key: CASSANDRA-10091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Minor > Fix For: 3.x > > > It would be useful to authenticate with JMX through Cassandra's internal > authentication. This would reduce the overhead of keeping passwords in files > on the machine and would consolidate passwords to one location. It would also > allow the possibility to handle JMX permissions in Cassandra. > It could be done by creating our own JMX server and setting custom classes > for the authenticator and authorizer. We could then add some parameters where > the user could specify what authenticator and authorizer to use in case they > want to make their own. > This could also be done by creating a premain method which creates a jmx > server. This would give us the feature without changing the Cassandra code > itself. However I believe this would be a good feature to have in Cassandra. > I am currently working on a solution which creates a JMX server and uses a > custom authenticator and authorizer. It is currently built as a premain, > however it would be great if we could put this in Cassandra instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
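The meta-permission idea from the comment above can be sketched as a simple expansion table. The permission names (MBREAD, MBGET, MBINSTANCEOF, MBQUERYNAMES) come from the proposed patch under discussion, not from any released Cassandra version:

```java
import java.util.EnumSet;
import java.util.Set;

class JmxPermissions {
    // Names taken from the patch proposal above; illustrative only.
    enum Permission { MBGET, MBINSTANCEOF, MBQUERYNAMES, MBREAD }

    // Granting the meta-permission MBREAD behaves like granting all three
    // fine-grained read-style permissions at once, while still allowing
    // any single fine-grained permission to be granted on its own.
    static Set<Permission> expand(Permission p) {
        if (p == Permission.MBREAD)
            return EnumSet.of(Permission.MBGET, Permission.MBINSTANCEOF,
                              Permission.MBQUERYNAMES);
        return EnumSet.of(p);
    }
}
```

This captures the "both variants are possible" point: an admin can grant the coarse MBREAD, or just MBGET, depending on the use case.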
[jira] [Commented] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do
[ https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727475#comment-14727475 ] Jan Karlsson commented on CASSANDRA-8741: - Took it for a test spin. +1 > Running a drain before a decommission apparently the wrong thing to do > -- > > Key: CASSANDRA-8741 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8741 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise > 4.5.3) >Reporter: Casey Marshall >Assignee: Jan Karlsson >Priority: Trivial > Labels: lhf > Fix For: 2.1.x, 2.0.x > > Attachments: 8741.txt > > > This might simply be a documentation issue. It appears that running "nodetool > drain" is a very wrong thing to do before running a "nodetool decommission". > The idea was that I was going to safely shut off writes and flush everything > to disk before beginning the decommission. What happens is the "decommission" > call appears to fail very early on after starting, and afterwards, the node > in question is stuck in state LEAVING, but all other nodes in the ring see > that node as NORMAL, but down. No streams are ever sent from the node being > decommissioned to other nodes. > The drain command does indeed shut down the "BatchlogTasks" executor > (org/apache/cassandra/service/StorageService.java, line 3445 in git tag > "cassandra-2.0.11") but the decommission process tries using that executor > when calling the "startBatchlogReplay" function > (org/apache/cassandra/db/BatchlogManager.java, line 123) called through > org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace > pasted below). > This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4). > So, either something is wrong with the drain/decommission commands, or it's > very wrong to run a drain before a decommission. 
What's worse, there seems to > be no way to recover this node once it is in this state; you need to shut it > down and run "removenode". > My terminal output: > {code} > ubuntu@x:~$ nodetool drain > ubuntu@x:~$ tail /var/log/^C > ubuntu@x:~$ nodetool decommission > Exception in thread "main" java.util.concurrent.RejectedExecutionException: > Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 > rejected from > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) > at > java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) > at > java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629) > at > org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123) > at > org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966) > at > org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) > at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at 
sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer
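The failure mode in the stack trace above, scheduling onto an executor that drain already shut down, is easy to reproduce in isolation with a plain ScheduledThreadPoolExecutor. This is a sketch of the mechanism only, not Cassandra's actual BatchlogManager code:

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class DrainThenSchedule {
    // Minimal repro of the reported bug: drain shuts the batchlog executor
    // down, and decommission's startBatchlogReplay then submits to the
    // terminated pool, which rejects the task.
    static boolean scheduleAfterShutdownRejects() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        executor.shutdown(); // what "nodetool drain" effectively does to BatchlogTasks
        try {
            executor.schedule(() -> {}, 0L, TimeUnit.MILLISECONDS);
            return false; // no rejection: would mean the bug is not reproducible
        } catch (RejectedExecutionException expected) {
            return true; // same exception type the decommission call hits
        }
    }
}
```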
[jira] [Comment Edited] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do
[ https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726930#comment-14726930 ] Jan Karlsson edited comment on CASSANDRA-8741 at 9/2/15 8:26 AM: - LGTM. Except I'm not seeing the test being run in the dtests you linked. was (Author: jan karlsson): LGTM. Except I'm not finding the test in the dtests you linked. > Running a drain before a decommission apparently the wrong thing to do > -- > > Key: CASSANDRA-8741 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8741 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise > 4.5.3) >Reporter: Casey Marshall >Assignee: Jan Karlsson >Priority: Trivial > Labels: lhf > Fix For: 2.1.x, 2.0.x > > Attachments: 8741.txt > > > This might simply be a documentation issue. It appears that running "nodetool > drain" is a very wrong thing to do before running a "nodetool decommission". > The idea was that I was going to safely shut off writes and flush everything > to disk before beginning the decommission. What happens is the "decommission" > call appears to fail very early on after starting, and afterwards, the node > in question is stuck in state LEAVING, but all other nodes in the ring see > that node as NORMAL, but down. No streams are ever sent from the node being > decommissioned to other nodes. > The drain command does indeed shut down the "BatchlogTasks" executor > (org/apache/cassandra/service/StorageService.java, line 3445 in git tag > "cassandra-2.0.11") but the decommission process tries using that executor > when calling the "startBatchlogReplay" function > (org/apache/cassandra/db/BatchlogManager.java, line 123) called through > org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace > pasted below). > This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4). 
> So, either something is wrong with the drain/decommission commands, or it's > very wrong to run a drain before a decommission. What's worse, there seems to > be no way to recover this node once it is in this state; you need to shut it > down and run "removenode". > My terminal output: > {code} > ubuntu@x:~$ nodetool drain > ubuntu@x:~$ tail /var/log/^C > ubuntu@x:~$ nodetool decommission > Exception in thread "main" java.util.concurrent.RejectedExecutionException: > Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 > rejected from > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) > at > java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) > at > java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629) > at > org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123) > at > org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966) > at > org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) > at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanS
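The failure mode in the stack trace above can be reproduced in miniature with a plain JDK scheduled executor. This is a stand-in sketch, not Cassandra's actual DebuggableScheduledThreadPoolExecutor: once the executor has been shut down (as drain does to the BatchlogTasks executor), any later schedule() call is rejected, which is exactly the RejectedExecutionException decommission then hits in startBatchlogReplay.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DrainThenDecommission {
    public static void main(String[] args) {
        // Stand-in for the BatchlogTasks executor.
        ScheduledThreadPoolExecutor batchlogTasks = new ScheduledThreadPoolExecutor(1);

        batchlogTasks.shutdown(); // what "nodetool drain" effectively does to it

        try {
            // What decommission's batchlog replay then attempts to schedule.
            batchlogTasks.schedule(() -> {}, 0, TimeUnit.SECONDS);
            System.out.println("submitted");
        } catch (RejectedExecutionException e) {
            // Default AbortPolicy rejects tasks on a terminated executor.
            System.out.println("rejected");
        }
    }
}
```

Running this prints "rejected", mirroring why decommission fails early instead of streaming data off the node.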
[jira] [Created] (CASSANDRA-10091) Align JMX authentication with internal authentication
Jan Karlsson created CASSANDRA-10091: Summary: Align JMX authentication with internal authentication Key: CASSANDRA-10091 URL: https://issues.apache.org/jira/browse/CASSANDRA-10091 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jan Karlsson Assignee: Jan Karlsson Priority: Minor It would be useful to authenticate with JMX through Cassandra's internal authentication. This would reduce the overhead of keeping passwords in files on the machine and would consolidate passwords to one location. It would also allow the possibility to handle JMX permissions in Cassandra. It could be done by creating our own JMX server and setting custom classes for the authenticator and authorizer. We could then add some parameters where the user could specify what authenticator and authorizer to use in case they want to make their own. This could also be done by creating a premain method which creates a JMX server. This would give us the feature without changing the Cassandra code itself. However, I believe this would be a good feature to have in Cassandra. I am currently working on a solution which creates a JMX server and uses a custom authenticator and authorizer. It is currently built as a premain; however, it would be great if we could put this in Cassandra instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
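The approach described in the ticket can be sketched with the standard javax.management.remote API. This is a hypothetical illustration, not Cassandra's implementation: a JMX connector server whose JMXAuthenticator delegates credential checks to some internal store (stubbed here with a hard-coded user where a real version would consult Cassandra's internal authenticator). The port and URL are arbitrary choices for the example.

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXAuthenticator;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;
import javax.security.auth.Subject;

public class InternalAuthJmxServer {
    public static void main(String[] args) throws Exception {
        // Custom authenticator: this is where the internal credential
        // check would plug in; the hard-coded user is a stand-in.
        JMXAuthenticator authenticator = credentials -> {
            String[] pair = (String[]) credentials; // {username, password}
            if ("cassandra".equals(pair[0]) && "cassandra".equals(pair[1]))
                return new Subject();
            throw new SecurityException("Authentication failed");
        };

        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnectorServer.AUTHENTICATOR, authenticator);

        // RMI registry that the service URL below points at (example port).
        LocateRegistry.createRegistry(9999);
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                url, env, ManagementFactory.getPlatformMBeanServer());
        server.start();
        System.out.println("JMX connector server up at " + server.getAddress());
        server.stop();
    }
}
```

Clients connecting to this server would then supply the same username/password pair they use for CQL, instead of the entries in jmxremote.password files.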
[jira] [Created] (CASSANDRA-9657) Hint table doing unnecessary compaction
Jan Karlsson created CASSANDRA-9657: --- Summary: Hint table doing unnecessary compaction Key: CASSANDRA-9657 URL: https://issues.apache.org/jira/browse/CASSANDRA-9657 Project: Cassandra Issue Type: Bug Environment: 2.1.7 Reporter: Jan Karlsson Priority: Minor I found some really strange behaviour. During the replay of a node I found this in the log: {code}INFO [CompactionExecutor:7] CompactionTask.java:271 Compacted 1 sstables to [/var/lib/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-120,]. 452,150,727 bytes to 452,150,727 (~100% of original) in 267,588ms = 1.611449MB/s. 1 total partitions merged to 1. Partition merge counts were {1:1, }{code} This happened multiple times until the hint replay was completed and the sstables were removed. I tried to replicate this by just starting up a cluster in ccm and killing a node for a few minutes. I got the same behaviour then. {Code} INFO [CompactionExecutor:2] CompactionTask.java:270 - Compacted 1 sstables to [/home/ejankan/.ccm/hint/node3/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-2,]. 65,570 bytes to 65,570 (~100% of original) in 600ms = 0.104221MB/s. 1 total partitions merged to 1. Partition merge counts were {1:1, } {Code} It seems weird to me that the file does not decrease in size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
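As a sanity check on the figures in the first log line above (452,150,727 bytes compacted in 267,588 ms), the reported rate follows directly from bytes over elapsed time. This is a throwaway calculation, not Cassandra code:

```java
public class CompactionRate {
    public static void main(String[] args) {
        double bytes = 452_150_727d;            // input == output size, per the log
        double seconds = 267_588d / 1000d;      // 267,588 ms
        double mbPerSec = (bytes / (1024 * 1024)) / seconds;
        // Matches the log's reported 1.611449MB/s for a ~100%-of-original rewrite.
        System.out.printf("%.6f MB/s%n", mbPerSec);
    }
}
```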
[jira] [Updated] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do
[ https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8741: Attachment: 8741.txt Should work for both 2.1 and 2.0. > Running a drain before a decommission apparently the wrong thing to do > -- > > Key: CASSANDRA-8741 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8741 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise > 4.5.3) >Reporter: Casey Marshall >Assignee: Jan Karlsson >Priority: Trivial > Labels: lhf > Fix For: 2.1.x, 2.0.x > > Attachments: 8741.txt > > > This might simply be a documentation issue. It appears that running "nodetool > drain" is a very wrong thing to do before running a "nodetool decommission". > The idea was that I was going to safely shut off writes and flush everything > to disk before beginning the decommission. What happens is the "decommission" > call appears to fail very early on after starting, and afterwards, the node > in question is stuck in state LEAVING, but all other nodes in the ring see > that node as NORMAL, but down. No streams are ever sent from the node being > decommissioned to other nodes. > The drain command does indeed shut down the "BatchlogTasks" executor > (org/apache/cassandra/service/StorageService.java, line 3445 in git tag > "cassandra-2.0.11") but the decommission process tries using that executor > when calling the "startBatchlogReplay" function > (org/apache/cassandra/db/BatchlogManager.java, line 123) called through > org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace > pasted below). > This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4). > So, either something is wrong with the drain/decommission commands, or it's > very wrong to run a drain before a decommission. 
What's worse, there seems to > be no way to recover this node once it is in this state; you need to shut it > down and run "removenode". > My terminal output: > {code} > ubuntu@x:~$ nodetool drain > ubuntu@x:~$ tail /var/log/^C > ubuntu@x:~$ nodetool decommission > Exception in thread "main" java.util.concurrent.RejectedExecutionException: > Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 > rejected from > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) > at > java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) > at > java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629) > at > org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123) > at > org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966) > at > org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) > at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at 
sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.jav
[jira] [Commented] (CASSANDRA-8327) snapshots taken before repair are not cleared if snapshot fails
[ https://issues.apache.org/jira/browse/CASSANDRA-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482767#comment-14482767 ] Jan Karlsson commented on CASSANDRA-8327: - Would it be possible to send clearsnapshots messages after every RepairJob? I guess the problem is that we do not really know when the snapshot has completed, which would introduce a race condition where the clear can occur before the taking of the snapshot. Any other ideas for solving this without requiring a restart? > snapshots taken before repair are not cleared if snapshot fails > --- > > Key: CASSANDRA-8327 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8327 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: cassandra 2.0.10.71 >Reporter: MASSIMO CELLI >Assignee: Yuki Morishita >Priority: Minor > Fix For: 3.0 > > > running repair service the following directory was created for the snapshots: > drwxr-xr-x 2 cassandra cassandra 36864 Nov 5 07:47 > 073d16e0-64c0-11e4-8e9a-7b3d4674c508 > but the system.log reports the following error which suggests the snapshot > failed: > ERROR [RMI TCP Connection(3251)-10.150.27.78] 2014-11-05 07:47:55,734 > StorageService.java (line 2599) Repair session > 073d16e0-64c0-11e4-8e9a-7b3d4674c508 for range > (7530018576963469312,7566047373982433280] failed with error > java.io.IOException: Failed during snapshot creation. > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.io.IOException: Failed during snapshot creation. ERROR > [AntiEntropySessions:3312] 2014-11-05 07:47:55,731 RepairSession.java (line > 288) [repair #073d16e0-64c0-11e4-8e9a-7b3d4674c508] session completed with > the following error java.io.IOException: Failed during snapshot creation. > the problem is that the directory for the snapshots that fail are just left > on the disk and don't get cleaned up. They must be removed manually, which is > not ideal. 
[jira] [Comment Edited] (CASSANDRA-8696) nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could not create snapshot
[ https://issues.apache.org/jira/browse/CASSANDRA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301157#comment-14301157 ] Jan Karlsson edited comment on CASSANDRA-8696 at 2/2/15 11:08 AM: -- We stumbled upon this issue as well. I was only able to reproduce this when the amount of data on disk was over 12G. was (Author: jan karlsson): I was only able to reproduce this when the amount of data on disk was over 12G. From taking a quick glance at the code, this is caused by the snapshot process throwing a timeout. > nodetool repair on cassandra 2.1.2 keyspaces return > java.lang.RuntimeException: Could not create snapshot > - > > Key: CASSANDRA-8696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8696 > Project: Cassandra > Issue Type: Bug >Reporter: Jeff Liu > > When trying to run nodetool repair -pr on cassandra node ( 2.1.2), cassandra > throw java exceptions: cannot create snapshot. > the error log from system.log: > {noformat} > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:28,815 > StreamResultFuture.java:166 - [Stream #692c1450-a692-11e4-9973-070e938df227 > ID#0] Prepare completed. 
Receiving 2 files(221187 bytes), sending 5 > files(632105 bytes) > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 > StreamResultFuture.java:180 - [Stream #692c1450-a692-11e4-9973-070e938df227] > Session with /10.97.9.110 is complete > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 > StreamResultFuture.java:212 - [Stream #692c1450-a692-11e4-9973-070e938df227] > All sessions completed > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,047 > StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] > streaming task succeed, returning response to /10.98.194.68 > INFO [RepairJobTask:1] 2015-01-28 02:07:29,065 StreamResultFuture.java:86 - > [Stream #692c6270-a692-11e4-9973-070e938df227] Executing streaming plan for > Repair > INFO [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,065 > StreamSession.java:213 - [Stream #692c6270-a692-11e4-9973-070e938df227] > Starting streaming to /10.66.187.201 > INFO [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,070 > StreamCoordinator.java:209 - [Stream #692c6270-a692-11e4-9973-070e938df227, > ID#0] Beginning stream session with /10.66.187.201 > INFO [STREAM-IN-/10.66.187.201] 2015-01-28 02:07:29,465 > StreamResultFuture.java:166 - [Stream #692c6270-a692-11e4-9973-070e938df227 > ID#0] Prepare completed. 
Receiving 5 files(627994 bytes), sending 5 > files(632105 bytes) > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,971 > StreamResultFuture.java:180 - [Stream #692c6270-a692-11e4-9973-070e938df227] > Session with /10.66.187.201 is complete > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,972 > StreamResultFuture.java:212 - [Stream #692c6270-a692-11e4-9973-070e938df227] > All sessions completed > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,972 > StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] > streaming task succeed, returning response to /10.98.194.68 > ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 RepairJob.java:127 - Error > occurred during snapshot phase > java.lang.RuntimeException: Could not create snapshot at /10.97.9.110 > at > org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_45] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > INFO [AntiEntropySessions:6] 2015-01-28 02:07:39,445 RepairSession.java:260 > - [repair #6f85e740-a692-11e4-9973-070e938df227] new session: will sync > /10.98.194.68, /10.66.187.201, /10.226.218.135 on range > (12817179804668051873746972069086 > 2638799,12863540308359254031520865977436165] for events.[bigint0text, > bigint0boolean, bigint0int, dataset_catalog, column_categories, > bigint0double, bigint0bigint] > ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,445 RepairSession.java:303 > - [repair 
#685e3d00-a692-11e4-9973-070e938df227] session completed with the > following error > java.io.IOException: Failed during snapshot creation. > at > org.apache.cassandra.repair.RepairSession.faile
[jira] [Commented] (CASSANDRA-8696) nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could not create snapshot
[ https://issues.apache.org/jira/browse/CASSANDRA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301157#comment-14301157 ] Jan Karlsson commented on CASSANDRA-8696: - I was only able to reproduce this when the amount of data on disk was over 12G. From taking a quick glance at the code, this is caused by the snapshot process throwing a timeout. > nodetool repair on cassandra 2.1.2 keyspaces return > java.lang.RuntimeException: Could not create snapshot > - > > Key: CASSANDRA-8696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8696 > Project: Cassandra > Issue Type: Bug >Reporter: Jeff Liu > > When trying to run nodetool repair -pr on cassandra node ( 2.1.2), cassandra > throw java exceptions: cannot create snapshot. > the error log from system.log: > {noformat} > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:28,815 > StreamResultFuture.java:166 - [Stream #692c1450-a692-11e4-9973-070e938df227 > ID#0] Prepare completed. Receiving 2 files(221187 bytes), sending 5 > files(632105 bytes) > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 > StreamResultFuture.java:180 - [Stream #692c1450-a692-11e4-9973-070e938df227] > Session with /10.97.9.110 is complete > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 > StreamResultFuture.java:212 - [Stream #692c1450-a692-11e4-9973-070e938df227] > All sessions completed > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,047 > StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] > streaming task succeed, returning response to /10.98.194.68 > INFO [RepairJobTask:1] 2015-01-28 02:07:29,065 StreamResultFuture.java:86 - > [Stream #692c6270-a692-11e4-9973-070e938df227] Executing streaming plan for > Repair > INFO [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,065 > StreamSession.java:213 - [Stream #692c6270-a692-11e4-9973-070e938df227] > Starting streaming to /10.66.187.201 > INFO [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,070 > 
StreamCoordinator.java:209 - [Stream #692c6270-a692-11e4-9973-070e938df227, > ID#0] Beginning stream session with /10.66.187.201 > INFO [STREAM-IN-/10.66.187.201] 2015-01-28 02:07:29,465 > StreamResultFuture.java:166 - [Stream #692c6270-a692-11e4-9973-070e938df227 > ID#0] Prepare completed. Receiving 5 files(627994 bytes), sending 5 > files(632105 bytes) > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,971 > StreamResultFuture.java:180 - [Stream #692c6270-a692-11e4-9973-070e938df227] > Session with /10.66.187.201 is complete > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,972 > StreamResultFuture.java:212 - [Stream #692c6270-a692-11e4-9973-070e938df227] > All sessions completed > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,972 > StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] > streaming task succeed, returning response to /10.98.194.68 > ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 RepairJob.java:127 - Error > occurred during snapshot phase > java.lang.RuntimeException: Could not create snapshot at /10.97.9.110 > at > org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_45] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > INFO [AntiEntropySessions:6] 2015-01-28 02:07:39,445 RepairSession.java:260 > - [repair #6f85e740-a692-11e4-9973-070e938df227] new session: will sync > /10.98.194.68, /10.66.187.201, /10.226.218.135 on range > 
(12817179804668051873746972069086 > 2638799,12863540308359254031520865977436165] for events.[bigint0text, > bigint0boolean, bigint0int, dataset_catalog, column_categories, > bigint0double, bigint0bigint] > ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,445 RepairSession.java:303 > - [repair #685e3d00-a692-11e4-9973-070e938df227] session completed with the > following error > java.io.IOException: Failed during snapshot creation. > at > org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) > ~[apache-cassandra-2.1.2.jar:2.1.2]
[jira] [Commented] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException
[ https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285307#comment-14285307 ] Jan Karlsson commented on CASSANDRA-8643: - Unfortunately we have not encountered this bug since. It seemed like we went into some sort of bad state with repairs as most repairs on this cluster failed with this exception until we wiped it. I will keep you posted if I see this happen again. > merkle tree creation fails with NoSuchElementException > -- > > Key: CASSANDRA-8643 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8643 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: We are running on a three node cluster with three in > replication(C* 2.1.1). It uses a default C* installation and STCS. >Reporter: Jan Karlsson > Fix For: 2.1.3 > > > We have a problem that we encountered during testing over the weekend. > During the tests we noticed that repairs started to fail. This error has > occured on multiple non-coordinator nodes during repair. It also ran at least > once without producing this error. > We run repair -pr on all nodes on different days. CPU values were around 40% > and disk was 50% full. > From what I understand, the coordinator asked for merkle trees from the other > two nodes. However one of the nodes fails to create his merkle tree. > Unfortunately we do not have a way to reproduce this problem. 
> The coordinator receives: > {noformat} > 2015-01-09T17:55:57.091+0100 INFO [RepairJobTask:4] RepairJob.java:145 > [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for > censored (to [/xx.90, /xx.98, /xx.82]) > 2015-01-09T17:55:58.516+0100 INFO [AntiEntropyStage:1] > RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] > Received merkle tree for censored from /xx.90 > 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] > RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session > completed with the following error > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] > 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] > CassandraDaemon.java:153 Exception in thread > Thread[AntiEntropySessions:76,5,RMI Runtime] > java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: > [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at 
com.google.common.base.Throwables.propagate(Throwables.java:160) > ~[guava-16.0.jar:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] >at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
[jira] [Updated] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException
[ https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8643: Environment: We are running on a three node cluster with three in replication(C* 2.1.1). It uses a default C* installation and STCS. (was: We are running on a three node cluster with three in replication(C* 2.1.2). It uses a default C* installation and STCS.) > merkle tree creation fails with NoSuchElementException > -- > > Key: CASSANDRA-8643 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8643 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: We are running on a three node cluster with three in > replication(C* 2.1.1). It uses a default C* installation and STCS. >Reporter: Jan Karlsson > Fix For: 2.1.3 > > > We have a problem that we encountered during testing over the weekend. > During the tests we noticed that repairs started to fail. This error has > occured on multiple non-coordinator nodes during repair. It also ran at least > once without producing this error. > We run repair -pr on all nodes on different days. CPU values were around 40% > and disk was 50% full. > From what I understand, the coordinator asked for merkle trees from the other > two nodes. However one of the nodes fails to create his merkle tree. > Unfortunately we do not have a way to reproduce this problem. 
> The coordinator receives: > {noformat} > 2015-01-09T17:55:57.091+0100 INFO [RepairJobTask:4] RepairJob.java:145 > [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for > censored (to [/xx.90, /xx.98, /xx.82]) > 2015-01-09T17:55:58.516+0100 INFO [AntiEntropyStage:1] > RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] > Received merkle tree for censored from /xx.90 > 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] > RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session > completed with the following error > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] > 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] > CassandraDaemon.java:153 Exception in thread > Thread[AntiEntropySessions:76,5,RMI Runtime] > java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: > [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at 
com.google.common.base.Throwables.propagate(Throwables.java:160) > ~[guava-16.0.jar:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] >at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) > ~[apache-cassandra-2.1.1.jar:2.1.1]
[jira] [Updated] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException
[ https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8643: Reproduced In: 2.1.1 (was: 2.1.2) > merkle tree creation fails with NoSuchElementException > -- > > Key: CASSANDRA-8643 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8643 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: We are running on a three-node cluster with > replication factor three (C* 2.1.2). It uses a default C* installation and STCS. > Reporter: Jan Karlsson > Fix For: 2.1.3 > > > We have a problem that we encountered during testing over the weekend. > During the tests we noticed that repairs started to fail. This error has > occurred on multiple non-coordinator nodes during repair. It also ran at least > once without producing this error. > We run repair -pr on all nodes on different days. CPU values were around 40% > and disk was 50% full. > From what I understand, the coordinator asked for merkle trees from the other > two nodes. However, one of the nodes fails to create its merkle tree. > Unfortunately we do not have a way to reproduce this problem.
> The coordinator receives: > {noformat} > 2015-01-09T17:55:57.091+0100 INFO [RepairJobTask:4] RepairJob.java:145 > [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for > censored (to [/xx.90, /xx.98, /xx.82]) > 2015-01-09T17:55:58.516+0100 INFO [AntiEntropyStage:1] > RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] > Received merkle tree for censored from /xx.90 > 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] > RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session > completed with the following error > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] > 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] > CassandraDaemon.java:153 Exception in thread > Thread[AntiEntropySessions:76,5,RMI Runtime] > java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: > [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at 
com.google.common.base.Throwables.propagate(Throwables.java:160) > ~[guava-16.0.jar:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_51] >at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: > org.apache.cassandra.exceptions.RepairException: [repair > #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, > (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) > ~[apache-cassandra-2.1.1.jar:2.1.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > ~[apache-cassandra-2.1.1.jar:2.1.1] > ... 3 common frames omitted > {noformat} > While one of the other nodes produces this
[jira] [Created] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException
Jan Karlsson created CASSANDRA-8643: --- Summary: merkle tree creation fails with NoSuchElementException Key: CASSANDRA-8643 URL: https://issues.apache.org/jira/browse/CASSANDRA-8643 Project: Cassandra Issue Type: Bug Components: Core Environment: We are running on a three-node cluster with replication factor three (C* 2.1.2). It uses a default C* installation and STCS. Reporter: Jan Karlsson We have a problem that we encountered during testing over the weekend. During the tests we noticed that repairs started to fail. This error has occurred on multiple non-coordinator nodes during repair. It also ran at least once without producing this error. We run repair -pr on all nodes on different days. CPU values were around 40% and disk was 50% full. From what I understand, the coordinator asked for merkle trees from the other two nodes. However, one of the nodes fails to create its merkle tree. Unfortunately we do not have a way to reproduce this problem. The coordinator receives: {noformat} 2015-01-09T17:55:57.091+0100 INFO [RepairJobTask:4] RepairJob.java:145 [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for censored (to [/xx.90, /xx.98, /xx.82]) 2015-01-09T17:55:58.516+0100 INFO [AntiEntropyStage:1] RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] Received merkle tree for censored from /xx.90 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session completed with the following error org.apache.cassandra.exceptions.RepairException: [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) ~[apache-cassandra-2.1.1.jar:2.1.1] at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.1.jar:2.1.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] CassandraDaemon.java:153 Exception in thread Thread[AntiEntropySessions:76,5,RMI Runtime] java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.1.jar:2.1.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: org.apache.cassandra.exceptions.RepairException: [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98 at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) ~[apache-cassandra-2.1.1.jar:2.1.1] at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.1.jar:2.1.1] ... 3 common frames omitted {noformat} While one of the other nodes produces this error: {noformat} 2015-01-09T17:55:59.574+0100 ERROR [ValidationExecutor:16] Validator.java:232 Failed creating a merkle tree for [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]], /xx.82 (see log for details) 2015-01-09T17:55:59.578+0100 ERROR [ValidationExecutor:16] CassandraDaemon.java:153 Exception in thread Thread[ValidationExecutor
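The schedule the reporter describes (repair -pr run on every node, on different days) works because -pr restricts each run to that node's primary token ranges, so the ranges across the cluster are each repaired exactly once. A dry-run sketch of that loop; the host names and the remote invocation are placeholders, not taken from the ticket:

```shell
#!/bin/sh
# Dry-run sketch of the reported repair schedule: a primary-range repair (-pr)
# on each node in turn, so every token range is repaired exactly once cluster-wide.
# Host names are placeholders; a real run would invoke nodetool on each host.
HOSTS="node1 node2 node3"
PLAN=""
for host in $HOSTS; do
    # Real invocation would be something like: ssh "$host" nodetool repair -pr
    PLAN="${PLAN}${host}:repair-pr "
    echo "would run on $host: nodetool repair -pr"
done
```

Skipping a node in such a schedule leaves that node's primary ranges unrepaired, which is why -pr schedules must cover every node.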
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8366: Description: There seems to be something weird going on when repairing data. I have a program that runs for 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up a lot more. I saw the load shoot up to 8 times the original size multiple times with incremental repair (from 2G to 16G). With nodes 9, 8, 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.)
{noformat}
After 2 hours of 250 reads + 250 writes per second:
UN  9  583.39 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  584.01 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  583.72 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  583.84 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
Repair -pr -par on all nodes sequentially
UN  9  746.29 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  751.02 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  748.89 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.34 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
repair -inc -par on all nodes sequentially
UN  9  2.41 GB    256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.53 GB    256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.6 GB     256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  2.17 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
after rolling restart
UN  9  1.47 GB    256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  1.5 GB     256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.46 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.19 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
compact all nodes sequentially
UN  9  989.99 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  994.75 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  1.46 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.82 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
repair -inc -par on all nodes sequentially
UN  9  1.98 GB    256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.3 GB     256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  3.71 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.68 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
restart once more
UN  9  2 GB       256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.05 GB    256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  4.1 GB     256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.68 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
{noformat}
Is there something I'm missing or is this strange behavior?
was: There seems to be something weird going on when repairing data. I have a program that runs for 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up a lot more. I saw the load shoot up to 8 times the original size multiple times with incremental repair (from 2G to 16G). With nodes 9, 8, 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.)
After 2 hours of 250 reads + 250 writes per second:
UN  9  583.39 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  584.01 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  583.72 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  583.84 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
Repair -pr -par on all nodes sequentially
UN  9  746.29 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  751.02 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  748.89 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.34 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
repair -inc -par on all nodes sequentially
UN  9  2.41 GB    256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.53 GB    256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.6 GB     256  ?  2b6b5d66