Re: why leaving riak cluster so slowly and how to accelerate the speed
Responses inline.

On Tue, Aug 11, 2015 at 12:53 PM, changmao wang wang.chang...@gmail.com wrote:

> 1. About backing up the four new nodes and then using 'riak-admin force-replace': what is the status of the newly added nodes? As you know, we want to replace one of the leaving nodes.

I don't understand the question. Doing 'riak-admin force-replace' on one of the nodes that's leaving should overwrite the leave request and tell it to change its node id / IP address. (If that doesn't work, stop the leaving node, and do a 'riak-admin reip' command instead.)

> 2. What is the risk of 'riak-admin force-remove' 'riak@10.21.136.91' without a backup? As you know, the node (riak@10.21.136.91) is now a member of the cluster and holds almost 2.5TB of data, maybe 10 percent of the whole cluster.

The only reason I asked about backup is because it sounded like you cleared the disk on it. If it currently has the data, then it'll be fine. Force-replace just changes the IP address, and doesn't delete the data or anything.

> On Tue, Aug 11, 2015 at 7:32 PM, Dmitri Zagidulin dzagidu...@basho.com wrote:
>
>> 1. How to force leave leaving's nodes without data loss?
>
> This depends on whether you backed up the data directory of the 4 new nodes before you reformatted them.
>
> If you backed them up (and then restored the data directory once you reformatted them), you can try:
>
>     riak-admin force-replace 'riak@10.21.136.91' 'riak@whatever your new ip address is for that node'
>
> (same for the other 3)
>
> If you did not back up those nodes, the only thing you can do is force them to leave, and then join the new ones. So, for each of the 4:
>
>     riak-admin force-remove 'riak@10.21.136.91' 'riak@10.21.136.66'
>
> (same for the other 3)
>
> In either case, after force-replacing or force-removing, you have to join the new nodes to the cluster before you commit:
>
>     riak-admin join 'riak@new node' 'riak@10.21.136.66'
>
> (same for the other 3)
>
> and finally:
>
>     riak-admin cluster plan
>     riak-admin cluster commit
>
> As for the error, the reason you're seeing it is that the other nodes can't contact the 4 that are supposed to be leaving (since you wiped them). The amount of time that passed doesn't matter; the cluster will be waiting for those nodes to leave indefinitely, unless you force-remove or force-replace.
>
> On Tue, Aug 11, 2015 at 1:32 AM, changmao wang wang.chang...@gmail.com wrote:
>
>> Hi Dmitri,
>>
>> For your question,
>>
>>>> 3) Re-formatted those four nodes and re-installed Riak.
>>>
>>> Here is where it gets tricky though. Several questions for you:
>>>
>>> - Did you attempt to re-join those 4 reinstalled nodes into the cluster? What was the output of the cluster join and cluster plan commands?
>>> - Did the IP address change after they were reformatted? If so, you probably need to use something like 'reip' at this point: http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip
>>
>> I did NOT try to re-join those 4 reinstalled nodes into the cluster. As you know, member-status shows they're leaving, as below:
>>
>>     riak-admin member-status
>>     ================================= Membership ==================================
>>     Status     Ring    Pending    Node
>>     -------------------------------------------------------------------------------
>>     leaving    10.9%    10.9%    'riak@10.21.136.91'
>>     leaving     9.4%    10.9%    'riak@10.21.136.92'
>>     leaving     7.8%    10.9%    'riak@10.21.136.93'
>>     leaving     7.8%    10.9%    'riak@10.21.136.94'
>>     valid      10.9%    10.9%    'riak@10.21.136.66'
>>     valid      10.9%    10.9%    'riak@10.21.136.71'
>>     valid      14.1%    10.9%    'riak@10.21.136.76'
>>     valid      17.2%    12.5%    'riak@10.21.136.81'
>>     valid      10.9%    10.9%    'riak@10.21.136.86'
>>     -------------------------------------------------------------------------------
>>     Valid:5 / Leaving:4 / Exiting:0 / Joining:0 / Down:0
>>
>> Two weeks have elapsed, and 'riak-admin member-status' still shows the same result. I don't know which step the ring handoff is at.
>>
>> I did not change the IP addresses of the four newly added nodes.
>>
>> My questions:
>>
>> 1. How to force leave leaving's nodes without data loss?
>>
>> 2. I have found some errors related to partition handoff in /etc/riak/log/errors. Details are as below:
>>
>>     2015-07-30 16:04:33.643 [error] 0.12872.15@riak_core_handoff_sender:start_fold:262 ownership_transfer transfer of riak_kv_vnode from 'riak@10.21.136.76' 45671926166590716193865151022383844364247891968 to 'riak@10.21.136.93' 45671926166590716193865151022383844364247891968 failed because of enotconn
>>     2015-07-30 16:04:33.643 [error] 0.197.0@riak_core_handoff_manager:handle_info:289 An outbound handoff of partition riak_kv_vnode 45671926166590716193865151022383844364247891968 was terminated for reason: {shutdown,{error,enotconn}}
>>
>> I searched for it on Google and found related articles. However, there's no solution.
>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-October/016052.html
>>
>> On Mon, Aug 10, 2015 at 10:09 PM, Dmitri Zagidulin dzagidu...@basho.com wrote:
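The recovery sequence described in this reply (force-replace if the data directories were backed up, force-remove otherwise, then join the new nodes, then plan and commit) can be laid out as an ordered checklist. The following is only a sketch in Python that builds the command lines without running them. It assumes the staged `riak-admin cluster ...` spelling of the subcommands rather than the shorter forms typed in the message, and the function name `recovery_steps` and the address `riak@10.21.136.95` are hypothetical, introduced purely for illustration.

```python
from typing import List, Tuple

# (node the command would be run on, argv of the command)
Step = Tuple[str, List[str]]


def recovery_steps(seed: str, dead_nodes: List[str], new_nodes: List[str],
                   have_backup: bool) -> List[Step]:
    """Build the ordered command list; nothing is executed here."""
    steps: List[Step] = []
    if have_backup:
        # Data directories were restored: point each old name at its new one.
        if len(dead_nodes) != len(new_nodes):
            raise ValueError("force-replace needs one new node per old node")
        for old, new in zip(dead_nodes, new_nodes):
            steps.append((seed, ["riak-admin", "cluster", "force-replace", old, new]))
    else:
        # No backup: drop the unreachable nodes from the ring.
        for old in dead_nodes:
            steps.append((seed, ["riak-admin", "cluster", "force-remove", old]))
    # Assumption: each new node stages its own join against a surviving member.
    for new in new_nodes:
        steps.append((new, ["riak-admin", "cluster", "join", seed]))
    # Review the staged changes, then apply them.
    steps.append((seed, ["riak-admin", "cluster", "plan"]))
    steps.append((seed, ["riak-admin", "cluster", "commit"]))
    return steps


if __name__ == "__main__":
    for host, argv in recovery_steps("riak@10.21.136.66",
                                     ["riak@10.21.136.91"],
                                     ["riak@10.21.136.95"],  # hypothetical new name
                                     have_backup=True):
        print(f"[on {host}] {' '.join(argv)}")
```

Printing the list first and reading the `plan` output before `commit` keeps the operation reviewable; whether a join is still needed after a force-replace depends on the cluster state, so treat the ordering as the one given in the message, not as a verified procedure.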
Re: Getting HTTP port info for all nodes in a cluster
If you know one node's HTTP listening port, you know them all -- all the nodes are supposed to listen on the same ports. (Otherwise, load balancing gets awkward, etc.) A case where different nodes in the cluster are listening on different ports is exotic enough to not be worth supporting. (Plus, there's no way to programmatically discover them from Go.)

You can definitely discover the IP addresses of the other nodes via the API, though. But not the ports.

On Wed, Aug 12, 2015 at 4:16 AM, Mark Wilkinson m...@m82labs.com wrote:

> Is there a way via the HTTP API to get the HTTP port the other nodes on a cluster are listening on, assuming of course that I know one of the nodes' listening port?
>
> I am trying to find a way to pass in connection information for one of the nodes in a cluster to an app I am writing in Go, and then programmatically determine the addresses of the other nodes in the cluster so I can connect to them and gather performance data.
>
> Assuming I always use default ports, this isn't an issue, but I would like this to work for non-default, unknown ports.
>
> Thanks for any suggestions!

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Problem with Riak not thinking hostname is fully-qualified
Put a period at the end.

On Wed, Aug 12, 2015 at 8:52 PM, Toby Corkindale t...@dryft.net wrote:

> I'm trying to set up another cluster, and I'm hitting problems with Riak complaining that
>
>     ** System running to use fully qualified hostnames **
>     ** Hostname db04 is illegal **
>
> However, as far as I can see, the host does have a fully-qualified hostname:
>
>     root@db04:/# hostname
>     db04
>     root@db04:/# hostname -f
>     db04.dc.sdlocal.net
>
> /etc/hosts contains a line with both the short and fqdn names in it.
>
> Is there something else I should be checking?
>
> Cheers
> Toby
Re: Problem with Riak not thinking hostname is fully-qualified
I should have mentioned, the actual riak.conf has nodename set to the FQDN. The FQDN resolves correctly in our DNS.

On Thu, 13 Aug 2015 at 13:52 Toby Corkindale t...@dryft.net wrote:

> I'm trying to set up another cluster, and I'm hitting problems with Riak complaining that
>
>     ** System running to use fully qualified hostnames **
>     ** Hostname db04 is illegal **
>
> However, as far as I can see, the host does have a fully-qualified hostname:
>
>     root@db04:/# hostname
>     db04
>     root@db04:/# hostname -f
>     db04.dc.sdlocal.net
>
> /etc/hosts contains a line with both the short and fqdn names in it.
>
> Is there something else I should be checking?
>
> Cheers
> Toby
Re: Getting HTTP port info for all nodes in a cluster
Thanks for the thoughts here. I do get a bit carried away sometimes, wanting to support fringe cases. I'll go forward assuming the same port number.

On 2015-08-12 10:27, Dmitri Zagidulin wrote:

> If you know one node's HTTP listening port, you know them all -- all the nodes are supposed to listen on the same ports. (Otherwise, load balancing gets awkward, etc.) A case where different nodes in the cluster are listening on different ports is exotic enough to not be worth supporting. (Plus, there's no way to programmatically discover them from Go.)
>
> You can definitely discover the IP addresses of the other nodes via the API, though. But not the ports.
>
> On Wed, Aug 12, 2015 at 4:16 AM, Mark Wilkinson m...@m82labs.com wrote:
>
>> Is there a way via the HTTP API to get the HTTP port the other nodes on a cluster are listening on, assuming of course that I know one of the nodes' listening port?
>>
>> I am trying to find a way to pass in connection information for one of the nodes in a cluster to an app I am writing in Go, and then programmatically determine the addresses of the other nodes in the cluster so I can connect to them and gather performance data.
>>
>> Assuming I always use default ports, this isn't an issue, but I would like this to work for non-default, unknown ports.
>>
>> Thanks for any suggestions!
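The approach settled on in this thread (take one known node, discover the other members, and reuse the same HTTP port for all of them) can be sketched roughly as follows. The example is in Python rather than Go, and it is only a sketch: it assumes the known node's `/stats` response carries a `ring_members` list of Erlang node names such as `riak@10.21.136.66`, and that the host part of each name is reachable over HTTP. The function name `peer_base_urls` and the sample payload are made up for illustration.

```python
import json
from typing import List


def peer_base_urls(stats_json: str, http_port: int, scheme: str = "http") -> List[str]:
    """Turn the ring member list of one node into base URLs on a shared port."""
    stats = json.loads(stats_json)
    urls: List[str] = []
    # Assumption: 'ring_members' holds node names of the form 'name@host'.
    for member in stats.get("ring_members", []):
        _, _, host = member.partition("@")
        if host:  # skip entries without a host part
            urls.append(f"{scheme}://{host}:{http_port}")
    return urls


if __name__ == "__main__":
    # Hand-written sample standing in for the body of GET /stats on the known node.
    sample = json.dumps({
        "nodename": "riak@10.21.136.66",
        "ring_members": ["riak@10.21.136.66", "riak@10.21.136.71"],
    })
    for url in peer_base_urls(sample, 8098):
        print(url)
```

The port is passed in by the caller because, as noted above, it is not discoverable; 8098 is the default HTTP port. If the Erlang node names use hostnames that differ from the HTTP listen addresses, the resulting URLs would need adjusting.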
why leaving riak cluster so slowly and how to accelerate the speed
Hi Riak users,

Before adding new nodes, the cluster had only five nodes. The member list is as below: 10.21.136.66, 10.21.136.71, 10.21.136.76, 10.21.136.81, 10.21.136.86. We did not set up an HTTP proxy for the cluster; only one node of the cluster provides the HTTP service, so the CPU load is always high on this node.

After that, I added four nodes (10.21.136.[91-94]) to the cluster. During the ring/data balance process, each node failed (riak stopped) because a disk was 100% full. I had used multiple disk paths for the data_root parameter in '/etc/riak/app.config'. Each disk is only 580MB in size. As you know, the bitcask storage engine does not support multiple disk paths. After one of the disks is 100% full, it cannot switch to the next idle disk, so the riak service went down.

After that, I removed the four newly added nodes from the active nodes with "riak-admin cluster leave riak@'10.21.136.91'", then stopped the riak service on the other active new nodes and reformatted the new nodes with LVM disk management (binding 6 disks into one virtual disk group). I replaced the data-root parameter with a single folder, and then started the riak service again. After that, the cluster began the data balance again.

That's the whole story. My questions are as below:

1. What is the current status of the whole cluster? Is it doing data balance?

2. There are many errors in the error log of one of the nodes. How should I handle them?

    2015-08-05 01:38:59.717 [error] 0.23000.298@riak_core_handoff_sender:start_fold:262 ownership_transfer transfer of riak_kv_vnode from 'riak@10.21.136.81' 525227150915793236229449236757414210188850757632 to 'riak@10.21.136.94' 525227150915793236229449236757414210188850757632 failed because of enotconn
    2015-08-05 01:38:59.718 [error] 0.195.0@riak_core_handoff_manager:handle_info:289 An outbound handoff of partition riak_kv_vnode 525227150915793236229449236757414210188850757632 was terminated for reason: {shutdown,{error,enotconn}}

   During the last 5 days, there have been no changes in the 'riak-admin member-status' output.

3. How can I accelerate the data balance?

Amao
Re: why leaving riak cluster so slowly and how to accelerate the speed
Hi Riak users,

Before adding new nodes, the cluster had only five nodes. The member list is as below: 10.21.136.66, 10.21.136.71, 10.21.136.76, 10.21.136.81, 10.21.136.86. We did not set up an HTTP proxy for the cluster; only one node of the cluster provides the HTTP service, so the CPU load is always high on this node.

After that, I added four nodes (10.21.136.[91-94]) to the cluster. During the ring/data balance process, each node failed (riak stopped) because a disk was 100% full. I had used multiple disk paths for the data_root parameter in '/etc/riak/app.config'. Each disk is only 580MB in size. As you know, the bitcask storage engine does not support multiple disk paths. After one of the disks is 100% full, it cannot switch to the next idle disk, so the riak service went down.

After that, I removed the four newly added nodes from the active nodes with "riak-admin cluster leave riak@'10.21.136.91'", then stopped the riak service on the other active new nodes and reformatted the new nodes with LVM disk management (binding 6 disks into one virtual disk group). I replaced the data-root parameter with a single folder, and then started the riak service again. After that, the cluster began the data balance again.

That's the whole story.

Amao

----- Original Message -----
From: Dmitri Zagidulin dzagidu...@basho.com
To: Changmao.Wang changmao.w...@datayes.com
Sent: Thursday, August 6, 2015 10:46:59 PM
Subject: Re: why leaving riak cluster so slowly and how to accelerate the speed

Hi Amao,

Can you explain a bit more which steps you've taken, and what the problem is? Which nodes have been added, and which nodes are leaving the cluster?

On Tue, Jul 28, 2015 at 11:03 PM, Changmao.Wang changmao.w...@datayes.com wrote:

> Hi Riak user group,
>
> I'm using riak and riak-cs 1.4.2. Last weekend, I added four nodes to a cluster of 5 nodes. However, it failed with one of the disks 100% full. As you know, the bitcask storage engine cannot support multiple folders.
>
> After that, I restarted riak and left the cluster with the commands 'riak-admin cluster leave' and 'riak-admin cluster plan', and then the commit. However, riak has been doing KV balancing ever since I submitted the leave command. I guess that it is still in the cluster join process.
>
> Could you show us how to accelerate the leaving process? I have tuned the transfer-limit parameters on all 9 nodes.
>
> Below is the output of some commands:
>
>     riak-admin member-status
>     ================================= Membership ==================================
>     Status     Ring    Pending    Node
>     -------------------------------------------------------------------------------
>     leaving     6.3%    10.9%    'riak@10.21.136.91'
>     leaving     9.4%    10.9%    'riak@10.21.136.92'
>     leaving     6.3%    10.9%    'riak@10.21.136.93'
>     leaving     6.3%    10.9%    'riak@10.21.136.94'
>     valid      10.9%    10.9%    'riak@10.21.136.66'
>     valid      12.5%    10.9%    'riak@10.21.136.71'
>     valid      18.8%    10.9%    'riak@10.21.136.76'
>     valid      18.8%    12.5%    'riak@10.21.136.81'
>     valid      10.9%    10.9%    'riak@10.21.136.86'
>
>     riak-admin transfer_limit
>     =============================== Transfer Limit ================================
>     Limit        Node
>     -------------------------------------------------------------------------------
>       200        'riak@10.21.136.66'
>       200        'riak@10.21.136.71'
>       100        'riak@10.21.136.76'
>       100        'riak@10.21.136.81'
>       200        'riak@10.21.136.86'
>       500        'riak@10.21.136.91'
>       500        'riak@10.21.136.92'
>       500        'riak@10.21.136.93'
>       500        'riak@10.21.136.94'
>
> Do you need any more details to diagnose the problem?
>
> Amao
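Comparing `riak-admin member-status` output by eye over days or weeks, as done in this thread, is easy to get wrong. A small parser makes it simple to list which nodes are still `leaving` and whether their ring share has moved between two captures. The following is a minimal sketch in Python, assuming rows shaped like the ones pasted above (status, ring percentage, pending percentage or `--`, quoted node name); the function names are hypothetical.

```python
import re
from typing import Dict, List

# One row: status, ring %, pending % (or '--'), quoted node name.
ROW = re.compile(r"^\s*(\w+)\s+([\d.]+)%\s+(--|[\d.]+%)\s+'\s*([^'\s]+)\s*'\s*$")


def parse_member_status(text: str) -> List[Dict[str, object]]:
    """Extract the node rows; headers, rulers and the summary line are skipped."""
    rows: List[Dict[str, object]] = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            status, ring, pending, node = match.groups()
            rows.append({"status": status, "ring": float(ring),
                         "pending": pending, "node": node})
    return rows


def leaving_nodes(text: str) -> List[str]:
    """Names of the nodes whose status column reads 'leaving'."""
    return [str(r["node"]) for r in parse_member_status(text) if r["status"] == "leaving"]


if __name__ == "__main__":
    sample = """
Status     Ring    Pending    Node
leaving     6.3%    10.9%    'riak@10.21.136.91'
valid      10.9%    10.9%    'riak@10.21.136.66'
Valid:5 / Leaving:4 / Exiting:0 / Joining:0 / Down:0
"""
    print(leaving_nodes(sample))
```

Running the parser on two captures taken some time apart and diffing the `ring` values per node shows whether ownership is actually moving; if the numbers never change, the handoff is stalled and is not merely slow.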
What in RiakS2 do AS3
I'm new to both S3 and Riak -- I'm trying to set up a local RiakS2 to run our integration tests (instead of writing to S3) -- where do I find or configure the Access Key ID and Secret Key for the S3 API?

The only thing I've found on the Basho site relates to setting up an s3cmd client. It provides this cryptic suggestion:

    To configure the s3cmd client for the user, you must change the access_key and secret_key settings.

Any help is appreciated.

Brad Aisa
Senior Software Engineer
brad.a...@thetradedesk.com
cell: 720-233-0225
1615 Pearl St., Boulder, CO 80302
Getting HTTP port info for all nodes in a cluster
Is there a way via the HTTP API to get the HTTP port the other nodes on a cluster are listening on, assuming of course that I know one of the nodes' listening port?

I am trying to find a way to pass in connection information for one of the nodes in a cluster to an app I am writing in Go, and then programmatically determine the addresses of the other nodes in the cluster so I can connect to them and gather performance data.

Assuming I always use default ports, this isn't an issue, but I would like this to work for non-default, unknown ports.

Thanks for any suggestions!