Re: why leaving riak cluster so slowly and how to accelerate the speed

2015-08-12 Thread Dmitri Zagidulin
Responses inline.


On Tue, Aug 11, 2015 at 12:53 PM, changmao wang wang.chang...@gmail.com
wrote:

 1. About backing up the four new nodes and then using 'riak-admin
 force-replace': what is the status of the newly added nodes?
 As you know, we want to replace one of the leaving nodes.


I don't understand the question. Doing 'riak-admin force-replace' on one of
the nodes that's leaving should overwrite the leave request and tell it to
change its node id / ip address. (If that doesn't work, stop the leaving
node, and do a 'riak-admin reip' command instead).



 2. What is the risk of running 'riak-admin force-remove' 'riak@10.21.136.91'
 without a backup?
 As you know, the node (riak@10.21.136.91) is currently a member of the
 cluster and holds almost 2.5TB of data, maybe 10 percent of the whole cluster.


The only reason I asked about backup is because it sounded like you cleared
the disk on it. If it currently has the data, then it'll be fine.
Force-replace just changes the node name / IP address, and doesn't delete the
data or anything.


On Tue, Aug 11, 2015 at 7:32 PM, Dmitri Zagidulin dzagidu...@basho.com
wrote:

 1. How can I force the leaving nodes out without data loss?

 This depends on whether you backed up the data directory of the 4 new nodes
 before you reformatted them.
 If you backed them up (and then restored the data directory once you
 reformatted them), you can try:

 riak-admin cluster force-replace 'riak@10.21.136.91' 'riak@<new ip address for that node>'
 (same for the other 3)

 If you did not back up those nodes, the only thing you can do is force
 them to leave, and then join the new ones. So, for each of the 4:

 riak-admin cluster force-remove 'riak@10.21.136.91'
 (same for the other 3)

 In either case, after force-replacing or force-removing, you have to join
 the new nodes to the cluster, before you commit.

 riak-admin cluster join 'riak@10.21.136.66'
 (run this on each of the 4 new nodes)
 and finally:
 riak-admin cluster plan
 riak-admin cluster commit

 As for the error, the reason you're seeing it, is because the other nodes
 can't contact the 4 that are supposed to be leaving. (Since you wiped them).
 The amount of time that passed doesn't matter, the cluster will be waiting
 for those nodes to leave indefinitely, unless you force-remove or
 force-replace.
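
The no-backup path described above can be sketched as a small script that only builds the command list for review and runs nothing. It assumes the staged-clustering subcommands (`riak-admin cluster force-remove`, `cluster join`, `cluster plan`, `cluster commit`). The helper name `removal_plan` and the choice of 'riak@10.21.136.66' as the healthy member are illustrative, not something from the thread. Whether everything can be staged in a single plan on your version is worth checking against the `cluster plan` output before committing.

```python
# Sketch only: builds the command list for review; nothing is executed.
# Assumes the staged-clustering CLI ("riak-admin cluster ...").

WIPED_NODES = [f"riak@10.21.136.{n}" for n in (91, 92, 93, 94)]
HEALTHY_NODE = "riak@10.21.136.66"  # illustrative: any valid member would do


def removal_plan(wiped_nodes, new_nodes, healthy_node):
    """Return (where_to_run, command) pairs for the force-remove path."""
    steps = []
    # 1. On a healthy member: stage a force-remove for each wiped node.
    for node in wiped_nodes:
        steps.append((healthy_node, f"riak-admin cluster force-remove '{node}'"))
    # 2. On each new node: stage a join through the healthy member.
    for node in new_nodes:
        steps.append((node, f"riak-admin cluster join '{healthy_node}'"))
    # 3. On the healthy member: review the staged changes, then commit.
    steps.append((healthy_node, "riak-admin cluster plan"))
    steps.append((healthy_node, "riak-admin cluster commit"))
    return steps


if __name__ == "__main__":
    # In this thread the reinstalled nodes kept their addresses,
    # so the same names are joined back; substitute as needed.
    for host, command in removal_plan(WIPED_NODES, WIPED_NODES, HEALTHY_NODE):
        print(f"[{host}] {command}")
```

Printing the steps first makes it easy to read the whole sequence over before touching the cluster.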



 On Tue, Aug 11, 2015 at 1:32 AM, changmao wang wang.chang...@gmail.com
 wrote:

 Hi Dmitri,

 For your question,
 3) Re-formatted those four nodes and re-installed Riak. Here is where it
 gets tricky though. Several questions for you:
 - Did you attempt to re-join those 4 reinstalled nodes into the cluster?
 What was the output of the cluster join and cluster plan commands?
 - Did the IP address change, after they were reformatted? If so, you
 probably need to use something like 'reip' at this point:
 http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip

 I did NOT try to re-join those 4 reinstalled nodes into the cluster. As you
 know, member-status shows that they're leaving, as below:
 riak-admin member-status
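 ================================= Membership ==================================
 Status     Ring    Pending    Node
 -------------------------------------------------------------------------------
 leaving    10.9%     10.9%    'riak@10.21.136.91'
 leaving     9.4%     10.9%    'riak@10.21.136.92'
 leaving     7.8%     10.9%    'riak@10.21.136.93'
 leaving     7.8%     10.9%    'riak@10.21.136.94'
 valid      10.9%     10.9%    'riak@10.21.136.66'
 valid      10.9%     10.9%    'riak@10.21.136.71'
 valid      14.1%     10.9%    'riak@10.21.136.76'
 valid      17.2%     12.5%    'riak@10.21.136.81'
 valid      10.9%     10.9%    'riak@10.21.136.86'
 -------------------------------------------------------------------------------
 Valid:5 / Leaving:4 / Exiting:0 / Joining:0 / Down:0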
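
To watch whether output like this changes over days, it may help to reduce it to a list of nodes per status. The following is a minimal sketch, assuming rows of the shape "status ring% pending% 'node'"; the function name `nodes_by_status` is made up for illustration. The pattern is deliberately loose about whitespace so that rows pasted from mail with collapsed columns still match.

```python
import re
from collections import defaultdict

# One data row of `riak-admin member-status`:
# lowercase status, ring %, pending % (or "--"), quoted node name.
ROW = re.compile(r"^\s*([a-z]+)\s*([\d.]+)%\s*(--|[\d.]+%)\s*'([^']+)'\s*$")


def nodes_by_status(text):
    """Group node names by their status column; non-row lines are skipped."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            status, _ring, _pending, node = match.groups()
            groups[status].append(node)
    return dict(groups)


SAMPLE = """\
Status     Ring    Pending    Node
leaving    10.9%     10.9%    'riak@10.21.136.91'
leaving     9.4%     10.9%    'riak@10.21.136.92'
valid      10.9%     10.9%    'riak@10.21.136.66'
Valid:5 / Leaving:4 / Exiting:0 / Joining:0 / Down:0
"""

if __name__ == "__main__":
    for status, nodes in nodes_by_status(SAMPLE).items():
        print(status, len(nodes), ", ".join(nodes))
```

Comparing the "leaving" list between two runs gives a quick signal of whether any node has actually finished leaving.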

 Two weeks have elapsed and 'riak-admin member-status' still shows the same
 result. I don't know which step the ring handoff is stuck at.

 I have not changed the IP addresses of the four newly added nodes.

 My questions:

 1. How can I force the leaving nodes out without data loss?
 2. I have found some errors related to partition handoff in
 /etc/riak/log/errors.
 Details are below:

 2015-07-30 16:04:33.643 [error]
 0.12872.15@riak_core_handoff_sender:start_fold:262 ownership_transfer
 transfer of riak_kv_vnode from 'riak@10.21.136.76'
 45671926166590716193865151022383844364247891968 to 'riak@10.21.136.93'
 45671926166590716193865151022383844364247891968 failed because of enotconn
 2015-07-30 16:04:33.643 [error]
 0.197.0@riak_core_handoff_manager:handle_info:289 An outbound handoff of
 partition riak_kv_vnode 45671926166590716193865151022383844364247891968 was
 terminated for reason: {shutdown,{error,enotconn}}
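
When the error log is full of entries like these, tallying them by target node shows quickly whether all failures point at the same unreachable nodes. This is a rough sketch that assumes the message wording shown above ("transfer of ... from ... to ... failed because of ..."); the function name `failed_handoffs` is hypothetical. Whitespace is flattened first because the entries arrive wrapped across several lines in mail.

```python
import re
from collections import Counter

# Matches the handoff failure message quoted in this thread.
FAILED = re.compile(
    r"transfer of (\w+) from '([^']+)' (\d+) to '([^']+)' (\d+) "
    r"failed because of (\w+)"
)


def failed_handoffs(log_text):
    """Count failed transfers per (target node, reason)."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return Counter((m.group(4), m.group(6)) for m in FAILED.finditer(flat))


SAMPLE = """\
2015-07-30 16:04:33.643 [error]
0.12872.15@riak_core_handoff_sender:start_fold:262 ownership_transfer
transfer of riak_kv_vnode from 'riak@10.21.136.76'
45671926166590716193865151022383844364247891968 to 'riak@10.21.136.93'
45671926166590716193865151022383844364247891968 failed because of enotconn
"""

if __name__ == "__main__":
    for (target, reason), count in failed_handoffs(SAMPLE).items():
        print(f"{target}: {count} x {reason}")
```

If every failure names one of the wiped nodes as the target, that fits the explanation that the rest of the cluster simply cannot reach them.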



 I searched Google and found related posts, but none of them offers a
 solution.

 http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-October/016052.html


 On Mon, Aug 10, 2015 at 10:09 PM, Dmitri Zagidulin dzagidu...@basho.com
 wrote:

 

Re: Getting HTTP port info for all nodes in a cluster

2015-08-12 Thread Dmitri Zagidulin
If you know one node's HTTP listening port, you know them all -- all the
nodes are supposed to listen on the same ports. (Otherwise, load balancing
gets awkward, etc).

A case where different nodes in the cluster are listening on different
ports is exotic enough to not be worth supporting. (Plus, there's no way to
programmatically discover them from Go).

You can definitely discover the ip addresses of the other nodes via API,
though. But not the ports.
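
A minimal sketch of that approach, written in Python for brevity (the logic carries over to Go directly): take the one known node URL, take the list of node names, and reuse the known port for every host. It assumes the node names come from the `ring_members` field of the `/stats` HTTP endpoint, which is an assumption on my part and worth checking against your version; `node_stats_urls` is a made-up helper name.

```python
from urllib.parse import urlsplit


def node_stats_urls(known_url, ring_members):
    """Build a /stats URL per member, reusing the known node's scheme and port.

    ring_members is a list of node names such as 'riak@10.21.136.66'
    (assumed to come from the "ring_members" field of /stats).
    """
    parts = urlsplit(known_url)
    if parts.port is None:
        raise ValueError("known_url must include an explicit port")
    urls = []
    for member in ring_members:
        # The host is whatever follows the '@' in the node name.
        _name, _sep, host = member.partition("@")
        if host:
            urls.append(f"{parts.scheme}://{host}:{parts.port}/stats")
    return urls


if __name__ == "__main__":
    members = ["riak@10.21.136.66", "riak@10.21.136.71"]
    for url in node_stats_urls("http://10.21.136.66:8098", members):
        print(url)
```

This only works under the same-port assumption stated above; a cluster with per-node ports would need the ports supplied some other way.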

On Wed, Aug 12, 2015 at 4:16 AM, Mark Wilkinson m...@m82labs.com wrote:

 Is there a way via the HTTP API to get the HTTP port the other nodes on
 a cluster are listening on, assuming of course that I know one of the
 nodes' listening port?

 I am trying to find a way to pass in connection information for one of
 the nodes in a cluster to an app I am writing in Go, and then
 programmatically determine the addresses of the other nodes in the
 cluster so I can connect to them and gather performance data. Assuming I
 always use default ports, this isn't an issue, but I would like this to
 work for non-default, unknown ports.

 Thanks for any suggestions!


 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Problem with Riak not thinking hostname is fully-qualified

2015-08-12 Thread Sargun Dhillon
Put a period at the end.

On Wed, Aug 12, 2015 at 8:52 PM, Toby Corkindale t...@dryft.net wrote:
 I'm trying to set up another cluster, and I'm hitting problems with Riak
 complaining that ** System running to use fully qualified hostnames **
 ** Hostname db04 is illegal **

 However, as far as I can see, the host does have a fully-qualified hostname:

 root@db04:/# hostname
 db04
 root@db04:/# hostname -f
 db04.dc.sdlocal.net

 /etc/hosts contains a line with both the short and fqdn names in it.

 Is there something else I should be checking?

 Cheers
 Toby



Re: Problem with Riak not thinking hostname is fully-qualified

2015-08-12 Thread Toby Corkindale
I should have mentioned, the actual riak.conf has nodename set to the FQDN.
The FQDN resolves correctly in our DNS.
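
For checking node names before starting a node, here is a small sketch of the rule as I understand it: with Erlang long names, the host part after the '@' is expected to contain at least one dot. That rule, and the function name `long_name_problem`, are assumptions for illustration rather than anything taken from the Riak source.

```python
def long_name_problem(nodename):
    """Return a description of the problem, or None if the name looks usable.

    Assumption: with long names the host part must contain a dot.
    """
    name, sep, host = nodename.partition("@")
    if not sep or not name or not host:
        return "expected the form name@host"
    if "." not in host:
        return f"host part {host!r} has no dot, so it is not fully qualified"
    return None


if __name__ == "__main__":
    for candidate in ("riak@db04", "riak@db04.dc.sdlocal.net"):
        print(candidate, "->", long_name_problem(candidate) or "looks fine")
```

Running it over every place a node name is configured can help find where the short name 'db04' is still being used.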

On Thu, 13 Aug 2015 at 13:52 Toby Corkindale t...@dryft.net wrote:

 I'm trying to set up another cluster, and I'm hitting problems with Riak
 complaining that ** System running to use fully qualified hostnames **
 ** Hostname db04 is illegal **

 However, as far as I can see, the host does have a fully-qualified
 hostname:

 root@db04:/# hostname
 db04
 root@db04:/# hostname -f
 db04.dc.sdlocal.net

 /etc/hosts contains a line with both the short and fqdn names in it.

 Is there something else I should be checking?

 Cheers
 Toby



Re: Getting HTTP port info for all nodes in a cluster

2015-08-12 Thread mark
Thanks for the thoughts here. I do get a bit carried away sometimes, 
wanting to support fringe cases. I'll go forward assuming the same port 
number.


On 2015-08-12 10:27, Dmitri Zagidulin wrote:

If you know one node's HTTP listening port, you know them all -- all
the nodes are supposed to listen on the same ports. (Otherwise, load
balancing gets awkward, etc).

A case where different nodes in the cluster are listening on different
ports is exotic enough to not be worth supporting. (Plus, there's no
way to programmatically discover them from Go).

You can definitely discover the ip addresses of the other nodes via
API, though. But not the ports.

On Wed, Aug 12, 2015 at 4:16 AM, Mark Wilkinson m...@m82labs.com wrote:

Is there a way via the HTTP API to get the HTTP port the other nodes on
a cluster are listening on, assuming of course that I know one of the
nodes' listening port?

I am trying to find a way to pass in connection information for one of
the nodes in a cluster to an app I am writing in Go, and then
programmatically determine the addresses of the other nodes in the
cluster so I can connect to them and gather performance data. Assuming I
always use default ports, this isn't an issue, but I would like this to
work for non-default, unknown ports.

Thanks for any suggestions!



why leaving riak cluster so slowly and how to accelerate the speed

2015-08-12 Thread Changmao.Wang
Hi Riak users,

Before adding new nodes, the cluster had only five nodes. The member list is
as below:
10.21.136.66,10.21.136.71,10.21.136.76,10.21.136.81,10.21.136.86.
We did not set up an HTTP proxy for the cluster; only one node of the cluster
provides the HTTP service, so the CPU load is always high on this node.

After that, I added four nodes (10.21.136.[91-94]) to the cluster. During the
ring/data rebalancing, each node failed (riak stopped) because a disk was 100%
full.
I had set the data_root parameter in '/etc/riak/app.config' to a multi-disk
path. Each disk is only 580MB in size.
As you know, the bitcask storage engine does not support multi-disk paths. Once
one of the disks is 100% full, it cannot switch to the next idle disk, so the
riak service goes down.

After that, I removed the four newly added nodes, running riak-admin
cluster leave 'riak@10.21.136.91' on the active nodes,
then stopped the riak service on the other active new nodes and reformatted
the new nodes with LVM disk management (binding 6 disks into a virtual disk
group).
I replaced the data_root parameter with a single folder and then started the
riak service again. After that, the cluster began rebalancing data again.
That's the whole story.

My questions are as below:
1. What is the current status of the whole cluster? Is it rebalancing data?
2. There are many errors in the error log of one of the nodes. How should I
handle them?
2015-08-05 01:38:59.717 [error] 
0.23000.298@riak_core_handoff_sender:start_fold:262 ownership_transfer 
transfer of riak_kv_vnode from 'riak@10.21.136.81' 
525227150915793236229449236757414210188850757632 to 'riak@10.21.136.94' 
525227150915793236229449236757414210188850757632 failed because of enotconn
2015-08-05 01:38:59.718 [error] 
0.195.0@riak_core_handoff_manager:handle_info:289 An outbound handoff of 
partition riak_kv_vnode 525227150915793236229449236757414210188850757632 was 
terminated for reason: {shutdown,{error,enotconn}}

During the last 5 days, there have been no changes in the 'riak-admin
member-status' output.
3. How can I accelerate the data rebalancing?

Amao



Re: why leaving riak cluster so slowly and how to accelerate the speed

2015-08-12 Thread Changmao.Wang
Hi Riak users, 

Before adding new nodes, the cluster had only five nodes. The member list is
as below:
10.21.136.66,10.21.136.71,10.21.136.76,10.21.136.81,10.21.136.86.
We did not set up an HTTP proxy for the cluster; only one node of the cluster
provides the HTTP service, so the CPU load is always high on this node.

After that, I added four nodes (10.21.136.[91-94]) to the cluster. During the
ring/data rebalancing, each node failed (riak stopped) because a disk was 100%
full.
I had set the data_root parameter in '/etc/riak/app.config' to a multi-disk
path. Each disk is only 580MB in size.
As you know, the bitcask storage engine does not support multi-disk paths. Once
one of the disks is 100% full, it cannot switch to the next idle disk, so the
riak service goes down.

After that, I removed the four newly added nodes, running riak-admin
cluster leave 'riak@10.21.136.91' on the active nodes,
then stopped the riak service on the other active new nodes and reformatted
the new nodes with LVM disk management (binding 6 disks into a virtual disk
group).
I replaced the data_root parameter with a single folder and then started the
riak service again. After that, the cluster began rebalancing data again.
That's the whole story.


Amao 

- Original Message -

From: Dmitri Zagidulin dzagidu...@basho.com 
To: Changmao.Wang changmao.w...@datayes.com 
Sent: Thursday, August 6, 2015 10:46:59 PM 
Subject: Re: why leaving riak cluster so slowly and how to accelerate the speed 

Hi Amao, 

Can you explain a bit more which steps you've taken, and what the problem is? 

Which nodes have been added, and which nodes are leaving the cluster? 

On Tue, Jul 28, 2015 at 11:03 PM, Changmao.Wang  changmao.w...@datayes.com  
wrote: 


Hi Riak user group,

I'm using riak and riak-cs 1.4.2. Last weekend, I added four nodes to a
cluster of 5 nodes. However, this failed when one of the disks became 100% full.
As you know, the bitcask storage engine does not support multiple folders.

After that, I restarted riak and had the nodes leave the cluster with the
commands riak-admin cluster leave and riak-admin cluster plan, and then the
commit. However, riak has been doing KV rebalancing ever since I submitted the
leave command. I guess that it is still working through the join.

Could you show us how to accelerate the leaving process? I have tuned the
transfer-limit parameters on the 9 nodes.

Below is some command output:
riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving     6.3%     10.9%    'riak@10.21.136.91'
leaving     9.4%     10.9%    'riak@10.21.136.92'
leaving     6.3%     10.9%    'riak@10.21.136.93'
leaving     6.3%     10.9%    'riak@10.21.136.94'
valid      10.9%     10.9%    'riak@10.21.136.66'
valid      12.5%     10.9%    'riak@10.21.136.71'
valid      18.8%     10.9%    'riak@10.21.136.76'
valid      18.8%     12.5%    'riak@10.21.136.81'
valid      10.9%     10.9%    'riak@10.21.136.86'
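
The percentages are easier to reason about as partition counts. The sketch below converts the Ring and Pending columns back to whole partitions and shows how many each node would still gain or give up. It assumes a ring of 64 partitions (Riak's default ring size, not stated anywhere in the thread); the helper names are made up for illustration.

```python
RING_SIZE = 64  # assumption: Riak's default ring size; not stated in the thread

# node -> (Ring %, Pending %), copied from the member-status output above
MEMBERS = {
    "riak@10.21.136.91": (6.3, 10.9),
    "riak@10.21.136.92": (9.4, 10.9),
    "riak@10.21.136.93": (6.3, 10.9),
    "riak@10.21.136.94": (6.3, 10.9),
    "riak@10.21.136.66": (10.9, 10.9),
    "riak@10.21.136.71": (12.5, 10.9),
    "riak@10.21.136.76": (18.8, 10.9),
    "riak@10.21.136.81": (18.8, 12.5),
    "riak@10.21.136.86": (10.9, 10.9),
}


def to_partitions(percent, ring_size=RING_SIZE):
    """Convert a rounded percentage back to a whole number of partitions."""
    return round(percent / 100 * ring_size)


def pending_moves(members, ring_size=RING_SIZE):
    """Partitions each node still has to gain (+) or give up (-)."""
    return {
        node: to_partitions(pending, ring_size) - to_partitions(ring, ring_size)
        for node, (ring, pending) in members.items()
    }


if __name__ == "__main__":
    for node, delta in pending_moves(MEMBERS).items():
        print(f"{node}: {delta:+d}")
```

With these numbers the Ring column adds up to exactly 64 partitions, which is what makes the 64-partition assumption plausible, and the gains and losses cancel out at 10 partitions still to be transferred.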

riak-admin transfer_limit
=============================== Transfer Limit ================================
Limit        Node
-------------------------------------------------------------------------------
  200        'riak@10.21.136.66'
  200        'riak@10.21.136.71'
  100        'riak@10.21.136.76'
  100        'riak@10.21.136.81'
  200        'riak@10.21.136.86'
  500        'riak@10.21.136.91'
  500        'riak@10.21.136.92'
  500        'riak@10.21.136.93'
  500        'riak@10.21.136.94'

Do you need any more details to diagnose the problem?

Amao 



What in RiakS2 do AS3

2015-08-12 Thread Brad Aisa
I'm new to both S3 and Riak -- I'm trying to set up local RiakS2 to run our 
integration tests (instead of writing to S3) -- where do I find or configure 
Access Key ID and Secret Key for the S3 api? The only thing I've found on 
the Basho site relates to setting up a S3cmd client. It provides this cryptic
suggestion:
"To configure the s3cmd client for the user, you must change the access_key and
secret_key settings."
Any help is appreciated.

Brad Aisa  Senior Software Engineer
brad.a...@thetradedesk.com
cell: 720-233-0225
1615 Pearl St., Boulder, CO 80302



Getting HTTP port info for all nodes in a cluster

2015-08-12 Thread Mark Wilkinson
Is there a way via the HTTP API to get the HTTP port the other nodes on
a cluster are listening on, assuming of course that I know one of the
nodes' listening port?

I am trying to find a way to pass in connection information for one of
the nodes in a cluster to an app I am writing in Go, and then
programmatically determine the addresses of the other nodes in the
cluster so I can connect to them and gather performance data. Assuming I
always use default ports, this isn't an issue, but I would like this to
work for non-default, unknown ports.

Thanks for any suggestions!

