Re: [Gluster-users] [Gluster-devel] Gluster on an ARM system
On Friday 12 August 2011 08:48 AM, Emmanuel Dreyfus wrote:
> John Mark Walker jwal...@gluster.com wrote:
>> I've CC'd the gluster-devel list in the hopes that someone there can help you out. However, my understanding is that it will take some significant porting to get GlusterFS to run in any production capacity on ARM.
> What ARM-specific problems have been identified?

The biggest issue, in my opinion, will be endianness. As far as I know, GlusterFS has been run only on Intel/AMD architectures; I have not heard of any SPARC installations. That means the code has been tested only on little-endian architectures. The worst problems come in when entities of different endianness interact.

However, there is another side to this. From what I know, ARM is actually a bi-endian processor: if the ARM cores have the system control co-processor, the endianness of the processor can be selected by software. So if we make ARM work as a little-endian processor, it should work well even in a mixed environment.

But then, ARM is a 32-bit processor, and I am unsure/ignorant of the stability of 32-bit GlusterFS. If we can solve the two major issues mentioned above, viz. endianness and stability of GlusterFS on 32-bit, we should theoretically be able to get GlusterFS working on ARM without any other major work. Again, I cannot vouch for the above statement; these are just my thoughts from what I know.

Pavan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
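Before mixing architectures in one cluster, it can help to confirm what byte order each node actually runs. A minimal sketch, assuming a Linux host with od(1): byte 6 (offset 5) of any ELF binary's header is the EI_DATA field, where 01 means little-endian and 02 means big-endian.

```shell
# Read EI_DATA from the header of /bin/sh (any ELF binary works):
# 01 = little-endian, 02 = big-endian. Assumes a Linux host with od(1).
ei_data=$(od -An -tx1 -j5 -N1 /bin/sh | tr -d ' ')
case "$ei_data" in
    01) echo "little-endian" ;;
    02) echo "big-endian" ;;
    *)  echo "unknown ($ei_data)" ;;
esac
```

On an Intel/AMD server this reports little-endian; a big-endian ARM build would report big-endian, which is the mixed-environment case Pavan warns about.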
[Gluster-users] Stripe+replicate
Hello, is this ( http://gluster.org/pipermail/gluster-users/2011-July/008223.html ) true of the 3.3.0 beta, or should I check out Git? Also, while it is possible to create them manually in a client volfile, will more complex setups like striped+replicated+distributed (for example, a stripe across 6 or more nodes, each stripe having 3 replicas, distributed over 12 servers) be supported, or is it better to stay away from something like that? What is the suggested way to store large (~500 GB) files reliably, so that a replica failing and having to be resynced doesn't bring the cluster down?

thx in advance
rr
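For reference, a distributed striped-replicated volume can be expressed through the CLI from GlusterFS 3.3 onward, without hand-editing client volfiles. A hypothetical sketch (volume name, hostnames, and brick paths are made up; the brick count must be a multiple of stripe x replica):

```shell
# stripe 3 x replica 2 = 6 bricks per subvolume; 12 bricks gives
# two such subvolumes, distributed. All names here are hypothetical.
gluster volume create bigvol stripe 3 replica 2 transport tcp \
    server1:/export/brick server2:/export/brick server3:/export/brick \
    server4:/export/brick server5:/export/brick server6:/export/brick \
    server7:/export/brick server8:/export/brick server9:/export/brick \
    server10:/export/brick server11:/export/brick server12:/export/brick
gluster volume start bigvol
gluster volume info bigvol
```

The brick ordering matters: consecutive bricks form the replica pairs and stripes, so spreading consecutive bricks across different servers is what gives the failover protection asked about above.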
Re: [Gluster-users] Gluster server and multi-homing, again
gluster-users-boun...@gluster.org wrote on 08/11/2011 02:22:11 PM:
> On Thu, Aug 11, 2011 at 12:18 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
>>> Run one glusterd on server1. Have server1-eth1 and server1-eth2 refer to the two interfaces. Use "gluster peer probe server1-eth1" AND "gluster peer probe server1-eth2" (?)
>> My concern is that this will be attempting to add the same server to
> Don't they have different IPs, unless I misunderstood your configuration? Are the NICs bonded?

No, no bonding. I mean one server with two IP addresses. So server1-eth1 might be 192.168.0.37 and server1-eth2 might be 192.168.1.37, but they both connect to the same server (just on different interfaces).

> If you want to try it out you will also need to do peer probe with the eth1 and eth2 IPs.

If both IPs refer to the same server, they will connect to the same glusterd process, unless I find a way to launch two such processes and bind each to a separate address...

To clarify: you effectively have one server and a set of clients, but want the clients to be able to access the server over both IPs, with each IP mapping to only a portion (a brick) of the combined volume? If I interpreted incorrectly, take the rest with a grain of salt.

I do not think what you want is really possible the way you are describing it, primarily because Gluster uses an algorithm to distribute files across bricks, which makes targeting a single brick in a volume pretty close to impossible. To accomplish that you would need them to be separate volumes. Once you have them split into separate volumes you can accomplish what you want (having clients use specific interfaces for specific disks), but it will require you to not use the recommended mounting method for one of the two volumes. For the primary interface, you can mount normally from the clients, but for the second one you will have to download the vol file to the clients and change the IPs in it to the second interface's.

We used to do the manual editing; it was a pain. We re-worked our system to allow us to use the default method, and it works a lot better. Although we do still have a failover scenario where we have to do the manual file change, because Gluster doesn't natively support alternate interfaces/hostnames in the config file. Hopefully it will one day.

-greg
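The manual vol-file edit Greg describes can be scripted. A minimal sketch, using the hypothetical volume name myvol and the two example addresses from this thread; the volfile fragment here is a simplified stand-in for what the server actually generates:

```shell
# Stand-in for a client vol file fetched from the server
# (real generated volfiles contain more translators than this).
cat > /tmp/myvol-eth2.vol <<'EOF'
volume myvol-client-0
    type protocol/client
    option remote-host 192.168.0.37
    option remote-subvolume /export/brick0
end-volume
EOF

# Rewrite the primary interface's address to the second interface's.
sed -i 's/192\.168\.0\.37/192.168.1.37/g' /tmp/myvol-eth2.vol
grep 'option remote-host' /tmp/myvol-eth2.vol
```

Clients would then mount from the edited file (e.g. `mount -t glusterfs /tmp/myvol-eth2.vol /mnt`) instead of the server-fetched one, which is exactly the non-default mounting method mentioned above.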
[Gluster-users] GlusterFS 3.1.6
Not sure if I announced it here, but GlusterFS 3.1.6 has been released. If you're on the 3.1.x series, and you've been waiting for a fix for the gfid issue, this is it: http://community.gluster.org/p/glusterfs-3-1-6-released/

Thanks,
John Mark
Gluster Community Guy
[Gluster-users] New GlusterFS Documentation
We have recently (~1 month ago) revamped some of our documentation with new installation and administration guides for GlusterFS 3.2. If you're new to GlusterFS, you may have already seen them, but if you are an old-timer you may not have noticed. Please feel free to peruse the new guides:

Installation guide -
  wiki: http://www.gluster.com/community/documentation/index.php/Gluster_3.2_Filesystem_Installation_Guide
  PDF: http://download.gluster.com/pub/gluster/glusterfs/3.2/3.2.0/Gluster_FS_3.2_Installation_Guide.pdf

Administration guide -
  wiki: http://www.gluster.com/community/documentation/index.php/Gluster_3.2_Filesystem_Administration_Guide
  PDF: http://download.gluster.com/pub/gluster/glusterfs/3.2/3.2.2/Gluster_FS_3.2_Admin_Guide.pdf

Please take a look and let us know whether this documentation meets your needs and how we should change it going forward. We have also begun the task of writing new documentation and adding to the mix of currently available docs. We know that there's more to do yet in terms of performance tuning, extending GlusterFS, and other advanced topics. What other topics would you like to see covered?

Thanks,
John Mark Walker
Gluster Community Guy
Re: [Gluster-users] Replace brick of a dead node
Thank you Harsha for the quick response. Unfortunately, the infrastructure is in the cloud, so I can't get the dead node's disk. Since I have replication 'ON', there is no downtime, as the brick on the second node serves well, but I want the redundancy/replication to be restored with the introduction of a new node (#3) in the cluster. I would hope there is a gluster command to just forget about the dead node's brick, pick up the new brick, and start replicating/serving from the new location (in conjunction with the one existing brick on node #2). Is that the self-heal feature? I am using v3.1.1 as of now.

Rajat

- Original Message -
From: Harshavardhana har...@gluster.com
To: Rajat Chopra rcho...@redhat.com
Cc: gluster-users@gluster.org
Sent: Friday, August 12, 2011 2:06:14 PM
Subject: Re: [Gluster-users] Replace brick of a dead node

> I have a two node cluster, with two bricks replicated, one on each node. Let's say one of the nodes dies and is unreachable.

If you have the disk from the dead node, then all you have to do is plug it into a new system and run the following commands:

  gluster volume replace-brick volname old-brick new-brick start
  gluster volume replace-brick volname old-brick new-brick commit

You don't have to migrate the data; this works as expected. Since you have a replicated volume you wouldn't see downtime, but mind you, self-heal will kick in; as of 3.2 it is blocking, so wait for 3.3 and you will have non-blocking self-heal capabilities.

> I want to be able to spin up a new node and replace the dead node's brick with a location on the new node.

This is out of Gluster's hands; if you already have mechanisms to decommission a brick and reattach it on a new node, then the above steps are fairly simple. Go ahead and try it; it should work.

-Harsha
Re: [Gluster-users] Replace brick of a dead node
On Fri, Aug 12, 2011 at 2:35 PM, Rajat Chopra rcho...@redhat.com wrote:
> Thank you Harsha for the quick response. Unfortunately, the infrastructure is in the cloud, so I can't get the dead node's disk. Since I have replication 'ON', there is no downtime, as the brick on the second node serves well, but I want the redundancy/replication to be restored with the introduction of a new node (#3) in the cluster.

One way is http://gluster.com/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server

The other way is to use replace-brick. You should be able to use it even if the node is dead.

> [...]
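A condensed sketch of the replace-brick sequence from this thread, with hypothetical volume and host names (my-vol, node-1, node-2); this is the disk-salvage scenario Harsha describes, not a definitive procedure:

```shell
# Old brick's disk was plugged into node-2, so both ends are reachable:
# start the migration, poll it, then commit the configuration change.
gluster volume replace-brick my-vol node-1:/export/brick node-2:/export/brick start
gluster volume replace-brick my-vol node-1:/export/brick node-2:/export/brick status
gluster volume replace-brick my-vol node-1:/export/brick node-2:/export/brick commit
```

When the old node is truly unreachable, "start" has nothing to migrate from; the later messages in this thread suggest committing right away and letting self-heal repopulate the new brick from the surviving replica.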
Re: [Gluster-users] Gluster on an ARM system
For what it's worth, I've been running 3.2.0 for about 4 months now on ARM processors (Globalscale SheevaPlug (armv5tel) running Debian squeeze). I have 4 volumes, each running 2 bricks in replicated mode. I haven't seen anything like this.

dcm

On Fri, Aug 12, 2011 at 7:24 AM, Charles Williams ch...@itadmins.net wrote:

As discussed with avati in IRC, I am able to set up a user account on the ARM box. I have also done a bit more tracing and have attached an strace of glusterd from startup to peer probe to core dump.

chuck

On 08/11/2011 08:50 PM, John Mark Walker wrote:
> Hi Charles, we have plans in the future to work on an ARM port, but that won't come to fruition for some time. I've CC'd the gluster-devel list in the hopes that someone there can help you out. However, my understanding is that it will take some significant porting to get GlusterFS to run in any production capacity on ARM. Once we have more news on the ARM front, I'll be happy to share it here and elsewhere. Please send all responses to gluster-devel, as that is the proper place for this conversation.
>
> Thanks,
> John Mark Walker
> Gluster Community Guy

From: gluster-users-boun...@gluster.org [gluster-users-boun...@gluster.org] on behalf of Charles Williams [ch...@itadmins.net]
Sent: Thursday, August 11, 2011 3:48 AM
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] Gluster on an ARM system

OK, running glusterd on the ARM box with gdb and then doing a "gluster peer probe zmn1", I get the following from gdb when glusterd core dumps:

  [2011-08-11 12:46:35.326998] D [glusterd-utils.c:2627:glusterd_friend_find_by_hostname] 0-glusterd: Friend zmn1 found.. state: 0

  Program received signal SIGSEGV, Segmentation fault.
  0x4008e954 in rpc_transport_connect (this=0x45c48, port=0) at rpc-transport.c:810
  810         ret = this->ops->connect (this, port);
  (gdb)

On 08/11/2011 10:49 AM, Charles Williams wrote:

Sorry, the last lines of the debug info should be:

  [2011-08-11 10:38:21.499022] D [glusterd-utils.c:2627:glusterd_friend_find_by_hostname] 0-glusterd: Friend zmn1 found.. state: 0
  Segmentation fault (core dumped)

On 08/11/2011 10:46 AM, Charles Williams wrote:

Hey all, so I went ahead and did a test install on my QNAP TS412U (ARM based), and all went well with the build and install. The problems started afterwards.

QNAP (ARM server) config:

  volume management-zmn1
      type mgmt/glusterd
      option working-directory /opt/etc/glusterd
      option transport-type socket
      option transport.address-family inet
      option transport.socket.keepalive-time 10
      option transport.socket.keepalive-interval 2
  end-volume

zmn1 (Dell PowerEdge) config:

  volume management
      type mgmt/glusterd
      option working-directory /etc/glusterd
      option transport-type socket
      option transport.address-family inet
      option transport.socket.keepalive-time 10
      option transport.socket.keepalive-interval 2
  end-volume

When I tried to do a peer probe from the QNAP server to add the first server into the cluster, glusterd seg faulted with a core dump:

  [2011-08-11 10:38:21.457839] I [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req zmn1 24007
  [2011-08-11 10:38:21.459508] D [glusterd-utils.c:213:glusterd_is_local_addr] 0-glusterd: zmn1 is not local
  [2011-08-11 10:38:21.460162] D [glusterd-utils.c:2675:glusterd_friend_find_by_hostname] 0-glusterd: Unable to find friend: zmn1
  [2011-08-11 10:38:21.460682] D [glusterd-utils.c:2675:glusterd_friend_find_by_hostname] 0-glusterd: Unable to find friend: zmn1
  [2011-08-11 10:38:21.460766] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: zmn1
  [2011-08-11 10:38:21.460843] I [glusterd-handler.c:3417:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: zmn1 (24007)
  [2011-08-11 10:38:21.460943] D [glusterd-utils.c:3080:glusterd_sm_tr_log_init] 0-: returning 0
  [2011-08-11 10:38:21.461017] D [glusterd-utils.c:3169:glusterd_peerinfo_new] 0-: returning 0
  [2011-08-11 10:38:21.461199] D [glusterd-handler.c:3323:glusterd_transport_inet_keepalive_options_build] 0-glusterd: Returning 0
  [2011-08-11 10:38:21.465952] D [rpc-clnt.c:914:rpc_clnt_connection_init] 0-management-zmn1: defaulting frame-timeout to 30mins
  [2011-08-11 10:38:21.466146] D [rpc-transport.c:672:rpc_transport_load] 0-rpc-transport: attempt to load file /opt/lib/glusterfs/3.2.2/rpc-transport/socket.so
  [2011-08-11 10:38:21.466346] D [rpc-transport.c:97:__volume_option_value_validate] 0-management-zmn1: no range check required for 'option transport.socket.keepalive-time 10'
  [2011-08-11 10:38:21.466460] D [rpc-transport.c:97:__volume_option_value_validate] 0-management-zmn1: no range check required for 'option
Re: [Gluster-users] Gluster on an ARM system
Based on the discussions here, I think I should create an ARM resource page/section on the wiki. Looks like there's more community activity than I thought.

-JM

From: Devon Miller [devon.c.mil...@gmail.com]
Sent: Friday, August 12, 2011 3:50 PM
To: Charles Williams
Cc: John Mark Walker; gluster-users@gluster.org; gluster-de...@nongnu.org
Subject: Re: [Gluster-users] Gluster on an ARM system

> For what it's worth, I've been running 3.2.0 for about 4 months now on ARM processors (Globalscale SheevaPlug (armv5tel) running Debian squeeze). I have 4 volumes, each running 2 bricks in replicated mode. I haven't seen anything like this.
>
> dcm
>
> [...]
Re: [Gluster-users] Replace brick of a dead node
> Since I have replication 'ON', there is no downtime as the brick on the second node serves well, but I want the redundancy/replication to be restored with the introduction of a new node (#3) in the cluster.

Exactly. If it's in the cloud, then the disk (be it EBS blocks) can be reattached to the new server, and you can do a replace-brick even when the old brick is dead/unreachable. If there are no EBS blocks, then there should still be some mechanism to reattach the brick associated with that instance.

> I would hope there is a gluster command to just forget about the dead node's brick, and pick up the new brick and start replicating/serving from the new

Gluster cannot decide this, as it has no awareness of whether it is running in a cloud, on bare metal, or in a KVM setup. So right now the above procedure stands good.

-Harsha
Re: [Gluster-users] Replace brick of a dead node
On Fri, Aug 12, 2011 at 4:03 PM, Harshavardhana har...@gluster.com wrote:
>> Since I have replication 'ON', there is no downtime as the brick on the second node serves well, but I want the redundancy/replication to be restored with the introduction of a new node (#3) in the cluster.
> Exactly. If it's in the cloud, then the disk (be it EBS blocks) can be reattached to the new server, and you can do a replace-brick even when the old-brick is dead/unreachable.

By 'old-brick' here I meant 'old instance', not 'disk' per se.

-Harsha
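For EBS-backed bricks, the reattach Harsha describes is an operation on the cloud side, not a Gluster one. A hypothetical sketch using the modern AWS CLI (volume ID, instance ID, and device names are all made up):

```shell
# Detach the brick's EBS volume from the dead instance and attach it
# to the replacement instance. IDs and device names are hypothetical.
aws ec2 detach-volume --volume-id vol-0abc123 --force
aws ec2 attach-volume --volume-id vol-0abc123 \
    --instance-id i-0def456 --device /dev/sdf

# On the new instance, mount the brick at its old path, then proceed
# with the replace-brick (or brick restoration) steps from this thread.
mount /dev/xvdf /srv-node-2-brick
```

The point is that Gluster only sees a brick path coming back online; how the underlying block device gets there is entirely up to the cloud tooling.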
Re: [Gluster-users] Replace brick of a dead node
I am afraid the 'replace-brick' procedure does not work well if the node is dead. Here is the (long-ish) step-wise procedure for the dead end that I run into...

  [node-1 $] service glusterd start
  [node-1 $] gluster volume create my-vol replica 2 node-1:/srv-node-1-first node-1:/srv-node-1-second
  [node-1 $] gluster volume start my-vol

# this began my gluster service on the first node, with two bricks replicated but sourcing from the same node
# next I add a new node and replace one of the bricks with a new brick location on the second node
# the purpose is to achieve failover redundancy

  [node-2 $] service glusterd start
  [node-1 $] gluster peer probe node-2
  [node-2 $] gluster peer probe node-1
  [node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-second node-2:/srv-node-2-third start

# this starts the replace operation, and after a while I can do volume info from either node

  [node-2 $] gluster volume info
  Volume Name: my-vol
  Type: Replicate
  Status: Started
  Number of Bricks: 2
  Transport-type: tcp
  Bricks:
  Brick1: node-1:/srv-node-1-first
  Brick2: node-2:/srv-node-2-third

# all good so far... now node-1 dies (no EBS, no disk, no data... just not reachable; it's a pvt cloud and the machine running the VM had a hardware failure)
# good: gluster serves well from node-2 to all the clients nicely too
# now I want to replace the node-1 brick with another brick on node-2, so that I can pass it on to new nodes later
# so, according to the suggestion, I ran the replace-brick command

  [node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth start

# the command succeeds without errors, so I check status...

  [node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth status

# this command is supposed to return the status, but it returns nothing
# I check with gluster volume info on node-2

  [node-2 $] gluster volume info
  No volumes present

# wha?? where did my volume go?
# Note that all this while my mounted client is working fine, so no downtime

Since 'gluster volume info' returned 'No volumes present', I assume that the procedure does not work. Is there something wrong in my procedure, or was it not supposed to work anyway? I am using v3.1.1.

Again, I really appreciate the help, but I seem to be stuck. The email suggested the procedure in the following link [ http://gluster.com/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server ]. It seems like a better way of replacing dead nodes, but then it would seem that I can't replace the brick from the dead node with a newly created path on an existing node, because the hostname has to match. That is fine too, if it's a requirement, but can I assume that it will work with 3.1.1, or do I have to upgrade to 3.2 for it?

Thanks again for the assistance.
Rajat

- Original Message -
From: Mohit Anchlia mohitanch...@gmail.com
To: Rajat Chopra rcho...@redhat.com
Cc: Harshavardhana har...@gluster.com, gluster-users@gluster.org
Sent: Friday, August 12, 2011 3:07:59 PM
Subject: Re: [Gluster-users] Replace brick of a dead node

> [...]
Re: [Gluster-users] Replace brick of a dead node
On Fri, Aug 12, 2011 at 4:49 PM, Rajat Chopra rcho...@redhat.com wrote:
> I am afraid the 'replace-brick' procedure does not work well if the node is dead. Here is the (long-ish) step-wise procedure for the dead end that I run into...
> [...]
> [node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth start

Did you run

  $ gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth commit

? If not, try running this additional commit command. It will make the necessary changes to the config. And don't check status; run commit right after start, because the dead node is not around.

> [...]
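Putting Mohit's advice together with Rajat's transcript, the dead-node flow differs from the healthy-node flow in one step. A sketch, reusing the hypothetical names from this thread (my-vol, node-1, node-2):

```shell
# node-1 is unreachable, so there is no data migration to wait on:
# skip status and commit immediately after start.
gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth start
gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth commit

# Verify the volume now lists the new brick, then let self-heal
# repopulate it from the surviving replica.
gluster volume info my-vol
```

The reported "No volumes present" appears to be what you get when a start against a dead brick is left uncommitted, so the commit is the step that makes the config change stick.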
Re: [Gluster-users] Replace brick of a dead node
On Friday 12 August 2011 22:58:22, Rajat Chopra wrote:
> Is there a way I can achieve that without any downtime?

Maybe [1] could help?

[1] http://bugs.gluster.com/show_bug.cgi?id=2506#c3

best regards,
Marcel