Re: [Gluster-users] Gluster Startup Issue

2016-06-26 Thread Danny Lee
Actually, never mind.  I had another background process running that was
checking the state of the gluster cluster.  I turned that off and when I
ran "gluster volume heal appian full" on server-ip-1, only the
server-ip-1's tailed logs showed anything.  The other two server's logs
didn't output anything.

On Sun, Jun 26, 2016 at 2:02 PM, Danny Lee  wrote:

> Originally, I ran "sudo gluster volume heal appian full" on server-ip-1
> and then tailed the logs for all of the servers.  The only thing that
> showed up was the logs for server-ip-1, so I thought it wasn't even
> connecting to the other boxes.  But after about 15 seconds, logs showed up
> in server-ip-2 and server-ip-3.  Thanks for pointing that out, Joe.
>
> Here are the tailed logs.  I also noticed that there were some locking
> errors that popped up once in a while in
> the etc-glusterfs-glusterd.vol.log.  I have also added these logs below.
>
>  server-ip-1 
>
> ==> etc-glusterfs-glusterd.vol.log <==
> [2016-06-26 16:48:31.405513] I [MSGID: 106533]
> [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume volname
> [2016-06-26 16:48:31.409903] E [MSGID: 106153]
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
> server-ip-2. Please check log file for details.
>
> ==> cli.log <==
> [2016-06-26 16:48:51.208828] I [cli.c:721:main] 0-cli: Started running
> gluster with version 3.7.11
> [2016-06-26 16:48:51.213391] I
> [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
> installed
> [2016-06-26 16:48:51.213674] I [MSGID: 101190]
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2016-06-26 16:48:51.213733] I [socket.c:2356:socket_event_handler]
> 0-transport: disconnecting now
> [2016-06-26 16:48:51.219674] I [cli-rpc-ops.c:8417:gf_cli_heal_volume_cbk]
> 0-cli: Received resp to heal volume
> [2016-06-26 16:48:51.219768] I [input.c:36:cli_batch] 0-: Exiting with: -1
>
> ==> cmd_history.log <==
> [2016-06-26 16:48:51.219596]  : volume heal volname full : FAILED : Commit
> failed on server-ip-2. Please check log file for details.
>
> ==> etc-glusterfs-glusterd.vol.log <==
> [2016-06-26 16:48:51.214185] I [MSGID: 106533]
> [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume volname
> [2016-06-26 16:48:51.219031] E [MSGID: 106153]
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
> server-ip-2. Please check log file for details.
>
>
>  server-ip-2 
>
> ==> etc-glusterfs-glusterd.vol.log <==
> [2016-06-26 16:48:30.087365] I [MSGID: 106533]
> [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume volname
> [2016-06-26 16:48:30.092829] E [MSGID: 106153]
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
> server-ip-2. Please check log file for details.
>
> ==> cli.log <==
> [2016-06-26 16:49:30.099446] I [cli.c:721:main] 0-cli: Started running
> gluster with version 3.7.11
> [2016-06-26 16:49:30.104599] I
> [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
> installed
> [2016-06-26 16:49:30.104853] I [MSGID: 101190]
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2016-06-26 16:49:30.104896] I [socket.c:2356:socket_event_handler]
> 0-transport: disconnecting now
> [2016-06-26 16:49:30.177924] I [input.c:36:cli_batch] 0-: Exiting with: 0
>
>
>  server-ip-3 
>
> ==> cli.log <==
> [2016-06-26 16:48:49.177859] I [cli.c:721:main] 0-cli: Started running
> gluster with version 3.7.11
> [2016-06-26 16:48:49.182887] I
> [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
> installed
> [2016-06-26 16:48:49.183146] I [MSGID: 101190]
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2016-06-26 16:48:49.183188] I [socket.c:2356:socket_event_handler]
> 0-transport: disconnecting now
> [2016-06-26 16:48:49.189005] I [cli-rpc-ops.c:8417:gf_cli_heal_volume_cbk]
> 0-cli: Received resp to heal volume
> [2016-06-26 16:48:49.189058] I [input.c:36:cli_batch] 0-: Exiting with: -1
>
>
>  All servers 
>
> ==> glfsheal-volname.log <==
> [2016-06-26 16:51:47.493809] I [MSGID: 104045] [glfs-master.c:95:notify]
> 0-gfapi: New graph 6766732d-332d-6272-6963-6b2d656d7074 (0) coming up
> [2016-06-26 16:51:47.493841] I [MSGID: 114020] [client.c:2106:notify]
> 0-volname-client-0: parent translators are ready, attempting connect on
> transport
> [2016-06-26 16:51:47.496465] I [MSGID: 114020] [client.c:2106:notify]
> 0-volname-client-1: parent translators are ready, attempting connect on
> transport
> [2016-06-26 16:51:47.496729] I [rpc-clnt.c:1868:rpc_clnt_reconfig]
> 0-volname-client-0: changing port to 49152 (from 0)
> [2016-06-26 16:51:47.498945] I [MSGID: 114020] [client.c:2106:notify]
> 0-volname-client-2: parent translators are ready, att

Re: [Gluster-users] Gluster Startup Issue

2016-06-26 Thread Danny Lee
Originally, I ran "sudo gluster volume heal appian full" on server-ip-1 and
then tailed the logs for all of the servers.  The only thing that showed up
was the logs for server-ip-1, so I thought it wasn't even connecting to the
other boxes.  But after about 15 seconds, logs showed up in server-ip-2 and
server-ip-3.  Thanks for pointing that out, Joe.

Here are the tailed logs.  I also noticed that there were some locking
errors that popped up once in a while in
the etc-glusterfs-glusterd.vol.log.  I have also added these logs below.
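(For reference, the "==> ... <==" headers in the excerpts below simply come
from tailing several log files at once, roughly as follows, assuming the
default /var/log/glusterfs log location:
tail -F /var/log/glusterfs/etc-glusterfs-glusterd.vol.log \
        /var/log/glusterfs/cli.log \
        /var/log/glusterfs/cmd_history.log \
        /var/log/glusterfs/glfsheal-volname.log )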

 server-ip-1 

==> etc-glusterfs-glusterd.vol.log <==
[2016-06-26 16:48:31.405513] I [MSGID: 106533]
[glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume volname
[2016-06-26 16:48:31.409903] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
server-ip-2. Please check log file for details.

==> cli.log <==
[2016-06-26 16:48:51.208828] I [cli.c:721:main] 0-cli: Started running
gluster with version 3.7.11
[2016-06-26 16:48:51.213391] I
[cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
installed
[2016-06-26 16:48:51.213674] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2016-06-26 16:48:51.213733] I [socket.c:2356:socket_event_handler]
0-transport: disconnecting now
[2016-06-26 16:48:51.219674] I [cli-rpc-ops.c:8417:gf_cli_heal_volume_cbk]
0-cli: Received resp to heal volume
[2016-06-26 16:48:51.219768] I [input.c:36:cli_batch] 0-: Exiting with: -1

==> cmd_history.log <==
[2016-06-26 16:48:51.219596]  : volume heal volname full : FAILED : Commit
failed on server-ip-2. Please check log file for details.

==> etc-glusterfs-glusterd.vol.log <==
[2016-06-26 16:48:51.214185] I [MSGID: 106533]
[glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume volname
[2016-06-26 16:48:51.219031] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
server-ip-2. Please check log file for details.


 server-ip-2 

==> etc-glusterfs-glusterd.vol.log <==
[2016-06-26 16:48:30.087365] I [MSGID: 106533]
[glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume volname
[2016-06-26 16:48:30.092829] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
server-ip-2. Please check log file for details.

==> cli.log <==
[2016-06-26 16:49:30.099446] I [cli.c:721:main] 0-cli: Started running
gluster with version 3.7.11
[2016-06-26 16:49:30.104599] I
[cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
installed
[2016-06-26 16:49:30.104853] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2016-06-26 16:49:30.104896] I [socket.c:2356:socket_event_handler]
0-transport: disconnecting now
[2016-06-26 16:49:30.177924] I [input.c:36:cli_batch] 0-: Exiting with: 0


 server-ip-3 

==> cli.log <==
[2016-06-26 16:48:49.177859] I [cli.c:721:main] 0-cli: Started running
gluster with version 3.7.11
[2016-06-26 16:48:49.182887] I
[cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
installed
[2016-06-26 16:48:49.183146] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2016-06-26 16:48:49.183188] I [socket.c:2356:socket_event_handler]
0-transport: disconnecting now
[2016-06-26 16:48:49.189005] I [cli-rpc-ops.c:8417:gf_cli_heal_volume_cbk]
0-cli: Received resp to heal volume
[2016-06-26 16:48:49.189058] I [input.c:36:cli_batch] 0-: Exiting with: -1


 All servers 

==> glfsheal-volname.log <==
[2016-06-26 16:51:47.493809] I [MSGID: 104045] [glfs-master.c:95:notify]
0-gfapi: New graph 6766732d-332d-6272-6963-6b2d656d7074 (0) coming up
[2016-06-26 16:51:47.493841] I [MSGID: 114020] [client.c:2106:notify]
0-volname-client-0: parent translators are ready, attempting connect on
transport
[2016-06-26 16:51:47.496465] I [MSGID: 114020] [client.c:2106:notify]
0-volname-client-1: parent translators are ready, attempting connect on
transport
[2016-06-26 16:51:47.496729] I [rpc-clnt.c:1868:rpc_clnt_reconfig]
0-volname-client-0: changing port to 49152 (from 0)
[2016-06-26 16:51:47.498945] I [MSGID: 114020] [client.c:2106:notify]
0-volname-client-2: parent translators are ready, attempting connect on
transport
[2016-06-26 16:51:47.501600] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-volname-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2016-06-26 16:51:47.502008] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-volname-client-0:
Connected to volname-client-0, attached to remote volume
'/usr/local/volname/local-data/mirrored-data'.
[2016-06-26 16:51:47.502031] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-volname-client-0: Server

Re: [Gluster-users] Gluster Startup Issue

2016-06-25 Thread Joe Julian
Notice that it actually tells you to look in the logs on server-ip-2, but you
did not include any logs from that server.

On June 21, 2016 10:22:14 AM PDT, Danny Lee  wrote:
>Hello,
>
>We are currently figuring out how to add GlusterFS to our system to
>make
>our systems highly available using scripts.  We are using Gluster
>3.7.11.
>
>Problem:
>Trying to migrate to GlusterFS from a non-clustered system to a 3-node
>glusterfs replicated cluster using scripts.  Tried various things to
>make
>this work, but it sometimes causes us to be in an undesirable state
>where
>if you call "gluster volume heal  full", we would get the
>error
>message, "Launching heal operation to perform full self heal on volume
> has been unsuccessful on bricks that are down. Please check
>if
>all brick processes are running."  All the brick processes are running
>based on running the command, "gluster volume status volname"
>
>Things we have tried:
>Order of preference
>1. Create Volume with 3 Filesystems with the same data
>2. Create Volume with 2 empty filesystems and one with the data
>3. Create Volume with only one filesystem with data and then using
>"add-brick" command to add the other two empty filesystems
>4. Create Volume with one empty filesystem, mounting it, and then
>copying
>the data over to that one.  And then finally, using "add-brick" command
>to
>add the other two empty filesystems
>5. Create Volume with 3 empty filesystems, mounting it, and then
>copying
>the data over
>
>Other things to note:
>A few minutes after the volume is created and started successfully, our
>application server starts up against it, so reads and writes may happen
>pretty quickly after the volume has started.  But there is only about
>50MB
>of data.
>
>Steps to reproduce (all in a script):
># This is run by the primary node with the IP Address, ,
>that
>has data
>systemctl restart glusterd
>gluster peer probe 
>gluster peer probe 
>Wait for "gluster peer status" to all be in "Peer in Cluster" state
>gluster volume create  replica 3 transport tcp ${BRICKS[0]}
>${BRICKS[1]} ${BRICKS[2]} force
>gluster volume set  nfs.disable true
>gluster volume start 
>mkdir -p $MOUNT_POINT
>mount -t glusterfs :/volname $MOUNT_POINT
>find $MOUNT_POINT | xargs stat
>
>Note that, when we added sleeps around the gluster commands, there was
>a
>higher probability of success, but not 100%.
>
># Once volume is started, all the clients/servers will mount the
>gluster filesystem by polling "mountpoint -q $MOUNT_POINT":
>mkdir -p $MOUNT_POINT
>mount -t glusterfs :/volname $MOUNT_POINT
>
>Logs:
>*etc-glusterfs-glusterd.vol.log* in *server-ip-1*
>
>[2016-06-21 14:10:38.285234] I [MSGID: 106533]
>[glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume]
>0-management:
>Received heal vol req for volume volname
>[2016-06-21 14:10:38.296801] E [MSGID: 106153]
>[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
>. Please check log file for details.
>
>
>*usr-local-volname-data-mirrored-data.log* in *server-ip-1*
>
>[2016-06-21 14:14:39.233366] E [MSGID: 114058]
>[client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0:
>failed to get the port number for remote subvolume. Please run 'gluster
>volume status' on server to see if brick process is running.
>*I think this is caused by the self heal daemon*
>
>*cmd_history.log* in *server-ip-1*
>
>[2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED :
>Commit
>failed on . Please check log file for details.
>
>
>
>


Re: [Gluster-users] Gluster Startup Issue

2016-06-24 Thread Danny Lee
So I've tried using a lot of your script, but I'm still unable to get past
the "Launching heal operation to perform full self heal on volume 
has been unsuccessful on bricks that are down. Please check if all brick
processes are running." error message.  Everything else seems like it's
working, but the "gluster volume heal appian full" is never working.

Is there any way to figure out what exactly happened to cause this error
message?  The logs don't seem very useful in determining it; they just state
that the commit failed on the other bricks.

Restarting the volume sometimes fixes it, but I'm not sure I want to run a
script that keeps restarting the volume until "gluster volume heal appian
full" works.

On Thu, Jun 23, 2016 at 2:21 AM, Heiko L.  wrote:

>
> A hostname is not needed:
>
> # nodea=10.1.1.100;bricka=/mnt/sda6/brick4
> should work,
>
> but I prefer to work with hostnames.
>
>
> regards heiko
>
> PS: I forgot some notes:
> - xfs, zfs (ext3 works, but performance can be poor (V3.4))
> - the brick directory should not be the top-level directory of the filesystem
>   /dev/sda6 /mnt/brick4, brick=/mnt/brick4 ->  not recommended
>   /dev/sda6 /mnt/sda6,   brick=/mnt/sda6/brick4 ->  better
>
> > Thank you for responding, Heiko.  I'm in the process of going through the
> > differences between our two scripts.  The first thing I noticed was that
> > the notes state "need to be defined in the /etc/hosts".  Would using the
> > IP address directly be a problem?
> >
> > On Tue, Jun 21, 2016 at 2:10 PM, Heiko L.  wrote:
> >
> >> Am Di, 21.06.2016, 19:22 schrieb Danny Lee:
> >> > Hello,
> >> >
> >> >
> >> > We are currently figuring out how to add GlusterFS to our system to
> make
> >> > our systems highly available using scripts.  We are using Gluster
> 3.7.11.
> >> >
> >> > Problem:
> >> > Trying to migrate to GlusterFS from a non-clustered system to a 3-node
> >> > glusterfs replicated cluster using scripts.  Tried various things to
> >> make this work, but it sometimes causes us to be in an
> >> > undesirable state where if you call "gluster volume heal 
> >> full", we would get the error message, "Launching heal
> >> > operation to perform full self heal on volume  has been
> >> unsuccessful on bricks that are down. Please check if
> >> > all brick processes are running."  All the brick processes are running
> >> based on running the command, "gluster volume status
> >> > volname"
> >> >
> >> > Things we have tried:
> >> > Order of preference
> >> > 1. Create Volume with 3 Filesystems with the same data
> >> > 2. Create Volume with 2 empty filesystems and one with the data
> >> > 3. Create Volume with only one filesystem with data and then using
> >> > "add-brick" command to add the other two empty filesystems
> >> > 4. Create Volume with one empty filesystem, mounting it, and then
> copying
> >> > the data over to that one.  And then finally, using "add-brick"
> command
> >> to add the other two empty filesystems
> >> - should be working
> >> - read each file on /mnt/gvol, to trigger replication [2]
> >>
> >> > 5. Create Volume
> >> > with 3 empty filesystems, mounting it, and then copying the data over
> >> - my favorite
> >>
> >> >
> >> > Other things to note:
> >> > A few minutes after the volume is created and started successfully,
> our
> >> > application server starts up against it, so reads and writes may
> happen
> >> pretty quickly after the volume has started.  But there
> >> > is only about 50MB of data.
> >> >
> >> > Steps to reproduce (all in a script):
> >> > # This is run by the primary node with the IP Address, ,
> that
> >> > has data systemctl restart glusterd gluster peer probe 
> >> gluster peer probe  Wait for "gluster peer
> >> > status" to all be in "Peer in Cluster" state gluster volume create
> >>  replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]}
> >> > ${BRICKS[2]} force
> >> > gluster volume set  nfs.disable true gluster volume start
> >>  mkdir -p $MOUNT_POINT mount -t glusterfs
> >> > :/volname $MOUNT_POINT
> >> > find $MOUNT_POINT | xargs stat
> >>
> >> I have written a script for 2 nodes. [1]
> >> but there should be at least 3 nodes.
> >>
> >>
> >> I hope it helps you
> >> regards Heiko
> >>
> >> >
> >> > Note that, when we added sleeps around the gluster commands, there
> was a
> >> > higher probability of success, but not 100%.
> >> >
> >> > # Once volume is started, all the clients/servers will mount the
> >> > gluster filesystem by polling "mountpoint -q $MOUNT_POINT": mkdir -p
> >> $MOUNT_POINT mount -t glusterfs :/volname
> >> > $MOUNT_POINT
> >> >
> >> >
> >> > Logs:
> >> > *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
> >> >
> >> >
> >> > [2016-06-21 14:10:38.285234] I [MSGID: 106533]
> >> > [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume]
> >> 0-management:
> >> > Received heal vol req for volume volname
> >> > [2016-06-21 14:10:38.296801] E [MSGID: 106153]
> >> > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed o

Re: [Gluster-users] Gluster Startup Issue

2016-06-22 Thread Heiko L.

A hostname is not needed:

# nodea=10.1.1.100;bricka=/mnt/sda6/brick4
should work,

but I prefer to work with hostnames.


regards heiko

PS: I forgot some notes:
- xfs, zfs (ext3 works, but performance can be poor (V3.4))
- the brick directory should not be the top-level directory of the filesystem
  /dev/sda6 /mnt/brick4, brick=/mnt/brick4 ->  not recommended
  /dev/sda6 /mnt/sda6,   brick=/mnt/sda6/brick4 ->  better

> Thank you for responding, Heiko.  I'm in the process of going through the
> differences between our two scripts.  The first thing I noticed was that the
> notes state "need to be defined in the /etc/hosts".  Would using the IP
> address directly be a problem?
>
> On Tue, Jun 21, 2016 at 2:10 PM, Heiko L.  wrote:
>
>> Am Di, 21.06.2016, 19:22 schrieb Danny Lee:
>> > Hello,
>> >
>> >
>> > We are currently figuring out how to add GlusterFS to our system to make
>> > our systems highly available using scripts.  We are using Gluster 3.7.11.
>> >
>> > Problem:
>> > Trying to migrate to GlusterFS from a non-clustered system to a 3-node
>> > glusterfs replicated cluster using scripts.  Tried various things to
>> make this work, but it sometimes causes us to be in an
>> > undesirable state where if you call "gluster volume heal 
>> full", we would get the error message, "Launching heal
>> > operation to perform full self heal on volume  has been
>> unsuccessful on bricks that are down. Please check if
>> > all brick processes are running."  All the brick processes are running
>> based on running the command, "gluster volume status
>> > volname"
>> >
>> > Things we have tried:
>> > Order of preference
>> > 1. Create Volume with 3 Filesystems with the same data
>> > 2. Create Volume with 2 empty filesystems and one with the data
>> > 3. Create Volume with only one filesystem with data and then using
>> > "add-brick" command to add the other two empty filesystems
>> > 4. Create Volume with one empty filesystem, mounting it, and then copying
>> > the data over to that one.  And then finally, using "add-brick" command
>> to add the other two empty filesystems
>> - should be working
>> - read each file on /mnt/gvol, to trigger replication [2]
>>
>> > 5. Create Volume
>> > with 3 empty filesystems, mounting it, and then copying the data over
>> - my favorite
>>
>> >
>> > Other things to note:
>> > A few minutes after the volume is created and started successfully, our
>> > application server starts up against it, so reads and writes may happen
>> pretty quickly after the volume has started.  But there
>> > is only about 50MB of data.
>> >
>> > Steps to reproduce (all in a script):
>> > # This is run by the primary node with the IP Address, , that
>> > has data systemctl restart glusterd gluster peer probe 
>> gluster peer probe  Wait for "gluster peer
>> > status" to all be in "Peer in Cluster" state gluster volume create
>>  replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]}
>> > ${BRICKS[2]} force
>> > gluster volume set  nfs.disable true gluster volume start
>>  mkdir -p $MOUNT_POINT mount -t glusterfs
>> > :/volname $MOUNT_POINT
>> > find $MOUNT_POINT | xargs stat
>>
>> I have written a script for 2 nodes. [1]
>> but there should be at least 3 nodes.
>>
>>
>> I hope it helps you
>> regards Heiko
>>
>> >
>> > Note that, when we added sleeps around the gluster commands, there was a
>> > higher probability of success, but not 100%.
>> >
>> > # Once volume is started, all the clients/servers will mount the
>> > gluster filesystem by polling "mountpoint -q $MOUNT_POINT": mkdir -p
>> $MOUNT_POINT mount -t glusterfs :/volname
>> > $MOUNT_POINT
>> >
>> >
>> > Logs:
>> > *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
>> >
>> >
>> > [2016-06-21 14:10:38.285234] I [MSGID: 106533]
>> > [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume]
>> 0-management:
>> > Received heal vol req for volume volname
>> > [2016-06-21 14:10:38.296801] E [MSGID: 106153]
>> > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
>> > . Please check log file for details.
>> >
>> >
>> >
>> > *usr-local-volname-data-mirrored-data.log* in *server-ip-1*
>> >
>> >
>> > [2016-06-21 14:14:39.233366] E [MSGID: 114058]
>> > [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0:
>> > failed to get the port number for remote subvolume. Please run 'gluster
>> volume status' on server to see if brick process is
>> > running. *I think this is caused by the self heal daemon*
>> >
>> >
>> > *cmd_history.log* in *server-ip-1*
>> >
>> >
>> > [2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED :
>> Commit
>> > failed on . Please check log file for details.
>>
>> [1]
>> http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt
>>   - old, limit 2 nodes
>>
>>
>> --
>>
>>
>>
>



Re: [Gluster-users] Gluster Startup Issue

2016-06-22 Thread Danny Lee
Thank you for responding, Heiko.  I'm in the process of going through the
differences between our two scripts.  The first thing I noticed was that the
notes state "need to be defined in the /etc/hosts".  Would using the IP
address directly be a problem?

On Tue, Jun 21, 2016 at 2:10 PM, Heiko L.  wrote:

> Am Di, 21.06.2016, 19:22 schrieb Danny Lee:
> > Hello,
> >
> >
> > We are currently figuring out how to add GlusterFS to our system to make
> > our systems highly available using scripts.  We are using Gluster 3.7.11.
> >
> > Problem:
> > Trying to migrate to GlusterFS from a non-clustered system to a 3-node
> > glusterfs replicated cluster using scripts.  Tried various things to
> make this work, but it sometimes causes us to be in an
> > undesirable state where if you call "gluster volume heal 
> full", we would get the error message, "Launching heal
> > operation to perform full self heal on volume  has been
> unsuccessful on bricks that are down. Please check if
> > all brick processes are running."  All the brick processes are running
> based on running the command, "gluster volume status
> > volname"
> >
> > Things we have tried:
> > Order of preference
> > 1. Create Volume with 3 Filesystems with the same data
> > 2. Create Volume with 2 empty filesystems and one with the data
> > 3. Create Volume with only one filesystem with data and then using
> > "add-brick" command to add the other two empty filesystems
> > 4. Create Volume with one empty filesystem, mounting it, and then copying
> > the data over to that one.  And then finally, using "add-brick" command
> to add the other two empty filesystems
> - should be working
> - read each file on /mnt/gvol, to trigger replication [2]
>
> > 5. Create Volume
> > with 3 empty filesystems, mounting it, and then copying the data over
> - my favorite
>
> >
> > Other things to note:
> > A few minutes after the volume is created and started successfully, our
> > application server starts up against it, so reads and writes may happen
> pretty quickly after the volume has started.  But there
> > is only about 50MB of data.
> >
> > Steps to reproduce (all in a script):
> > # This is run by the primary node with the IP Address, , that
> > has data systemctl restart glusterd gluster peer probe 
> gluster peer probe  Wait for "gluster peer
> > status" to all be in "Peer in Cluster" state gluster volume create
>  replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]}
> > ${BRICKS[2]} force
> > gluster volume set  nfs.disable true gluster volume start
>  mkdir -p $MOUNT_POINT mount -t glusterfs
> > :/volname $MOUNT_POINT
> > find $MOUNT_POINT | xargs stat
>
> I have written a script for 2 nodes. [1]
> but there should be at least 3 nodes.
>
>
> I hope it helps you
> regards Heiko
>
> >
> > Note that, when we added sleeps around the gluster commands, there was a
> > higher probability of success, but not 100%.
> >
> > # Once volume is started, all the clients/servers will mount the
> > gluster filesystem by polling "mountpoint -q $MOUNT_POINT": mkdir -p
> $MOUNT_POINT mount -t glusterfs :/volname
> > $MOUNT_POINT
> >
> >
> > Logs:
> > *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
> >
> >
> > [2016-06-21 14:10:38.285234] I [MSGID: 106533]
> > [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume]
> 0-management:
> > Received heal vol req for volume volname
> > [2016-06-21 14:10:38.296801] E [MSGID: 106153]
> > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
> > . Please check log file for details.
> >
> >
> >
> > *usr-local-volname-data-mirrored-data.log* in *server-ip-1*
> >
> >
> > [2016-06-21 14:14:39.233366] E [MSGID: 114058]
> > [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0:
> > failed to get the port number for remote subvolume. Please run 'gluster
> volume status' on server to see if brick process is
> > running. *I think this is caused by the self heal daemon*
> >
> >
> > *cmd_history.log* in *server-ip-1*
> >
> >
> > [2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED :
> Commit
> > failed on . Please check log file for details.
>
> [1]
> http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt
>   - old, limit 2 nodes
>
>
> --
>
>
>

[Gluster-users] Gluster Startup Issue

2016-06-21 Thread Danny Lee
Hello,

We are currently figuring out how to add GlusterFS to our system to make
our systems highly available using scripts.  We are using Gluster 3.7.11.

Problem:
Trying to migrate to GlusterFS from a non-clustered system to a 3-node
glusterfs replicated cluster using scripts.  Tried various things to make
this work, but it sometimes causes us to be in an undesirable state where
if you call "gluster volume heal  full", we would get the error
message, "Launching heal operation to perform full self heal on volume
 has been unsuccessful on bricks that are down. Please check if
all brick processes are running."  All the brick processes are running
based on running the command, "gluster volume status volname"

Things we have tried:
Order of preference
1. Create Volume with 3 Filesystems with the same data
2. Create Volume with 2 empty filesystems and one with the data
3. Create Volume with only one filesystem with data and then using
"add-brick" command to add the other two empty filesystems
4. Create Volume with one empty filesystem, mounting it, and then copying
the data over to that one.  And then finally, using "add-brick" command to
add the other two empty filesystems
5. Create Volume with 3 empty filesystems, mounting it, and then copying
the data over

Other things to note:
  A few minutes after the volume is created and started successfully, our
application server starts up against it, so reads and writes may happen
pretty quickly after the volume has started.  But there is only about 50MB
of data.

Steps to reproduce (all in a script):
# This is run by the primary node with the IP Address, , that
has data
systemctl restart glusterd
gluster peer probe 
gluster peer probe 
Wait for "gluster peer status" to all be in "Peer in Cluster" state
gluster volume create  replica 3 transport tcp ${BRICKS[0]}
${BRICKS[1]} ${BRICKS[2]} force
gluster volume set  nfs.disable true
gluster volume start 
mkdir -p $MOUNT_POINT
mount -t glusterfs :/volname $MOUNT_POINT
find $MOUNT_POINT | xargs stat

Note that, when we added sleeps around the gluster commands, there was a
higher probability of success, but not 100%.
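A minimal sketch of the wait we would like to use instead of the sleeps (an
illustration only; PEER_COUNT and the timeout are placeholders):

PEER_COUNT=2   # number of peers probed from this node
for i in $(seq 1 60); do
    connected=$(gluster peer status | grep -c 'Peer in Cluster (Connected)')
    [ "$connected" -eq "$PEER_COUNT" ] && break
    sleep 2
done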

# Once volume is started, all the clients/servers will mount the
gluster filesystem by polling "mountpoint -q $MOUNT_POINT":
mkdir -p $MOUNT_POINT
mount -t glusterfs :/volname $MOUNT_POINT
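The polling amounts to something like the following (a sketch; SERVER_IP
stands in for whichever node the client mounts from):

mkdir -p "$MOUNT_POINT"
until mountpoint -q "$MOUNT_POINT"; do
    mount -t glusterfs SERVER_IP:/volname "$MOUNT_POINT" || sleep 2
done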

Logs:
*etc-glusterfs-glusterd.vol.log* in *server-ip-1*

[2016-06-21 14:10:38.285234] I [MSGID: 106533]
[glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume volname
[2016-06-21 14:10:38.296801] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
. Please check log file for details.


*usr-local-volname-data-mirrored-data.log* in *server-ip-1*

[2016-06-21 14:14:39.233366] E [MSGID: 114058]
[client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
*I think this is caused by the self heal daemon*

*cmd_history.log* in *server-ip-1*

[2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED : Commit
failed on . Please check log file for details.

Re: [Gluster-users] Gluster Startup Issue

2016-06-21 Thread Heiko L.
Am Di, 21.06.2016, 19:22 schrieb Danny Lee:
> Hello,
>
>
> We are currently figuring out how to add GlusterFS to our system to make
> our systems highly available using scripts.  We are using Gluster 3.7.11.
>
> Problem:
> Trying to migrate to GlusterFS from a non-clustered system to a 3-node
> glusterfs replicated cluster using scripts.  Tried various things to make 
> this work, but it sometimes causes us to be in an
> undesirable state where if you call "gluster volume heal  full", we 
> would get the error message, "Launching heal
> operation to perform full self heal on volume  has been unsuccessful 
> on bricks that are down. Please check if
> all brick processes are running."  All the brick processes are running based 
> on running the command, "gluster volume status
> volname"
>
> Things we have tried:
> Order of preference
> 1. Create Volume with 3 Filesystems with the same data
> 2. Create Volume with 2 empty filesystems and one with the data
> 3. Create Volume with only one filesystem with data and then using
> "add-brick" command to add the other two empty filesystems
> 4. Create Volume with one empty filesystem, mounting it, and then copying
> the data over to that one.  And then finally, using "add-brick" command to 
> add the other two empty filesystems
- should be working
- read each file on /mnt/gvol, to trigger replication [2]
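  (for example, a single read pass over the mount would do; an illustration,
   not Heiko's exact command: find /mnt/gvol -type f -exec cat {} + > /dev/null)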

> 5. Create Volume
> with 3 empty filesystems, mounting it, and then copying the data over
- my favorite

>
> Other things to note:
> A few minutes after the volume is created and started successfully, our
> application server starts up against it, so reads and writes may happen 
> pretty quickly after the volume has started.  But there
> is only about 50MB of data.
>
> Steps to reproduce (all in a script):
> # This is run by the primary node with the IP Address, , that
> has data systemctl restart glusterd gluster peer probe  gluster 
> peer probe  Wait for "gluster peer
> status" to all be in "Peer in Cluster" state gluster volume create  
> replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]}
> ${BRICKS[2]} force
> gluster volume set  nfs.disable true gluster volume start  
> mkdir -p $MOUNT_POINT mount -t glusterfs
> :/volname $MOUNT_POINT
> find $MOUNT_POINT | xargs stat

I have written a script for 2 nodes. [1]
but there should be at least 3 nodes.


I hope it helps you
regards Heiko

>
> Note that, when we added sleeps around the gluster commands, there was a
> higher probability of success, but not 100%.
>
> # Once volume is started, all the clients/servers will mount the
> gluster filesystem by polling "mountpoint -q $MOUNT_POINT": mkdir -p 
> $MOUNT_POINT mount -t glusterfs :/volname
> $MOUNT_POINT
>
>
> Logs:
> *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
>
>
> [2016-06-21 14:10:38.285234] I [MSGID: 106533]
> [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume volname
> [2016-06-21 14:10:38.296801] E [MSGID: 106153]
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
> . Please check log file for details.
>
>
>
> *usr-local-volname-data-mirrored-data.log* in *server-ip-1*
>
>
> [2016-06-21 14:14:39.233366] E [MSGID: 114058]
> [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0:
> failed to get the port number for remote subvolume. Please run 'gluster 
> volume status' on server to see if brick process is
> running. *I think this is caused by the self heal daemon*
>
>
> *cmd_history.log* in *server-ip-1*
>
>
> [2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED : Commit
> failed on . Please check log file for details. 

[1] 
http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt
  - old, limit 2 nodes



