Hi Atin, please reply: is there any configurable timeout parameter for the brick process going offline that we can increase?
Regards,
Abhishek

On Thu, Apr 21, 2016 at 12:34 PM, ABHISHEK PALIWAL <abhishpali...@gmail.com> wrote:
> Hi Atin,
>
> Please answer the following doubts as well:
>
> 1. If there is a temporary glitch in the network, will that affect the gluster brick process in any way? Is there any timeout for the brick process to go offline in case of a glitch in the network?
>
> 2. Is there any configurable timeout parameter which we can increase?
>
> 3. The brick and glusterd are connected by a unix domain socket. It is just a local socket, so why does it disconnect in the logs below?
>
> 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005] [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management: Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
> 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0] [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
>
> Regards,
> Abhishek
>
> On Tue, Apr 19, 2016 at 1:12 PM, ABHISHEK PALIWAL <abhishpali...@gmail.com> wrote:
>> Hi Atin,
>>
>> Thanks. I have more doubts here.
>>
>> The brick and glusterd are connected by a unix domain socket. It is just a local socket, so why does it disconnect in the logs below?
>>
>> 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005] [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management: Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
>> 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0] [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
>>
>> Regards,
>> Abhishek
>>
>> On Fri, Apr 15, 2016 at 9:14 AM, Atin Mukherjee <amukh...@redhat.com> wrote:
>>>
>>> On 04/14/2016 04:07 PM, ABHISHEK PALIWAL wrote:
>>> > On Thu, Apr 14, 2016 at 2:33 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>>> >
>>> > On 04/05/2016 03:35 PM, ABHISHEK PALIWAL wrote:
>>> > > On Tue, Apr 5, 2016 at 2:22 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>>> > >
>>> > > On 04/05/2016 01:04 PM, ABHISHEK PALIWAL wrote:
>>> > > > Hi Team,
>>> > > >
>>> > > > We are using Gluster 3.7.6 and are facing a problem in which a brick does not come online after the board is restarted.
>>> > > >
>>> > > > To understand our setup, please look at the following steps:
>>> > > > 1. We have two boards, A and B, on which a Gluster volume is running in replicated mode with one brick on each board.
>>> > > > 2. The Gluster mount point is present on Board A and is shared between a number of processes.
>>> > > > 3. Up to this point the volume is in sync and everything is working fine.
>>> > > > 4. Now we have a test case in which we stop glusterd, reboot Board B, and start glusterd again on it once the board comes up.
>>> > > > 5. We repeated step 4 multiple times to check the reliability of the system.
>>> > > > 6. After step 4, sometimes the system comes back into a working (i.e. in-sync) state, but sometimes we find that the brick of Board B is present in "gluster volume status" output yet does not come online even after waiting for more than a minute.
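A rough sketch of the test sequence in steps 4-6 above, expressed as shell commands. The volume name and brick path are taken from the logs later in this thread; the "boardB" host alias and the use of systemctl to manage glusterd are assumptions for illustration, not details from this setup.

    # Step 4: stop glusterd on Board B, reboot it, then start glusterd once it is back
    ssh boardB 'systemctl stop glusterd'     # "boardB" alias and systemctl are assumptions
    ssh boardB 'reboot'
    # ... wait for Board B to come back up ...
    ssh boardB 'systemctl start glusterd'

    # Step 6: poll the brick state for up to ~2 minutes (run on Board A)
    for i in $(seq 1 60); do
        gluster volume status c_glusterfs | grep '10.32.1.144:/opt/lvmdir/c2/brick'
        sleep 2
    done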
>>> > > As I mentioned in another email thread, until and unless the log shows evidence that there was a reboot, nothing can be concluded. The last log you shared with us a few days back didn't give any indication that the brick process wasn't running.
>>> > >
>>> > > How can we identify that the brick process is running from the brick logs?
>>> > >
>>> > > > 7. While step 4 is executing, some processes on Board A start accessing files from the Gluster mount point at the same time.
>>> > > >
>>> > > > As a solution to bring this brick online, we found some existing issues on the gluster mailing list suggesting the use of "gluster volume start <vol_name> force" to move the brick from 'offline' to 'online'.
>>> > > >
>>> > > > If we use the "gluster volume start <vol_name> force" command, it will kill the existing volume process and start a new one. What will happen if other processes are accessing the same volume at the time the volume process is killed internally by this command? Will it cause any failure in those processes?
>>> > > This is not true; volume start force will start the brick processes only if they are not running. Running brick processes will not be interrupted.
>>> > >
>>> > > We have tried this and checked the pid of the process before and after force start: the pid had changed after force start.
>>> > >
>>> > > Please find the logs at the time of failure attached once again with log-level=debug.
>>> > >
>>> > > If you can point to the exact line in the brick log file where you can see that the brick process is running, please give me the line number in that file.
>>> >
>>> > Here is the sequence in which glusterd and the respective brick process are restarted.
>>> >
>>> > 1. glusterd restart trigger - line number 1014 in glusterd.log:
>>> >
>>> > [2016-04-03 10:12:29.051735] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level DEBUG)
>>> >
>>> > 2. brick start trigger - line number 190 in opt-lvmdir-c2-brick.log:
>>> >
>>> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket --brick-name /opt/lvmdir/c2/brick -l /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256 --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
>>> >
>>> > 3. The following log indicates that the brick is up and has now started.
>>> > Refer to line 16123 in glusterd.log:
>>> >
>>> > [2016-04-03 10:14:25.336855] D [MSGID: 0] [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management: Connected to 10.32.1.144:/opt/lvmdir/c2/brick
>>> >
>>> > This clearly indicates that the brick is up and running, as after that I do not see any disconnect event being processed by glusterd for the brick process.
>>> >
>>> > Thanks for replying descriptively, but please also clear up some more doubts:
>>> >
>>> > 1. At the 10:14:25 moment in time the brick is available because we had removed the brick and added it again to make it online. The following are the logs from the cmd-history.log file of 000300:
>>> >
>>> > [2016-04-03 10:14:21.446570] : volume status : SUCCESS
>>> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> >
>>> > Also, 10:12:29 was the last reboot time before this failure, so I totally agree with what you said earlier.
>>> >
>>> > 2. As you said, glusterd restarted at 10:12:29; then why are we not getting 'brick start trigger' logs like the one below between the 10:12:29 and 10:14:25 timestamps, which is roughly a two-minute interval?
>>> So here is the culprit:
>>>
>>> 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005] [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management: Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
>>> 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0] [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
>>>
>>> GlusterD received a disconnect event for this brick process and marked it as stopped. This can happen for two reasons: 1. the brick process goes down, or 2. a network issue. In this case I believe it is the latter, since the brick process was running at that time. I'd request you to check this from the N/W side.
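A few commands that could help narrow this down on Board B the next time the brick shows as offline. This is a sketch only: the pid file and socket path are taken from the glusterfsd arguments quoted in this thread, and whether the disconnect is really network-related is not established here.

    # Is the brick process itself still running?
    ps ax | grep '[g]lusterfsd'
    cat /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid

    # Is the brick's unix socket (the -S argument above) still present?
    ls -l /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket

    # What does glusterd itself report for the brick at that moment?
    gluster volume status c_glusterfs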
>>> >
>>> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket --brick-name /opt/lvmdir/c2/brick -l /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256 --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
>>> >
>>> > 3. We are continuously checking the brick status in the above time window using "gluster volume status"; refer to the cmd-history.log file from 000300.
>>> >
>>> > In the glusterd.log file we are also getting the logs below
>>> >
>>> > [2016-04-03 10:12:31.771051] D [MSGID: 0] [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management: Connected to 10.32.1.144:/opt/lvmdir/c2/brick
>>> >
>>> > [2016-04-03 10:12:32.981152] D [MSGID: 0] [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management: Connected to 10.32.1.144:/opt/lvmdir/c2/brick
>>> >
>>> > two times between 10:12:29 and 10:14:25, and as you said these logs "clearly indicate that the brick is up and running"; then why is the brick not online in the "gluster volume status" command?
>>> >
>>> > [2016-04-03 10:12:33.990487] : volume status : SUCCESS
>>> > [2016-04-03 10:12:34.007469] : volume status : SUCCESS
>>> > [2016-04-03 10:12:35.095918] : volume status : SUCCESS
>>> > [2016-04-03 10:12:35.126369] : volume status : SUCCESS
>>> > [2016-04-03 10:12:36.224018] : volume status : SUCCESS
>>> > [2016-04-03 10:12:36.251032] : volume status : SUCCESS
>>> > [2016-04-03 10:12:37.352377] : volume status : SUCCESS
>>> > [2016-04-03 10:12:37.374028] : volume status : SUCCESS
>>> > [2016-04-03 10:12:38.446148] : volume status : SUCCESS
>>> > [2016-04-03 10:12:38.468860] : volume status : SUCCESS
>>> > [2016-04-03 10:12:39.534017] : volume status : SUCCESS
>>> > [2016-04-03 10:12:39.553711] : volume status : SUCCESS
>>> > [2016-04-03 10:12:40.616610] : volume status : SUCCESS
>>> > [2016-04-03 10:12:40.636354] : volume status : SUCCESS
>>> > ......
>>> > ......
>>> > ......
>>> > [2016-04-03 10:14:21.446570] : volume status : SUCCESS
>>> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> >
>>> > In the above logs we are continuously checking the brick status, but when we did not find the brick status 'online' even after ~2 minutes, we removed the brick and added it again to bring it online:
>>> >
>>> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> >
>>> > That is why we are getting the "brick start trigger" logs at timestamp 10:14:25:
>>> >
>>> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket --brick-name /opt/lvmdir/c2/brick -l /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256 --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
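Coming back to the earlier point about "gluster volume start <vol_name> force" and the changing pid: a minimal way to record that observation the next time it is tried (a sketch; the pid file path comes from the glusterfsd arguments quoted above).

    # Record the brick pid before the force start
    cat /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
    gluster volume status c_glusterfs

    gluster volume start c_glusterfs force

    # Record the brick pid again afterwards
    cat /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
    gluster volume status c_glusterfs
    # If the pid is unchanged, the running brick was left alone;
    # if it changed, the brick process was (re)spawned by the force start.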
>>> >
>>> > Regards,
>>> > Abhishek
>>> >
>>> > Please note that all the logs referred to and pasted are from 002500.
>>> >
>>> > ~Atin
>>> > >
>>> > > 002500 - Board B, whose brick is offline
>>> > > 000300 - Board A logs
>>> > > >
>>> > > > *Question: What could be contributing to the brick being offline?*
>>> > > >
>>> > > > --
>>> > > > Regards
>>> > > > Abhishek Paliwal

--
Regards
Abhishek Paliwal
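On the configurable-timeout question at the top of this mail, one possible starting point is the volume-level timeout options. This is a sketch only: network.ping-timeout governs client-to-brick connections (default 42 seconds), and whether any of these options affects the glusterd-to-brick disconnect seen in the logs above is not established in this thread.

    # List settable volume options, their defaults and descriptions (3.7.x)
    gluster volume set help | grep -i -A 2 timeout

    # Example: raise the client ping timeout on this volume
    gluster volume set c_glusterfs network.ping-timeout 100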
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel