Hi Atin,

please reply: is there a configurable timeout parameter for the brick
process going offline that we can increase?
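Incidentally, the two glusterd.log events quoted further down in this thread (the disconnect at 10:12:32 and the "Connected to" event at 10:14:25) bound the window during which the brick stayed marked as stopped. A small illustrative Python sketch to measure that window (the timestamp strings are copied from the quoted logs; the `stamp` helper is just an assumption of mine, not a Gluster tool):

```python
from datetime import datetime

# Timestamps copied from the glusterd.log excerpts quoted in this thread:
# the disconnect event and the later "Connected to" event.
disconnect = "[2016-04-03 10:12:32.984331] I [MSGID: 106005] ..."
reconnect = "[2016-04-03 10:14:25.336855] D [MSGID: 0] ..."

def stamp(line):
    # glusterd log lines begin with a bracketed UTC timestamp.
    return datetime.strptime(line[1:line.index("]")], "%Y-%m-%d %H:%M:%S.%f")

gap = stamp(reconnect) - stamp(disconnect)
print(gap.total_seconds())  # seconds the brick stayed marked "stopped"
```

The gap comes out to roughly 112 seconds, which matches the "about two minutes" interval discussed later in the thread.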

Regards,
Abhishek

On Thu, Apr 21, 2016 at 12:34 PM, ABHISHEK PALIWAL <abhishpali...@gmail.com>
wrote:

> Hi Atin,
>
> Please answer following doubts as well:
>
> 1. If there is a temporary glitch in the network, will that affect the
> gluster brick process in any way? Is there any timeout for the brick
> process to go offline in case of a network glitch?
>
> 2. Is there any configurable timeout parameter which we can increase?
>
> 3. The brick and glusterd are connected by a unix domain socket. Since it
> is just a local socket, why does it disconnect in the logs below?
>
>  1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
> [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
> Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
>  1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
> [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
> brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
>
> Regards,
> Abhishek
>
>
> On Tue, Apr 19, 2016 at 1:12 PM, ABHISHEK PALIWAL <abhishpali...@gmail.com
> > wrote:
>
>> Hi Atin,
>>
>> Thanks.
>>
>> I have a few more doubts.
>>
>> The brick and glusterd are connected by a unix domain socket. Since it
>> is just a local socket, why does it disconnect in the logs below?
>>
>>  1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
>> [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
>> Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
>>  1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
>> [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
>> brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
>>
>>
>> Regards,
>> Abhishek
>>
>>
>> On Fri, Apr 15, 2016 at 9:14 AM, Atin Mukherjee <amukh...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On 04/14/2016 04:07 PM, ABHISHEK PALIWAL wrote:
>>> >
>>> >
>>> > On Thu, Apr 14, 2016 at 2:33 PM, Atin Mukherjee <amukh...@redhat.com
>>> > <mailto:amukh...@redhat.com>> wrote:
>>> >
>>> >
>>> >
>>> >     On 04/05/2016 03:35 PM, ABHISHEK PALIWAL wrote:
>>> >     >
>>> >     >
>>> >     > On Tue, Apr 5, 2016 at 2:22 PM, Atin Mukherjee <
>>> amukh...@redhat.com <mailto:amukh...@redhat.com>
>>> >     > <mailto:amukh...@redhat.com <mailto:amukh...@redhat.com>>>
>>> wrote:
>>> >     >
>>> >     >
>>> >     >
>>> >     >     On 04/05/2016 01:04 PM, ABHISHEK PALIWAL wrote:
>>> >     >     > Hi Team,
>>> >     >     >
>>> >     >     > We are using Gluster 3.7.6 and facing a problem in which a
>>> >     >     > brick does not come online after restarting the board.
>>> >     >     >
>>> >     >     > To understand our setup, please see the following steps:
>>> >     >     > 1. We have two boards, A and B, on which a Gluster volume is
>>> >     >     > running in replicated mode with one brick on each board.
>>> >     >     > 2. The Gluster mount point is present on Board A and is
>>> >     >     > shared among a number of processes.
>>> >     >     > 3. Up to this point the volume is in sync and everything is
>>> >     >     > working fine.
>>> >     >     > 4. Now we have a test case in which we stop glusterd, reboot
>>> >     >     > Board B, and when the board comes up, start glusterd on it
>>> >     >     > again.
>>> >     >     > 5. We repeated Step 4 multiple times to check the
>>> >     >     > reliability of the system.
>>> >     >     > 6. After Step 4, sometimes the system comes back into a
>>> >     >     > working state (i.e. in sync), but sometimes the brick of
>>> >     >     > Board B is present in the “gluster volume status” output yet
>>> >     >     > does not come online even after waiting for more than a
>>> >     >     > minute.
>>> >     >     As I mentioned in another email thread, until and unless the
>>> >     >     log shows evidence that there was a reboot, nothing can be
>>> >     >     concluded. The last log you shared with us a few days back
>>> >     >     didn't give any indication that the brick process wasn't
>>> >     >     running.
>>> >     >
>>> >     > How can we identify from the brick logs that the brick process
>>> >     > is running?
>>> >     >
>>> >     >     > 7. While Step 4 is executing, some processes on Board A
>>> >     >     > start accessing files from the Gluster mount point.
>>> >     >     >
>>> >     >     > As a solution to bring this brick online, we found existing
>>> >     >     > issues on the gluster mailing list suggesting the use of
>>> >     >     > “gluster volume start <vol_name> force” to take the brick
>>> >     >     > from 'offline' to 'online'.
>>> >     >     >
>>> >     >     > If “gluster volume start <vol_name> force” kills the
>>> >     >     > existing brick process and starts a new one, what happens to
>>> >     >     > other processes that are accessing the same volume at the
>>> >     >     > moment the brick process is killed internally by this
>>> >     >     > command? Will it cause any failures in those processes?
>>> >     >     This is not true: volume start force will start the brick
>>> >     >     processes only if they are not running. Running brick
>>> >     >     processes will not be interrupted.
>>> >     >
>>> >     > We tried this and checked the pid of the brick process before and
>>> >     > after force start; the pid changed after force start.
>>> >     >
>>> >     > Please find the logs at the time of failure attached once again,
>>> >     > with log-level=debug.
>>> >     >
>>> >     > If you can identify the exact line in the brick log file showing
>>> >     > that the brick process is running, please give me the line number
>>> >     > in that file.
>>> >
>>> >     Here is the sequence in which glusterd and the respective brick
>>> >     process are restarted.
>>> >
>>> >     1. glusterd restart trigger - line number 1014 in glusterd.log
>>> file:
>>> >
>>> >     [2016-04-03 10:12:29.051735] I [MSGID: 100030]
>>> >     [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running
>>> >     /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p
>>> >     /var/run/glusterd.pid --log-level DEBUG)
>>> >
>>> >     2. brick start trigger - line number 190 in opt-lvmdir-c2-brick.log
>>> >
>>> >     [2016-04-03 10:14:25.268833] I [MSGID: 100030]
>>> >     [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfsd: Started running
>>> >     /usr/sbin/glusterfsd version 3.7.6 (args: /usr/sbin/glusterfsd -s
>>> >     10.32.1.144 --volfile-id
>>> >     c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
>>> >     /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
>>> >     -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
>>> >     --brick-name /opt/lvmdir/c2/brick -l
>>> >     /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
>>> >     *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
>>> >     --brick-port 49329 --xlator-option
>>> >     c_glusterfs-server.listen-port=49329)
>>> >
>>> >     3. The following log indicates that brick is up and is now started.
>>> >     Refer to line 16123 in glusterd.log
>>> >
>>> >     [2016-04-03 10:14:25.336855] D [MSGID: 0]
>>> >     [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
>>> >     Connected to 10.32.1.144:/opt/lvmdir/c2/brick
>>> >
>>> >     This clearly indicates that the brick is up and running, as after
>>> >     that I do not see any disconnect event processed by glusterd for
>>> >     the brick process.
>>> >
>>> >
>>> > Thanks for the detailed reply, but please also clear up some more
>>> > doubts:
>>> >
>>> > 1. At the 10:14:25 timestamp the brick is available because we removed
>>> > the brick and added it again to make it online.
>>> > The following are the logs from the cmd-history.log file of 000300:
>>> >
>>> > [2016-04-03 10:14:21.446570]  : volume status : SUCCESS
>>> > [2016-04-03 10:14:21.665889]  : volume remove-brick c_glusterfs replica
>>> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> > [2016-04-03 10:14:21.764270]  : peer detach 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:23.060442]  : peer probe 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:25.649525]  : volume add-brick c_glusterfs replica 2
>>> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> >
>>> > Also, 10:12:29 was the last reboot time before this failure, so I
>>> > totally agree with what you said earlier.
>>> >
>>> > 2. As you said, glusterd restarted at 10:12:29; then why do we not get
>>> > 'brick start trigger' logs like the ones below between the 10:12:29 and
>>> > 10:14:25 timestamps, an interval of about two minutes?
>>> So here is the culprit:
>>>
>>>  1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
>>> [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
>>> Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
>>>  1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
>>> [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
>>> brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
>>>
>>>
>>> GlusterD received a disconnect event for this brick process and marked
>>> it as stopped. This could happen for two reasons: 1. the brick process
>>> goes down, or 2. a network issue. In this case it's the latter, I
>>> believe, since the brick process was running at that time. I'd request
>>> you to check this from the N/W side.
>>>
>>>
>>> >
>>> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
>>> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd
>>> > version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
>>> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
>>> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
>>> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
>>> > --brick-name /opt/lvmdir/c2/brick -l
>>> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
>>> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
>>> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
>>> >
>>> > 3. We are continuously checking the brick status during the above time
>>> > period using "gluster volume status"; refer to the cmd-history.log file
>>> > from 000300.
>>> >
>>> > In glusterd.log file we are also getting below logs
>>> >
>>> > [2016-04-03 10:12:31.771051] D [MSGID: 0]
>>> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
>>> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
>>> >
>>> > [2016-04-03 10:12:32.981152] D [MSGID: 0]
>>> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
>>> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
>>> >
>>> > twice between 10:12:29 and 10:14:25, and as you said, these logs
>>> > "clearly indicate that the brick is up and running". Then why is the
>>> > brick not online in the "gluster volume status" output?
>>> >
>>> > [2016-04-03 10:12:33.990487]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:34.007469]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:35.095918]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:35.126369]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:36.224018]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:36.251032]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:37.352377]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:37.374028]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:38.446148]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:38.468860]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:39.534017]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:39.553711]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:40.616610]  : volume status : SUCCESS
>>> > [2016-04-03 10:12:40.636354]  : volume status : SUCCESS
>>> > ......
>>> > ......
>>> > ......
>>> > [2016-04-03 10:14:21.446570]  : volume status : SUCCESS
>>> > [2016-04-03 10:14:21.665889]  : volume remove-brick c_glusterfs replica
>>> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> > [2016-04-03 10:14:21.764270]  : peer detach 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:23.060442]  : peer probe 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:25.649525]  : volume add-brick c_glusterfs replica 2
>>> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> >
>>> > In the above logs we are continuously checking the brick status, but
>>> > when we don't find the brick 'online' even after ~2 minutes, we remove
>>> > it and add it again to make it online.
>>> >
>>> > [2016-04-03 10:14:21.665889]  : volume remove-brick c_glusterfs replica
>>> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> > [2016-04-03 10:14:21.764270]  : peer detach 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:23.060442]  : peer probe 10.32.1.144 : SUCCESS
>>> > [2016-04-03 10:14:25.649525]  : volume add-brick c_glusterfs replica 2
>>> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
>>> >
>>> > That is why in the logs we are getting "brick start trigger" logs at
>>> > timestamp 10:14:25:
>>> >
>>> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
>>> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd
>>> > version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
>>> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
>>> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
>>> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
>>> > --brick-name /opt/lvmdir/c2/brick -l
>>> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
>>> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
>>> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
>>> >
>>> >
>>> > Regards,
>>> > Abhishek
>>> >
>>> >
>>> >     Please note that all the logs referred and pasted are from 002500.
>>> >
>>> >     ~Atin
>>> >     >
>>> >     > 002500 - logs from Board B, whose brick is offline
>>> >     > 000300 - logs from Board A
>>> >     >
>>> >     >     >
>>> >     >     > *Question : What could be contributing to brick offline?*
>>> >     >     >
>>> >     >     >
>>> >     >     > --
>>> >     >     >
>>> >     >     > Regards
>>> >     >     > Abhishek Paliwal
>>> >     >     >
>>> >     >     >
>>> >     >     > _______________________________________________
>>> >     >     > Gluster-devel mailing list
>>> >     >     > Gluster-devel@gluster.org <mailto:
>>> Gluster-devel@gluster.org>
>>> >     <mailto:Gluster-devel@gluster.org <mailto:
>>> Gluster-devel@gluster.org>>
>>> >     >     > http://www.gluster.org/mailman/listinfo/gluster-devel
>>> >     >     >
>>> >     >
>>> >     >
>>> >     >
>>> >     >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>


-- 




Regards
Abhishek Paliwal
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
