Re: [Gluster-users] Previously replaced brick not coming up after reboot

2018-08-16 Thread Hu Bert
glusterfs 3.12.12

2018-08-16 9:26 GMT+02:00 Serkan Çoban :
> What is your gluster version? There was a bug in 3.10 where, when you
> reboot a node, some bricks may not come online, but it was fixed in
> later versions.
>
> On 8/16/18, Hu Bert  wrote:
>> [original report snipped; quoted in full below]

Re: [Gluster-users] Previously replaced brick not coming up after reboot

2018-08-16 Thread Serkan Çoban
What is your gluster version? There was a bug in 3.10 where, when you
reboot a node, some bricks may not come online, but it was fixed in
later versions.
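
For reference, a quick way to confirm the installed version on each node
(gluster CLI assumed to be in PATH; the package check is for Debian/Ubuntu
installs):

# print the version of the installed gluster CLI/daemon
gluster --version
# cross-check the installed server package (Debian/Ubuntu)
dpkg -l | grep glusterfs-server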

On 8/16/18, Hu Bert  wrote:
> [original report snipped; quoted in full below]


[Gluster-users] Previously replaced brick not coming up after reboot

2018-08-16 Thread Hu Bert
Hi there,

Twice I had to replace a brick, on two different servers; the replacement
went fine, and the heal took very long but finally finished. From time to
time the servers have to be rebooted (kernel upgrades), and I've noticed
that the replaced brick doesn't come up after the reboot. Status after
reboot:

gluster volume status
Status of volume: shared
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster11:/gluster/bricksda1/shared      49164     0          Y       6425
Brick gluster12:/gluster/bricksda1/shared      49152     0          Y       2078
Brick gluster13:/gluster/bricksda1/shared      49152     0          Y       2478
Brick gluster11:/gluster/bricksdb1/shared      49165     0          Y       6452
Brick gluster12:/gluster/bricksdb1/shared      49153     0          Y       2084
Brick gluster13:/gluster/bricksdb1/shared      49153     0          Y       2497
Brick gluster11:/gluster/bricksdc1/shared      49166     0          Y       6479
Brick gluster12:/gluster/bricksdc1/shared      49154     0          Y       2090
Brick gluster13:/gluster/bricksdc1/shared      49154     0          Y       2485
Brick gluster11:/gluster/bricksdd1/shared      49168     0          Y       7897
Brick gluster12:/gluster/bricksdd1_new/shared  49157     0          Y       7632
Brick gluster13:/gluster/bricksdd1_new/shared  N/A       N/A        N       N/A
Self-heal Daemon on localhost                  N/A       N/A        Y       25483
Self-heal Daemon on gluster13                  N/A       N/A        Y       2463
Self-heal Daemon on gluster12                  N/A       N/A        Y       17619

Task Status of Volume shared
------------------------------------------------------------------------------
There are no active volume tasks
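
For context, the brick replacements mentioned at the top of this mail were
presumably done along these lines; a minimal sketch with illustrative brick
paths taken from the status output above, not the exact commands used:

# replace the failed brick with the new one; self-heal then copies the data over
gluster volume replace-brick shared \
    gluster13:/gluster/bricksdd1/shared \
    gluster13:/gluster/bricksdd1_new/shared \
    commit force
# watch the heal until no entries remain on any brick
gluster volume heal shared info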

Here gluster13:/gluster/bricksdd1_new/shared is not up. Related log
messages after the reboot in glusterd.log:

[2018-08-16 05:22:52.986757] W [socket.c:593:__socket_rwv] 0-management: readv on /var/run/gluster/02d086b75bfc97f2cce96fe47e26dcf3.socket failed (No data available)
[2018-08-16 05:22:52.987648] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick gluster13:/gluster/bricksdd1_new/shared has disconnected from glusterd.
[2018-08-16 05:22:52.987908] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7fdbaa398b8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fdbaa15f111] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdbaa15f23e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fdbaa1608d1] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7fdbaa1613f8] ) 0-management: forced unwinding frame type(brick operations) op(--(4)) called at 2018-08-16 05:22:52.941332 (xid=0x2)
[2018-08-16 05:22:52.988058] W [dict.c:426:dict_set] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.12/xlator/mgmt/glusterd.so(+0xd1e59) [0x7fdba4f9ce59] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set_int32+0x2b) [0x7fdbaa39122b] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set+0xd3) [0x7fdbaa38fa13] ) 0-dict: !this || !value for key=index [Invalid argument]
[2018-08-16 05:22:52.988092] E [MSGID: 106060] [glusterd-syncop.c:1014:gd_syncop_mgmt_brick_op] 0-management: Error setting index on brick status rsp dict
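
glusterd only records the disconnect here; the brick's own log is the more
likely place to show why the process failed to start. A minimal check on
gluster13 (the file name below assumes the default brick-log naming scheme,
i.e. the brick path with '/' replaced by '-'; adjust if your layout differs):

# inspect the brick's own log around the time of the reboot
less /var/log/glusterfs/bricks/gluster-bricksdd1_new-shared.log
# verify whether a glusterfsd process was started for that brick at all
ps aux | grep '[g]lusterfsd' | grep bricksdd1_new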

This problem could be related to my previous mail. After executing
"gluster volume start shared force" the brick comes up, which triggers a
heal of the brick (and high load, too). Is there any way to track down
why this happens, and to ensure that the brick comes up at boot?
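
For reference, the workaround mentioned above plus two hedged boot-time
checks (systemd assumed; depending on the distribution the service may be
named glusterd or glusterfs-server):

# workaround: force-start the volume so the missing brick process is spawned
gluster volume start shared force
# glusterd is what starts the brick processes at boot, so make sure it is enabled
systemctl is-enabled glusterd
systemctl enable glusterd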


Best regards
Hubert
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users