Re: [Gluster-users] Rebalancing newly added bricks

2019-09-18 Thread Herb Burnswell
>
> Hi,
>
> Rebalance will abort itself if it cannot reach any of the nodes. Are all
> the bricks still up and reachable?
>
> Regards,
> Nithya
>

Yes, the bricks appear to be fine.  I restarted the rebalance and the
process is moving along again:

# gluster vol rebalance tank status
        Node   Rebalanced-files        size     scanned    failures    skipped         status   run time in h:m:s
   ---------   ----------------   ---------   ---------   ---------   --------   ------------   -----------------
   localhost             226973      14.9TB     1572952           0          0    in progress            44:26:48
     serverB                  0      0Bytes      631667           0          0      completed             37:2:14
volume rebalance: tank: success

# df -hP |grep data
/dev/mapper/gluster_vg-gluster_lv1_data   60T   24T   36T  40%  /gluster_bricks/data1
/dev/mapper/gluster_vg-gluster_lv2_data   60T   24T   36T  40%  /gluster_bricks/data2
/dev/mapper/gluster_vg-gluster_lv3_data   60T   17T   43T  29%  /gluster_bricks/data3
/dev/mapper/gluster_vg-gluster_lv4_data   60T   17T   43T  29%  /gluster_bricks/data4
/dev/mapper/gluster_vg-gluster_lv5_data   60T   19T   41T  31%  /gluster_bricks/data5
/dev/mapper/gluster_vg-gluster_lv6_data   60T   19T   41T  31%  /gluster_bricks/data6
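
For reference, restarting was nothing special, roughly just the standard
rebalance command again:

# gluster volume rebalance tank start
# gluster volume rebalance tank status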

Thanks,

HB



>
>
>
>
>>
>> # gluster vol rebalance tank status
>>         Node   Rebalanced-files        size     scanned    failures    skipped         status   run time in h:m:s
>>    ---------   ----------------   ---------   ---------   ---------   --------   ------------   -----------------
>>    localhost            1348706      57.8TB     2234439           9          6         failed            190:24:3
>>      serverB                  0      0Bytes           7           0          0      completed            63:47:55
>> volume rebalance: tank: success
>>
>> # gluster vol status tank
>> Status of volume: tank
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick serverA:/gluster_bricks/data1         49162     0          Y       20318
>> Brick serverB:/gluster_bricks/data1         49166     0          Y       3432
>> Brick serverA:/gluster_bricks/data2         49163     0          Y       20323
>> Brick serverB:/gluster_bricks/data2         49167     0          Y       3435
>> Brick serverA:/gluster_bricks/data3         49164     0          Y       4625
>> Brick serverA:/gluster_bricks/data4         49165     0          Y       4644
>> Brick serverA:/gluster_bricks/data5         49166     0          Y       5088
>> Brick serverA:/gluster_bricks/data6         49167     0          Y       5128
>> Brick serverB:/gluster_bricks/data3         49168     0          Y       22314
>> Brick serverB:/gluster_bricks/data4         49169     0          Y       22345
>> Brick serverB:/gluster_bricks/data5         49170     0          Y       22889
>> Brick serverB:/gluster_bricks/data6         49171     0          Y       22932
>> Self-heal Daemon on localhost               N/A       N/A        Y       6202
>> Self-heal Daemon on serverB                 N/A       N/A        Y       22981
>>
>> Task Status of Volume tank
>>
>> --
>> Task : Rebalance
>> ID   : eec64343-8e0d-4523-ad05-5678f9eb9eb2
>> Status   : failed
>>
>> # df -hP |grep data
>> /dev/mapper/gluster_vg-gluster_lv1_data   60T   31T   29T  52%  /gluster_bricks/data1
>> /dev/mapper/gluster_vg-gluster_lv2_data   60T   31T   29T  51%  /gluster_bricks/data2
>> /dev/mapper/gluster_vg-gluster_lv3_data   60T   15T   46T  24%  /gluster_bricks/data3
>> /dev/mapper/gluster_vg-gluster_lv4_data   60T   15T   46T  24%  /gluster_bricks/data4
>> /dev/mapper/gluster_vg-gluster_lv5_data   60T   15T   45T  25%  /gluster_bricks/data5
>> /dev/mapper/gluster_vg-gluster_lv6_data   60T   15T   45T  25%  /gluster_bricks/data6
>>
>>
>> The rebalance log on serverA shows a disconnect from serverB
>>
>> [2019-09-08 15:41:44.285591] C
>> [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-tank-client-10: server
>> :49170 has not responded in the last 42 seconds, disconnecting.
>> [2019-09-08 15:41:44.285739] I [MSGID: 114018]
>> [client.c:2280:client_rpc_notify] 0-tank-client-10: disconnected from
>> tank-client-10. Client process will keep trying to connect to glusterd
>> until brick's port is available
>> [2019-09-08 15:41:44.286023] E [rpc-clnt.c:365:saved_frames_unwind] (-->
>> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff986e8b132] (-->
>> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff986c5299e] (-->
>> /lib64

Re: [Gluster-users] split-brain errors under heavy load when one brick down

2019-09-18 Thread Erik Jacobson
Thank you for replying!

> Okay so 0-cm_shared-replicate-1 means these 3 bricks:
> 
> Brick4: 172.23.0.6:/data/brick_cm_shared
> Brick5: 172.23.0.7:/data/brick_cm_shared
> Brick6: 172.23.0.8:/data/brick_cm_shared

The above is correct.
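
For anyone reading later, the mapping is easy to double-check: replica sets
follow the brick order shown in volume info, so replicate-1 is the second
group of three bricks. Something like:

# gluster volume info cm_shared | grep -E '^Brick[0-9]'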


> Were there any pending self-heals for this volume? Is it possible that the
> server (one of Brick 4, 5 or 6 ) that is down had the only good copy and the
> other 2 online bricks had a bad copy (needing heal)? Clients can get EIO in
> that case.

So I did check for pending heals and saw none. The storage at the time was
effectively read-only: the NFS clients mount it read-only, and there was no
write activity going to the shared storage anyway at that time. So it was
not surprising that no heals were listed.
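
(The check itself was just the usual heal queries, something along the lines
of:)

# gluster volume heal cm_shared info
# gluster volume heal cm_shared info split-brain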

I did inspect both remaining bricks for several of the example problem files
and found matching md5sums.
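
Roughly like this on each of the two servers that were still up (the file
path below is just a placeholder for one of the reported files):

# md5sum /data/brick_cm_shared/<path/to/problem/file>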

The strange thing, as I mentioned, is that it only happened under the job
launch workload. The NFS boot workload, which is also very stressful,
ran clean with one brick down.

> When you say accessing the file from the compute nodes afterwards works
> fine, it is still with that one server (brick) down?

I can no longer check this system personally, but as I recall, once we
fixed the Ethernet problem all seemed well. I don't have a better
answer than that. I am starting a document of things to try the next time
we have a large system in the factory to run on; I'll put this in
there.

> 
> There was a case of AFR reporting spurious split-brain errors but that was
> fixed long back (http://review.gluster.org/16362) and the fix seems to be
> present in glusterfs-4.1.6.


So I brought this up because, in my case, we know the files really were
inaccessible on the NFS client side; we saw the errors on the clients
themselves. The bug above seems to mean that split-brain was reported in
error with no other impact, whereas in my case the errors resulted in
actual problems accessing the files on the NFS clients.

> Side note: Why are you using replica 9 for the ctdb volume? All
> development/tests are usually done on (distributed) replica 3 setup.

I am happy to change this. Whatever guide I used to set this up
suggested replica 9. I don't even know which resource was incorrect as
it was so long ago. I have no other reason.

I'm filing an incident now to change our setup tools to use replica-3 for
CTDB for new setups.
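
Something along these lines for new setups (the host names and brick paths
below are placeholders, not our actual tooling):

# gluster volume create ctdb replica 3 \
    server1:/data/brick_ctdb server2:/data/brick_ctdb server3:/data/brick_ctdb
# gluster volume start ctdb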

Again, I appreciate that you followed up with me. Thank you,

Erik




Re: [Gluster-users] Rebalancing newly added bricks

2019-09-18 Thread Nithya Balachandran
On Sat, 14 Sep 2019 at 01:25, Herb Burnswell wrote:

> Hi,
>
> Well our rebalance seems to have failed.  Here is the output:
>

Hi,

Rebalance will abort itself if it cannot reach any of the nodes. Are all
the bricks still up and reachable?
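
A quick way to check is something like:

# gluster volume status tank
# gluster peer status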

Regards,
Nithya




>
> # gluster vol rebalance tank status
>         Node   Rebalanced-files        size     scanned    failures    skipped         status   run time in h:m:s
>    ---------   ----------------   ---------   ---------   ---------   --------   ------------   -----------------
>    localhost            1348706      57.8TB     2234439           9          6         failed            190:24:3
>      serverB                  0      0Bytes           7           0          0      completed            63:47:55
> volume rebalance: tank: success
>
> # gluster vol status tank
> Status of volume: tank
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick serverA:/gluster_bricks/data1         49162     0          Y       20318
> Brick serverB:/gluster_bricks/data1         49166     0          Y       3432
> Brick serverA:/gluster_bricks/data2         49163     0          Y       20323
> Brick serverB:/gluster_bricks/data2         49167     0          Y       3435
> Brick serverA:/gluster_bricks/data3         49164     0          Y       4625
> Brick serverA:/gluster_bricks/data4         49165     0          Y       4644
> Brick serverA:/gluster_bricks/data5         49166     0          Y       5088
> Brick serverA:/gluster_bricks/data6         49167     0          Y       5128
> Brick serverB:/gluster_bricks/data3         49168     0          Y       22314
> Brick serverB:/gluster_bricks/data4         49169     0          Y       22345
> Brick serverB:/gluster_bricks/data5         49170     0          Y       22889
> Brick serverB:/gluster_bricks/data6         49171     0          Y       22932
> Self-heal Daemon on localhost               N/A       N/A        Y       6202
> Self-heal Daemon on serverB                 N/A       N/A        Y       22981
>
> Task Status of Volume tank
>
> --
> Task : Rebalance
> ID   : eec64343-8e0d-4523-ad05-5678f9eb9eb2
> Status   : failed
>
> # df -hP |grep data
> /dev/mapper/gluster_vg-gluster_lv1_data   60T   31T   29T  52%  /gluster_bricks/data1
> /dev/mapper/gluster_vg-gluster_lv2_data   60T   31T   29T  51%  /gluster_bricks/data2
> /dev/mapper/gluster_vg-gluster_lv3_data   60T   15T   46T  24%  /gluster_bricks/data3
> /dev/mapper/gluster_vg-gluster_lv4_data   60T   15T   46T  24%  /gluster_bricks/data4
> /dev/mapper/gluster_vg-gluster_lv5_data   60T   15T   45T  25%  /gluster_bricks/data5
> /dev/mapper/gluster_vg-gluster_lv6_data   60T   15T   45T  25%  /gluster_bricks/data6
>
>
> The rebalance log on serverA shows a disconnect from serverB
>
> [2019-09-08 15:41:44.285591] C
> [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-tank-client-10: server
> :49170 has not responded in the last 42 seconds, disconnecting.
> [2019-09-08 15:41:44.285739] I [MSGID: 114018]
> [client.c:2280:client_rpc_notify] 0-tank-client-10: disconnected from
> tank-client-10. Client process will keep trying to connect to glusterd
> until brick's port is available
> [2019-09-08 15:41:44.286023] E [rpc-clnt.c:365:saved_frames_unwind] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff986e8b132] (-->
> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff986c5299e] (-->
> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff986c52aae] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7ff986c54220] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2b0)[0x7ff986c54ce0] )
> 0-tank-client-10: forced unwinding frame type(GlusterFS 3.3)
> op(FXATTROP(34)) called at 2019-09-08 15:40:44.040333 (xid=0x7f8cfac)
>
> Does this type of failure cause data corruption?  What is the best course
> of action at this point?
>
> Thanks,
>
> HB
>
> On Wed, Sep 11, 2019 at 11:58 PM Strahil  wrote:
>
>> Hi Nithya,
>>
>> Thanks for the detailed explanation.
>> It makes sense.
>>
>> Best Regards,
>> Strahil Nikolov
>> On Sep 12, 2019 08:18, Nithya Balachandran  wrote:
>>
>>
>>
>> On Wed, 11 Sep 2019 at 09:47, Strahil  wrote:
>>
>> Hi Nithya,
>>
>> I was just reminded of your previous e-mail, which left me with the
>> impression that old volumes need that.
>> This is the one I mean:
>>
>> > It looks like this is a replicate volume. If that is the case then yes,
>> > you are running an old version of Gluster for which this was the default
>>
>>
>> Hi Strahil,
>>
>> I'm providing a little more detail here, which I hope will explain things.
>> Rebalance was always a volume-wide operation - a *rebalance start*
>> operation will start rebalance processes