Re: [Gluster-users] Gluster 11.1 - heal hangs (again)

2024-04-23 Thread Hu Bert
Howdy,
I was able to solve the problem. I had two options: reset-brick (i.e.
reconfigure the existing brick) or replace-brick (i.e. a full sync).
I tried reset-brick first...

gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages start
[... do nothing ...]
gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages gluster190:/gluster/md3/sourceimages commit force

After that the pending heals started, dropped to 0 pretty fast, and the
connected clients are now identical on all 3 servers.
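For the archives, the sequence above can be sketched as a script, with a small loop to watch the pending heals drain. This is only a sketch under the assumptions from this thread (volume `sourceimages`, brick `gluster190:/gluster/md3/sourceimages`) and the standard "Number of entries:" lines that `gluster volume heal <vol> info` prints:

```shell
#!/bin/sh
# Sketch of the reset-brick sequence from this thread, plus a loop that
# watches the pending-heal counters until they reach 0.
VOLUME=sourceimages
BRICK=gluster190:/gluster/md3/sourceimages

# Sum all "Number of entries:" lines from `gluster volume heal <vol> info`
# output (read on stdin); prints the total pending-heal count.
pending_heals() {
  awk -F: '/Number of entries/ { sum += $2 } END { print sum + 0 }'
}

if command -v gluster >/dev/null 2>&1; then
  # Take the brick offline, then re-commit it at the same path.
  gluster volume reset-brick "$VOLUME" "$BRICK" start
  gluster volume reset-brick "$VOLUME" "$BRICK" "$BRICK" commit force

  # Watch the heal counters drain.
  while :; do
    n=$(gluster volume heal "$VOLUME" info | pending_heals)
    echo "pending heals: $n"
    [ "$n" -eq 0 ] && break
    sleep 10
  done
fi
```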


Thx for reading,

Hubert


On Tue, Apr 23, 2024 at 08:46, Hu Bert wrote:
>
> [...]

Re: [Gluster-users] Gluster 11.1 - heal hangs (again)

2024-04-23 Thread Hu Bert
Ah, the logs: nothing in glustershd.log on the 3 gluster servers, but
one client shows this in /var/log/glusterfs/data-sourceimages.log:
[2024-04-23 06:54:21.456157 +] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-sourceimages-client-2: remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511}, {errno=77}, {error=File descriptor in bad state}]
[2024-04-23 06:54:21.456195 +] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-sourceimages-replicate-0: Failed getlk for a1817071-2949-4145-a96a-874159e46511 [File descriptor in bad state]
[2024-04-23 06:54:21.488511 +] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-sourceimages-client-2: remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511}, {errno=77}, {error=File descriptor in bad state}]
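These EBADFD messages appear to come from the client holding file descriptors that were never reopened against the restarted brick. A small filter (a sketch; the log path is the one from this message) can show how many such lines each affected gfid produces, which helps gauge how many files are stuck:

```shell
#!/bin/sh
# Sketch: count how often each gfid shows up in "File descriptor in bad
# state" (EBADFD) messages in a FUSE client log. Reads the log on stdin.
stale_fd_gfids() {
  grep 'EBADFD' | grep -o 'gfid=[0-9a-f-]*' | sort | uniq -c | sort -rn
}

# Usage on a client (log name taken from this thread):
#   stale_fd_gfids < /var/log/glusterfs/data-sourceimages.log
```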


On Tue, Apr 23, 2024 at 08:46, Hu Bert wrote:
>
> [...]

[Gluster-users] Gluster 11.1 - heal hangs (again)

2024-04-23 Thread Hu Bert
Hi,

referring to this thread:
https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html
especially: 
https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html

I've updated and rebooted 3 servers (Debian bookworm) running Gluster
11.1. The first 2 servers went fine: volume ok, no pending heals. So
after a couple of minutes I rebooted the 3rd server, and now I have the
same problem again: pending heals are counting up, but no heals happen.
gluster volume status+info look ok, gluster peer status is ok.

Full volume status+info: https://pastebin.com/aEEEKn7h

Volume Name: sourceimages
Type: Replicate
Volume ID: d6a559a1-ca4c-48c7-8adf-89048333bb58
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster188:/gluster/md3/sourceimages
Brick2: gluster189:/gluster/md3/sourceimages
Brick3: gluster190:/gluster/md3/sourceimages

Internal IPs:
gluster188: 192.168.0.188
gluster189: 192.168.0.189
gluster190: 192.168.0.190
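A quick way to spot the asymmetry in the client info below (the rebooted brick accepting far fewer clients than its peers) is to pull just the per-brick connection counts out of `gluster volume status <vol> clients`. A sketch, assuming the output layout shown in this message:

```shell
#!/bin/sh
# Sketch: print "brick-path client-count" pairs from the output of
# `gluster volume status <vol> clients` (read on stdin).
client_counts() {
  awk '/^Brick : / { brick = $3 } /^Clients connected/ { print brick, $NF }'
}

if command -v gluster >/dev/null 2>&1; then
  gluster volume status sourceimages clients | client_counts
fi
```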

After rebooting the 3rd server (gluster190) the client info looks like this:

gluster volume status sourceimages clients
Client connections for volume sourceimages
--
Brick : gluster188:/gluster/md3/sourceimages
Clients connected : 17
Hostname                 BytesRead  BytesWritten  OpVersion
--------                 ---------  ------------  ---------
192.168.0.188:49151        1047856        988364         11
192.168.0.189:49149         930792        654096         11
192.168.0.109:49147         271598        279908         11
192.168.0.223:49147         126764        130964         11
192.168.0.222:49146         125848        130144         11
192.168.0.2:49147           273756      43400387         11
192.168.0.15:49147        57248531      14327465         11
192.168.0.126:49147       32282645     671284763         11
192.168.0.94:49146          125520        128864         11
192.168.0.66:49146        34086248     666519388         11
192.168.0.99:49146         3051076     522652843         11
192.168.0.16:49146       149773024       1049035         11
192.168.0.110:49146        1574768     566124922         11
192.168.0.106:49146      152640790     146483580         11
192.168.0.91:49133        89548971      82709793         11
192.168.0.190:49149           4132          6540         11
192.168.0.118:49133          92176         92884         11
--
Brick : gluster189:/gluster/md3/sourceimages
Clients connected : 17
Hostname                 BytesRead  BytesWritten  OpVersion
--------                 ---------  ------------  ---------
192.168.0.188:49146         935172        658268         11
192.168.0.189:49151        1039048        977920         11
192.168.0.126:49146       27106555     231766764         11
192.168.0.110:49147        1121696     226426262         11
192.168.0.16:49147       147165735        994015         11
192.168.0.106:49147      152476618       1091156         11
192.168.0.94:49147          109612        112688         11
192.168.0.109:49146         180819       1489715         11
192.168.0.223:49146         110708        114316         11
192.168.0.99:49147         2573412     157737429         11
192.168.0.2:49145           242696      26088710         11
192.168.0.222:49145         109728        113064         11
192.168.0.66:49145        27003740     215124678         11
192.168.0.15:49145        57217513        594699         11
192.168.0.91:49132        89463431       2714920         11
192.168.0.190:49148           4132          6540         11
192.168.0.118:49131          92380         94996         11
--
Brick : gluster190:/gluster/md3/sourceimages
Clients connected : 2
Hostname