Re: [Gluster-users] Can't mount NFS, please help!
"sz_cui...@163.com" writes:

> 1. The gluster server has set volume option nfs.disable to: off
>
> Volume Name: gv0
> Type: Disperse
> Volume ID: 429100e4-f56d-4e28-96d0-ee837386aa84
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gfs1:/brick1/gv0
> Brick2: gfs2:/brick1/gv0
> Brick3: gfs3:/brick1/gv0
> Options Reconfigured:
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> nfs.disable: off

I can't remember how to start the NFS process integrated with Gluster, but the documentation now suggests using NFS-Ganesha instead.

Best regards,
Olivier

> 2. The process has started.
>
> [root@gfs1 ~]# ps -ef | grep glustershd
> root 1117 1 0 10:12 ? 00:00:00 /usr/sbin/glusterfs -s localhost
>   --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid
>   -l /var/log/glusterfs/glustershd.log
>   -S /var/run/gluster/ca97b99a29c04606.socket
>   --xlator-option *replicate*.node-uuid=323075ea-2b38-427c-a9aa-70ce18e94208
>   --process-name glustershd --client-pid=-6
>
> 3. But the status of gv0 is not correct, because its NFS Server is not online.
>
> [root@gfs1 ~]# gluster volume status gv0
> Status of volume: gv0
> Gluster process                  TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------
> Brick gfs1:/brick1/gv0           49154     0          Y       4180
> Brick gfs2:/brick1/gv0           49154     0          Y       1222
> Brick gfs3:/brick1/gv0           49154     0          Y       1216
> Self-heal Daemon on localhost    N/A       N/A        Y       1117
> NFS Server on localhost          N/A       N/A        N       N/A
> Self-heal Daemon on gfs2         N/A       N/A        Y       1138
> NFS Server on gfs2               N/A       N/A        N       N/A
> Self-heal Daemon on gfs3         N/A       N/A        Y       1131
> NFS Server on gfs3               N/A       N/A        N       N/A
>
> Task Status of Volume gv0
> ------------------------------------------------------------------
> There are no active volume tasks
>
> 4. So, I can't mount gv0 on my client.
>
> [root@kvms1 ~]# mount -t nfs gfs1:/gv0 /mnt/test
> mount.nfs: Connection refused
>
> Please help!
> Thanks!
--
sz_cui...@163.com

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 01/04/20 8:57 am, Erik Jacobson wrote:
> Here are some back traces. They make my head hurt. Maybe you can suggest
> something else to try next? In the morning I'll try to unwind this myself
> too in the source code but I suspect it will be tough for me.
>
> (gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
> Breakpoint 1 at 0x7fff688e057b: file afr-read-txn.c, line 281.
> (gdb) continue
> Continuing.
> [Switching to Thread 0x7ffec700 (LWP 50175)]
>
> Thread 15 "glfs_epoll007" hit Breakpoint 1, afr_read_txn_refresh_done (
>     frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
> 281         if (err) {
> (gdb) bt
> #0  afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0,
>     err=5) at afr-read-txn.c:281
> #1  0x7fff68901fdb in afr_txn_refresh_done (
>     frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0,
>     err=5, err@entry=0) at afr-common.c:1223
> #2  0x7fff689022b3 in afr_inode_refresh_done (
>     frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0,
>     error=0) at afr-common.c:1295

Hmm, afr_inode_refresh_done() is called with error=0, and by the time we reach afr_txn_refresh_done() it becomes 5 (i.e. EIO). So afr_inode_refresh_done() is changing it to 5. Maybe you can put breakpoints/log messages in afr_inode_refresh_done() at the places where error is getting changed and see where the assignment happens.

Regards,
Ravi
[Gluster-users] Can't mount NFS, please help!
1. The gluster server has set volume option nfs.disable to: off

Volume Name: gv0
Type: Disperse
Volume ID: 429100e4-f56d-4e28-96d0-ee837386aa84
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs1:/brick1/gv0
Brick2: gfs2:/brick1/gv0
Brick3: gfs3:/brick1/gv0
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: off

2. The process has started.

[root@gfs1 ~]# ps -ef | grep glustershd
root 1117 1 0 10:12 ? 00:00:00 /usr/sbin/glusterfs -s localhost
  --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid
  -l /var/log/glusterfs/glustershd.log
  -S /var/run/gluster/ca97b99a29c04606.socket
  --xlator-option *replicate*.node-uuid=323075ea-2b38-427c-a9aa-70ce18e94208
  --process-name glustershd --client-pid=-6

3. But the status of gv0 is not correct, because its NFS Server is not online.

[root@gfs1 ~]# gluster volume status gv0
Status of volume: gv0
Gluster process                  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------
Brick gfs1:/brick1/gv0           49154     0          Y       4180
Brick gfs2:/brick1/gv0           49154     0          Y       1222
Brick gfs3:/brick1/gv0           49154     0          Y       1216
Self-heal Daemon on localhost    N/A       N/A        Y       1117
NFS Server on localhost          N/A       N/A        N       N/A
Self-heal Daemon on gfs2         N/A       N/A        Y       1138
NFS Server on gfs2               N/A       N/A        N       N/A
Self-heal Daemon on gfs3         N/A       N/A        Y       1131
NFS Server on gfs3               N/A       N/A        N       N/A

Task Status of Volume gv0
------------------------------------------------------------------
There are no active volume tasks

4. So, I can't mount gv0 on my client.

[root@kvms1 ~]# mount -t nfs gfs1:/gv0 /mnt/test
mount.nfs: Connection refused

Please help!
Thanks!

sz_cui...@163.com
Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
THANK YOU for the hints. Very happy to have the help. I'll reply to a couple things then dig in:

On Tue, Mar 31, 2020 at 03:27:59PM +0530, Ravishankar N wrote:
> From your reply in the other thread, I'm assuming that the file/gfid in
> question is not in genuine split-brain or needing heal. i.e. for example

Right, they were not tagged split-brain either, just healing needed, which is expected for those 76 files.

> with that 1 brick down and 2 bricks up test case, if you tried to read the
> file from say a temporary fuse mount (which is also now connected only to
> 2 bricks since the 3rd one is down) it works fine and there is no EIO
> error...

Looking at the heal info, all files are the files I expected to have write changes and I *think* are outside the scope of this issue. To close the loop, I ran 'strings' on the top of one of the files from a fuse mount to confirm, and had no trouble.

> ...which means that what you have observed is true, i.e.
> afr_read_txn_refresh_done() is called with err=EIO. You can add logs to see
> at what point EIO is set. The call graph is like this:
> afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
>
> Maybe
> https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-common.c#L1188
> in afr_txn_refresh_done() is causing it, either due to ret being -EIO or
> event_generation being zero.
>
> If you are comfortable with gdb, you can put a conditional break point in
> afr_read_txn_refresh_done() at
> https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-read-txn.c#L283
> when err=EIO and then check the backtrace for who is setting err to EIO.

OK, so the main event! :) I'm not a gdb expert, but I think I figured it out well enough to paste some back traces. However, I'm having trouble interpreting them exactly. It looks to me to be the "event" case.
(I got permission to use this MFG system at night for a couple more nights; avoiding the 24-hour-reserved internal larger system we have).

Here is what I did; feel free to suggest something better.

- I am using an RPM build, so I changed the spec file to create debuginfo packages. I'm on RHEL 8.1.
- I installed the updated packages and debuginfo packages.
- When glusterd started the NFS glusterfs, I killed it.
- I ran this:
  gdb -d /root/rpmbuild/BUILD/glusterfs-7.2 -d /root/rpmbuild/BUILD/glusterfs-7.2/xlators/cluster/afr/src/ /usr/sbin/glusterfs
- Then from GDB, I ran this:
  (gdb) run -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9ddb5561058ff543.socket -N
- I hit ctrl-c, then set the break point:
  (gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
- I have some debugging statements, but glusterfs 7.2 line 280 is this
  (I think gdb changed it to 281 internally):

      --> line 280
      if (err) {
          if (!priv->thin_arbiter_count) {

- continue
- Then I ran the test case.

Here are some back traces. They make my head hurt. Maybe you can suggest something else to try next? In the morning I'll try to unwind this myself too in the source code, but I suspect it will be tough for me.

(gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
Breakpoint 1 at 0x7fff688e057b: file afr-read-txn.c, line 281.
(gdb) continue
Continuing.
[Switching to Thread 0x7ffec700 (LWP 50175)]

Thread 15 "glfs_epoll007" hit Breakpoint 1, afr_read_txn_refresh_done (
    frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
281         if (err) {
(gdb) bt
#0  afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0,
    err=5) at afr-read-txn.c:281
#1  0x7fff68901fdb in afr_txn_refresh_done (
    frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0,
    err=5, err@entry=0) at afr-common.c:1223
#2  0x7fff689022b3 in afr_inode_refresh_done (
    frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0,
    error=0) at afr-common.c:1295
#3  0x7fff6890f3fb in afr_inode_refresh_subvol_cbk (frame=0x7fff48325d78,
    cookie=<optimized out>, this=0x7fff640137b0, op_ret=<optimized out>,
    op_errno=<optimized out>, buf=buf@entry=0x7ffecfffdaa0,
    xdata=0x7ffeb806ef08, par=0x7ffecfffdb40) at afr-common.c:1333
#4  0x7fff6890f42a in afr_inode_refresh_subvol_with_lookup_cbk (
    frame=<optimized out>, cookie=<optimized out>, this=<optimized out>,
    op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>,
    buf=0x7ffecfffdaa0, xdata=0x7ffeb806ef08, par=0x7ffecfffdb40)
    at afr-common.c:1344
#5  0x7fff68b8e96f in client4_0_lookup_cbk (req=<optimized out>,
    iov=<optimized out>, count=<optimized out>, myframe=0x7fff483147b8)
    at client-rpc-fops_v2.c:2640
#6  0x7fffed293115 in rpc_clnt_handle_reply (
    clnt=clnt@entry=0x7fff640671b0, pollin=pollin@entry=0x7ffeb81aa110)
    at rpc-clnt.c:764
#7  0x7fffed2934b3 in rpc_clnt_notify (trans=0x7fff64067540,
    mydata=0x7fff640671e0, event=<optimized out>, data=0x7ffeb81aa110)
    at rpc-clnt.c:931
#8  0x7fffed29007b in rpc_transport_notify (
    this=this@entry=0x7fff64067540, event=event@entry=RPC_TRANSPORT_MSG_REC
Re: [Gluster-users] [rhgs-devel] Announcing Gluster release 5.12
Thanks for the responses, Kaleb and Hari. I'm eyeing CentOS 8 for later this year, but I can also make the jump to Gluster 6 or 7 before I do that, so no worries. I appreciate the work you're doing.

Regards,

On Tue, Mar 31, 2020 at 2:30 PM Kaleb Keithley wrote:
> Support of upstream, community-built packages is pretty nebulous. If it
> builds, with little or no work, typically we package it. Actual support, as
> in help with problems, comes from the "community."
>
> Niels and I discussed building glusterfs-5 for C8 and decided we'd wait
> and see if anyone actually asked for it.
>
> Typical places to reach the CentOS Storage SIG people would be the
> #centos-devel channel on FreeNode IRC, the centos-devel@ and/or the
> centos-storage-...@centos.org mailing lists, and also here, to a lesser
> extent, on gluster-de...@gluster.org.
>
> --
> Kaleb
>
> On Tue, Mar 31, 2020 at 4:39 AM Yaniv Kaul wrote:
>> On Tue, Mar 31, 2020 at 10:06 AM Alan Orth wrote:
>>> Thanks, Hari! Do you know where the CentOS Storage SIG does their
>>> release planning? I'm curious as they have released CentOS 8 packages for
>>> Gluster 6 and Gluster 7, but not Gluster 5.
>>
>> I'm not sure it makes sense to support Gluster 5 with CentOS 8.
>> I would not have supported 6 as well?
>> Y.
>>
>>> http://mirror.centos.org/centos/8/storage/x86_64/
>>>
>>> Regards,
>>>
>>> On Mon, Mar 2, 2020 at 10:20 AM Hari Gowtham wrote:
>>>> Hi,
>>>>
>>>> The Gluster community is pleased to announce the release of Gluster
>>>> 5.12 (packages available at [1]).
>>>>
>>>> Release notes for the release can be found at [2].
>>>>
>>>> Major changes, features and limitations addressed in this release:
>>>> None
>>>>
>>>> Thanks,
>>>> Gluster community
>>>>
>>>> [1] Packages for 5.12:
>>>> https://download.gluster.org/pub/gluster/glusterfs/5/5.12/
>>>>
>>>> [2] Release notes for 5.12:
>>>> https://docs.gluster.org/en/latest/release-notes/5.12/
>>>>
>>>> --
>>>> Regards,
>>>> Hari Gowtham.
--
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
Re: [Gluster-users] [rhgs-devel] Announcing Gluster release 5.12
Support of upstream, community-built packages is pretty nebulous. If it builds, with little or no work, typically we package it. Actual support, as in help with problems, comes from the "community."

Niels and I discussed building glusterfs-5 for C8 and decided we'd wait and see if anyone actually asked for it.

Typical places to reach the CentOS Storage SIG people would be the #centos-devel channel on FreeNode IRC, the centos-devel@ and/or the centos-storage-...@centos.org mailing lists, and also here, to a lesser extent, on gluster-de...@gluster.org.

--
Kaleb

On Tue, Mar 31, 2020 at 4:39 AM Yaniv Kaul wrote:
> On Tue, Mar 31, 2020 at 10:06 AM Alan Orth wrote:
>> Thanks, Hari! Do you know where the CentOS Storage SIG does their release
>> planning? I'm curious as they have released CentOS 8 packages for Gluster 6
>> and Gluster 7, but not Gluster 5.
>
> I'm not sure it makes sense to support Gluster 5 with CentOS 8.
> I would not have supported 6 as well?
> Y.
>
>> http://mirror.centos.org/centos/8/storage/x86_64/
>>
>> Regards,
>>
>> On Mon, Mar 2, 2020 at 10:20 AM Hari Gowtham wrote:
>>> Hi,
>>>
>>> The Gluster community is pleased to announce the release of Gluster
>>> 5.12 (packages available at [1]).
>>>
>>> Release notes for the release can be found at [2].
>>>
>>> Major changes, features and limitations addressed in this release:
>>> None
>>>
>>> Thanks,
>>> Gluster community
>>>
>>> [1] Packages for 5.12:
>>> https://download.gluster.org/pub/gluster/glusterfs/5/5.12/
>>>
>>> [2] Release notes for 5.12:
>>> https://docs.gluster.org/en/latest/release-notes/5.12/
>>>
>>> --
>>> Regards,
>>> Hari Gowtham.
Re: [Gluster-users] Gluster 6.8 & debian
Hi,
The packages are rebuilt with the missing dependencies and updated.

Regards,
Sheetal Pamecha

On Mon, Mar 30, 2020 at 6:53 PM Sheetal Pamecha wrote:
> Hi Hubert,
>
> This time we triggered the automation scripts for package building instead
> of doing it manually. It seems this is a bug in the script that all lib
> packages are excluded.
> Thanks for trying and pointing it out. We are working to resolve this. I
> will update the package once the build is complete.
>
> Regards,
> Sheetal Pamecha
>
> On Mon, Mar 30, 2020 at 5:28 PM Hu Bert wrote:
>> Hi Sheetal,
>>
>> thx so far, but some additional packages are missing: libgfapi0,
>> libgfchangelog0, libgfrpc0, libgfxdr0, libglusterfs0
>>
>> The following packages have unmet dependencies:
>>  glusterfs-common : Depends: libgfapi0 (>= 6.8) but it is not going to be installed
>>                     Depends: libgfchangelog0 (>= 6.8) but it is not going to be installed
>>                     Depends: libgfrpc0 (>= 6.8) but it is not going to be installed
>>                     Depends: libgfxdr0 (>= 6.8) but it is not going to be installed
>>                     Depends: libglusterfs0 (>= 6.8) but it is not going to be installed
>>  glusterfs-server : Depends: libgfapi0 (>= 6.8) but it is not going to be installed
>>                     Depends: libgfrpc0 (>= 6.8) but it is not going to be installed
>>                     Depends: libgfxdr0 (>= 6.8) but it is not going to be installed
>>                     Depends: libglusterfs0 (>= 6.8) but it is not going to be installed
>>
>> All the lib* packages are simply missing for version 6.8, but are
>> there for version 6.7:
>>
>> https://download.gluster.org/pub/gluster/glusterfs/6/6.7/Debian/buster/amd64/apt/pool/main/g/glusterfs/
>> vs.
>> https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/pool/main/g/glusterfs/
>>
>> Can you please check?
>>
>> Thx,
>> Hubert
>>
>> Am Mo., 30. März 2020 um 12:57 Uhr schrieb Sheetal Pamecha:
>> >
>> > Hi,
>> >
>> > I have updated the path now and now latest points to 6.8 and packages
>> > are in place.
>> > Regards,
>> > Sheetal Pamecha
>> >
>> > On Mon, Mar 30, 2020 at 2:23 PM Hu Bert wrote:
>> >>
>> >> Hello,
>> >>
>> >> now the packages appeared:
>> >>
>> >> https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/pool/main/g/glusterfs/
>> >>
>> >> Dated: 2020-03-17 - so this looks good, right? Thx to the one who... ;-)
>> >>
>> >> Best Regards,
>> >> Hubert
>> >>
>> >> Am Do., 26. März 2020 um 15:03 Uhr schrieb Ingo Fischer <i...@fischer-ka.de>:
>> >> >
>> >> > Hey,
>> >> >
>> >> > I also asked for "when 6.8 comes to LATEST" in two mails here the last
>> >> > weeks ... I would also be very interested in the reasons.
>> >> >
>> >> > Ingo
>> >> >
>> >> > Am 26.03.20 um 07:15 schrieb Hu Bert:
>> >> > > Hello,
>> >> > >
>> >> > > i just wanted to test an upgrade from version 5.12 to version 6.8, but
>> >> > > there are no packages for debian buster in version 6.8.
>> >> > >
>> >> > > https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/
>> >> > >
>> >> > > This directory is empty. LATEST still links to version 6.7:
>> >> > >
>> >> > > https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/ -> 6.7
>> >> > >
>> >> > > 6.8 was released on 2nd of March - is there any reason why there are
>> >> > > no packages? Bugs?
>> >> > >
>> >> > > Best regards
>> >> > >
>> >> > > Hubert
Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
From your reply in the other thread, I'm assuming that the file/gfid in question is not in genuine split-brain or needing heal. i.e. for example, with that 1-brick-down, 2-bricks-up test case, if you tried to read the file from say a temporary fuse mount (which is also now connected to only 2 bricks since the 3rd one is down), it works fine and there is no EIO error...

...which means that what you have observed is true, i.e. afr_read_txn_refresh_done() is called with err=EIO. You can add logs to see at what point EIO is set. The call graph is like this:

afr_inode_refresh_done() --> afr_txn_refresh_done() --> afr_read_txn_refresh_done()

Maybe
https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-common.c#L1188
in afr_txn_refresh_done() is causing it, either due to ret being -EIO or event_generation being zero.

If you are comfortable with gdb, you can put a conditional break point in afr_read_txn_refresh_done() at
https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-read-txn.c#L283
when err=EIO and then check the backtrace for who is setting err to EIO.

Regards,
Ravi

On 31/03/20 12:20 pm, Erik Jacobson wrote:
> I note that this part of afr_read_txn() gets triggered a lot.
>
>     if (afr_is_inode_refresh_reqd(inode, this, local->event_generation,
>                                   event_generation)) {
>
> Maybe that's normal when one of the three servers is down (but why isn't
> it using its local copy by default?) The comment in that if block is:
>
>     /* servers have disconnected / reconnected, and possibly
>        rebooted, very likely changing the state of freshness
>        of copies */
>
> But we have one server consistently down, not a changing situation.
>
> Digging, digging, digging seemed to show this related to cache
> invalidation, because the paths seemed to suggest the inode needed
> refreshing, and that seems handled by a case statement named
> GF_UPCALL_CACHE_INVALIDATION.
>
> However, that must have been a wrong turn, since turning off cache
> invalidation didn't help.
I'm struggling to wrap my head around the code base, and without the background in these concepts it's a tough hill to climb. I am going to have to try this again some day with fresh eyes and go to bed; the machine I have easy access to is going away in the morning. Now I'll have to reserve time on a contended one, but I will do that and continue digging. Any suggestions would be greatly appreciated, as I think I'm starting to tip over here on this one.

On Mon, Mar 30, 2020 at 04:04:39PM -0500, Erik Jacobson wrote:
> > Sadly I am not a developer, so I can't answer your questions.
>
> I'm not a FS or network developer either. I think there is a joke about
> playing one on TV, but maybe it's Netflix now.
>
> Enabling certain debug options made too much information for me to watch
> personally (but an expert could probably get through it). So I started
> putting targeted 'print' (gf_msg) statements in the code to see how it
> got its way to split-brain. Maybe this will ring a bell for someone.
>
> I can tell the only way we enter the split-brain path is through the
> first if statement of afr_read_txn_refresh_done(). This means
> afr_read_txn_refresh_done() itself was passed "err", and it appears
> thin_arbiter_count was not set (which makes sense; I'm using 1x3, not a
> thin arbiter). So we jump to the readfn label, and read_subvol() should
> still be -1.
>
> If I read right, it must mean that this if didn't return true, because
> my print statement didn't appear:
>
>     if ((ret == 0) && spb_choice >= 0) {
>
> So we're still with the original read_subvol == -1, which gets us to the
> split_brain message.
>
> So now I will try to learn why afr_read_txn_refresh_done() would have
> 'err' set in the first place. I will also learn about
> afr_inode_split_brain_choice_get(). Those seem to be the two methods to
> have avoided falling into the split-brain hole here.
>
> I put debug statements in these locations. I will mark with !!
what I see:

diff -Narup glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c
--- glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c   2020-01-15 11:43:53.887894293 -0600
+++ glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c    2020-03-30 15:45:02.917104321 -0500
@@ -279,10 +279,14 @@ afr_read_txn_refresh_done(call_frame_t *
     priv = this->private;

     if (err) {
-        if (!priv->thin_arbiter_count)
+        if (!priv->thin_arbiter_count) {
+            gf_msg(this->name, GF_LOG_ERROR, 0, 0,
+                   "erikj dbg crapola 1st if in afr_read_txn_refresh_done() "
+                   "!priv->thin_arbiter_count -- goto to readfn");

!! We hit this error condition and jump to readfn below !!!

             goto readfn;
-        if (err != EINVAL)
+        }
+        if (err != EINVAL) {
+            gf_msg(this->name, GF_LOG_ERROR, 0, 0,
+                   "erikj 2nd if in afr_read_txn_refresh_done() err != EINVAL, goto readfn");
Re: [Gluster-users] Announcing Gluster release 5.12
Hi Alan,

The best person to answer the above question is Niels. I have CCed him on this email.

Hi @Niels de Vos, please do take a look.

On Tue, Mar 31, 2020 at 12:36 PM Alan Orth wrote:
> Thanks, Hari! Do you know where the CentOS Storage SIG does their release
> planning? I'm curious as they have released CentOS 8 packages for Gluster 6
> and Gluster 7, but not Gluster 5.
>
> http://mirror.centos.org/centos/8/storage/x86_64/
>
> Regards,
>
> On Mon, Mar 2, 2020 at 10:20 AM Hari Gowtham wrote:
>> Hi,
>>
>> The Gluster community is pleased to announce the release of Gluster
>> 5.12 (packages available at [1]).
>>
>> Release notes for the release can be found at [2].
>>
>> Major changes, features and limitations addressed in this release:
>> None
>>
>> Thanks,
>> Gluster community
>>
>> [1] Packages for 5.12:
>> https://download.gluster.org/pub/gluster/glusterfs/5/5.12/
>>
>> [2] Release notes for 5.12:
>> https://docs.gluster.org/en/latest/release-notes/5.12/
>>
>> --
>> Regards,
>> Hari Gowtham.
>
> --
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch

--
Regards,
Hari Gowtham.
Re: [Gluster-users] Announcing Gluster release 5.12
Thanks, Hari! Do you know where the CentOS Storage SIG does their release planning? I'm curious as they have released CentOS 8 packages for Gluster 6 and Gluster 7, but not Gluster 5.

http://mirror.centos.org/centos/8/storage/x86_64/

Regards,

On Mon, Mar 2, 2020 at 10:20 AM Hari Gowtham wrote:
> Hi,
>
> The Gluster community is pleased to announce the release of Gluster
> 5.12 (packages available at [1]).
>
> Release notes for the release can be found at [2].
>
> Major changes, features and limitations addressed in this release:
> None
>
> Thanks,
> Gluster community
>
> [1] Packages for 5.12:
> https://download.gluster.org/pub/gluster/glusterfs/5/5.12/
>
> [2] Release notes for 5.12:
> https://docs.gluster.org/en/latest/release-notes/5.12/
>
> --
> Regards,
> Hari Gowtham.

--
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch