Re: [Gluster-users] brick is down but gluster volume status says it's fine
It looks like this is to do with the stale port issue. I think it's pretty clear from the below that the digitalcorpora brick process is shown by volume status as having the same TCP port as the public volume brick on gluster-2 (49156), but it is actually listening on 49154. So although the brick process is technically up, nothing is talking to it. I am surprised I don't see more errors in the brick log for brick8/public. It also explains the whack-a-mole problem: every time I kill and restart the daemon it must be grabbing the port of another brick, and then that volume's brick goes silent. I killed all the brick processes and restarted glusterd and everything came up OK.

[root@gluster-2 ~]# glv status digitalcorpora | grep -v ^Self
Status of volume: digitalcorpora
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster0:/export/brick7/digitalcorpor
a                                           49152     0          Y       16098

Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks

[root@gluster-2 ~]# glv status public | grep -v ^Self
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/export/brick8/public        49156     0          Y       3519
Brick gluster2:/export/brick8/public        49156     0          Y       8578
Brick gluster0:/export/brick8/public        49156     0          Y       3176

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

[root@gluster-2 ~]# netstat -pant | grep 8578 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49156     0.0.0.0:*     LISTEN     8578/glusterfsd
[root@gluster-2 ~]# netstat -pant | grep 125708 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49154     0.0.0.0:*     LISTEN     125708/glusterfsd
[root@gluster-2 ~]# ps -c --pid 125708 8578
   PID CLS PRI TTY      STAT   TIME COMMAND
  8578 TS   19 ?        Ssl  224:20 /usr/sbin/glusterfsd -s gluster2 --volfile-id public.gluster2.export-brick8-public -p /var/lib/glusterd/vols/public/run/gluster2-export-bric
125708 TS   19 ?        Ssl    0:08 /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpor
[root@gluster-2 ~]#

On 24 October 2017 at 13:56, Atin Mukherjee wrote:

> On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil wrote:
>
>> gluster version 3.10.6, replica 3 volume, daemon is present but does not
>> appear to be functioning
>>
>> peculiar behaviour. If I kill the glusterfs brick daemon and restart
>> glusterd then the brick becomes available - but one of my other volumes'
>> bricks on the same server goes down in the same way; it's like whack-a-mole.
>>
>> any ideas?
>
> The subject and the data look contradictory to me. The brick log (what
> you shared) doesn't have a cleanup_and_exit() trigger for a shutdown. Are
> you sure the brick is down? OTOH, I see a mismatch of port for
> brick7/digitalcorpora, where the brick process has 49154 but gluster volume
> status shows 49152. There is an issue with stale ports which we're trying
> to address through https://review.gluster.org/18541 . But could you specify
> what exactly the problem is? Is it the stale port or the conflict between
> volume status output and actual brick health? If it's the latter, I'd need
> further information like the output of the "gluster get-state" command from
> the same node.
>> [root@gluster-2 bricks]# glv status digitalcorpora
>>
>>> Status of volume: digitalcorpora
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick gluster-2:/export/brick7/digitalcorpo
>>> ra                                          49156     0          Y       125708
>>> Brick gluster1.vsnet.gmu.edu:/export/brick7
>>> /digitalcorpora                             49152     0          Y       12345
>>> Brick gluster0:/export/brick7/digitalcorpor
>>> a                                           49152     0          Y       16098
>>> Self-heal Daemon on localhost               N/A       N/A        Y       126625
>>> Self-heal Daemon on gluster1                N/A       N/A        Y       15405
>>> Self-heal Daemon on gluster0                N/A       N/A        Y       18584
>>>
>>> Task Status of Volume digitalcorpora
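For anyone who wants to spot this condition quickly, here is a small sketch (untested; it assumes bash, pgrep and ss are available) that prints, for every glusterfsd brick process on a node, the port glusterd assigned on its command line alongside the port(s) the process is actually listening on. Comparing that against the port column of "gluster volume status" shows immediately which bricks are being advertised on a stale port:

    #!/bin/bash
    # Compare the port glusterd handed each brick (--brick-port on the
    # glusterfsd command line) with the port(s) it is really listening on.
    for pid in $(pgrep -x glusterfsd); do
        assigned=$(tr '\0' ' ' < "/proc/$pid/cmdline" |
                   sed -n 's/.*--brick-port \([0-9]\+\).*/\1/p')
        listening=$(ss -ltnp 2>/dev/null |
                    awk -v p="pid=$pid," '$0 ~ p { split($4, a, ":"); print a[length(a)] }' |
                    sort -u | tr '\n' ' ')
        echo "pid=$pid assigned=${assigned:-?} listening=[${listening:-none}]"
    done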
Re: [Gluster-users] brick is down but gluster volume status says it's fine
On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil wrote:

> gluster version 3.10.6, replica 3 volume, daemon is present but does not
> appear to be functioning
>
> peculiar behaviour. If I kill the glusterfs brick daemon and restart
> glusterd then the brick becomes available - but one of my other volumes'
> bricks on the same server goes down in the same way; it's like whack-a-mole.
>
> any ideas?

The subject and the data look contradictory to me. The brick log (what you
shared) doesn't have a cleanup_and_exit() trigger for a shutdown. Are you
sure the brick is down? OTOH, I see a mismatch of port for
brick7/digitalcorpora, where the brick process has 49154 but gluster volume
status shows 49152. There is an issue with stale ports which we're trying to
address through https://review.gluster.org/18541 . But could you specify what
exactly the problem is? Is it the stale port or the conflict between volume
status output and actual brick health? If it's the latter, I'd need further
information like the output of the "gluster get-state" command from the same
node.

> [root@gluster-2 bricks]# glv status digitalcorpora
>
>> Status of volume: digitalcorpora
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick gluster-2:/export/brick7/digitalcorpo
>> ra                                          49156     0          Y       125708
>> Brick gluster1.vsnet.gmu.edu:/export/brick7
>> /digitalcorpora                             49152     0          Y       12345
>> Brick gluster0:/export/brick7/digitalcorpor
>> a                                           49152     0          Y       16098
>> Self-heal Daemon on localhost               N/A       N/A        Y       126625
>> Self-heal Daemon on gluster1                N/A       N/A        Y       15405
>> Self-heal Daemon on gluster0                N/A       N/A        Y       18584
>>
>> Task Status of Volume digitalcorpora
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> [root@gluster-2 bricks]# glv heal digitalcorpora info
>> Brick gluster-2:/export/brick7/digitalcorpora
>> Status: Transport endpoint is not connected
>> Number of entries: -
>>
>> Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
>> /.trashcan
>> /DigitalCorpora/hello2.txt
>> /DigitalCorpora
>> Status: Connected
>> Number of entries: 3
>>
>> Brick gluster0:/export/brick7/digitalcorpora
>> /.trashcan
>> /DigitalCorpora/hello2.txt
>> /DigitalCorpora
>> Status: Connected
>> Number of entries: 3
>>
>> [2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-: received signum (15), shutting down
>> [2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6 (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-export-brick7-digitalcorpora.pid -S /var/run/gluster/f8e0b3393e47dc51a07c6609f9b40841.socket --brick-name /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/export-brick7-digitalcorpora.log --xlator-option *-posix.glusterd-uuid=032c17f5-8cc9-445f-aa45-897b5a066b43 --brick-port 49154 --xlator-option digitalcorpora-server.listen-port=49154)
>> [2017-10-24 17:18:59.285279] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2017-10-24 17:19:04.611723] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
>> [2017-10-24 17:19:04.611815] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-digitalcorpora-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
>> [2017-10-24 17:19:04.615974] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-glusterfs' is not recognized
>> [2017-10-24 17:19:04.616033] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-unix' is not recognized
>> [2017-10-24 17:19:04.616070] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-null' is not recognized
>> [2017-10-24 17:19:04.616134] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'auth-path' is not recognized
>> [2017-10-24 17:19:04.616177] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'ping-timeout' is not recognized
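(For reference, the state dump Atin asks for can be generated on the affected node with something like the following; this is a sketch based on the 3.10 CLI, and the default output location may differ between versions:)

    # dumps glusterd's view of peers, volumes and brick ports, by default to
    # a file such as /var/run/gluster/glusterd_state_<timestamp>
    gluster get-state

    # or send it somewhere explicit
    gluster get-state glusterd odir /tmp file glusterd-state.txt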
[Gluster-users] brick is down but gluster volume status says it's fine
gluster version 3.10.6, replica 3 volume, daemon is present but does not
appear to be functioning

peculiar behaviour. If I kill the glusterfs brick daemon and restart
glusterd then the brick becomes available - but one of my other volumes'
bricks on the same server goes down in the same way; it's like whack-a-mole.

any ideas?

[root@gluster-2 bricks]# glv status digitalcorpora

> Status of volume: digitalcorpora
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster-2:/export/brick7/digitalcorpo
> ra                                          49156     0          Y       125708
> Brick gluster1.vsnet.gmu.edu:/export/brick7
> /digitalcorpora                             49152     0          Y       12345
> Brick gluster0:/export/brick7/digitalcorpor
> a                                           49152     0          Y       16098
> Self-heal Daemon on localhost               N/A       N/A        Y       126625
> Self-heal Daemon on gluster1                N/A       N/A        Y       15405
> Self-heal Daemon on gluster0                N/A       N/A        Y       18584
>
> Task Status of Volume digitalcorpora
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@gluster-2 bricks]# glv heal digitalcorpora info
> Brick gluster-2:/export/brick7/digitalcorpora
> Status: Transport endpoint is not connected
> Number of entries: -
>
> Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
> /.trashcan
> /DigitalCorpora/hello2.txt
> /DigitalCorpora
> Status: Connected
> Number of entries: 3
>
> Brick gluster0:/export/brick7/digitalcorpora
> /.trashcan
> /DigitalCorpora/hello2.txt
> /DigitalCorpora
> Status: Connected
> Number of entries: 3
>
> [2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-: received signum (15), shutting down
> [2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6 (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-export-brick7-digitalcorpora.pid -S /var/run/gluster/f8e0b3393e47dc51a07c6609f9b40841.socket --brick-name /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/export-brick7-digitalcorpora.log --xlator-option *-posix.glusterd-uuid=032c17f5-8cc9-445f-aa45-897b5a066b43 --brick-port 49154 --xlator-option digitalcorpora-server.listen-port=49154)
> [2017-10-24 17:18:59.285279] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> [2017-10-24 17:19:04.611723] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
> [2017-10-24 17:19:04.611815] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-digitalcorpora-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
> [2017-10-24 17:19:04.615974] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-glusterfs' is not recognized
> [2017-10-24 17:19:04.616033] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-unix' is not recognized
> [2017-10-24 17:19:04.616070] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-null' is not recognized
> [2017-10-24 17:19:04.616134] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'auth-path' is not recognized
> [2017-10-24 17:19:04.616177] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'ping-timeout' is not recognized
> [2017-10-24 17:19:04.616203] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'rpc-auth-allow-insecure' is not recognized
> [2017-10-24 17:19:04.616215] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.addr./export/brick7/digitalcorpora.allow' is not recognized
> [2017-10-24 17:19:04.616226] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth-path' is not recognized
> [2017-10-24 17:19:04.616237] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password' is not recognized
> [2017-10-24 17:19:04.616248] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option
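One thing worth trying instead of killing glusterfsd and restarting glusterd: ask glusterd to respawn only the bricks of the affected volume. This is a suggestion rather than something tested in this thread, but "start ... force" on an already-started volume should only (re)start bricks that are down and leave the healthy ones alone, which avoids the port churn that a full glusterd restart can cause:

    gluster volume start digitalcorpora force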
Re: [Gluster-users] active-active georeplication?
On Tue, Oct 24, 2017 at 11:04 AM, atris adam wrote:

> thx for reply, that was so much interesting to me.
> How can I get these news about glusterfs new features?

The release notes generally contain information about new features. You can
also look up the github projects page [2] for understanding the feature to
release mapping.

Regards,
Vijay

[2] https://github.com/gluster/glusterfs/projects/1

> On Tue, Oct 24, 2017 at 5:54 PM, Vijay Bellur wrote:
>
>> Halo replication [1] could be of interest here. This functionality is
>> available since 3.11 and the current plan is to have it fully supported
>> in a 4.x release.
>>
>> Note that Halo replication is built on existing synchronous replication
>> in Gluster and differs from the current geo-replication implementation.
>> Kotresh's response is spot on for the current geo-replication
>> implementation.
>>
>> Regards,
>> Vijay
>>
>> [1] https://github.com/gluster/glusterfs/issues/199
>>
>> On Tue, Oct 24, 2017 at 5:13 AM, Kotresh Hiremath Ravishankar
>> <khire...@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> No, gluster doesn't support active-active geo-replication. It's not
>>> planned in the near future. We will let you know when it's planned.
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>> On Tue, Oct 24, 2017 at 11:19 AM, atris adam wrote:
>>>
>>>> hi everybody,
>>>>
>>>> Have glusterfs released a feature named active-active georeplication?
>>>> If yes, in which version was it released? If no, is it planned to have
>>>> this feature?
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
Re: [Gluster-users] gfid entries in volume heal info that do not heal
I have 14,734 GFIDs that are different. All the different ones are only on
the brick that was live during the outage and concurrent file copy-in. The
brick that was down at that time has no GFIDs that are not also on the up
brick.

As the bricks are 10TB, the find is going to be a long-running process. I'm
running several finds at once with GNU parallel but it will still take some
time. Can't bring the up machine offline as it's in use. At least I have 24
cores to work with.

I've only tested with one GFID, but the file it referenced _IS_ on the down
machine even though it has no GFID in the .glusterfs structure.

On Tue, 2017-10-24 at 12:35 +0530, Karthik Subrahmanya wrote:
> Hi Jim,
>
> Can you check whether the same hardlinks are present on both the bricks &
> both of them have the link count 2?
> If the link count is 2 then "find <brickpath> -samefile
> <brickpath>/.glusterfs/<first two bits of gfid>/<next two bits of gfid>/<full gfid>"
> should give you the file path.
>
> Regards,
> Karthik
>
> On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney wrote:
>
>> I'm not so lucky. ALL of mine show 2 links and none have the attr data
>> that supplies the path to the original.
>>
>> I have the inode from stat. Looking now to dig out the path/filename
>> from xfs_db on the specific inodes individually.
>>
>> Is the hash of the filename or /filename and if so relative to where?
>> /, , ?
>>
>> On Mon, 2017-10-23 at 18:54, Matt Waymack wrote:
>>
>>> In my case I was able to delete the hard links in the .glusterfs
>>> folders of the bricks and it seems to have done the trick, thanks!
>>>
>>> From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
>>> Sent: Monday, October 23, 2017 1:52 AM
>>> To: Jim Kinney; Matt Waymack
>>> Cc: gluster-users
>>> Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal
>>>
>>> Hi Jim & Matt,
>>>
>>> Can you also check for the link count in the stat output of those
>>> hardlink entries in the .glusterfs folder on the bricks.
>>> If the link count is 1 on all the bricks for those entries, then they
>>> are orphaned entries and you can delete those hardlinks.
>>>
>>> To be on the safer side have a backup before deleting any of the entries.
>>>
>>> Regards,
>>> Karthik
>>>
>>> On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney wrote:
>>>
>>>> I've been following this particular thread as I have a similar issue
>>>> (RAID6 array failed out with 3 dead drives at once while a 12 TB load
>>>> was being copied into one mounted space - what a mess).
>>>>
>>>> I have >700K GFID entries that have no path data. Example:
>>>>
>>>> getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>>>> # file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>> trusted.bit-rot.version=0x020059b1b316000270e7
>>>> trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421
>>>>
>>>> [root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e hex -m .
>>>> .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>>>> .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421:
>>>> trusted.glusterfs.pathinfo: No such attribute
>>>>
>>>> I had to totally rebuild the dead RAID array and did a copy from the
>>>> live one before activating gluster on the rebuilt system. I
>>>> accidentally copied over the .glusterfs folder from the working side
>>>> (replica 2 only for now - adding arbiter node as soon as I can get
>>>> this one cleaned up).
>>>>
>>>> I've run the methods from
>>>> "http://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/" with
>>>> no results using random GFIDs. A full systemic run using the script
>>>> from method 3 crashes with "too many nested links" error (or something
>>>> similar).
>>>>
>>>> When I run gluster volume heal volname info, I get 700K+ GFIDs. Oh:
>>>> gluster 3.8.4 on CentOS 7.3.
>>>>
>>>> Should I just remove the contents of the .glusterfs folder on both and
>>>> restart gluster and run a ls/stat on every file?
>>>>
>>>> When I run a heal, it no longer has a decreasing
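A rough sketch of the parallel GFID-to-path lookup described above, assuming bash, GNU parallel, a brick at /export/brick and a gfids.txt file with one full GFID per line (the path and file names are illustrative, not from this thread):

    #!/bin/bash
    # For each GFID, find the real path(s) on the brick that hard-link to the
    # .glusterfs entry. Entries with link count 1 are reported as orphaned.
    BRICK=/export/brick

    resolve_gfid() {
        gfid="$1"
        entry="$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
        [ -f "$entry" ] || { echo "$gfid: no .glusterfs entry"; return; }
        links=$(stat -c %h "$entry")
        if [ "$links" -lt 2 ]; then
            echo "$gfid: orphaned (link count $links)"
        else
            # print the non-.glusterfs hard links, i.e. the real file path(s)
            find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -samefile "$entry" -print
        fi
    }
    export -f resolve_gfid
    export BRICK

    # 24 cores available, so run 24 finds at a time
    parallel -j 24 resolve_gfid :::: gfids.txt

Each lookup still walks the whole brick, so with 10TB bricks this stays a long-running job; the parallelism only bounds how many walks run at once.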
Re: [Gluster-users] active-active georeplication?
thx for reply, that was so much interesting to me.
How can I get these news about glusterfs new features?

On Tue, Oct 24, 2017 at 5:54 PM, Vijay Bellur wrote:

> Halo replication [1] could be of interest here. This functionality is
> available since 3.11 and the current plan is to have it fully supported in
> a 4.x release.
>
> Note that Halo replication is built on existing synchronous replication in
> Gluster and differs from the current geo-replication implementation.
> Kotresh's response is spot on for the current geo-replication
> implementation.
>
> Regards,
> Vijay
>
> [1] https://github.com/gluster/glusterfs/issues/199
>
> On Tue, Oct 24, 2017 at 5:13 AM, Kotresh Hiremath Ravishankar
> <khire...@redhat.com> wrote:
>
>> Hi,
>>
>> No, gluster doesn't support active-active geo-replication. It's not
>> planned in the near future. We will let you know when it's planned.
>>
>> Thanks,
>> Kotresh HR
>>
>> On Tue, Oct 24, 2017 at 11:19 AM, atris adam wrote:
>>
>>> hi everybody,
>>>
>>> Have glusterfs released a feature named active-active georeplication?
>>> If yes, in which version was it released? If no, is it planned to have
>>> this feature?
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
Re: [Gluster-users] active-active georeplication?
Halo replication [1] could be of interest here. This functionality is
available since 3.11 and the current plan is to have it fully supported in a
4.x release.

Note that Halo replication is built on existing synchronous replication in
Gluster and differs from the current geo-replication implementation.
Kotresh's response is spot on for the current geo-replication implementation.

Regards,
Vijay

[1] https://github.com/gluster/glusterfs/issues/199

On Tue, Oct 24, 2017 at 5:13 AM, Kotresh Hiremath Ravishankar
<khire...@redhat.com> wrote:

> Hi,
>
> No, gluster doesn't support active-active geo-replication. It's not
> planned in the near future. We will let you know when it's planned.
>
> Thanks,
> Kotresh HR
>
> On Tue, Oct 24, 2017 at 11:19 AM, atris adam wrote:
>
>> hi everybody,
>>
>> Have glusterfs released a feature named active-active georeplication?
>> If yes, in which version was it released? If no, is it planned to have
>> this feature?
>
> --
> Thanks and Regards,
> Kotresh H R
Re: [Gluster-users] trying to add a 3rd peer
I always used IP addresses instead of names when I added a peer. In the
gluster peer status, I do see IPs:

[root@DC-MTL-NAS-01 ~]# gluster peer status
Number of Peers: 2

Hostname: XXX.XXX.XXX.12
Uuid: ec1e10c1-0e38-4d2a-ab51-50fb0c67b6ee
State: Peer in Cluster (Connected)

Hostname: XXX.XXX.XXX.13
Uuid: eef75e55-170a-4621-9d6e-3b5c3a6e5561
State: Accepted peer request (Disconnected)

I can ping those IPs from any server.

From the Server 3 Gluster logs, I can see this:

[2017-10-24 12:31:33.012446] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.6 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2017-10-24 12:31:33.020739] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[2017-10-24 12:31:33.020796] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[2017-10-24 12:31:33.029673] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.6/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2017-10-24 12:31:33.029702] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2017-10-24 12:31:33.029715] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-10-24 12:31:33.029731] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-10-24 12:31:33.032226] I [MSGID: 106228] [glusterd.c:500:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2017-10-24 12:31:33.032816] I [MSGID: 106513] [glusterd-store.c:2201:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31000
[2017-10-24 12:31:33.042393] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-10-24 12:31:33.042474] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-10-24 12:31:33.042501] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-10-24 12:31:33.082295] E [MSGID: 101075] [common-utils.c:307:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2017-10-24 12:31:33.082331] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host dc-mtl-nas-01.elemenai.lan
[2017-10-24 12:31:33.082563] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: eef75e55-170a-4621-9d6e-3b5c3a6e5561
[2017-10-24 12:31:33.082589] I [MSGID: 106004] [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer (<3e190322-78f1-4ef6-80f7-8f48d51c2263>), in state , has disconnected from glusterd.
[2017-10-24 12:31:33.117581] E [MSGID: 106187] [glusterd-store.c:4566:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-10-24 12:31:33.117658] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-10-24 12:31:33.117678] E [MSGID: 101066] [graph.c:325:glusterfs_graph_init] 0-management: initializing translator failed
[2017-10-24 12:31:33.117696] E [MSGID: 101176] [graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-10-24 12:31:33.118208] W [glusterfsd.c:1360:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7f1a34ba1bcd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1) [0x7f1a34ba1a71] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x7f1a34ba0f5b] ) 0-: received signum (1), shutting down

server1.domain.lan is the server 1 FQDN (not the IP address).

Ludwig

On Tue, Oct 24, 2017 at 2:16 AM, Bartosz Zięba wrote:

> Are you sure all node names can be resolved on all the other nodes?
> You need to use the names previously used in Gluster - check them with
> 'gluster peer status' or 'gluster pool list'.
>
> Regards,
> Bartosz
>
>> Message written by Ludwig Gamache on 24.10.2017, at 03:13:
>>
>> All,
>>
>> I am trying to add a third peer to my gluster install. The first 2 nodes
>> have been running for many months and have gluster 3.10.3-1.
>>
>> I recently installed the 3rd node with gluster 3.10.6-1. I was able to
>> start the gluster daemon on it. After, I tried to add the peer from one
>> of the 2 previous servers (gluster peer probe IPADDRESS).
>>
>> That first peer started the communication with the 3rd peer. At that
>> point, peer statuses were messed up. Server 1 saw both other servers as
>> connected. Server 2 only saw server 1 as connected and did not have
>> server 3 as a peer. Server 3 only had server 1 as a peer and saw it as
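One quick check worth running on every node, including the new one: make sure every peer hostname glusterd has on disk actually resolves there. A small sketch, assuming bash and getent; it reads the hostname entries straight out of /var/lib/glusterd/peers/, so it works even while glusterd refuses to start:

    #!/bin/bash
    # Report any peer hostname recorded by glusterd that does not resolve here.
    for f in /var/lib/glusterd/peers/*; do
        host=$(awk -F= '/^hostname1=/ {print $2}' "$f")
        if getent hosts "$host" > /dev/null; then
            echo "ok         $host"
        else
            echo "NO-RESOLVE $host (recorded in $f)"
        fi
    done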
Re: [Gluster-users] create volume in two different Data Centers
On 24/10/17 13:01, Alessandro Briosi wrote:
> I would set up a VPN (tinc could work well).

I, too, would recommend trying tinc for this; it can automatically route
traffic of nodes that don't have direct access to other nodes via those
nodes that do.

I have a publicly available setup of Gluster over tinc on NixOS here:
https://github.com/nh2/nixops-gluster-example/ and it works pretty well.
Certainly tinc is not a bottleneck in it (though note my nodes do have full
mesh connectivity and I use this only with 0.5 ms latency).
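To make the suggestion concrete, here is a minimal tinc 1.0 sketch for one node in each data centre. The network name "gluster", the node names and the 10.10.0.0/24 overlay subnet are all illustrative, not from this thread, and every node additionally needs its own host file and key pair (generated with "tincd -n gluster -K"):

    # /etc/tinc/gluster/tinc.conf on the node in data centre A
    Name = dca_node1
    ConnectTo = dcb_node1

    # /etc/tinc/gluster/hosts/dcb_node1  (distributed to every node)
    # Address is the single valid public IP of data centre B, forwarded to this host
    Address = <public IP of data centre B>
    Subnet = 10.10.0.4/32
    # ...followed by dcb_node1's public RSA key

    # /etc/tinc/gluster/tinc-up on the node in data centre A
    #!/bin/sh
    ip addr add 10.10.0.1/24 dev $INTERFACE
    ip link set $INTERFACE up

The bricks are then probed and addressed by their 10.10.0.x overlay addresses, so Gluster itself never needs to know about the public IPs.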
Re: [Gluster-users] create volume in two different Data Centers
On 24/10/2017 12:45, atris adam wrote:
> thanks for answering. But I have to set it up and test it myself and
> record the result. Can you guide me a little more? The problem is that
> only one valid IP exists for each data center, and each data center has
> 3 servers. How should I configure the network so that the brick servers
> can see each other to create a glusterfs volume?

I would set up a VPN (tinc could work well). Though probably, if you have 1
public IP, then you would have to forward it to one of the internal servers.

A workaround could be to have a "floating" IP which is handled by VRRP or
similar; that depends on the gateway you have.

Have no idea on performance :-)

Alessandro
Re: [Gluster-users] create volume in two different Data Centers
thanks for answering. But I have to set it up and test it myself and record
the result. Can you guide me a little more? The problem is that only one
valid IP exists for each data center, and each data center has 3 servers.
How should I configure the network so that the brick servers can see each
other to create a glusterfs volume?

On Tue, Oct 24, 2017 at 1:47 PM, <address hidden> wrote:

> Hi,
>
> You can, but unless the two datacenters are very close, it'll be slow as
> hell. I tried it myself and even a 10ms ping between the bricks is
> horrible.
>
> On Tue, Oct 24, 2017 at 01:42:49PM +0330, atris adam wrote:
>> Hi
>>
>> I have two data centers, each of them has 3 servers. These two data
>> centers can see each other over the internet.
>> I want to create a distributed glusterfs volume with these 6 servers,
>> but I have only one valid IP in each data center. Is it possible to
>> create a glusterfs volume? Can anyone guide me?
>>
>> thx alot
Re: [Gluster-users] create volume in two different Data Centers
Hi,

You can, but unless the two datacenters are very close, it'll be slow as
hell. I tried it myself and even a 10ms ping between the bricks is horrible.

On Tue, Oct 24, 2017 at 01:42:49PM +0330, atris adam wrote:
> Hi
>
> I have two data centers, each of them has 3 servers. These two data
> centers can see each other over the internet.
> I want to create a distributed glusterfs volume with these 6 servers, but
> I have only one valid IP in each data center. Is it possible to create a
> glusterfs volume? Can anyone guide me?
>
> thx alot
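To put a rough number on that (back-of-the-envelope only, not a measurement from this thread): with Gluster's synchronous replication a write is acknowledged only after every replica brick has confirmed it, and a small file operation typically needs a few network round trips (locking, the write itself, the post-op/unlock). At a 10 ms RTT that is roughly 2-3 x 10 ms = 20-30 ms per operation, i.e. on the order of 30-50 small operations per second per client thread, versus thousands per second on a sub-millisecond LAN.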
[Gluster-users] create volume in two different Data Centers
Hi

I have two data centers, each of them has 3 servers. These two data centers
can see each other over the internet.
I want to create a distributed glusterfs volume with these 6 servers, but I
have only one valid IP in each data center. Is it possible to create a
glusterfs volume? Can anyone guide me?

thx alot
Re: [Gluster-users] active-active georeplication?
Hi,

No, gluster doesn't support active-active geo-replication. It's not planned
in the near future. We will let you know when it's planned.

Thanks,
Kotresh HR

On Tue, Oct 24, 2017 at 11:19 AM, atris adam wrote:

> hi everybody,
>
> Have glusterfs released a feature named active-active georeplication?
> If yes, in which version was it released? If no, is it planned to have
> this feature?

--
Thanks and Regards,
Kotresh H R
Re: [Gluster-users] gfid entries in volume heal info that do not heal
Hi Jim,

Can you check whether the same hardlinks are present on both the bricks &
both of them have the link count 2?
If the link count is 2 then "find <brickpath> -samefile
<brickpath>/.glusterfs/<first two bits of gfid>/<next two bits of gfid>/<full gfid>"
should give you the file path.

Regards,
Karthik

On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney wrote:

> I'm not so lucky. ALL of mine show 2 links and none have the attr data
> that supplies the path to the original.
>
> I have the inode from stat. Looking now to dig out the path/filename from
> xfs_db on the specific inodes individually.
>
> Is the hash of the filename or /filename and if so relative to where?
> /, , ?
>
> On Mon, 2017-10-23 at 18:54, Matt Waymack wrote:
>
>> In my case I was able to delete the hard links in the .glusterfs folders
>> of the bricks and it seems to have done the trick, thanks!
>>
>> From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
>> Sent: Monday, October 23, 2017 1:52 AM
>> To: Jim Kinney; Matt Waymack
>> Cc: gluster-users
>> Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal
>>
>> Hi Jim & Matt,
>>
>> Can you also check for the link count in the stat output of those
>> hardlink entries in the .glusterfs folder on the bricks.
>> If the link count is 1 on all the bricks for those entries, then they are
>> orphaned entries and you can delete those hardlinks.
>>
>> To be on the safer side have a backup before deleting any of the entries.
>>
>> Regards,
>> Karthik
>>
>> On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney wrote:
>>
>>> I've been following this particular thread as I have a similar issue
>>> (RAID6 array failed out with 3 dead drives at once while a 12 TB load
>>> was being copied into one mounted space - what a mess).
>>>
>>> I have >700K GFID entries that have no path data. Example:
>>>
>>> getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>>> # file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>> trusted.bit-rot.version=0x020059b1b316000270e7
>>> trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421
>>>
>>> [root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e hex -m .
>>> .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>>> .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421:
>>> trusted.glusterfs.pathinfo: No such attribute
>>>
>>> I had to totally rebuild the dead RAID array and did a copy from the
>>> live one before activating gluster on the rebuilt system. I accidentally
>>> copied over the .glusterfs folder from the working side (replica 2 only
>>> for now - adding arbiter node as soon as I can get this one cleaned up).
>>>
>>> I've run the methods from
>>> "http://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/" with
>>> no results using random GFIDs. A full systemic run using the script from
>>> method 3 crashes with "too many nested links" error (or something similar).
>>>
>>> When I run gluster volume heal volname info, I get 700K+ GFIDs. Oh:
>>> gluster 3.8.4 on CentOS 7.3.
>>>
>>> Should I just remove the contents of the .glusterfs folder on both and
>>> restart gluster and run a ls/stat on every file?
>>>
>>> When I run a heal, it no longer has a decreasing number of files to heal
>>> so that's an improvement over the last 2-3 weeks :-)
>>>
>>> On Tue, 2017-10-17 at 14:34, Matt Waymack wrote:
>>>
>>>> Attached is the heal log for the volume as well as the shd log.
>>>>
>>>> Run these commands on all the bricks of the replica pair to get the
>>>> attrs set on the backend.
>>>>
>>>> [root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x
>>>> trusted.afr.gv0-client-2=0x0001
>>>> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
>>>> trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435
>>>>
>>>> [root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x
>>>> trusted.afr.gv0-client-2=0x0001
>>>> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
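A small sketch of that link-count check in bulk, assuming bash and a brick path of /exp/b1/gv0 purely for illustration: it lists every GFID entry on the brick whose hard-link count is 1, i.e. entries with no real file path linking to them, so they can be reviewed (and backed up) before anything is deleted:

    #!/bin/bash
    # .glusterfs/<aa>/<bb>/<gfid> entries for regular files should have link
    # count >= 2; a count of 1 means no path on the brick links to them.
    BRICK=/exp/b1/gv0

    find "$BRICK/.glusterfs" -mindepth 3 -maxdepth 3 -type f -links 1 -print

Entries that do have two or more links can then be fed to the find -samefile command above to recover their real paths.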
Re: [Gluster-users] trying to add a 3rd peer
Are you sure all node names can be resolved on all the other nodes?
You need to use the names previously used in Gluster - check them with
'gluster peer status' or 'gluster pool list'.

Regards,
Bartosz

> Message written by Ludwig Gamache on 24.10.2017, at 03:13:
>
> All,
>
> I am trying to add a third peer to my gluster install. The first 2 nodes
> have been running for many months and have gluster 3.10.3-1.
>
> I recently installed the 3rd node with gluster 3.10.6-1. I was able to
> start the gluster daemon on it. After, I tried to add the peer from one of
> the 2 previous servers (gluster peer probe IPADDRESS).
>
> That first peer started the communication with the 3rd peer. At that
> point, peer statuses were messed up. Server 1 saw both other servers as
> connected. Server 2 only saw server 1 as connected and did not have server
> 3 as a peer. Server 3 only had server 1 as a peer and saw it as
> disconnected.
>
> I also found errors in the gluster logs of server 3 about name resolution
> that could not be done:
> [2017-10-24 00:15:20.090462] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host HOST3.DOMAIN.lan
>
> I rebooted node 3 and now gluster does not even restart on that node. It
> keeps giving name resolution problems. The 2 other nodes are active.
>
> However, I can ping the 3 servers (one from each of the others) using
> their DNS names.
>
> Any idea about what to look at?