Re: [Gluster-users] brick is down but gluster volume status says it's fine

2017-10-24 Thread Alastair Neil
It looks like this is to do with the stale port issue.

I think it's pretty clear from the output below that the digitalcorpora brick
process is shown by volume status as having the same TCP port, 49156, as the
public volume brick on gluster-2, but it is actually listening on 49154.  So
although the brick process is technically up, nothing is talking to it.  I
am surprised I don't see more errors in the brick log for brick8/public.
It also explains the whack-a-mole problem: every time I kill and restart
the daemon it must be grabbing the port of another brick, and then that
volume's brick goes silent.
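For anyone hitting the same thing, the check is basically comparing the port
in the status output with what the brick PID is actually listening on. A rough
sketch (volume name, grep pattern and <brick-pid> are placeholders):

    # what glusterd / volume status thinks the brick is using
    gluster volume status digitalcorpora | grep brick7

    # what the brick process really has open (PID taken from the status output)
    ss -tlnp | grep <brick-pid>    # or: netstat -pant | grep <brick-pid> | grep LISTEN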

I killed all the brick processes and restarted glusterd and everything came
up ok.
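(If only a single brick process is dead, my understanding is that

    gluster volume start <volname> force

respawns just the missing brick daemon for that volume, which might avoid the
whack-a-mole effect of restarting glusterd; <volname> is a placeholder.)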


[root@gluster-2 ~]# glv status digitalcorpora | grep -v ^Self
Status of volume: digitalcorpora
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick gluster-2:/export/brick7/digitalcorpo
ra  49156 0  Y
125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora 49152 0  Y
12345
Brick gluster0:/export/brick7/digitalcorpor
a   49152 0  Y
16098

Task Status of Volume digitalcorpora
--
There are no active volume tasks

[root@gluster-2 ~]# glv status public  | grep -v ^Self
Status of volume: public
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick gluster1:/export/brick8/public49156 0  Y
3519
Brick gluster2:/export/brick8/public49156 0  Y
8578
Brick gluster0:/export/brick8/public49156 0  Y
3176

Task Status of Volume public
--
There are no active volume tasks

[root@gluster-2 ~]# netstat -pant | grep 8578 | grep 0.0.0.0
tcp0  0 0.0.0.0:49156   0.0.0.0:*
LISTEN  8578/glusterfsd
[root@gluster-2 ~]# netstat -pant | grep 125708 | grep 0.0.0.0
tcp0  0 0.0.0.0:49154   0.0.0.0:*
LISTEN  125708/glusterfsd
[root@gluster-2 ~]# ps -c  --pid  125708 8578
   PID CLS PRI TTY  STAT   TIME COMMAND
  8578 TS   19 ?Ssl  224:20 /usr/sbin/glusterfsd -s gluster2
--volfile-id public.gluster2.export-brick8-public -p
/var/lib/glusterd/vols/public/run/gluster2-export-bric
125708 TS   19 ?Ssl0:08 /usr/sbin/glusterfsd -s gluster-2
--volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p
/var/lib/glusterd/vols/digitalcorpor
[root@gluster-2 ~]#


On 24 October 2017 at 13:56, Atin Mukherjee  wrote:

>
>
> On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil 
> wrote:
>
>> gluster version 3.10.6, replica 3 volume, daemon is present but does not
>> appear to be functioning
>>
>> Peculiar behaviour: if I kill the glusterfs brick daemon and restart
>> glusterd then the brick becomes available - but one of my other volumes'
>> bricks on the same server goes down in the same way; it's like whack-a-mole.
>>
>> any ideas?
>>
>
> The subject and the data look contradictory to me. The brick log (what
> you shared) doesn't have a cleanup_and_exit() trigger for a shutdown. Are
> you sure the brick is down? OTOH, I see a mismatch of port for
> brick7/digitalcorpora, where the brick process has 49154 but gluster volume
> status shows 49152. There is an issue with stale ports which we're trying to
> address through https://review.gluster.org/18541 . But could you specify
> what exactly the problem is? Is it the stale port, or the conflict between
> the volume status output and the actual brick health? If it's the latter, I'd need
> further information like the output of the "gluster get-state" command from the
> same node.
>
>
>>
>> [root@gluster-2 bricks]# glv status digitalcorpora
>>
>>> Status of volume: digitalcorpora
>>> Gluster process TCP Port  RDMA Port
>>> Online  Pid
>>> 
>>> --
>>> Brick gluster-2:/export/brick7/digitalcorpo
>>> ra  49156 0
>>> Y   125708
>>> Brick gluster1.vsnet.gmu.edu:/export/brick7
>>> /digitalcorpora 49152 0
>>> Y   12345
>>> Brick gluster0:/export/brick7/digitalcorpor
>>> a   49152 0
>>> Y   16098
>>> Self-heal Daemon on localhost   N/A   N/AY
>>> 126625
>>> Self-heal Daemon on gluster1N/A   N/AY
>>> 15405
>>> Self-heal Daemon on gluster0N/A   N/AY
>>> 18584
>>>
>>> Task Status of Volume digitalcorpora
>>> 
>>> 

Re: [Gluster-users] brick is down but gluster volume status says it's fine

2017-10-24 Thread Atin Mukherjee
On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil 
wrote:

> gluster version 3.10.6, replica 3 volume, daemon is present but does not
> appear to be functioning
>
> Peculiar behaviour: if I kill the glusterfs brick daemon and restart
> glusterd then the brick becomes available - but one of my other volumes'
> bricks on the same server goes down in the same way; it's like whack-a-mole.
>
> any ideas?
>

The subject and the data look contradictory to me. The brick log (what
you shared) doesn't have a cleanup_and_exit() trigger for a shutdown. Are
you sure the brick is down? OTOH, I see a mismatch of port for
brick7/digitalcorpora, where the brick process has 49154 but gluster volume
status shows 49152. There is an issue with stale ports which we're trying to
address through https://review.gluster.org/18541 . But could you specify
what exactly the problem is? Is it the stale port, or the conflict between
the volume status output and the actual brick health? If it's the latter, I'd need
further information like the output of the "gluster get-state" command from the
same node.
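For reference, something along these lines should capture it (the odir/file
arguments are optional and the paths here are just examples; the exact syntax
may vary slightly across versions):

    gluster get-state glusterd odir /var/tmp/ file gluster-2-state

This writes a plain-text dump of glusterd's local view of peers, volumes and
brick status to the given file.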


>
> [root@gluster-2 bricks]# glv status digitalcorpora
>
>> Status of volume: digitalcorpora
>> Gluster process TCP Port  RDMA Port  Online
>> Pid
>> 
>> --
>> Brick gluster-2:/export/brick7/digitalcorpo
>> ra  49156 0  Y
>> 125708
>> Brick gluster1.vsnet.gmu.edu:/export/brick7
>> /digitalcorpora 49152 0  Y
>> 12345
>> Brick gluster0:/export/brick7/digitalcorpor
>> a   49152 0  Y
>> 16098
>> Self-heal Daemon on localhost   N/A   N/AY
>> 126625
>> Self-heal Daemon on gluster1N/A   N/AY
>> 15405
>> Self-heal Daemon on gluster0N/A   N/AY
>> 18584
>>
>> Task Status of Volume digitalcorpora
>> 
>> --
>> There are no active volume tasks
>>
>> [root@gluster-2 bricks]# glv heal digitalcorpora info
>> Brick gluster-2:/export/brick7/digitalcorpora
>> Status: Transport endpoint is not connected
>> Number of entries: -
>>
>> Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
>> /.trashcan
>> /DigitalCorpora/hello2.txt
>> /DigitalCorpora
>> Status: Connected
>> Number of entries: 3
>>
>> Brick gluster0:/export/brick7/digitalcorpora
>> /.trashcan
>> /DigitalCorpora/hello2.txt
>> /DigitalCorpora
>> Status: Connected
>> Number of entries: 3
>>
>> [2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25]
>> -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135]
>> -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-:
>> received signum (15), shutting down
>> [2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main]
>> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6
>> (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id
>> digitalcorpora.gluster-2.export-brick7-digitalcorpora -p
>> /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-
>> export-brick7-digitalcorpora.pid -S /var/run/gluster/
>> f8e0b3393e47dc51a07c6609f9b40841.socket --brick-name
>> /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/
>> export-brick7-digitalcorpora.log --xlator-option *-posix.glusterd-uuid=
>> 032c17f5-8cc9-445f-aa45-897b5a066b43 --brick-port 49154 --xlator-option
>> digitalcorpora-server.listen-port=49154)
>> [2017-10-24 17:18:59.285279] I [MSGID: 101190] 
>> [event-epoll.c:629:event_dispatch_epoll_worker]
>> 0-epoll: Started thread with index 1
>> [2017-10-24 17:19:04.611723] I 
>> [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit]
>> 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
>> [2017-10-24 17:19:04.611815] W [MSGID: 101002] 
>> [options.c:954:xl_opt_validate]
>> 0-digitalcorpora-server: option 'listen-port' is deprecated, preferred is
>> 'transport.socket.listen-port', continuing with correction
>> [2017-10-24 17:19:04.615974] W [MSGID: 101174]
>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>> 'rpc-auth.auth-glusterfs' is not recognized
>> [2017-10-24 17:19:04.616033] W [MSGID: 101174]
>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>> 'rpc-auth.auth-unix' is not recognized
>> [2017-10-24 17:19:04.616070] W [MSGID: 101174]
>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>> 'rpc-auth.auth-null' is not recognized
>> [2017-10-24 17:19:04.616134] W [MSGID: 101174]
>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>> 'auth-path' is not recognized
>> [2017-10-24 17:19:04.616177] W [MSGID: 101174]
>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>> 'ping-timeout' is not recognized
>> 

[Gluster-users] brick is down but gluster volume status says it's fine

2017-10-24 Thread Alastair Neil
gluster version 3.10.6, replica 3 volume, daemon is present but does not
appear to be functioning

Peculiar behaviour: if I kill the glusterfs brick daemon and restart
glusterd then the brick becomes available - but one of my other volumes'
bricks on the same server goes down in the same way; it's like whack-a-mole.

any ideas?


[root@gluster-2 bricks]# glv status digitalcorpora

> Status of volume: digitalcorpora
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> --
> Brick gluster-2:/export/brick7/digitalcorpo
> ra  49156 0  Y
> 125708
> Brick gluster1.vsnet.gmu.edu:/export/brick7
> /digitalcorpora 49152 0  Y
> 12345
> Brick gluster0:/export/brick7/digitalcorpor
> a   49152 0  Y
> 16098
> Self-heal Daemon on localhost   N/A   N/AY
> 126625
> Self-heal Daemon on gluster1N/A   N/AY
> 15405
> Self-heal Daemon on gluster0N/A   N/AY
> 18584
>
> Task Status of Volume digitalcorpora
>
> --
> There are no active volume tasks
>
> [root@gluster-2 bricks]# glv heal digitalcorpora info
> Brick gluster-2:/export/brick7/digitalcorpora
> Status: Transport endpoint is not connected
> Number of entries: -
>
> Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
> /.trashcan
> /DigitalCorpora/hello2.txt
> /DigitalCorpora
> Status: Connected
> Number of entries: 3
>
> Brick gluster0:/export/brick7/digitalcorpora
> /.trashcan
> /DigitalCorpora/hello2.txt
> /DigitalCorpora
> Status: Connected
> Number of entries: 3
>
> [2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25]
> -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135]
> -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-:
> received signum (15), shutting down
> [2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main]
> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6
> (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id
> digitalcorpora.gluster-2.export-brick7-digitalcorpora -p
> /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-export-brick7-digitalcorpora.pid
> -S /var/run/gluster/f8e0b3393e47dc51a07c6609f9b40841.socket --brick-name
> /export/brick7/digitalcorpora -l
> /var/log/glusterfs/bricks/export-brick7-digitalcorpora.log --xlator-option
> *-posix.glusterd-uuid=032c17f5-8cc9-445f-aa45-897b5a066b43 --brick-port
> 49154 --xlator-option digitalcorpora-server.listen-port=49154)
> [2017-10-24 17:18:59.285279] I [MSGID: 101190]
> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2017-10-24 17:19:04.611723] I
> [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
> rpc.outstanding-rpc-limit with value 64
> [2017-10-24 17:19:04.611815] W [MSGID: 101002]
> [options.c:954:xl_opt_validate] 0-digitalcorpora-server: option
> 'listen-port' is deprecated, preferred is 'transport.socket.listen-port',
> continuing with correction
> [2017-10-24 17:19:04.615974] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
> 'rpc-auth.auth-glusterfs' is not recognized
> [2017-10-24 17:19:04.616033] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
> 'rpc-auth.auth-unix' is not recognized
> [2017-10-24 17:19:04.616070] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
> 'rpc-auth.auth-null' is not recognized
> [2017-10-24 17:19:04.616134] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
> 'auth-path' is not recognized
> [2017-10-24 17:19:04.616177] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
> 'ping-timeout' is not recognized
> [2017-10-24 17:19:04.616203] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
> option 'rpc-auth-allow-insecure' is not recognized
> [2017-10-24 17:19:04.616215] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
> option 'auth.addr./export/brick7/digitalcorpora.allow' is not recognized
> [2017-10-24 17:19:04.616226] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
> option 'auth-path' is not recognized
> [2017-10-24 17:19:04.616237] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
> option 'auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password' is not
> recognized
> [2017-10-24 17:19:04.616248] W [MSGID: 101174]
> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
> option 

Re: [Gluster-users] active-active georeplication?

2017-10-24 Thread Vijay Bellur
On Tue, Oct 24, 2017 at 11:04 AM, atris adam  wrote:

> Thanks for the reply, that was very interesting to me.
> How can I keep up with news about new glusterfs features?
>


The release notes generally contain information about new features. You can
also look up the GitHub projects page [2] to understand the feature-to-release
mapping.

Regards,
Vijay

[2] https://github.com/gluster/glusterfs/projects/1


>
> On Tue, Oct 24, 2017 at 5:54 PM, Vijay Bellur  wrote:
>
>>
>> Halo replication [1] could be of interest here. This functionality is
>> available since 3.11 and the current plan is to have it fully supported in
>> a 4.x release.
>>
>> Note that Halo replication is built on existing synchronous replication
>> in Gluster and differs from the current geo-replication implementation.
>> Kotresh's response is spot on for the current geo-replication
>> implementation.
>>
>> Regards,
>> Vijay
>>
>> [1] https://github.com/gluster/glusterfs/issues/199
>>
>> On Tue, Oct 24, 2017 at 5:13 AM, Kotresh Hiremath Ravishankar <
>> khire...@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> No, gluster doesn't support active-active geo-replication. It's not
>>> planned in the near future. We will let you know when it's planned.
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>> On Tue, Oct 24, 2017 at 11:19 AM, atris adam 
>>> wrote:
>>>
 hi everybody,

 Has glusterfs released a feature named active-active geo-replication?
 If yes, in which version was it released? If not, is it planned to add this
 feature?

 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://lists.gluster.org/mailman/listinfo/gluster-users

>>>
>>>
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-24 Thread Jim Kinney
I have 14,734 GFIDs that are different. All the different ones are only
on the brick that was live during the outage and the concurrent file
copy-in. The brick that was down at that time has no GFIDs that are not
also on the up brick.
As the bricks are 10TB, the find is going to be a long-running process.
I'm running several finds at once with GNU parallel, but it will still
take some time. I can't bring the up machine offline as it's in use. At
least I have 24 cores to work with.
I've only tested with one GFID, but the file it referenced _IS_ on the
down machine even though it has no GFID in the .glusterfs structure.
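In case it's useful to anyone else, this is roughly the shape of what I'm
running (the brick path and the GFID list file are placeholders, and it
assumes bash plus GNU parallel):

    export BRICK=/path/to/brick            # placeholder brick root
    # gfid-list.txt holds one full GFID per line
    parallel -j24 '
      find "$BRICK" -samefile "$BRICK/.glusterfs/$(echo {} | cut -c1-2)/$(echo {} | cut -c3-4)/{}" \
           -not -path "*/.glusterfs/*"
    ' :::: gfid-list.txt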
On Tue, 2017-10-24 at 12:35 +0530, Karthik Subrahmanya wrote:
> Hi Jim,
> 
> Can you check whether the same hardlinks are present on both the
> bricks & both of them have the link count 2?
> If the link count is 2 then "find <brick-path> -samefile
> <brick-path>/.glusterfs/<first 2 bits of gfid>/<next 2 bits of gfid>/<full gfid>"
> should give you the file path.
> 
> Regards,
> Karthik
> 
> On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney 
> wrote:
> > 
> > 
> > 
> > I'm not so lucky. ALL of mine show 2 links and none have the attr
> > data that supplies the path to the original.
> > 
> > I have the inode from stat. Looking now to dig out the
> > path/filename from  xfs_db on the specific inodes individually.
> > 
> > Is the hash of the filename or /filename and if so relative
> > to where? /, , ?
> > 
> > On Mon, 2017-10-23 at 18:54 +, Matt Waymack wrote:
> > > In my case I was able to delete the hard links in the .glusterfs
> > > folders of the bricks and it seems to have done the trick,
> > > thanks!
> > >  
> > > 
> > > From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> > > 
> > > 
> > > Sent: Monday, October 23, 2017 1:52 AM
> > > 
> > > To: Jim Kinney ; Matt Waymack  > > dv.com>
> > > 
> > > Cc: gluster-users 
> > > 
> > > Subject: Re: [Gluster-users] gfid entries in volume heal info
> > > that do not heal
> > >  
> > > 
> > > 
> > > 
> > > Hi Jim & Matt,
> > > 
> > > Can you also check for the link count in the stat output of those
> > > hardlink entries in the .glusterfs folder on the bricks.
> > > 
> > > If the link count is 1 on all the bricks for those entries, then
> > > they are orphaned entries and you can delete those hardlinks.
> > > 
> > > 
> > > To be on the safer side have a backup before deleting any of the
> > > entries.
> > > 
> > > 
> > > Regards,
> > > 
> > > 
> > > Karthik
> > > 
> > > 
> > > 
> > >  
> > > 
> > > On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney  > > > wrote:
> > > > 
> > > > I've been following this particular thread as I have a similar
> > > > issue (RAID6 array failed out with 3 dead drives at once while
> > > > a 12 TB load was being copied into one mounted space - what a
> > > > mess)
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > I have >700K GFID entries that have no path data:
> > > > 
> > > > 
> > > > Example:
> > > > 
> > > > 
> > > > getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-
> > > > 84b5-ff2a51c10421
> > > > 
> > > > 
> > > > # file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
> > > > 
> > > > 
> > > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c61
> > > > 62656c65645f743a733000
> > > > 
> > > > 
> > > > trusted.bit-rot.version=0x020059b1b316000270e7
> > > > 
> > > > 
> > > > trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > [root@bmidata1 brick]# getfattr -d -n
> > > > trusted.glusterfs.pathinfo -e hex -m .
> > > > .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
> > > > 
> > > > 
> > > > .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421:
> > > > trusted.glusterfs.pathinfo: No such attribute
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > I had to totally rebuild the dead RAID array and did a copy
> > > > from the live one before activating gluster on the rebuilt
> > > > system. I accidentally copied over the .glusterfs folder from
> > > > the working side
> > > > 
> > > > 
> > > > (replica 2 only for now - adding arbiter node as soon as I can
> > > > get this one cleaned up).
> > > > 
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > I've run the methods from "http://docs.gluster.org/en/latest/Tr
> > > > oubleshooting/gfid-to-path/" with no results using random
> > > > GFIDs. A full systemic
> > > >  run using the script from method 3 crashes with "too many
> > > > nested links" error (or something similar).
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > When I run gluster volume heal volname info, I get 700K+ GFIDs.
> > > > Oh. gluster 3.8.4 on Centos 7.3
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > Should I just remove the contents of the .glusterfs folder on
> > > > both and restart gluster and run a ls/stat on every file?
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > When I run a heal, it no longer has a decreasing 

Re: [Gluster-users] active-active georeplication?

2017-10-24 Thread atris adam
Thanks for the reply, that was very interesting to me.
How can I keep up with news about new glusterfs features?

On Tue, Oct 24, 2017 at 5:54 PM, Vijay Bellur  wrote:

>
> Halo replication [1] could be of interest here. This functionality is
> available since 3.11 and the current plan is to have it fully supported in
> a 4.x release.
>
> Note that Halo replication is built on existing synchronous replication in
> Gluster and differs from the current geo-replication implementation.
> Kotresh's response is spot on for the current geo-replication
> implementation.
>
> Regards,
> Vijay
>
> [1] https://github.com/gluster/glusterfs/issues/199
>
> On Tue, Oct 24, 2017 at 5:13 AM, Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> Hi,
>>
>> No, gluster doesn't support active-active geo-replication. It's not
>> planned in the near future. We will let you know when it's planned.
>>
>> Thanks,
>> Kotresh HR
>>
>> On Tue, Oct 24, 2017 at 11:19 AM, atris adam 
>> wrote:
>>
>>> hi everybody,
>>>
>>> Has glusterfs released a feature named active-active geo-replication? If
>>> yes, in which version was it released? If not, is it planned to add this
>>> feature?
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] active-active georeplication?

2017-10-24 Thread Vijay Bellur
Halo replication [1] could be of interest here. This functionality is
available since 3.11 and the current plan is to have it fully supported in
a 4.x release.

Note that Halo replication is built on existing synchronous replication in
Gluster and differs from the current geo-replication implementation.
Kotresh's response is spot on for the current geo-replication
implementation.
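If you want to experiment with it, Halo is driven by volume options; roughly
something like the below (the option names are from memory - please verify
with "gluster volume set help" on a 3.11+ build):

    # <volname> is a placeholder
    gluster volume set <volname> cluster.halo-enabled yes
    # replicate synchronously only to bricks within this latency (ms)
    gluster volume set <volname> cluster.halo-max-latency 10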

Regards,
Vijay

[1] https://github.com/gluster/glusterfs/issues/199

On Tue, Oct 24, 2017 at 5:13 AM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi,
>
> No, gluster doesn't support active-active geo-replication. It's not
> planned in the near future. We will let you know when it's planned.
>
> Thanks,
> Kotresh HR
>
> On Tue, Oct 24, 2017 at 11:19 AM, atris adam  wrote:
>
>> hi everybody,
>>
>> Has glusterfs released a feature named active-active geo-replication? If
>> yes, in which version was it released? If not, is it planned to add this
>> feature?
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Thanks and Regards,
> Kotresh H R
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] trying to add a 3rd peer

2017-10-24 Thread Ludwig Gamache
I always used IP addresses instead of names when I added a peer. In the
gluster peer status output, I do see IPs:

[root@DC-MTL-NAS-01 ~]# gluster peer status

Number of Peers: 2


Hostname: XXX.XXX.XXX.12

Uuid: ec1e10c1-0e38-4d2a-ab51-50fb0c67b6ee

State: Peer in Cluster (Connected)


Hostname: XXX.XXX.XXX.13

Uuid: eef75e55-170a-4621-9d6e-3b5c3a6e5561

State: Accepted peer request (Disconnected)

I can ping those IPs from any server.

From the Server 3 Gluster logs, I can see this:

[2017-10-24 12:31:33.012446] I [MSGID: 100030] [glusterfsd.c:2503:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.6
(args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)

[2017-10-24 12:31:33.020739] I [MSGID: 106478] [glusterd.c:1449:init]
0-management: Maximum allowed open file descriptors set to 65536

[2017-10-24 12:31:33.020796] I [MSGID: 106479] [glusterd.c:1496:init]
0-management: Using /var/lib/glusterd as working directory

[2017-10-24 12:31:33.029673] E [rpc-transport.c:283:rpc_transport_load]
0-rpc-transport: /usr/lib64/glusterfs/3.10.6/rpc-transport/rdma.so: cannot
open shared object file: No such file or directory

[2017-10-24 12:31:33.029702] W [rpc-transport.c:287:rpc_transport_load]
0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not
valid or not found on this machine

[2017-10-24 12:31:33.029715] W [rpcsvc.c:1661:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed

[2017-10-24 12:31:33.029731] E [MSGID: 106243] [glusterd.c:1720:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport

[2017-10-24 12:31:33.032226] I [MSGID: 106228]
[glusterd.c:500:glusterd_check_gsync_present] 0-glusterd: geo-replication
module not installed in the system [No such file or directory]

[2017-10-24 12:31:33.032816] I [MSGID: 106513]
[glusterd-store.c:2201:glusterd_restore_op_version] 0-glusterd: retrieved
op-version: 31000

[2017-10-24 12:31:33.042393] I [MSGID: 106498]
[glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management:
connect returned 0

[2017-10-24 12:31:33.042474] W [MSGID: 106062]
[glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd:
Failed to get tcp-user-timeout

[2017-10-24 12:31:33.042501] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600

[2017-10-24 12:31:33.082295] E [MSGID: 101075]
[common-utils.c:307:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or
service not known)

[2017-10-24 12:31:33.082331] E
[name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS
resolution failed on host dc-mtl-nas-01.elemenai.lan

[2017-10-24 12:31:33.082563] I [MSGID: 106544]
[glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
eef75e55-170a-4621-9d6e-3b5c3a6e5561

[2017-10-24 12:31:33.082589] I [MSGID: 106004]
[glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer
 (<3e190322-78f1-4ef6-80f7-8f48d51c2263>), in state
, has disconnected from glusterd.

[2017-10-24 12:31:33.117581] E [MSGID: 106187]
[glusterd-store.c:4566:glusterd_resolve_all_bricks] 0-glusterd: resolve
brick failed in restore

[2017-10-24 12:31:33.117658] E [MSGID: 101019] [xlator.c:503:xlator_init]
0-management: Initialization of volume 'management' failed, review your
volfile again

[2017-10-24 12:31:33.117678] E [MSGID: 101066]
[graph.c:325:glusterfs_graph_init] 0-management: initializing translator
failed

[2017-10-24 12:31:33.117696] E [MSGID: 101176]
[graph.c:681:glusterfs_graph_activate] 0-graph: init failed

[2017-10-24 12:31:33.118208] W [glusterfsd.c:1360:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7f1a34ba1bcd]
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1) [0x7f1a34ba1a71]
-->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x7f1a34ba0f5b] ) 0-:
received signum (1), shutting down

server1.domain.lan is the server 1 FQDN (not the IP address).
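For what it's worth, the check I run on each node to rule out resolver issues
is just something like this (the hostnames below are placeholders for the
three nodes):

    for h in server1.domain.lan server2.domain.lan server3.domain.lan; do
        getent hosts "$h" || echo "cannot resolve $h"
    done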

Ludwig

On Tue, Oct 24, 2017 at 2:16 AM, Bartosz Zięba  wrote:

> Are you sure all node names can be resolved on all the other nodes?
> You need to use the names previously used in Gluster - check them with 'gluster
> peer status' or 'gluster pool list'.
>
>
> Regards,
> Bartosz
>
>
> Message written by Ludwig Gamache on 24.10.2017 at 03:13:
>
> All,
>
> I am trying to add a third peer to my gluster install. The first 2 nodes
> are running since many months and have gluster 3.10.3-1.
>
> I recently installed the 3rd node and gluster 3.10.6-1. I was able to
> start the gluster daemon on it. After, I tried to add the peer from one of
> the 2 previous server (gluster peer probe IPADDRESS).
>
> That first peer started the communication with the 3rd peer. At that
> point, peer status were messed up. Server 1 saw both other servers as
> connected. Server 2 only saw server 1 as connected and did not have server
> 3 as a peer. Server 3 only had server 1 as a peer and saw it as
> 

Re: [Gluster-users] create volume in two different Data Centers

2017-10-24 Thread Niklas Hambüchen
On 24/10/17 13:01, Alessandro Briosi wrote:
> I would set up a VPN (tinc could work well).

I, too, would recommend trying tinc for this; it can automatically route
traffic for nodes that don't have direct access to other nodes via the nodes
that do.

I have a publicly available setup of Gluster over tinc on NixOS here:
https://github.com/nh2/nixops-gluster-example/
and it works pretty well; tinc is certainly not a bottleneck in it
(though note my nodes do have full mesh connectivity and I use this only
with 0.5 ms latency).
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] create volume in two different Data Centers

2017-10-24 Thread Alessandro Briosi
On 24/10/2017 12:45, atris adam wrote:
> Thanks for answering, but I have to set it up and test it myself and
> record the results. Can you guide me a little more? The problem is that only
> one valid IP exists in each data center, and each data center has 3 servers.
> How should I configure the network so that the brick servers can see each
> other to create a glusterfs volume?
>
I would set up a VPN (tinc could work well).
Though if you only have one public IP, you would probably have to forward
it to one of the internal servers.
A workaround could be to have a "floating" IP which is handled by VRRP
or similar; that depends on the gateway you have.

I have no idea about the performance :-)
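As a very rough sketch of what the tinc side could look like (the netname
"gfs", node names and addresses are made up, and each node still needs the
usual key generation and host-file exchange):

    # /etc/tinc/gfs/tinc.conf on a node in data center 1
    Name = dc1node1
    Mode = switch
    ConnectTo = dc2node1

    # /etc/tinc/gfs/hosts/dc2node1 (the node behind the forwarded public IP of DC2)
    Address = <public IP of data center 2>
    Port = 655
    # ...dc2node1's public key goes here...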


Alessandro
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] create volume in two different Data Centers

2017-10-24 Thread atris adam
Thanks for answering, but I have to set it up and test it myself and record
the results. Can you guide me a little more? The problem is that only one
valid IP exists in each data center, and each data center has 3 servers. How
should I configure the network so that the brick servers can see each other
to create a glusterfs volume?

On Tue, Oct 24, 2017 at 1:47 PM,  wrote:

> Hi,
>
> You can, but unless the two datacenters are very close, it'll be slow as
> hell. I tried it myself and even a 10ms ping between the bricks is
> horrible.
>
> On Tue, Oct 24, 2017 at 01:42:49PM +0330, atris adam wrote:
> > Hi
> >
> > I have two data centers, each of them has 3 servers. These two data
> > centers can see each other over the internet.
> > I want to create a distributed glusterfs volume with these 6 servers,
> > but I have only one valid IP in each data center. Is it possible to
> > create a glusterfs volume? Can anyone guide me?
> >
> > thx alot
>
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] create volume in two different Data Centers

2017-10-24 Thread lemonnierk
Hi,

You can, but unless the two datacenters are very close, it'll be slow as
hell. I tried it myself and even a 10ms ping between the bricks is
horrible.

On Tue, Oct 24, 2017 at 01:42:49PM +0330, atris adam wrote:
> Hi
> 
> I have two data centers, each of them has 3 servers. These two data centers
> can see each other over the internet.
> I want to create a distributed glusterfs volume with these 6 servers, but I
> have only one valid IP in each data center. Is it possible to create a
> glusterfs volume? Can anyone guide me?
> 
> thx alot

> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users



signature.asc
Description: Digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] create volume in two different Data Centers

2017-10-24 Thread atris adam
Hi

I have two data centers, each of them has 3 servers. These two data centers
can see each other over the internet.
I want to create a distributed glusterfs volume with these 6 servers, but I
have only one valid IP in each data center. Is it possible to create a
glusterfs volume? Can anyone guide me?

thx alot
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] active-active georeplication?

2017-10-24 Thread Kotresh Hiremath Ravishankar
Hi,

No, gluster doesn't support active-active geo-replication. It's not planned
in the near future. We will let you know when it's planned.

Thanks,
Kotresh HR

On Tue, Oct 24, 2017 at 11:19 AM, atris adam  wrote:

> hi everybody,
>
> Has glusterfs released a feature named active-active geo-replication? If
> yes, in which version was it released? If not, is it planned to add this
> feature?
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-24 Thread Karthik Subrahmanya
Hi Jim,

Can you check whether the same hardlinks are present on both the bricks &
both of them have the link count 2?
If the link count is 2 then "find <brick-path> -samefile
<brick-path>/.glusterfs/<first 2 bits of gfid>/<next 2 bits of gfid>/<full gfid>"
should give you the file path.
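For example, taking one of the GFIDs and brick paths from Matt's earlier mail,
just to illustrate the pattern:

    # check the link count of the .glusterfs hardlink on the brick
    stat /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

    # if it shows Links: 2, this locates the corresponding file path on that brick
    find /exp/b1/gv0 -samefile \
         /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2 \
         -not -path "*/.glusterfs/*"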

Regards,
Karthik

On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney  wrote:

> I'm not so lucky. ALL of mine show 2 links and none have the attr data
> that supplies the path to the original.
>
> I have the inode from stat. Looking now to dig out the path/filename from
> xfs_db on the specific inodes individually.
>
> Is the hash of the filename or /filename and if so relative to
> where? /, , ?
>
> On Mon, 2017-10-23 at 18:54 +, Matt Waymack wrote:
>
> In my case I was able to delete the hard links in the .glusterfs folders
> of the bricks and it seems to have done the trick, thanks!
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Monday, October 23, 2017 1:52 AM
> *To:* Jim Kinney ; Matt Waymack 
> *Cc:* gluster-users 
> *Subject:* Re: [Gluster-users] gfid entries in volume heal info that do
> not heal
>
>
>
> Hi Jim & Matt,
>
> Can you also check for the link count in the stat output of those hardlink
> entries in the .glusterfs folder on the bricks.
> If the link count is 1 on all the bricks for those entries, then they are
> orphaned entries and you can delete those hardlinks.
>
> To be on the safer side have a backup before deleting any of the entries.
>
> Regards,
>
> Karthik
>
>
>
> On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney  wrote:
>
> I've been following this particular thread as I have a similar issue
> (RAID6 array failed out with 3 dead drives at once while a 12 TB load was
> being copied into one mounted space - what a mess)
>
>
>
> I have >700K GFID entries that have no path data:
>
> Example:
>
> getfattr -d -e hex -m . .glusterfs/00/00/a5ef-
> 5af7-401b-84b5-ff2a51c10421
>
> # file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>
> security.selinux=0x73797374656d5f753a6f626a6563
> 745f723a756e6c6162656c65645f743a733000
>
> trusted.bit-rot.version=0x020059b1b316000270e7
>
> trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421
>
>
>
> [root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e hex
> -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>
> .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421:
> trusted.glusterfs.pathinfo: No such attribute
>
>
>
> I had to totally rebuild the dead RAID array and did a copy from the live
> one before activating gluster on the rebuilt system. I accidentally copied
> over the .glusterfs folder from the working side
>
> (replica 2 only for now - adding arbiter node as soon as I can get this
> one cleaned up).
>
>
>
> I've run the methods from "http://docs.gluster.org/en/
> latest/Troubleshooting/gfid-to-path/" with no results using random GFIDs.
> A full systemic run using the script from method 3 crashes with "too many
> nested links" error (or something similar).
>
>
>
> When I run gluster volume heal volname info, I get 700K+ GFIDs. Oh.
> gluster 3.8.4 on Centos 7.3
>
>
>
> Should I just remove the contents of the .glusterfs folder on both and
> restart gluster and run a ls/stat on every file?
>
>
>
>
>
> When I run a heal, it no longer has a decreasing number of files to heal
> so that's an improvement over the last 2-3 weeks :-)
>
>
>
> On Tue, 2017-10-17 at 14:34 +, Matt Waymack wrote:
>
> Attached is the heal log for the volume as well as the shd log.
>
>
>
>
>
>
>
> Run these commands on all the bricks of the replica pair to get the attrs set 
> on the backend.
>
>
>
>
>
>
>
> [root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
> /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>
> trusted.afr.dirty=0x
>
> trusted.afr.gv0-client-2=0x0001
>
> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
>
> trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435
>
>
>
> [root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
> /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>
> trusted.afr.dirty=0x
>
> trusted.afr.gv0-client-2=0x0001
>
> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
>
> 

Re: [Gluster-users] trying to add a 3rd peer

2017-10-24 Thread Bartosz Zięba
Are you sure all node names can be resolved on all the other nodes?
You need to use the names previously used in Gluster - check them with 'gluster peer
status' or 'gluster pool list'.
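For example (run on each node):

    gluster pool list      # shows UUID, hostname and connection state
    gluster peer status

The hostname column there is the name Gluster expects you to keep using.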

Regards,
Bartosz


> Message written by Ludwig Gamache on 24.10.2017 at 03:13:
> 
> All,
> 
> I am trying to add a third peer to my gluster install. The first 2 nodes are 
> running since many months and have gluster 3.10.3-1.
> 
> I recently installed the 3rd node and gluster 3.10.6-1. I was able to start 
> the gluster daemon on it. After, I tried to add the peer from one of the 2 
> previous server (gluster peer probe IPADDRESS).
> 
> That first peer started the communication with the 3rd peer. At that point, 
> peer status were messed up. Server 1 saw both other servers as connected. 
> Server 2 only saw server 1 as connected and did not have server 3 as a peer. 
> Server 3 only had server 1 as a peer and saw it as disconnected.
> 
> I also found errors in the gluster logs of server 3 showing a DNS resolution that could not be done:
> [2017-10-24 00:15:20.090462] E 
> [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution 
> failed on host HOST3.DOMAIN.lan
> 
> I rebooted node 3 and now gluster does not even restart on that node. It 
> keeps giving Name resolution problems. The 2 other nodes are active.
> 
> However, I can ping the 3 servers (one from each others) using their DNS 
> names.
> 
> Any idea about what to look at?
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users