Re: [Gluster-users] upgrade to 3.12.1 from 3.10: df returns wrong numbers

2017-09-28 Thread Robert Hajime Lanning

I found the issue.

The CentOS 7 RPMs, upon upgrade, modify the .vol files. Among other 
things, they add "option shared-brick-count \d", using the number of 
bricks in the volume.


This gives you the average free space per brick instead of the total free 
space in the volume.


When I create a new volume, the value of "shared-brick-count" is "1".

find /var/lib/glusterd/vols -type f | xargs sed -i -e 's/option shared-brick-count [0-9]*/option shared-brick-count 1/g'
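
A quick way to see what the upgrade actually wrote, assuming the default 
/var/lib/glusterd layout (the grep flags just aggregate identical option 
lines across all volfiles):

  # list the distinct shared-brick-count values and how many volfiles carry
  # each one; after the sed rewrite above they should all read "1"
  grep -rh "option shared-brick-count" /var/lib/glusterd/vols/ | sort | uniq -c

Running brick processes keep the graph they were started with, so presumably 
some form of restart is still needed before df reflects the edited files.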


On 09/27/17 17:09, Robert Hajime Lanning wrote:

Hi,

When I upgraded my cluster, df started returning some odd numbers for 
my legacy volumes.


For volumes created after the upgrade, df works just fine.

I have been researching since Monday and have not found any reference 
to this symptom.


"vm-images" is the old legacy volume, "test" is the new one.

[root@st-srv-03 ~]# (df -h|grep bricks;ssh st-srv-02 'df -h|grep 
bricks')|sort

/dev/sda1  7.3T  991G  6.4T  14% /bricks/sda1
/dev/sda1  7.3T  991G  6.4T  14% /bricks/sda1
/dev/sdb1  7.3T  557G  6.8T   8% /bricks/sdb1
/dev/sdb1  7.3T  557G  6.8T   8% /bricks/sdb1
/dev/sdc1  7.3T  630G  6.7T   9% /bricks/sdc1
/dev/sdc1  7.3T  630G  6.7T   9% /bricks/sdc1
/dev/sdd1  7.3T  683G  6.7T  10% /bricks/sdd1
/dev/sdd1  7.3T  683G  6.7T  10% /bricks/sdd1
/dev/sde1  7.3T  657G  6.7T   9% /bricks/sde1
/dev/sde1  7.3T  658G  6.7T   9% /bricks/sde1
/dev/sdf1  7.3T  711G  6.6T  10% /bricks/sdf1
/dev/sdf1  7.3T  711G  6.6T  10% /bricks/sdf1
/dev/sdg1  7.3T  756G  6.6T  11% /bricks/sdg1
/dev/sdg1  7.3T  756G  6.6T  11% /bricks/sdg1
/dev/sdh1  7.3T  753G  6.6T  11% /bricks/sdh1
/dev/sdh1  7.3T  753G  6.6T  11% /bricks/sdh1

[root@st-srv-03 ~]# df -h|grep localhost
localhost:/test 59T  5.7T   53T  10% /gfs/test
localhost:/vm-images   7.3T  717G  6.6T  10% /gfs/vm-images
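
A rough sanity check on these numbers (illustrative arithmetic, assuming the 
8 x 2 layout shown in the volume info below, with one 7.3T filesystem behind 
each distribute subvolume):

  echo "8 * 7.3" | bc    # ~58.4T expected aggregate, matching the new "test"
                         # volume; "vm-images" instead reports roughly one
                         # brick's worth, i.e. the per-brick average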

This is on CentOS 7.

The upgrade method was to shut down glusterd/glusterfsd, "yum erase 
centos-release-gluster310", "yum install centos-release-gluster312", 
"yum upgrade -y", then start glusterd.


[root@st-srv-03 ~]# rpm -qa|grep gluster
glusterfs-cli-3.12.1-1.el7.x86_64
glusterfs-3.12.1-1.el7.x86_64
nfs-ganesha-gluster-2.5.2-1.el7.x86_64
glusterfs-client-xlators-3.12.1-1.el7.x86_64
glusterfs-server-3.12.1-1.el7.x86_64
glusterfs-libs-3.12.1-1.el7.x86_64
glusterfs-api-3.12.1-1.el7.x86_64
glusterfs-fuse-3.12.1-1.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch

[root@st-srv-03 ~]# gluster volume info test

Volume Name: test
Type: Distributed-Replicate
Volume ID: b53e0836-575e-46fd-9f86-ab7bf7c07ca9
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: st-srv-02-stor:/bricks/sda1/test
Brick2: st-srv-03-stor:/bricks/sda1/test
Brick3: st-srv-02-stor:/bricks/sdb1/test
Brick4: st-srv-03-stor:/bricks/sdb1/test
Brick5: st-srv-02-stor:/bricks/sdc1/test
Brick6: st-srv-03-stor:/bricks/sdc1/test
Brick7: st-srv-02-stor:/bricks/sdd1/test
Brick8: st-srv-03-stor:/bricks/sdd1/test
Brick9: st-srv-02-stor:/bricks/sde1/test
Brick10: st-srv-03-stor:/bricks/sde1/test
Brick11: st-srv-02-stor:/bricks/sdf1/test
Brick12: st-srv-03-stor:/bricks/sdf1/test
Brick13: st-srv-02-stor:/bricks/sdg1/test
Brick14: st-srv-03-stor:/bricks/sdg1/test
Brick15: st-srv-02-stor:/bricks/sdh1/test
Brick16: st-srv-03-stor:/bricks/sdh1/test
Options Reconfigured:
features.cache-invalidation: on
server.allow-insecure: on
auth.allow: 192.168.60.*
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable
nfs-ganesha: disable
[root@st-srv-03 ~]# gluster volume info vm-images

Volume Name: vm-images
Type: Distributed-Replicate
Volume ID: 066a0598-e72e-419f-809e-86fa17f6f81c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: st-srv-02-stor:/bricks/sda1/vm-images
Brick2: st-srv-03-stor:/bricks/sda1/vm-images
Brick3: st-srv-02-stor:/bricks/sdb1/vm-images
Brick4: st-srv-03-stor:/bricks/sdb1/vm-images
Brick5: st-srv-02-stor:/bricks/sdc1/vm-images
Brick6: st-srv-03-stor:/bricks/sdc1/vm-images
Brick7: st-srv-02-stor:/bricks/sdd1/vm-images
Brick8: st-srv-03-stor:/bricks/sdd1/vm-images
Brick9: st-srv-02-stor:/bricks/sde1/vm-images
Brick10: st-srv-03-stor:/bricks/sde1/vm-images
Brick11: st-srv-02-stor:/bricks/sdf1/vm-images
Brick12: st-srv-03-stor:/bricks/sdf1/vm-images
Brick13: st-srv-02-stor:/bricks/sdg1/vm-images
Brick14: st-srv-03-stor:/bricks/sdg1/vm-images
Brick15: st-srv-02-stor:/bricks/sdh1/vm-images
Brick16: st-srv-03-stor:/bricks/sdh1/vm-images
Options Reconfigured:
features.cache-invalidation: on
server.allow-insecure: on
auth.allow: 192.168.60.*

Re: [Gluster-users] sparse files on EC volume

2017-09-28 Thread Dmitri Chebotarov
Hi Ben

Thank you.
I just ran some tests with the same data on EC and R3 (replica 3) volumes (same
hardware).
R3 is a lot faster:

EC

untar 48.879s
find 2.993s
rm 11.244s

R3

untar 10.938s
find 0.722s
rm 4.144s
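
For context, a sketch of how such an untar/find/rm comparison might be run 
(the mount points and tarball path are placeholders, not taken from this 
thread):

  for m in /mnt/ec /mnt/r3; do           # EC volume vs. replica-3 volume
      cd "$m" || exit 1
      time tar xf /tmp/src.tar.xz        # untar: many small creates and writes
      time find src -type f > /dev/null  # find: readdir/stat heavy
      time rm -rf src                    # rm: many small unlinks
  done
  # assumes the tarball extracts into ./src; adjust to the real archive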



On Wed, Sep 27, 2017 at 3:12 AM, Ben Turner  wrote:

> Have you done any testing with replica 2/3?  IIRC my replica 2/3 tests out
> performed EC on smallfile workloads, it may be worth looking into if you
> can't get EC up to where you need it to be.
>
> -b
>
> - Original Message -
> > From: "Dmitri Chebotarov" <4dim...@gmail.com>
> > Cc: "gluster-users" 
> > Sent: Tuesday, September 26, 2017 9:57:55 AM
> > Subject: Re: [Gluster-users] sparse files on EC volume
> >
> > Hi Xavi
> >
> > At this time I'm using 'plain' bricks with XFS. I'll be moving to LVM
> cached
> > bricks.
> > There is no RAID for data bricks, but I'll be using hardware RAID10 for
> SSD
> > cache disks (I can use 'writeback' cache in this case).
> >
> > 'small file performance' is the main reason I'm looking at different
> > options, i.e. using formatted sparse files.
> > I spent a considerable amount of time tuning 10GB/kernel/gluster to reduce
> > latency - the small file performance improved ~50% but it's still not good
> > enough, especially when I need to use Gluster for /home folders.
> >
> > I understand the limitations and the single point of failure in the case
> > with sparse files. I'm considering different options to provide HA
> > (pacemaker/corosync, keepalived or using VMs - RHEV - to deliver storage).
> >
> > Thank you for your reply.
> >
> >
> > On Tue, Sep 26, 2017 at 3:55 AM, Xavi Hernandez < jaher...@redhat.com >
> > wrote:
> >
> >
> > Hi Dmitri,
> >
> > On 22/09/17 17:07, Dmitri Chebotarov wrote:
> >
> >
> >
> > Hello
> >
> > I'm running some tests to compare performance between a Gluster FUSE mount
> > and formatted sparse files (located on the same Gluster FUSE mount).
> >
> > The Gluster volume is EC (the same for both tests).
> >
> > I'm seeing a HUGE difference and trying to figure out why.
> >
> > Could you explain what hardware configuration you are using?
> >
> > Do you have a plain disk for each brick formatted in XFS, or do you have
> > some RAID configuration?
> >
> >
> >
> >
> > Here is an example:
> >
> > GlusterFUSE mount:
> >
> > # cd /mnt/glusterfs
> > # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
> > 1+0 records in
> > 1+0 records out
> > 1073741824 bytes (1.1 GB) copied, 9.74757 s, *110 MB/s*
> >
> > Sparse file (located on GlusterFUSE mount):
> >
> > # truncate -s 100GB /mnt/glusterfs/xfs-100G.img
> > # mkfs.xfs /mnt/glusterfs/xfs-100G.img
> > # mount -o loop /mnt/glusterfs/xfs-100G.img /mnt/xfs-100G
> > # cd /mnt/xfs-100G
> > # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
> > 1+0 records in
> > 1+0 records out
> > 1073741824 bytes (1.1 GB) copied, 1.20576 s, *891 MB/s*
> >
> > The same goes for working with small files (i.e. code files, make, etc.)
> > with the same data located on the FUSE mount vs. a formatted sparse file
> > on the same FUSE mount.
> >
> > What would explain such difference?
> >
> > First of all, doing tests with relatively small files tends to be misleading
> > because of the caching capacity of the operating system (to minimize that,
> > you can add the 'conv=fsync' option to dd). You should do tests with file
> > sizes bigger than the amount of physical memory on the servers. This way
> > you minimize cache effects and see the real sustained performance.
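
An illustrative form of that advice (the paths and the 64GB size are 
placeholders; the point is conv=fsync plus a file larger than the servers' 
RAM):

  dd if=/dev/zero of=/mnt/glusterfs/big.bin bs=1M count=65536 conv=fsync
  dd if=/dev/zero of=/mnt/xfs-100G/big.bin  bs=1M count=65536 conv=fsync

This reduces (though may not fully eliminate) the page-cache advantage of the 
loop-mounted sparse file.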
> >
> > A second important point to note is that gluster is a distributed file
> system
> > that can be accessed simultaneously by more than one client. This means
> that
> > consistency must be assured in all cases, which makes things go to bricks
> > sooner than local filesystems normally do.
> >
> > In your case, all data saved to the fuse volume will most probably be
> present
> > on bricks once the dd command completes. On the other side, the test
> through
> > the formatted sparse file, most probably, is keeping most of the data in
> the
> > cache of the client machine.
> >
> > Note that using the formatted sparse file makes possible a better use of
> > the local cache, improving (relatively) small file access, but on the other
> > hand, this filesystem can only be used from a single client (single mount).
> > If this client fails for some reason, you will lose access to your data.
> >
> >
> >
> >
> > How does Gluster work with sparse files in general? I may move some of the
> > data on gluster volumes to formatted sparse files.
> >
> > Gluster works fine with sparse files. However you should consider the
> > previous points before choosing the formatted sparse files option. I
> guess
> > that the sustained throughput will be very similar for bigger files.
> >
> > Regards,
> >
> > Xavi
> >
> >
> >
> >
> > Thank you.
> >
> >

[Gluster-users] Upgrading (online) GlusterFS-3.7.11 to 3.10 with Distributed-Disperse volume

2017-09-28 Thread Bradley T Lunsford
I'm working on upgrading a set of our gluster machines from 3.7 to 3.10. At 
first I was going to follow the guide here: 
https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/


but it mentions:


  * Online upgrade is only possible with replicated and distributed
replicate volumes
  * Online upgrade is not supported for dispersed or distributed
dispersed volumes


and

*ALERT*: If any of your volumes, in the trusted storage pool that is 
being upgraded, uses disperse or is a pure distributed volume, this 
procedure is *NOT* recommended; use the Offline upgrade procedure 
instead.


The data stored on this gluster volume (Distributed-Disperse, 2 x (4 + 
2) = 12) is somewhat critical, but I wanted to make sure there was no 
possible way to do an 'online' upgrade before following the 'offline' 
steps.
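
For reference, a minimal sketch of the offline flow that guide points to, 
assuming RPM-based nodes with clients unmounted first (the package/repo 
specifics here are assumptions, not steps quoted from the guide):

  systemctl stop glusterd
  pkill glusterfs ; pkill glusterfsd     # stop remaining brick/client processes
  yum -y upgrade glusterfs\*             # after switching the repo to 3.10
  systemctl start glusterd
  gluster volume status                  # confirm bricks come back online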



Brad Lunsford


Re: [Gluster-users] one brick one volume process dies?

2017-09-28 Thread lejeczek



On 28/09/17 17:05, lejeczek wrote:



On 13/09/17 20:47, Ben Werthmann wrote:
These symptoms appear to be the same as I've recorded in 
this post:


http://lists.gluster.org/pipermail/gluster-users/2017-September/032435.html 



On Wed, Sep 13, 2017 at 7:01 AM, Atin Mukherjee 
> wrote:


    Additionally the brick log file of the same brick
    would be required. Please look for if brick process
    went down or crashed. Doing a volume start force
    should resolve the issue.



When I do: vol start force I see this between the lines:

[2017-09-28 16:00:55.120726] I [MSGID: 106568] 
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: 
Stopping glustershd daemon running in pid: 308300
[2017-09-28 16:00:55.128867] W [socket.c:593:__socket_rwv] 
0-glustershd: readv on 
/var/run/gluster/0853a4555820d3442b1c3909f1cb8466.socket 
failed (No data available)
[2017-09-28 16:00:56.122687] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: 
glustershd service is stopped


Funnily (or not), I now see, a week later:

gluster vol status CYTO-DATA
Status of volume: CYTO-DATA
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49161     0          Y       1743719
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
STERs/0GLUSTER-CYTO-DATA                    49152     0          Y       20438
Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49152     0          Y       5607
Self-heal Daemon on localhost               N/A       N/A        Y       41106
Quota Daemon on localhost                   N/A       N/A        Y       41117
Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       19088
Quota Daemon on 10.5.6.17                   N/A       N/A        Y       19097
Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       1832978
Quota Daemon on 10.5.6.32                   N/A       N/A        Y       1832987
Self-heal Daemon on 10.5.6.49               N/A       N/A        Y       320291
Quota Daemon on 10.5.6.49                   N/A       N/A        Y       320303

Task Status of Volume CYTO-DATA
------------------------------------------------------------------------------
There are no active volume tasks


$ gluster vol heal CYTO-DATA info
Brick 
10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA

Status: Transport endpoint is not connected
Number of entries: -

Brick 
10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA

...
...



And if I trace pid 1743719, yes, it's up & running but that 
port - 49161 - is not open.

I do not see any segfaults nor obvious crashes.
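
A short sketch of things worth checking at this point (the pid and port are 
the ones from the status output above; the brick log file name is a guess 
based on how gluster usually derives it from the brick path):

  ss -ltnp | grep 49161         # is anything listening on the advertised port?
  ls -l /proc/1743719/exe       # confirm the pid really is a glusterfsd brick process
  tail -n 100 /var/log/glusterfs/bricks/*GLUSTER-CYTO-DATA.log   # last brick log entries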




    On Wed, 13 Sep 2017 at 16:28, Gaurav Yadav
    > wrote:

    Please send me the logs as well i.e glusterd.logs
    and cmd_history.log.


    On Wed, Sep 13, 2017 at 1:45 PM, lejeczek
    >
    wrote:



    On 13/09/17 06:21, Gaurav Yadav wrote:

    Please provide the output of gluster
    volume info, gluster volume status and
    gluster peer status.

    Apart  from above info, please provide
    glusterd logs, cmd_history.log.

    Thanks
    Gaurav

    On Tue, Sep 12, 2017 at 2:22 PM, lejeczek
    
    >> wrote:

        hi everyone

        I have 3-peer cluster with all vols in
    replica mode, 9
        vols.
        What I see, unfortunately, is one
    brick fails in one
        vol, when it happens it's always the
    same vol on the
        same brick.
        Command: gluster vol status $vol -
    would show brick
        not online.
        Restarting glusterd with systemclt
    does not help, only
        system reboot seem to help, until it
    happens, next time.

        How to troubleshoot this weird
    misbehaviour?
        many thanks, L.

        .

    




Re: [Gluster-users] one brick one volume process dies?

2017-09-28 Thread lejeczek



On 13/09/17 20:47, Ben Werthmann wrote:
These symptoms appear to be the same as I've recorded in 
this post:


http://lists.gluster.org/pipermail/gluster-users/2017-September/032435.html

On Wed, Sep 13, 2017 at 7:01 AM, Atin Mukherjee 
> wrote:


Additionally the brick log file of the same brick
would be required. Please look for if brick process
went down or crashed. Doing a volume start force
should resolve the issue.



When I do: vol start force I see this between the lines:

[2017-09-28 16:00:55.120726] I [MSGID: 106568] 
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: 
Stopping glustershd daemon running in pid: 308300
[2017-09-28 16:00:55.128867] W [socket.c:593:__socket_rwv] 
0-glustershd: readv on 
/var/run/gluster/0853a4555820d3442b1c3909f1cb8466.socket 
failed (No data available)
[2017-09-28 16:00:56.122687] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: 
glustershd service is stopped


Funnily (or not), I now see, a week later:

gluster vol status CYTO-DATA
Status of volume: CYTO-DATA
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49161     0          Y       1743719
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
STERs/0GLUSTER-CYTO-DATA                    49152     0          Y       20438
Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49152     0          Y       5607
Self-heal Daemon on localhost               N/A       N/A        Y       41106
Quota Daemon on localhost                   N/A       N/A        Y       41117
Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       19088
Quota Daemon on 10.5.6.17                   N/A       N/A        Y       19097
Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       1832978
Quota Daemon on 10.5.6.32                   N/A       N/A        Y       1832987
Self-heal Daemon on 10.5.6.49               N/A       N/A        Y       320291
Quota Daemon on 10.5.6.49                   N/A       N/A        Y       320303

Task Status of Volume CYTO-DATA
------------------------------------------------------------------------------
There are no active volume tasks


$ gluster vol heal CYTO-DATA info
Brick 
10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA

Status: Transport endpoint is not connected
Number of entries: -

Brick 
10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA






On Wed, 13 Sep 2017 at 16:28, Gaurav Yadav
> wrote:

Please send me the logs as well i.e glusterd.logs
and cmd_history.log.


On Wed, Sep 13, 2017 at 1:45 PM, lejeczek
>
wrote:



On 13/09/17 06:21, Gaurav Yadav wrote:

Please provide the output of gluster
volume info, gluster volume status and
gluster peer status.

Apart  from above info, please provide
glusterd logs, cmd_history.log.

Thanks
Gaurav

On Tue, Sep 12, 2017 at 2:22 PM, lejeczek

>> wrote:

    hi everyone

    I have 3-peer cluster with all vols in
replica mode, 9
    vols.
    What I see, unfortunately, is one
brick fails in one
    vol, when it happens it's always the
same vol on the
    same brick.
    Command: gluster vol status $vol -
would show brick
    not online.
    Restarting glusterd with systemclt
does not help, only
    system reboot seem to help, until it
happens, next time.

    How to troubleshoot this weird
misbehaviour?
    many thanks, L.

    .
   



   

Re: [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster volume heal info command!

2017-09-28 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi,
Thanks for the reply!
I’ve checked [1]. But the problem is that there is nothing shown in the command 
“gluster volume heal  info”, so these split-brain entries could only be 
detected when an application tries to access them.
I can find the gfid mismatch for those in-split-brain entries in the mount log; 
however, nothing shows in the shd log. The shd does not know about those 
split-brain entries because there is nothing in the indices/xattrop directory.
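
For reference, a rough sketch of the manual gfid-split-brain fix that [1] 
describes (the brick path matches the outputs below; the gfid value and the 
choice of which copy to discard are per-file decisions, shown here only as 
placeholders):

  BRICK=/mnt/bricks/services/brick
  GFID=<gfid-of-the-bad-copy>        # dashed gfid, e.g. from the getfattr output below
  # on the node holding the bad copy only:
  rm "$BRICK/netserv/ethip/mn-1"
  rm "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"   # its .glusterfs hard link
  gluster volume heal services       # let self-heal recreate it from the good copy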

The log is not available right now; when the issue reproduces, I will provide it 
to you. Thanks!

Best regards,
Cynthia (周琳)
MBB SM HETRAN SW3 MATRIX
Storage
Mobile: +86 (0)18657188311

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Thursday, September 28, 2017 2:02 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
Cc: Gluster-users@gluster.org; gluster-de...@gluster.org
Subject: Re: [Gluster-users] after hard reboot, split-brain happened, but 
nothing showed in gluster volume heal info command!

Hi,
To resolve the gfid split-brain you can follow the steps at [1].
Since we don't have the pending markers set on the files, it is not showing in 
the heal info.
To debug this issue, need some more data from you. Could you provide these 
things?
1. volume info
2. mount log
3. brick logs
4. shd log

May I also know which version of gluster you are running. From the info you 
have provided it looks like an old version.
If it is, then it would be great if you can upgrade to one of the latest 
supported release.

[1] 
http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain

Thanks & Regards,
Karthik
On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) 
> wrote:

HI gluster experts,

I meet a tough problem about the “split-brain” issue. Sometimes, after a hard 
reboot, we will find some files in split-brain; however, neither the files nor 
their parent directory show up in the command “gluster volume heal  info”, 
and there is no entry in the .glusterfs/indices/xattrop directory. Can you help 
shed some light on this issue? Thanks!



Following is some info from our env,

Checking from the sn-0 client, nothing is shown as in split-brain!

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# gluster v heal services info
Brick sn-0:/mnt/bricks/services/brick/
Number of entries: 0

Brick sn-1:/mnt/bricks/services/brick/
Number of entries: 0

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# gluster v heal services info split-brain
Gathering list of split brain entries on volume services has been successful

Brick sn-0.local:/mnt/bricks/services/brick
Number of entries: 0

Brick sn-1.local:/mnt/bricks/services/brick
Number of entries: 0

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# ls -l /mnt/services/netserv/ethip/
ls: cannot access '/mnt/services/netserv/ethip/sn-2': Input/output error
ls: cannot access '/mnt/services/netserv/ethip/mn-1': Input/output error
total 3
-rw-r--r-- 1 root root 144 Sep 26 20:35 as-0
-rw-r--r-- 1 root root 144 Sep 26 20:35 as-1
-rw-r--r-- 1 root root 145 Sep 26 20:35 as-2
-rw-r--r-- 1 root root 237 Sep 26 20:36 mn-0
-? ? ??  ?? mn-1
-rw-r--r-- 1 root root  73 Sep 26 20:35 sn-0
-rw-r--r-- 1 root root  73 Sep 26 20:35 sn-1
-? ? ??  ?? sn-2
[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]

Checking from glusterfs server side, the gfid of mn-1 on sn-0 and sn-1 is 
different

[SN-0]
[root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
# getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
getfattr: Removing leading '/' from absolute path names
# file: mnt/bricks/services/brick/netserv/ethip
trusted.gfid=0xee71d19ac0f84f60b11eb42a083644e4
trusted.glusterfs.dht=0x0001

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# getfattr -m . -d -e hex mn-1
# file: mn-1
trusted.afr.dirty=0x
trusted.afr.services-client-0=0x
trusted.afr.services-client-1=0x
trusted.gfid=0x53a33f437464475486f31c4e44d83afd
[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# stat mn-1
  File: mn-1
  Size: 237  Blocks: 16 IO Block: 4096   regular file
Device: fd51h/64849dInode: 2536Links: 2
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2017-09-26 20:30:25.67900 +0300
Modify: 2017-09-26 20:30:24.60400 +0300
Change: 2017-09-26 20:30:24.61000 +0300
Birth: -
[root@sn-0:/mnt/bricks/services/brick/.glusterfs/indices/xattrop]
# ls
xattrop-63f8bbcb-7fa6-4fc8-b721-675a05de0ab3
[root@sn-0:/mnt/bricks/services/brick/.glusterfs/indices/xattrop]

[root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
# ls
53a33f43-7464-4754-86f3-1c4e44d83afd
[root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
# stat 53a33f43-7464-4754-86f3-1c4e44d83afd
  File: 

[Gluster-users] Clients can't connect after a server reboot (need to use volume force start)

2017-09-28 Thread Frizz
After I rebooted my GlusterFS servers I can’t connect from clients any more.

The volume is running, but I have to do a volume start FORCE on all server
hosts to make it work again.

I am running glusterfs 3.12.1 on Ubuntu 16.04.

Is this a bug?


Here are more details:

"gluster volume status" returns:

Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterfs-vm-01:/mnt/data/brick       N/A       N/A        N       N/A
Brick glusterfs-vm-02:/mnt/data/brick       N/A       N/A        N       N/A
Brick glusterfs-vm-03:/mnt/data/brick       N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       1864
Self-heal Daemon on glusterfs-vm-03         N/A       N/A        Y       2343
Self-heal Daemon on glusterfs-vm-02         N/A       N/A        Y       1900

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks


"gluster volume start gv0" returns:
volume start: gv0: failed: Volume gv0 already started

After I use the force flag ("gluster volume start gv0 force") a "gluster
volume status" returns this:

gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterfs-vm-01:/mnt/data/brick       49153     0          Y       2629
Brick glusterfs-vm-02:/mnt/data/brick       49153     0          Y       2619
Brick glusterfs-vm-03:/mnt/data/brick       49153     0          Y       2570
Self-heal Daemon on localhost               N/A       N/A        Y       2650
Self-heal Daemon on glusterfs-vm-03         N/A       N/A        Y       2591
Self-heal Daemon on glusterfs-vm-02         N/A       N/A        Y       2640

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks


After that clients can connect again!
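
Not a fix, but a sketch of where to look for why the brick processes did not 
come back on their own after the reboot (log locations are the package 
defaults and may differ):

  journalctl -u glusterd -b                  # did glusterd start before the brick
                                             # filesystems / network were ready?
  grep -iE "error|failed" /var/log/glusterfs/glusterd.log | tail -n 20
  ls -lt /var/log/glusterfs/bricks/          # the newest per-brick log should show
                                             # why the brick process exited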

Re: [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster volume heal info command!

2017-09-28 Thread Zhou, Cynthia (NSB - CN/Hangzhou)

The version I am using is glusterfs 3.6.9
Best regards,
Cynthia (周琳)
MBB SM HETRAN SW3 MATRIX
Storage
Mobile: +86 (0)18657188311

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Thursday, September 28, 2017 2:37 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
Cc: Gluster-users@gluster.org; gluster-de...@gluster.org
Subject: Re: [Gluster-users] after hard reboot, split-brain happened, but 
nothing showed in gluster volume heal info command!



On Thu, Sep 28, 2017 at 11:41 AM, Zhou, Cynthia (NSB - CN/Hangzhou) 
> wrote:
Hi,
Thanks for the reply!
I’ve checked [1]. But the problem is that there is nothing shown in the command 
“gluster volume heal  info”, so these split-brain entries could only be 
detected when an application tries to access them.
I can find the gfid mismatch for those in-split-brain entries in the mount log; 
however, nothing shows in the shd log. The shd does not know about those 
split-brain entries because there is nothing in the indices/xattrop directory.
I guess it was there before, and then it got cleared by one of the heal processes, 
either client side or server side. I wanted to check that by examining the logs.
Which version of gluster are you running, by the way?

The log is not available right now; when the issue reproduces, I will provide it 
to you. Thanks!
Ok.

Best regards,
Cynthia (周琳)
MBB SM HETRAN SW3 MATRIX
Storage
Mobile: +86 (0)18657188311

From: Karthik Subrahmanya 
[mailto:ksubr...@redhat.com]
Sent: Thursday, September 28, 2017 2:02 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) 
>
Cc: Gluster-users@gluster.org; 
gluster-de...@gluster.org
Subject: Re: [Gluster-users] after hard reboot, split-brain happened, but 
nothing showed in gluster volume heal info command!

Hi,
To resolve the gfid split-brain you can follow the steps at [1].
Since we don't have the pending markers set on the files, it is not showing in 
the heal info.
To debug this issue, need some more data from you. Could you provide these 
things?
1. volume info
2. mount log
3. brick logs
4. shd log

May I also know which version of gluster you are running. From the info you 
have provided it looks like an old version.
If it is, then it would be great if you can upgrade to one of the latest 
supported release.

[1] 
http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain

Thanks & Regards,
Karthik
On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) 
> wrote:

HI gluster experts,

I meet a tough problem about the “split-brain” issue. Sometimes, after a hard 
reboot, we will find some files in split-brain; however, neither the files nor 
their parent directory show up in the command “gluster volume heal  info”, 
and there is no entry in the .glusterfs/indices/xattrop directory. Can you help 
shed some light on this issue? Thanks!



Following is some info from our env,

Checking from the sn-0 client, nothing is shown as in split-brain!

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# gluster v heal services info
Brick sn-0:/mnt/bricks/services/brick/
Number of entries: 0

Brick sn-1:/mnt/bricks/services/brick/
Number of entries: 0

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# gluster v heal services info split-brain
Gathering list of split brain entries on volume services has been successful

Brick sn-0.local:/mnt/bricks/services/brick
Number of entries: 0

Brick sn-1.local:/mnt/bricks/services/brick
Number of entries: 0

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# ls -l /mnt/services/netserv/ethip/
ls: cannot access '/mnt/services/netserv/ethip/sn-2': Input/output error
ls: cannot access '/mnt/services/netserv/ethip/mn-1': Input/output error
total 3
-rw-r--r-- 1 root root 144 Sep 26 20:35 as-0
-rw-r--r-- 1 root root 144 Sep 26 20:35 as-1
-rw-r--r-- 1 root root 145 Sep 26 20:35 as-2
-rw-r--r-- 1 root root 237 Sep 26 20:36 mn-0
-? ? ??  ?? mn-1
-rw-r--r-- 1 root root  73 Sep 26 20:35 sn-0
-rw-r--r-- 1 root root  73 Sep 26 20:35 sn-1
-? ? ??  ?? sn-2
[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]

Checking from glusterfs server side, the gfid of mn-1 on sn-0 and sn-1 is 
different

[SN-0]
[root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
# getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
getfattr: Removing leading '/' from absolute path names
# file: mnt/bricks/services/brick/netserv/ethip
trusted.gfid=0xee71d19ac0f84f60b11eb42a083644e4
trusted.glusterfs.dht=0x0001

[root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
# getfattr -m . -d -e hex mn-1
# file: mn-1
trusted.afr.dirty=0x

Re: [Gluster-users] Bandwidth and latency requirements

2017-09-28 Thread Arman Khalatyan
Interesting table, Karan!
Could you please tell us how you did the benchmark? fio or iozone or similar?

thanks
Arman.

On Wed, Sep 27, 2017 at 1:20 PM, Karan Sandha  wrote:

> Hi Collin,
>
> During our arbiter latency testing for completion of ops we found the
> results below: an arbiter node in another data centre and both data
> bricks in the same data centre.
>
> 1) File-size 1 KB (1 files )
> 2) mkdir
>
>
> Ops / Latency   5ms        10ms        20ms        50ms        100ms       200ms
> Create          755 secs   1410 secs   2717 secs   5874 secs   12908 sec   26113 sec
> Mkdir           922 secs   1725 secs   3325 secs   8127 secs   16160 sec   30079 sec
>
>
> Thanks & Regards
>
> On Mon, Sep 25, 2017 at 5:40 AM, Colin Coe  wrote:
>
>> Hi all
>>
>> I've googled but can't find an answer to my question.
>>
>> I have two data centers.  Currently, I have a replica (count of 2 plus
>> arbiter) in one data center but is used by both.
>>
>> I want to change this to be a distributed replica across the two data
>> centers.
>>
>> There is a 20Mbps pipe and approx 22 ms latency. Is this sufficient?
>>
>> I really don't want to do the geo-replication in its current form.
>>
>> Thanks
>>
>> CC
>>
>
>
>
> --
>
> KARAN SANDHA
>
> ASSOCIATE QUALITY ENGINEER
>
> Red Hat Bangalore 
>
> ksan...@redhat.com  M: 9888009555  IM: Karan on @irc
> 
> 
>

Re: [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster volume heal info command!

2017-09-28 Thread Karthik Subrahmanya
On Thu, Sep 28, 2017 at 12:11 PM, Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

>
>
> The version I am using is glusterfs 3.6.9
>
This is a very old version which is EOL. If you can upgrade to any of the
supported versions (3.10 or 3.12), that would be great.
They have many new features, bug fixes & performance improvements. If you
can try to reproduce the issue on one of those, that would be
very helpful.

Regards,
Karthik

> Best regards,
> *Cynthia **(周琳)*
>
> MBB SM HETRAN SW3 MATRIX
>
> Storage
> Mobile: +86 (0)18657188311
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Thursday, September 28, 2017 2:37 PM
>
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* Gluster-users@gluster.org; gluster-de...@gluster.org
> *Subject:* Re: [Gluster-users] after hard reboot, split-brain happened,
> but nothing showed in gluster volume heal info command!
>
>
>
>
>
>
>
> On Thu, Sep 28, 2017 at 11:41 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi,
>
> Thanks for reply!
>
> I’ve checked [1]. But the problem is that there is nothing shown in
> command “gluster volume heal  info”. So these split-entry
> files could only be detected when app try to visit them.
>
> I can find gfid mismatch for those in-split-brain entries from mount log,
> however, nothing show in shd log, the shd log does not know those
> split-brain entries. Because there is nothing in indices/xattrop directory.
>
> I guess it was there before, and then it got cleared by one of the heal
> process either client side or server side. I wanted to check that by
> examining the logs.
>
> Which version of gluster you are running by the way?
>
>
>
> The log is not available right now, when it reproduced, I will provide it
> to your, Thanks!
>
> Ok.
>
>
>
> Best regards,
> *Cynthia **(周琳)*
>
> MBB SM HETRAN SW3 MATRIX
>
> Storage
> Mobile: +86 (0)18657188311
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Thursday, September 28, 2017 2:02 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* Gluster-users@gluster.org; gluster-de...@gluster.org
> *Subject:* Re: [Gluster-users] after hard reboot, split-brain happened,
> but nothing showed in gluster volume heal info command!
>
>
>
> Hi,
>
> To resolve the gfid split-brain you can follow the steps at [1].
>
> Since we don't have the pending markers set on the files, it is not
> showing in the heal info.
> To debug this issue, need some more data from you. Could you provide these
> things?
>
> 1. volume info
>
> 2. mount log
>
> 3. brick logs
>
> 4. shd log
>
>
>
> May I also know which version of gluster you are running. From the info
> you have provided it looks like an old version.
>
> If it is, then it would be great if you can upgrade to one of the latest
> supported release.
>
>
> [1] http://docs.gluster.org/en/latest/Troubleshooting/split-
> brain/#fixing-directory-entry-split-brain
>
>
>
> Thanks & Regards,
>
> Karthik
>
> On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
>
>
> HI gluster experts,
>
>
>
> I meet a tough problem about “split-brain” issue. Sometimes, after hard
> reboot, we will find some files in split-brain, however its parent
> directory or anything could be shown in command “gluster volume heal
>  info”, also, no entry in .glusterfs/indices/xattrop
> directory, can you help to shed some lights on this issue? Thanks!
>
>
>
>
>
>
>
> Following is some info from our env,
>
>
>
> *Checking from sn-0 cliet, nothing is shown in-split-brain!*
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info
>
> Brick sn-0:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> Brick sn-1:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info split-brain
>
> Gathering list of split brain entries on volume services has been
> successful
>
>
>
> Brick sn-0.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> Brick sn-1.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # ls -l /mnt/services/netserv/ethip/
>
> ls: cannot access '/mnt/services/netserv/ethip/sn-2': Input/output error
>
> ls: cannot access '/mnt/services/netserv/ethip/mn-1': Input/output error
>
> total 3
>
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-0
>
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-1
>
> -rw-r--r-- 1 root root 145 Sep 26 20:35 as-2
>
> -rw-r--r-- 1 root root 237 Sep 26 20:36 mn-0
>
> -? ? ??  ?? mn-1
>
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-0
>
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-1
>
> -? ? ??  ?? sn-2
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
>
>

Re: [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster volume heal info command!

2017-09-28 Thread Karthik Subrahmanya
On Thu, Sep 28, 2017 at 11:41 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi,
>
> Thanks for reply!
>
> I’ve checked [1]. But the problem is that there is nothing shown in
> command “gluster volume heal  info”. So these split-entry
> files could only be detected when app try to visit them.
>
> I can find gfid mismatch for those in-split-brain entries from mount log,
> however, nothing show in shd log, the shd log does not know those
> split-brain entries. Because there is nothing in indices/xattrop directory.
>
I guess it was there before, and then it got cleared by one of the heal
processes, either client side or server side. I wanted to check that by
examining the logs.
Which version of gluster are you running, by the way?

>
>
> The log is not available right now, when it reproduced, I will provide it
> to your, Thanks!
>
Ok.

>
>
> Best regards,
> *Cynthia **(周琳)*
>
> MBB SM HETRAN SW3 MATRIX
>
> Storage
> Mobile: +86 (0)18657188311
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Thursday, September 28, 2017 2:02 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* Gluster-users@gluster.org; gluster-de...@gluster.org
> *Subject:* Re: [Gluster-users] after hard reboot, split-brain happened,
> but nothing showed in gluster volume heal info command!
>
>
>
> Hi,
>
> To resolve the gfid split-brain you can follow the steps at [1].
>
> Since we don't have the pending markers set on the files, it is not
> showing in the heal info.
> To debug this issue, need some more data from you. Could you provide these
> things?
>
> 1. volume info
>
> 2. mount log
>
> 3. brick logs
>
> 4. shd log
>
>
>
> May I also know which version of gluster you are running. From the info
> you have provided it looks like an old version.
>
> If it is, then it would be great if you can upgrade to one of the latest
> supported release.
>
>
> [1] http://docs.gluster.org/en/latest/Troubleshooting/split-
> brain/#fixing-directory-entry-split-brain
>
>
>
> Thanks & Regards,
>
> Karthik
>
> On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
>
>
> HI gluster experts,
>
>
>
> I meet a tough problem about “split-brain” issue. Sometimes, after hard
> reboot, we will find some files in split-brain, however its parent
> directory or anything could be shown in command “gluster volume heal
>  info”, also, no entry in .glusterfs/indices/xattrop
> directory, can you help to shed some lights on this issue? Thanks!
>
>
>
>
>
>
>
> Following is some info from our env,
>
>
>
> *Checking from sn-0 cliet, nothing is shown in-split-brain!*
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info
>
> Brick sn-0:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> Brick sn-1:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info split-brain
>
> Gathering list of split brain entries on volume services has been
> successful
>
>
>
> Brick sn-0.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> Brick sn-1.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # ls -l /mnt/services/netserv/ethip/
>
> ls: cannot access '/mnt/services/netserv/ethip/sn-2': Input/output error
>
> ls: cannot access '/mnt/services/netserv/ethip/mn-1': Input/output error
>
> total 3
>
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-0
>
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-1
>
> -rw-r--r-- 1 root root 145 Sep 26 20:35 as-2
>
> -rw-r--r-- 1 root root 237 Sep 26 20:36 mn-0
>
> -? ? ??  ?? mn-1
>
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-0
>
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-1
>
> -? ? ??  ?? sn-2
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
>
>
> *Checking from glusterfs server side, the gfid of mn-1 on sn-0 and sn-1 is
> different*
>
>
>
> *[SN-0]*
>
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
>
> # getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/services/brick/netserv/ethip
>
> trusted.gfid=0xee71d19ac0f84f60b11eb42a083644e4
>
> trusted.glusterfs.dht=0x0001
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # getfattr -m . -d -e hex mn-1
>
> # file: mn-1
>
> trusted.afr.dirty=0x
>
> trusted.afr.services-client-0=0x
>
> trusted.afr.services-client-1=0x
>
> trusted.gfid=0x53a33f437464475486f31c4e44d83afd
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # stat mn-1
>
>   File: mn-1
>
>   Size: 237  Blocks: 16 IO Block: 4096   

Re: [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster volume heal info command!

2017-09-28 Thread Karthik Subrahmanya
Hi,

To resolve the gfid split-brain you can follow the steps at [1].
Since we don't have the pending markers set on the files, it is not showing
in the heal info.
To debug this issue, need some more data from you. Could you provide these
things?
1. volume info
2. mount log
3. brick logs
4. shd log

May I also know which version of gluster you are running. From the info you
have provided it looks like an old version.
If it is, then it would be great if you can upgrade to one of the latest
supported release.

[1]
http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain

Thanks & Regards,
Karthik

On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

>
> HI gluster experts,
>
> I meet a tough problem about “split-brain” issue. Sometimes, after hard
> reboot, we will find some files in split-brain, however its parent
> directory or anything could be shown in command “gluster volume heal
>  info”, also, no entry in .glusterfs/indices/xattrop
> directory, can you help to shed some lights on this issue? Thanks!
>
>
>
> Following is some info from our env,
>
> *Checking from sn-0 cliet, nothing is shown in-split-brain!*
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # gluster v heal services info
> Brick sn-0:/mnt/bricks/services/brick/
> Number of entries: 0
>
> Brick sn-1:/mnt/bricks/services/brick/
> Number of entries: 0
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # gluster v heal services info split-brain
> Gathering list of split brain entries on volume services has been
> successful
>
> Brick sn-0.local:/mnt/bricks/services/brick
> Number of entries: 0
>
> Brick sn-1.local:/mnt/bricks/services/brick
> Number of entries: 0
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # ls -l /mnt/services/netserv/ethip/
> ls: cannot access '/mnt/services/netserv/ethip/sn-2': Input/output error
> ls: cannot access '/mnt/services/netserv/ethip/mn-1': Input/output error
> total 3
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-0
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-1
> -rw-r--r-- 1 root root 145 Sep 26 20:35 as-2
> -rw-r--r-- 1 root root 237 Sep 26 20:36 mn-0
> -? ? ??  ?? mn-1
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-0
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-1
> -? ? ??  ?? sn-2
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> *Checking from glusterfs server side, the gfid of mn-1 on sn-0 and sn-1 is
> different*
>
> *[SN-0]*
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
> # getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/services/brick/netserv/ethip
> trusted.gfid=0xee71d19ac0f84f60b11eb42a083644e4
> trusted.glusterfs.dht=0x0001
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # getfattr -m . -d -e hex mn-1
> # file: mn-1
> trusted.afr.dirty=0x
> trusted.afr.services-client-0=0x
> trusted.afr.services-client-1=0x
> trusted.gfid=0x53a33f437464475486f31c4e44d83afd
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # stat mn-1
>   File: mn-1
>   Size: 237  Blocks: 16 IO Block: 4096   regular file
> Device: fd51h/64849dInode: 2536Links: 2
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2017-09-26 20:30:25.67900 +0300
> Modify: 2017-09-26 20:30:24.60400 +0300
> Change: 2017-09-26 20:30:24.61000 +0300
> Birth: -
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/indices/xattrop]
> # ls
> xattrop-63f8bbcb-7fa6-4fc8-b721-675a05de0ab3
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/indices/xattrop]
>
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
> # ls
> 53a33f43-7464-4754-86f3-1c4e44d83afd
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
> # stat 53a33f43-7464-4754-86f3-1c4e44d83afd
>   File: 53a33f43-7464-4754-86f3-1c4e44d83afd
>   Size: 237  Blocks: 16 IO Block: 4096   regular file
> Device: fd51h/64849dInode: 2536Links: 2
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2017-09-26 20:30:25.67900 +0300
> Modify: 2017-09-26 20:30:24.60400 +0300
> Change: 2017-09-26 20:30:24.61000 +0300
> Birth: -
>
> #
> *[SN-1]*
>
> [root@sn-1:/mnt/bricks/services/brick/.glusterfs/f7/f1]
> #  getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/services/brick/netserv/ethip
> trusted.gfid=0xee71d19ac0f84f60b11eb42a083644e4
> trusted.glusterfs.dht=0x0001
>
> [root@sn-1:/mnt/bricks/services/brick/.glusterfs/f7/f1]
> *#*
>