Re: [Gluster-users] not healing one file

2017-10-27 Thread Richard Neuboeck
Hi Karthik,

the procedure you described in [1] worked perfectly. After removing the
file and the hardlink on brick-3 it got healed. Client access is restored.

Since there doesn't seem to be an access problem with Fedora's 3.10
client, I'll upgrade all servers to 3.12. Just in case.
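
For reference, 3.12 should let a mismatch like this be resolved without
touching the bricks at all, via the CLI option [2] and favorite-child-policy
[3] that Karthik mentions below. A hedged sketch (syntax as documented for
3.12, not tested here), either keeping the copy with the latest mtime for
this one entry:

# gluster volume heal home split-brain latest-mtime /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4

or letting self-heal pick a winner automatically for such cases:

# gluster volume set home cluster.favorite-child-policy mtime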

Thank you so much for your help!
All the best
Richard

On 26.10.17 11:34, Karthik Subrahmanya wrote:
> Hi Richard,
> 
> Thanks for the information. As you said, there is a gfid mismatch for the
> file: on brick-1 & brick-2 the gfids are the same, and on brick-3 the gfid
> is different. This is not considered a split-brain because we have two good
> copies here. Gluster 3.10 does not have a method to resolve this situation
> other than manual intervention [1]. Basically what you need to do is remove
> the file and the gfid hardlink from brick-3 (considering the brick-3 entry
> as bad). Then when you do a lookup for the file from the mount, it will
> recreate the entry on that brick.
> 
> From 3.12 we have methods to resolve this situation with the CLI option [2]
> and with favorite-child-policy [3]. For the time being you can use [1] to
> resolve this, and if you can consider upgrading to 3.12, that would give
> you options to handle these scenarios.
> 
> [1]
> http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
> [2] https://review.gluster.org/#/c/17485/
> [3] https://review.gluster.org/#/c/16878/
> 
> HTH,
> Karthik
> 
> On Thu, Oct 26, 2017 at 12:40 PM, Richard Neuboeck wrote:
> 
> Hi Karthik,
> 
> thanks for taking a look at this. I haven't been working with gluster long
> enough to make heads or tails of the logs. The logs are attached to
> this mail and here is the other information:
> 
> # gluster volume info home
> 
> Volume Name: home
> Type: Replicate
> Volume ID: fe6218ae-f46b-42b3-a467-5fc6a36ad48a
> Status: Started
> Snapshot Count: 1
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: sphere-six:/srv/gluster_home/brick
> Brick2: sphere-five:/srv/gluster_home/brick
> Brick3: sphere-four:/srv/gluster_home/brick
> Options Reconfigured:
> features.barrier: disable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.cache-samba-metadata: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 9
> performance.cache-size: 1GB
> performance.client-io-threads: on
> cluster.lookup-optimize: on
> cluster.readdir-optimize: on
> features.quota: on
> features.inode-quota: on
> features.quota-deem-statfs: on
> cluster.server-quorum-ratio: 51%
> 
> 
> [root@sphere-four ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.bit-rot.version=0x020059df20a40006f989
> trusted.gfid=0xda1c94b1643544b18d5b6f4654f60bf5
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x9a01
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x0001
> 
> [root@sphere-five ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.home-client-4=0x00010001
> trusted.bit-rot.version=0x020059df1f310006ce63
> trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x9a01
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x0001
> 
> [root@sphere-six ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file:

Re: [Gluster-users] not healing one file

2017-10-26 Thread Karthik Subrahmanya
Hi Richard,

Thanks for the information. As you said, there is a gfid mismatch for the
file: on brick-1 & brick-2 the gfids are the same, and on brick-3 the gfid is
different. This is not considered a split-brain because we have two good
copies here. Gluster 3.10 does not have a method to resolve this situation
other than manual intervention [1]. Basically what you need to do is remove
the file and the gfid hardlink from brick-3 (considering the brick-3 entry as
bad). Then when you do a lookup for the file from the mount, it will recreate
the entry on that brick.

From 3.12 we have methods to resolve this situation with the CLI option [2]
and with favorite-child-policy [3]. For the time being you can use [1] to
resolve this, and if you can consider upgrading to 3.12, that would give you
options to handle these scenarios.

[1]
http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
[2] https://review.gluster.org/#/c/17485/
[3] https://review.gluster.org/#/c/16878/
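
In shell terms, the manual fix in [1] boils down to something like the
following (a hedged sketch only, assuming the standard
.glusterfs/<xx>/<yy>/<gfid> layout on the brick; the path and the mismatching
gfid are taken from the getfattr output quoted below):

brick=/srv/gluster_home/brick
file=romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
gfid=da1c94b1-6435-44b1-8d5b-6f4654f60bf5  # the gfid seen only on sphere-four (brick-3)
# on sphere-four only: remove the stale file and its gfid hardlink
rm -f "$brick/$file"
rm -f "$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
# then, from a client mount (placeholder path), trigger a lookup so the entry is recreated and healed
stat "/path/to/client/mount/$file"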

HTH,
Karthik

On Thu, Oct 26, 2017 at 12:40 PM, Richard Neuboeck wrote:

> Hi Karthik,
>
> thanks for taking a look at this. I haven't been working with gluster long
> enough to make heads or tails of the logs. The logs are attached to
> this mail and here is the other information:
>
> # gluster volume info home
>
> Volume Name: home
> Type: Replicate
> Volume ID: fe6218ae-f46b-42b3-a467-5fc6a36ad48a
> Status: Started
> Snapshot Count: 1
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: sphere-six:/srv/gluster_home/brick
> Brick2: sphere-five:/srv/gluster_home/brick
> Brick3: sphere-four:/srv/gluster_home/brick
> Options Reconfigured:
> features.barrier: disable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.cache-samba-metadata: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 9
> performance.cache-size: 1GB
> performance.client-io-threads: on
> cluster.lookup-optimize: on
> cluster.readdir-optimize: on
> features.quota: on
> features.inode-quota: on
> features.quota-deem-statfs: on
> cluster.server-quorum-ratio: 51%
>
>
> [root@sphere-four ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.bit-rot.version=0x020059df20a40006f989
> trusted.gfid=0xda1c94b1643544b18d5b6f4654f60bf5
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x9a01
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x0001
>
> [root@sphere-five ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.home-client-4=0x00010001
> trusted.bit-rot.version=0x020059df1f310006ce63
> trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x9a01
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x0001
>
> [root@sphere-six ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.home-client-4=0x00010001
> trusted.bit-rot.version=0x020059df11cd000548ec
> trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x9a01
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x0001
>
> Cheers
> Richard
>
> On 26.10.17 07:41, Karthik Subrahmanya wrote:
> > Hey Richard,
> >
> > Could you share the following information, please?
> > 1. 

Re: [Gluster-users] not healing one file

2017-10-26 Thread Richard Neuboeck
Hi Amar,

thanks for the information! I tried this tool on all machines.

# gluster-health-report

Loaded reports: glusterd-op-version, georep, gfid-mismatch-dht-report,
glusterd-peer-disconnect, disk_usage, errors_in_logs, coredump,
glusterd, glusterd_volume_version_cksum_errors, kernel_issues,
errors_in_logs, ifconfig, nic-health, process_status

[ OK] Disk used percentage  path=/  percentage=4
[ OK] Disk used percentage  path=/var  percentage=4
[ OK] Disk used percentage  path=/tmp  percentage=4
[ OK] All peers are in connected state  connected_count=2
total_peer_count=2
[ OK] no gfid mismatch
[  ERROR] Report failure  report=report_check_glusterd_op_version
[ NOT OK] The maximum size of core files created is NOT set to unlimited.
[  ERROR] Report failure  report=report_check_worker_restarts
[  ERROR] Report failure  report=report_non_participating_bricks
[WARNING] Glusterd uptime is less than 24 hours  uptime_sec=72798
[WARNING] Errors in Glusterd log file  num_errors=35
[WARNING] Warnings in Glusterd log file  num_warning=37
[ NOT OK] Recieve errors in "ifconfig bond0" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- bond0" output
High CPU usage by Self-heal
[WARNING] Errors in Glusterd log file num_errors=77
[WARNING] Warnings in Glusterd log file num_warnings=61

Basically it's the same message on all of them, with varying error and
warning counts.
Glusterd has not been up for long since I updated and then rebooted the
machines yesterday. That's also the reason for some of the errors and
warnings, and for the network errors, since it always takes some time
until the bonded device (4x1Gbit, balance-alb) is fully functional.

From the getfattr output Karthik asked me to collect, the GFIDs for the
file in question are different, even though the report says there is no
mismatch.
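
(The mismatch is easy to confirm directly on the bricks; a hedged sketch,
assuming root ssh access to the three servers:

for h in sphere-four sphere-five sphere-six; do
  ssh "$h" getfattr -n trusted.gfid -e hex /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
done

sphere-four reports a different trusted.gfid than the other two.)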

So is this a split-brain situation gluster is not aware of?

Cheers
Richard

On 26.10.17 06:51, Amar Tumballi wrote:
> On a side note, try the recently released health report tool and see if it
> diagnoses any issues in your setup. Currently you may have to run it on
> all three machines.
> 
> 
> 
> On 26-Oct-2017 6:50 AM, "Amar Tumballi" wrote:
> 
> Thanks for this report. This week many of the developers are at
> Gluster Summit in Prague; we will check this and respond next
> week. Hope that's fine.
> 
> Thanks,
> Amar
> 
> 
> On 25-Oct-2017 3:07 PM, "Richard Neuboeck" wrote:
> 
> Hi Gluster Gurus,
> 
> I'm using a gluster volume as home for our users. The volume is
> replica 3, running on CentOS 7, gluster version 3.10
> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
> gluster 3.10 (3.10.6-3.fc26.x86_64).
> 
> During the data backup I got an I/O error on one file. Manually
> checking for this file on a client confirms this:
> 
> ls -l
> 
> romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
> ls: cannot access
> 
> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4':
> Input/output error
> total 2015
> -rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
> -rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
> -rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
> -?? ? ???? recovery.baklz4
> 
> Out of curiosity I checked all the bricks for this file. It's
> present there. Making a checksum shows that the file is different on
> one of the three replica servers.
> 
> Querying healing information shows that the file should be healed:
> # gluster volume heal home info
> Brick sphere-six:/srv/gluster_home/brick
> 
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> 
> Status: Connected
> Number of entries: 1
> 
> Brick sphere-five:/srv/gluster_home/brick
> 
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> 
> Status: Connected
> Number of entries: 1
> 
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries: 0
> 
> Manually triggering heal doesn't report an error but also does not
> heal the file.
> # gluster volume heal home
> Launching heal operation to perform index self heal on volume home
> has been successful
> 
> Same with a full heal
> # gluster volume heal home full
> Launching heal operation to perform full self heal on volume home
> has been successful
> 
> According to the split brain query that's not the problem:
> # gluster volume heal 

Re: [Gluster-users] not healing one file

2017-10-25 Thread Karthik Subrahmanya
Hey Richard,

Could you share the following information, please?
1. gluster volume info 
2. getfattr output of that file from all the bricks
getfattr -d -e hex -m . 
3. glustershd & glfsheal logs

Regards,
Karthik

On Thu, Oct 26, 2017 at 10:21 AM, Amar Tumballi  wrote:

> On a side note, try the recently released health report tool and see if it
> diagnoses any issues in your setup. Currently you may have to run it on all
> three machines.
>
>
>
> On 26-Oct-2017 6:50 AM, "Amar Tumballi"  wrote:
>
>> Thanks for this report. This week many of the developers are at Gluster
>> Summit in Prague; we will check this and respond next week. Hope that's
>> fine.
>>
>> Thanks,
>> Amar
>>
>>
>> On 25-Oct-2017 3:07 PM, "Richard Neuboeck"  wrote:
>>
>>> Hi Gluster Gurus,
>>>
>>> I'm using a gluster volume as home for our users. The volume is
>>> replica 3, running on CentOS 7, gluster version 3.10
>>> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
>>> gluster 3.10 (3.10.6-3.fc26.x86_64).
>>>
>>> During the data backup I got an I/O error on one file. Manually
>>> checking for this file on a client confirms this:
>>>
>>> ls -l
>>> romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/ses
>>> sionstore-backups/
>>> ls: cannot access
>>> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4':
>>> Input/output error
>>> total 2015
>>> -rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
>>> -rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
>>> -rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
>>> -?? ? ???? recovery.baklz4
>>>
>>> Out of curiosity I checked all the bricks for this file. It's
>>> present there. Making a checksum shows that the file is different on
>>> one of the three replica servers.
>>>
>>> Querying healing information shows that the file should be healed:
>>> # gluster volume heal home info
>>> Brick sphere-six:/srv/gluster_home/brick
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4
>>>
>>> Status: Connected
>>> Number of entries: 1
>>>
>>> Brick sphere-five:/srv/gluster_home/brick
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4
>>>
>>> Status: Connected
>>> Number of entries: 1
>>>
>>> Brick sphere-four:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Manually triggering heal doesn't report an error but also does not
>>> heal the file.
>>> # gluster volume heal home
>>> Launching heal operation to perform index self heal on volume home
>>> has been successful
>>>
>>> Same with a full heal
>>> # gluster volume heal home full
>>> Launching heal operation to perform full self heal on volume home
>>> has been successful
>>>
>>> According to the split brain query that's not the problem:
>>> # gluster volume heal home info split-brain
>>> Brick sphere-six:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>> Brick sphere-five:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>> Brick sphere-four:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>>
>>> I have no idea why this situation arose in the first place and also
>>> no idea how to solve this problem. I would highly appreciate any
>>> helpful feedback I can get.
>>>
>>> The only mention in the logs matching this file is a rename operation:
>>> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
>>> 09:19:11.561661] I [MSGID: 115061]
>>> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
>>> RENAME
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.jsonlz4
>>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4
>>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
>>> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-ho
>>> me-client-3-0-0,
>>> error-xlator: home-posix [No data available]
>>>
>>> I enabled directory quotas the same day this problem showed up but
>>> I'm not sure how quotas could have an effect like this (maybe unless
>>> the limit is reached but that's also not the case).
>>>
>>> Thanks again if anyone has an idea.
>>> Cheers
>>> Richard
>>> --
>>> /dev/null
>>>
>>>

Re: [Gluster-users] not healing one file

2017-10-25 Thread Amar Tumballi
On a side note, try the recently released health report tool and see if it
diagnoses any issues in your setup. Currently you may have to run it on all
three machines.



On 26-Oct-2017 6:50 AM, "Amar Tumballi"  wrote:

> Thanks for this report. This week many of the developers are at Gluster
> Summit in Prague; we will check this and respond next week. Hope that's
> fine.
>
> Thanks,
> Amar
>
>
> On 25-Oct-2017 3:07 PM, "Richard Neuboeck"  wrote:
>
>> Hi Gluster Gurus,
>>
>> I'm using a gluster volume as home for our users. The volume is
>> replica 3, running on CentOS 7, gluster version 3.10
>> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
>> gluster 3.10 (3.10.6-3.fc26.x86_64).
>>
>> During the data backup I got an I/O error on one file. Manually
>> checking for this file on a client confirms this:
>>
>> ls -l
>> romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/
>> ls: cannot access
>> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4':
>> Input/output error
>> total 2015
>> -rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
>> -rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
>> -rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
>> -?? ? ???? recovery.baklz4
>>
>> Out of curiosity I checked all the bricks for this file. It's
>> present there. Making a checksum shows that the file is different on
>> one of the three replica servers.
>>
>> Querying healing information shows that the file should be healed:
>> # gluster volume heal home info
>> Brick sphere-six:/srv/gluster_home/brick
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4
>>
>> Status: Connected
>> Number of entries: 1
>>
>> Brick sphere-five:/srv/gluster_home/brick
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4
>>
>> Status: Connected
>> Number of entries: 1
>>
>> Brick sphere-four:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Manually triggering heal doesn't report an error but also does not
>> heal the file.
>> # gluster volume heal home
>> Launching heal operation to perform index self heal on volume home
>> has been successful
>>
>> Same with a full heal
>> # gluster volume heal home full
>> Launching heal operation to perform full self heal on volume home
>> has been successful
>>
>> According to the split brain query that's not the problem:
>> # gluster volume heal home info split-brain
>> Brick sphere-six:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>> Brick sphere-five:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>> Brick sphere-four:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>>
>> I have no idea why this situation arose in the first place and also
>> no idea how to solve this problem. I would highly appreciate any
>> helpful feedback I can get.
>>
>> The only mention in the logs matching this file is a rename operation:
>> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
>> 09:19:11.561661] I [MSGID: 115061]
>> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
>> RENAME
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.jsonlz4
>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4
>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
>> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-
>> home-client-3-0-0,
>> error-xlator: home-posix [No data available]
>>
>> I enabled directory quotas the same day this problem showed up but
>> I'm not sure how quotas could have an effect like this (maybe unless
>> the limit is reached but that's also not the case).
>>
>> Thanks again if anyone has an idea.
>> Cheers
>> Richard
>> --
>> /dev/null
>>
>>

Re: [Gluster-users] not healing one file

2017-10-25 Thread Amar Tumballi
Thanks for this report. This week many of the developers are at Gluster
Summit in Prague; we will check this and respond next week. Hope that's
fine.

Thanks,
Amar


On 25-Oct-2017 3:07 PM, "Richard Neuboeck"  wrote:

> Hi Gluster Gurus,
>
> I'm using a gluster volume as home for our users. The volume is
> replica 3, running on CentOS 7, gluster version 3.10
> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
> gluster 3.10 (3.10.6-3.fc26.x86_64).
>
> During the data backup I got an I/O error on one file. Manually
> checking for this file on a client confirms this:
>
> ls -l
> romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/
> ls: cannot access
> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4':
> Input/output error
> total 2015
> -rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
> -rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
> -rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
> -?? ? ???? recovery.baklz4
>
> Out of curiosity I checked all the bricks for this file. It's
> present there. Making a checksum shows that the file is different on
> one of the three replica servers.
>
> Querying healing information shows that the file should be healed:
> # gluster volume heal home info
> Brick sphere-six:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4
>
> Status: Connected
> Number of entries: 1
>
> Brick sphere-five:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4
>
> Status: Connected
> Number of entries: 1
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries: 0
>
> Manually triggering heal doesn't report an error but also does not
> heal the file.
> # gluster volume heal home
> Launching heal operation to perform index self heal on volume home
> has been successful
>
> Same with a full heal
> # gluster volume heal home full
> Launching heal operation to perform full self heal on volume home
> has been successful
>
> According to the split brain query that's not the problem:
> # gluster volume heal home info split-brain
> Brick sphere-six:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-five:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
>
> I have no idea why this situation arose in the first place and also
> no idea how to solve this problem. I would highly appreciate any
> helpful feedback I can get.
>
> The only mention in the logs matching this file is a rename operation:
> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
> 09:19:11.561661] I [MSGID: 115061]
> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
> RENAME
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.jsonlz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:
> 206366-home-client-3-0-0,
> error-xlator: home-posix [No data available]
>
> I enabled directory quotas the same day this problem showed up but
> I'm not sure how quotas could have an effect like this (maybe unless
> the limit is reached but that's also not the case).
>
> Thanks again if anyone has an idea.
> Cheers
> Richard
> --
> /dev/null
>
>

[Gluster-users] not healing one file

2017-10-25 Thread Richard Neuboeck
Hi Gluster Gurus,

I'm using a gluster volume as home for our users. The volume is
replica 3, running on CentOS 7, gluster version 3.10
(3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
gluster 3.10 (3.10.6-3.fc26.x86_64).

During the data backup I got an I/O error on one file. Manually
checking for this file on a client confirms this:

ls -l
romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
ls: cannot access
'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4':
Input/output error
total 2015
-rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
-rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
-rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
-?? ? ???? recovery.baklz4

Out of curiosity I checked all the bricks for this file. It's
present there. Making a checksum shows that the file is different on
one of the three replica servers.
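
For the record, a check along these lines shows the difference (a hedged
sketch, assuming root ssh access to the brick hosts):

for h in sphere-four sphere-five sphere-six; do
  ssh "$h" md5sum /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
done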

Querying healing information shows that the file should be healed:
# gluster volume heal home info
Brick sphere-six:/srv/gluster_home/brick
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4

Status: Connected
Number of entries: 1

Brick sphere-five:/srv/gluster_home/brick
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4

Status: Connected
Number of entries: 1

Brick sphere-four:/srv/gluster_home/brick
Status: Connected
Number of entries: 0

Manually triggering heal doesn't report an error but also does not
heal the file.
# gluster volume heal home
Launching heal operation to perform index self heal on volume home
has been successful

Same with a full heal
# gluster volume heal home full
Launching heal operation to perform full self heal on volume home
has been successful

According to the split brain query that's not the problem:
# gluster volume heal home info split-brain
Brick sphere-six:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0

Brick sphere-five:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0

Brick sphere-four:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0


I have no idea why this situation arose in the first place and also
no idea how to solve this problem. I would highly appreciate any
helpful feedback I can get.

The only mention in the logs matching this file is a rename operation:
/var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
09:19:11.561661] I [MSGID: 115061]
[server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
RENAME
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4
(48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
(48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0,
error-xlator: home-posix [No data available]

I enabled directory quotas the same day this problem showed up but
I'm not sure how quotas could have an effect like this (maybe unless
the limit is reached but that's also not the case).

Thanks again if anyone has an idea.
Cheers
Richard
-- 
/dev/null


