Re: [Gluster-users] not healing one file

2017-10-25 Thread Karthik Subrahmanya
Hey Richard,

Could you share the following information, please?
1. gluster volume info <volname>
2. getfattr output of that file from all the bricks:
getfattr -d -e hex -m . <file-path-on-brick>
3. glustershd & glfsheal logs
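
For reference, a minimal sketch of these commands, assuming the default log
locations under /var/log/glusterfs (volume name, brick path and file path are
taken from the report below; adjust them for your setup):

    gluster volume info home
    # on each of the three brick servers:
    getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
    # logs to attach: /var/log/glusterfs/glustershd.log and /var/log/glusterfs/glfsheal-home.log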

Regards,
Karthik

On Thu, Oct 26, 2017 at 10:21 AM, Amar Tumballi  wrote:

> On a side note, try the recently released health report tool and see if it
> diagnoses any issues in your setup. Currently you may have to run it on all
> three machines.
>
>
>
> On 26-Oct-2017 6:50 AM, "Amar Tumballi"  wrote:
>
>> Thanks for this report. This week many of the developers are at Gluster
>> Summit in Prague; we will check this and respond next week. Hope that's
>> fine.
>>
>> Thanks,
>> Amar
>>
>>
>> On 25-Oct-2017 3:07 PM, "Richard Neuboeck"  wrote:
>>
>>> Hi Gluster Gurus,
>>>
>>> I'm using a gluster volume as home for our users. The volume is
>>> replica 3, running on CentOS 7, gluster version 3.10
>>> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
>>> gluster 3.10 (3.10.6-3.fc26.x86_64).
>>>
>>> During the data backup I got an I/O error on one file. Manually
>>> checking for this file on a client confirms this:
>>>
>>> ls -l
>>> romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/ses
>>> sionstore-backups/
>>> ls: cannot access
>>> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4':
>>> Input/output error
>>> total 2015
>>> -rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
>>> -rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
>>> -rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
>>> -?? ? ???? recovery.baklz4
>>>
>>> Out of curiosity I checked all the bricks for this file. It's
>>> present there. Making a checksum shows that the file is different on
>>> one of the three replica servers.
>>>
>>> Querying healing information shows that the file should be healed:
>>> # gluster volume heal home info
>>> Brick sphere-six:/srv/gluster_home/brick
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4
>>>
>>> Status: Connected
>>> Number of entries: 1
>>>
>>> Brick sphere-five:/srv/gluster_home/brick
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4
>>>
>>> Status: Connected
>>> Number of entries: 1
>>>
>>> Brick sphere-four:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Manually triggering heal doesn't report an error but also does not
>>> heal the file.
>>> # gluster volume heal home
>>> Launching heal operation to perform index self heal on volume home
>>> has been successful
>>>
>>> Same with a full heal
>>> # gluster volume heal home full
>>> Launching heal operation to perform full self heal on volume home
>>> has been successful
>>>
>>> According to the split brain query that's not the problem:
>>> # gluster volume heal home info split-brain
>>> Brick sphere-six:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>> Brick sphere-five:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>> Brick sphere-four:/srv/gluster_home/brick
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>>
>>> I have no idea why this situation arose in the first place and also
>>> no idea as to how to solve this problem. I would highly appreciate any
>>> helpful feedback I can get.
>>>
>>> The only mention in the logs matching this file is a rename operation:
>>> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
>>> 09:19:11.561661] I [MSGID: 115061]
>>> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
>>> RENAME
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.jsonlz4
>>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
>>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/se
>>> ssionstore-backups/recovery.baklz4
>>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
>>> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-ho
>>> me-client-3-0-0,
>>> error-xlator: home-posix [No data available]
>>>
>>> I enabled directory quotas the same day this problem showed up but
>>> I'm not sure how quotas could have an effect like this (maybe unless
>>> the limit is reached but that's also not the case).
>>>
>>> Thanks again if anyone has an idea.
>>> Cheers
>>> Richard
>>> --
>>> /dev/null
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] not healing one file

2017-10-25 Thread Amar Tumballi
On a side note, try the recently released health report tool and see if it
diagnoses any issues in your setup. Currently you may have to run it on all of
the three machines.
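
A rough sketch of doing that on each node, using the install and run commands
from the tool announcement further down in this digest:

    sudo pip install gluster-health-report
    gluster-health-report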



On 26-Oct-2017 6:50 AM, "Amar Tumballi"  wrote:

> Thanks for this report. This week many of the developers are at Gluster
> Summit in Prague; we will check this and respond next week. Hope that's
> fine.
>
> Thanks,
> Amar
>
>
> On 25-Oct-2017 3:07 PM, "Richard Neuboeck"  wrote:
>
>> Hi Gluster Gurus,
>>
>> I'm using a gluster volume as home for our users. The volume is
>> replica 3, running on CentOS 7, gluster version 3.10
>> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
>> gluster 3.10 (3.10.6-3.fc26.x86_64).
>>
>> During the data backup I got an I/O error on one file. Manually
>> checking for this file on a client confirms this:
>>
>> ls -l
>> romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/
>> ls: cannot access
>> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4':
>> Input/output error
>> total 2015
>> -rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
>> -rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
>> -rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
>> -?? ? ???? recovery.baklz4
>>
>> Out of curiosity I checked all the bricks for this file. It's
>> present there. Making a checksum shows that the file is different on
>> one of the three replica servers.
>>
>> Querying healing information shows that the file should be healed:
>> # gluster volume heal home info
>> Brick sphere-six:/srv/gluster_home/brick
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4
>>
>> Status: Connected
>> Number of entries: 1
>>
>> Brick sphere-five:/srv/gluster_home/brick
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4
>>
>> Status: Connected
>> Number of entries: 1
>>
>> Brick sphere-four:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries: 0
>>
>> Manually triggering heal doesn't report an error but also does not
>> heal the file.
>> # gluster volume heal home
>> Launching heal operation to perform index self heal on volume home
>> has been successful
>>
>> Same with a full heal
>> # gluster volume heal home full
>> Launching heal operation to perform full self heal on volume home
>> has been successful
>>
>> According to the split brain query that's not the problem:
>> # gluster volume heal home info split-brain
>> Brick sphere-six:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>> Brick sphere-five:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>> Brick sphere-four:/srv/gluster_home/brick
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>>
>> I have no idea why this situation arose in the first place and also
>> no idea as to how to solve this problem. I would highly appreciate any
>> helpful feedback I can get.
>>
>> The only mention in the logs matching this file is a rename operation:
>> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
>> 09:19:11.561661] I [MSGID: 115061]
>> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
>> RENAME
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.jsonlz4
>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
>> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/
>> sessionstore-backups/recovery.baklz4
>> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
>> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-
>> home-client-3-0-0,
>> error-xlator: home-posix [No data available]
>>
>> I enabled directory quotas the same day this problem showed up but
>> I'm not sure how quotas could have an effect like this (maybe unless
>> the limit is reached but that's also not the case).
>>
>> Thanks again if anyone has an idea.
>> Cheers
>> Richard
>> --
>> /dev/null
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] not healing one file

2017-10-25 Thread Amar Tumballi
Thanks for this report. This week many of the developers are at Gluster
Summit in Prague; we will check this and respond next week. Hope that's
fine.

Thanks,
Amar


On 25-Oct-2017 3:07 PM, "Richard Neuboeck"  wrote:

> Hi Gluster Gurus,
>
> I'm using a gluster volume as home for our users. The volume is
> replica 3, running on CentOS 7, gluster version 3.10
> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
> gluster 3.10 (3.10.6-3.fc26.x86_64).
>
> During the data backup I got an I/O error on one file. Manually
> checking for this file on a client confirms this:
>
> ls -l
> romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/
> ls: cannot access
> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4':
> Input/output error
> total 2015
> -rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
> -rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
> -rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
> -?? ? ???? recovery.baklz4
>
> Out of curiosity I checked all the bricks for this file. It's
> present there. Making a checksum shows that the file is different on
> one of the three replica servers.
>
> Querying healing information shows that the file should be healed:
> # gluster volume heal home info
> Brick sphere-six:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4
>
> Status: Connected
> Number of entries: 1
>
> Brick sphere-five:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4
>
> Status: Connected
> Number of entries: 1
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries: 0
>
> Manually triggering heal doesn't report an error but also does not
> heal the file.
> # gluster volume heal home
> Launching heal operation to perform index self heal on volume home
> has been successful
>
> Same with a full heal
> # gluster volume heal home full
> Launching heal operation to perform full self heal on volume home
> has been successful
>
> According to the split brain query that's not the problem:
> # gluster volume heal home info split-brain
> Brick sphere-six:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-five:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
>
> I have no idea why this situation arose in the first place and also
> no idea as to how to solve this problem. I would highly appreciate any
> helpful feedback I can get.
>
> The only mention in the logs matching this file is a rename operation:
> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
> 09:19:11.561661] I [MSGID: 115061]
> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
> RENAME
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.jsonlz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-
> backups/recovery.baklz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:
> 206366-home-client-3-0-0,
> error-xlator: home-posix [No data available]
>
> I enabled directory quotas the same day this problem showed up but
> I'm not sure how quotas could have an effect like this (maybe unless
> the limit is reached but that's also not the case).
>
> Thanks again if anyone has an idea.
> Cheers
> Richard
> --
> /dev/null
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Gluster Health Report tool

2017-10-25 Thread Amar Tumballi
On Thu, Oct 26, 2017 at 3:53 AM, Sankarshan Mukhopadhyay <
sankarshan.mukhopadh...@gmail.com> wrote:

> On Thu, Oct 26, 2017 at 2:24 AM, Marcin Dulak 
> wrote:
> > Hi,
> >
> > since people are suggesting nagios, I can't resist suggesting
> exporting
> > the metrics in the prometheus format,
> > or at least making the project into a library so
> > https://github.com/prometheus/client_python could be used to export the
> > prometheus metrics.
> > There has been an attempt at https://github.com/ofesseler/
> gluster_exporter
> > but it is not maintained anymore.
> >
>
> There is an on-going effort which provides a monitoring dashboard for
> a Gluster cluster. Some detail at
>  At present the
> stack is not consuming Prometheus, however, the team is looking at
> switching over so as to make a more malleable dashboard. There is of
> course a Gitter channel at 
> Install+configure instructions for the latest release are at
>  release-v1.5.3-(install-guide)>
>
>
Thanks Sankarshan for this.

The goal of the 'health-report' tool is not 'monitoring'. There are better
tools which do that well; for Gluster, that would be Tendrl
(https://github.com/Tendrl/). This is mainly because one run of health-report
may take a few minutes if we add more checks, whereas a monitoring service
would be collecting data every few seconds without causing any stress on the
machine or the gluster volume.

The goal of the health-report tool is to run it once a day (just as a status
check, if you don't have monitoring set up), or to run it when an issue
happens, so that we don't miss analyzing something obvious. Hope that is clear.

For anyone who plans to use monitoring, please follow the Tendrl project.
Also, if you are at the Gluster Summit happening tomorrow and the day after,
you can see the demo of
Tendrl + Gluster.
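
As an illustration of the once-a-day idea (not something shipped with the
tool), a cron entry could look like the sketch below; the install path is an
assumption, so check it with `which gluster-health-report` first:

    # /etc/cron.d/gluster-health-report -- hypothetical daily run at 06:00
    0 6 * * * root /usr/local/bin/gluster-health-report > /var/log/gluster-health-report.log 2>&1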

Regards,
Amar



>
> --
> sankarshan mukhopadhyay
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Gluster Health Report tool

2017-10-25 Thread Sankarshan Mukhopadhyay
On Thu, Oct 26, 2017 at 2:24 AM, Marcin Dulak  wrote:
> Hi,
>
> since people are suggesting nagios, I can't resist suggesting exporting
> the metrics in the prometheus format,
> or at least making the project into a library so
> https://github.com/prometheus/client_python could be used to export the
> prometheus metrics.
> There has been an attempt at https://github.com/ofesseler/gluster_exporter
> but it is not maintained anymore.
>

There is an on-going effort which provides a monitoring dashboard for
a Gluster cluster. Some detail at
 At present the
stack is not consuming Prometheus, however, the team is looking at
switching over so as to make a more malleable dashboard. There is of
course a Gitter channel at 
Install+configure instructions for the latest release are at



-- 
sankarshan mukhopadhyay

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Gluster Health Report tool

2017-10-25 Thread Marcin Dulak
Hi,

since people are suggesting nagios, I can't resist suggesting exporting
the metrics in the prometheus format,
or at least making the project into a library so
https://github.com/prometheus/client_python could be used to export the
prometheus metrics.
There has been an attempt at https://github.com/ofesseler/gluster_exporter
but it is not maintained anymore.

Cheers,

Marcin

On Wed, Oct 25, 2017 at 7:56 PM, mabi  wrote:

> Hi Aravinda,
>
> Very nice initiative, thank you very much! As a small recommendation it
> would be nice to have a "nagios/icinga" mode, maybe through a "-n"
> parameter which will do the health check and output the status in a
> nagios/icinga compatible format. As such this tool could be directly used
> by nagios for monitoring.
>
> Best,
> M.
>
>
>
>  Original Message 
> Subject: [Gluster-devel] Gluster Health Report tool
> Local Time: October 25, 2017 2:11 PM
> UTC Time: October 25, 2017 12:11 PM
> From: avish...@redhat.com
> To: Gluster Devel , gluster-users <
> gluster-users@gluster.org>
>
> Hi,
>
> We started a new project to identify issues/misconfigurations in
> Gluster nodes. This project is very young and not yet ready for
> production use. Feedback on the existing reports and ideas for more
> reports are welcome.
>
> This tool needs to run on every Gluster node to detect the local
> issues (for example: parsing log files, checking disk space etc.) on each
> node. But some of the reports use the Gluster CLI to identify issues,
> and those can be run on any one node (for example
> gluster-health-report --run-only glusterd-peer-disconnect).
> Install
>
> sudo pip install gluster-health-report
> Usage
> Run gluster-health-report --help for help
>
> gluster-health-report
>
> Example output is available here
> https://github.com/aravindavk/gluster-health-report
> Project Details
>
>- Issue page: https://github.com/gluster/glusterfs/issues/313
>- Project page: https://github.com/aravindavk/gluster-health-report
>- Open new issue if you have new report suggestion or found issue with
>  existing report
>https://github.com/aravindavk/gluster-health-report/issues
>
>--
>
>regards
>Aravinda VK
>
> --
>
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Gluster Health Report tool

2017-10-25 Thread mabi
Hi Aravinda,

Very nice initiative, thank you very much! As a small recommendation it would
be nice to have a "nagios/icinga" mode, maybe through a "-n" parameter which
will do the health check and output the status in a nagios/icinga compatible
format. As such this tool could be directly used by nagios for monitoring.
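
Until such a mode exists, a minimal wrapper sketch could look like the
following; it assumes failed checks show up in the report output as lines
containing "NOT OK", so please verify that against the tool's actual output
before relying on it:

    #!/bin/bash
    # Hypothetical Nagios/Icinga check wrapping gluster-health-report.
    # Assumption: failed checks are printed as lines containing "NOT OK".
    out=$(gluster-health-report 2>&1)
    failed=$(echo "$out" | grep -c "NOT OK")
    if [ "$failed" -eq 0 ]; then
        echo "OK - all gluster health checks passed"
        exit 0
    else
        echo "CRITICAL - $failed gluster health check(s) failed"
        exit 2
    fi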

Best,
M.

>  Original Message 
> Subject: [Gluster-devel] Gluster Health Report tool
> Local Time: October 25, 2017 2:11 PM
> UTC Time: October 25, 2017 12:11 PM
> From: avish...@redhat.com
> To: Gluster Devel , gluster-users 
> 
>
> Hi,
>
> We started a new project to identify issues/misconfigurations in
> Gluster nodes. This project is very young and not yet ready for
> production use. Feedback on the existing reports and ideas for more
> reports are welcome.
>
> This tool needs to run on every Gluster node to detect the local
> issues (for example: parsing log files, checking disk space etc.) on each
> node. But some of the reports use the Gluster CLI to identify issues,
> and those can be run on any one node (for example
> gluster-health-report --run-only glusterd-peer-disconnect).
>
> Install
>
> sudo pip install gluster-health-report
>
> Usage
>
> Run gluster-health-report --help for help
>
> gluster-health-report
>
> Example output is available here
> https://github.com/aravindavk/gluster-health-report
>
> Project Details
>
> - Issue page: https://github.com/gluster/glusterfs/issues/313
>
> - Project page: https://github.com/aravindavk/gluster-health-report
>
> - Open new issue if you have new report suggestion or found issue with
>   existing report
> https://github.com/aravindavk/gluster-health-report/issues
>
> --
>
> regards
> Aravinda VK
>
> ---
>
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster Health Report tool

2017-10-25 Thread Amar Tumballi
Thanks Aravinda, mainly for making this easy to install with 'pip'. That
will make it easier for people to start using it quickly.

All, this is a step towards reducing the chatter of 'did you check that?'
and 'did you also check this, that and those parameters?'. Please start
giving feedback, and also help us improve it by contributing ideas,
suggestions and code.

Going forward we are planning to make this project a part of the gluster org
on github (https://github.com/gluster/). Soon we would like you to run these
reports once a day to make sure things are all in 'OK' status, mainly to
prevent issues instead of getting into debugging after something happens.

Regards,
Amar

On Wed, Oct 25, 2017 at 2:11 PM, Aravinda  wrote:

> Hi,
>
> We started a new project to identify issues/misconfigurations in
> Gluster nodes. This project is very young and not yet ready for
> production use. Feedback on the existing reports and ideas for more
> reports are welcome.
>
> This tool needs to run on every Gluster node to detect the local
> issues (for example: parsing log files, checking disk space etc.) on each
> node. But some of the reports use the Gluster CLI to identify issues,
> and those can be run on any one node (for example
> `gluster-health-report --run-only glusterd-peer-disconnect`).
>
> # Install
>
> sudo pip install gluster-health-report
>
> # Usage
> Run `gluster-health-report --help` for help
>
> gluster-health-report
>
> Example output is available here https://github.com/aravindavk/
> gluster-health-report
>
> # Project Details
> - Issue page: https://github.com/gluster/glusterfs/issues/313
> - Project page: https://github.com/aravindavk/gluster-health-report
> - Open new issue if you have new report suggestion or found issue with
>   existing report https://github.com/aravindavk/
> gluster-health-report/issues
>
>





> --
>
> regards
> Aravinda VK
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users




-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster Health Report tool

2017-10-25 Thread Jim Kinney
Very nice! Thanks!

On October 25, 2017 8:11:36 AM EDT, Aravinda  wrote:
>Hi,
>
>We started a new project to identify issues/misconfigurations in
>Gluster nodes. This project is very young and not yet ready for
>production use. Feedback on the existing reports and ideas for more
>reports are welcome.
>
>This tool needs to run on every Gluster node to detect the local
>issues (for example: parsing log files, checking disk space etc.) on each
>node. But some of the reports use the Gluster CLI to identify issues,
>and those can be run on any one node (for example
>`gluster-health-report --run-only glusterd-peer-disconnect`).
>
># Install
>
>     sudo pip install gluster-health-report
>
># Usage
>Run `gluster-health-report --help` for help
>
>     gluster-health-report
>
>Example output is available here 
>https://github.com/aravindavk/gluster-health-report
>
># Project Details
>- Issue page: https://github.com/gluster/glusterfs/issues/313
>- Project page: https://github.com/aravindavk/gluster-health-report
>- Open new issue if you have new report suggestion or found issue with
>   existing report 
>https://github.com/aravindavk/gluster-health-report/issues
>
>-- 
>
>regards
>Aravinda VK
>
>___
>Gluster-users mailing list
>Gluster-users@gluster.org
>http://lists.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. All tyopes are thumb related and 
reflect authenticity.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] not healing one file

2017-10-25 Thread Richard Neuboeck
Hi Gluster Gurus,

I'm using a gluster volume as home for our users. The volume is
replica 3, running on CentOS 7, gluster version 3.10
(3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
gluster 3.10 (3.10.6-3.fc26.x86_64).

During the data backup I got an I/O error on one file. Manually
checking for this file on a client confirms this:

ls -l
romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
ls: cannot access
'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4':
Input/output error
total 2015
-rw---. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
-rw---. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
-rw---. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
-?? ? ???? recovery.baklz4

Out of curiosity I checked all the bricks for this file. It's
present there. Making a checksum shows that the file is different on
one of the three replica servers.
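
For reference, the per-brick comparison can be done along these lines on each
of the three servers (path as above; the trusted.afr.* attributes in the
getfattr output, if present, are the change-log markers the self-heal daemon
consults):

    f=/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
    sha256sum "$f"
    getfattr -d -e hex -m . "$f"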

Querying healing information shows that the file should be healed:
# gluster volume heal home info
Brick sphere-six:/srv/gluster_home/brick
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4

Status: Connected
Number of entries: 1

Brick sphere-five:/srv/gluster_home/brick
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4

Status: Connected
Number of entries: 1

Brick sphere-four:/srv/gluster_home/brick
Status: Connected
Number of entries: 0

Manually triggering heal doesn't report an error but also does not
heal the file.
# gluster volume heal home
Launching heal operation to perform index self heal on volume home
has been successful

Same with a full heal
# gluster volume heal home full
Launching heal operation to perform full self heal on volume home
has been successful

According to the split brain query that's not the problem:
# gluster volume heal home info split-brain
Brick sphere-six:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0

Brick sphere-five:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0

Brick sphere-four:/srv/gluster_home/brick
Status: Connected
Number of entries in split-brain: 0


I have no idea why this situation arose in the first place and also
no idea as to how to solve this problem. I would highly appreciate any
helpful feedback I can get.

The only mention in the logs matching this file is a rename operation:
/var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
09:19:11.561661] I [MSGID: 115061]
[server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
RENAME
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4
(48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
(48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0,
error-xlator: home-posix [No data available]

I enabled directory quotas the same day this problem showed up but
I'm not sure how quotas could have an effect like this (maybe unless
the limit is reached but that's also not the case).
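
For completeness, the configured quota limits and current usage can be checked
with the quota CLI:

    gluster volume quota home list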

Thanks again if anyone has an idea.
Cheers
Richard
-- 
/dev/null



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster Health Report tool

2017-10-25 Thread Aravinda

Hi,

We started a new project to identify issues/misconfigurations in
Gluster nodes. This project is very young and not yet ready for
production use. Feedback on the existing reports and ideas for more
reports are welcome.

This tool needs to run on every Gluster node to detect the local
issues (for example: parsing log files, checking disk space etc.) on each
node. But some of the reports use the Gluster CLI to identify issues,
and those can be run on any one node (for example
`gluster-health-report --run-only glusterd-peer-disconnect`).

# Install

    sudo pip install gluster-health-report

# Usage
Run `gluster-health-report --help` for help

    gluster-health-report

Example output is available here 
https://github.com/aravindavk/gluster-health-report


# Project Details
- Issue page: https://github.com/gluster/glusterfs/issues/313
- Project page: https://github.com/aravindavk/gluster-health-report
- Open new issue if you have new report suggestion or found issue with
  existing report 
https://github.com/aravindavk/gluster-health-report/issues


--

regards
Aravinda VK

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-25 Thread Karthik Subrahmanya
Hey Jim,

Please let me know whether what I understood is correct.
You have 14,734 GFIDs in .glusterfs which exist only on the brick that was
up during the failure, and they have no referenced file on that
brick.
The down brick has the file inside the brick path, but not the GFID
hardlink in the .glusterfs folder.
Did I get that right?

Could you also give me the following information?
1. What is the link count for those GFIDs on the up brick?
2. If the link count is 2 or more, do you have a file path for those GFIDs
on the up brick? (Use the find command.)
3. Do you have the GFID hardlink, or do you have the file, on the down brick?

Let me explain how things work.
When a file gets created from the mount, the file is created inside
the brick and a hardlink to that file is created inside the .glusterfs
folder with its GFID.
So the link count for the file will be 2 (unless you create more hardlinks
manually). So after the failure happened, I guess you should have both the
file & hardlink on the up brick,
and when you do a lookup on that file from the mount, it should create the
file & the hardlink on the brick which was down.
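
A sketch of how to check this on the up brick; the brick path and GFID below
are placeholders to replace with your values:

    BRICK=/path/to/brick
    GFID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
    G=$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
    stat -c 'links=%h inode=%i %n' "$G"
    # if the link count is 2 or more, find the regular path sharing the inode:
    find "$BRICK" -samefile "$G" -not -path "*/.glusterfs/*"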

Regards,
Karthik

On Tue, Oct 24, 2017 at 10:29 PM, Jim Kinney  wrote:

> I have 14,734 GFIDS that are different. All the different ones are only on
> the brick that was live during the outage and concurrent file copy-in. The
> brick that was down at that time has no GFIDs that are not also on the up
> brick.
>
> As the bricks are 10TB, the find is going to be a long running process.
> I'm running several finds at once with gnu parallel but it will still take
> some time. Can't bring the up machine offline as it's in use. At least I
> have 24 cores to work with.
>
> I've only tested with one GFID but the file it referenced _IS_ on the down
> machine even though it has no GFID in the .glusterfs structure.
>
> On Tue, 2017-10-24 at 12:35 +0530, Karthik Subrahmanya wrote:
>
> Hi Jim,
>
> Can you check whether the same hardlinks are present on both the bricks &
> both of them have the link count 2?
> If the link count is 2 then "find <brick-path> -samefile
> <brick-path>/.glusterfs/<first two chars of gfid>/<next two chars of gfid>/<gfid>"
> should give you the file path.
>
> Regards,
> Karthik
>
> On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney  wrote:
>
> I'm not so lucky. ALL of mine show 2 links and none have the attr data
> that supplies the path to the original.
>
> I have the inode from stat. Looking now to dig out the path/filename from
> xfs_db on the specific inodes individually.
>
> Is the hash of the filename or /filename and if so relative to
> where? /, , ?
>
> On Mon, 2017-10-23 at 18:54 +, Matt Waymack wrote:
>
> In my case I was able to delete the hard links in the .glusterfs folders
> of the bricks and it seems to have done the trick, thanks!
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Monday, October 23, 2017 1:52 AM
> *To:* Jim Kinney ; Matt Waymack 
> *Cc:* gluster-users 
> *Subject:* Re: [Gluster-users] gfid entries in volume heal info that do
> not heal
>
>
>
> Hi Jim & Matt,
>
> Can you also check the link count in the stat output of those hardlink
> entries in the .glusterfs folder on the bricks?
> If the link count is 1 on all the bricks for those entries, then they are
> orphaned entries and you can delete those hardlinks.
>
> To be on the safer side have a backup before deleting any of the entries.
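>
> A sketch for listing such candidates, restricted to the GFID subdirectories
> so that files under .glusterfs/indices are left alone (the brick path is a
> placeholder):
>
>     find /path/to/brick/.glusterfs -path '*/.glusterfs/[0-9a-f][0-9a-f]/[0-9a-f][0-9a-f]/*' -type f -links 1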
>
> Regards,
>
> Karthik
>
>
>
> On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney  wrote:
>
> I've been following this particular thread as I have a similar issue
> (RAID6 array failed out with 3 dead drives at once while a 12 TB load was
> being copied into one mounted space - what a mess)
>
>
>
> I have >700K GFID entries that have no path data:
>
> Example:
>
> getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7
> -401b-84b5-ff2a51c10421
>
> # file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6
> c6162656c65645f743a733000
>
> trusted.bit-rot.version=0x020059b1b316000270e7
>
> trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421
>
>
>
> [root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e hex
> -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
>
> .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421:
> trusted.glusterfs.pathinfo: No such attribute
>
>
>
> I had to totally rebuild the dead RAID array and did a copy from the live
> one before activating gluster on the rebuilt system. I accidentally copied
> over the .glusterfs folder from the working side
>
> (replica 2 only for now - adding arbiter node as soon as I can get this
> one cleaned up).
>
>
>
> I've run the methods from "http://docs.gluster.org/en/la
> test/Troubleshooting/gfid-to-path/" with no results using random GFIDs. A
> full systemic run using the script from method 3 crashes with "too many
> nested