[Gluster-users] dealing with gluster outages due to disk timeouts

2016-11-23 Thread Christian Rice
This is a long-standing problem for me, and I’m wondering how to insulate 
myself from it…pardon the long-windedness in advance.

I use gluster internationally as regional repositories of files, and it’s 
pretty constantly being rsync’d to (i.e., written to solely by rsync, optimized 
with --inplace or similar).
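To give a concrete picture, the writers look roughly like this (the source
path, destination host, and mount point here are made up; the --inplace part
is what matters):

  rsync -a --inplace --delete \
      /srv/source/repo/  region-host:/mnt/gluster/repo/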

These regional repositories are also being read from, each to the tune of 
10-50MB/s.  Each gluster pool is anywhere from 4 to 16 servers, each with 
one brick of RAID6, and all pools are in a distributed-only config.  I’m not 
currently using distributed-replicated, but even that configuration is not 
immune to my problem.

So, here’s the problem:

If one disk on one gluster brick experiences timeouts, all the gluster clients 
block.  This is likely because the rate at which the disks are being exercised 
by rsyncs (writes and stats) plus reads (client file access) causes an 
overwhelming backlog of gluster ops; something is presumably bottlenecked and 
locking up, and in that state the volume is fairly useless to me.  Running a 
‘df’ hangs completely.

This has been an issue for me for years.  My usual procedure is to manually 
fail the disk that’s experiencing timeouts, if it hasn’t already been ejected 
by the RAID controller, and to remove the load from the gluster file system; it 
only takes a fraction of a minute before the gluster volume recovers and I can 
add the load back.  Rebuilding parity on the brick’s RAID is not the problem; 
it’s the moments before the disk ultimately fails, when requests back up, that 
really cause trouble.

I’m looking for advice as to how to better insulate myself from this problem.  
My RAID cards don’t support modifying disk timeouts to be incredibly short.  I 
can see the disk timeout messages from the RAID card, and could write an omprog 
hook to fail the disk (roughly as sketched below), but that’s kinda brutal.  
Maybe I could get a different RAID card that supports shorter timeouts or fast 
disk failure, but if anyone has experience with, say, md RAID1 not having this 
problem, or something similar, it might be worth the expense to go that route.
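For reference, the omprog idea would be something along these lines in rsyslog
(the match string and the script path are placeholders, not something I
actually run today):

  module(load="omprog")
  if $msg contains "command timeout" then {
      # hand the matching RAID-controller message to a script that fails the disk
      action(type="omprog" binary="/usr/local/sbin/fail-disk.sh")
  }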

If my memory is correct, gluster still has this problem in a 
distributed-replicated configuration, because writes need to succeed on both 
legs of the replica before an operation is considered complete, so a timeout on 
one node is still detrimental.

Insight, experience designing around this, tunables I haven’t considered: I’ll 
take anything.  I really like gluster and I’ll keep using it, but this is its 
Achilles’ heel for me.  Is there a magic bullet?  Or do I just need to fail 
faster?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Why not all of my NFS servers are online?

2016-11-23 Thread Atin Mukherjee
You'd need to check nfs.log & the glusterd log file to see why the NFS service
didn't come up on those two nodes. Have you ensured that kernel NFS is
disabled on both of those nodes? Not having the NFS service up on all the
nodes is certainly not ideal.
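Something like this on each of the two affected nodes should confirm things
(the service name and log locations may vary slightly by distro):

  # kernel NFS must not be running, since gluster's NFS server needs the ports
  systemctl status nfs-server
  systemctl stop nfs-server && systemctl disable nfs-server
  # then check why the gluster NFS process didn't start
  less /var/log/glusterfs/nfs.log
  less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log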

On Wed, 23 Nov 2016 at 17:25, Jin Li  wrote:

> Hi all,
>
> I have the following volume and all bricks are connected fine, but why
> are not all of the NFS servers online? (Two of them are offline.) Will
> the offline NFS servers cause trouble? How can I make sure they are
> online? Thanks.
>
> Please find my volume status.
>
>
> $ sudo gluster volume status gvd
> Status of volume: gvd
> Gluster process                                           Port   Online  Pid
> ------------------------------------------------------------------------------
> Brick rigel:/mnt/raid6/glusterfs_distributed_export       49155  Y       5553
> Brick betelgeuse:/mnt/raid6/glusterfs_distributed_export  49154  Y       5223
> Brick polaris:/mnt/raid6/glusterfs_distributed_export     49157  Y       5196
> Brick capella:/mnt/raid6/glusterfs_distributed_export     49157  Y       4883
> Brick eridani:/mnt/raid6/glusterfs_distributed_export     49152  Y       3858
> Brick sargas:/mnt/raid6/glusterfs_distributed_export      49152  Y       3886
> Brick toliman:/mnt/raid6/glusterfs_distributed_export     49207  Y       3881
> Brick alphard:/mnt/raid6/glusterfs_distributed_export     49264  Y       3908
> NFS Server on localhost                                   2049   Y       4898
> NFS Server on eridani                                     2049   Y       3870
> NFS Server on alphard                                     2049   Y       3920
> NFS Server on sargas                                      2049   Y       3898
> NFS Server on toliman                                     2049   Y       3893
> NFS Server on 172.17.1.1                                  N/A    N       N/A
> NFS Server on betelgeuse                                  2049   Y       5235
> NFS Server on polaris                                     N/A    N       N/A
>
>
> Best regards,
> Jin
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
-- 
- Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Announcing Gluster 3.9

2016-11-23 Thread Atin Mukherjee
Go for 3.8, as that's going to be maintained for a while and is an LTM
(long-term maintenance) release. 3.9's life cycle is short.
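For example, on CentOS something along these lines should get you the 3.8
series (assuming the Storage SIG packaging; adjust for your distro):

  yum install centos-release-gluster38
  yum install glusterfs-server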

On Wed, 23 Nov 2016 at 22:32, Dj Merrill  wrote:

> On 11/23/2016 8:23 AM, Amye Scavarda wrote:
> > Gluster
> > versions 3.9, 3.8 and 3.7 are all actively maintained.
>
>
> This might be a bit of a silly question, but how would one know which
> version of Gluster to use?
>
> If you wanted to use Gluster as a scratch space for an HPC cluster, and
> needed a solid, stable setup for a production environment, what would be
> the recommended version to install?
>
> Thanks,
>
> -Dj
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
-- 
- Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Announcing Gluster 3.9

2016-11-23 Thread Dj Merrill

On 11/23/2016 8:23 AM, Amye Scavarda wrote:

Gluster
versions 3.9, 3.8 and 3.7 are all actively maintained.



This might be a bit of a silly question, but how would one know which 
version of Gluster to use?


If you wanted to use Gluster as a scratch space for an HPC cluster, and 
needed a solid, stable setup for a production environment, what would be 
the recommended version to install?


Thanks,

-Dj

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Announcing Gluster 3.9

2016-11-23 Thread Amye Scavarda
The Gluster community is pleased to announce the release of Gluster 3.9.

This is a major release that includes a number of changes. Many
improvements contribute to better support for Gluster with containers
and for running your storage on the same servers as your hypervisors.
Additionally, we've focused on integrating with other projects in the
open source ecosystem. This release marks the end of maintenance
releases for Gluster 3.6.  Moving forward, Gluster versions 3.9, 3.8
and 3.7 are all actively maintained.

Our full release notes:
http://gluster.readthedocs.io/en/latest/release-notes/3.9.0/

Upgrade Guide is available here:
http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.9/


-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Fwd: Weekly Community Meeting - 2016-11-23

2016-11-23 Thread Kaushal M
(Forgot the gluster-users list)


-- Forwarded message --
From: Kaushal M 
Date: Wed, Nov 23, 2016 at 6:50 PM
Subject: Re: Weekly Community Meeting - 2016-11-23
To: Gluster Devel 


On Tue, Nov 22, 2016 at 6:26 PM, Kaushal M  wrote:
> Hi All,
>
> This is a reminder to add your status updates and topics to the
> meeting agenda at [1].
> Ensure you do this before the meeting tomorrow.
>
> Thanks,
> Kaushal
>
> [1]: https://bit.ly/gluster-community-meetings

Thank you everyone who attended today's meeting.

Four topics were discussed today, the major one among them being the
release of 3.9 and the beginning of the 3.10 cycle. More information can
be found in the meeting minutes and logs at [1][2][3][4].

The agenda for next week's meeting is available at [5]. Please add your
updates and topics for discussion to the agenda. Everyone is welcome
to add their own topics.

I'll be hosting next week's meeting, at the same time and place.

Thanks.
~kaushal

[1] https://github.com/gluster/glusterfs/wiki/Community-Meeting-2016-11-23
[2] Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-11-23/weekly_community_meeting_2016-11-23.2016-11-23-12.01.html
[3] Minutes (text):
https://meetbot.fedoraproject.org/gluster-meeting/2016-11-23/weekly_community_meeting_2016-11-23.2016-11-23-12.01.txt
[4] Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-11-23/weekly_community_meeting_2016-11-23.2016-11-23-12.01.log.html
[5] https://bit.ly/gluster-community-meetings
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Files won't heal, although no obvious problem visible

2016-11-23 Thread Pavel Cernohorsky
I'm afraid I do not know how we got into this strange state; I do not know 
Gluster in enough detail. When does the trusted.afr.dirty flag get set, 
and when does the trusted.afr.xxx-client-xxx flag get set? From what you 
are saying, it seems to me that you always expect them to be set / 
cleared at the same moment.


If it helps you, at the end of my message you can find the full volume 
configuration.


Can I help you further in actually discovering what happened / 
fixing the problem?


Thanks for your help, kind regards,

Pavel


Volume Name: hot
Type: Distributed-Replicate
Volume ID: 4d09dd56-97b6-4b63-8765-0a08574e8ddd
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (2 + 1) = 36
Transport-type: tcp
Bricks:
Brick1: 10.10.27.10:/opt/data/hdd1/gluster
Brick2: 10.10.27.12:/opt/data/hdd1/gluster
Brick3: 10.10.27.11:/opt/data/ssd/arbiter1 (arbiter)
... similar triplets here ...
Brick34: 10.10.27.12:/opt/data/hdd8/gluster
Brick35: 10.10.27.11:/opt/data/hdd8/gluster
Brick36: 10.10.27.10:/opt/data/ssd/arbiter12 (arbiter)
Options Reconfigured:
performance.flush-behind: off
performance.write-behind: off
performance.open-behind: off
performance.nfs.write-behind: off
cluster.background-self-heal-count: 1
performance.io-cache: off
network.ping-timeout: 1
network.inode-lru-limit: 1024
performance.nfs.flush-behind: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.self-heal-daemon: off


On 11/23/2016 01:22 PM, Ravishankar N wrote:

On 11/23/2016 04:56 PM, Pavel Cernohorsky wrote:

Hello, thanks for your reply, answers are in the text.

On 11/23/2016 11:55 AM, Ravishankar N wrote:

On 11/23/2016 03:56 PM, Pavel Cernohorsky wrote:
The "hot-client-21" is, based on the vol-file, the following of the 
bricks:

option remote-subvolume /opt/data/hdd5/gluster
option remote-host 10.10.27.11

I have self healing daemon disabled, but when I try to trigger 
healing manually (gluster volume heal ), I get: "Launching 
heal operation to perform index self heal on volume  has 
been unsuccessful on bricks that are down. Please check if all 
brick processes are running.", although all the bricks are online 
(gluster volume status ).


Can you enable the self-heal daemon and try again? `gluster 
volume heal <volname>` requires the shd to be enabled. The error 
message that you get is misleading and is being fixed.


When I enabled the self-heal daemon, I was able to start healing, and 
the files were actually healed. What does the self-heal daemon do in 
addition to the automated healing that happens when you read the file?



The lookup/read code-path doesn't seem to consider a file with only 
the afr.dirty xattr being non-zero as a candidate for heal (while the 
self-heal-daemon code-path does). I'm not sure at this point whether it 
should, because afr.dirty being set on all bricks without any 
trusted.afr.xxx-client-xxx being set doesn't seem to be something that 
should be hit under normal circumstances. I'll need to think about 
this more.




The original reason to disable the self-heal daemon was to be able to 
control the amount of resources used by healing, because 
"cluster.background-self-heal-count: 1" did not help very much and 
the amount of both network and disk IO consumed was just extreme.


And I am also pretty sure we had seen a similar problem (not sure 
about the attributes) before we disabled the shd.






When I try to just md5sum the file, to trigger automated healing on 
file manipulation, I get the result, but the file is not healed 
anyway. This usually works when I do not get 3 entries for the same 
file in the heal info.


Is the file size for 99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4 
non-zero on the 2 data bricks (i.e. on 10.10.27.11 and 10.10.27.10) 
and do they match?
Do the md5sums match with what you got on the mount when you 
calculate it directly on these bricks?


The file has a non-zero size on both the data bricks, and the md5 sum 
was the same on both of them before they were healed; after the healing 
(enabling the shd and starting the heal) the md5 did not change on 
either of the data bricks. The mount point reports the same md5 as all 
the other attempts directly on the bricks. So what is actually 
happening there? Why was the file blamed (and not unblamed after healing)?


That means there was no real heal pending. But because the dirty xattr 
was set, the shd picked a brick as the source and did the heal 
anyway. We would need to find out how we ended up in the 'only afr.dirty 
xattr was set' state for the file.


-Ravi


Thanks for your answers,
Pavel





___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Files won't heal, although no obvious problem visible

2016-11-23 Thread Ravishankar N

On 11/23/2016 04:56 PM, Pavel Cernohorsky wrote:

Hello, thanks for your reply, answers are in the text.

On 11/23/2016 11:55 AM, Ravishankar N wrote:

On 11/23/2016 03:56 PM, Pavel Cernohorsky wrote:
The "hot-client-21" is, based on the vol-file, the following of the 
bricks:

option remote-subvolume /opt/data/hdd5/gluster
option remote-host 10.10.27.11

I have self healing daemon disabled, but when I try to trigger 
healing manually (gluster volume heal ), I get: "Launching 
heal operation to perform index self heal on volume  has 
been unsuccessful on bricks that are down. Please check if all brick 
processes are running.", although all the bricks are online (gluster 
volume status ).


Can you enable the self-heal daemon and try again? `gluster volume 
heal <volname>` requires the shd to be enabled. The error message 
that you get is misleading and is being fixed.


When I enabled the self-heal daemon, I was able to start healing, and 
the files were actually healed. What does the self-heal daemon do in 
addition to the automated healing that happens when you read the file?



The lookup/read code-path doesn't seem to consider a file with only 
the afr.dirty xattr being non-zero as a candidate for heal (while the 
self-heal-daemon code-path does). I'm not sure at this point whether it 
should, because afr.dirty being set on all bricks without any 
trusted.afr.xxx-client-xxx being set doesn't seem to be something that 
should be hit under normal circumstances. I'll need to think about this 
more.




The original reason to disable the self-heal daemon was to be able to 
control the amount of resources used by healing, because 
"cluster.background-self-heal-count: 1" did not help very much and the 
amount of both network and disk IO consumed was just extreme.


And I am also pretty sure we had seen a similar problem (not sure about 
the attributes) before we disabled the shd.






When I try to just md5sum the file, to trigger automated healing on 
file manipulation, I get the result, but the file is not healed 
anyway. This usually works when I do not get 3 entries for the same 
file in the heal info.


Is the file size for 99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4 
non-zero on the 2 data bricks (i.e. on 10.10.27.11 and 10.10.27.10) 
and do they match?
Do the md5sums match with what you got on the mount when you 
calculate it directly on these bricks?


The file has a non-zero size on both the data bricks, and the md5 sum 
was the same on both of them before they were healed; after the 
healing (enabling the shd and starting the heal) the md5 did not change on 
either of the data bricks. The mount point reports the same md5 as all the 
other attempts directly on the bricks. So what is actually happening 
there? Why was the file blamed (and not unblamed after healing)?


That means there was no real heal pending. But because the dirty xattr 
was set, the shd picked a brick as the source and did the heal anyway. 
We would need to find out how we ended up in the 'only afr.dirty xattr was 
set' state for the file.


-Ravi


Thanks for your answers,
Pavel



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Why not all of my NFS servers are online?

2016-11-23 Thread Jin Li
Hi all,

I have the following volume and all bricks are connected fine, but why
are not all of the NFS servers online? (Two of them are offline.) Will
the offline NFS servers cause trouble? How can I make sure they are
online? Thanks.

Please find my volume status.


$ sudo gluster volume status gvd
Status of volume: gvd
Gluster process                                           Port   Online  Pid
------------------------------------------------------------------------------
Brick rigel:/mnt/raid6/glusterfs_distributed_export       49155  Y       5553
Brick betelgeuse:/mnt/raid6/glusterfs_distributed_export  49154  Y       5223
Brick polaris:/mnt/raid6/glusterfs_distributed_export     49157  Y       5196
Brick capella:/mnt/raid6/glusterfs_distributed_export     49157  Y       4883
Brick eridani:/mnt/raid6/glusterfs_distributed_export     49152  Y       3858
Brick sargas:/mnt/raid6/glusterfs_distributed_export      49152  Y       3886
Brick toliman:/mnt/raid6/glusterfs_distributed_export     49207  Y       3881
Brick alphard:/mnt/raid6/glusterfs_distributed_export     49264  Y       3908
NFS Server on localhost                                   2049   Y       4898
NFS Server on eridani                                     2049   Y       3870
NFS Server on alphard                                     2049   Y       3920
NFS Server on sargas                                      2049   Y       3898
NFS Server on toliman                                     2049   Y       3893
NFS Server on 172.17.1.1                                  N/A    N       N/A
NFS Server on betelgeuse                                  2049   Y       5235
NFS Server on polaris                                     N/A    N       N/A


Best regards,
Jin
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Files won't heal, although no obvious problem visible

2016-11-23 Thread Pavel Cernohorsky

Hello, thanks for your reply, answers are in the text.

On 11/23/2016 11:55 AM, Ravishankar N wrote:

On 11/23/2016 03:56 PM, Pavel Cernohorsky wrote:
The "hot-client-21" is, based on the vol-file, the following of the 
bricks:

option remote-subvolume /opt/data/hdd5/gluster
option remote-host 10.10.27.11

I have self healing daemon disabled, but when I try to trigger 
healing manually (gluster volume heal ), I get: "Launching 
heal operation to perform index self heal on volume  has 
been unsuccessful on bricks that are down. Please check if all brick 
processes are running.", although all the bricks are online (gluster 
volume status ).


Can you enable the self-heal daemon and try again? `gluster volume 
heal <volname>` requires the shd to be enabled. The error message that 
you get is misleading and is being fixed.


When I enabled the self-heal daemon, I was able to start healing, and 
the files were actually healed. What does the self-heal daemon do in 
addition to the automated healing that happens when you read the file?


The original reason to disable the self-heal daemon was to be able to 
control the amount of resources used by healing, because 
"cluster.background-self-heal-count: 1" did not help very much and the 
amount of both network and disk IO consumed was just extreme.


And I am also pretty sure we had seen a similar problem (not sure about 
the attributes) before we disabled the shd.






When I try to just md5sum the file, to trigger automated healing on 
file manipulation, I get the result, but the file is not healed 
anyway. This usually works when I do not get 3 entries for the same 
file in the heal info.


Is the file size for 99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4 
non-zero on the 2 data bricks (i.e. on 10.10.27.11 and 10.10.27.10) 
and do they match?
Do the md5sums match with what you got on the mount when you calculate 
it directly on these bricks?


The file has a non-zero size on both the data bricks, and the md5 sum was 
the same on both of them before they were healed; after the healing 
(enabling the shd and starting the heal) the md5 did not change on either of 
the data bricks. The mount point reports the same md5 as all the other 
attempts directly on the bricks. So what is actually happening there? 
Why was the file blamed (and not unblamed after healing)?


Thanks for your answers,
Pavel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Files won't heal, although no obvious problem visible

2016-11-23 Thread Ravishankar N

On 11/23/2016 03:56 PM, Pavel Cernohorsky wrote:
The "hot-client-21" is, based on the vol-file, the following of the 
bricks:

option remote-subvolume /opt/data/hdd5/gluster
option remote-host 10.10.27.11

I have self healing daemon disabled, but when I try to trigger healing 
manually (gluster volume heal ), I get: "Launching heal 
operation to perform index self heal on volume  has been 
unsuccessful on bricks that are down. Please check if all brick 
processes are running.", although all the bricks are online (gluster 
volume status ).


Can you enable the self-heal daemon and try again? `gluster volume 
heal <volname>` requires the shd to be enabled. The error message that 
you get is misleading and is being fixed.
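Concretely, something like this (substitute your volume name):

  gluster volume set <volname> cluster.self-heal-daemon on
  gluster volume heal <volname>
  gluster volume heal <volname> info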




When I try to just md5sum the file, to trigger automated healing on 
file manipulation, I get the result, but the file is not healed 
anyway. This usually works when I do not get 3 entries for the same 
file in the heal info.


Is the file size for 99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4 non-zero 
on the 2 data bricks (i.e. on 10.10.27.11 and 10.10.27.10) and do they 
match?
Do the md5sums match with what you got on the mount when you calculate 
it directly on these bricks?
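For example, on each data brick, using the brick paths from your heal info
(the fuse mount path at the end is just an assumption, adjust to yours):

  # on 10.10.27.11
  stat -c %s /opt/data/hdd5/gluster/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
  md5sum /opt/data/hdd5/gluster/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
  # on 10.10.27.10
  stat -c %s /opt/data/hdd6/gluster/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
  md5sum /opt/data/hdd6/gluster/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
  # and on the fuse mount for comparison
  md5sum /mnt/hot/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4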


Thanks,
Ravi



Any clues? What am I doing wrong?

Kind regards,
Pavel 



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Files won't heal, although no obvious problem visible

2016-11-23 Thread Pavel Cernohorsky
Hello, I have Gluster 3.8.5-1.fc24 with a replica 3 arbiter 1 volume, 
where gluster volume heal <volname> info reports (simplified):


Brick 10.10.27.11:/opt/data/hdd5/gluster
/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
Status: Connected

Brick 10.10.27.10:/opt/data/hdd6/gluster
/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
Status: Connected

Brick 10.10.27.12:/opt/data/ssd/arbiter8
/assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
Status: Connected

Extended attributes of those files on the bricks are (in the same order 
of bricks):


10.10.27.11:
getfattr -m . -d -e hex 
assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4

# file: assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x0001
trusted.bit-rot.version=0x0200583561b500050a4e
trusted.gfid=0x5d2793f9b2a74514937ceb1a3bca3e1f

10.10.27.10:
getfattr -m . -d -e hex 
assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4

# file: assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x0001
trusted.afr.hot-client-21=0x
trusted.bit-rot.version=0x0200583558e2000b1457
trusted.gfid=0x5d2793f9b2a74514937ceb1a3bca3e1f

10.10.27.12:
getfattr -m . -d -e hex 
assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4

# file: assets/1/286381384/99705_544c0cd369a84ebcaf095b4a9f6d682a.mp4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x0001
trusted.afr.hot-client-21=0x
trusted.bit-rot.version=0x0200583440580005944b
trusted.gfid=0x5d2793f9b2a74514937ceb1a3bca3e1f

The "hot-client-21" is, based on the vol-file, the following of the bricks:
option remote-subvolume /opt/data/hdd5/gluster
option remote-host 10.10.27.11

I have self healing daemon disabled, but when I try to trigger healing 
manually (gluster volume heal ), I get: "Launching heal 
operation to perform index self heal on volume  has been 
unsuccessful on bricks that are down. Please check if all brick 
processes are running.", although all the bricks are online (gluster 
volume status ).


When I try to just md5sum the file, to trigger automated healing on file 
manipulation, I get the result, but the file is not healed anyway. This 
usually works when I do not get 3 entries for the same file in the heal 
info.


Any clues? What am I doing wrong?

Kind regards,
Pavel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] question about "All subvolumes are down"

2016-11-23 Thread songxin
Hi everyone,


I created a replicate volume using two nodes, an A board and a B board.
A board ip:10.32.1.144
B board ip:10.32.0.48


One brick and the mount point are on the A board.
The other brick is on the B board.




I found that I can't access the mount point because a disconnection happened 
between the client and both servers.
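If it helps, these are the kinds of checks I can run from both boards (the
volume name here is inferred from the client translator names in the log
below, so it may not be exact):

gluster peer status
gluster volume status c_glusterfs
gluster volume info c_glusterfs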


The client log is below. (Please notice the red line.)


[2016-10-31 04:04:08.063543] I [MSGID: 114046] 
[client-handshake.c:1213:client_setvolume_cbk] 0-c_glusterfs-client-9: 
Connected to c_glusterfs-client-9, attached to remote volume 
'/opt/lvmdir/c2/brick'.


...
[2016-10-31 04:06:03.626047] W [socket.c:588:__socket_rwv] 
0-c_glusterfs-client-9: readv on 10.32.1.144:49391 failed (Connection reset by 
peer)


[2016-10-31 04:06:03.627345] E [rpc-clnt.c:362:saved_frames_unwind]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58]
 --> /usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90]
 --> /usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10]
 --> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18]
 --> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] )
0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-10-31 04:06:03.626033 (xid=0x7f5e)






[2016-10-31 04:06:03.627395] E [MSGID: 114031] 
[client-rpc-fops.c:1673:client3_3_finodelk_cbk] 0-c_glusterfs-client-9: remote 
operation failed [Transport endpoint is not connected]
[2016-10-31 04:06:03.628381] I [socket.c:3308:socket_submit_request] 
0-c_glusterfs-client-9: not connected (priv->connected = 0)


[2016-10-31 04:06:03.628432] W [rpc-clnt.c:1586:rpc_clnt_submit] 
0-c_glusterfs-client-9: failed to submit rpc-request (XID: 0x7f5f Program: 
GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (c_glusterfs-client-9)


[2016-10-31 04:06:03.628466] E [MSGID: 114031] 
[client-rpc-fops.c:1673:client3_3_finodelk_cbk] 0-c_glusterfs-client-9: remote 
operation failed [Transport endpoint is not connected]


[2016-10-31 04:06:03.628475] I [MSGID: 108019] 
[afr-lk-common.c:1086:afr_lock_blocking] 0-c_glusterfs-replicate-0: unable to 
lock on even one child
[2016-10-31 04:06:03.628539] I [MSGID: 108019] 
[afr-transaction.c:1224:afr_post_blocking_inodelk_cbk] 
0-c_glusterfs-replicate-0: Blocking inodelks failed.


[2016-10-31 04:06:03.628630] W [fuse-bridge.c:1282:fuse_err_cbk] 
0-glusterfs-fuse: 20790: FLUSH() ERR => -1 (Transport endpoint is not connected)


[2016-10-31 04:06:03.629149] E [rpc-clnt.c:362:saved_frames_unwind]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58]
 --> /usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90]
 --> /usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10]
 --> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18]
 --> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] )
0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-10-31 04:06:03.624346 (xid=0x7f5a)


[2016-10-31 04:06:03.629183] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 
0-c_glusterfs-client-9: changing port to 49391 (from 0)


[2016-10-31 04:06:03.629210] W [MSGID: 114031] 
[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-c_glusterfs-client-9: remote 
operation failed. Path: /loadmodules_norepl/CXC1725605_P93A001/cello/emasviews 
(b0e5a94e-a432-4dce-b86f-a551555780a2) [Transport endpoint is not connected]


[2016-10-31 04:06:03.629266] I [socket.c:3308:socket_submit_request] 
0-c_glusterfs-client-9: not connected (priv->connected = 255)


[2016-10-31 04:06:03.629277] I [MSGID: 109063] 
[dht-layout.c:702:dht_layout_normalize] 0-c_glusterfs-dht: Found anomalies in 
/loadmodules_norepl/CXC1725605_P93A001/cello/emasviews (gfid = 
b0e5a94e-a432-4dce-b86f-a551555780a2). Holes=1 overlaps=0
[2016-10-31 04:06:03.629293] W [rpc-clnt.c:1586:rpc_clnt_submit] 
0-c_glusterfs-client-9: failed to submit rpc-request (XID: 0x7f62 Program: 
GlusterFS 3.3, ProgVers: 330, Proc: 41) to rpc-transport (c_glusterfs-client-9)
[2016-10-31 04:06:03.629333] W [fuse