Based on what I know of the workflow, there is no update. There is no
bug report in Bugzilla, so there are no patches in review for it.
On 03/27/2017 10:59 AM, Mahdi Adnan wrote:
Hi,
Do you guys have any update regarding this issue?
--
Respectfully
*Mahdi A. Mahdi*
------------------------------------------------------------------------
*From:* Krutika Dhananjay <kdhan...@redhat.com>
*Sent:* Tuesday, March 21, 2017 3:02:55 PM
*To:* Mahdi Adnan
*Cc:* Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
gluster-users@gluster.org List
*Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Hi,
So it looks like Satheesaran managed to recreate this issue. We will
be seeking his help in debugging this. It will be easier that way.
-Krutika
On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:
Hello and thank you for your email.
Actually no, I didn't check the GFIDs of the VMs.
If this will help, i can setup a new test cluster and get all the
data you need.
From: Nithya Balachandran
Sent: Monday, March 20, 20:57
Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
To: Krutika Dhananjay
Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
gluster-users@gluster.org List
Hi,
Do you know the GFIDs of the VM images which were corrupted?
Regards,
Nithya
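For reference, a file's GFID can be read from its extended attributes; a sketch with hypothetical paths (adjust to your brick and mount locations):

```shell
# On a brick (server side), the GFID is stored as a raw xattr:
getfattr -n trusted.gfid -e hex /mnt/disk1/vmware2/path/to/vm-disk.img

# On a FUSE mount (client side), Gluster exposes it as a virtual
# xattr in string form:
getfattr -n glusterfs.gfid.string /mnt/vmware2-fuse/path/to/vm-disk.img
```

Both commands require a live Gluster setup, so treat this as an illustration of where to look rather than a drop-in recipe.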
On 20 March 2017 at 20:37, Krutika Dhananjay <kdhan...@redhat.com> wrote:
I looked at the logs.
From the time the new graph is switched to (after the add-brick
command you shared, where bricks 41 through 44 are added; line
3011 onwards in nfs-gfapi.log), I see the following kinds of errors:
1. Lookups to a bunch of files failed with ENOENT on both
replicas, which protocol/client converts to ESTALE. I am guessing
these entries got migrated to other subvolumes, leading to
'No such file or directory' errors. DHT, and thereafter shard,
get the same error code and log the following:
[2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
[2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
which is fine.
2. The other kind are from AFR logging of possible split-brain,
which I suppose is harmless too:
[2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
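As an aside, the GFIDs of the affected files can be pulled out of the gfapi log with a quick grep; a sketch, using the two sample lines from this thread in place of the real log file (the actual log path on your system will differ):

```shell
# Write the two sample log lines to a stand-in file for the real gfapi log.
cat > sample-gfapi.log <<'EOF'
[2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
[2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
EOF

# Keep only E/W lines, then extract anything shaped like a GFID (UUID),
# de-duplicated:
grep -E '\] (E|W) \[' sample-gfapi.log \
  | grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
  | sort -u
```

The unique GFID list makes it easy to map the errors back to specific VM images on the bricks.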
Since you say the bug is hit only on VMs that were undergoing IO
while the rebalance was running (as opposed to those that remained
powered off), the combination of rebalance + IO could be causing
some issues.
CC'ing DHT devs
Raghavendra/Nithya/Susant,
Could you take a look?
-Krutika
On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:
Thank you for your email, mate.
Yes, I'm aware of this, but to save costs I chose replica 2; this
cluster is all-flash.
In version 3.7.x I had issues with the ping timeout: if one host
went down for a few seconds, the whole cluster hung and became
unavailable, so to avoid this I adjusted the ping timeout to 5 seconds.
As for choosing Ganesha over gfapi: VMware does not support
Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume.
The other volume is mounted using gfapi in oVirt cluster.
--
Respectfully
*Mahdi A. Mahdi*
*From:* Krutika Dhananjay <kdhan...@redhat.com>
*Sent:* Sunday, March 19, 2017 2:01:49 PM
*To:* Mahdi Adnan
*Cc:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs
corruption
While I'm still going through the logs, just wanted to point out
a couple of things:
1. It is recommended that you use 3-way replication (replica
count 3) for the VM store use case.
2. network.ping-timeout at 5 seconds is way too low. Please
change it to 30.
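For reference, the timeout change is a single volume-set call (volume name taken from this thread); a sketch, to be run on any node in the trusted pool:

```shell
# Raise network.ping-timeout from 5 seconds to the recommended 30:
gluster volume set vmware2 network.ping-timeout 30

# Confirm the new value took effect:
gluster volume get vmware2 network.ping-timeout
```

The low 5-second value means a brick is declared dead after a very short network blip, which is risky while a rebalance is moving data around.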
Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
Will get back with anything else I might find or more questions
if I have any.
-Krutika
On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:
Thanks mate,
Kindly check the attachment.
--
Respectfully
*Mahdi A. Mahdi*
*From:* Krutika Dhananjay <kdhan...@redhat.com>
*Sent:* Sunday, March 19, 2017 10:00:22 AM
*To:* Mahdi Adnan
*Cc:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs
corruption
In that case could you share the ganesha-gfapi logs?
-Krutika
On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:
I have two volumes: one is mounted using libgfapi for the oVirt
mount, and the other is exported via NFS-Ganesha for VMware,
which is the one I'm testing now.
--
Respectfully
*Mahdi A. Mahdi*
*From:* Krutika Dhananjay <kdhan...@redhat.com>
*Sent:* Sunday, March 19, 2017 8:02:19 AM
*To:* Mahdi Adnan
*Cc:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs
corruption
On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:
Kindly check the attached new log file; I don't know if it's
helpful or not, but I couldn't find the log with the name you
just described.
No. Are you using FUSE or libgfapi for accessing the volume?
Or is it NFS?
-Krutika
--
Respectfully
*Mahdi A. Mahdi*
*From:* Krutika Dhananjay <kdhan...@redhat.com>
*Sent:* Saturday, March 18, 2017 6:10:40 PM
*To:* Mahdi Adnan
*Cc:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs
corruption
mnt-disk11-vmware2.log seems like a brick log. Could you
attach the fuse mount logs? It should be right under the
/var/log/glusterfs/ directory,
named after the mount point path, only hyphenated.
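A small sketch of that naming rule, using a hypothetical mount point:

```shell
# FUSE mount logs live under /var/log/glusterfs/, named after the
# mount point with the leading slash dropped and the remaining
# slashes turned into hyphens. "/mnt/vmware2" is a hypothetical
# mount point used only for illustration.
mnt="/mnt/vmware2"
log="/var/log/glusterfs/$(printf '%s' "$mnt" | sed 's|^/||; s|/|-|g').log"
echo "$log"   # /var/log/glusterfs/mnt-vmware2.log
```

So a volume mounted at /mnt/vmware2 logs to mnt-vmware2.log, whereas mnt-disk11-vmware2.log corresponds to the brick directory /mnt/disk11/vmware2.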
-Krutika
On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:
Hello Krutika,
Kindly check the attached logs.
--
Respectfully
*Mahdi A. Mahdi*
*From:* Krutika Dhananjay <kdhan...@redhat.com>
*Sent:* Saturday, March 18, 2017 3:29:03 PM
*To:* Mahdi Adnan
*Cc:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs
corruption
Hi Mahdi,
Could you attach mount, brick and rebalance logs?
-Krutika
On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:
Hi,
I upgraded to Gluster 3.8.10 today and ran the add-brick
procedure on a volume containing a few VMs.
After the rebalance completed, I rebooted the VMs; some of them
ran just fine, and others just crashed.
Windows VMs boot into recovery mode, and Linux VMs throw XFS
errors and do not boot.
I ran the test again and it happened just as the first time,
but I noticed that only VMs doing disk IO are affected by
this bug.
The VMs that were powered off started fine, and even the md5 of the
disk files did not change after the rebalance.
Can anyone else confirm this?
Volume info:
Volume Name: vmware2
Type: Distributed-Replicate
Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
Status: Started
Snapshot Count: 0
Number of Bricks: 22 x 2 = 44
Transport-type: tcp
Bricks:
Brick1: gluster01:/mnt/disk1/vmware2
Brick2: gluster03:/mnt/disk1/vmware2
Brick3: gluster02:/mnt/disk1/vmware2
Brick4: gluster04:/mnt/disk1/vmware2
Brick5: gluster01:/mnt/disk2/vmware2
Brick6: gluster03:/mnt/disk2/vmware2
Brick7: gluster02:/mnt/disk2/vmware2
Brick8: gluster04:/mnt/disk2/vmware2
Brick9: gluster01:/mnt/disk3/vmware2
Brick10: gluster03:/mnt/disk3/vmware2
Brick11: gluster02:/mnt/disk3/vmware2
Brick12: gluster04:/mnt/disk3/vmware2
Brick13: gluster01:/mnt/disk4/vmware2
Brick14: gluster03:/mnt/disk4/vmware2
Brick15: gluster02:/mnt/disk4/vmware2
Brick16: gluster04:/mnt/disk4/vmware2
Brick17: gluster01:/mnt/disk5/vmware2
Brick18: gluster03:/mnt/disk5/vmware2
Brick19: gluster02:/mnt/disk5/vmware2
Brick20: gluster04:/mnt/disk5/vmware2
Brick21: gluster01:/mnt/disk6/vmware2
Brick22: gluster03:/mnt/disk6/vmware2
Brick23: gluster02:/mnt/disk6/vmware2
Brick24: gluster04:/mnt/disk6/vmware2
Brick25: gluster01:/mnt/disk7/vmware2
Brick26: gluster03:/mnt/disk7/vmware2
Brick27: gluster02:/mnt/disk7/vmware2
Brick28: gluster04:/mnt/disk7/vmware2
Brick29: gluster01:/mnt/disk8/vmware2
Brick30: gluster03:/mnt/disk8/vmware2
Brick31: gluster02:/mnt/disk8/vmware2
Brick32: gluster04:/mnt/disk8/vmware2
Brick33: gluster01:/mnt/disk9/vmware2
Brick34: gluster03:/mnt/disk9/vmware2
Brick35: gluster02:/mnt/disk9/vmware2
Brick36: gluster04:/mnt/disk9/vmware2
Brick37: gluster01:/mnt/disk10/vmware2
Brick38: gluster03:/mnt/disk10/vmware2
Brick39: gluster02:/mnt/disk10/vmware2
Brick40: gluster04:/mnt/disk10/vmware2
Brick41: gluster01:/mnt/disk11/vmware2
Brick42: gluster03:/mnt/disk11/vmware2
Brick43: gluster02:/mnt/disk11/vmware2
Brick44: gluster04:/mnt/disk11/vmware2
Options Reconfigured:
cluster.server-quorum-type: server
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
features.shard: on
cluster.data-self-heal-algorithm: full
features.cache-invalidation: on
ganesha.enable: on
features.shard-block-size: 256MB
client.event-threads: 2
server.event-threads: 2
cluster.favorite-child-policy: size
storage.build-pgfid: off
network.ping-timeout: 5
cluster.enable-shared-storage: enable
nfs-ganesha: enable
cluster.server-quorum-ratio: 51%
Adding bricks:
gluster volume add-brick vmware2 replica 2
gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
starting fix layout:
gluster volume rebalance vmware2 fix-layout start
Starting rebalance:
gluster volume rebalance vmware2 start
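It may also be worth confirming the rebalance actually completed on all nodes before rebooting any VMs; a sketch, using the same volume name:

```shell
# Show per-node progress and state of the running rebalance
# (files scanned/rebalanced, failures, and completed/in progress status):
gluster volume rebalance vmware2 status
```

Any non-zero failure count in that output would be worth correlating with the corrupted images.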
--
Respectfully
*Mahdi A. Mahdi*
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users