Here's the fix for the issue - http://review.gluster.org/#/c/15788/1
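For anyone who wants to try the patch locally before it merges, a Gerrit change can be fetched by its change ref; Gerrit publishes each patchset under refs/changes/<last two digits of change>/<change number>/<patchset>. A sketch follows: only the change number and patchset come from the link above, and the repository path in the commented fetch command is an assumption.

```shell
# Construct the Gerrit change ref for change 15788, patchset 1 (from the review link).
CHANGE=15788
PATCHSET=1
SUFFIX=$(printf '%s' "$CHANGE" | tail -c 2)        # last two digits of the change number
REF="refs/changes/${SUFFIX}/${CHANGE}/${PATCHSET}"
echo "$REF"                                        # refs/changes/88/15788/1

# Hypothetical fetch; assumes the glusterfs repo is served at this path:
# git fetch http://review.gluster.org/glusterfs "$REF" && git checkout FETCH_HEAD
```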
Also with this fix, there were no errors when a brick was replaced while I/O inside the VMs was in progress. I also tested add-brick to convert the 1x3 replicate volume to a 2x3 distributed-replicate volume. Everything worked just fine.

-Krutika

On Wed, Nov 2, 2016 at 7:00 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:

> Just finished testing the VM storage use-case.
>
> *Volume configuration used:*
>
> [root@srv-1 ~]# gluster volume info
>
> Volume Name: rep
> Type: Replicate
> Volume ID: 2c603783-c1da-49b7-8100-0238c777b731
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: srv-1:/bricks/rep1
> Brick2: srv-2:/bricks/rep2
> Brick3: srv-3:/bricks/rep4
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> features.shard: on
> cluster.granular-entry-heal: on
> cluster.locking-scheme: granular
> network.ping-timeout: 30
> server.allow-insecure: on
> storage.owner-uid: 107
> storage.owner-gid: 107
> cluster.data-self-heal-algorithm: full
>
> Used FUSE to mount the volume locally on each of the 3 nodes (no external
> clients).
> shard-block-size: 4MB.
>
> *TESTS AND RESULTS:*
>
> *What works:*
>
> * Created 3 VM images, one per hypervisor, and installed Fedora 24 on all of
>   them. Used virt-manager for ease of setting up the environment.
>   Installation went fine. All green.
>
> * Rebooted the VMs. Worked fine.
>
> * Killed brick-1. Ran dd on the three VMs to create a 'src' file and captured
>   their md5sum values. Verified that the gfid indices and name indices are
>   created under .glusterfs/indices/xattrop and .glusterfs/indices/entry-changes
>   respectively, as they should be. Brought the brick back up. Waited until
>   heal completed.
>   Captured md5sum again. They matched.
>
> * Killed brick-2. Copied the 'src' file from the step above into a new file
>   using dd. Captured the md5sum of the newly created file. The checksums
>   matched. Waited for heal to finish. Captured md5sum again. Everything
>   matched.
>
> * Repeated the test above with brick-3 being killed and brought back up after
>   a while. Worked fine.
>
> At the end I also captured md5sums of the shards from the backend on the
> three replicas. They were all found to be in sync. So far so good.
>
> *What did NOT work:*
>
> * Started dd again on all 3 VMs to copy the existing files to new files.
>   While dd was running, I ran replace-brick to replace the third brick with a
>   new brick on the same node with a different path. This caused dd on all
>   three VMs to fail simultaneously with "Input/output error". I tried to read
>   the files; even that failed. Rebooted the VMs. By this time, /.shard is in
>   split-brain as per heal-info, and the VMs seem to have suffered corruption
>   and are in an irrecoverable state.
>
> I checked the logs. The pattern is very similar to the one in the add-brick
> bug Lindsay reported here - https://bugzilla.redhat.com/show_bug.cgi?id=1387878.
> It seems like something is going wrong each time there is a graph switch.
>
> @Aravinda and Pranith:
>
> I will need some time to debug this, if the 3.9 release can wait until it is
> RC'd and fixed. Otherwise we will need to caution users that replace-brick,
> add-brick, etc. (or any form of graph switch, for that matter) *might* cause
> VM corruption in 3.9.0, irrespective of whether they are using FUSE or gfapi.
>
> Let me know what your decision is.
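The dd/md5sum consistency check described in the report above can be sketched roughly as below. The paths are hypothetical; in the actual test the dd and md5sum steps ran inside the VMs, with a brick being killed and healed between the write and the final checksum.

```shell
# Rough local sketch of the verification loop: create 'src', checksum it,
# copy it with dd, and confirm the two checksums match.
workdir=$(mktemp -d)
dd if=/dev/urandom of="$workdir/src" bs=1M count=4 2>/dev/null
sum_src=$(md5sum "$workdir/src" | awk '{print $1}')
# ... in the real test: kill a brick here, bring it back, wait for heal ...
dd if="$workdir/src" of="$workdir/copy" bs=1M 2>/dev/null
sum_copy=$(md5sum "$workdir/copy" | awk '{print $1}')
[ "$sum_src" = "$sum_copy" ] && echo "checksums match"
rm -rf "$workdir"
```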
>
> -Krutika
>
> On Wed, Oct 26, 2016 at 8:04 PM, Aravinda <avish...@redhat.com> wrote:
>
>> The Gluster 3.9.0rc2 tarball is available here:
>> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz
>>
>> regards
>> Aravinda
>>
>> On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:
>>
>>> Hi,
>>>
>>> Since the automated test framework for Gluster is still in progress, we
>>> need help from maintainers and developers to test the features and bug
>>> fixes in order to release Gluster 3.9.
>>>
>>> In the last maintainers meeting, Shyam shared an idea about having a Test
>>> day to accelerate the testing and release.
>>>
>>> Please participate in testing your component(s) on Oct 27, 2016. We will
>>> prepare the rc2 build by tomorrow and share the details before the Test day.
>>>
>>> RC1 link: http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
>>> Release checklist: https://public.pad.fsfe.org/p/gluster-component-release-checklist
>>>
>>> Thanks and Regards
>>> Aravinda and Pranith
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
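For reference, picking up the rc2 tarball from the announcement above for testing might look like the following sketch; the commented fetch-and-build steps are the usual autotools flow, not commands taken from the thread.

```shell
# Build the tarball URL from the announcement above.
VERSION=3.9.0rc2
TARBALL="glusterfs-${VERSION}.tar.gz"
URL="http://bits.gluster.org/pub/gluster/glusterfs/src/${TARBALL}"
echo "$URL"

# Typical fetch-and-build steps (commented out; assumes build deps are installed):
# wget "$URL" && tar xzf "$TARBALL"
# cd "glusterfs-${VERSION}" && ./configure && make
```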