On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com> wrote:
> Hi,
>
> I suppose the brick number in your testing is six, and you just shut down
> 3 of the processes.
>
> When I reproduce the issue, I only create a replicated volume with 2
> bricks, let only ONE brick keep working, and set cluster.consistent-metadata
> on.
>
> With these 2 test conditions, the issue is 100% reproducible.

Hi,

I actually tried it with replica-2 and replica-3, and then distributed
replica-2, before replying to the earlier mail. We can have a debugging
session if you are okay with it. I am in the middle of a customer issue
myself (that is the reason for this delay :-( ) and am thinking of wrapping
it up early next week. Would that be fine with you?

> 16:44:28 :) ⚡ gluster v status
> Status of volume: r2
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick localhost.localdomain:/home/gfs/r2_0  49152     0          Y       5309
> Brick localhost.localdomain:/home/gfs/r2_1  49154     0          Y       5330
> Brick localhost.localdomain:/home/gfs/r2_2  49156     0          Y       5351
> Brick localhost.localdomain:/home/gfs/r2_3  49158     0          Y       5372
> Brick localhost.localdomain:/home/gfs/r2_4  49159     0          Y       5393
> Brick localhost.localdomain:/home/gfs/r2_5  49160     0          Y       5414
> Self-heal Daemon on localhost               N/A       N/A        Y       5436
>
> Task Status of Volume r2
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> root@dhcp35-190 - ~
> 16:44:38 :) ⚡ kill -9 5309 5351 5393
>
> Best Regards,
> George
>
> From: gluster-devel-boun...@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Pranith Kumar Karampuri
> Sent: Wednesday, January 17, 2018 7:27 PM
> To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
> Cc: Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; Gluster-devel@gluster.org; Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com>
> Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"
>
> On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>
> On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com> wrote:
>
> Hi,
>
> Have you reproduced this issue? If yes, could you please confirm whether
> it is an issue or not?
>
> Hi,
>
> I tried recreating this on my laptop, on both master and 3.12, and I am
> not able to recreate the issue :-(. Here is the execution log:
> https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
>
> Since I was doing this on my laptop, I changed shutting down of the
> replica to killing the brick process to simulate this test. Let me know
> if I missed something.
>
> Sorry, I am held up with some issue at work, so I think I will get some
> time the day after tomorrow to look at this. In the meantime I am adding
> more people who know about afr to see if they get a chance to work on
> this before me.
>
> And if it is an issue, do you have any solution for this issue?
> Thanks & Best Regards,
> George
>
> From: Lian, George (NSB - CN/Hangzhou)
> Sent: Thursday, January 11, 2018 2:01 PM
> To: Pranith Kumar Karampuri <pkara...@redhat.com>
> Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Gluster-devel@gluster.org; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com>
> Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"
>
> Hi,
>
> Please see the detailed test steps at
> https://bugzilla.redhat.com/show_bug.cgi?id=1531457
>
> How reproducible:
>
> Steps to Reproduce:
> 1. Create a volume named "test" with replication.
> 2. Set the volume option cluster.consistent-metadata to on:
>    gluster v set test cluster.consistent-metadata on
> 3. Mount volume test on the client at /mnt/test.
> 4. Create a file aaa with a size of more than 1 byte:
>    echo "1234567890" > /mnt/test/aaa
> 5. Shut down a replica node, say sn-1, so that only sn-0 keeps working.
> 6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc
>
> BRs,
> George
>
> From: gluster-devel-boun...@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Pranith Kumar Karampuri
> Sent: Thursday, January 11, 2018 12:39 PM
> To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
> Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Gluster-devel@gluster.org; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com>
> Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"
>
> On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com> wrote:
>
> Hi,
>
> >>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
> It is fuse, within a mountpoint created by the "mount -t glusterfs …"
> command.
>
> Could you let me know the test you did so that I can try to re-create it
> and see what exactly is going on? The configuration of the volume and the
> steps to re-create the issue you are seeing would be helpful in debugging
> the issue further.
>
> Thanks & Best Regards,
> George
>
> From: gluster-devel-boun...@gluster.org [mailto:gluster-devel-bounces@gluster.org] On Behalf Of Pranith Kumar Karampuri
> Sent: Wednesday, January 10, 2018 8:08 PM
> To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
> Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Zhong, Hua (NSB - CN/Hangzhou) <hua.zh...@nokia-sbell.com>; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; Gluster-devel@gluster.org; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com>
> Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"
>
> On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com> wrote:
>
> Hi, Pranith Kumar,
>
> I have created a bug on Bugzilla:
> https://bugzilla.redhat.com/show_bug.cgi?id=1531457
>
> After my investigation of this link issue, I suppose your change on
> afr-dir-write.c for the issue "Don't let NFS cache stat after writes" is
> like:
>
>     if (afr_txn_nothing_failed (frame, this)) {
>             /* if it did pre-op, it will do post-op changing ctime */
>             if (priv->consistent_metadata &&
>                 afr_needs_changelog_update (local))
>                     afr_zero_fill_stat (local);
>             local->transaction.unwind (frame, this);
>     }
>
> In the above fix, it sets ia_nlink to '0' if the option
> consistent-metadata is set to "on".
> And hard-linking a file which was just created will lead to an error; the
> error is raised in the kernel function "vfs_link":
>
>     if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
>             error = -ENOENT;
>
> Could you please have a check and give some comments here?
>
> When the stat is "zero filled", the understanding is that the higher-layer
> protocol doesn't send the stat value to the kernel, and a separate lookup
> is sent by the kernel to get the latest stat value. In which protocol are
> you seeing this issue? Fuse/NFS/SMB?
>
> Thanks & Best Regards,
> George

--
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel