Hi Pranith Kumar,

Can you tell me why we need to set buf->ia_nlink to "0" in the function gf_zero_fill_stat(), and which API or application cares about it? If I remove this line and also update the corresponding check in gf_is_zero_filled_stat(), the issue seems to go away, but I can't confirm whether it would lead to other issues.
So could you please double-check it and give your comments? My change is as below:

gf_boolean_t
gf_is_zero_filled_stat (struct iatt *buf)
{
        if (!buf)
                return 1;
        /* Do not use st_dev because it is transformed to store the
         * xlator id in place of the device number. Do not use st_ino
         * because by this time we've already mapped the root ino to 1,
         * so it is not guaranteed to be 0. */
        /* if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0)) */
        if (buf->ia_ctime == 0)
                return 1;
        return 0;
}

void
gf_zero_fill_stat (struct iatt *buf)
{
        /* buf->ia_nlink = 0; */
        buf->ia_ctime = 0;
}

Thanks & Best Regards,
George

From: Lian, George (NSB - CN/Hangzhou)
Sent: Friday, January 19, 2018 10:03 AM
To: Pranith Kumar Karampuri <pkara...@redhat.com>; Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>
Cc: Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; Gluster-devel@gluster.org; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com>
Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"

Hi,

>>> Cool, this works for me too. Send me a mail off-list once you are available
>>> and we can figure out a way to get into a call and work on this.

Have you reproduced the issue per the steps I listed in https://bugzilla.redhat.com/show_bug.cgi?id=1531457 and in my last mail? If not, I would like you to try it yourself; the only difference between your setup and mine is creating just 2 bricks instead of 6. Cynthia can have a session with you if needed while I am unavailable next Monday and Tuesday.
Thanks & Best Regards,
George

From: gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> [mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 6:03 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com<mailto:ping....@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"

On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:

Hi,

>>> I actually tried it with replica-2 and replica-3 and then distributed
>>> replica-2 before replying to the earlier mail. We can have a debugging
>>> session if you are okay with it.

It is fine if you can't reproduce the issue in your environment. I have attached the detailed reproduction log to the Bugzilla FYI. But I am sorry I may be OOO on Monday and Tuesday next week, so a debug session next Wednesday would be fine for me.

Cool, this works for me too. Send me a mail off-list once you are available and we can figure out a way to get into a call and work on this.

Pasting the detailed reproduction log here FYI:

root@ubuntu:~# gluster peer probe ubuntu
peer probe: success.
Probe on localhost not needed
root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 ubuntu:/home/gfs/b2 force
volume create: test: success: please start the volume to access data
root@ubuntu:~# gluster v start test
volume start: test: success
root@ubuntu:~# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ubuntu:/home/gfs/b1
Brick2: ubuntu:/home/gfs/b2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

root@ubuntu:~# gluster v status
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ubuntu:/home/gfs/b1                   49152     0          Y       7798
Brick ubuntu:/home/gfs/b2                   49153     0          Y       7818
Self-heal Daemon on localhost               N/A       N/A        Y       7839

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks

root@ubuntu:~# gluster v set test cluster.consistent-metadata on
volume set: success
root@ubuntu:~# ls /mnt/test
ls: cannot access '/mnt/test': No such file or directory
root@ubuntu:~# mkdir -p /mnt/test
root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test
root@ubuntu:~# cd /mnt/test
root@ubuntu:/mnt/test# echo "abc">aaa
root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc
root@ubuntu:/mnt/test# kill -9 7818
root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee
link: cannot create link 'eee' to 'ddd': No such file or directory

Best Regards,
George

From: gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> [mailto:gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org>] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 2:40 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou)
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com<mailto:ping....@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"

On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:

Hi,

I suppose the number of bricks in your testing was six, and you shut down only 3 of the processes. When I reproduced the issue, I created a replicate volume with only 2 bricks, let only ONE brick keep working, and set cluster.consistent-metadata on. With these two test conditions, the issue is 100% reproducible.

Hi,

I actually tried it with replica-2 and replica-3 and then distributed replica-2 before replying to the earlier mail. We can have a debugging session if you are okay with it. I am in the middle of a customer issue myself (that is the reason for this delay :-( ) and am thinking of wrapping it up early next week. Would that be fine with you?
16:44:28 :) ⚡ gluster v status
Status of volume: r2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/r2_0  49152     0          Y       5309
Brick localhost.localdomain:/home/gfs/r2_1  49154     0          Y       5330
Brick localhost.localdomain:/home/gfs/r2_2  49156     0          Y       5351
Brick localhost.localdomain:/home/gfs/r2_3  49158     0          Y       5372
Brick localhost.localdomain:/home/gfs/r2_4  49159     0          Y       5393
Brick localhost.localdomain:/home/gfs/r2_5  49160     0          Y       5414
Self-heal Daemon on localhost               N/A       N/A        Y       5436

Task Status of Volume r2
------------------------------------------------------------------------------
There are no active volume tasks

root@dhcp35-190 - ~
16:44:38 :) ⚡ kill -9 5309 5351 5393

Best Regards,
George

From: gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> [mailto:gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org>] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 17, 2018 7:27 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com<mailto:ping....@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"

On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri <pkara...@redhat.com<mailto:pkara...@redhat.com>> wrote:
On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:

Hi,

Have you reproduced this issue? If yes, could you please confirm whether it is an issue or not?
Hi,

I tried recreating this on my laptop, on both master and 3.12, and I am not able to recreate the issue :-(. Here is the execution log: https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
Since I was doing this on my laptop, I changed shutting down of the replica to killing the brick process to simulate this test. Let me know if I missed something.

Sorry, I am held up with some issue at work, so I think I will get some time the day after tomorrow to look at this. In the meantime I am adding more people who know about afr, to see if they get a chance to work on this before me.

And if it is an issue, do you have any solution for it?

Thanks & Best Regards,
George

From: Lian, George (NSB - CN/Hangzhou)
Sent: Thursday, January 11, 2018 2:01 PM
To: Pranith Kumar Karampuri <pkara...@redhat.com<mailto:pkara...@redhat.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com<mailto:ping....@nokia-sbell.com>>
Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"

Hi,

Please see the detailed test steps at https://bugzilla.redhat.com/show_bug.cgi?id=1531457

How reproducible:

Steps to Reproduce:
1. Create a volume named "test" of type replicated.
2. Set the volume option cluster.consistent-metadata to on: gluster v set test cluster.consistent-metadata on
3. Mount volume test on the client at /mnt/test.
4. Create a file aaa with size more than 1 byte: echo "1234567890" >/mnt/test/aaa
5. Shut down one replica node, say sn-1, so that only sn-0 keeps working.
6.
cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc

BRs,
George

From: gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> [mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 11, 2018 12:39 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com<mailto:ping....@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"

On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:

Hi,

>>> In which protocol are you seeing this issue? Fuse/NFS/SMB?

It is fuse, inside a mountpoint created by the "mount -t glusterfs …" command.

Could you let me know the test you did so that I can try to re-create it and see what exactly is going on? The configuration of the volume and the steps to re-create the issue you are seeing would be helpful in debugging it further.
Thanks & Best Regards,
George

From: gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> [mailto:gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org>] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 10, 2018 8:08 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Zhong, Hua (NSB - CN/Hangzhou) <hua.zh...@nokia-sbell.com<mailto:hua.zh...@nokia-sbell.com>>; Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Sun, Ping (NSB - CN/Hangzhou) <ping....@nokia-sbell.com<mailto:ping....@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix "Don't let NFS cache stat after writes"

On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:

Hi, Pranith Kumar,

I have created a bug on Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1531457

After my investigation of this link issue, I suppose it comes from your change to afr-dir-write.c for the issue "Don't let NFS cache stat after writes". Your fix looks like:

if (afr_txn_nothing_failed (frame, this)) {
        /* if it did pre-op, it will do post-op changing ctime */
        if (priv->consistent_metadata &&
            afr_needs_changelog_update (local))
                afr_zero_fill_stat (local);
        local->transaction.unwind (frame, this);
}

The above fix sets ia_nlink to '0' if the option consistent-metadata is set to "on". Hard-linking a file which was just created then leads to an error, and the error is caused by this check in the kernel function "vfs_link":

if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
        error = -ENOENT;

Could you please have a check and give some comments here?
When a stat is "zero filled", the understanding is that the higher-layer protocol does not send the stat value to the kernel, and a separate lookup is sent by the kernel to get the latest stat value. In which protocol are you seeing this issue? Fuse/NFS/SMB?

Thanks & Best Regards,
George

--
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel