[Gluster-devel] glusterfs coredump

2019-01-22 Thread Lian, George (NSB - CN/Hangzhou)
Hi, GlusterFS expert,

We have recently encountered a coredump of the client process “glusterfs”, and it is 
easier to reproduce when the IO load and the CPU/memory load are high during 
stability testing.
Our glusterfs release is 3.12.2.
I have copied the call trace of the core dump below, and I have some questions; 
I hope to get your help.


1) Have you encountered a related issue? From the call trace, we can see that 
the fd variable looks abnormal in its “refcount” and “inode” fields.

For wb_inode->this, it has become an invalid value, 0xff00. Is the 
value “0xff00” meaningful in some way? In every coredump that 
occurred, the value of inode->this is the same “0xff00”.



2) When I checked the source code, I found that in function wb_enqueue_common 
the function __wb_request_unref is used instead of wb_request_unref, and that 
although wb_request_unref is defined, it is never used! Firstly this seems a 
bit strange; secondly, wb_request_unref has a locking mechanism to avoid race 
conditions, while __wb_request_unref does not, and __wb_request_unref is called 
from quite a few more places. Could this lead to a race issue?
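For reference, here is a minimal, self-contained sketch (not glusterfs code; the 
type and function names are made up) of the locked-wrapper convention I assume 
write-behind follows, where the double-underscore variant expects the caller to 
already hold the lock. If every call site of __wb_request_unref already holds 
wb_inode->lock there would be no race; otherwise it does look suspicious:
--
#include <pthread.h>
#include <stdio.h>

/* Illustration only: obj_t stands in for the write-behind request/inode pair. */
typedef struct {
        pthread_mutex_t lock;
        int             refcount;
} obj_t;

/* Convention: __obj_unref() must be called with obj->lock already held. */
static int
__obj_unref (obj_t *obj)
{
        return --obj->refcount;
}

/* Standalone variant: takes the lock itself, then delegates. */
static int
obj_unref (obj_t *obj)
{
        int ref;

        pthread_mutex_lock (&obj->lock);
        ref = __obj_unref (obj);
        pthread_mutex_unlock (&obj->lock);

        return ref;
}

int
main (void)
{
        obj_t o = { PTHREAD_MUTEX_INITIALIZER, 2 };

        pthread_mutex_lock (&o.lock);
        printf ("unref inside locked section: %d\n", __obj_unref (&o));
        pthread_mutex_unlock (&o.lock);

        printf ("standalone unref: %d\n", obj_unref (&o));
        return 0;
}
--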

[Current thread is 1 (Thread 0x7f54e82a3700 (LWP 6078))]
(gdb) bt
#0  0x7f54e623197c in wb_fulfill (wb_inode=0x7f54d4066bd0, 
liabilities=0x7f54d0824440) at write-behind.c:1155
#1  0x7f54e6233662 in wb_process_queue (wb_inode=0x7f54d4066bd0) at 
write-behind.c:1728
#2  0x7f54e6234039 in wb_writev (frame=0x7f54d406d6c0, this=0x7f54e0014b10, 
fd=0x7f54d8019d70, vector=0x7f54d0018000, count=1, offset=33554431, 
flags=32770, iobref=0x7f54d021ec20, xdata=0x0)
at write-behind.c:1842
#3  0x7f54e6026fcb in du_writev_resume (ret=0, frame=0x7f54d0002260, 
opaque=0x7f54d0002260) at disk-usage.c:490
#4  0x7f54ece07160 in synctask_wrap () at syncop.c:377
#5  0x7f54eb3a2660 in ?? () from /lib64/libc.so.6
#6  0x in ?? ()
(gdb) p wb_inode
$6 = (wb_inode_t *) 0x7f54d4066bd0
(gdb) p wb_inode->this
$1 = (xlator_t *) 0xff00
(gdb) frame 1
#1  0x7f54e6233662 in wb_process_queue (wb_inode=0x7f54d4066bd0) at 
write-behind.c:1728
1728 in write-behind.c
(gdb) p wind_failure
$2 = 0
(gdb) p *wb_inode
$3 = {window_conf = 35840637416824320, window_current = 35840643167805440, 
transit = 35839681019027968, all = {next = 0xb000, prev = 0x7f54d4066bd000}, 
todo = {next = 0x7f54deadc0de00,
prev = 0x7f54e00489e000}, liability = {next = 0x7f5400a200, prev = 
0xb000}, temptation = {next = 0x7f54d4066bd000, prev = 0x7f54deadc0de00}, wip = 
{next = 0x7f54e00489e000, prev = 0x7f5400a200},
  gen = 45056, size = 35840591659782144, lock = {spinlock = 0, mutex = {__data 
= {__lock = 0, __count = 8344798, __owner = 0, __nusers = 8344799, __kind = 
41472, __spins = 21504, __elision = 127, __list = {
  __prev = 0xb000, __next = 0x7f54d4066bd000}},
  __size = 
"\000\000\000\000\336T\177\000\000\000\000\000\337T\177\000\000\242\000\000\000T\177\000\000\260\000\000\000\000\000\000\000\320k\006\324T\177",
 __align = 35840634501726208}},
  this = 0xff00, dontsync = -1}
(gdb) frame 2
#2  0x7f54e6234039 in wb_writev (frame=0x7f54d406d6c0, this=0x7f54e0014b10, 
fd=0x7f54d8019d70, vector=0x7f54d0018000, count=1, offset=33554431, 
flags=32770, iobref=0x7f54d021ec20, xdata=0x0)
at write-behind.c:1842
1842 in write-behind.c
(gdb) p fd
$4 = (fd_t *) 0x7f54d8019d70
(gdb) p *fd
$5 = {pid = 140002378149040, flags = -670836240, refcount = 32596, inode_list = 
{next = 0x7f54d8019d80, prev = 0x7f54d8019d80}, inode = 0x0, lock = {spinlock = 
-536740032, mutex = {__data = {
__lock = -536740032, __count = 32596, __owner = -453505333, __nusers = 
32596, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 
0x0}},
  __size = "@\377\001\340T\177\000\000\313\016\370\344T\177", '\000' 
, __align = 140002512207680}}, _ctx = 0x, xl_count = 
0, lk_ctx = 0x0, anonymous = (unknown: 3623984496)}
(gdb)


Thanks & Best Regards,
George

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] afr_set_transaction_flock lead to bad performance when write with multi-pthread or multi-process

2018-08-10 Thread Lian, George (NSB - CN/Hangzhou)
Hi,
>>>Can you please try and disable eager-lock?
Eager-lock is already disabled, and from the source code below:

An arbiter volume plus a data FOP will flock the entire file, won't it?

if ((priv->arbiter_count || local->transaction.eager_lock_on ||
 priv->full_lock) &&
local->transaction.type == AFR_DATA_TRANSACTION) {
/*Lock entire file to avoid network split brains.*/
int_lock->flock.l_len   = 0;
int_lock->flock.l_start = 0;
} else {
Best Regards,
George


From: Yaniv Kaul 
Sent: Friday, August 10, 2018 1:37 AM
To: Lian, George (NSB - CN/Hangzhou) 
Subject: Re: [Gluster-devel] afr_set_transaction_flock lead to bad performance 
when write with multi-pthread or multi-process

Can you please try and disable eager-lock?
Y.


On Thu, Aug 9, 2018, 8:01 PM Lian, George (NSB - CN/Hangzhou) 
mailto:george.l...@nokia-sbell.com>> wrote:
Hi, Gluster expert,

When we setup replicate volume with info like the below:

Volume Name: test
Type: Replicate
Volume ID: 9373eba9-eb84-4618-a54c-f2837345daec
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rcp:/trunk/brick/test1/sn0
Brick2: rcp:/trunk/brick/test1/sn1
Brick3: rcp:/trunk/brick/test1/sn2 (arbiter)

If we run a performance test in which multiple threads write to the same file at 
the same time (at different offsets), the write performance drops a lot (about 
60%-70% lower than on a volume without an arbiter).
When we studied the source code, we found a function 
“afr_set_transaction_flock” in “afr-transaction.c”.
It flocks the entire file when arbiter_count is not zero, and I suppose this is 
the root cause of the performance drop.
Now my question is:

1) Why flock the entire file when an arbiter is configured? Could you please share 
the details of why it would lead to split brain only with an arbiter?

2) If this is the root cause, and not locking the entire file really can lead to 
split-brain, is there any solution to avoid the performance drop for this 
multi-writer case?

The following is attached source code for this function FYI:
--
int afr_set_transaction_flock (xlator_t *this, afr_local_t *local)
{
        afr_internal_lock_t *int_lock = NULL;
        afr_private_t       *priv     = NULL;

        int_lock = &local->internal_lock;
        priv = this->private;

        if ((priv->arbiter_count || local->transaction.eager_lock_on ||
             priv->full_lock) &&
            local->transaction.type == AFR_DATA_TRANSACTION) {
                /*Lock entire file to avoid network split brains.*/
                int_lock->flock.l_len   = 0;
                int_lock->flock.l_start = 0;
        } else {
                int_lock->flock.l_len   = local->transaction.len;
                int_lock->flock.l_start = local->transaction.start;
        }
        int_lock->flock.l_type  = F_WRLCK;

        return 0;
}

Thanks & Best Regards,
George
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] afr_set_transaction_flock lead to bad performance when write with multi-pthread or multi-process

2018-08-09 Thread Lian, George (NSB - CN/Hangzhou)
Hi, Gluster expert,

When we setup replicate volume with info like the below:

Volume Name: test
Type: Replicate
Volume ID: 9373eba9-eb84-4618-a54c-f2837345daec
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rcp:/trunk/brick/test1/sn0
Brick2: rcp:/trunk/brick/test1/sn1
Brick3: rcp:/trunk/brick/test1/sn2 (arbiter)

If we run a performance test in which multiple threads write to the same file at 
the same time (at different offsets), the write performance drops a lot (about 
60%-70% lower than on a volume without an arbiter).
When we studied the source code, we found a function 
“afr_set_transaction_flock” in “afr-transaction.c”.
It flocks the entire file when arbiter_count is not zero, and I suppose this is 
the root cause of the performance drop.
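The workload we use is essentially the following (a minimal sketch only, not our 
exact test code; the mount path, thread count and I/O sizes are just examples):
--
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define NTHREADS          8
#define BLOCK             (128 * 1024)
#define BLOCKS_PER_THREAD 256

/* Each thread writes to its own region of the same file with pwrite(), so the
 * writes never overlap; with a whole-file flock per transaction they are still
 * serialized against each other. */
static void *
writer (void *arg)
{
        long  idx = (long) arg;
        int   fd  = open ("/mnt/test/perf_file", O_WRONLY | O_CREAT, 0644);
        char *buf = calloc (1, BLOCK);

        if (fd < 0 || !buf)
                return NULL;
        for (long i = 0; i < BLOCKS_PER_THREAD; i++) {
                off_t off = (idx * BLOCKS_PER_THREAD + i) * (off_t) BLOCK;
                pwrite (fd, buf, BLOCK, off);
        }
        close (fd);
        free (buf);
        return NULL;
}

int
main (void)
{
        pthread_t tids[NTHREADS];

        for (long i = 0; i < NTHREADS; i++)
                pthread_create (&tids[i], NULL, writer, (void *) i);
        for (long i = 0; i < NTHREADS; i++)
                pthread_join (tids[i], NULL);
        return 0;
}
--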
Now my question is:

1) Why flock the entire file when an arbiter is configured? Could you please share 
the details of why it would lead to split brain only with an arbiter?

2) If this is the root cause, and not locking the entire file really can lead to 
split-brain, is there any solution to avoid the performance drop for this 
multi-writer case?

The following is attached source code for this function FYI:
--
int afr_set_transaction_flock (xlator_t *this, afr_local_t *local)
{
        afr_internal_lock_t *int_lock = NULL;
        afr_private_t       *priv     = NULL;

        int_lock = &local->internal_lock;
        priv = this->private;

        if ((priv->arbiter_count || local->transaction.eager_lock_on ||
             priv->full_lock) &&
            local->transaction.type == AFR_DATA_TRANSACTION) {
                /*Lock entire file to avoid network split brains.*/
                int_lock->flock.l_len   = 0;
                int_lock->flock.l_start = 0;
        } else {
                int_lock->flock.l_len   = local->transaction.len;
                int_lock->flock.l_start = local->transaction.start;
        }
        int_lock->flock.l_type  = F_WRLCK;

        return 0;
}

Thanks & Best Regards,
George
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-23 Thread Lian, George (NSB - CN/Hangzhou)
Hi,

I tested both patchset 1 and patchset 2 of https://review.gluster.org/20549, and 
the ctime issue is still there with both.
I used both my test C program and the “dd” program, and the issue shows up with both.

But when using the patch https://review.gluster.org/#/c/20410/11,
my test C program and “dd” to an existing file pass;
ONLY “dd” to a new file still fails.

Best Regards,
George



From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Raghavendra Gowdappa
Sent: Monday, July 23, 2018 10:37 AM
To: Lian, George (NSB - CN/Hangzhou) 
Cc: Zhang, Bingxuan (NSB - CN/Hangzhou) ; 
Raghavendra G ; Gluster-devel@gluster.org
Subject: Re: [Gluster-devel] The ctime of fstat is not correct which lead to 
"tar" utility error



On Sun, Jul 22, 2018 at 1:41 PM, Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:
George,
Sorry. I sent you a version of the fix which was stale. Can you try with:
https://review.gluster.org/20549
This patch passes the test case you've given.

Patchset 1 solves this problem. However, it ran into dbench failures as 
md-cache was slow to update its cache. Once I fixed that, I am seeing failures 
again. With performance.stat-prefetch off, the error goes away. But I can see 
only ctime changes, so I am wondering whether this is related to the ctime 
translator or is an issue in md-cache. Note that md-cache also caches stats from 
codepaths which don't result in a stat update in the kernel. So it could be either
* a bug in md-cache,
* or a bug where in those codepaths a wrong/changed stat was sent.

I'll probe the first hypothesis. @Pranith/@Ravi,

What do you think about second hypothesis?

regards,
Raghavendra

regards,
Raghavendra

On Fri, Jul 20, 2018 at 2:59 PM, Lian, George (NSB - CN/Hangzhou) 
mailto:george.l...@nokia-sbell.com>> wrote:
Hi,

Sorry, there still seems to be an issue.

We used the Linux “dd” tool instead of my demo program, and if the file does not 
exist before dd, the issue is still there.

The test command is
rm -rf /mnt/test/file.txt ; dd if=/dev/zero of=/mnt/test/file.txt bs=512 
count=1 oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz

1) If we set md-cache-timeout to 0, the issue does not happen.

2) If we set md-cache-timeout to 1, the issue is 100% reproducible! (with the 
new patch you mentioned in the mail)


Please see detail test result as the below:

bash-4.4# gluster v set export md-cache-timeout 0
volume set: failed: Volume export does not exist
bash-4.4# gluster v set test md-cache-timeout 0
volume set: success
bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1 oflag=sync;stat 
/mnt/test/file.txt;tar -czvf /tmp/abc.gz /mnt/test/file.txt;stat 
/mnt/test/file.txt^C
bash-4.4# rm /mnt/test/file.txt
bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1 oflag=sync;stat 
/mnt/test/file.txt;tar -czvf /tmp/abc.gz /mnt/test/file.txt;stat 
/mnt/test/file.txt
1+0 records in
1+0 records out
512 bytes copied, 0.00932571 s, 54.9 kB/s
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 9949244856126716752  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:02.75600 +
Modify: 2018-07-13 17:55:02.76400 +
Change: 2018-07-13 17:55:02.76800 +
Birth: -
tar: Removing leading `/' from member names
/mnt/test/file.txt
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 9949244856126716752  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:02.77600 +
Modify: 2018-07-13 17:55:02.76400 +
Change: 2018-07-13 17:55:02.76800 +
Birth: -
bash-4.4# gluster v set test md-cache-timeout 1
volume set: success
bash-4.4# rm /mnt/test/file.txt
bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1 oflag=sync;stat 
/mnt/test/file.txt;tar -czvf /tmp/abc.gz /mnt/test/file.txt;stat 
/mnt/test/file.txt
1+0 records in
1+0 records out
512 bytes copied, 0.0107589 s, 47.6 kB/s
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 13569976446871695205  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:11.54800 +
Modify: 2018-07-13 17:55:11.56000 +
Change: 2018-07-13 17:55:11.56000 +
Birth: -
tar: Removing leading `/' from member names
/mnt/test/file.txt
tar: /mnt/test/file.txt: file changed as we read it
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 13569976446871695205  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:11.58000 +
Modify: 2018-07-13 17:55:11.56000 +
Change: 2018-07-13 17:55:11.56400 +
Birth: -


Best Regards,
Georg

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-20 Thread Lian, George (NSB - CN/Hangzhou)
Hi,

Sorry, there still seems to be an issue.

We used the Linux “dd” tool instead of my demo program, and if the file does not 
exist before dd, the issue is still there.

The test command is
rm -rf /mnt/test/file.txt ; dd if=/dev/zero of=/mnt/test/file.txt bs=512 
count=1 oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz

1) If we set md-cache-timeout to 0, the issue does not happen.

2) If we set md-cache-timeout to 1, the issue is 100% reproducible! (with the 
new patch you mentioned in the mail)


Please see detail test result as the below:

bash-4.4# gluster v set export md-cache-timeout 0
volume set: failed: Volume export does not exist
bash-4.4# gluster v set test md-cache-timeout 0
volume set: success
bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1 oflag=sync;stat 
/mnt/test/file.txt;tar -czvf /tmp/abc.gz /mnt/test/file.txt;stat 
/mnt/test/file.txt^C
bash-4.4# rm /mnt/test/file.txt
bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1 oflag=sync;stat 
/mnt/test/file.txt;tar -czvf /tmp/abc.gz /mnt/test/file.txt;stat 
/mnt/test/file.txt
1+0 records in
1+0 records out
512 bytes copied, 0.00932571 s, 54.9 kB/s
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 9949244856126716752  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:02.75600 +
Modify: 2018-07-13 17:55:02.76400 +
Change: 2018-07-13 17:55:02.76800 +
Birth: -
tar: Removing leading `/' from member names
/mnt/test/file.txt
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 9949244856126716752  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:02.77600 +
Modify: 2018-07-13 17:55:02.76400 +
Change: 2018-07-13 17:55:02.76800 +
Birth: -
bash-4.4# gluster v set test md-cache-timeout 1
volume set: success
bash-4.4# rm /mnt/test/file.txt
bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1 oflag=sync;stat 
/mnt/test/file.txt;tar -czvf /tmp/abc.gz /mnt/test/file.txt;stat 
/mnt/test/file.txt
1+0 records in
1+0 records out
512 bytes copied, 0.0107589 s, 47.6 kB/s
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 13569976446871695205  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:11.54800 +
Modify: 2018-07-13 17:55:11.56000 +
Change: 2018-07-13 17:55:11.56000 +
Birth: -
tar: Removing leading `/' from member names
/mnt/test/file.txt
tar: /mnt/test/file.txt: file changed as we read it
  File: /mnt/test/file.txt
  Size: 512 Blocks: 1  IO Block: 131072 regular file
Device: 33h/51d Inode: 13569976446871695205  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2018-07-13 17:55:11.58000 +
Modify: 2018-07-13 17:55:11.56000 +
Change: 2018-07-13 17:55:11.56400 +
Birth: -


Best Regards,
George
From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Raghavendra Gowdappa
Sent: Friday, July 20, 2018 4:01 PM
To: Lian, George (NSB - CN/Hangzhou) 
Cc: Zhang, Bingxuan (NSB - CN/Hangzhou) ; 
Raghavendra G ; Gluster-devel@gluster.org
Subject: Re: [Gluster-devel] The ctime of fstat is not correct which lead to 
"tar" utility error



On Fri, Jul 20, 2018 at 1:22 PM, Lian, George (NSB - CN/Hangzhou) 
mailto:george.l...@nokia-sbell.com>> wrote:
>>>We recently identified an issue with stat-prefetch. Fix can be found at:
>>>https://review.gluster.org/#/c/20410/11

>>>Can you let us know whether this helps?


The patch resolves this issue; I have verified it on Gluster 4.2 (master/trunk 
branch) and Gluster 3.12.3!

Thanks we'll merge it.


Thanks & Best Regards,
George

From: 
gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> 
[mailto:gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org>]
 On Behalf Of Raghavendra Gowdappa
Sent: Thursday, July 19, 2018 5:06 PM
To: Lian, George (NSB - CN/Hangzhou) 
mailto:george.l...@nokia-sbell.com>>
Cc: Zhang, Bingxuan (NSB - CN/Hangzhou) 
mailto:bingxuan.zh...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Raghavendra G 
mailto:raghaven...@gluster.com>>
Subject: Re: [Gluster-devel] The ctime of fstat is not correct which lead to 
"tar" utility error



On Thu, Jul 19, 2018 at 2:29 PM, Lian, George (NSB - CN/Hangzhou) 
mailto:george.l...@nokia-sbell.com>> wrote:
Hi, Gluster Experts,

In glusterfs version 3.12.3, There seems a “fstat” issue for ctime after we use 
fsync,
We have a demo execute binary which write some data and then do fs

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-20 Thread Lian, George (NSB - CN/Hangzhou)
>>>We recently identified an issue with stat-prefetch. Fix can be found at:
>>>https://review.gluster.org/#/c/20410/11

>>>Can you let us know whether this helps?


The patch resolves this issue; I have verified it on Gluster 4.2 (master/trunk 
branch) and Gluster 3.12.3!

Thanks & Best Regards,
George

From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Raghavendra Gowdappa
Sent: Thursday, July 19, 2018 5:06 PM
To: Lian, George (NSB - CN/Hangzhou) 
Cc: Zhang, Bingxuan (NSB - CN/Hangzhou) ; 
Gluster-devel@gluster.org; Raghavendra G 
Subject: Re: [Gluster-devel] The ctime of fstat is not correct which lead to 
"tar" utility error



On Thu, Jul 19, 2018 at 2:29 PM, Lian, George (NSB - CN/Hangzhou) 
mailto:george.l...@nokia-sbell.com>> wrote:
Hi, Gluster Experts,

In glusterfs version 3.12.3, there seems to be an “fstat” issue with ctime after 
we use fsync.
We have a demo executable, named “tt”, which writes some data and then does 
fsync on that file.
If we run the tar command right after the “tt” command, it always fails with “tar: 
/mnt/test/file1.txt: file changed as we read it”.

The command output is listed below; the source code and volume info 
configuration are attached FYI.
This issue is 100% reproducible! (/mnt/test is the mountpoint of the glusterfs 
volume “test”, whose volume info is attached to this mail.)
--
./tt;tar -czvf /tmp/abc.gz /mnt/test/file1.txt
mtime:1531247107.27200
ctime:1531247107.27200
tar: Removing leading `/' from member names
/mnt/test/file1.txt
tar: /mnt/test/file1.txt: file changed as we read it
--

After my investigation: the xattrop for the changelog happens later than the fsync 
response, that is,
the function “afr_fsync_cbk” calls afr_delayed_changelog_wake_resume (this, 
local->fd, stub);

In our case there is always a pending changelog, so glusterfs saves the metadata 
information into the stub and handles the pending changelog first.
But the changelog will also change the ctime; from the packets captured by 
tcpdump, the response packet of the xattrop does not include the metadata 
information, and the wake_resume does not handle this metadata-changed case either.

So in this case the metadata in the md-cache is not right, and while the cache is 
valid the application will get WRONG metadata!

To verify my guess, if I change the configuration for this volume with
“gluster v set test md-cache-timeout 0” or
“gluster v set export stat-prefetch off”,
this issue is GONE!

We recently identified an issue with stat-prefetch. Fix can be found at:
https://review.gluster.org/#/c/20410/11

Can you let us know whether this helps?



Then I restored the configuration to the defaults, which means stat-prefetch is on 
and md-cache-timeout is 1 second,
and tried invalidating the md-cache in the source code as shown below in function 
mdc_fsync_cbk in md-cache.c.
The issue is also GONE!

So, GlusterFS experts,
could you please verify this issue and share your comments on my investigation?
Your final solution would be highly appreciated!

Does the following fix you've posted solve the problem?


changes in function “mdc_fsync_cbk”
int
mdc_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
   int32_t op_ret, int32_t op_errno,
   struct iatt *prebuf, struct iatt *postbuf, dict_t *xdata)
{
mdc_local_t  *local = NULL;

local = frame->local;

if (op_ret != 0)
goto out;

if (!local)
goto out;

mdc_inode_iatt_set_validate(this, local->fd->inode, prebuf, postbuf,
 _gf_true);
/* new added for ctime issue*/
mdc_inode_iatt_invalidate(this, local->fd->inode);
/* new added end*/
out:
MDC_STACK_UNWIND (fsync, frame, op_ret, op_errno, prebuf, postbuf,
  xdata);

return 0;
}
-
Best Regards,
George

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-19 Thread Lian, George (NSB - CN/Hangzhou)
Hi, Gluster Experts,

In glusterfs version 3.12.3, there seems to be an “fstat” issue with ctime after 
we use fsync.
We have a demo executable, named “tt”, which writes some data and then does 
fsync on that file.
If we run the tar command right after the “tt” command, it always fails with “tar: 
/mnt/test/file1.txt: file changed as we read it”.

The command output is listed below; the source code and volume info 
configuration are attached FYI.
This issue is 100% reproducible! (/mnt/test is the mountpoint of the glusterfs 
volume “test”, whose volume info is attached to this mail.)
--
./tt;tar -czvf /tmp/abc.gz /mnt/test/file1.txt
mtime:1531247107.27200
ctime:1531247107.27200
tar: Removing leading `/' from member names
/mnt/test/file1.txt
tar: /mnt/test/file1.txt: file changed as we read it
--

After my investigation: the xattrop for the changelog happens later than the fsync 
response, that is,
the function “afr_fsync_cbk” calls afr_delayed_changelog_wake_resume (this, 
local->fd, stub);

In our case there is always a pending changelog, so glusterfs saves the metadata 
information into the stub and handles the pending changelog first.
But the changelog will also change the ctime; from the packets captured by 
tcpdump, the response packet of the xattrop does not include the metadata 
information, and the wake_resume does not handle this metadata-changed case either.

So in this case the metadata in the md-cache is not right, and while the cache is 
valid the application will get WRONG metadata!

To verify my guess, if I change the configuration for this volume with
“gluster v set test md-cache-timeout 0” or
“gluster v set export stat-prefetch off”,
this issue is GONE!


Then I restored the configuration to the defaults, which means stat-prefetch is on 
and md-cache-timeout is 1 second,
and tried invalidating the md-cache in the source code as shown below in function 
mdc_fsync_cbk in md-cache.c.
The issue is also GONE!

So, GlusterFS experts,
could you please verify this issue and share your comments on my investigation?
Your final solution would be highly appreciated!

changes in function “mdc_fsync_cbk”
int
mdc_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
   int32_t op_ret, int32_t op_errno,
   struct iatt *prebuf, struct iatt *postbuf, dict_t *xdata)
{
mdc_local_t  *local = NULL;

local = frame->local;

if (op_ret != 0)
goto out;

if (!local)
goto out;

mdc_inode_iatt_set_validate(this, local->fd->inode, prebuf, postbuf,
 _gf_true);
/* new added for ctime issue*/
mdc_inode_iatt_invalidate(this, local->fd->inode);
/* new added end*/
out:
MDC_STACK_UNWIND (fsync, frame, op_ret, op_errno, prebuf, postbuf,
  xdata);

return 0;
}
-
Best Regards,
George

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

void main() {
    char* fileName = "/mnt/test/file1.txt";
    char buf[128];
    struct stat st;
    struct timeval tv_begin, tv_end;

    // create and write a file, then fflush and fsync
    FILE* stream = fopen(fileName, "w");
    fwrite("0123456789", sizeof(char), 10, stream);
    fflush(stream);
    fsync(fileno(stream));
    //fsync(stream);
    fclose(stream);

    // last file status change timestamp
    stat(fileName, &st);
    printf("mtime:%ld.%09ld\n", (long)st.st_mtim.tv_sec, (long)st.st_mtim.tv_nsec);
    printf("ctime:%ld.%09ld\n", (long)st.st_ctim.tv_sec, (long)st.st_ctim.tv_nsec);
}
bash-4.4# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: 9373eba9-eb84-4618-a54c-f2837345daec
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rcp:/trunk/brick/test1/sn0
Brick2: rcp:/trunk/brick/test1/sn1
Brick3: rcp:/trunk/brick/test1/sn2 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: none
cluster.quorum-reads: no
cluster.favorite-child-policy: mtime
diagnostics.client-log-level: INFO
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-24 Thread Lian, George (NSB - CN/Hangzhou)
Hi,

I suppose the zero-filled attr is a performance consideration for NFS, but for 
FUSE it leads to issues such as the hard LINK FOP failing.
So I suggest: could we add 2 attribute fields at the end of "struct iatt {", such as 
ia_fuse_nlink and ia_fuse_ctime,
save ia_nlink and ia_ctime into ia_fuse_nlink/ia_fuse_ctime in function 
gf_zero_fill_stat before setting them to zero,
and restore the valid nlink and ctime in function gf_fuse_stat2attr,
so that the kernel can get the correct nlink and ctime?

Is this a feasible solution? Are there any risks?
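A rough sketch of what I mean (illustration only; the two new fields are my 
suggestion, their exact types should follow the existing iatt fields, and this is 
not a tested patch):
--
struct iatt {
        /* ... existing fields stay unchanged ... */
        uint32_t ia_fuse_nlink;   /* proposed: saved copy of ia_nlink */
        uint32_t ia_fuse_ctime;   /* proposed: saved copy of ia_ctime */
};

void
gf_zero_fill_stat (struct iatt *buf)
{
        /* remember the real values before zeroing them out */
        buf->ia_fuse_nlink = buf->ia_nlink;
        buf->ia_fuse_ctime = buf->ia_ctime;
        buf->ia_nlink = 0;
        buf->ia_ctime = 0;
}

/* in gf_fuse_stat2attr (), before filling the fuse attr: */
if (gf_is_zero_filled_stat (buf)) {
        buf->ia_nlink = buf->ia_fuse_nlink;
        buf->ia_ctime = buf->ia_fuse_ctime;
}
--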

Please share your comments, thanks in advance!

Best Regards,
George

-Original Message-
From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Niels de Vos
Sent: Wednesday, January 24, 2018 7:43 PM
To: Pranith Kumar Karampuri <pkara...@redhat.com>
Cc: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>; Zhou, 
Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Li, Deqian (NSB - 
CN/Hangzhou) <deqian...@nokia-sbell.com>; Gluster-devel@gluster.org; Sun, Ping 
(NSB - CN/Hangzhou) <ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

On Wed, Jan 24, 2018 at 02:24:06PM +0530, Pranith Kumar Karampuri wrote:
> hi,
>In the same commit you mentioned earlier, there was this code
> earlier:
> -/* Returns 1 if the stat seems to be filled with zeroes. */
> -int
> -nfs_zero_filled_stat (struct iatt *buf)
> -{
> -        if (!buf)
> -                return 1;
> -
> -        /* Do not use st_dev because it is transformed to store the xlator id
> -         * in place of the device number. Do not use st_ino because by this time
> -         * we've already mapped the root ino to 1 so it is not guaranteed to be
> -         * 0.
> -         */
> -        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
> -                return 1;
> -
> -        return 0;
> -}
> -
> -
> 
> I moved this to a common library function that can be used in afr as well.
> Why was it there in NFS? +Niels for answering that question.

Sorry, I don't know why that was done. It was introduced with the initial gNFS 
implementation, long before I started to work with Gluster. The only reference 
I have is this from
xlators/nfs/server/src/nfs3-helpers.c:nfs3_stat_to_post_op_attr()

 371 /* Some performance translators return zero-filled stats when they
 372  * do not have up-to-date attributes. Need to handle this by not
 373  * returning these zeroed out attrs.
 374  */

This may not be true for the current situation anymore.

HTH,
Niels


> 
> If I give you a patch which will assert the error condition, would it 
> be possible for you to figure out the first xlator which is unwinding 
> the iatt with nlink count as zero but ctime as non-zero?
> 
> On Wed, Jan 24, 2018 at 1:03 PM, Lian, George (NSB - CN/Hangzhou) < 
> george.l...@nokia-sbell.com> wrote:
> 
> > Hi,  Pranith Kumar,
> >
> >
> >
> > Can you tell me while need set buf->ia_nlink to “0”in function 
> > gf_zero_fill_stat(), which API or Application will care it?
> >
> > If I remove this line and also update corresponding in function 
> > gf_is_zero_filled_stat,
> >
> > The issue seems gone, but I can’t confirm will lead to other issues.
> >
> >
> >
> > So could you please double check it and give your comments?
> >
> >
> >
> > My change is as the below:
> >
> >
> >
> > gf_boolean_t
> >
> > gf_is_zero_filled_stat (struct iatt *buf)
> >
> > {
> >
> > if (!buf)
> >
> > return 1;
> >
> >
> >
> > /* Do not use st_dev because it is transformed to store the 
> > xlator id
> >
> >  * in place of the device number. Do not use st_ino because 
> > by this time
> >
> >  * we've already mapped the root ino to 1 so it is not 
> > guaranteed to be
> >
> >      * 0.
> >
> >  */
> >
> > //if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
> >
> > if (buf->ia_ctime == 0)
> >
> > return 1;
> >
> >
> >
> > return 0;
> >
> > }
> >
> >
> >
> > void
> >
> > gf_zero_fill_stat (struct iatt *buf)
> >
> > {
> >
> > //   buf->ia_nlink = 0;
> >
> > buf->ia_ctime = 0;
> >
> > }
> >
> >
> >
> > Thanks & Best Regards
> >
> > George
>

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-24 Thread Lian, George (NSB - CN/Hangzhou)
Hi,  Pranith Kumar,

Can you tell me why buf->ia_nlink needs to be set to “0” in function 
gf_zero_fill_stat(), and which API or application cares about it?
If I remove this line and also update function gf_is_zero_filled_stat 
correspondingly,
the issue seems to be gone, but I can't confirm whether it will lead to other issues.

So could you please double check it and give your comments?

My change is as the below:

gf_boolean_t
gf_is_zero_filled_stat (struct iatt *buf)
{
        if (!buf)
                return 1;

        /* Do not use st_dev because it is transformed to store the xlator id
         * in place of the device number. Do not use st_ino because by this time
         * we've already mapped the root ino to 1 so it is not guaranteed to be
         * 0.
         */
        //if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
        if (buf->ia_ctime == 0)
                return 1;

        return 0;
}

void
gf_zero_fill_stat (struct iatt *buf)
{
//        buf->ia_nlink = 0;
        buf->ia_ctime = 0;
}

Thanks & Best Regards
George
From: Lian, George (NSB - CN/Hangzhou)
Sent: Friday, January 19, 2018 10:03 AM
To: Pranith Kumar Karampuri <pkara...@redhat.com>; Zhou, Cynthia (NSB - 
CN/Hangzhou) <cynthia.z...@nokia-sbell.com>
Cc: Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; 
Gluster-devel@gluster.org; Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com>
Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

Hi,
>>> Cool, this works for me too. Send me a mail off-list once you are available 
>>> and we can figure out a way to get into a call and work on this.

Have you reproduced the issue per the steps I listed in 
https://bugzilla.redhat.com/show_bug.cgi?id=1531457 and in my last mail?

If not, I would like you to try it yourself; the difference between 
your setup and mine is just that I create only 2 bricks instead of 6.

And Cynthia could have a session with you, if needed, when I am not available 
next Monday and Tuesday.

Thanks & Best Regards,
George

From: 
gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 6:03 PM
To: Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Li, Deqian 
(NSB - CN/Hangzhou) 
<deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Sun, Ping (NSB - 
CN/Hangzhou) <ping@nokia-sbell.com<mailto:ping@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi,
>>>I actually tried it with replica-2 and replica-3 and then distributed 
>>>replica-2 before replying to the earlier mail. We can have a debugging 
>>>session if you are okay with it.

It is fine if you can't reproduce the issue in your environment.
I have attached the detailed reproduction log to the Bugzilla entry FYI.

But I am sorry, I may be OOO on Monday and Tuesday next week, so a debug session 
next Wednesday will be fine for me.

Cool, this works for me too. Send me a mail off-list once you are available and 
we can figure out a way to get into a call and work on this.



Paste the detail reproduce log FYI here:
root@ubuntu:~# gluster peer probe ubuntu
peer probe: success. Probe on localhost not needed
root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 
ubuntu:/home/gfs/b2 force
volume create: test: success: please start the volume to access data
root@ubuntu:~# gluster v start test
volume start: test: success
root@ubuntu:~# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ubuntu:/home/gfs/b1
Brick2: ubuntu:/home/gfs/b2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@ubuntu:~# gluster v status
Status of volume: test
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick ubuntu:/home/gfs/b1   49152 0  Y   7798
Brick ubuntu:/home/gfs/b2   49153 0  Y   7818
Self-heal Daemon on localhost   N/A   N/AY   7839

Task Status of Volume test
---

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-24 Thread Lian, George (NSB - CN/Hangzhou)
So I suppose ctime is enough to decide whether an iatt is good or not.
Why do we also include ia_nlink in functions gf_zero_fill_stat and 
gf_is_zero_filled_stat?

From my investigation, if ia_nlink is set to 0 and the kernel reads the attributes 
under RCU, the kernel checks the nlink field; when doing a LINK operation this 
leads to a “file does not exist” error.

if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
 error =  -ENOENT;


Best Regards,
George

From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 24, 2018 4:15 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; 
Gluster-devel@gluster.org; Li, Deqian (NSB - CN/Hangzhou) 
<deqian...@nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

If ctime is zero, no xlator should consider it a good iatt. The fact that 
this is happening means some xlator is not doing proper checks in its code. We need 
to find which xlator that is and fix it. The internet in our new office is not 
working, so I'm not able to have a call with you today. What I would do is 
put logs in the lookup, link, fstat and stat calls to see if anyone unwound an 
iatt with the ia_nlink count as zero but ctime as non-zero.

On 24 Jan 2018 1:03 pm, "Lian, George (NSB - CN/Hangzhou)" 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi,  Pranith Kumar,

Can you tell me while need set buf->ia_nlink to “0”in function 
gf_zero_fill_stat(), which API or Application will care it?
If I remove this line and also update corresponding in function 
gf_is_zero_filled_stat,
The issue seems gone, but I can’t confirm will lead to other issues.

So could you please double check it and give your comments?

My change is as the below:

gf_boolean_t
gf_is_zero_filled_stat (struct iatt *buf)
{
if (!buf)
return 1;

/* Do not use st_dev because it is transformed to store the xlator id
 * in place of the device number. Do not use st_ino because by this time
 * we've already mapped the root ino to 1 so it is not guaranteed to be
 * 0.
 */
//if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
if (buf->ia_ctime == 0)
return 1;

return 0;
}

void
gf_zero_fill_stat (struct iatt *buf)
{
//   buf->ia_nlink = 0;
buf->ia_ctime = 0;
}

Thanks & Best Regards
George
From: Lian, George (NSB - CN/Hangzhou)
Sent: Friday, January 19, 2018 10:03 AM
To: Pranith Kumar Karampuri <pkara...@redhat.com<mailto:pkara...@redhat.com>>; 
Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>
Cc: Li, Deqian (NSB - CN/Hangzhou) 
<deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Sun, Ping (NSB - 
CN/Hangzhou) <ping@nokia-sbell.com<mailto:ping@nokia-sbell.com>>

Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

Hi,
>>> Cool, this works for me too. Send me a mail off-list once you are available 
>>> and we can figure out a way to get into a call and work on this.

Have you reproduced the issue per the step I listed in 
https://bugzilla.redhat.com/show_bug.cgi?id=1531457 and last mail?

If not, I would like you could try it yourself , which the difference between 
yours and mine is just create only 2 bricks instead of 6 bricks.

And Cynthia could have a session with you if you needed when I am not available 
in next Monday and Tuesday.

Thanks & Best Regards,
George

From: 
gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 6:03 PM
To: Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Li, Deqian 
(NSB - CN/Hangzhou) 
<deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Sun, Ping (NSB - 
CN/Hangzhou) <ping@nokia-sbell.com<mailto:ping@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:geor

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-18 Thread Lian, George (NSB - CN/Hangzhou)
Hi,
>>> Cool, this works for me too. Send me a mail off-list once you are available 
>>> and we can figure out a way to get into a call and work on this.

Have you reproduced the issue per the steps I listed in 
https://bugzilla.redhat.com/show_bug.cgi?id=1531457 and in my last mail?

If not, I would like you to try it yourself; the difference between 
your setup and mine is just that I create only 2 bricks instead of 6.

And Cynthia could have a session with you, if needed, when I am not available 
next Monday and Tuesday.

Thanks & Best Regards,
George

From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 6:03 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Li, 
Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; 
Gluster-devel@gluster.org; Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi,
>>>I actually tried it with replica-2 and replica-3 and then distributed 
>>>replica-2 before replying to the earlier mail. We can have a debugging 
>>>session if you are okay with it.

It is fine if you can't reproduce the issue in your environment.
I have attached the detailed reproduction log to the Bugzilla entry FYI.

But I am sorry, I may be OOO on Monday and Tuesday next week, so a debug session 
next Wednesday will be fine for me.

Cool, this works for me too. Send me a mail off-list once you are available and 
we can figure out a way to get into a call and work on this.



Paste the detail reproduce log FYI here:
root@ubuntu:~# gluster peer probe ubuntu
peer probe: success. Probe on localhost not needed
root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 
ubuntu:/home/gfs/b2 force
volume create: test: success: please start the volume to access data
root@ubuntu:~# gluster v start test
volume start: test: success
root@ubuntu:~# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ubuntu:/home/gfs/b1
Brick2: ubuntu:/home/gfs/b2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@ubuntu:~# gluster v status
Status of volume: test
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick ubuntu:/home/gfs/b1   49152 0  Y   7798
Brick ubuntu:/home/gfs/b2   49153 0  Y   7818
Self-heal Daemon on localhost   N/A   N/AY   7839

Task Status of Volume test
--
There are no active volume tasks


root@ubuntu:~# gluster v set test cluster.consistent-metadata on
volume set: success

root@ubuntu:~# ls /mnt/test
ls: cannot access '/mnt/test': No such file or directory
root@ubuntu:~# mkdir -p /mnt/test
root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test

root@ubuntu:~# cd /mnt/test
root@ubuntu:/mnt/test# echo "abc">aaa
root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc

root@ubuntu:/mnt/test# kill -9 7818
root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee
link: cannot create link 'eee' to 'ddd': No such file or directory


Best Regards,
George

From: 
gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> 
[mailto:gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org>]
 On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 2:40 PM

To: Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Li, Deqian (NSB - 
CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com<mailto:ping@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi,
I suppose the brick numbers in your testing is six, and you just shut down the 
3 process.
When I reproduce the is

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-17 Thread Lian, George (NSB - CN/Hangzhou)
Hi,
I suppose the number of bricks in your testing is six, and you just shut down 
3 of the processes.
When I reproduce the issue, I create a replicate volume with only 2 bricks, 
let only ONE brick keep working, and set cluster.consistent-metadata on.
With these 2 test conditions, the issue is 100% reproducible.



16:44:28 :) ⚡ gluster v status
Status of volume: r2
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick localhost.localdomain:/home/gfs/r2_0  49152 0  Y   5309
Brick localhost.localdomain:/home/gfs/r2_1  49154 0  Y   5330
Brick localhost.localdomain:/home/gfs/r2_2  49156 0  Y   5351
Brick localhost.localdomain:/home/gfs/r2_3  49158 0  Y   5372
Brick localhost.localdomain:/home/gfs/r2_4  49159 0  Y   5393
Brick localhost.localdomain:/home/gfs/r2_5  49160 0  Y   5414
Self-heal Daemon on localhost   N/A   N/AY   5436

Task Status of Volume r2
--
There are no active volume tasks

root@dhcp35-190 - ~
16:44:38 :) ⚡ kill -9 5309 5351 5393

Best Regards,
George
From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 17, 2018 7:27 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
Cc: Li, Deqian (NSB - CN/Hangzhou) <deqian...@nokia-sbell.com>; 
Gluster-devel@gluster.org; Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri 
<pkara...@redhat.com<mailto:pkara...@redhat.com>> wrote:


On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi,

Have you reproduced this issue? If yes, could you please confirm whether it is 
an issue or not?

Hi,
   I tried recreating this on my laptop and on both master and 3.12 and I 
am not able to recreate the issue :-(.
Here is the execution log: 
https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
Since I was doing this on my laptop, I changed shutting down of the replica to 
killing the brick process to simulate this test.
Let me know if I missed something.


Sorry, I am held up with some issue at work, so I think I will get some time 
day after tomorrow to look at this. In the mean time I am adding more people 
who know about afr to see if they get a chance to work on this before me.


And if it is an issue,  do you have any solution for this issue?

Thanks & Best Regards,
George

From: Lian, George (NSB - CN/Hangzhou)
Sent: Thursday, January 11, 2018 2:01 PM
To: Pranith Kumar Karampuri <pkara...@redhat.com<mailto:pkara...@redhat.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Li, Deqian (NSB - 
CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com<mailto:ping@nokia-sbell.com>>
Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

Hi,

Please see detail test step on 
https://bugzilla.redhat.com/show_bug.cgi?id=1531457

How reproducible:


Steps to Reproduce:
1.create a volume name "test" with replicated
2.set volume option cluster.consistent-metadata with on:
  gluster v set test cluster.consistent-metadata on
3. mount volume test on client on /mnt/test
4. create a file aaa size more than 1 byte
   echo "1234567890" >/mnt/test/aaa
5. shutdown a replicat node, let's say sn-1, only let sn-0 worked
6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc


BRs
George

From: 
gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 11, 2018 12:39 PM
To: Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Li, Deqian (NSB - 
CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com<mailto:ping....@nokia-sbell.com>>
Subject: Re: [Gluster-devel]

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-15 Thread Lian, George (NSB - CN/Hangzhou)
Hi,

Have you reproduced this issue? If yes, could you please confirm whether it is 
an issue or not?

And if it is an issue, do you have any solution for it?

Thanks & Best Regards,
George

From: Lian, George (NSB - CN/Hangzhou)
Sent: Thursday, January 11, 2018 2:01 PM
To: Pranith Kumar Karampuri <pkara...@redhat.com>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; 
Gluster-devel@gluster.org; Li, Deqian (NSB - CN/Hangzhou) 
<deqian...@nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com>
Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

Hi,

Please see the detailed test steps at 
https://bugzilla.redhat.com/show_bug.cgi?id=1531457

How reproducible:


Steps to Reproduce:
1.create a volume name "test" with replicated
2.set volume option cluster.consistent-metadata with on:
  gluster v set test cluster.consistent-metadata on
3. mount volume test on client on /mnt/test
4. create a file aaa size more than 1 byte
   echo "1234567890" >/mnt/test/aaa
5. shutdown a replicat node, let's say sn-1, only let sn-0 worked
6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc


BRs
George

From: 
gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 11, 2018 12:39 PM
To: Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Li, Deqian (NSB - 
CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com<mailto:ping@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi,
>>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
It is fuse, within mountpoint by “mount -t glusterfs  …“ command.

Could you let me know the test you did so that I can try to re-create and see 
what exactly is going on?
Configuration of the volume and the steps to re-create the issue you are seeing 
would be helpful in debugging the issue further.


Thanks & Best Regards,
George

From: 
gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org> 
[mailto:gluster-devel-boun...@gluster.org<mailto:gluster-devel-boun...@gluster.org>]
 On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 10, 2018 8:08 PM
To: Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) 
<cynthia.z...@nokia-sbell.com<mailto:cynthia.z...@nokia-sbell.com>>; Zhong, Hua 
(NSB - CN/Hangzhou) 
<hua.zh...@nokia-sbell.com<mailto:hua.zh...@nokia-sbell.com>>; Li, Deqian (NSB 
- CN/Hangzhou) <deqian...@nokia-sbell.com<mailto:deqian...@nokia-sbell.com>>; 
Gluster-devel@gluster.org<mailto:Gluster-devel@gluster.org>; Sun, Ping (NSB - 
CN/Hangzhou) <ping....@nokia-sbell.com<mailto:ping@nokia-sbell.com>>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi, Pranith Kumar,

I has create a bug on Bugzilla 
https://bugzilla.redhat.com/show_bug.cgi?id=1531457
After my investigation for this link issue, I suppose your changes on 
afr-dir-write.c with issue " Don't let NFS cache stat after writes" , your fix 
is like:
--
   if (afr_txn_nothing_failed (frame, this)) {
/*if it did pre-op, it will do post-op changing ctime*/
if (priv->consistent_metadata &&
afr_needs_changelog_update (local))
afr_zero_fill_stat (local);
local->transaction.unwind (frame, this);
}
In the above fix, it set the ia_nlink to ‘0’ if option consistent-metadata is 
set to “on”.
And hard link a file with which just created will lead to an error, and the 
error is caused in kernel function “vfs_link”:
if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
 error =  -ENOENT;

could you please have a check and give some comments here?

When stat i

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-11 Thread Lian, George (NSB - CN/Hangzhou)
Hi,
>>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
It is FUSE, within a mountpoint created by the “mount -t glusterfs …” command.

Thanks & Best Regards,
George

From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 10, 2018 8:08 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Zhong, 
Hua (NSB - CN/Hangzhou) <hua.zh...@nokia-sbell.com>; Li, Deqian (NSB - 
CN/Hangzhou) <deqian...@nokia-sbell.com>; Gluster-devel@gluster.org; Sun, Ping 
(NSB - CN/Hangzhou) <ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) 
<george.l...@nokia-sbell.com<mailto:george.l...@nokia-sbell.com>> wrote:
Hi, Pranith Kumar,

I have created a bug on Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1531457
After my investigation of this link issue, I suppose it comes from your change to 
afr-dir-write.c for the issue " Don't let NFS cache stat after writes"; your fix 
looks like:
--
        if (afr_txn_nothing_failed (frame, this)) {
                /*if it did pre-op, it will do post-op changing ctime*/
                if (priv->consistent_metadata &&
                    afr_needs_changelog_update (local))
                        afr_zero_fill_stat (local);
                local->transaction.unwind (frame, this);
        }
In the above fix, ia_nlink is set to ‘0’ if the option consistent-metadata is 
set to “on”.
Hard-linking a file which was just created then leads to an error, and the 
error is caused in the kernel function “vfs_link”:
if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
 error =  -ENOENT;

could you please have a check and give some comments here?

When the stat is "zero filled", the understanding is that the higher-layer protocol 
doesn't send the stat value to the kernel, and a separate lookup is sent by the 
kernel to get the latest stat value. In which protocol are you seeing this 
issue? Fuse/NFS/SMB?


Thanks & Best Regards,
George



--
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-11 Thread Lian, George (NSB - CN/Hangzhou)
Hi,

Please see detail test step on 
https://bugzilla.redhat.com/show_bug.cgi?id=1531457

How reproducible:


Steps to Reproduce:
1. Create a replicated volume named "test".
2. Set the volume option cluster.consistent-metadata to on:
   gluster v set test cluster.consistent-metadata on
3. Mount the volume test on a client at /mnt/test.
4. Create a file aaa with a size of more than 1 byte:
   echo "1234567890" >/mnt/test/aaa
5. Shut down one replica node, say sn-1, so that only sn-0 is still working.
6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc
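
As an illustration only (not part of the original report), step 6 can be reduced
to a small C program run against the mount; the /mnt/test paths below are the
hypothetical ones from the steps above, and link() is the call reported to fail:

/* Sketch of step 6: create a file on the (assumed) /mnt/test mount,
 * then immediately hard-link it and report any error from link(2). */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *src = "/mnt/test/bbb";   /* hypothetical paths from the steps */
    const char *dst = "/mnt/test/ccc";

    int fd = open(src, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, "1234567890\n", 11) != 11)
        perror("write");
    close(fd);

    /* On the affected setup this is where ENOENT was reportedly returned. */
    if (link(src, dst) != 0)
        fprintf(stderr, "link(%s, %s): %s\n", src, dst, strerror(errno));
    else
        printf("link succeeded\n");
    return 0;
}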


BRs
George

From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 11, 2018 12:39 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; 
Gluster-devel@gluster.org; Li, Deqian (NSB - CN/Hangzhou) 
<deqian...@nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) 
<ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou)
<george.l...@nokia-sbell.com> wrote:
Hi,
>>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
It is FUSE, within a mountpoint created by the "mount -t glusterfs …" command.

Could you let me know the test you did so that I can try to re-create and see 
what exactly is going on?
Configuration of the volume and the steps to re-create the issue you are seeing 
would be helpful in debugging the issue further.


Thanks & Best Regards,
George

From: gluster-devel-boun...@gluster.org
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 10, 2018 8:08 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Zhong,
Hua (NSB - CN/Hangzhou) <hua.zh...@nokia-sbell.com>; Li, Deqian (NSB -
CN/Hangzhou) <deqian...@nokia-sbell.com>; Gluster-devel@gluster.org; Sun, Ping
(NSB - CN/Hangzhou) <ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou)
<george.l...@nokia-sbell.com> wrote:
Hi, Pranith Kumar,

I have created a bug on Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1531457
After my investigation of this link issue, I suppose it was introduced by your
change to afr-dir-write.c for the issue "Don't let NFS cache stat after writes";
the fix looks like:
--
        if (afr_txn_nothing_failed (frame, this)) {
                /* if it did pre-op, it will do post-op changing ctime */
                if (priv->consistent_metadata &&
                    afr_needs_changelog_update (local))
                        afr_zero_fill_stat (local);
                local->transaction.unwind (frame, this);
        }
In the above fix, ia_nlink is set to '0' when the option consistent-metadata is
set to "on".
Hard-linking a file that has just been created then fails with an error, and the
error comes from the kernel function "vfs_link":
        if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
                error = -ENOENT;

Could you please take a look and give some comments here?

When stat is "zero filled", understanding is that the higher layer protocol 
doesn't send stat value to the kernel and a separate lookup is sent by the 
kernel to get the latest stat value. In which protocol are you seeing this 
issue? Fuse/NFS/SMB?


Thanks & Best Regards,
George



--
Pranith



--
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Recall: a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-11 Thread Lian, George (NSB - CN/Hangzhou)
Lian, George (NSB - CN/Hangzhou) would like to recall the message, 
"[Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS 
cache stat after writes"".
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-11 Thread Lian, George (NSB - CN/Hangzhou)
Hi, Pranith Kumar,

I have created a bug on Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1531457
After my investigation of this link issue, I suppose it was introduced by your
change to afr-dir-write.c for the issue "Don't let NFS cache stat after writes";
the fix looks like:
--
        if (afr_txn_nothing_failed (frame, this)) {
                /* if it did pre-op, it will do post-op changing ctime */
                if (priv->consistent_metadata &&
                    afr_needs_changelog_update (local))
                        afr_zero_fill_stat (local);
                local->transaction.unwind (frame, this);
        }
In the above fix, ia_nlink is set to '0' when the option consistent-metadata is
set to "on".
Hard-linking a file that has just been created then fails with an error, and the
error comes from the kernel function "vfs_link":
        if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
                error = -ENOENT;

Could you please take a look and give some comments here?

Thanks & Best Regards,
George
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

2018-01-11 Thread Lian, George (NSB - CN/Hangzhou)
Hi,
>>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
It is FUSE, within a mountpoint created by the "mount -t glusterfs …" command.

Thanks & Best Regards,
George

From: gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 10, 2018 8:08 PM
To: Lian, George (NSB - CN/Hangzhou) <george.l...@nokia-sbell.com>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>; Zhong, 
Hua (NSB - CN/Hangzhou) <hua.zh...@nokia-sbell.com>; Li, Deqian (NSB - 
CN/Hangzhou) <deqian...@nokia-sbell.com>; Gluster-devel@gluster.org; Sun, Ping 
(NSB - CN/Hangzhou) <ping@nokia-sbell.com>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou)
<george.l...@nokia-sbell.com> wrote:
Hi, Pranith Kumar,

I have created a bug on Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1531457
After my investigation of this link issue, I suppose it was introduced by your
change to afr-dir-write.c for the issue "Don't let NFS cache stat after writes";
the fix looks like:
--
        if (afr_txn_nothing_failed (frame, this)) {
                /* if it did pre-op, it will do post-op changing ctime */
                if (priv->consistent_metadata &&
                    afr_needs_changelog_update (local))
                        afr_zero_fill_stat (local);
                local->transaction.unwind (frame, this);
        }
In the above fix, ia_nlink is set to '0' when the option consistent-metadata is
set to "on".
Hard-linking a file that has just been created then fails with an error, and the
error comes from the kernel function "vfs_link":
        if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
                error = -ENOENT;

Could you please take a look and give some comments here?

When stat is "zero filled", understanding is that the higher layer protocol 
doesn't send stat value to the kernel and a separate lookup is sent by the 
kernel to get the latest stat value. In which protocol are you seeing this 
issue? Fuse/NFS/SMB?


Thanks & Best Regards,
George



--
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Sometimes read failed just after write before file closed when try 100 times in different new files

2017-12-28 Thread Lian, George (NSB - CN/Hangzhou)
Hi, Gluster Expert,

Sorry for the wrong code "open(argv[1], O_CREAT|O_RDWR|S_IRUSR|S_IWUSR)" in the
previous mail; after correcting it, the issue seems to be GONE.
But I am still confused about why the wrong code does not fail on an EXT4 file system.
After checking the tcpdump packets on the network, it shows that during the CREATE
operation the mode of the file sometimes became "Mode: 0101000, S_ISVTX, Reserved".
With this mode the file can be seen with the "ls" command but cannot be read with the
"hexdump" command, even with root privileges, whereas running the same wrong program
on an ext4 file system never produces this mode.
So I suggest that Gluster could perhaps improve the handling of this mode, so that a
file created through the wrong API call does not end up inaccessible.

@Sami,
After checking the manual of the open API, the correct prototypes are:
  int open(const char *pathname, int flags);
  int open(const char *pathname, int flags, mode_t mode);
For your test code, I suppose the correct call should be
"open(argv[1], O_CREAT|O_RDWR, S_IRUSR|S_IWUSR)".
After I corrected it, both the C and C++ code PASS on 3.6 and 3.12.
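
For clarity, a small sketch (not the attached test code) contrasting the mistaken
call and the corrected call; argv[1] stands in for the test file path:

/* The first call OR's the permission bits into the flags argument (the
 * mistake in the original test), so no mode argument is supplied for
 * O_CREAT and the new file's permissions are effectively undefined; the
 * second passes the mode as the separate third argument, as open(2)
 * expects. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }

    /* Wrong: mode bits used as open flags, no mode argument supplied. */
    int bad_fd = open(argv[1], O_CREAT | O_RDWR | S_IRUSR | S_IWUSR);
    if (bad_fd >= 0)
        close(bad_fd);
    unlink(argv[1]);   /* discard the oddly-created file before retrying */

    /* Correct: mode supplied as the third argument. */
    int good_fd = open(argv[1], O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
    if (good_fd >= 0)
        close(good_fd);
    return 0;
}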

Best Regards,
George

From: Lian, George (NSB - CN/Hangzhou)
Sent: Wednesday, December 27, 2017 5:23 PM
To: Gluster-devel@gluster.org
Cc: Zhong, Hua (NSB - CN/Hangzhou) <hua.zh...@nokia-sbell.com>; Venetjoki, Sami 
(Nokia - FI/Espoo) <sami.venetj...@nokia.com>; Li, Deqian (NSB - CN/Hangzhou) 
<deqian...@nokia-sbell.com>
Subject: Sometimes read failed just after write before file closed when try 100 
times in different new files

Hi, Gluster expert,

I have just raised a bug, https://bugzilla.redhat.com/show_bug.cgi?id=1529237,
in the gluster community Bugzilla.

I have also attached the reproduction source code and shell script here FYI.

From my investigation, I suppose the issue may be caused by "the cache in the
client is invalid but the file has still not been persisted in the server brick".
But I can't find the root cause from the debug log attached in
https://bugzilla.redhat.com/show_bug.cgi?id=1529237.

The following client log was extracted from the attachment in Bugzilla (the read
phase was highlighted in YELLOW in the original mail); you can see that both
test-client-0 and test-client-1 fail there.
I suppose something abnormal happens in the read phase, but I can't find any
useful log in the server brick logs myself, so your help is highly appreciated!

[2017-12-27 06:21:04.332524] T [MSGID: 0] [afr-inode-read.c:286:afr_fstat_cbk] 
12-stack-trace: stack-address: 0x7fb5ec010680, test-replicate-0 returned 0
[2017-12-27 06:21:04.332724] T [MSGID: 0] [syncop.c:1715:syncop_fgetxattr] 
12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-dht to 
test-replicate-0
[2017-12-27 06:21:04.332768] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 
12-test-replicate-0: 70e0f7ee-2f0a-4364-a38e-96e7031eee7b: generation now vs 
cached: 2, 2
[2017-12-27 06:21:04.332791] T [MSGID: 0] 
[afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 
0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.332817] T [rpc-clnt.c:1496:rpc_clnt_record] 
12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 
[2017-12-27 06:21:04.332837] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 
12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.332862] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: 
submitted request (XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) 
to rpc-transport (test-client-0)
[2017-12-27 06:21:04.333943] T [rpc-clnt.c:675:rpc_clnt_reply_init] 
12-test-client-0: received rpc message (RPC XID: 0x2681 Program: GlusterFS 3.3, 
ProgVers: 330, Proc: 35) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.334013] D [MSGID: 0] 
[client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-0: remote 
operation failed: No data available
[2017-12-27 06:21:04.334036] D [MSGID: 0] 
[client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 
0x7fb5fc0618d0, test-client-0 returned -1 error: No data available [No data 
available]
[2017-12-27 06:21:04.334114] T [MSGID: 0] 
[afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: 
stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.334174] T [rpc-clnt.c:1496:rpc_clnt_record] 
12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 
[2017-12-27 06:21:04.334196] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 
12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334229] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: 
submitted request (XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) 
to rpc-transport (test-client-0)
[2017-12-27 06:21:04.334280] T [MSGID: 0] 
[afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: 
stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to t

[Gluster-devel] Sometimes read failed just after write before file closed when try 100 times in different new files

2017-12-28 Thread Lian, George (NSB - CN/Hangzhou)
Hi, Gluster expert,

I have just raised a bug, https://bugzilla.redhat.com/show_bug.cgi?id=1529237,
in the gluster community Bugzilla.

I have also attached the reproduction source code and shell script here FYI.

From my investigation, I suppose the issue may be caused by "the cache in the
client is invalid but the file has still not been persisted in the server brick".
But I can't find the root cause from the debug log attached in
https://bugzilla.redhat.com/show_bug.cgi?id=1529237.

The following client log was extracted from the attachment in Bugzilla (the read
phase was highlighted in YELLOW in the original mail); you can see that both
test-client-0 and test-client-1 fail there.
I suppose something abnormal happens in the read phase, but I can't find any
useful log in the server brick logs myself, so your help is highly appreciated!

[2017-12-27 06:21:04.332524] T [MSGID: 0] [afr-inode-read.c:286:afr_fstat_cbk] 
12-stack-trace: stack-address: 0x7fb5ec010680, test-replicate-0 returned 0
[2017-12-27 06:21:04.332724] T [MSGID: 0] [syncop.c:1715:syncop_fgetxattr] 
12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-dht to 
test-replicate-0
[2017-12-27 06:21:04.332768] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 
12-test-replicate-0: 70e0f7ee-2f0a-4364-a38e-96e7031eee7b: generation now vs 
cached: 2, 2
[2017-12-27 06:21:04.332791] T [MSGID: 0] 
[afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 
0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.332817] T [rpc-clnt.c:1496:rpc_clnt_record] 
12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 
[2017-12-27 06:21:04.332837] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 
12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.332862] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: 
submitted request (XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) 
to rpc-transport (test-client-0)
[2017-12-27 06:21:04.333943] T [rpc-clnt.c:675:rpc_clnt_reply_init] 
12-test-client-0: received rpc message (RPC XID: 0x2681 Program: GlusterFS 3.3, 
ProgVers: 330, Proc: 35) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.334013] D [MSGID: 0] 
[client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-0: remote 
operation failed: No data available
[2017-12-27 06:21:04.334036] D [MSGID: 0] 
[client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 
0x7fb5fc0618d0, test-client-0 returned -1 error: No data available [No data 
available]
[2017-12-27 06:21:04.334114] T [MSGID: 0] 
[afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: 
stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.334174] T [rpc-clnt.c:1496:rpc_clnt_record] 
12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 
[2017-12-27 06:21:04.334196] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 
12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334229] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: 
submitted request (XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) 
to rpc-transport (test-client-0)
[2017-12-27 06:21:04.334280] T [MSGID: 0] 
[afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: 
stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.334311] T [rpc-clnt.c:1496:rpc_clnt_record] 
12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 
[2017-12-27 06:21:04.334330] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 
12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334354] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: 
submitted request (XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) 
to rpc-transport (test-client-1)
[2017-12-27 06:21:04.334904] T [rpc-clnt.c:675:rpc_clnt_reply_init] 
12-test-client-1: received rpc message (RPC XID: 0x252c Program: GlusterFS 3.3, 
ProgVers: 330, Proc: 25) from rpc-transport (test-client-1)
[2017-12-27 06:21:04.334983] T [MSGID: 0] 
[client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 
0x7fb5fc0618d0, test-client-1 returned 0
[2017-12-27 06:21:04.335128] T [rpc-clnt.c:675:rpc_clnt_reply_init] 
12-test-client-0: received rpc message (RPC XID: 0x2682 Program: GlusterFS 3.3, 
ProgVers: 330, Proc: 25) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.335165] T [MSGID: 0] 
[client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 
0x7fb5fc0618d0, test-client-0 returned 0
[2017-12-27 06:21:04.335193] T [MSGID: 0] 
[afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 
0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.335224] T [rpc-clnt.c:1496:rpc_clnt_record] 
12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 
[2017-12-27 06:21:04.335253]