Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator

2014-06-05  Anand Avati

On 6/3/14, 11:32 PM, Pranith Kumar Karampuri wrote:


On 06/04/2014 11:37 AM, Krutika Dhananjay wrote:

Hi,

Recently there was a crash in the locks translator (BZ 1103347, BZ
1097102) with the following backtrace:
(gdb) bt
#0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
#1  0x7feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
#2  0x7feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
#3  0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
#5  0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
#6  0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
#7  0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
#8  0x7feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at server.c:527
#9  0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
#10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0) at rpcsvc.c:758
#11 0x7feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#12 0x7feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
#13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2240
#14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
#15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
#16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023
(gdb) f 4
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
428             pl_inodelk_log_cleanup (l);
(gdb) p l->pl_inode->refkeeper
$1 = (inode_t *) 0x0
(gdb)

pl_inode->refkeeper was found to be NULL even though there were
blocked inodelks in a certain domain of the inode; when the epoll
thread dereferenced it in the cleanup codepath, the process crashed.

On inspecting the code (for want of a consistent reproducer), three
things were found:

1. The function where the crash happens (pl_inodelk_log_cleanup())
makes an attempt to resolve the inode to a path, as can be seen below.
But the way inode_path() itself
works is to first construct the path from the given inode's
ancestry and place it in the buffer provided; and if all else fails,
the gfid of the inode is placed there in a certain format ("<gfid:%s>").
This eliminates the need for the statements on lines 4 through 7
below, thereby preventing the dereference of pl_inode->refkeeper.
Now, although this change prevents the crash altogether, it still
does not fix the race that led to pl_inode->refkeeper becoming NULL,
and comes at the cost of
printing "(null)" in the log message on line 9 every time
pl_inode->refkeeper is found to be NULL, rendering the logged messages
somewhat useless.

<code>
  0         pl_inode = lock->pl_inode;
  1
  2         inode_path (pl_inode->refkeeper, NULL, &path);
  3
  4         if (path)
  5                 file = path;
  6         else
  7                 file = uuid_utoa (pl_inode->refkeeper->gfid);
  8
  9         gf_log (THIS->name, GF_LOG_WARNING,
 10                 "releasing lock on %s held by "
 11                 "{client=%p, pid=%"PRId64" lk-owner=%s}",
 12                 file, lock->client, (uint64_t) lock->client_pid,
 13                 lkowner_utoa (&lock->owner));
</code>

I think this logging code is from the days when the gfid-handle concept
was not there, so it wasn't returning "<gfid:gfid-str>" in cases where
the path is not present in the dentries. I believe the else block can
be deleted safely now.

Pranith
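
For illustration only, here is a minimal sketch of what the logging
could look like with the if/else dropped, relying on the "<gfid:...>"
fallback of inode_path() described above. It reuses the names from the
quoted snippet and assumes the surrounding declarations of
pl_inodelk_log_cleanup(); it is a sketch, not the actual patch:

<code>
        /* Sketch: pl_inode->refkeeper is no longer dereferenced here;
         * inode_path() already falls back to "<gfid:...>" when it
         * cannot build a path from the inode's ancestry. */
        char *path = NULL;

        pl_inode = lock->pl_inode;

        inode_path (pl_inode->refkeeper, NULL, &path);

        gf_log (THIS->name, GF_LOG_WARNING,
                "releasing lock on %s held by "
                "{client=%p, pid=%"PRId64" lk-owner=%s}",
                path, lock->client, (uint64_t) lock->client_pid,
                lkowner_utoa (&lock->owner));

        GF_FREE (path);
</code>

Note that, as pointed out above, this still prints "(null)" whenever
refkeeper is NULL; it only avoids the crash.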


2. At least one codepath was found that can lead to this crash:
Imagine an inode on which an inodelk operation is attempted by a
client and is successfully granted.
   Now, between the time the lock was granted and
pl_update_refkeeper() was called by this thread, the client could send
a DISCONNECT event,
   causing the cleanup codepath to be executed, where the epoll thread
crashes on dereferencing pl_inode->refkeeper, which is STILL NULL at
this point.
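
To make that window concrete, here is a contrived, self-contained
sketch of the interleaving using plain pthreads (hypothetical names,
NOT the actual locks xlator code): one thread grants the lock and only
afterwards updates the keeper pointer, while a concurrent
disconnect-cleanup thread dereferences the keeper while it is still NULL.

<code>
/* Contrived illustration of the race described above; hypothetical
 * names, not Gluster code.  Build with: gcc race.c -lpthread
 * It usually crashes, mirroring the NULL-refkeeper dereference. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct inode { char gfid[64]; };

struct pl_inode {
        struct inode *refkeeper;     /* set only in "pl_update_refkeeper()" */
        int           lock_granted;  /* stands in for the granted-lock list */
} pli;

static struct inode dummy_inode;

static void *grant_thread (void *arg)
{
        pli.lock_granted = 1;          /* lock granted, lists updated ...   */
        usleep (100 * 1000);           /* ... but refkeeper not yet updated */
        strcpy (dummy_inode.gfid, "example-gfid");
        pli.refkeeper = &dummy_inode;  /* "pl_update_refkeeper()" runs here */
        return NULL;
}

static void *disconnect_cleanup_thread (void *arg)
{
        usleep (10 * 1000);            /* DISCONNECT arrives in the window  */
        if (pli.lock_granted)
                /* cleanup logging dereferences refkeeper, still NULL here  */
                printf ("releasing lock on gfid %s\n", pli.refkeeper->gfid);
        return NULL;
}

int main (void)
{
        pthread_t g, d;

        pthread_create (&g, NULL, grant_thread, NULL);
        pthread_create (&d, NULL, disconnect_cleanup_thread, NULL);
        pthread_join (g, NULL);
        pthread_join (d, NULL);
        return 0;
}
</code>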

   Besides, there are still places in the locks xlator where the refkeeper
is NOT updated whenever the lists are modified - for instance in the
cleanup codepath from a
