Re: [Cluster-devel] [PATCH v3] fs: record task name which froze superblock
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote: On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote: On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote: Freezing and thawing are separate system calls, task which is supposed to thaw filesystem/superblock can disappear due to crash or not thaw due to a bug. At least record task name (we can't take task_struct reference) to make support engineer's life easier. Hopefully 16 bytes per superblock isn't much. TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is moved to userspace exported header to not drag sched.h into every fs.h inclusion. Signed-off-by: Alexey Dobriyan adobri...@gmail.com Freeze/thaw can be nested at the block level. That means the sb-s_writers.freeze_comm can point at the wrong process. i.e. Task A Task B freeze_bdev freeze_super freeze_comm = A freeze_bdev . thaw_bdev device still frozen crash At this point, the block device will never be unthawed, but the debug field is now pointing to the wrong task. i.e. The debug helper has not recorded the process that is actually causing the problem, and leads us all off on a wild goose chase down the wrong path. IMO, debug code is only useful if it's reliable. It can be trivially modified to be very useful to support people. Actually this patch clears saved task name on unfreeze, so in this particular scenario we would end up with no data. It only clears it i thaw_super(), which is *not called* until the last nested thaw_bdev() call is made. When the system is hung what we actually need to know is who is responsible for *thawing* the filesystem and then we can work out why that hasn't run. What this code tries to do is identify who froze the filesystem and so indicate who *might* be responsible for thawing it. If we mis-identify the agent who holds the freeze status, then we fail to identify who needs to run the thaw and hence we're still stuck not knowing WTF happened I understand why you want to record this - I'm not arguing that we shouldn't do this. My point is that we should *make it reliable* and not in any way ambiguous, otherwise we failed to solve the problem it was intended for. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
Re: [Cluster-devel] [PATCH v3] fs: record task name which froze superblock
On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote: On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote: Freezing and thawing are separate system calls, task which is supposed to thaw filesystem/superblock can disappear due to crash or not thaw due to a bug. At least record task name (we can't take task_struct reference) to make support engineer's life easier. Hopefully 16 bytes per superblock isn't much. TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is moved to userspace exported header to not drag sched.h into every fs.h inclusion. Signed-off-by: Alexey Dobriyan adobri...@gmail.com Freeze/thaw can be nested at the block level. That means the sb-s_writers.freeze_comm can point at the wrong process. i.e. Task ATask B freeze_bdev freeze_super freeze_comm = A freeze_bdev . thaw_bdev device still frozen crash At this point, the block device will never be unthawed, but the debug field is now pointing to the wrong task. i.e. The debug helper has not recorded the process that is actually causing the problem, and leads us all off on a wild goose chase down the wrong path. IMO, debug code is only useful if it's reliable. It can be trivially modified to be very useful to support people. Actually this patch clears saved task name on unfreeze, so in this particular scenario we would end up with no data. Freezer and unfreezer names don't even have to match, so there is not much we can do here (e.g. recording all names in a linked list or something is a non-starter because of this). I propose the following: - on freezing: 1. if 0-1 save the name 2. if 1-2 have a flag to note there is an additional freezer - on unfreezing 1. if 1-0 clear the flag 2. DO NOT clear the name in any case This way we keep the name for possible future reference and we know whether something with this name was the sole freezer in this cycle. As explained below, this one task name is already very useful and likely covers majority of real life use cases. While working in support we were getting a lot of vmcores where hung task detector panicked the kernel because a lot of tasks were blocked in UN state trying to write to frozen filesystems. I presume OP has similar story. Some back on forth commuication almost always revealed one process e.g. freezing stuff and then blocking itself trying to access it. While we could see it blocked, we had no presumptive evidence to pin freezing on it. A matching name, while still not 100% conclusive, would be ok enough to push the case forward and avoid a rountrip of systemap scripts showing freezer process tree. -- Mateusz Guzik
Re: [Cluster-devel] [PATCH v3] fs: record task name which froze superblock
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote: On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote: On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote: Freezing and thawing are separate system calls, task which is supposed to thaw filesystem/superblock can disappear due to crash or not thaw due to a bug. At least record task name (we can't take task_struct reference) to make support engineer's life easier. Hopefully 16 bytes per superblock isn't much. TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is moved to userspace exported header to not drag sched.h into every fs.h inclusion. Signed-off-by: Alexey Dobriyan adobri...@gmail.com Freeze/thaw can be nested at the block level. That means the sb-s_writers.freeze_comm can point at the wrong process. i.e. Task A Task B freeze_bdev freeze_super freeze_comm = A freeze_bdev . thaw_bdev device still frozen crash At this point, the block device will never be unthawed, but the debug field is now pointing to the wrong task. i.e. The debug helper has not recorded the process that is actually causing the problem, and leads us all off on a wild goose chase down the wrong path. IMO, debug code is only useful if it's reliable. It can be trivially modified to be very useful to support people. Actually this patch clears saved task name on unfreeze, so in this particular scenario we would end up with no data. Freezer and unfreezer names don't even have to match, so there is not much we can do here (e.g. recording all names in a linked list or something is a non-starter because of this). I propose the following: - on freezing: 1. if 0-1 save the name 2. if 1-2 have a flag to note there is an additional freezer - on unfreezing 1. if 1-0 clear the flag 2. DO NOT clear the name in any case Now that I sent this e-mail I realized we could actually keep a linked list of freezer names. Unfreezing would delete all elements when going 1-0, but would not touch it otherwise. This would cover a less likely use case though, so I would be fine either way FWIW. Just my $0,03. This way we keep the name for possible future reference and we know whether something with this name was the sole freezer in this cycle. As explained below, this one task name is already very useful and likely covers majority of real life use cases. While working in support we were getting a lot of vmcores where hung task detector panicked the kernel because a lot of tasks were blocked in UN state trying to write to frozen filesystems. I presume OP has similar story. Some back on forth commuication almost always revealed one process e.g. freezing stuff and then blocking itself trying to access it. While we could see it blocked, we had no presumptive evidence to pin freezing on it. A matching name, while still not 100% conclusive, would be ok enough to push the case forward and avoid a rountrip of systemap scripts showing freezer process tree. -- Mateusz Guzik
[Cluster-devel] [PATCH] libgfs2: Make sure secontext gets freed
Free sbd.secontext after is_pathname_mounted allocates it. As the RHEL6 metafs code has no construct/destruct model similar to upstream, the secontext string has to be freed in a few places additional to cleanup_metafs. Resolves: #1121693 Signed-off-by: Andrew Price anpr...@redhat.com --- gfs2/fsck/initialize.c | 1 + gfs2/libgfs2/misc.c| 1 + gfs2/tool/misc.c | 2 ++ gfs2/tool/tune.c | 4 4 files changed, 8 insertions(+) diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c index 9abdf0c..0c374eb 100644 --- a/gfs2/fsck/initialize.c +++ b/gfs2/fsck/initialize.c @@ -1566,6 +1566,7 @@ int initialize(struct gfs2_sbd *sdp, int force_check, int preen, sizeof(sdp-device_name)); sdp-path_name = sdp-device_name; /* This gets overwritten */ is_mounted = is_pathname_mounted(sdp, ro); + free(sdp-secontext); /* If the device is busy, but not because it's mounted, fail. This protects against cases where the file system is LVM and perhaps mounted on a different node. */ diff --git a/gfs2/libgfs2/misc.c b/gfs2/libgfs2/misc.c index 5ef4a2a..955e141 100644 --- a/gfs2/libgfs2/misc.c +++ b/gfs2/libgfs2/misc.c @@ -301,6 +301,7 @@ void cleanup_metafs(struct gfs2_sbd *sdp) sigaction(SIGUSR2, sa, NULL); metafs_interrupted = 0; remove_mtab_entry(sdp); + free(sdp-secontext); } static void sighandler(int error) diff --git a/gfs2/tool/misc.c b/gfs2/tool/misc.c index 37c81cd..506f590 100644 --- a/gfs2/tool/misc.c +++ b/gfs2/tool/misc.c @@ -324,6 +324,7 @@ void do_withdraw(int argc, char **argv) if (optind == argc) die(Usage: gfs2_tool withdraw mountpoint\n); + sbd.secontext = NULL; sbd.path_name = argv[optind]; if (check_for_gfs2(sbd)) { if (errno == EINVAL) @@ -345,5 +346,6 @@ void do_withdraw(int argc, char **argv) strerror(errno)); exit(-1); } + free(sbd.secontext); } diff --git a/gfs2/tool/tune.c b/gfs2/tool/tune.c index 4666291..92a7b6f 100644 --- a/gfs2/tool/tune.c +++ b/gfs2/tool/tune.c @@ -46,6 +46,7 @@ get_tune(int argc, char **argv) if (optind == argc) die( _(Usage: gfs2_tool gettune mountpoint\n)); + sbd.secontext = NULL; sbd.path_name = argv[optind]; if (check_for_gfs2(sbd)) { if (errno == EINVAL) @@ -87,6 +88,7 @@ get_tune(int argc, char **argv) printf(%s = %s\n, de-d_name, get_sysfs(fs, path)); } closedir(d); + free(sbd.secontext); } /** @@ -105,6 +107,7 @@ set_tune(int argc, char **argv) char *fs; struct gfs2_sbd sbd; + sbd.secontext = NULL; if (optind == argc) die( _(Usage: gfs2_tool settune mountpoint parameter value\n)); sbd.path_name = argv[optind++]; @@ -141,4 +144,5 @@ set_tune(int argc, char **argv) param, strerror(errno)); exit(-1); } + free(sbd.secontext); } -- 1.9.3
[Cluster-devel] [PATCH] GFS2: Fix potential NULL dereference in gfs2_alloc_inode
Return NULL when ip is NULL instead of dereferencing it. Signed-off-by: Andrew Price anpr...@redhat.com --- fs/gfs2/super.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 1666382..37c59ee 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -1628,12 +1628,13 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb) struct gfs2_inode *ip; ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL); - if (ip) { - ip-i_flags = 0; - ip-i_gl = NULL; - ip-i_rgd = NULL; - ip-i_res = NULL; - } + if (!ip) + return NULL; + + ip-i_flags = 0; + ip-i_gl = NULL; + ip-i_rgd = NULL; + ip-i_res = NULL; return ip-i_inode; } -- 1.9.3
Re: [Cluster-devel] [PATCH] GFS2: Fix potential NULL dereference in gfs2_alloc_inode
On 02/03/15 16:17, Steven Whitehouse wrote: Hi, On 02/03/15 16:15, Andrew Price wrote: Return NULL when ip is NULL instead of dereferencing it. Signed-off-by: Andrew Price anpr...@redhat.com --- fs/gfs2/super.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 1666382..37c59ee 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -1628,12 +1628,13 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb) struct gfs2_inode *ip; ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL); -if (ip) { -ip-i_flags = 0; -ip-i_gl = NULL; -ip-i_rgd = NULL; -ip-i_res = NULL; -} +if (!ip) +return NULL; + +ip-i_flags = 0; +ip-i_gl = NULL; +ip-i_rgd = NULL; +ip-i_res = NULL; return ip-i_inode; } I'm not sure that I see the problem here... it should just return NULL if ip is NULL, since ip-i_inode is the first element of ip, Ah, so it is. Self-NACK then. Andy
Re: [Cluster-devel] [PATCH] GFS2: Fix potential NULL dereference in gfs2_alloc_inode
Hi, On 02/03/15 16:15, Andrew Price wrote: Return NULL when ip is NULL instead of dereferencing it. Signed-off-by: Andrew Price anpr...@redhat.com --- fs/gfs2/super.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 1666382..37c59ee 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -1628,12 +1628,13 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb) struct gfs2_inode *ip; ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL); - if (ip) { - ip-i_flags = 0; - ip-i_gl = NULL; - ip-i_rgd = NULL; - ip-i_res = NULL; - } + if (!ip) + return NULL; + + ip-i_flags = 0; + ip-i_gl = NULL; + ip-i_rgd = NULL; + ip-i_res = NULL; return ip-i_inode; } I'm not sure that I see the problem here... it should just return NULL if ip is NULL, since ip-i_inode is the first element of ip, Steve.
Re: [Cluster-devel] [PATCH] GFS2: Fix potential NULL dereference in gfs2_alloc_inode
On 03/02/2015 05:30 PM, Andrew Price wrote: On 02/03/15 16:17, Steven Whitehouse wrote: Hi, On 02/03/15 16:15, Andrew Price wrote: Ah, so it is. Self-NACK then. The patch itself is still worth it though, it's not self-evident ip and ip-i_inode are the same. Andreas
[Cluster-devel] [PATCH] GFS2: Improve readability of gfs2_alloc_inode
Returning ip-i_inode when ip is NULL is safe as i_inode is the first member in struct gfs2_inode, but that's not immediately obvious. Reorganize gfs2_alloc_inode to avoid any doubt. Signed-off-by: Andrew Price anpr...@redhat.com --- Re-sending with a more appropriate commit log based on Andreas' comments. fs/gfs2/super.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 1666382..37c59ee 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -1628,12 +1628,13 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb) struct gfs2_inode *ip; ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL); - if (ip) { - ip-i_flags = 0; - ip-i_gl = NULL; - ip-i_rgd = NULL; - ip-i_res = NULL; - } + if (!ip) + return NULL; + + ip-i_flags = 0; + ip-i_gl = NULL; + ip-i_rgd = NULL; + ip-i_res = NULL; return ip-i_inode; } -- 1.9.3