Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix umount hang
Hi Changwei,

Why are the dead nodes still in the live map, according to your dlm_state file?

Thanks,
Joseph

On 16/11/17 14:03, Gechangwei wrote:
> Hi
>
> During a recent test on OCFS2, I hit an umount hang. The clues below can
> help us analyze this issue.
>
> From the debug information we can see an abnormal state: only node 1 is in
> the DLM domain map, yet nodes 3 - 9 are still in the MLE's node map and
> vote map.
> I think the root cause of the unchanging vote map is that heartbeat (HB)
> events are detached too early. That leaves no chance for the BLOCK MLE to
> be transformed into a MASTER MLE, so node 1 cannot master the lock resource
> even though all the other nodes are dead.
>
> To fix this, I propose the patch below.
>
> From 3163fa7024d96f8d6e6ec2b37ad44e2cc969abd9 Mon Sep 17 00:00:00 2001
> From: gechangwei
> Date: Thu, 17 Nov 2016 14:00:45 +0800
> Subject: [PATCH] fix umount hang
>
> Signed-off-by: gechangwei
> ---
>  fs/ocfs2/dlm/dlmmaster.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 6ea06f8..3c46882 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -3354,8 +3354,6 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm,
>  		spin_unlock(&mle->spinlock);
>  		wake_up(&mle->wq);
>
> -		/* Do not need events any longer, so detach from heartbeat */
> -		__dlm_mle_detach_hb_events(dlm, mle);
>  		__dlm_put_mle(mle);
>  	}
>  }
> --
> 2.5.1.windows.1
>
>
> root@HXY-CVK110:~# grep P00 bbb
> Lockres: P00   Owner: 255   State: 0x10 InProgress
>
> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat dlm_state
> Domain: 7DA412FEB1374366B0F3C70025EB1437  Key: 0x8ff804a1  Protocol: 1.2
> Thread Pid: 21679  Node: 1  State: JOINED
> Number of Joins: 1  Joining Node: 255
> Domain Map: 1
> Exit Domain Map:
> Live Map: 1 2 3 4 5 6 7 8 9
> Lock Resources: 29 (116)
> MLEs: 1 (119)
>   Blocking: 1 (4)
>   Mastery: 0 (115)
>   Migration: 0 (0)
> Lists: Dirty=Empty  Purge=Empty  PendingASTs=Empty  PendingBASTs=Empty
> Purge Count: 0  Refs: 1
> Dead Node: 255
> Recovery Pid: 21680  Master: 255  State: INACTIVE
> Recovery Map:
> Recovery Node State:
>
>
> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# ls
> dlm_state  locking_state  mle_state  purge_list
> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat mle_state
> Dumping MLEs for Domain: 7DA412FEB1374366B0F3C70025EB1437
> P00  BLK  mas=255  new=255  evt=0  use=1  ref=2
> Maybe=
> Vote=3 4 5 6 7 8 9
> Response=
> Node=3 4 5 6 7 8 9
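The failure mode described in the quoted report can be pictured with a small, self-contained model. This is not the actual fs/ocfs2/dlm/dlmmaster.c code; the names (mle_model, node_down, can_master) are invented for illustration. The point is simply that once a BLOCK MLE stops receiving heartbeat node-down events, its vote map can never drain, so the local node never gets to take over mastery and the lockres stays at Owner: 255:

#include <stdbool.h>
#include <stdio.h>

struct mle_model {
	unsigned int vote_map;   /* bit N set: still waiting on node N */
	bool hb_attached;        /* still registered for heartbeat events? */
};

/* Heartbeat node-down callback model: only prunes the vote map while attached. */
static void node_down(struct mle_model *mle, int node)
{
	if (!mle->hb_attached)
		return;                          /* the event is never delivered */
	mle->vote_map &= ~(1u << node);
}

/* Mastery can only proceed once no voters are outstanding. */
static bool can_master(const struct mle_model *mle)
{
	return mle->vote_map == 0;
}

int main(void)
{
	/* Nodes 3-9 still in the vote map, as in the mle_state dump. */
	struct mle_model mle = { .vote_map = 0x3f8, .hb_attached = true };

	mle.hb_attached = false;                 /* "detached too early" */

	for (int node = 3; node <= 9; node++)
		node_down(&mle, node);           /* the nodes die, events are lost */

	printf("vote_map=0x%x, can master: %s\n",
	       mle.vote_map, can_master(&mle) ? "yes" : "no (hang)");
	return 0;
}

Run as written, it prints vote_map=0x3f8 and "no (hang)", mirroring the Vote=3 4 5 6 7 8 9 entry in the mle_state dump; with hb_attached left true, the vote map drains and it prints "yes".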
[Ocfs2-devel] [PATCH] ocfs2/dlm: fix umount hang
Hi

During a recent test on OCFS2, I hit an umount hang. The clues below can help
us analyze this issue.

From the debug information we can see an abnormal state: only node 1 is in the
DLM domain map, yet nodes 3 - 9 are still in the MLE's node map and vote map.

I think the root cause of the unchanging vote map is that heartbeat (HB) events
are detached too early. That leaves no chance for the BLOCK MLE to be
transformed into a MASTER MLE, so node 1 cannot master the lock resource even
though all the other nodes are dead.

To fix this, I propose the patch below.

From 3163fa7024d96f8d6e6ec2b37ad44e2cc969abd9 Mon Sep 17 00:00:00 2001
From: gechangwei
Date: Thu, 17 Nov 2016 14:00:45 +0800
Subject: [PATCH] fix umount hang

Signed-off-by: gechangwei
---
 fs/ocfs2/dlm/dlmmaster.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 6ea06f8..3c46882 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -3354,8 +3354,6 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm,
 		spin_unlock(&mle->spinlock);
 		wake_up(&mle->wq);

-		/* Do not need events any longer, so detach from heartbeat */
-		__dlm_mle_detach_hb_events(dlm, mle);
 		__dlm_put_mle(mle);
 	}
 }
--
2.5.1.windows.1


root@HXY-CVK110:~# grep P00 bbb
Lockres: P00   Owner: 255   State: 0x10 InProgress

root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat dlm_state
Domain: 7DA412FEB1374366B0F3C70025EB1437  Key: 0x8ff804a1  Protocol: 1.2
Thread Pid: 21679  Node: 1  State: JOINED
Number of Joins: 1  Joining Node: 255
Domain Map: 1
Exit Domain Map:
Live Map: 1 2 3 4 5 6 7 8 9
Lock Resources: 29 (116)
MLEs: 1 (119)
  Blocking: 1 (4)
  Mastery: 0 (115)
  Migration: 0 (0)
Lists: Dirty=Empty  Purge=Empty  PendingASTs=Empty  PendingBASTs=Empty
Purge Count: 0  Refs: 1
Dead Node: 255
Recovery Pid: 21680  Master: 255  State: INACTIVE
Recovery Map:
Recovery Node State:


root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# ls
dlm_state  locking_state  mle_state  purge_list
root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat mle_state
Dumping MLEs for Domain: 7DA412FEB1374366B0F3C70025EB1437
P00  BLK  mas=255  new=255  evt=0  use=1  ref=2
Maybe=
Vote=3 4 5 6 7 8 9
Response=
Node=3 4 5 6 7 8 9
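A quick way to read the two dumps above together: any node that still appears in the MLE's Vote/Node map but is no longer in the Domain Map is stale. Below is a minimal cross-check sketch with the maps hard-coded as bitmasks; the bitmask encoding and the report_stale helper are illustrative only, not how the kernel or debugfs represent these maps:

#include <stdio.h>

/* Bit N set means node N is present in the corresponding map. */
static void report_stale(unsigned int domain_map, unsigned int vote_map)
{
	unsigned int stale = vote_map & ~domain_map;

	for (int node = 0; node < 32; node++)
		if (stale & (1u << node))
			printf("node %d: in the MLE vote map but not in the domain map\n",
			       node);
}

int main(void)
{
	unsigned int domain_map = 1u << 1;   /* "Domain Map: 1" */
	unsigned int vote_map = 0x3f8;       /* "Vote=3 4 5 6 7 8 9" */

	report_stale(domain_map, vote_map);  /* flags nodes 3-9 as stale */
	return 0;
}

For the dumps above it reports nodes 3 through 9, which matches the claim that the vote map never shrank after those nodes died.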
Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io
Hi,

On 11/16/2016 06:45 PM, Dan Carpenter wrote:
> On Wed, Nov 16, 2016 at 10:33:49AM +0800, Eric Ren wrote:
> That silences the warning, of course, but I feel like the code is buggy.
> How do we know that we don't hit that exit path?

Sorry, I missed your point. Do you mean the below?

"1817                 goto out_quota;"

will free (*wc), but with "ret = 0". Thus, the caller thinks it's OK to use
(*wc), but...

Do I understand you correctly?

Eric

>
> fs/ocfs2/aops.c
>   1808          /*
>   1809           * ocfs2_grab_pages_for_write() returns -EAGAIN if it could not lock
>   1810           * the target page. In this case, we exit with no error and no target
>   1811           * page. This will trigger the caller, page_mkwrite(), to re-try
>   1812           * the operation.
>   1813           */
>   1814          if (ret == -EAGAIN) {
>   1815                  BUG_ON(wc->w_target_page);
>   1816                  ret = 0;
>   1817                  goto out_quota;
>   1818          }
>
> regards,
> dan carpenter
Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io
On Wed, Nov 16, 2016 at 10:33:49AM +0800, Eric Ren wrote:
> >>> fs/ocfs2/aops.c
> >>>   2235
> >>>   2236          ret = ocfs2_write_begin_nolock(inode->i_mapping, pos, len,
> >>>   2237                                         OCFS2_WRITE_DIRECT, NULL,
> >>>   2238                                         (void **)&wc, di_bh, NULL);
> >>>
> How do you perform the static checker? Please teach me ;-)

It's a Smatch thing that's not public yet. Soon.

> Regarding this warning, please try to make this line
> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L2128)
> into:
>
>     struct ocfs2_write_ctxt *wc = NULL;
>
> It should work, and it doesn't have any side effect.

That silences the warning, of course, but I feel like the code is buggy.
How do we know that we don't hit that exit path?

fs/ocfs2/aops.c
  1808          /*
  1809           * ocfs2_grab_pages_for_write() returns -EAGAIN if it could not lock
  1810           * the target page. In this case, we exit with no error and no target
  1811           * page. This will trigger the caller, page_mkwrite(), to re-try
  1812           * the operation.
  1813           */
  1814          if (ret == -EAGAIN) {
  1815                  BUG_ON(wc->w_target_page);
  1816                  ret = 0;
  1817                  goto out_quota;
  1818          }

regards,
dan carpenter
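The two pieces under discussion -- initializing wc to NULL at its declaration, and the -EAGAIN path that returns 0 without handing a context back -- combine into the caller-side pattern sketched below. This is a simplified stand-alone model, not the actual ocfs2_write_begin_nolock()/page_mkwrite() code; begin_write_model and mkwrite_model are invented names, and whether the real caller can reach the retry path is exactly the question Dan raises:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct write_ctxt {
	int target_page;
};

/*
 * Models the contract in question: on the "could not lock the page" path the
 * function returns 0 without storing anything through *out (the real code
 * maps -EAGAIN to 0 and expects the caller to retry).
 */
static int begin_write_model(struct write_ctxt **out, int page_locked)
{
	if (!page_locked)
		return 0;                 /* retry case: no context handed back */

	*out = calloc(1, sizeof(**out));
	if (!*out)
		return -ENOMEM;
	(*out)->target_page = 1;
	return 0;
}

static int mkwrite_model(int page_locked)
{
	struct write_ctxt *wc = NULL;     /* the initialization being suggested */
	int ret = begin_write_model(&wc, page_locked);

	if (ret)
		return ret;
	if (!wc) {
		/* ret == 0 but no context: retry instead of dereferencing wc */
		printf("no context returned, retrying the fault\n");
		return 0;
	}

	printf("got context, target_page=%d\n", wc->target_page);
	free(wc);
	return 0;
}

int main(void)
{
	mkwrite_model(0);                 /* exercises the retry path */
	mkwrite_model(1);                 /* exercises the normal path */
	return 0;
}

Without the "= NULL", wc would be uninitialized on the retry path, which is what the static checker flags; with it, the caller still has to test wc before use, which is Dan's point about the code being buggy if that exit path is reachable.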