Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-08 Thread Eric Ren

Hi all,

On 10/19/2016 01:19 PM, Eric Ren wrote:

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..7e3544e 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
  int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
  {
struct buffer_head *bh = NULL;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int status = 0;
  
-	status = ocfs2_inode_lock(inode, &bh, 1);

-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   status = ocfs2_inode_lock(inode, &bh, 1);
+   if (status < 0) {
+   if (status != -ENOENT)
+   mlog_errno(status);
+   return status;
+   }
}

This is wrong. We also depend ocfs2_inode_lock() pass out "bh" for later use.

So, we may need another function something like ocfs2_inode_getbh():
 if (!oh)
ocfs2_inode_lock();
   else
   ocfs2_inode_getbh();

Eric

+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!oh)
+   ocfs2_inode_unlock(inode, 1);
brelse(bh);
return status;
  }
@@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int ret;
  
  	osb = OCFS2_SB(inode->i_sb);

if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
+
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+   if (ret < 0) {
+   if (ret != -ENOENT)
+   mlog_errno(ret);
+   return ERR_PTR(ret);
+   }
}
  
  	acl = ocfs2_get_acl_nolock(inode, type, di_bh);
  
-	ocfs2_inode_unlock(inode, 0);

+   if (!oh)
+   ocfs2_inode_unlock(inode, 0);
brelse(di_bh);
return acl;
  }



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-11-08 Thread Eric Ren

Hi all,

On 10/19/2016 01:19 PM, Eric Ren wrote:

ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

Possibly, there are three solutions I can think of.  The first one is to
implement the inode permission routine for ocfs2 itself, replacing the
existing generic_permission(); this will bring lots of changes and
involve too many trivial vfs functions into ocfs2 code. Frown on this.

The second one is, what I am trying now, to keep track of the processes who
lock/unlock a cluster lock by the following draft patches. But, I quickly
find out that a cluster locking which has been taken by processA can be unlocked
by processB. For example, systemfiles like journal: is locked during mout, 
and
unlocked during umount.

We can avoid the problem above by:

1) not keeping track of system file inode:

   if (!(OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE)) {
   
  }

2) only keeping track of inode metadata lockres:

   OCFS2_I(inode)->ip_inode_lockres;

because inode open lockres can also be get/release by different processes.

Eric


The thrid one is to revert that problematic commit! It looks like get/set_acl()
are always been called by other vfs callback like ocfs2_permission(). I think
we can do this if it's true, right? Anyway, I'll try to work out if it's true;-)

Hope for your input to solve this problem;-)

Thanks,
Eric


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] [PATCH] ocfs2: old mle put and release after the function dlm_add_migration_mle called.

2016-11-08 Thread Guozhonghua

If the old mle is found after the dlm_add_migration_mle called, it should be 
put once.
If the return value is not - EEXIST and its type is BLOCK, it should be put 
again to release it to avoid memory leak, for it had been unhashed from the map.

Signed-off-by: Guozhonghua 

--- ocfs2.orig/dlm/dlmmaster.c  2016-11-08 16:58:30.233592715 +0800
+++ ocfs2/dlm/dlmmaster.c   2016-11-08 17:37:00.157673017 +0800
@@ -2621,20 +2621,45 @@ static int dlm_migrate_lockres(struct dl
spin_lock(&dlm->master_lock);
ret = dlm_add_migration_mle(dlm, res, mle, &oldmle, name,
namelen, target, dlm->node_num);
+   if (ret == -EEXIST) {
+   if(oldmle)
+   __dlm_put_mle(oldmle);
+
+   spin_unlock(&dlm->master_lock);
+   spin_unlock(&dlm->spinlock);
+   mlog(0, "another process is already migrating it\n");
+   goto fail;
+   }
+
+   /* If an old one mle found, it should be put. if its type is BLOCK,
+* it should be put again. Because it had been unhasded from the map
+* in the function dlm_add_migration_mle.
+* otherwise the memory will be leaked. It will not found it again from
+* the hash map.
+*/
+   if (oldmle) {
+   /* master is known, detach if not already detached */
+   __dlm_mle_detach_hb_events(dlm, oldmle);
+   __dlm_put_mle(oldmle);
+
+   /* if the type of the mle is BLOCK, should put it once for 
release.
+* otherwise memory leak may be caused because oldmle had been 
unhashed
+* from the hash map, it will not be found anymore.
+*/
+   if (oldmle->type == DLM_MLE_BLOCK)
+   __dlm_put_mle(oldmle);
+   }
+
/* get an extra reference on the mle.
 * otherwise the assert_master from the new
 * master will destroy this.
 */
dlm_get_mle_inuse(mle);
+   mle_added = 1;
+
spin_unlock(&dlm->master_lock);
spin_unlock(&dlm->spinlock);

-   if (ret == -EEXIST) {
-   mlog(0, "another process is already migrating it\n");
-   goto fail;
-   }
-   mle_added = 1;
-
/*
 * set the MIGRATING flag and flush asts
 * if we fail after this we need to re-dirty the lockres
@@ -2651,12 +2676,6 @@ static int dlm_migrate_lockres(struct dl
}

 fail:
-   if (ret != -EEXIST && oldmle) {
-   /* master is known, detach if not already detached */
-   dlm_mle_detach_hb_events(dlm, oldmle);
-   dlm_put_mle(oldmle);
-   }
-
if (ret < 0) {
if (mle_added) {
dlm_mle_detach_hb_events(dlm, mle);
@@ -3191,16 +3210,23 @@ int dlm_migrate_request_handler(struct o
if (ret < 0)
kmem_cache_free(dlm_mle_cache, mle);

+   /* If an old one mle found, it should be put. if its type is BLOCK,
+* it should be put again. Because it had been unhasded from the map
+* in the function dlm_add_migration_mle.
+* otherwise the memory will be leaked. It will not found it again from
+* the hash map.
+*/
+   if (oldmle) {
+   __dlm_mle_detach_hb_events(dlm, oldmle);
+   __dlm_put_mle(oldmle);
+   if (ret >= 0 && oldmle->type == DLM_MLE_BLOCK)
+   __dlm_put_mle(oldmle);
+   }
+
spin_unlock(&dlm->master_lock);
 unlock:
spin_unlock(&dlm->spinlock);

-   if (oldmle) {
-   /* master is known, detach if not already detached */
-   dlm_mle_detach_hb_events(dlm, oldmle);
-   dlm_put_mle(oldmle);
-   }
-
if (res)
dlm_lockres_put(res);
 leave:
-
本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上面地址中列出
的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
邮件!
This e-mail and its attachments contain confidential information from H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] Announcing btrfs-dedupe

2016-11-08 Thread Darrick J. Wong
On Tue, Nov 08, 2016 at 10:59:56AM -0800, Mark Fasheh wrote:
> On Mon, Nov 7, 2016 at 6:17 PM, Darrick J. Wong  
> wrote:
> > On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
> >> Mark has already included XFS in documentation of duperemove, all that 
> >> looks
> >> amiss is btrfs-extent-same having an obsolete name.  But then, I never did
> >> any non-superficial tests on XFS, beyond "seems to work".
> 
> I'd actually be ok dropping btrfs-extent-same completely at this point
> but I'm concerned that it would leave some users behind.
> 
> 
> > /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;)
> 
> Hey, Ocfs2 started the reflink party! But yeah it's fallen behind
> since then with respect to cow and dedupe. More importantly though I'd
> like to see some extra extent tracking in there like XFS did with the
> reflink b+tree.

Perhaps this should move to the ocfs2 list, but...

...as I understand ocfs2, each inode can point to the head of a refcount
tree that maintains refcounts for all the physical blocks that are
mapped by any of the files that share that refcount tree.  It wouldn't
be difficult to hook up this existing refcount structure to the reflink
and dedupe vfs ioctls, with the huge caveat that both inodes will end up
belonging to the same refcount tree (or the call fails).  This might not
be such a huge issue for reflink since we're generally only using it
during a file copy anyway, but for dedupe this could have disastrous
consequences if someone does an fs-wide dedupe and every file in the fs
ends up with the same refcount tree.

So I guess you could give each block group its own refcount tree or
something so that all the writes in the fs don't end up contending for a
single data structure.

--D

>--Mark
> 
> -- 
> "When the going gets weird, the weird turn pro."
> Hunter S. Thompson
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel