Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir

2018-04-11 Thread Joseph Qi


On 18/4/12 03:31, Ashish Samant wrote:
> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
> 
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we don't take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.
> 
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
> 
> Signed-off-by: Ashish Samant 
> 
Reviewed-by: Joseph Qi 


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Larry Chen
Hi Daniel,

Thanks for your report.
I'll try to reproduce this bug as you did.

I suspect there may be a bug in the interaction between cgroups and ocfs2.

Thanks
Larry





Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir

2018-04-11 Thread piaojun


On 2018/4/12 3:31, Ashish Samant wrote:
> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
> 
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we don't take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.
> 
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
> 
> Signed-off-by: Ashish Samant 
Acked-by: Jun Piao 



Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir

2018-04-11 Thread Junxiao Bi
On 04/12/2018 03:31 AM, Ashish Samant wrote:

> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
>
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we don't take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.
>
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
>
> Signed-off-by: Ashish Samant 
Reviewed-by: Junxiao Bi 




[Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir

2018-04-11 Thread Ashish Samant
While reflinking an inode, we create a new inode in orphan directory, then
take EX lock on it, reflink the original inode to orphan inode and release
EX lock. Once the lock is released another node could request it in EX mode
from ocfs2_recover_orphans() which causes downconvert of the lock, on this
node, to NL mode.

Later we attempt to initialize security acl for the orphan inode and move
it to the reflink destination. However, while doing this we don't take EX
lock on the inode. This could potentially cause problems because we could
be starting transaction, accessing journal and modifying metadata of the
inode while holding NL lock and with another node holding EX lock on the
inode.

Fix this by taking orphan inode cluster lock in EX mode before
initializing security and moving orphan inode to reflink destination.
Use the __tracker variant while taking inode lock to avoid recursive
locking in the ocfs2_init_security_and_acl() call chain.

Signed-off-by: Ashish Samant 

V1->V2:
Modify commit message to better reflect the problem in upstream kernel.
---
 fs/ocfs2/refcounttree.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index ab156e3..1b1283f 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -4250,10 +4250,11 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
 static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
 			 struct dentry *new_dentry, bool preserve)
 {
-	int error;
+	int error, had_lock;
 	struct inode *inode = d_inode(old_dentry);
 	struct buffer_head *old_bh = NULL;
 	struct inode *new_orphan_inode = NULL;
+	struct ocfs2_lock_holder oh;
 
 	if (!ocfs2_refcount_tree(OCFS2_SB(inode->i_sb)))
 		return -EOPNOTSUPP;
@@ -4295,6 +4296,14 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
 		goto out;
 	}
 
+	had_lock = ocfs2_inode_lock_tracker(new_orphan_inode, NULL, 1,
+					    &oh);
+	if (had_lock < 0) {
+		error = had_lock;
+		mlog_errno(error);
+		goto out;
+	}
+
 	/* If the security isn't preserved, we need to re-initialize them. */
 	if (!preserve) {
 		error = ocfs2_init_security_and_acl(dir, new_orphan_inode,
@@ -4302,14 +4311,15 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
 		if (error)
 			mlog_errno(error);
 	}
-out:
 	if (!error) {
 		error = ocfs2_mv_orphaned_inode_to_new(dir, new_orphan_inode,
 						       new_dentry);
 		if (error)
 			mlog_errno(error);
 	}
+	ocfs2_inode_unlock_tracker(new_orphan_inode, 1, &oh, had_lock);
 
+out:
 	if (new_orphan_inode) {
 		/*
 		 * We need to open_unlock the inode no matter whether we
-- 
1.9.1
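
The control flow this patch introduces (take the lock, bail out to "out"
without unlocking if the lock call fails, otherwise always unlock before
the shared cleanup) can be illustrated with a small userspace model. This
is only a sketch: model_lock, model_lock_tracker(), model_unlock_tracker()
and model_reflink_tail() are hypothetical names made up for illustration,
not the OCFS2 API, and the return convention (negative = error, 0 = newly
locked, 1 = already held by the caller) is an assumption based on how the
patch uses had_lock.

```c
/*
 * Userspace model only. These types and helpers are hypothetical stand-ins
 * for the cluster-lock helpers named in the patch; they are not kernel code.
 */
struct model_lock {
	int ex_count;	/* EX references this node currently holds */
	int fail_next;	/* if non-zero, the next lock attempt fails */
};

/*
 * Assumed convention: negative = error, 0 = lock newly taken,
 * 1 = the caller already held the lock.
 */
int model_lock_tracker(struct model_lock *l, int already_held)
{
	if (l->fail_next)
		return -5;	/* stands in for an -EIO style failure */
	if (already_held)
		return 1;	/* nothing new taken, nothing to drop later */
	l->ex_count++;
	return 0;
}

/* Drop the lock only if the matching lock call actually took it. */
void model_unlock_tracker(struct model_lock *l, int had_lock)
{
	if (!had_lock)
		l->ex_count--;
}

/*
 * Mirrors the tail of the patched function: lock, run the security step,
 * move the orphan, unlock, and only then reach the shared cleanup label.
 */
int model_reflink_tail(struct model_lock *l, int security_error, int *moved)
{
	int error = 0, had_lock;

	*moved = 0;

	had_lock = model_lock_tracker(l, 0);
	if (had_lock < 0) {
		error = had_lock;
		goto out;	/* lock was never taken: skip the unlock */
	}

	error = security_error;	/* result of the security/ACL step */

	if (!error)
		*moved = 1;	/* the move happens while EX is held */

	model_unlock_tracker(l, had_lock);

out:
	return error;
}
```

In this model the lock count returns to zero on every path, which is the
property the relocated "out:" label is meant to preserve: a failed lock
attempt never reaches the unlock call, and a failed security step still
releases the lock.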




Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Daniel Sobe
Hi Larry,

Below is an example config file like the one I use for LXC containers. I
followed the instructions (https://wiki.debian.org/LXC), downloaded a
Debian 8 container as an unprivileged user, and adapted the config file.
Several of those containers run on one host and share the OCFS2 directory,
as you can see in the "lxc.mount.entry" line.

Meanwhile I'm testing whether the problem can be reproduced with shared mounts
in one namespace, as you suggested. So far without success; I will report once
anything happens.

Regards,

Daniel



# Distribution configuration
lxc.include = /usr/share/lxc/config/debian.common.conf
lxc.include = /usr/share/lxc/config/debian.userns.conf
lxc.arch = x86_64

# Container specific configuration
lxc.id_map = u 0 624288 65536
lxc.id_map = g 0 624288 65536

lxc.utsname = container1
lxc.rootfs = /storage/uvirtuals/unpriv/container1/rootfs

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = bridge1
lxc.network.name = eth0
lxc.network.veth.pair = aabbccddeeff
lxc.network.ipv4 = XX.XX.XX.XX/YY
lxc.network.ipv4.gateway = ZZ.ZZ.ZZ.ZZ

lxc.cgroup.cpuset.cpus = 63-86

lxc.mount.entry = /storage/ocfs2/sw sw none bind 0 0

lxc.cgroup.memory.limit_in_bytes   = 240G
lxc.cgroup.memory.memsw.limit_in_bytes = 240G

lxc.include = /usr/share/lxc/config/common.conf.d/00-lxcfs.conf








Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Larry Chen


On 04/11/2018 07:17 PM, Daniel Sobe wrote:
> Hi Larry,
>
> this is what I was doing. The 2nd node, while being "declared" in the 
> cluster.conf, does not exist yet, and thus everything was happening on one 
> node only.
>
> I do not know in detail how LXC does the mount sharing, but I assume it 
> simply calls "mount --bind /original/mount/point /new/mount/point" in a 
> separate namespace (or, somehow unshares the mount from the original 
> namespace afterwards).
I was thinking of the way to share a directory between the host and a Docker
container, like
    docker run -v /host/directory:/container/directory -other -options image_name command_to_run
That's different from yours.

How did you set up your LXC container?

If you can show me the procedure, I'll try to reproduce it.

By the way, if you get rid of LXC and just mount OCFS2 on several
different mount points on the local host, does the problem recur?

Regards,
Larry
> Regards,
>
> Daniel
>


Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Daniel Sobe
Hi Larry,

This is what I was doing. The 2nd node, while "declared" in cluster.conf,
does not exist yet, so everything was happening on one node only.

I do not know in detail how LXC shares the mount, but I assume it simply
calls "mount --bind /original/mount/point /new/mount/point" in a separate
namespace (or somehow unshares the mount from the original namespace
afterwards).

Regards,

Daniel


Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Larry Chen
Hi Daniel,
If you run mkfs and mount that fs on only one node,
and then share the mount with several namespaces, does the
issue recur?

And could you please show us how you shared the mount with
the other namespaces?
Thanks
Larry


[Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Daniel Sobe
Hi,

Having used OCFS2 successfully for a while on Debian 8 with its default
kernel "3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08)", I'm now
facing issues trying to do the same with newer kernels and Debian 9.
Below are the problems that occur. They seem to be the same even though the
kernels differ.

One trace is from the stock kernel of Debian 9 (at that time), the other is
from a very recent kernel (4.16-rc6). In the latter case, the OOM killer was
triggered shortly before the bug appeared, so it may be related. The call
trace is appended below.

In both cases, only one machine was active. The cluster is configured for 2
machines, but it has not even been set up yet on the 2nd system. Only one
OCFS2 file system was mounted, and the mount was shared with several
namespaces (using LXC). Although the mount was R/W, the users/containers only
read from this file system.

Please let me know what I can do to get rid of this issue. I can provide more
information about my use case if required.

I already posted to ocfs2-users; only afterwards did I see that it is now
recommended to post bugs on ocfs2-devel.

Regards,

Daniel


Mar 22 19:26:55 drs1s005 kernel: [ 7545.707568] ------------[ cut here ]------------
Mar 22 19:26:55 drs1s005 kernel: [ 7545.707600] kernel BUG at /build/linux-YDazDa/linux-4.9.82/fs/ocfs2/dlmglue.c:825!
Mar 22 19:26:55 drs1s005 kernel: [ 7545.707635] invalid opcode:  [#1] SMP
Mar 22 19:26:55 drs1s005 kernel: [ 7545.707654] Modules linked in: appletalk ax25 ipx p8023 p8022 psnap veth ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue quota_tree nls_utf8 cifs sha256_ssse3 cmac md4 des_generic arc4 dns_resolver fscache iptable_filter bridge stp llc bonding fuse intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nls_ascii nls_cp437 vfat fat intel_cstate intel_uncore intel_rapl_perf efi_pstore efivars pcspkr mgag200 ttm sg drm_kms_helper lpc_ich mfd_core drm i2c_algo_bit hpwdt hpilo ioatdma evdev dca shpchp wmi ipmi_si acpi_power_meter ipmi_msghandler pcc_cpufreq button drbd lru_cache libcrc32c efivarfs ip_tables x_tables autofs4 uas usb_storage
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708060]  ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache dm_mod sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd xhci_pci uhci_hcd ehci_pci xhci_hcd ehci_hcd i2c_i801 i2c_smbus i40e tg3 hpsa usbcore ptp scsi_transport_sas usb_common pps_core libphy scsi_mod [last unloaded: configfs]
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708231] CPU: 24 PID: 64700 Comm: perl Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708268] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708304] task: 990fda6ef100 task.stack: b62f36464000
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708331] RIP: 0010:[]  [] __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708422] RSP: 0018:b62f36467b38  EFLAGS: 00010046
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708447] RAX: 0292 RBX: 990fda6c5618 RCX: 0001
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708479] RDX:  RSI: 0001 RDI: 990fda6c5694
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708510] RBP: 0003 R08: 0101 R09: 
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708541] R10: 0038 R11: 007c R12: 990fda6c5694
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708572] R13: 991bb0f76000 R14:  R15: c0ba5080
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708604] FS:  () GS:991bbea8(0063) knlGS:f7462700
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708640] CS:  0010 DS: 002b ES: 002b CR0: 80050033
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708666] CR2: ff60 CR3: 00341a7b6000 CR4: 00360670
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708697] DR0:  DR1:  DR2: 
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708728] DR3:  DR6: fffe0ff0 DR7: 0400
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708759] Stack:
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708771]  c0b12b45  99101a537300 99101a51c4c8
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708812]  990fda6c5e00 c0b02274 99101a537180 99101a4e04c8
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708849]   99101a537300 dad51186f40d61bf 99101a4e04c8
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708886] Call Trace:
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708919]  [] ? ocfs2_dentry_unlock+0x35/0x80 [ocfs2]
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708964]