Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir
On 18/4/12 03:31, Ashish Samant wrote:
> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
>
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we don't take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.
>
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
>
> Signed-off-by: Ashish Samant

Reviewed-by: Joseph Qi

> V1->V2:
> Modify commit message to better reflect the problem in upstream kernel.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
Hi Daniel,

Thanks for your report. I'll try to reproduce this bug the way you did.

I suspect there may be a bug in the interaction between cgroups and ocfs2.

Thanks,
Larry

On 04/11/2018 08:24 PM, Daniel Sobe wrote:
> Hi Larry,
>
> below is an example config file like the one I use for LXC containers. I
> followed the instructions (https://wiki.debian.org/LXC), downloaded a
> Debian 8 container as a user (unprivileged), and adapted the config file.
> Several of those containers run on one host and share the OCFS2 directory,
> as you can see in the "lxc.mount.entry" line.
>
> Meanwhile I'm testing whether the problem can be reproduced with shared
> mounts in one namespace, as you suggested. No success so far; I will
> report once anything happens.
>
> Regards,
>
> Daniel
>
> # Distribution configuration
> lxc.include = /usr/share/lxc/config/debian.common.conf
> lxc.include = /usr/share/lxc/config/debian.userns.conf
> lxc.arch = x86_64
>
> # Container specific configuration
> lxc.id_map = u 0 624288 65536
> lxc.id_map = g 0 624288 65536
>
> lxc.utsname = container1
> lxc.rootfs = /storage/uvirtuals/unpriv/container1/rootfs
>
> lxc.network.type = veth
> lxc.network.flags = up
> lxc.network.link = bridge1
> lxc.network.name = eth0
> lxc.network.veth.pair = aabbccddeeff
> lxc.network.ipv4 = XX.XX.XX.XX/YY
> lxc.network.ipv4.gateway = ZZ.ZZ.ZZ.ZZ
>
> lxc.cgroup.cpuset.cpus = 63-86
>
> lxc.mount.entry = /storage/ocfs2/swswnone bind 0 0
>
> lxc.cgroup.memory.limit_in_bytes = 240G
> lxc.cgroup.memory.memsw.limit_in_bytes = 240G
>
> lxc.include = /usr/share/lxc/config/common.conf.d/00-lxcfs.conf
>
> -----Original Message-----
> From: Larry Chen [mailto:lc...@suse.com]
> Sent: Wednesday, 11 April 2018 13:31
> To: Daniel Sobe; ocfs2-devel@oss.oracle.com
> Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
>
> On 04/11/2018 07:17 PM, Daniel Sobe wrote:
>> Hi Larry,
>>
>> this is what I was doing. The 2nd node, while being "declared" in the
>> cluster.conf, does not exist yet, and thus everything was happening on one
>> node only.
>>
>> I do not know in detail how LXC does the mount sharing, but I assume it
>> simply calls "mount --bind /original/mount/point /new/mount/point" in a
>> separate namespace (or, somehow unshares the mount from the original
>> namespace afterwards).
> I was thinking of the way a directory is shared between the host and a
> Docker container, like
>
>   docker run -v /host/directory:/container/directory -other -options image_name command_to_run
>
> That's different from yours.
>
> How did you set up your lxc or container?
>
> If you could, show me the procedure and I'll try to reproduce it.
>
> And by the way, if you get rid of lxc and just mount ocfs2 on several
> different mount points of the local host, will the problem recur?
>
> Regards,
> Larry
>> Regards,
>>
>> Daniel
Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir
On 2018/4/12 3:31, Ashish Samant wrote:
> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
>
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we don't take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.
>
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
>
> Signed-off-by: Ashish Samant

Acked-by: Jun Piao

> V1->V2:
> Modify commit message to better reflect the problem in upstream kernel.
Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir
On 04/12/2018 03:31 AM, Ashish Samant wrote:
> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
>
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we don't take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.
>
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
>
> Signed-off-by: Ashish Samant

Reviewed-by: Junxiao Bi

> V1->V2:
> Modify commit message to better reflect the problem in upstream kernel.
[Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir
While reflinking an inode, we create a new inode in orphan directory, then
take EX lock on it, reflink the original inode to orphan inode and release
EX lock. Once the lock is released another node could request it in EX mode
from ocfs2_recover_orphans() which causes downconvert of the lock, on this
node, to NL mode.

Later we attempt to initialize security acl for the orphan inode and move
it to the reflink destination. However, while doing this we don't take EX
lock on the inode. This could potentially cause problems because we could
be starting transaction, accessing journal and modifying metadata of the
inode while holding NL lock and with another node holding EX lock on the
inode.

Fix this by taking orphan inode cluster lock in EX mode before
initializing security and moving orphan inode to reflink destination.
Use the __tracker variant while taking inode lock to avoid recursive
locking in the ocfs2_init_security_and_acl() call chain.

Signed-off-by: Ashish Samant

V1->V2:
Modify commit message to better reflect the problem in upstream kernel.
---
 fs/ocfs2/refcounttree.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index ab156e3..1b1283f 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -4250,10 +4250,11 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
 static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
 			 struct dentry *new_dentry, bool preserve)
 {
-	int error;
+	int error, had_lock;
 	struct inode *inode = d_inode(old_dentry);
 	struct buffer_head *old_bh = NULL;
 	struct inode *new_orphan_inode = NULL;
+	struct ocfs2_lock_holder oh;
 
 	if (!ocfs2_refcount_tree(OCFS2_SB(inode->i_sb)))
 		return -EOPNOTSUPP;
@@ -4295,6 +4296,14 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
 		goto out;
 	}
 
+	had_lock = ocfs2_inode_lock_tracker(new_orphan_inode, NULL, 1,
+					    &oh);
+	if (had_lock < 0) {
+		error = had_lock;
+		mlog_errno(error);
+		goto out;
+	}
+
 	/* If the security isn't preserved, we need to re-initialize them. */
 	if (!preserve) {
 		error = ocfs2_init_security_and_acl(dir, new_orphan_inode,
@@ -4302,14 +4311,15 @@ static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
 		if (error)
 			mlog_errno(error);
 	}
-out:
 	if (!error) {
 		error = ocfs2_mv_orphaned_inode_to_new(dir, new_orphan_inode,
 						       new_dentry);
 		if (error)
 			mlog_errno(error);
 	}
+	ocfs2_inode_unlock_tracker(new_orphan_inode, 1, &oh, had_lock);
 
+out:
 	if (new_orphan_inode) {
 		/*
 		 * We need to open_unlock the inode no matter whether we
-- 
1.9.1
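The ordering this patch establishes (take the lock, jump to "out" without unlocking if the lock attempt fails, otherwise do both steps under the lock and unlock before the shared cleanup) can be modelled in user space. The following is only a sketch under stated assumptions and is not kernel code: struct model, fake_lock(), fake_unlock(), init_security(), move_to_destination() and reflink_tail() are hypothetical names invented for illustration, and the model ignores the case where the tracker variant reports that the calling process already holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in ocfs2. */
struct model {
	int lock_depth;		/* > 0 while the modelled EX lock is held */
	bool fail_lock;		/* force the lock step to fail */
	bool fail_security;	/* force the security step to fail */
	bool moved;		/* set once the inode has been "moved" */
	bool modified_unlocked;	/* set if a step ran without the lock */
	bool cleaned_up;	/* set by the shared cleanup at "out" */
};

int fake_lock(struct model *m)
{
	if (m->fail_lock)
		return -5;	/* stand-in for a negative errno */
	m->lock_depth++;
	return 0;
}

void fake_unlock(struct model *m)
{
	assert(m->lock_depth > 0);	/* unlocking an unheld lock is a bug */
	m->lock_depth--;
}

int init_security(struct model *m)
{
	if (m->lock_depth == 0)
		m->modified_unlocked = true;
	return m->fail_security ? -1 : 0;
}

int move_to_destination(struct model *m)
{
	if (m->lock_depth == 0)
		m->modified_unlocked = true;
	m->moved = true;
	return 0;
}

/* Same shape as the patched function tail: lock, work, unlock, cleanup. */
int reflink_tail(struct model *m, bool preserve)
{
	int error = 0, had_lock;

	had_lock = fake_lock(m);
	if (had_lock < 0) {
		error = had_lock;
		goto out;	/* lock not held, so the unlock is skipped */
	}

	if (!preserve)
		error = init_security(m);
	if (!error)
		error = move_to_destination(m);
	fake_unlock(m);		/* reached on success and on step failure */

out:
	m->cleaned_up = true;	/* shared cleanup, always reached */
	return error;
}
```

Calling reflink_tail() on a zero-initialised struct model exercises the success path; setting fail_lock or fail_security beforehand exercises the two error paths. In each case the model ends with lock_depth back at 0, which is the property the moved "out" label is meant to preserve.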
Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
Hi Larry,

below is an example config file like the one I use for LXC containers. I
followed the instructions (https://wiki.debian.org/LXC), downloaded a
Debian 8 container as a user (unprivileged), and adapted the config file.
Several of those containers run on one host and share the OCFS2 directory,
as you can see in the "lxc.mount.entry" line.

Meanwhile I'm testing whether the problem can be reproduced with shared
mounts in one namespace, as you suggested. No success so far; I will
report once anything happens.

Regards,

Daniel

# Distribution configuration
lxc.include = /usr/share/lxc/config/debian.common.conf
lxc.include = /usr/share/lxc/config/debian.userns.conf
lxc.arch = x86_64

# Container specific configuration
lxc.id_map = u 0 624288 65536
lxc.id_map = g 0 624288 65536

lxc.utsname = container1
lxc.rootfs = /storage/uvirtuals/unpriv/container1/rootfs

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = bridge1
lxc.network.name = eth0
lxc.network.veth.pair = aabbccddeeff
lxc.network.ipv4 = XX.XX.XX.XX/YY
lxc.network.ipv4.gateway = ZZ.ZZ.ZZ.ZZ

lxc.cgroup.cpuset.cpus = 63-86

lxc.mount.entry = /storage/ocfs2/swswnone bind 0 0

lxc.cgroup.memory.limit_in_bytes = 240G
lxc.cgroup.memory.memsw.limit_in_bytes = 240G

lxc.include = /usr/share/lxc/config/common.conf.d/00-lxcfs.conf

-----Original Message-----
From: Larry Chen [mailto:lc...@suse.com]
Sent: Wednesday, 11 April 2018 13:31
To: Daniel Sobe; ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

On 04/11/2018 07:17 PM, Daniel Sobe wrote:
> Hi Larry,
>
> this is what I was doing. The 2nd node, while being "declared" in the
> cluster.conf, does not exist yet, and thus everything was happening on one
> node only.
>
> I do not know in detail how LXC does the mount sharing, but I assume it
> simply calls "mount --bind /original/mount/point /new/mount/point" in a
> separate namespace (or, somehow unshares the mount from the original
> namespace afterwards).
I was thinking of the way a directory is shared between the host and a
Docker container, like

  docker run -v /host/directory:/container/directory -other -options image_name command_to_run

That's different from yours.

How did you set up your lxc or container?

If you could, show me the procedure and I'll try to reproduce it.

And by the way, if you get rid of lxc and just mount ocfs2 on several
different mount points of the local host, will the problem recur?

Regards,
Larry
> Regards,
>
> Daniel
Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
On 04/11/2018 07:17 PM, Daniel Sobe wrote:
> Hi Larry,
>
> this is what I was doing. The 2nd node, while being "declared" in the
> cluster.conf, does not exist yet, and thus everything was happening on one
> node only.
>
> I do not know in detail how LXC does the mount sharing, but I assume it
> simply calls "mount --bind /original/mount/point /new/mount/point" in a
> separate namespace (or, somehow unshares the mount from the original
> namespace afterwards).
I was thinking of the way a directory is shared between the host and a
Docker container, like

  docker run -v /host/directory:/container/directory -other -options image_name command_to_run

That's different from yours.

How did you set up your lxc or container?

If you could, show me the procedure and I'll try to reproduce it.

And by the way, if you get rid of lxc and just mount ocfs2 on several
different mount points of the local host, will the problem recur?

Regards,
Larry
> Regards,
>
> Daniel
Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
Hi Larry,

this is what I was doing. The 2nd node, while being "declared" in the
cluster.conf, does not exist yet, and thus everything was happening on one
node only.

I do not know in detail how LXC does the mount sharing, but I assume it
simply calls "mount --bind /original/mount/point /new/mount/point" in a
separate namespace (or, somehow unshares the mount from the original
namespace afterwards).

Regards,

Daniel

-----Original Message-----
From: Larry Chen [mailto:lc...@suse.com]
Sent: Wednesday, 11 April 2018 12:43
To: Daniel Sobe; ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

Hi Daniel,

If you execute mkfs and mount that fs on only one node, and then share the
mount to several namespaces, will the issue recur?

And could you please show us how you shared the mount to other namespaces?

Thanks
Larry

On 04/11/2018 05:45 PM, Daniel Sobe wrote:
>
> Hi,
>
> having used OCFS2 successfully for a while using Debian 8 with its
> default kernel "3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1
> (2018-01-08)", I'm now facing issues trying to accomplish the same
> with newer kernels and Debian 9. Below are the problems that occur;
> they seem to be the same although the kernel is different.
>
> One trace is from the stock kernel of Debian 9 (at that time), the
> other is from a very fresh kernel (4.16-rc6). In the latter case, the
> OOM killer was triggered "shortly" before the bug appeared - it may be
> related. The call trace is appended below.
>
> In both cases, only one machine was active. The cluster is configured
> for 2 machines, but the cluster is not even configured yet on the 2nd
> system. Only one OCFS2 file system was mounted, and the mount was shared
> to several namespaces (using LXC). Although the mount was R/W, the
> users/containers just read from this file system.
>
> Please let me know what I can do to get rid of this issue. I can
> provide more information about my use case if required.
Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
Hi Daniel,

If you execute mkfs and mount that fs on only one node, and then share the
mount to several namespaces, will the issue recur?

And could you please show us how you shared the mount to other namespaces?

Thanks
Larry

On 04/11/2018 05:45 PM, Daniel Sobe wrote:
>
> Hi,
>
> having used OCFS2 successfully for a while using Debian 8 with its
> default kernel “3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1
> (2018-01-08)”, I’m now facing issues trying to accomplish the same
> with newer kernels and Debian 9. Below are the problems that occur;
> they seem to be the same although the kernel is different.
>
> One trace is from the stock kernel of Debian 9 (at that time), the
> other is from a very fresh kernel (4.16-rc6). In the latter case, the
> OOM killer was triggered “shortly” before the bug appeared - it may be
> related. The call trace is appended below.
>
> In both cases, only one machine was active. The cluster is configured
> for 2 machines, but the cluster is not even configured yet on the 2nd
> system. Only one OCFS2 file system was mounted, and the mount was shared
> to several namespaces (using LXC). Although the mount was R/W, the
> users/containers just read from this file system.
>
> Please let me know what I can do to get rid of this issue. I can
> provide more information about my use case if required.
>
> I already posted to ocfs2-users, only then I saw that it is now
> recommended to post bugs on ocfs2-devel.
>
> Regards,
>
> Daniel
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707568] ------------[ cut here ]------------
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707600] kernel BUG at /build/linux-YDazDa/linux-4.9.82/fs/ocfs2/dlmglue.c:825!
[Ocfs2-devel] OCFS2 BUG with 2 different kernels
Hi,

having used OCFS2 successfully for a while using Debian 8 with its
default kernel "3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1
(2018-01-08)", I'm now facing issues trying to accomplish the same
with newer kernels and Debian 9. Below are the problems that occur;
they seem to be the same although the kernel is different.

One trace is from the stock kernel of Debian 9 (at that time), the
other is from a very fresh kernel (4.16-rc6). In the latter case, the
OOM killer was triggered "shortly" before the bug appeared - it may be
related. The call trace is appended below.

In both cases, only one machine was active. The cluster is configured
for 2 machines, but the cluster is not even configured yet on the 2nd
system. Only one OCFS2 file system was mounted, and the mount was shared
to several namespaces (using LXC). Although the mount was R/W, the
users/containers just read from this file system.

Please let me know what I can do to get rid of this issue. I can
provide more information about my use case if required.

I already posted to ocfs2-users, only then I saw that it is now
recommended to post bugs on ocfs2-devel.

Regards,

Daniel

Mar 22 19:26:55 drs1s005 kernel: [ 7545.707568] ------------[ cut here ]------------
Mar 22 19:26:55 drs1s005 kernel: [ 7545.707600] kernel BUG at /build/linux-YDazDa/linux-4.9.82/fs/ocfs2/dlmglue.c:825!
Mar 22 19:26:55 drs1s005 kernel: [ 7545.707635] invalid opcode: [#1] SMP
Mar 22 19:26:55 drs1s005 kernel: [ 7545.707654] Modules linked in: appletalk ax25 ipx p8023 p8022 psnap veth ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue quota_tree nls_utf8 cifs sha256_ssse3 cmac md4 des_generic arc4 dns_resolver fscache iptable_filter bridge stp llc bonding fuse intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nls_ascii nls_cp437 vfat fat intel_cstate intel_uncore intel_rapl_perf efi_pstore efivars pcspkr mgag200 ttm sg drm_kms_helper lpc_ich mfd_core drm i2c_algo_bit hpwdt hpilo ioatdma evdev dca shpchp wmi ipmi_si acpi_power_meter ipmi_msghandler pcc_cpufreq button drbd lru_cache libcrc32c efivarfs ip_tables x_tables autofs4 uas usb_storage
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708060] ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache dm_mod sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd xhci_pci uhci_hcd ehci_pci xhci_hcd ehci_hcd i2c_i801 i2c_smbus i40e tg3 hpsa usbcore ptp scsi_transport_sas usb_common pps_core libphy scsi_mod [last unloaded: configfs]
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708231] CPU: 24 PID: 64700 Comm: perl Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708268] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708304] task: 990fda6ef100 task.stack: b62f36464000
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708331] RIP: 0010:[] [] __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708422] RSP: 0018:b62f36467b38 EFLAGS: 00010046
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708447] RAX: 0292 RBX: 990fda6c5618 RCX: 0001
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708479] RDX: RSI: 0001 RDI: 990fda6c5694
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708510] RBP: 0003 R08: 0101 R09:
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708541] R10: 0038 R11: 007c R12: 990fda6c5694
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708572] R13: 991bb0f76000 R14: R15: c0ba5080
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708604] FS: () GS:991bbea8(0063) knlGS:f7462700
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708640] CS: 0010 DS: 002b ES: 002b CR0: 80050033
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708666] CR2: ff60 CR3: 00341a7b6000 CR4: 00360670
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708697] DR0: DR1: DR2:
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708728] DR3: DR6: fffe0ff0 DR7: 0400
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708759] Stack:
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708771] c0b12b45 99101a537300 99101a51c4c8
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708812] 990fda6c5e00 c0b02274 99101a537180 99101a4e04c8
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708849] 99101a537300 dad51186f40d61bf 99101a4e04c8
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708886] Call Trace:
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708919] [] ? ocfs2_dentry_unlock+0x35/0x80 [ocfs2]
Mar 22 19:26:55 drs1s005 kernel: [ 7545.708964]