Hi Eric,

On 01/07/2016 10:33 AM, Eric Ren wrote:
> Hi,
>
> On Tue, Dec 29, 2015 at 07:31:16PM -0700, He Gang wrote:
>> Hello Goldwyn,
>>
>> When the read path can't get the DLM lock immediately (the NONBLOCK way),
>> it next gets the lock the BLOCK way, and this behavior costs some time
>> (several msecs). So it looks like it makes sense to delete those two lines
>> of code. But why do those two lines exist in the first place? My worry is:
>> if we delete them, then when the read path can't get the DLM lock the
>> NONBLOCK way, it will retry to get the lock repeatedly, and this could
>> cost too much CPU (not waiting in sleep).
>> I just worry about this possibility; Eric will test this case and give
>> feedback.
> Sorry for the late reply. After applying this patch, the performance is
> improved very much, but "soft lockup" occurred several times (the log
> message is placed at the end). The two lines:
Did you perform this test on the upstream kernel? I performed mine on
upstream and I did not receive any softlockups. AFAICS, you performed it
on 3.0-based kernels (perhaps SLE11SP3), which do not have the patch you
have mentioned.

>
>     if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>         ocfs2_inode_unlock(inode, ex);
>
> are similar to the lines added by this commit:
>
> commit c7e25e6e0b0486492c5faaf6312b37413642c48e
> Author: Jan Kara <[email protected]>
> Date:   Thu Jun 23 22:51:47 2011 +0200
>
>     ocfs2: Avoid livelock in ocfs2_readpage()
>
>     When someone writes to an inode, readers accessing the same inode via
>     ocfs2_readpage() just busyloop trying to get ip_alloc_sem because
>     do_generic_file_read() looks up the page again and retries ->readpage()
>     when previous attempt failed with AOP_TRUNCATED_PAGE. When there are
>     enough readers, they can occupy all CPUs and in non-preempt kernel the
>     system is deadlocked because writer holding ip_alloc_sem is never run
>     to release the semaphore. Fix the problem by making reader block on
>     ip_alloc_sem to break the busy loop.
>
>     Signed-off-by: Jan Kara <[email protected]>
>     Signed-off-by: Joel Becker <[email protected]>
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 4c1ec8f..ba3ca1e 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -290,7 +290,15 @@ static int ocfs2_readpage(struct file *file, struct page *page)
>  	}
>
>  	if (down_read_trylock(&oi->ip_alloc_sem) == 0) {
> +		/*
> +		 * Unlock the page and cycle ip_alloc_sem so that we don't
> +		 * busyloop waiting for ip_alloc_sem to unlock
> +		 */
>  		ret = AOP_TRUNCATED_PAGE;
> +		unlock_page(page);
> +		unlock = 0;
> +		down_read(&oi->ip_alloc_sem);
> +		up_read(&oi->ip_alloc_sem);
>  		goto out_inode_unlock;
>  	}
>
> So, this patch removing the two lines of code may potentially result in a
> "busy wait", where the reading thread repeatedly checks whether the cluster
> lock is available. Please correct me if I'm wrong.

Correct. In older kernels, not upstream. Right?
>
> This is the "soft lockup" log:
>
> Dec 30 17:30:40 n2 kernel: [  248.084012] BUG: soft lockup - CPU#3 stuck for 23s! [iomaker:4061]
> Dec 30 17:30:40 n2 kernel: [  248.084015] Modules linked in: ocfs2(FN) jbd2 ocfs2_nodemanager(F) quota_tree ocfs2_stack_user(FN) ocfs2_stackglue(FN) dlm(F) sctp libcrc32c configfs edd sg sd_mod crc_t10dif crc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet mperf fuse loop dm_mod ipv6 ipv6_lib 8139too pcspkr acpiphp rtc_cmos pci_hotplug floppy 8139cp button i2c_piix4 mii virtio_balloon ext3 jbd mbcache ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh virtio_pci ata_generic ata_piix libata scsi_mod virtio_blk virtio virtio_ring
> Dec 30 17:30:40 n2 kernel: [  248.084049] Supported: No, Unsupported modules are loaded
> Dec 30 17:30:40 n2 kernel: [  248.084050] CPU 3
> Dec 30 17:30:40 n2 kernel: [  248.084051] Modules linked in: ocfs2(FN) jbd2 ocfs2_nodemanager(F) quota_tree ocfs2_stack_user(FN) ocfs2_stackglue(FN) dlm(F) sctp libcrc32c configfs edd sg sd_mod crc_t10dif crc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet mperf fuse loop dm_mod ipv6 ipv6_lib 8139too pcspkr acpiphp rtc_cmos pci_hotplug floppy 8139cp button i2c_piix4 mii virtio_balloon ext3 jbd mbcache ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh virtio_pci ata_generic ata_piix libata scsi_mod virtio_blk virtio virtio_ring
> Dec 30 17:30:40 n2 kernel: [  248.084074] Supported: No, Unsupported modules are loaded
> Dec 30 17:30:40 n2 kernel: [  248.084075]
> Dec 30 17:30:40 n2 kernel: [  248.084077] Pid: 4061, comm: iomaker Tainted: GF N 3.0.76-0.11-default #1 Bochs Bochs
> Dec 30 17:30:40 n2 kernel: [  248.084080] RIP: 0010:[<ffffffff8145c7a9>]  [<ffffffff8145c7a9>] _raw_spin_lock+0x9/0x20
> Dec 30 17:30:40 n2 kernel: [  248.084087] RSP: 0018:ffff8800db3fbb10  EFLAGS: 00000206
> Dec 30 17:30:40 n2 kernel: [  248.084088] RAX: 0000000071967196 RBX: ffff880114c1c000 RCX: 0000000000000004
> Dec 30 17:30:40 n2 kernel: [  248.084090] RDX: 0000000000007195 RSI: 0000000000000000 RDI: ffff880114c1c0d0
> Dec 30 17:30:40 n2 kernel: [  248.084091] RBP: ffff880114c1c0d0 R08: 0000000000000000 R09: 0000000000000003
> Dec 30 17:30:40 n2 kernel: [  248.084092] R10: 0000000000000100 R11: 0000000000014819 R12: ffffffff81464f6e
> Dec 30 17:30:40 n2 kernel: [  248.084094] R13: ffff880037b34908 R14: ffff880037b342c0 R15: ffff880037b34908
> Dec 30 17:30:40 n2 kernel: [  248.084096] FS:  00007fdbe0f7a720(0000) GS:ffff88011fd80000(0000) knlGS:0000000000000000
> Dec 30 17:30:40 n2 kernel: [  248.084097] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec 30 17:30:40 n2 kernel: [  248.084099] CR2: 00007ff1df54b000 CR3: 00000000db2da000 CR4: 00000000000006e0
> Dec 30 17:30:40 n2 kernel: [  248.084103] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 30 17:30:40 n2 kernel: [  248.084106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 30 17:30:40 n2 kernel: [  248.084108] Process iomaker (pid: 4061, threadinfo ffff8800db3fa000, task ffff880037b342c0)
> Dec 30 17:30:40 n2 kernel: [  248.084109] Stack:
> Dec 30 17:30:40 n2 kernel: [  248.084112]  ffffffffa04ab9b5 0000000000000000 00000039c2f58103 0000000000000000
> Dec 30 17:30:40 n2 kernel: [  248.084115]  0000000000000000 ffff880114c1c0d0 ffff880114c1c0d0 ffff880114c1c0d0
> Dec 30 17:30:40 n2 kernel: [  248.084117]  0000000000000000 ffff880114c1c000 ffff8800dba0ec38 0000000000000004
> Dec 30 17:30:40 n2 kernel: [  248.084120] Call Trace:
> Dec 30 17:30:40 n2 kernel: [  248.084145]  [<ffffffffa04ab9b5>] ocfs2_wait_for_recovery+0x25/0xc0 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084182]  [<ffffffffa049aed8>] ocfs2_inode_lock_full_nested+0x298/0x510 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084204]  [<ffffffffa049b161>] ocfs2_inode_lock_with_page+0x11/0x40 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084225]  [<ffffffffa048170f>] ocfs2_readpage+0x8f/0x210 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084234]  [<ffffffff810f938e>] do_generic_file_read+0x13e/0x490
> Dec 30 17:30:40 n2 kernel: [  248.084238]  [<ffffffff810f9d3c>] generic_file_aio_read+0xfc/0x260
> Dec 30 17:30:40 n2 kernel: [  248.084250]  [<ffffffffa04a168b>] ocfs2_file_aio_read+0x14b/0x390 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084265]  [<ffffffff811584b8>] do_sync_read+0xc8/0x110
> Dec 30 17:30:40 n2 kernel: [  248.084268]  [<ffffffff81158c67>] vfs_read+0xc7/0x130
> Dec 30 17:30:40 n2 kernel: [  248.084271]  [<ffffffff81158dd3>] sys_read+0x53/0xa0
> Dec 30 17:30:40 n2 kernel: [  248.084275]  [<ffffffff81464592>] system_call_fastpath+0x16/0x1b
> Dec 30 17:30:40 n2 kernel: [  248.084280]  [<00007fdbe03532a0>] 0x7fdbe035329f
> Dec 30 17:30:40 n2 kernel: [  248.084281] Code: 00 75 04 f0 0f b1 17 0f 94 c2 0f b6 c2 85 c0 ba 01 00 00 00 75 02 31 d2 89 d0 c3 0f 1f 80 00 00 00 00 b8 00 00 01 00 f0 0f c1 07 <0f> b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5 c3 0f 1f 44
> Dec 30 17:30:40 n2 kernel: [  248.084307] Call Trace:
> Dec 30 17:30:40 n2 kernel: [  248.084319]  [<ffffffffa04ab9b5>] ocfs2_wait_for_recovery+0x25/0xc0 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084344]  [<ffffffffa049aed8>] ocfs2_inode_lock_full_nested+0x298/0x510 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084366]  [<ffffffffa049b161>] ocfs2_inode_lock_with_page+0x11/0x40 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084386]  [<ffffffffa048170f>] ocfs2_readpage+0x8f/0x210 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084394]  [<ffffffff810f938e>] do_generic_file_read+0x13e/0x490
> Dec 30 17:30:40 n2 kernel: [  248.084397]  [<ffffffff810f9d3c>] generic_file_aio_read+0xfc/0x260
> Dec 30 17:30:40 n2 kernel: [  248.084409]  [<ffffffffa04a168b>] ocfs2_file_aio_read+0x14b/0x390 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084423]  [<ffffffff811584b8>] do_sync_read+0xc8/0x110
> Dec 30 17:30:40 n2 kernel: [  248.084426]  [<ffffffff81158c67>] vfs_read+0xc7/0x130
> Dec 30 17:30:40 n2 kernel: [  248.084429]  [<ffffffff81158dd3>] sys_read+0x53/0xa0
> Dec 30 17:30:40 n2 kernel: [  248.084431]  [<ffffffff81464592>] system_call_fastpath+0x16/0x1b
> Dec 30 17:30:40 n2 kernel: [  248.084435]  [<00007fdbe03532a0>] 0x7fdbe035329f
>
> Thanks,
> Eric
>
>>
>> Thanks
>> Gang
>>
>>
>>>>>
>>> From: Goldwyn Rodrigues <[email protected]>
>>>
>>> DLM does not cache locks. So, blocking lock and unlock
>>> will only make the performance worse where contention over
>>> the locks is high.
>>>
>>> Signed-off-by: Goldwyn Rodrigues <[email protected]>
>>> ---
>>>  fs/ocfs2/dlmglue.c | 8 --------
>>>  1 file changed, 8 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 20276e3..f92612e 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -2432,12 +2432,6 @@ bail:
>>>   * done this we have to return AOP_TRUNCATED_PAGE so the aop method
>>>   * that called us can bubble that back up into the VFS who will then
>>>   * immediately retry the aop call.
>>> - *
>>> - * We do a blocking lock and immediate unlock before returning, though, so that
>>> - * the lock has a great chance of being cached on this node by the time the VFS
>>> - * calls back to retry the aop. This has a potential to livelock as nodes
>>> - * ping locks back and forth, but that's a risk we're willing to take to avoid
>>> - * the lock inversion simply.
>>>   */
>>>  int ocfs2_inode_lock_with_page(struct inode *inode,
>>>  			       struct buffer_head **ret_bh,
>>> @@ -2449,8 +2443,6 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>>  	ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>>>  	if (ret == -EAGAIN) {
>>>  		unlock_page(page);
>>> -		if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>>> -			ocfs2_inode_unlock(inode, ex);
>>>  		ret = AOP_TRUNCATED_PAGE;
>>>  	}
>>>
>>> --
>>> 2.6.2
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> [email protected]
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>

-- 
Goldwyn
