Hi Eric,

On 01/07/2016 10:33 AM, Eric Ren wrote:
> Hi,
>
> On Tue, Dec 29, 2015 at 07:31:16PM -0700, He Gang wrote:
>> Hello Goldwyn,
>>
>> When the read path can't get the DLM lock immediately (the NONBLOCK way), it
>> next takes the lock the BLOCK way, and that costs some time (several msecs).
>> So it looks like it makes sense to delete those two lines of code.
>> But why did those two lines exist in the first place? My worry is that if we
>> delete them, then whenever the read path fails to get the DLM lock the
>> NONBLOCK way, it will retry repeatedly, and that could cost too much CPU
>> (busy-waiting instead of sleeping).
>> I just worry about this possibility; Eric will test this case and give
>> feedback.
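
To make Gang's concern concrete: the retry cycle lives in the VFS read path.
Roughly (a simplified sketch, not the exact do_generic_file_read() code),
whenever ->readpage() returns AOP_TRUNCATED_PAGE the caller looks the page up
again and retries immediately, and nothing in that loop sleeps:

	for (;;) {
		lock_page(page);
		/* ocfs2_readpage() tries the cluster lock NONBLOCK first */
		ret = mapping->a_ops->readpage(file, page);
		if (ret != AOP_TRUNCATED_PAGE)
			break;	/* read submitted, or a real error */
		/*
		 * ->readpage() unlocked the page before returning; we loop
		 * straight back. If the NONBLOCK attempt keeps failing,
		 * this spins without ever sleeping: the busy wait described
		 * above.
		 */
	}
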
> Sorry for the late reply. After applying this patch, performance improves a
> lot, but "soft lockup" occurred several times (the log message is at the
> end). The two lines:

Did you perform this test on the upstream kernel? I performed mine on
upstream and did not see any soft lockups. AFAICS, you performed it on a
3.0-based kernel (perhaps SLE11SP3), which does not have the patch you
mentioned.


>
>         if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>             ocfs2_inode_unlock(inode, ex);
>
> are similar to the lines added by this commit:
>
> commit c7e25e6e0b0486492c5faaf6312b37413642c48e
> Author: Jan Kara <[email protected]>
> Date:   Thu Jun 23 22:51:47 2011 +0200
>
>      ocfs2: Avoid livelock in ocfs2_readpage()
>
>      When someone writes to an inode, readers accessing the same inode via
>      ocfs2_readpage() just busyloop trying to get ip_alloc_sem because
>      do_generic_file_read() looks up the page again and retries ->readpage()
>      when previous attempt failed with AOP_TRUNCATED_PAGE. When there are enough
>      readers, they can occupy all CPUs and in non-preempt kernel the system is
>      deadlocked because writer holding ip_alloc_sem is never run to release the
>      semaphore. Fix the problem by making reader block on ip_alloc_sem to break
>      the busy loop.
>
>      Signed-off-by: Jan Kara <[email protected]>
>      Signed-off-by: Joel Becker <[email protected]>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 4c1ec8f..ba3ca1e 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -290,7 +290,15 @@ static int ocfs2_readpage(struct file *file, struct page *page)
>          }
>
>          if (down_read_trylock(&oi->ip_alloc_sem) == 0) {
> +               /*
> +                * Unlock the page and cycle ip_alloc_sem so that we don't
> +                * busyloop waiting for ip_alloc_sem to unlock
> +                */
>                  ret = AOP_TRUNCATED_PAGE;
> +               unlock_page(page);
> +               unlock = 0;
> +               down_read(&oi->ip_alloc_sem);
> +               up_read(&oi->ip_alloc_sem);
>                  goto out_inode_unlock;
>          }
>
> So, by removing those two lines, this patch may potentially cause a "busy
> wait" in which the reading thread repeatedly checks whether the cluster
> lock is available. Please correct me if I'm wrong.

Correct. In older kernels, not upstream. Right?
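
For reference, the two removed lines were doing for the cluster lock what
Jan's patch above does for ip_alloc_sem: cycle the lock in blocking mode so
that the retrying reader sleeps until the current holder lets go. Roughly,
the pre-patch code was:

	ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
	if (ret == -EAGAIN) {
		unlock_page(page);
		/*
		 * Cycle the cluster lock: the blocking acquire sleeps
		 * until the holder releases it, so the AOP_TRUNCATED_PAGE
		 * retry does not spin. These are the two lines the patch
		 * below removes.
		 */
		if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
			ocfs2_inode_unlock(inode, ex);
		ret = AOP_TRUNCATED_PAGE;
	}
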

>
> This is the "soft lockup" log:
> Dec 30 17:30:40 n2 kernel: [  248.084012] BUG: soft lockup - CPU#3 stuck for 
> 23s! [iomaker:4061]
> Dec 30 17:30:40 n2 kernel: [  248.084015] Modules linked in: ocfs2(FN) jbd2 
> ocfs2_nodemanager(F) quota_tree ocfs2_stack_user(FN) ocfs2_stackglue(FN) 
> dlm(F) sctp libcrc32c configfs edd sg sd_mod crc_t10dif crc32c iscsi_tcp 
> libiscsi_tcp libiscsi scsi_transport_iscsi af_packet mperf fuse loop dm_mod 
> ipv6 ipv6_lib 8139too pcspkr acpiphp rtc_cmos pci_hotplug floppy 8139cp 
> button i2c_piix4 mii virtio_balloon ext3 jbd mbcache ttm drm_kms_helper drm 
> i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor 
> thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_alua 
> scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh virtio_pci ata_generic 
> ata_piix libata scsi_mod virtio_blk virtio virtio_ring
> Dec 30 17:30:40 n2 kernel: [  248.084049] Supported: No, Unsupported modules 
> are loaded
> Dec 30 17:30:40 n2 kernel: [  248.084050] CPU 3
> Dec 30 17:30:40 n2 kernel: [  248.084051] Modules linked in: ocfs2(FN) jbd2 
> ocfs2_nodemanager(F) quota_tree ocfs2_stack_user(FN) ocfs2_stackglue(FN) 
> dlm(F) sctp libcrc32c configfs edd sg sd_mod crc_t10dif crc32c iscsi_tcp 
> libiscsi_tcp libiscsi scsi_transport_iscsi af_packet mperf fuse loop dm_mod 
> ipv6 ipv6_lib 8139too pcspkr acpiphp rtc_cmos pci_hotplug floppy 8139cp 
> button i2c_piix4 mii virtio_balloon ext3 jbd mbcache ttm drm_kms_helper drm 
> i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor 
> thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_alua 
> scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh virtio_pci ata_generic 
> ata_piix libata scsi_mod virtio_blk virtio virtio_ring
> Dec 30 17:30:40 n2 kernel: [  248.084074] Supported: No, Unsupported modules 
> are loaded
> Dec 30 17:30:40 n2 kernel: [  248.084075]
> Dec 30 17:30:40 n2 kernel: [  248.084077] Pid: 4061, comm: iomaker Tainted: 
> GF          N  3.0.76-0.11-default #1 Bochs Bochs
> Dec 30 17:30:40 n2 kernel: [  248.084080] RIP: 0010:[<ffffffff8145c7a9>]  
> [<ffffffff8145c7a9>] _raw_spin_lock+0x9/0x20
> Dec 30 17:30:40 n2 kernel: [  248.084087] RSP: 0018:ffff8800db3fbb10  EFLAGS: 
> 00000206
> Dec 30 17:30:40 n2 kernel: [  248.084088] RAX: 0000000071967196 RBX: 
> ffff880114c1c000 RCX: 0000000000000004
> Dec 30 17:30:40 n2 kernel: [  248.084090] RDX: 0000000000007195 RSI: 
> 0000000000000000 RDI: ffff880114c1c0d0
> Dec 30 17:30:40 n2 kernel: [  248.084091] RBP: ffff880114c1c0d0 R08: 
> 0000000000000000 R09: 0000000000000003
> Dec 30 17:30:40 n2 kernel: [  248.084092] R10: 0000000000000100 R11: 
> 0000000000014819 R12: ffffffff81464f6e
> Dec 30 17:30:40 n2 kernel: [  248.084094] R13: ffff880037b34908 R14: 
> ffff880037b342c0 R15: ffff880037b34908
> Dec 30 17:30:40 n2 kernel: [  248.084096] FS:  00007fdbe0f7a720(0000) 
> GS:ffff88011fd80000(0000) knlGS:0000000000000000
> Dec 30 17:30:40 n2 kernel: [  248.084097] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 000000008005003b
> Dec 30 17:30:40 n2 kernel: [  248.084099] CR2: 00007ff1df54b000 CR3: 
> 00000000db2da000 CR4: 00000000000006e0
> Dec 30 17:30:40 n2 kernel: [  248.084103] DR0: 0000000000000000 DR1: 
> 0000000000000000 DR2: 0000000000000000
> Dec 30 17:30:40 n2 kernel: [  248.084106] DR3: 0000000000000000 DR6: 
> 00000000ffff0ff0 DR7: 0000000000000400
> Dec 30 17:30:40 n2 kernel: [  248.084108] Process iomaker (pid: 4061, 
> threadinfo ffff8800db3fa000, task ffff880037b342c0)
> Dec 30 17:30:40 n2 kernel: [  248.084109] Stack:
> Dec 30 17:30:40 n2 kernel: [  248.084112]  ffffffffa04ab9b5 0000000000000000 
> 00000039c2f58103 0000000000000000
> Dec 30 17:30:40 n2 kernel: [  248.084115]  0000000000000000 ffff880114c1c0d0 
> ffff880114c1c0d0 ffff880114c1c0d0
> Dec 30 17:30:40 n2 kernel: [  248.084117]  0000000000000000 ffff880114c1c000 
> ffff8800dba0ec38 0000000000000004
> Dec 30 17:30:40 n2 kernel: [  248.084120] Call Trace:
> Dec 30 17:30:40 n2 kernel: [  248.084145]  [<ffffffffa04ab9b5>] 
> ocfs2_wait_for_recovery+0x25/0xc0 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084182]  [<ffffffffa049aed8>] 
> ocfs2_inode_lock_full_nested+0x298/0x510 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084204]  [<ffffffffa049b161>] 
> ocfs2_inode_lock_with_page+0x11/0x40 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084225]  [<ffffffffa048170f>] 
> ocfs2_readpage+0x8f/0x210 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084234]  [<ffffffff810f938e>] 
> do_generic_file_read+0x13e/0x490
> Dec 30 17:30:40 n2 kernel: [  248.084238]  [<ffffffff810f9d3c>] 
> generic_file_aio_read+0xfc/0x260
> Dec 30 17:30:40 n2 kernel: [  248.084250]  [<ffffffffa04a168b>] 
> ocfs2_file_aio_read+0x14b/0x390 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084265]  [<ffffffff811584b8>] 
> do_sync_read+0xc8/0x110
> Dec 30 17:30:40 n2 kernel: [  248.084268]  [<ffffffff81158c67>] 
> vfs_read+0xc7/0x130
> Dec 30 17:30:40 n2 kernel: [  248.084271]  [<ffffffff81158dd3>] 
> sys_read+0x53/0xa0
> Dec 30 17:30:40 n2 kernel: [  248.084275]  [<ffffffff81464592>] 
> system_call_fastpath+0x16/0x1b
> Dec 30 17:30:40 n2 kernel: [  248.084280]  [<00007fdbe03532a0>] 0x7fdbe035329f
> Dec 30 17:30:40 n2 kernel: [  248.084281] Code: 00 75 04 f0 0f b1 17 0f 94 c2 
> 0f b6 c2 85 c0 ba 01 00 00 00 75 02 31 d2 89 d0 c3 0f 1f 80 00 00 00 00 b8 00 
> 00 01 00 f0 0f c1 07 <0f> b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5 c3 
> 0f 1f 44
> Dec 30 17:30:40 n2 kernel: [  248.084307] Call Trace:
> Dec 30 17:30:40 n2 kernel: [  248.084319]  [<ffffffffa04ab9b5>] 
> ocfs2_wait_for_recovery+0x25/0xc0 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084344]  [<ffffffffa049aed8>] 
> ocfs2_inode_lock_full_nested+0x298/0x510 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084366]  [<ffffffffa049b161>] 
> ocfs2_inode_lock_with_page+0x11/0x40 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084386]  [<ffffffffa048170f>] 
> ocfs2_readpage+0x8f/0x210 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084394]  [<ffffffff810f938e>] 
> do_generic_file_read+0x13e/0x490
> Dec 30 17:30:40 n2 kernel: [  248.084397]  [<ffffffff810f9d3c>] 
> generic_file_aio_read+0xfc/0x260
> Dec 30 17:30:40 n2 kernel: [  248.084409]  [<ffffffffa04a168b>] 
> ocfs2_file_aio_read+0x14b/0x390 [ocfs2]
> Dec 30 17:30:40 n2 kernel: [  248.084423]  [<ffffffff811584b8>] 
> do_sync_read+0xc8/0x110
> Dec 30 17:30:40 n2 kernel: [  248.084426]  [<ffffffff81158c67>] 
> vfs_read+0xc7/0x130
> Dec 30 17:30:40 n2 kernel: [  248.084429]  [<ffffffff81158dd3>] 
> sys_read+0x53/0xa0
> Dec 30 17:30:40 n2 kernel: [  248.084431]  [<ffffffff81464592>] 
> system_call_fastpath+0x16/0x1b
> Dec 30 17:30:40 n2 kernel: [  248.084435]  [<00007fdbe03532a0>] 0x7fdbe035329f
>
> Thanks,
> Eric
>
>>
>> Thanks
>> Gang
>>
>>
>>> From: Goldwyn Rodrigues <[email protected]>
>>>
>>> DLM does not cache locks. So, blocking lock and unlock
>>> will only make the performance worse where contention over
>>> the locks is high.
>>>
>>> Signed-off-by: Goldwyn Rodrigues <[email protected]>
>>> ---
>>>   fs/ocfs2/dlmglue.c | 8 --------
>>>   1 file changed, 8 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 20276e3..f92612e 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -2432,12 +2432,6 @@ bail:
>>>    * done this we have to return AOP_TRUNCATED_PAGE so the aop method
>>>    * that called us can bubble that back up into the VFS who will then
>>>    * immediately retry the aop call.
>>> - *
>>> - * We do a blocking lock and immediate unlock before returning, though, so that
>>> - * the lock has a great chance of being cached on this node by the time the VFS
>>> - * calls back to retry the aop.    This has a potential to livelock as nodes
>>> - * ping locks back and forth, but that's a risk we're willing to take to avoid
>>> - * the lock inversion simply.
>>>    */
>>>   int ocfs2_inode_lock_with_page(struct inode *inode,
>>>                           struct buffer_head **ret_bh,
>>> @@ -2449,8 +2443,6 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>>     ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>>>     if (ret == -EAGAIN) {
>>>             unlock_page(page);
>>> -           if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>>> -                   ocfs2_inode_unlock(inode, ex);
>>>             ret = AOP_TRUNCATED_PAGE;
>>>     }
>>>
>>> --
>>> 2.6.2
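
(After this patch, the function reduces to the nonblocking attempt alone; a
sketch of the resulting code, for clarity:)

	int ocfs2_inode_lock_with_page(struct inode *inode,
				       struct buffer_head **ret_bh,
				       int ex, struct page *page)
	{
		int ret;

		ret = ocfs2_inode_lock_full(inode, ret_bh, ex,
					    OCFS2_LOCK_NONBLOCK);
		if (ret == -EAGAIN) {
			unlock_page(page);
			/*
			 * No blocking lock/unlock cycle any more: return
			 * AOP_TRUNCATED_PAGE and rely on the VFS retry,
			 * since a DLM that does not cache locks gains
			 * nothing from the extra round trip.
			 */
			ret = AOP_TRUNCATED_PAGE;
		}
		return ret;
	}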

-- 
Goldwyn

_______________________________________________
Ocfs2-devel mailing list
[email protected]
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
