------- Comment From srikar.dronamr...@in.ibm.com 2020-11-02 01:52 EDT-------
Harish,
Since this bug has not been updated for a while, let's close it and reopen
it, or file a new one, if we see a similar problem again.

Since we don't know whether this problem still exists, I am marking
this bug as will_not_fix.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855679

Title:
  Rcu stalls and Soft-lockups observed on stressing Ubuntu 18 04 1

Status in The Ubuntu-power-systems project:
  Invalid
Status in linux package in Ubuntu:
  Invalid

Bug description:
  == Comment: #0 - Harish Sriram <hasri...@in.ibm.com> - 2018-07-31 03:50:07 ==
  --Problem Description-- 
  RCU stalls and soft lockups observed when stressing Ubuntu 18.04.1

  Contact Information = hasri...@in.ibm.com

  ---Issue observed---
  [ 1196.813220] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 1196.813241]        0-....: (19 ticks this GP) idle=966/140000000000000/0 softirq=11580/11580 fqs=1552
  [ 1196.813249]        (detected by 24, t=5252 jiffies, g=11722, c=11721, q=1061088)
  [ 1196.813282] Task dump for CPU 0:
  [ 1196.813285] stress-ng-dev   R  running task        0 46323  33635 0x00042004
  [ 1196.813294] Call Trace:
  [ 1196.813310] [c000002c75ad7b10] [c0000000018bd940] log_first_seq+0x0/0x8 (unreliable)
  [ 1198.508930] kauditd_printk_skb: 3 callbacks suppressed
  [ 1198.508938] audit: type=1400 audit(1533020002.449:312): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/pulseaudio-eg" pid=12813 comm="stress-ng-appar"
  [ 1198.508954] audit: type=1400 audit(1533020002.449:313): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/pulseaudio-eg///usr/lib/pulseaudio/pulse/gconf-helper" pid=12813 comm="stress-ng-appar"
  [ 1199.361719] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... 145-... 159-... } 5489 jiffies s: 173 root: 0x201/.
  [ 1199.361742] blocking rcu_node structures: l=1:0-15:0x1/. l=1:144-159:0x8002/.
  [ 1199.361749] Task dump for CPU 0:
  [ 1199.361752] stress-ng-dev   R  running task        0 46323  33635 0x00042004
  [ 1199.361757] Call Trace:
  [ 1199.361769] [c000002c75ad7b10] [c0000000018bd940] log_first_seq+0x0/0x8 (unreliable)
  [ 1199.361777] Task dump for CPU 145:
  [ 1199.361779] migration/145   R  running task        0   883      2 0x00000804
  [ 1199.361783] Call Trace:
  [ 1199.361787] [c000002ff0f5fa40] [c000002ff0f5fb00] 0xc000002ff0f5fb00 (unreliable)
  [ 1199.361791] Task dump for CPU 159:
  [ 1199.361792] migration/159   R  running task        0   967      2 0x00000804
  [ 1199.361796] Call Trace:
  [ 1199.361799] [c000002d78a47a40] [c000002d78a47b00] 0xc000002d78a47b00 (unreliable)
  [ 1199.787698] audit: type=1400 audit(1533020003.985:314): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/bin/pulseaudio-eg" pid=12813 comm="stress-ng-appar"
  [ 1200.781159] watchdog: BUG: soft lockup - CPU#145 stuck for 23s! [migration/145:883]
  [ 1200.781163] Modules linked in: snd_seq snd_seq_device snd_timer snd soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320 unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
  [ 1200.781321]  raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage crct10dif_vpmsum crc32c_vpmsum tg3 ipr
  [ 1200.781353] CPU: 145 PID: 883 Comm: migration/145 Not tainted 4.15.0-29-generic #31-Ubuntu
  [ 1200.781359] NIP:  c000000000206594 LR: c00000000020699c CTR: c000000000206470
  [ 1200.781364] REGS: c000002ff0f5f9e0 TRAP: 0901   Not tainted  (4.15.0-29-generic)
  [ 1200.781366] MSR:  9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>  CR: 28002222  XER: 20000000
  [ 1200.781392] CFAR: c0000000002065a4 SOFTE: 1
                 GPR00: c00000000020699c c000002ff0f5fc60 c0000000016eaf00 0000000000000000
                 GPR04: 0000000000000001 0000002ffbf90000 0000009dde82957a 0000000000000000
                 GPR08: c00000000fae3b00 0000000000000001 c000000000d432f8 0000000000000bdf
                 GPR12: 0000000000000000 c00000000fae3b00
  [ 1200.781453] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
  [ 1200.781461] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
  [ 1200.781462] Call Trace:
  [ 1200.781475] [c000002ff0f5fc60] [c000002ff0f5fd40] 0xc000002ff0f5fd40 (unreliable)
  [ 1200.781487] [c000002ff0f5fcb0] [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
  [ 1200.781503] [c000002ff0f5fd60] [c000000000143ae0] smpboot_thread_fn+0x250/0x290
  [ 1200.781510] [c000002ff0f5fdc0] [c00000000013d728] kthread+0x1a8/0x1b0
  [ 1200.781522] [c000002ff0f5fe30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
  [ 1200.781525] Instruction dump:
  [ 1200.781531] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac 913d0020 2b9f0004
  [ 1200.781551] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840 409eff74 2b890001
  [ 1200.905158] watchdog: BUG: soft lockup - CPU#159 stuck for 22s! [migration/159:967]
  [ 1200.905161] Modules linked in: snd_seq snd_seq_device snd_timer snd soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320 unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
  [ 1200.905290]  raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage crct10dif_vpmsum crc32c_vpmsum tg3 ipr
  [ 1200.905316] CPU: 159 PID: 967 Comm: migration/159 Tainted: G             L   4.15.0-29-generic #31-Ubuntu
  [ 1200.905320] NIP:  c000000000206594 LR: c00000000020699c CTR: c000000000206470
  [ 1200.905326] REGS: c000002d78a479e0 TRAP: 0901   Tainted: G             L    (4.15.0-29-generic)
  [ 1200.905327] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28002822  XER: 20000000
  [ 1200.905345] CFAR: c0000000002065a4 SOFTE: 1
                 GPR00: c00000000020699c c000002d78a47c60 c0000000016eaf00 0000000000000000
                 GPR04: 0000000000000001 0000002ffc310000 0000009dd8be8a16 0000000000000000
                 GPR08: c00000000faed500 0000000000000001 c000000000d432f8 0000000000000b97
                 GPR12: 0000000000000000 c00000000faed500
  [ 1200.905383] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
  [ 1200.905387] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
  [ 1200.905391] Call Trace:
  [ 1200.905398] [c000002d78a47c60] [c000002d78a47d40] 0xc000002d78a47d40 (unreliable)
  [ 1200.905405] [c000002d78a47cb0] [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
  [ 1200.905413] [c000002d78a47d60] [c000000000143ae0] smpboot_thread_fn+0x250/0x290
  [ 1200.905418] [c000002d78a47dc0] [c00000000013d728] kthread+0x1a8/0x1b0
  [ 1200.905426] [c000002d78a47e30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
  [ 1200.905429] Instruction dump:
  [ 1200.905433] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac 913d0020 2b9f0004
  [ 1200.905445] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840 409eff74 2b890001

  
  ---uname output---
  # uname -a
  Linux lep8d 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:37:15 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = Power 8 BML/Tuleta

  ----Additional Info-----
  The RCU stalls and soft lockups lead to hard lockups, but the CPU becomes
  unstuck after the hard lockup.

  dmesg is attached.
  sosreport will be attached.
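
For triaging a dmesg capture like the one attached, a small parser can count the RCU stall reports and list which CPUs hit soft lockups. This is only a sketch: the function name `summarize_dmesg` is made up for illustration, and the regular expressions assume the message wording quoted in this report (4.15 kernel); other kernel versions may phrase these lines differently.

```python
import re

# Patterns follow the message formats quoted in this report; they are
# assumptions about wording and may not match other kernel versions.
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")
RCU_STALL = re.compile(r"rcu_sched detected (expedited )?stalls on CPUs/tasks")


def summarize_dmesg(text):
    """Return (rcu_stall_count, {cpu: longest soft-lockup duration in seconds})."""
    stalls = 0
    lockups = {}
    for line in text.splitlines():
        if RCU_STALL.search(line):
            stalls += 1
        m = SOFT_LOCKUP.search(line)
        if m:
            cpu, secs = int(m.group(1)), int(m.group(2))
            lockups[cpu] = max(secs, lockups.get(cpu, 0))
    return stalls, lockups


# Sample lines copied from the log in this report.
sample = """\
[ 1196.813220] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1199.361719] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... 145-... 159-... } 5489 jiffies s: 173 root: 0x201/.
[ 1200.781159] watchdog: BUG: soft lockup - CPU#145 stuck for 23s! [migration/145:883]
[ 1200.905158] watchdog: BUG: soft lockup - CPU#159 stuck for 22s! [migration/159:967]
"""
print(summarize_dmesg(sample))  # (2, {145: 23, 159: 22})
```

On the sample above this reports two stall messages and soft lockups on CPUs 145 and 159, the two migration threads shown in the traces.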

  Reproducible : 90%

  ---Steps to Reproduce---
  1. wget https://github.com/ColinIanKing/stress-ng/archive/master.zip
  2. unzip master.zip; cd stress-ng-master;
  3. make; make install;
  4. Run the following command multiple times:
  stress-ng --all <nr_cpus> --vm-bytes 80% --aggressive --maximize --oomable --timeout 300 --verify --syslog --metrics --times
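
Step 4 can be wrapped in a small driver so the repeated runs are reproducible. The sketch below only rebuilds the command line quoted above; the helper names `stress_ng_cmd` and `run_repeatedly` and the default of 5 iterations are assumptions (the report only says "multiple times").

```python
import subprocess


def stress_ng_cmd(nr_cpus, timeout_s=300):
    """Build the argv for the stress-ng invocation quoted in step 4."""
    return [
        "stress-ng", "--all", str(nr_cpus), "--vm-bytes", "80%",
        "--aggressive", "--maximize", "--oomable",
        "--timeout", str(timeout_s),
        "--verify", "--syslog", "--metrics", "--times",
    ]


def run_repeatedly(nr_cpus, iterations=5):
    """Run the stress test several times and return the exit codes.

    iterations=5 is an arbitrary illustrative default.
    """
    codes = []
    for _ in range(iterations):
        codes.append(subprocess.run(stress_ng_cmd(nr_cpus)).returncode)
    return codes
```

Calling `run_repeatedly(os.cpu_count())` (after `import os`) would start the stress run, so it should only be done on a disposable test machine, since this workload is expected to trigger OOM kills and may lock up the system.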

  ---Expected---
  Test should not cause any lockup or crash.

  == Comment: #1 - Harish Sriram <hasri...@in.ibm.com> - 2018-07-31
  03:50:49 ==

  
  == Comment: #5 - SRIKAR DRONAMRAJU <srikar.dronamr...@in.ibm.com> - 2018-08-15 02:16:49 ==
  > (unreliable)
  > [ 1199.361777] Task dump for CPU 145:
  > [ 1199.361779] migration/145   R  running task        0   883      2
  > 0x00000804
  > [ 1199.361783] Call Trace:
  > [ 1199.361787] [c000002ff0f5fa40] [c000002ff0f5fb00] 0xc000002ff0f5fb00
  > (unreliable)
  > [ 1199.361791] Task dump for CPU 159:
  > [ 1199.361792] migration/159   R  running task        0   967      2
  > 0x00000804
  > [ 1199.361796] Call Trace:
  > [ 1199.361799] [c000002d78a47a40] [c000002d78a47b00] 0xc000002d78a47b00
  > (unreliable)
  > [ 1199.787698] audit: type=1400 audit(1533020003.985:314): apparmor="STATUS"
  > operation="profile_replace" profile="unconfined"
  > name="/usr/bin/pulseaudio-eg" pid=12813 comm="stress-ng-appar"
  > [ 1200.781159] watchdog: BUG: soft lockup - CPU#145 stuck for 23s!
  > [migration/145:883]
  > [ 1200.781163] Modules linked in: snd_seq snd_seq_device snd_timer snd
  > soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common
  > serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock
  > twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth
  > ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320
  > unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg
  > powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq
  > leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser
  > rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi
  > scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure
  > scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov
  > async_memcpy async_pq async_xor async_tx xor
  > [ 1200.781321]  raid6_pq libcrc32c raid1 raid0 multipath linear uas
  > usb_storage crct10dif_vpmsum crc32c_vpmsum tg3 ipr
  > [ 1200.781353] CPU: 145 PID: 883 Comm: migration/145 Not tainted
  > 4.15.0-29-generic #31-Ubuntu
  > [ 1200.781359] NIP:  c000000000206594 LR: c00000000020699c CTR:
  > c000000000206470
  > [ 1200.781364] REGS: c000002ff0f5f9e0 TRAP: 0901   Not tainted 
  > (4.15.0-29-generic)
  > [ 1200.781366] MSR:  9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>  CR:
  > 28002222  XER: 20000000
  > [ 1200.781392] CFAR: c0000000002065a4 SOFTE: 1 
  >                GPR00: c00000000020699c c000002ff0f5fc60 c0000000016eaf00
  > 0000000000000000 
  >                GPR04: 0000000000000001 0000002ffbf90000 0000009dde82957a
  > 0000000000000000 
  >                GPR08: c00000000fae3b00 0000000000000001 c000000000d432f8
  > 0000000000000bdf 
  >                GPR12: 0000000000000000 c00000000fae3b00 
  > [ 1200.781453] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
  > [ 1200.781461] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
  > [ 1200.781462] Call Trace:
  > [ 1200.781475] [c000002ff0f5fc60] [c000002ff0f5fd40] 0xc000002ff0f5fd40
  > (unreliable)
  > [ 1200.781487] [c000002ff0f5fcb0] [c00000000020699c]
  > cpu_stopper_thread+0xfc/0x1f0
  > [ 1200.781503] [c000002ff0f5fd60] [c000000000143ae0]
  > smpboot_thread_fn+0x250/0x290
  > [ 1200.781510] [c000002ff0f5fdc0] [c00000000013d728] kthread+0x1a8/0x1b0
  > [ 1200.781522] [c000002ff0f5fe30] [c00000000000b658]
  > ret_from_kernel_thread+0x5c/0x84
  > [ 1200.781525] Instruction dump:
  > [ 1200.781531] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac
  > 913d0020 2b9f0004 
  > [ 1200.781551] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840
  > 409eff74 2b890001 

  
  2610e88 stop_machine: Disable preemption after queueing stopper threads
  9fb8d5d stop_machine: Disable preemption when waking two stopper threads
  0b26351 stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock

  These 3 commits are missing, which could be the reason we are seeing
  these traces.
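
One way to check whether a given kernel tree carries these three commits is to look for their subject lines in the git history, since backports to a distro tree usually get new hashes. This is a sketch under that assumption: the helper names and the repository path argument are hypothetical, and a substring match is used so that a prefixed subject still counts as present.

```python
import subprocess

# Subject lines of the three commits listed in this comment.
WANTED = [
    "stop_machine: Disable preemption after queueing stopper threads",
    "stop_machine: Disable preemption when waking two stopper threads",
    "stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock",
]


def missing_subjects(log_subjects, wanted=WANTED):
    """Return the wanted subjects that appear in no line of the log text."""
    lines = log_subjects.splitlines()
    return [s for s in wanted if not any(s in line for line in lines)]


def subjects_in_tree(repo, rev="HEAD"):
    """Collect commit subjects reachable from rev in a local kernel checkout.

    repo is a path to a git checkout (hypothetical; supply your own).
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%s", rev],
        check=True, capture_output=True, text=True,
    )
    return out.stdout
```

With a local checkout, `missing_subjects(subjects_in_tree("/path/to/kernel"))` would list whichever of the three fixes is absent from that branch; an empty list suggests all three are present.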

  == Comment: #13 - Harish Sriram <hasri...@in.ibm.com> - 2018-08-16
  10:29:21 ==

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1855679/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
