from:"CAI Qian"

[4.9-rc5] kernel BUG at kernel/sched/rt.c:764!

2016-11-16 Thread CAI Qian

Occasionally, this machine hits this BUG during boot with the following config.

http://people.redhat.com/qcai/tmp/config-god-4.9rc2

[   18.125103] x2apic enabled
[   18.128182] Switched APIC routing to cluster x2apic.
[   18.137063] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[   18.153805] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 
(family: 0x6, model: 0x4f, stepping: 0x1)
[   18.165021] Performance Events: PEBS fmt2+, Broadwell events, 16-deep LBR, 
full-width counters, Intel PMU driver.
[   18.176595] ... version:3
[   18.181074] ... bit width:  48
[   18.185647] ... generic registers:  4
[   18.190124] ... value mask: 
[   18.196055] ... max period: 
[   18.201986] ... fixed-purpose events:   3
[   18.206453] ... event mask: 0007000f
[   20.648609] NMI watchdog: enabled on all CPUs, permanently consumes one 
hw-PMU counter.
[   20.702972] x86: Booting SMP configuration:
[   20.712720]  node  #0, CPUs:#1[   20.847790]   #2[   20.974935]  
 #3[   21.109503]   #4[   21.245976]   #5[   21.383743]   #6[   21.554680]   
#7[   21.703806]   #8[   21.864885]   #9[   22.018063]  #10[   22.154530]  #11[ 
  22.345902]  #12[   22.523560]  #13[   22.661047]  #14[   22.821751]  #15[   
22.999171]  #16[   23.142056]  #17[   23.315885]  #18[   23.450304]  #19[   
23.642422]  #20[   23.816793]  #21[   23.955838]  node  #1, CPUs:   #22[   
24.166606]  #23[   24.340859]  #24[   24.501884]  #25[   24.679949]  #26[   
24.839650]  #27[   25.014436]  #28[   25.179093]  #29[   25.319094]  #30[   
25.482463]  #31[   25.636126]  #32[   25.820521]  #33[   25.983310]  #34[   
26.162576]  #35[   26.326769]  #36[   26.508100]  #37[   26.672296]  #38[   
26.847331]  #39[   27.011591]  #40[   27.154124]  #41[   27.314263]  #42[   
27.472972]  #43[   27.661509]  node  #0, CPUs:   #44[   27.827697]  #45[   
28.009301]  #46[   28.173231]  #47[   28.353749]  #48[   28.517099]  #49[   
28.686097]  #50[   28.850425]  #51[   28.928408]  #52[   29.006194]  #53[   
29.084035]  #54[   29.161891]  #55[   29.239825]  #56[   29.317658]  #57[   
29.395585]  #58[   29.473428]  #59[   29.551326]  #60[   29.629383]  #61[   
29.707235]  #62[   29.785018]  #63[   29.862918]  #64[   29.940800]  #65[   
30.018569]  node  #1, CPUs:   #66[   30.098050]  #67[   30.175751]  #68[   
30.253356]  #69[   30.331126]  #70[   30.408855]  #71[   30.486657]  #72[   
30.564568]  #73[   30.642370]  #74[   30.720135]  #75[   30.798071]  #76[   
30.875768]  #77[   30.953472]  #78[   31.031228]  #79[   31.109028]  #80[   
31.186751]  #81[   31.264504]  #82[   31.342254]  #83[   31.420027]  #84[   
31.497807]  #85[   31.575565]  #86[   31.653323]  #87[   31.720946] x86: Booted 
up 2 nodes, 88 CPUs
[   31.725672] 
[   31.728884] | NMI testsuite:
[   31.732102] 
[   31.735706]   remote IPI:  ok  |
[   31.749619]local IPI:  ok  |
[   31.765645] 
[   31.769257] Good, all   2 testcases passed! |
[   31.774148] -
[   31.779019] smpboot: Total of 88 processors activated (391240.60 BogoMIPS)
[   32.277215] perf: interrupt took too long (7702 > 6366), lowering 
kernel.perf_event_max_sample_rate to 25000
[   32.277237] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 
1.174 msecs
[   32.316901] [ cut here ]
[   32.322058] kernel BUG at kernel/sched/rt.c:764!
[   32.327210] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   32.334593] Modules linked in:
[   32.338013] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc5+ #2
[   32.345008] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   32.356367] task: 880e3f278000 task.stack: 88084828
[   32.362973] RIP: 0010:[]  [] 
rq_offline_rt+0x6b5/0xda0
[   32.372217] RSP: :8808482878a8  EFLAGS: 00010082
[   32.378144] RAX: 0007 RBX: fd050f80 RCX: 10675688
[   32.386109] RDX:  RSI: 833ab440 RDI: 880e3f278cd4
[   32.394073] RBP: 880848287958 R08:  R09: 880e575e2950
[   32.402037] R10: 0001 R11:  R12: 0058
[   32.410001] R13: 857390c4 R14: 85f60100 R15: dc00
[   32.417965] FS:  () GS:88085a60() 
knlGS:
[   32.426997] CS:  0010 DS:  ES:  CR0: 80050033
[   32.433409] CR2: 881077fff000 CR3: 0361 CR4: 003406f0
[   32.441373] DR0:  DR1:  DR2: 
[   32.449337] DR3:  DR6: fffe0ff0 DR7: 0400
[   32.457301] Stack:
[   32.459544]  85f607f8 85f60210 880e575e2a40 
880e575e2aa0
[   32.467842]  85f60100 880e575e2930 85f60218 
880e575e2040
[

local DoS - systemd hang or timeout with cgroup traces

2016-10-27 Thread CAI Qian

So this can still be reproduced in 4.9-rc2 by running trinity as a non-root
user for up to 30 minutes on this machine, on either ext4 or xfs. Below is the
trace on ext4 and the sysrq-w report.

http://people.redhat.com/qcai/tmp/dmesg-ext4-cgroup-hang

CAI Qian

- Original Message -
> From: "tj" <t...@kernel.org>
> Sent: Tuesday, October 4, 2016 5:42:19 PM
> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] 
> splice_read reworked)
> 
> ...
> > Not sure if related, but right after this lockdep happened and trinity
> > running by a
> > non-privileged user finished inside the container. The host's systemctl
> > command just
> > hang or timeout which renders the whole system unusable.
> > 
> > # systemctl status docker
> > Failed to get properties: Connection timed out
> > 
> > # systemctl reboot (hang)
> > 
> ...
> > [ 5535.893675] INFO: lockdep is turned off.
> > [ 5535.898085] INFO: task kworker/45:4:146035 blocked for more than 120
> > seconds.
> > [ 5535.906059]   Tainted: GW   4.8.0-rc8-fornext+ #1
> > [ 5535.912865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 5535.921613] kworker/45:4D 880853e9b950 14048 146035  2
> > 0x0080
> > [ 5535.929630] Workqueue: cgroup_destroy css_killed_work_fn
> > [ 5535.935582]  880853e9b950  
> > 88086c6da000
> > [ 5535.943882]  88086c9e2000 880853e9c000 880853e9baa0
> > 88086c9e2000
> > [ 5535.952205]  880853e9ba98 0001 880853e9b968
> > 817cdaaf
> > [ 5535.960522] Call Trace:
> > [ 5535.963265]  [] schedule+0x3f/0xa0
> > [ 5535.968817]  [] schedule_timeout+0x3db/0x6f0
> > [ 5535.975346]  [] ? wait_for_completion+0x45/0x130
> > [ 5535.982256]  [] wait_for_completion+0xc3/0x130
> > [ 5535.988972]  [] ? wake_up_q+0x80/0x80
> > [ 5535.994804]  [] drop_sysctl_table+0xc4/0xe0
> > [ 5536.001227]  [] drop_sysctl_table+0x77/0xe0
> > [ 5536.007648]  [] unregister_sysctl_table+0x4d/0xa0
> > [ 5536.014654]  [] unregister_sysctl_table+0x7f/0xa0
> > [ 5536.021657]  []
> > unregister_sched_domain_sysctl+0x15/0x40
> > [ 5536.029344]  [] partition_sched_domains+0x44/0x450
> > [ 5536.036447]  [] ? __mutex_unlock_slowpath+0x111/0x1f0
> > [ 5536.043844]  [] rebuild_sched_domains_locked+0x64/0xb0
> > [ 5536.051336]  [] update_flag+0x11d/0x210
> > [ 5536.057373]  [] ? mutex_lock_nested+0x2df/0x450
> > [ 5536.064186]  [] ? cpuset_css_offline+0x1b/0x60
> > [ 5536.070899]  [] ? trace_hardirqs_on+0xd/0x10
> > [ 5536.077420]  [] ? mutex_lock_nested+0x2df/0x450
> > [ 5536.084234]  [] ? css_killed_work_fn+0x25/0x220
> > [ 5536.091049]  [] cpuset_css_offline+0x35/0x60
> > [ 5536.097571]  [] css_killed_work_fn+0x5c/0x220
> > [ 5536.104207]  [] process_one_work+0x1df/0x710
> > [ 5536.110736]  [] ? process_one_work+0x160/0x710
> > [ 5536.117461]  [] worker_thread+0x12b/0x4a0
> > [ 5536.123697]  [] ? process_one_work+0x710/0x710
> > [ 5536.130426]  [] kthread+0xfe/0x120
> > [ 5536.135991]  [] ret_from_fork+0x1f/0x40
> > [ 5536.142041]  [] ? kthread_create_on_node+0x230/0x230
> 
> This one seems to be the offender.  cgroup is trying to offline a
> cpuset css, which takes place under cgroup_mutex.  The offlining ends
> up trying to drain active usages of a sysctl table which apparently is
> not happening.  Did something hang or crash while trying to generate
> sysctl content?
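
The drain described above can be illustrated with a toy model. This is only a sketch, not kernel code: `struct table_model` and the `table_*` functions are hypothetical names, and the real code blocks in wait_for_completion() rather than returning a flag. It shows why one reader that never finishes keeps the unregistering side (here, the cgroup worker holding cgroup_mutex) waiting indefinitely.

```c
#include <stdbool.h>

/* Hypothetical model of a table header that tracks its active readers. */
struct table_model {
    int  active_users;   /* readers currently using the table */
    bool unregistering;  /* set once removal has started */
};

/* A reader enters the table; new readers are refused once removal began. */
bool table_use_start(struct table_model *t)
{
    if (t->unregistering)
        return false;
    t->active_users++;
    return true;
}

/* A reader leaves the table. */
void table_use_finish(struct table_model *t)
{
    t->active_users--;
}

/*
 * Start removal. Returns true if removal can complete right away,
 * false if it would have to wait for active readers to drain.
 */
bool table_begin_unregister(struct table_model *t)
{
    t->unregistering = true;
    return t->active_users == 0;
}
```

In this model, removal started while one reader is still inside reports that it must wait; if that reader hung or crashed, the wait never ends, which matches the blocked kworker in the trace.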

Re: [PATCH] perf: Protect pmu device removal with pmu_bus_running check CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

2016-10-20 Thread CAI Qian


> CAI Qian reported crash [1] in uncore device removal related
> to CONFIG_DEBUG_TEST_DRIVER_REMOVE option.
> 
> The reason for the crash is that perf_pmu_unregister tries to remove
> a pmu device which is not added at this point. We add pmu devices
> only after pmu_bus is registered, which happens in the perf_event_sysfs_init
> init call and sets the pmu_bus_running flag.
> 
> The fix is to get the pmu_bus_running flag state at the point
> the pmu is taken out of the pmus list and remove the device
> later only if it's set.
> 
> [1] https://marc.info/?l=linux-kernel&m=147688837328451
> 
> Reported-by: CAI Qian <caiq...@redhat.com>
> Signed-off-by: Jiri Olsa <jo...@kernel.org>

Tested-by: CAI Qian <caiq...@redhat.com>
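
The idea in the quoted commit message can be shown with a small userspace model. This is a sketch under stated assumptions, not the kernel patch: `model_bus_running`, `struct pmu_model` and the `pmu_model_*` functions are hypothetical stand-ins, and a counter replaces the crash in device_del().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the pmu_bus_running flag named in the thread. */
bool model_bus_running = false;

struct pmu_model {
    bool dev_added;     /* true once the device was added to the bus */
    int  bad_removals;  /* removals attempted on a device never added */
};

/* Registration adds the device only if the bus is already running. */
void pmu_model_register(struct pmu_model *pmu)
{
    pmu->bad_removals = 0;
    pmu->dev_added = model_bus_running;
}

/* Stand-in for device removal; a "bad removal" models the reported crash. */
static void device_model_del(struct pmu_model *pmu)
{
    if (!pmu->dev_added) {
        pmu->bad_removals++;
        return;
    }
    pmu->dev_added = false;
}

/* Behaviour before the fix: always try to remove the device. */
void pmu_model_unregister_unguarded(struct pmu_model *pmu)
{
    device_model_del(pmu);
}

/* Behaviour after the fix: snapshot the flag, remove only if it was set. */
void pmu_model_unregister_guarded(struct pmu_model *pmu)
{
    bool remove_device = model_bus_running;

    if (remove_device)
        device_model_del(pmu);
}
```

With the flag unset, the unguarded variant records a bad removal while the guarded one does nothing, which is the case hit when the uncore initcall runs before perf_event_sysfs_init.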

Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

2016-10-19 Thread CAI Qian


> I think the reason here is that we presume pmu devices are always added,
> but we add them only if pmu_bus_running (in perf_event_sysfs_init)
> is set, which might happen after the uncore initcall
> 
> attached patch fixes the issue for me
Tested-by: CAI Qian <caiq...@redhat.com>
> 
> jirka
> 
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c6e47e97b33f..c2099b799d16 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
>  		idr_remove(&pmu_idr, pmu->type);
>  	if (pmu->nr_addr_filters)
>  		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
> -	device_del(pmu->dev);
> -	put_device(pmu->dev);
> +	if (pmu_bus_running) {
> +		device_del(pmu->dev);
> +		put_device(pmu->dev);
> +	}
>  	free_pmu_context(pmu);
>  }
>  EXPORT_SYMBOL_GPL(perf_pmu_unregister);
>

[4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

2016-10-19 Thread CAI Qian

:  0010 DS:  ES:  CR0: 80050033
[   66.917265] CR2:  CR3: 0360a000 CR4: 003406e0
[   66.925228] DR0:  DR1:  DR2: 
[   66.933191] DR3:  DR6: fffe0ff0 DR7: 0400
[   66.941154] Stack:
[   66.943396]  82c8a5d2 881077f705c0 110108f5ff13 
880847aff920
[   66.951698]   86d346c8 41b58ab3 
8338e870
[   66.959997]  822413d0 880e0044  
880847aff8c0
[   66.968296] Call Trace:
[   66.971025]  [] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   66.978603]  [] ? cleanup_glue_dir+0x140/0x140
[   66.985309]  [] perf_pmu_unregister+0x142/0x6d0
[   66.992111]  [] ? preempt_count_sub+0x5e/0xe0
[   66.998720]  [] uncore_pmu_unregister+0x67/0xd0
[   67.005523]  [] uncore_pci_remove+0x32c/0x510
[   67.012131]  [] pci_device_remove+0xb2/0x240
[   67.018641]  [] driver_probe_device+0x146/0xfc0
[   67.025442]  [] ? driver_probe_device+0xfc0/0xfc0
[   67.032437]  [] __driver_attach+0x1b5/0x230
[   67.038852]  [] bus_for_each_dev+0x130/0x200
[   67.045361]  [] ? do_raw_spin_trylock+0x110/0x110
[   67.052357]  [] ? subsys_dev_iter_init+0x100/0x100
[   67.059450]  [] ? preempt_count_sub+0x5e/0xe0
[   67.066056]  [] driver_attach+0x42/0x70
[   67.072081]  [] bus_add_driver+0x406/0x870
[   67.078397]  [] driver_register+0x1a9/0x3d0
[   67.084809]  [] ? __raw_spin_lock_init+0x32/0x120
[   67.091803]  [] __pci_register_driver+0x1ad/0x2b0
[   67.098798]  [] ? pci_pm_runtime_idle+0x180/0x180
[   67.105792]  [] intel_uncore_init+0x58d/0x64c
[   67.112399]  [] ? amd_iommu_pc_init+0x16/0x344
[   67.119103]  [] ? uncore_type_init+0x5cb/0x5cb
[   67.125806]  [] do_one_initcall+0xb7/0x2a0
[   67.132124]  [] ? initcall_blacklisted+0x1a0/0x1a0
[   67.139215]  [] ? up_write+0x7d/0x120
[   67.145046]  [] ? up_read+0x40/0x40
[   67.150684]  [] ? _raw_spin_unlock_irqrestore+0x42/0x70
[   67.158262]  [] ? __wake_up+0x44/0x50
[   67.164094]  [] kernel_init_freeable+0x68a/0x768
[   67.170992]  [] ? start_kernel+0x751/0x751
[   67.177310]  [] ? compat_start_thread+0xa0/0xa0
[   67.184111]  [] ? rest_init+0x190/0x190
[   67.190137]  [] kernel_init+0x13/0x140
[   67.196064]  [] ? rest_init+0x190/0x190
[   67.202090]  [] ret_from_fork+0x27/0x40
[   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 
ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 
02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
[   67.229872] RIP  [] device_del+0x96/0x860
[   67.236101]  RSP 
[   67.240059] ---[ end trace 69358e866a1e3f6c ]---
[   67.245377] Kernel panic - not syncing: Fatal exception
[   67.251271] ---[ end Kernel panic - not syncing: Fatal exception


- Original Message -
> From: "Rob Herring" <r...@kernel.org>
> To: "Greg Kroah-Hartman" <gre...@linuxfoundation.org>
> Cc: "CAI Qian" <caiq...@redhat.com>, "linux-kernel" 
> <linux-kernel@vger.kernel.org>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gre...@linuxfoundation.org> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
> 
> I think this one is different though. It has a remove() hook.
> 
> Rob
>

Re: [4.9-rc1] kernel panic from `cat /proc/driver/rtc`

2016-10-18 Thread CAI Qian


> Is that fixed by http://patchwork.ozlabs.org/patch/683728/ ?
Yup.

Re: [4.9-rc1] kernel panic from `cat /proc/driver/rtc`

2016-10-18 Thread CAI Qian

It turns out this panic can only be reproduced with
CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled. There are some errors in dmesg
when the config is enabled.

[   71.215937] rtc_cmos 00:00: RTC can wake from S4
[   71.218096] input: AT Translated Set 2 keyboard as 
/devices/platform/i8042/serio1/input/input2
[   71.232591] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[   71.239518] rtc_cmos 00:00: alarms up to one month, y3k, 114 bytes nvram, 
hpet irqs
[   71.248160] rtc_cmos 00:00: RTC can wake from S4
[   71.267680] rtc_cmos: probe of 00:00 failed with error -16

It works fine without it.

$ cat /proc/driver/rtc 
rtc_time: 14:30:56
rtc_date: 2016-10-18
alrm_time: 19:48:53
alrm_date: 2016-10-18
alarm_IRQ: no
alrm_pending: no
update IRQ enabled: no
periodic IRQ enabled: no
periodic IRQ frequency: 1024
max user IRQ frequency: 64
24hr: yes
periodic_IRQ: no
update_IRQ: no
HPET_emulated: yes
BCD: yes
DST_enable: no
periodic_freq: 1024
batt_status: okay

   CAI Qian
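
For context, CONFIG_DEBUG_TEST_DRIVER_REMOVE makes the driver core probe a device, remove it, and probe it again, and error -16 is -EBUSY. The toy model below sketches one way such a cycle can end in -EBUSY: a remove step that does not release what probe claimed. That cause is an assumption for illustration only; the thread does not state why the second rtc_cmos probe failed, and `struct driver_model` and the `model_*` functions are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state for the probe/remove/probe cycle. */
struct driver_model {
    bool resource_claimed;  /* something probe() claims exclusively */
    bool remove_releases;   /* whether remove() undoes that claim */
};

/* Probe fails with -EBUSY if the resource is still held. */
int model_probe(struct driver_model *d)
{
    if (d->resource_claimed)
        return -EBUSY;
    d->resource_claimed = true;
    return 0;
}

/* Remove releases the resource only if the driver cleans up fully. */
void model_remove(struct driver_model *d)
{
    if (d->remove_releases)
        d->resource_claimed = false;
}

/* Probe, remove, probe again; returns the result of the final probe. */
int model_test_driver_remove(struct driver_model *d)
{
    int ret = model_probe(d);

    if (ret)
        return ret;
    model_remove(d);
    return model_probe(d);
}
```

A driver whose remove path cleans up fully passes the cycle, while a leaky one fails the second probe with -EBUSY and is left unbound, so later users of its state may find nothing there.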

- Original Message -
> From: "CAI Qian" <caiq...@redhat.com>
> To: rtc-li...@googlegroups.com, "linux-kernel" <linux-kernel@vger.kernel.org>
> Cc: "Alessandro Zummo" <a.zu...@towertech.it>, "Alexandre Belloni" 
> <alexandre.bell...@free-electrons.com>
> Sent: Tuesday, October 18, 2016 9:28:12 AM
> Subject: [4.9-rc1] kernel panic from `cat /proc/driver/rtc`
> 
> This looks newly introduced in the 4.9 merge window. I never saw any of
> those while testing v4.8.
>    CAI Qian
> 
> $ cat /proc/driver/rtc
> 
> [ 7890.728704] UBSAN: Undefined behaviour in drivers/rtc/rtc-cmos.c:433:10
> [ 7890.736088] member access within null pointer of type 'struct cmos_rtc'
> [ 7890.743472] CPU: 81 PID: 32522 Comm: proc01 Tainted: G        W
> 4.9.0-rc1 #32
> [ 7890.752017] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS
> GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [ 7890.763373]  88080a54f868 81d23184 41b58ab3
> 8334713f
> [ 7890.771670]  81d230c4 88080a54f890 88080a54f840
> 88081f324900
> [ 7890.779966]  82ff7b40 831279a0 01b1
> d3274681
> [ 7890.788262] Call Trace:
> [ 7890.790993]  [] dump_stack+0xc0/0x12c
> [ 7890.796825]  [] ? _atomic_dec_and_lock+0xc4/0xc4
> [ 7890.803723]  [] ubsan_epilogue+0xd/0x8a
> [ 7890.809748]  [] __ubsan_handle_type_mismatch+0x166/0x434
> [ 7890.817421]  [] ? ubsan_epilogue+0x8a/0x8a
> [ 7890.823738]  [] ? __this_cpu_preempt_check+0x13/0x20
> [ 7890.831025]  [] ? trace_hardirqs_on_caller+0x520/0x720
> [ 7890.838509]  [] cmos_procfs+0x1b1/0x1e0
> [ 7890.844535]  [] ? rtc_handler+0x140/0x140
> [ 7890.850754]  [] rtc_proc_show+0x180/0x640
> [ 7890.856973]  [] ? rtc_proc_open+0xd0/0xd0
> [ 7890.863196]  [] ? kasan_kmalloc+0xad/0xe0
> [ 7890.869419]  [] seq_read+0x334/0x1400
> [ 7890.875252]  [] ? seq_hlist_start_percpu+0x4a0/0x4a0
> [ 7890.882538]  [] ? save_stack_trace+0x1b/0x20
> [ 7890.889050]  [] ? save_stack+0x46/0xd0
> [ 7890.894979]  [] ? kasan_slab_free+0x71/0xb0
> [ 7890.901393]  [] ? kmem_cache_free+0xe9/0x660
> [ 7890.907905]  [] ? putname+0xe0/0x120
> [ 7890.913639]  [] ? print_usage_bug+0x700/0x700
> [ 7890.920250]  [] proc_reg_read+0x110/0x270
> [ 7890.926470]  [] __vfs_read+0x106/0x990
> [ 7890.932398]  [] ? do_iter_readv_writev+0x840/0x840
> [ 7890.939490]  [] ? selinux_file_permission+0x3c5/0x550
> [ 7890.946874]  [] ? security_file_permission+0x176/0x220
> [ 7890.954354]  [] ? rw_verify_area+0xd8/0x380
> [ 7890.960767]  [] vfs_read+0x118/0x400
> [ 7890.966500]  [] SyS_read+0xdf/0x1d0
> [ 7890.972137]  [] ? vfs_copy_file_range+0x8f0/0x8f0
> [ 7890.979132]  [] ? __this_cpu_preempt_check+0x13/0x20
> [ 7890.986416]  [] ? vfs_copy_file_range+0x8f0/0x8f0
> [ 7890.993412]  [] do_syscall_64+0x19d/0x540
> [ 7890.999631]  [] entry_SYSCALL64_slow_path+0x25/0x25
> [ 7891.006820]
> 
> [ 7891.016322] kasan: CONFIG_KASAN_INLINE enabled
> [ 7891.021292] kasan: GPF could be caused by NULL-ptr deref or user memory
> access
> [ 7891.029371] general protection fault:  [#1] PREEMPT SMP
> DEBUG_PAGEALLOC KASAN
> [ 7891.037722] Modules linked in: tun ext4 jbd2 mbcache loop veth
> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat
> nf_conntrack br_netfilter bridge stp llc overlay intel_rapl sb_edac
> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel aesni_i

[4.9-rc1] kernel panic from `cat /proc/driver/rtc`

2016-10-18 Thread CAI Qian

This looks newly introduced in the 4.9 merge window. I never saw any of
these while testing v4.8.
   CAI Qian

$ cat /proc/driver/rtc

[ 7890.728704] UBSAN: Undefined behaviour in drivers/rtc/rtc-cmos.c:433:10
[ 7890.736088] member access within null pointer of type 'struct cmos_rtc'
[ 7890.743472] CPU: 81 PID: 32522 Comm: proc01 Tainted: G        W
4.9.0-rc1 #32
[ 7890.752017] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 7890.763373]  88080a54f868 81d23184 41b58ab3 
8334713f
[ 7890.771670]  81d230c4 88080a54f890 88080a54f840 
88081f324900
[ 7890.779966]  82ff7b40 831279a0 01b1 
d3274681
[ 7890.788262] Call Trace:
[ 7890.790993]  [] dump_stack+0xc0/0x12c
[ 7890.796825]  [] ? _atomic_dec_and_lock+0xc4/0xc4
[ 7890.803723]  [] ubsan_epilogue+0xd/0x8a
[ 7890.809748]  [] __ubsan_handle_type_mismatch+0x166/0x434
[ 7890.817421]  [] ? ubsan_epilogue+0x8a/0x8a
[ 7890.823738]  [] ? __this_cpu_preempt_check+0x13/0x20
[ 7890.831025]  [] ? trace_hardirqs_on_caller+0x520/0x720
[ 7890.838509]  [] cmos_procfs+0x1b1/0x1e0
[ 7890.844535]  [] ? rtc_handler+0x140/0x140
[ 7890.850754]  [] rtc_proc_show+0x180/0x640
[ 7890.856973]  [] ? rtc_proc_open+0xd0/0xd0
[ 7890.863196]  [] ? kasan_kmalloc+0xad/0xe0
[ 7890.869419]  [] seq_read+0x334/0x1400
[ 7890.875252]  [] ? seq_hlist_start_percpu+0x4a0/0x4a0
[ 7890.882538]  [] ? save_stack_trace+0x1b/0x20
[ 7890.889050]  [] ? save_stack+0x46/0xd0
[ 7890.894979]  [] ? kasan_slab_free+0x71/0xb0
[ 7890.901393]  [] ? kmem_cache_free+0xe9/0x660
[ 7890.907905]  [] ? putname+0xe0/0x120
[ 7890.913639]  [] ? print_usage_bug+0x700/0x700
[ 7890.920250]  [] proc_reg_read+0x110/0x270
[ 7890.926470]  [] __vfs_read+0x106/0x990
[ 7890.932398]  [] ? do_iter_readv_writev+0x840/0x840
[ 7890.939490]  [] ? selinux_file_permission+0x3c5/0x550
[ 7890.946874]  [] ? security_file_permission+0x176/0x220
[ 7890.954354]  [] ? rw_verify_area+0xd8/0x380
[ 7890.960767]  [] vfs_read+0x118/0x400
[ 7890.966500]  [] SyS_read+0xdf/0x1d0
[ 7890.972137]  [] ? vfs_copy_file_range+0x8f0/0x8f0
[ 7890.979132]  [] ? __this_cpu_preempt_check+0x13/0x20
[ 7890.986416]  [] ? vfs_copy_file_range+0x8f0/0x8f0
[ 7890.993412]  [] do_syscall_64+0x19d/0x540
[ 7890.999631]  [] entry_SYSCALL64_slow_path+0x25/0x25
[ 7891.006820] 

[ 7891.016322] kasan: CONFIG_KASAN_INLINE enabled
[ 7891.021292] kasan: GPF could be caused by NULL-ptr deref or user memory 
access
[ 7891.029371] general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC 
KASAN
[ 7891.037722] Modules linked in: tun ext4 jbd2 mbcache loop veth 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat 
nf_conntrack br_netfilter bridge stp llc overlay intel_rapl sb_edac edac_core 
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd 
intel_uncore iTCO_wdt iTCO_vendor_support pcspkr i2c_i801 i2c_smbus sg mei_me 
mei lpc_ich shpchp ipmi_ssif mxm_wmi ipmi_si ipmi_msghandler wmi 
acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables 
xfs libcrc32c sr_mod sd_mod cdrom mgag200 i2c_algo_bit drm_kms_helper 
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crc32c_intel drm ixgbe 
serio_raw ahci libahci libata mdio ptp i2c_core pps_core dca fjes dm_mirror 
dm_region_hash dm_log dm_mod
[ 7891.127218] CPU: 81 PID: 32522 Comm: proc01 Tainted: G        W
4.9.0-rc1 #32
[ 7891.135764] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 7891.147124] task: 88081f324900 task.stack: 88080a548000
[ 7891.153731] RIP: 0010:[]  [] 
cmos_procfs+0xb0/0x1e0
[ 7891.162677] RSP: 0018:88080a54f938  EFLAGS: 00010246
[ 7891.168605] RAX: dc00 RBX:  RCX: 
[ 7891.176569] RDX:  RSI: 82e9a500 RDI: ed01014a9f20
[ 7891.184534] RBP: 88080a54f990 R08: 88081f324900 R09: 0007
[ 7891.192499] R10: 88080a54f780 R11: 0006 R12: 0002
[ 7891.200463] R13: 831272e0 R14: 1101014a9f39 R15: 83127d60
[ 7891.208430] FS:  7fe516b93800() GS:880e5680() 
knlGS:
[ 7891.217461] CS:  0010 DS:  ES:  CR0: 80050033
[ 7891.223873] CR2: 7f153a7200a0 CR3: 000e48f8 CR4: 003406e0
[ 7891.231838] DR0:  DR1:  DR2: 
[ 7891.239802] DR3:  DR6: fffe0ff0 DR7: 0400
[ 7891.247765] Stack:
[ 7891.250006]  01230001 8808 1101014a9f39 
d3274681
[ 7891.258299]  d3274681 880e108a9a40 880e108a9a40 
88084745b300

Re: [PATCH] mm: kmemleak: Ensure that the task stack is not freed during scanning

2016-10-12 Thread CAI Qian



- Original Message -
> From: "Catalin Marinas" <catalin.mari...@arm.com>
> To: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org, "Andrew Morton" 
> <a...@linux-foundation.org>, "Andy Lutomirski" <l...@kernel.org>,
> "CAI Qian" <caiq...@redhat.com>
> Sent: Wednesday, October 12, 2016 5:57:03 AM
> Subject: [PATCH] mm: kmemleak: Ensure that the task stack is not freed during 
> scanning
> 
> Commit 68f24b08ee89 ("sched/core: Free the stack early if
> CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed
> during kmemleak_scan() execution, leading to either a NULL pointer
> fault (if task->stack is NULL) or kmemleak accessing already freed
> memory. This patch uses the new try_get_task_stack() API to ensure that
> the task stack is not freed during kmemleak stack scanning.
> 
> Fixes: 68f24b08ee89 ("sched/core: Free the stack early if
> CONFIG_THREAD_INFO_IN_TASK")
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Andy Lutomirski <l...@kernel.org>
> Cc: CAI Qian <caiq...@redhat.com>
> Reported-by: CAI Qian <caiq...@redhat.com>
> Signed-off-by: Catalin Marinas <catalin.mari...@arm.com>

Tested-by: CAI Qian <caiq...@redhat.com>

> ---
> 
> This was reported in a subsequent comment here:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=173901
> 
> However, the original bugzilla entry doesn't look related to task stack
> freeing as it was first reported on 4.8-rc8. Andy, sorry for cc'ing you
> to bugzilla, please feel free to remove your email from the bug above (I
> can't seem to be able to do it).
> 
>  mm/kmemleak.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index a5e453cf05c4..e5355a5b423f 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1453,8 +1453,11 @@ static void kmemleak_scan(void)
>  
>  		read_lock(&tasklist_lock);
>  		do_each_thread(g, p) {
> -			scan_block(task_stack_page(p), task_stack_page(p) +
> -				   THREAD_SIZE, NULL);
> +			void *stack = try_get_task_stack(p);
> +			if (stack) {
> +				scan_block(stack, stack + THREAD_SIZE, NULL);
> +				put_task_stack(p);
> +			}
>  		} while_each_thread(g, p);
>  		read_unlock(&tasklist_lock);
>  	}
>
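
The get/put pattern described in the patch can be modelled in userspace. What follows is a minimal, single-threaded C sketch and not kernel code: `model_task`, `model_task_create()`, `model_try_get_stack()`, `model_put_stack()`, `model_scan_task()` and `MODEL_THREAD_SIZE` are hypothetical stand-ins for the task structure, `try_get_task_stack()`, `put_task_stack()`, the scanner and `THREAD_SIZE`, and a plain integer replaces the kernel's atomic reference count. It only illustrates why the scanner holds a reference while it reads the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_THREAD_SIZE 64    /* hypothetical stand-in for THREAD_SIZE */

/* Hypothetical userspace model of a task whose stack may be freed early. */
struct model_task {
    void *stack;        /* NULL once the stack has been released */
    int stack_refs;     /* plain int here; the kernel uses an atomic count */
};

/* Create a task that owns one reference to a zeroed stack. */
static struct model_task model_task_create(void)
{
    struct model_task t;

    t.stack = calloc(1, MODEL_THREAD_SIZE);
    t.stack_refs = t.stack ? 1 : 0;
    return t;
}

/* Take a reference on the stack; returns NULL if it is already gone. */
static void *model_try_get_stack(struct model_task *t)
{
    if (t->stack_refs == 0)
        return NULL;
    t->stack_refs++;
    return t->stack;
}

/* Drop a reference; the last put frees the stack. */
static void model_put_stack(struct model_task *t)
{
    assert(t->stack_refs > 0);
    if (--t->stack_refs == 0) {
        free(t->stack);
        t->stack = NULL;
    }
}

/*
 * Scan the stack only while holding a reference.
 * Returns the number of non-zero bytes, or -1 if the stack is gone.
 */
static int model_scan_task(struct model_task *t)
{
    unsigned char *stack = model_try_get_stack(t);
    int hits = 0;

    if (!stack)
        return -1;
    for (size_t i = 0; i < MODEL_THREAD_SIZE; i++)
        hits += stack[i] != 0;
    model_put_stack(t);
    return hits;
}
```

In this model, a scan that runs after the task has dropped its own reference gets -1 back instead of touching freed memory, which mirrors the `if (stack)` guard in the patch.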

Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

2016-10-10 Thread CAI Qian


> Is the backtrace the same in that case?
Very close. I saw "intel" there; here is the list of those modules on the
system.

# lsmod | grep intel
intel_rapl             20480  0
intel_powerclamp       16384  0
kvm_intel             208896  0
kvm                   630784  1 kvm_intel
ghash_clmulni_intel    16384  0
aesni_intel           167936  0
lrw                    16384  1 aesni_intel
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
cryptd                 24576  3 ablk_helper,ghash_clmulni_intel,aesni_intel
crc32c_intel           24576  1

[   17.884926] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[   17.893700] IP: [] device_del+0x17/0x280
[   17.899848] PGD 0 
[   17.902109] Oops:  [#1] PREEMPT SMP
[   17.906394] Modules linked in:
[   17.909823] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.8.0-remove-nokasan+ 
#5
[   17.917985] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   17.929347] task: 8810556c8000 task.stack: c9078000
[   17.935955] RIP: 0010:[]  [] 
device_del+0x17/0x280
[   17.944811] RSP: :c907bc00  EFLAGS: 00010286
[   17.950742] RAX:  RBX: 88085c8e3c00 RCX: 0001
[   17.958708] RDX: 881059d6 RSI: 000b RDI: 
[   17.966675] RBP: c907bc38 R08: d38c0f63 R09: 
[   17.974640] R10:  R11:  R12: 
[   17.982606] R13: 881054099000 R14: 0001 R15: 
[   17.990574] FS:  () GS:88105e40() 
knlGS:
[   17.999606] CS:  0010 DS:  ES:  CR0: 80050033
[   18.006022] CR2:  CR3: 01c06000 CR4: 003406e0
[   18.013989] DR0:  DR1:  DR2: 
[   18.021954] DR3:  DR6: fffe0ff0 DR7: 0400
[   18.029919] Stack:
[   18.032163]   dd652bd0 88085c8e3c00 
88085c8e3c00
[   18.040475]  88085c8e3400 881054099000 0001 
c907bc58
[   18.048788]  811c9680 88085c8e3c00 88085c8e3400 
c907bc88
[   18.057090] Call Trace:
[   18.059819]  [] perf_pmu_unregister+0x90/0x150
[   18.066529]  [] uncore_pci_remove+0xc8/0x160
[   18.073044]  [] pci_device_remove+0x39/0xc0
[   18.079468]  [] driver_probe_device+0xbe/0x4d0
[   18.086176]  [] __driver_attach+0xe3/0xf0
[   18.092399]  [] ? driver_probe_device+0x4d0/0x4d0
[   18.099400]  [] bus_for_each_dev+0x73/0xc0
[   18.105722]  [] driver_attach+0x1e/0x20
[   18.111752]  [] bus_add_driver+0x200/0x270
[   18.118078]  [] driver_register+0x60/0xe0
[   18.124303]  [] __pci_register_driver+0x60/0x70
[   18.131117]  [] intel_uncore_init+0x277/0x2df
[   18.137728]  [] ? uncore_type_init+0x15f/0x15f
[   18.11]  [] do_one_initcall+0x50/0x190
[   18.150768]  [] ? parse_args+0x2d1/0x490
[   18.156894]  [] kernel_init_freeable+0x1ff/0x29e
[   18.163801]  [] ? rest_init+0x140/0x140
[   18.169831]  [] kernel_init+0xe/0x100
[   18.175668]  [] ret_from_fork+0x2a/0x40
[   18.181695] Code: e8 cf d4 29 00 5b 5d c3 66 90 66 2e 0f 1f 84 00 00 00 00 
00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fc 48 83 ec 18 <4c> 8b 
2f 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 87 
[   18.203631] RIP  [] device_del+0x17/0x280
[   18.209867]  RSP 
[   18.213759] CR2: 
[   18.217548] ---[ end trace 91188545987fc9d9 ]---
[   18.222706] Kernel panic - not syncing: Fatal exception
[   18.228692] ---[ end Kernel panic - not syncing: Fatal exception
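
For readers unfamiliar with the option: CONFIG_DEBUG_TEST_DRIVER_REMOVE makes the driver core call probe, then remove, then probe again for each driver, so a remove path that assumes fully initialised state can fault at boot. The sketch below is a userspace C model of that sequence under stated assumptions; `model_pmu`, `model_child`, `model_probe()`, `model_remove()` and `model_test_driver_remove()` are all hypothetical names, and the NULL `dev` pointer is only one reading that is consistent with the NULL dereference in `device_del()` above, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child object that may not exist yet early in boot. */
struct model_child {
    int registered;
};

/* Hypothetical driver state; dev may legitimately be NULL. */
struct model_pmu {
    struct model_child *dev;
    bool active;
};

static int model_probe(struct model_pmu *p, struct model_child *child_or_null)
{
    assert(p != NULL);
    p->dev = child_or_null;
    if (p->dev)
        p->dev->registered = 1;
    p->active = true;
    return 0;
}

/* A remove hook that tolerates a missing child instead of dereferencing it. */
static void model_remove(struct model_pmu *p)
{
    assert(p != NULL);
    if (p->dev) {
        p->dev->registered = 0;
        p->dev = NULL;
    }
    p->active = false;
}

/* The probe, remove, probe sequence exercised by the test option. */
static int model_test_driver_remove(struct model_pmu *p, struct model_child *c)
{
    int ret = model_probe(p, c);

    if (ret)
        return ret;
    model_remove(p);
    return model_probe(p, c);
}
```

Without the `if (p->dev)` guard, the first remove in this model would dereference NULL whenever the child had not been created yet.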

Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

2016-10-10 Thread CAI Qian



- Original Message -
> From: "Rob Herring" <r...@kernel.org>
> To: "CAI Qian" <caiq...@redhat.com>
> Cc: "linux-kernel" <linux-kernel@vger.kernel.org>, "Greg Kroah-Hartman" 
> <gre...@linuxfoundation.org>
> Sent: Monday, October 10, 2016 1:09:43 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 10:37 AM, CAI Qian <caiq...@redhat.com> wrote:
> > Not sure if anyone reported this before. With this kernel config, it is
> > 100% kernel panic so far with today's
> > mainline master HEAD.
> 
> Looks like it is catching what it is supposed to. Though looking
> through the code, I haven't found where the problem is. Does bind and
> unbind for this normally work?
I am not sure. It just panics at boot. If you can tell me which debugging
steps you want run, I can help test it out.
   CAI Qian

Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

2016-10-10 Thread CAI Qian



- Original Message -
> From: "Rob Herring" <r...@kernel.org>
> To: "Greg Kroah-Hartman" <gre...@linuxfoundation.org>
> Cc: "CAI Qian" <caiq...@redhat.com>, "linux-kernel" 
> <linux-kernel@vger.kernel.org>
> Sent: Monday, October 10, 2016 2:15:29 PM
> Subject: Re: kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
> 
> On Mon, Oct 10, 2016 at 12:20 PM, Greg Kroah-Hartman
> <gre...@linuxfoundation.org> wrote:
> > On Mon, Oct 10, 2016 at 11:37:27AM -0400, CAI Qian wrote:
> >> Not sure if anyone reported this before. With this kernel config, it is
> >> 100% kernel panic so far with today's
> >> mainline master HEAD.
> >>
> >> http://people.redhat.com/qcai/tmp/config-kasan-remove
> >
> > Oh it breaks things with kasan disabled as well :)
> >
> > See Laszlo's bug report already a few hours ago, Rob is on it...
> 
> I think this one is different though. It has a remove() hook.
FYI, this can also be reproduced without kasan.
CAI Qian

KASAN (inline) + CONFIG_KPROBES_SANITY_TEST failures and kernel panic

2016-10-10 Thread CAI Qian

It usually reports failures when KASAN (inline) and
CONFIG_KPROBES_SANITY_TEST are enabled on today's mainline HEAD.
Occasionally the kernel panics, with the trace at the bottom.

[   52.973247] Kprobe smoke test: started
[   53.078585] ==================================================================
[   53.08] BUG: KASAN: stack-out-of-bounds in 
setjmp_pre_handler+0x17c/0x280 at addr 88085259fba8
[   53.097060] Read of size 64 by task swapper/0/1
[   53.102125] page:ea00214967c0 count:0 mapcount:0 mapping:          
(null) index:0x0
[   53.111073] flags: 0x2f8000()
[   53.115163] page dumped because: kasan: bad access detected
[   53.121392] CPU: 87 PID: 1 Comm: swapper/0 Not tainted 4.8.0+ #3
[   53.128103] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   53.139468]  88085259f8d8 81a6a6e1 88085259f970 
88085259fba8
[   53.147779]  88085259f960 816322e3 88085259f9a0 
0046
[   53.156090]  019e2c79 0092 0246 
88085259f920
[   53.164415] Call Trace:
[   53.167161]  [] dump_stack+0x85/0xc4
[   53.172906]  [] kasan_report_error+0x4c3/0x4f0
[   53.179624]  [] ? __this_cpu_preempt_check+0x13/0x20
[   53.186916]  [] kasan_report+0x58/0x60
[   53.192854]  [] ? setjmp_pre_handler+0x17c/0x280
[   53.199763]  [] check_memory_region+0x13e/0x1a0
[   53.206573]  [] memcpy+0x23/0x50
[   53.211937]  [] setjmp_pre_handler+0x17c/0x280
[   53.218656]  [] ? kprobe_target+0x1/0x20
[   53.224787]  [] ? kprobe_target+0x1/0x20
[   53.230917]  [] kprobe_ftrace_handler+0x1cb/0x300
[   53.237919]  [] ? kprobe_target+0x5/0x20
[   53.244060]  [] ? 
stop_machine_from_inactive_cpu+0x250/0x250
[   53.252141]  [] ftrace_ops_assist_func+0x259/0x3b0
[   53.259240]  [] 0xa0d5
[   53.264804]  [] ? kprobe_target+0x1/0x20
[   53.270938]  [] kprobe_target+0x5/0x20
[   53.276875]  [] init_test_probes+0x1e0/0x5d0
[   53.283395]  [] ? kprobe_target+0x5/0x20
[   53.289525]  [] ? init_test_probes+0x1e0/0x5d0
[   53.296245]  [] ? j_kprobe_target+0x40/0x40
[   53.302676]  [] init_kprobes+0x3f8/0x43d
[   53.308807]  [] ? debugfs_kprobe_init+0x12f/0x12f
[   53.315811]  [] ? debug_mutex_init+0x2d/0x60
[   53.322330]  [] ? __mutex_init+0xcf/0x100
[   53.328559]  [] ? audit_fsnotify_init+0x3a/0x3a
[   53.335362]  [] ? fsnotify_alloc_group+0x185/0x250
[   53.342454]  [] ? debugfs_kprobe_init+0x12f/0x12f
[   53.349458]  [] do_one_initcall+0xa9/0x240
[   53.355783]  [] ? initcall_blacklisted+0x180/0x180
[   53.362883]  [] ? parse_args+0x520/0x990
[   53.369016]  [] ? 
__usermodehelper_set_disable_depth+0x42/0x50
[   53.377284]  [] kernel_init_freeable+0x540/0x610
[   53.384188]  [] ? start_kernel+0x70d/0x70d
[   53.390514]  [] ? _raw_spin_unlock_irq+0x3d/0x60
[   53.397411]  [] ? finish_task_switch+0x189/0x6c0
[   53.404317]  [] ? finish_task_switch+0x15b/0x6c0
[   53.411227]  [] ? rest_init+0x160/0x160
[   53.417262]  [] kernel_init+0x13/0x120
[   53.423196]  [] ? rest_init+0x160/0x160
[   53.429229]  [] ret_from_fork+0x2a/0x40
[   53.435260] Memory state around the buggy address:
[   53.440616]  88085259fa80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[   53.448675]  88085259fb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[   53.456741] >88085259fb80: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f4 f4 f3 
f3
[   53.464808]                                      ^
[   53.470159]  88085259fc00: f3 f3 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 
f4
[   53.478226]  88085259fc80: f4 f2 f2 f2 f2 00 f4 f4 f4 f3 f3 f3 f3 00 00 
00
[   53.486291] ==================================================================
[   53.494355] Disabling lock debugging due to kernel taint
[   53.500374] ==================================================================
[   53.508449] BUG: KASAN: stack-out-of-bounds in 
longjmp_break_handler+0x1df/0x2a0 at addr 88085259fba8
[   53.519134] Write of size 64 by task swapper/0/1
[   53.524294] page:ea00214967c0 count:0 mapcount:0 mapping:          
(null) index:0x0
[   53.533245] flags: 0x2f8000()
[   53.537333] page dumped because: kasan: bad access detected
[   53.543560] CPU: 87 PID: 1 Comm: swapper/0 Tainted: G    B           4.8.0+ 
#3
[   53.551627] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   53.562987]  880e5eecfd98 81a6a6e1 880e5eecfe30 
88085259fba8
[   53.571291]  880e5eecfe20 816322e3  

[   53.579588]   0092  

[   53.587899] Call Trace:
[   53.590635]  <#DB>  [] dump_stack+0x85/0xc4
[   53.597084]  [] kasan_report_error+0x4c3/0x4f0
[   53.603797]  [] kasan_report+0x58/0x60
[   53.609733]  [] ? longjmp_break_handler+0x1df/0x2a0
[   53.616932]  [] check_memory_region+0x13e/0x1a0
[   53.623732]  [] memcpy+0x37/0x50
[   53.629085]  []

KASAN (inline) + CONFIG_KPROBES_SANITY_TEST failures and kernel panic

2016-10-10 Thread CAI Qian

With KASAN (inline) and CONFIG_KPROBES_SANITY_TEST enabled, today's
mainline HEAD usually reports the failures below. Occasionally the kernel
also panics, with the trace at the bottom.

[   52.973247] Kprobe smoke test: started
[   53.078585] 
==
[   53.08] BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0x17c/0x280 at addr 88085259fba8
[   53.097060] Read of size 64 by task swapper/0/1
[   53.102125] page:ea00214967c0 count:0 mapcount:0 mapping:          (null) index:0x0
[   53.111073] flags: 0x2f8000()
[   53.115163] page dumped because: kasan: bad access detected
[   53.121392] CPU: 87 PID: 1 Comm: swapper/0 Not tainted 4.8.0+ #3
[   53.128103] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   53.139468]  88085259f8d8 81a6a6e1 88085259f970 
88085259fba8
[   53.147779]  88085259f960 816322e3 88085259f9a0 
0046
[   53.156090]  019e2c79 0092 0246 
88085259f920
[   53.164415] Call Trace:
[   53.167161]  [] dump_stack+0x85/0xc4
[   53.172906]  [] kasan_report_error+0x4c3/0x4f0
[   53.179624]  [] ? __this_cpu_preempt_check+0x13/0x20
[   53.186916]  [] kasan_report+0x58/0x60
[   53.192854]  [] ? setjmp_pre_handler+0x17c/0x280
[   53.199763]  [] check_memory_region+0x13e/0x1a0
[   53.206573]  [] memcpy+0x23/0x50
[   53.211937]  [] setjmp_pre_handler+0x17c/0x280
[   53.218656]  [] ? kprobe_target+0x1/0x20
[   53.224787]  [] ? kprobe_target+0x1/0x20
[   53.230917]  [] kprobe_ftrace_handler+0x1cb/0x300
[   53.237919]  [] ? kprobe_target+0x5/0x20
[   53.244060]  [] ? 
stop_machine_from_inactive_cpu+0x250/0x250
[   53.252141]  [] ftrace_ops_assist_func+0x259/0x3b0
[   53.259240]  [] 0xa0d5
[   53.264804]  [] ? kprobe_target+0x1/0x20
[   53.270938]  [] kprobe_target+0x5/0x20
[   53.276875]  [] init_test_probes+0x1e0/0x5d0
[   53.283395]  [] ? kprobe_target+0x5/0x20
[   53.289525]  [] ? init_test_probes+0x1e0/0x5d0
[   53.296245]  [] ? j_kprobe_target+0x40/0x40
[   53.302676]  [] init_kprobes+0x3f8/0x43d
[   53.308807]  [] ? debugfs_kprobe_init+0x12f/0x12f
[   53.315811]  [] ? debug_mutex_init+0x2d/0x60
[   53.322330]  [] ? __mutex_init+0xcf/0x100
[   53.328559]  [] ? audit_fsnotify_init+0x3a/0x3a
[   53.335362]  [] ? fsnotify_alloc_group+0x185/0x250
[   53.342454]  [] ? debugfs_kprobe_init+0x12f/0x12f
[   53.349458]  [] do_one_initcall+0xa9/0x240
[   53.355783]  [] ? initcall_blacklisted+0x180/0x180
[   53.362883]  [] ? parse_args+0x520/0x990
[   53.369016]  [] ? 
__usermodehelper_set_disable_depth+0x42/0x50
[   53.377284]  [] kernel_init_freeable+0x540/0x610
[   53.384188]  [] ? start_kernel+0x70d/0x70d
[   53.390514]  [] ? _raw_spin_unlock_irq+0x3d/0x60
[   53.397411]  [] ? finish_task_switch+0x189/0x6c0
[   53.404317]  [] ? finish_task_switch+0x15b/0x6c0
[   53.411227]  [] ? rest_init+0x160/0x160
[   53.417262]  [] kernel_init+0x13/0x120
[   53.423196]  [] ? rest_init+0x160/0x160
[   53.429229]  [] ret_from_fork+0x2a/0x40
[   53.435260] Memory state around the buggy address:
[   53.440616]  88085259fa80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   53.448675]  88085259fb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   53.456741] >88085259fb80: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f4 f4 f3 f3
[   53.464808]                                      ^
[   53.470159]  88085259fc00: f3 f3 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4
[   53.478226]  88085259fc80: f4 f2 f2 f2 f2 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[   53.486291] 
==
[   53.494355] Disabling lock debugging due to kernel taint
[   53.500374] 
==
[   53.508449] BUG: KASAN: stack-out-of-bounds in longjmp_break_handler+0x1df/0x2a0 at addr 88085259fba8
[   53.519134] Write of size 64 by task swapper/0/1
[   53.524294] page:ea00214967c0 count:0 mapcount:0 mapping:          (null) index:0x0
[   53.533245] flags: 0x2f8000()
[   53.537333] page dumped because: kasan: bad access detected
[   53.543560] CPU: 87 PID: 1 Comm: swapper/0 Tainted: G    B           4.8.0+ #3
[   53.551627] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   53.562987]  880e5eecfd98 81a6a6e1 880e5eecfe30 
88085259fba8
[   53.571291]  880e5eecfe20 816322e3  

[   53.579588]   0092  

[   53.587899] Call Trace:
[   53.590635]  <#DB>  [] dump_stack+0x85/0xc4
[   53.597084]  [] kasan_report_error+0x4c3/0x4f0
[   53.603797]  [] kasan_report+0x58/0x60
[   53.609733]  [] ? longjmp_break_handler+0x1df/0x2a0
[   53.616932]  [] check_memory_region+0x13e/0x1a0
[   53.623732]  [] memcpy+0x37/0x50
[   53.629085]  []
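
For readers decoding the "Memory state around the buggy address" dump in the
report above: each shadow byte describes 8 bytes of memory. The sketch below
is only an illustration, not part of the original report. It assumes the
stack redzone values (0xF1 to 0xF4) defined in mm/kasan/kasan.h of kernels
from that period, and the helper names are hypothetical. It decodes the row
marked with '>' and finds the first poisoned shadow byte touched by a 64-byte
access that starts 0x28 bytes into that row (address ...fba8 in row ...fb80).

```python
# Shadow-byte meanings; assumed from mm/kasan/kasan.h in 4.8-era kernels.
STACK_REDZONES = {
    0xF1: "stack left redzone",
    0xF2: "stack mid redzone",
    0xF3: "stack right redzone",
    0xF4: "stack partial redzone",
}


def describe_shadow_byte(value):
    """Return a human-readable meaning for one KASAN shadow byte."""
    if value == 0x00:
        return "all 8 bytes accessible"
    if 1 <= value <= 7:
        return "first %d bytes accessible" % value
    return STACK_REDZONES.get(value, "other poison value 0x%02x" % value)


def first_poisoned_index(hex_bytes, start_offset, size):
    """Index of the first non-zero shadow byte touched by an access, or None.

    Simplification: partially accessible granules (values 1..7) are also
    treated as poisoned here.
    """
    values = [int(tok, 16) for tok in hex_bytes.split()]
    first = start_offset // 8
    last = (start_offset + size - 1) // 8
    for index in range(first, min(last, len(values) - 1) + 1):
        if values[index] != 0:
            return index
    return None


# The row marked with '>' in the report (16 shadow bytes cover 128 bytes).
row = "00 00 00 00 00 00 f1 f1 f1 f1 00 00 f4 f4 f3 f3"

for index, token in enumerate(row.split()):
    print("shadow[%2d] covers +0x%02x..+0x%02x: %s"
          % (index, index * 8, index * 8 + 7,
             describe_shadow_byte(int(token, 16))))

bad = first_poisoned_index(row, 0x28, 64)
print("first poisoned shadow byte touched by the access: index %s" % bad)
```

Under these assumptions the access starts in an accessible granule and then
runs into a stack left redzone, which matches the "stack-out-of-bounds"
classification in the report.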

kasan inline + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic

2016-10-10 Thread CAI Qian

Not sure if anyone has reported this before. With this kernel config, the
kernel has panicked on every boot so far with today's mainline master HEAD.

http://people.redhat.com/qcai/tmp/config-kasan-remove

[   36.318420] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   36.325626] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at 
[880071c7d000-880075c7cfff]
[   36.339108] Intel CQM monitoring enabled
[   36.343507] Intel MBM enabled
[   36.358713] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms 
ovfl timer
[   36.367563] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[   36.373984] RAPL PMU: hw unit of domain package 2^-14 Joules
[   36.380308] RAPL PMU: hw unit of domain dram 2^-14 Joules
[   36.386337] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[   36.410064] kasan: CONFIG_KASAN_INLINE enabled
[   36.415042] kasan: GPF could be caused by NULL-ptr deref or user memory 
access
[   36.423111] general protection fault:  [#1] PREEMPT SMP KASAN
[   36.429911] Modules linked in:
[   36.41] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 4.8.0remove+ #4
[   36.440616] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[   36.451974] task: 880e524d task.stack: 88085288
[   36.458578] RIP: 0010:[]  [] 
device_del+0x80/0x700
[   36.467431] RSP: :880852887938  EFLAGS: 00010246
[   36.473357] RAX:  RBX:  RCX: 110109e6f101
[   36.481319] RDX: dc00 RSI: 000b RDI: 
[   36.489281] RBP: 8808528879e8 R08: 0001 R09: 
[   36.497243] R10:  R11:  R12: 880e501b4b00
[   36.505208] R13: 880e31988480 R14: 0001 R15: 880e31988480
[   36.513171] FS:  () GS:88085ec8() 
knlGS:
[   36.522201] CS:  0010 DS:  ES:  CR0: 80050033
[   36.528613] CR2:  CR3: 02e0a000 CR4: 003406e0
[   36.536576] DR0:  DR1:  DR2: 
[   36.544537] DR3:  DR6: fffe0ff0 DR7: 0400
[   36.552499] Stack:
[   36.554742]  11010a510f28 11010a510f2c 82d3abe4 
81a6d060
[   36.563037]  0296 41b58ab3 82d48cc5 
81ea0840
[   36.571329]  828a3040 8808 880852887980 
82f0ba20
[   36.579624] Call Trace:
[   36.582355]  [] ? idr_mark_full+0xc0/0xc0
[   36.588573]  [] ? cleanup_glue_dir+0xe0/0xe0
[   36.595086]  [] perf_pmu_unregister+0x18d/0x530
[   36.601890]  [] ? _raw_spin_unlock+0x31/0x50
[   36.608393]  [] ? uncore_pcibus_to_physid+0x10e/0x1c0
[   36.615766]  [] uncore_pci_remove+0x24e/0x440
[   36.622375]  [] pci_device_remove+0xa2/0x1e0
[   36.62]  [] driver_probe_device+0x171/0xd50
[   36.635688]  [] ? driver_probe_device+0xd50/0xd50
[   36.642685]  [] __driver_attach+0x199/0x1e0
[   36.649097]  [] bus_for_each_dev+0x126/0x1e0
[   36.655607]  [] ? subsys_dev_iter_exit+0x10/0x10
[   36.662508]  [] ? preempt_count_sub+0x5e/0xe0
[   36.669105]  [] driver_attach+0x3d/0x50
[   36.675129]  [] bus_add_driver+0x554/0x790
[   36.681444]  [] driver_register+0x18c/0x3b0
[   36.687861]  [] ? __raw_spin_lock_init+0x32/0x100
[   36.694854]  [] __pci_register_driver+0x13a/0x1e0
[   36.701853]  [] intel_uncore_init+0x465/0x54f
[   36.708459]  [] ? uncore_type_init+0x4d6/0x4d6
[   36.715165]  [] do_one_initcall+0xa9/0x240
[   36.721473]  [] ? initcall_blacklisted+0x180/0x180
[   36.728568]  [] ? parse_args+0x520/0x990
[   36.734692]  [] ? 
__usermodehelper_set_disable_depth+0x42/0x50
[   36.742948]  [] kernel_init_freeable+0x540/0x610
[   36.749845]  [] ? start_kernel+0x70d/0x70d
[   36.756161]  [] ? _raw_spin_unlock_irq+0x3d/0x60
[   36.763060]  [] ? finish_task_switch+0x189/0x6c0
[   36.769957]  [] ? finish_task_switch+0x15b/0x6c0
[   36.776857]  [] ? rest_init+0x160/0x160
[   36.782875]  [] kernel_init+0x13/0x120
[   36.788802]  [] ? rest_init+0x160/0x160
[   36.794826]  [] ret_from_fork+0x2a/0x40
[   36.800851] Code: 81 c7 00 f1 f1 f1 f1 c7 40 04 00 07 f4 f4 c7 40 08 f3 f3 
f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 89 f8 48 c1 e8 03 <80> 3c 
10 00 0f 85 1a 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48 
[   36.822549] RIP  [] device_del+0x80/0x700
[   36.828778]  RSP 
[   36.832743] ---[ end trace f3cec3a0c6cb2258 ]---
[   36.838054] Kernel panic - not syncing: Fatal exception
[   36.843967] ---[ end Kernel panic - not syncing: Fatal exception
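
The hint "GPF could be caused by NULL-ptr deref or user memory access" in the
log above follows from how inline KASAN checks work: before each access the
compiler emits a load from the shadow byte at (addr >> 3) + offset. The
sketch below is an illustration only. It assumes the usual x86_64 (4-level
paging) shadow offset 0xdffffc0000000000, which is not legible in the
register dump, and the example direct-map address is hypothetical. It shows
that the shadow address of a NULL pointer is non-canonical, so the check
itself faults with a general protection fault rather than a page fault.

```python
# Assumed constants for x86_64 with 4-level paging (not taken from the log).
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
KASAN_SHADOW_SCALE_SHIFT = 3
MASK64 = (1 << 64) - 1


def shadow_address(addr):
    """Address of the shadow byte that describes the 8-byte granule at addr."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


examples = (
    ("NULL pointer", 0x0),
    ("hypothetical direct-map address", 0xFFFF880000000000),
)
for label, addr in examples:
    shadow = shadow_address(addr)
    kind = "canonical" if is_canonical(shadow) else "non-canonical"
    print("%s: shadow byte at 0x%016x (%s)" % (label, shadow, kind))
```

The Code: line in the oops appears consistent with this pattern: a shift
right by 3 (48 c1 e8 03) immediately before the faulting byte compare
(80 3c 10 00).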

Re: [RFC] autokdump - automated kdump testsuite

2014-09-22 Thread CAI Qian



- Original Message -
> From: "Vivek Goyal"
> To: "CAI Qian"
> Cc: "linux-kernel", "ltp-list", "crash-utility", "kexec",
> "kexec kdump redhat mailing list"
> Sent: Friday, September 19, 2014 9:22:36 PM
> Subject: Re: [RFC] autokdump - automated kdump testsuite
> 
> On Fri, Sep 19, 2014 at 05:52:25AM -0400, CAI Qian wrote:
> > I plan to release an automated kdump testsuite that will be
> 
> So will this be a standalone test suite? Can it be merged with
> something already existing, say LTP?
Yes, it is likely to be standalone. It won't make use of the LTP
API, and the LTP kdump test suite is outdated, so there is no
benefit in continuing to work there.
> 
> > focus on testing kernel and the crash utility. It should work
> > for all major distros since it will use none of distro-specific
> > stuff, and also support different arches including x86, ARM,
> > PPC64 and s390x.
> > 
> > It does the following:
> > 1) check if there is a memory reserved for kdump. If not,
> >    reserve the memory and reboot the system.
> > 2) once the system is back, load kexec on panic and
> >    prepare a separate initramfs that including needed
> >    modules to load a local filesystem and necessary utilities
> 
> So you will write logic to prepare custom initramfs or will rely
> on dracut or some other utility for that.
I'll probably prepare a custom initramfs for the sake of simplicity.
> 
> >    in order to analyse /proc/vmcore in the 2nd kernel.
> > 3) trigger the system crash using methods like sysrq-c, NMI,
> >    and panic_on_hung_task etc.
> > 4) in the 2nd kernel, mount a filesystem and use the crash
> >    utility to analyse /proc/vmcore. Then, gather the analyse
> >    logs, serial console output, dmesg etc into the filesystem.
> 
> Why not save the core, boot back into the first kernel, and then analyze?
> 
> Trying to work directly with /proc/vmcore does not test makedumpfile,
> which everybody uses. Also it will require more memory to be reserved
> and packing crash and debug vmlinux into initramfs.
The additional memory for vmlinux and the crash utility is predictable
and manageable, so the test can simply ask for 256M of memory to be
reserved before running. On the other hand, it is usually not feasible to
require that the systems under test have more available disk space than
their memory size.
   CAI Qian
> 
> I think being able to test makedumpfile also is the key here.
> 
> Thanks
> Vivek
> > 5) reboot back into the 1st kernel.
> > 
> > implementation:
> > It will setup a daemon to handle reboots.
> > 
> > plan:
> > I might also test makedumpfile along with it later.
> >    CAI Qian
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] autokdump - automated kdump testsuite

2014-09-22 Thread CAI Qian

- Original Message -
> From: "Vivek Goyal" <vgo...@redhat.com>
> To: "CAI Qian" <caiq...@redhat.com>
> Cc: "ltp-list", "kexec kdump redhat mailing list" <kexec-kdump-l...@redhat.com>,
> "kexec" <ke...@lists.infradead.org>, "linux-kernel", "crash-utility"
> Sent: Monday, September 22, 2014 10:47:13 PM
> Subject: Re: [RFC] autokdump - automated kdump testsuite
> 
> On Mon, Sep 22, 2014 at 09:00:00AM -0400, CAI Qian wrote:
> > 
> > - Original Message -
> > > From: "Vivek Goyal" <vgo...@redhat.com>
> > > To: "CAI Qian" <caiq...@redhat.com>
> > > Cc: "linux-kernel", "ltp-list", "crash-utility",
> > > "kexec" <ke...@lists.infradead.org>,
> > > "kexec kdump redhat mailing list" <kexec-kdump-l...@redhat.com>
> > > Sent: Friday, September 19, 2014 9:22:36 PM
> > > Subject: Re: [RFC] autokdump - automated kdump testsuite
> > > 
> > > On Fri, Sep 19, 2014 at 05:52:25AM -0400, CAI Qian wrote:
> > > > I plan to release an automated kdump testsuite that will be
> > > 
> > > So will this be a standalone test suite? Can it be merged with
> > > something already existing, say LTP?
> > Yes, it is likely to be standalone. It won't make use of the LTP
> > API, and the LTP kdump test suite is outdated, so there is no
> > benefit in continuing to work there.
> 
> So why make it standalone and not replace the old LTP kdump test suite
> with this new one?
It will be a complete rewrite from scratch, and will have no direct
relationship with the rest of LTP.
> 
> > > > focus on testing kernel and the crash utility. It should work
> > > > for all major distros since it will use none of distro-specific
> > > > stuff, and also support different arches including x86, ARM,
> > > > PPC64 and s390x.
> > > > 
> > > > It does the following:
> > > > 1) check if there is a memory reserved for kdump. If not,
> > > >    reserve the memory and reboot the system.
> > > > 2) once the system is back, load kexec on panic and
> > > >    prepare a separate initramfs that including needed
> > > >    modules to load a local filesystem and necessary utilities
> > > 
> > > So you will write logic to prepare custom initramfs or will rely
> > > on dracut or some other utility for that.
> > I'll probably prepare a custom initramfs for the sake of simplicity.
> 
> Well, preparing custom initramfs will become very tricky. We used
> to do that and finally we switched to dracut.
> 
> Why not simply let the respective service on the host do this job and
> test only makes sure that kdump service is running. It feels little
> out of place that a test is generating custom initramfs.
Because not every distro will have a kdump service like Fedora.
> 
> > > >    in order to analyse /proc/vmcore in the 2nd kernel.
> > > > 3) trigger the system crash using methods like sysrq-c, NMI,
> > > >    and panic_on_hung_task etc.
> > > > 4) in the 2nd kernel, mount a filesystem and use the crash
> > > >    utility to analyse /proc/vmcore. Then, gather the analyse
> > > >    logs, serial console output, dmesg etc into the filesystem.
> > > 
> > > Why not save core and boot back in first kernel and then analyze.
> > > 
> > > Trying to work directly with /proc/vmcore does not test makedumpfile
> > > which everybody uses. Also it will require more memory to be reserved
> > > and packing crash and debug vmlinux into initramfs.
> > The additional memory for vmlinux and the crash utility is predictable
> > and manageable, so it can just ask 256M memory reserved before running
> > the program. On the other hand, it is not usually feasible to ask
> > the systems under testing has enough available disk spaces bigger than
> > the memory size.
> 
> makedumpfile will reduce the vmcore file size to few hundreds of mega
> bytes on most of the systems. Especially, this is just a test, so
> system will be lightly loaded and vmcore will be small after filtering.
It will probably also have test cases that load the memory heavily before
dumping.
> 
> If there is not enough space, test fails, period. I don't think there
> is any need to try to circumvent that and try to run crash in initramfs.
> And in the process we don't test makedumpfile which is very important
> component of this whole process.
Testing makedumpfile is planned. Tests failing for lack of disk space is a
testsuite design problem, and it is especially problematic on the
large-memory systems that we are seeing more and more these days.
> 
> IMHO, just rely on systemctl start kdump to generate and load custom
> initramfs and save filtered vmcore to root fs by default and analyze
> vmcore post reboot. That will keep things simple.
Again, not every major distro has that.
   CAI Qian
> 
> Thanks
> Vivek
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

[RFC] autokdump - automated kdump testsuite

2014-09-19 Thread CAI Qian

I plan to release an automated kdump testsuite that will
focus on testing the kernel and the crash utility. It should work
on all major distros, since it will not use anything distro-specific,
and it should also support different arches including x86, ARM,
PPC64 and s390x.

It does the following:
1) check whether memory is reserved for kdump. If not,
   reserve the memory and reboot the system.
2) once the system is back, load kexec on panic and
   prepare a separate initramfs that includes the modules
   needed to mount a local filesystem and the utilities needed
   to analyse /proc/vmcore in the 2nd kernel.
3) trigger a system crash using methods such as sysrq-c, NMI,
   and panic_on_hung_task.
4) in the 2nd kernel, mount a filesystem and use the crash
   utility to analyse /proc/vmcore. Then, save the analysis
   logs, serial console output, dmesg, etc. to the filesystem.
5) reboot back into the 1st kernel.

implementation:
It will set up a daemon to handle reboots.

plan:
I might also test makedumpfile along with it later.
   CAI Qian
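
The decision logic behind steps 1 to 3 of the proposal can be sketched as
follows. This is only an illustration under stated assumptions, not the
proposed testsuite: the helper names are hypothetical, and it assumes the
standard Linux interfaces /proc/cmdline, /sys/kernel/kexec_crash_size and
/sys/kernel/kexec_crash_loaded are available. It only reports which step
would come next and never triggers a crash itself.

```python
def has_crashkernel_param(cmdline):
    """True if the kernel command line carries a crashkernel= reservation."""
    return any(arg.startswith("crashkernel=") for arg in cmdline.split())


def next_step(cmdline, crash_size, crash_loaded):
    """Map the observed state to a step of the proposal (hypothetical helper)."""
    if not has_crashkernel_param(cmdline) or crash_size == 0:
        return "step 1: reserve memory (add crashkernel=) and reboot"
    if not crash_loaded:
        return "step 2: build the initramfs and load the kexec-on-panic kernel"
    return "step 3: trigger a crash, e.g. echo c > /proc/sysrq-trigger"


def read_text(path, default=""):
    """Read a small procfs/sysfs file, returning default if it is unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return default


def observe():
    """Collect the current kdump-related state of the running system."""
    cmdline = read_text("/proc/cmdline")
    crash_size = int(read_text("/sys/kernel/kexec_crash_size", "0") or "0")
    crash_loaded = read_text("/sys/kernel/kexec_crash_loaded", "0") == "1"
    return cmdline, crash_size, crash_loaded


if __name__ == "__main__":
    print(next_step(*observe()))
```

Keeping the decision in a pure function makes it testable without root
access; a reboot-handling daemon such as the one mentioned under
"implementation" could call it after every boot.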

Re: [ 00/19] 3.10.1-stable review

2013-07-18 Thread CAI Qian



- Original Message -
> From: "H. Peter Anvin"
> To: "CAI Qian"
> Cc: "Steven Rostedt", "Thomas Gleixner", "Sarah Sharp", "Linus Torvalds",
> "Ingo Molnar", "Guenter Roeck", "Greg Kroah-Hartman", "Dave Jones",
> "Linux Kernel Mailing List", "Andrew Morton", "stable", "Darren Hart"
> Sent: Thursday, July 18, 2013 1:03:41 PM
> Subject: Re: [ 00/19] 3.10.1-stable review
> 
> On 07/17/2013 09:01 PM, CAI Qian wrote:
> > 
> > Please don't get me wrong. I neither compared Linus to those child abusers
> > nor Thomas to those children. I simply pointed out that there is also some
> > common sense that needs to be considered.
> >
> 
> Actually, you did.
I am sorry for misleading you into feeling that way, hpa.
> 
>   -hpa
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Re: [ 00/19] 3.10.1-stable review

2013-07-17 Thread CAI Qian



- Original Message -
> From: "Steven Rostedt" 
> To: "CAI Qian" 
> Cc: "Thomas Gleixner" , "Sarah Sharp" 
> , "Linus Torvalds"
> , "Ingo Molnar" , "Guenter 
> Roeck" , "Greg
> Kroah-Hartman" , "Dave Jones" , 
> "Linux Kernel Mailing List"
> , "Andrew Morton" , 
> "stable" ,
> "Darren Hart" 
> Sent: Thursday, July 18, 2013 11:47:34 AM
> Subject: Re: [ 00/19] 3.10.1-stable review
> 
> On Wed, 2013-07-17 at 23:16 -0400, CAI Qian wrote:
> 
> > > So if you talk about abuse, then you need an abuser and a victim. So
> > > your argumentation falls flat because there is no victim.
> > Could victim be someone else in the future since it is an example that
> > people may follow?
> > http://en.wikipedia.org/wiki/Silvio_Berlusconi_underage_prostitution_charges
> > It called "abuse of office" or abuse of the power.
> 
> Wow! You are now comparing Linus to a Prime Minister that has paid
> underage prostitutes for sex?
> 
> That's pretty low.
> 
> What Linus does is not an abuse of power, it's a protection of his baby.
> He created Linux, and although today he's not the one writing the code,
> he is ultimately the front man responsible for the kernel.
Surely Linus has great responsibility, but couldn't every powerful person or
organization tell the same story? Berlusconi has a country to take care of;
Jimmy Savile has a television kingdom to manage; the NSA needs to protect
world peace, etc.
> 
> Think about it. If Linux does something horrible, Linus is the one that
> takes the most blame. That's a HUGE responsibility. Linus has the most
> to lose if Linux becomes crap.
> 
> Not only does Linus have to check on code, he must also dictate policy.
> Which means dealing with different people, and how they work. If someone
> gets lazy and uses his trust to get something whacky in, Linus takes the
> blame for it if that happens. Thus, to prevent people from taking
> advantage of his trust, he has to be hard on them to make sure he can
> keep their trust.
> 
> Linus takes his job seriously. He may joke and name his kernel after
> 90's operating systems, but that's just to make the job more fun. But to
> keep the job, he needs to be a hard ass.
> 
> The few times he's yelled at me, he always did it with a bit of comedy
> and wit. That makes the harsh yelling not so bad, and I actually got a
> chuckle out of it. But I also took the harsh yelling in a way that I had
> better not do that again.
> 
> This is the big leagues folks. You think major league baseball managers
> are nice to their players?
> 
> "You just walked 4 players. That's not good. Keep this up I'll have to
> take you out off the team".
> 
>   vs
> 
> "What the f*ck is wrong with you. Get you head out of your @ss and start
> throwing the ball over the God damn plate before I throw your @ss out of
> this field!"
> 
> They both relay basically the same thing. The first one is nice and
> polite but states that bad things will happen if they keep it up. The
> second is quite harsh (although never calling the person a name), and
> will probably wake the person up and change his game. Which one of those
> tones do you think successful baseball managers use?
> 
> Sometimes tone *does* matter. You want quality from the top maintainers,
> and they start to slack, you can't just treat them like this is a grade
> school sport. Results matter. You want them to understand that this is
> serious and cursing someone out gives that person that feeling.
> 
> -- Steve
> 
> 

Re: [ 00/19] 3.10.1-stable review

2013-07-17 Thread CAI Qian



- Original Message -
> From: "Steven Rostedt" 
> To: "CAI Qian" 
> Cc: "Thomas Gleixner" , "Sarah Sharp" 
> , "Linus Torvalds"
> , "Ingo Molnar" , "Guenter 
> Roeck" , "Greg
> Kroah-Hartman" , "Dave Jones" , 
> "Linux Kernel Mailing List"
> , "Andrew Morton" , 
> "stable" ,
> "Darren Hart" 
> Sent: Thursday, July 18, 2013 11:47:34 AM
> Subject: Re: [ 00/19] 3.10.1-stable review
> 
> On Wed, 2013-07-17 at 23:16 -0400, CAI Qian wrote:
> 
> > > So if you talk about abuse, then you need an abuser and a victim. So
> > > your argumentation falls flat because there is no victim.
> > Could victim be someone else in the future since it is an example that
> > people may follow?
> > http://en.wikipedia.org/wiki/Silvio_Berlusconi_underage_prostitution_charges
> > It called "abuse of office" or abuse of the power.
> 
> Wow! You are now comparing Linus to a Prime Minister that has paid
> underage prostitutes for sex?
I apologize that this led to a misunderstanding. I just happened to read in
the news that the underage girl does not feel like a victim either, while
the law still considers it abuse. As another example, those BBC child
abusers took ages to track down, probably because the children did not
feel like victims at the time either.

Please don't get me wrong. I neither compared Linus to those child abusers
nor Thomas to those children. I simply pointed out that there is also some
common sense that needs to be considered.
> 
> That's pretty low.
> 
> What Linus does is not an abuse of power, it's a protection of his baby.
> He created Linux, and although today he's not the one writing the code,
> he is ultimately the front man responsible for the kernel.
> 
> Think about it. If Linux does something horrible, Linus is the one that
> takes the most blame. That's a HUGE responsibility. Linus has the most
> to lose if Linux becomes crap.
> 
> Not only does Linus have to check on code, he must also dictate policy.
> Which means dealing with different people, and how they work. If someone
> gets lazy and uses his trust to get something whacky in, Linus takes the
> blame for it if that happens. Thus, to prevent people from taking
> advantage of his trust, he has to be hard on them to make sure he can
> keep their trust.
> 
> Linus takes his job seriously. He may joke and name his kernel after
> 90's operating systems, but that's just to make the job more fun. But to
> keep the job, he needs to be a hard ass.
> 
> The few times he's yelled at me, he always did it with a bit of comedy
> and wit. That makes the harsh yelling not so bad, and I actually got a
> chuckle out of it. But I also took the harsh yelling in a way that I had
> better not do that again.
> 
> This is the big leagues folks. You think major league baseball managers
> are nice to their players?
> 
> "You just walked 4 players. That's not good. Keep this up I'll have to
> take you out off the team".
> 
>   vs
> 
> "What the f*ck is wrong with you. Get you head out of your @ss and start
> throwing the ball over the God damn plate before I throw your @ss out of
> this field!"
> 
> They both relay basically the same thing. The first one is nice and
> polite but states that bad things will happen if they keep it up. The
> second is quite harsh (although never calling the person a name), and
> will probably wake the person up and change his game. Which one of those
> tones do you think successful baseball managers use?
> 
> Sometimes tone *does* matter. You want quality from the top maintainers,
> and they start to slack, you can't just treat them like this is a grade
> school sport. Results matter. You want them to understand that this is
> serious and cursing someone out gives that person that feeling.
> 
> -- Steve
> 
> 
> 

Re: [ 00/19] 3.10.1-stable review

2013-07-17 Thread CAI Qian



- Original Message -
> From: "Thomas Gleixner" 
> To: "Sarah Sharp" 
> Cc: "Linus Torvalds" , "Ingo Molnar" 
> , "Guenter Roeck"
> , "Greg Kroah-Hartman" , 
> "Steven Rostedt" ,
> "Dave Jones" , "Linux Kernel Mailing List" 
> , "Andrew Morton"
> , "stable" , "Darren Hart" 
> 
> Sent: Thursday, July 18, 2013 8:42:16 AM
> Subject: Re: [ 00/19] 3.10.1-stable review
> 
> On Mon, 15 Jul 2013, Sarah Sharp wrote:
> > On Mon, Jul 15, 2013 at 12:07:56PM -0700, Linus Torvalds wrote:
> > > On Mon, Jul 15, 2013 at 11:46 AM, Sarah Sharp
> > >  wrote:
> > > >
> > > > Bullshit.  I've seen you be polite, and explain to clueless maintainers
> > > > why there's no way you can revert their merge that caused regressions,
> > > > and ask them to fit it without resorting to tearing them down
> > > > emotionally:
> > > 
> > > Oh, I'll be polite when it's called for.
> > > 
> > > But when people who know better send me crap, I'll curse at them.
> > > 
> > > I suspect you'll notice me cursing *way* more at top developers than
> > > random people on the list. I expect more from them, and conversely
> > > I'll be a lot more upset when they do something that I really think
> > > was not great.
> > > 
> > > For example, my latest cursing explosion was for the x86 maintainers,
> > > and it comes from the fact that I *know* they know to do better. The
> > > x86 tip pulls have generally been through way more testing than most
> > > other pulls I get (not just compiling, but even booting randconfigs
> > > etc). So when an x86 pull request comes in that clearly missed that
> > > expected level of quality, I go to town.
> > >
> > Good lord.  So anyone that is one of your "top maintainers" could be
> > exposed to your verbal abuse just because they "should have known
> > better"?
> 
> I'm one of the "victims" of Linus' latest "verbal abuse". :)
>  
> Just for the record. I got grilled by Linus several times over the
> last years and I can't remember a single instance where it was
> unjustified. When I see such a mail in my inbox, I know that I fucked
> up royally and all I do is to figure out what I broke this time and
> fix it. I don't give a rat's ass about his "abusive" language. See
> below.
> 
> > exposed to your verbal abuse just because they "should have known
> > better"?
> 
> You know what "should have known better" stands for?
> 
> It stands for violating trust.
> 
> Linus simply has to trusts his top level maintainers, because he
> cannot review, audit and check 10k patches which flow into his tree
> every merge window himself.
> 
> So if he finds out that someone who has his ultimate trust sends him a
> pile of crap, he tells that person in his own unmisunderstandable way
> that he's not amused.
> 
> > You know what the definition of an abuser is?  Someone that seeks out
> > victims that they know will "just take it" and keep the abuse "between
> > the two of them".  They pick victims that won't fight back or report the
> > abuse.
> 
> IOW, I'm a typical victim of abuse.
> 
> Let me clarify that.
> 
> The person who gets away with picking me for this kind of abuse has
> not been born yet. And Linus knows very well, that he gets the full
> pack back from me (in some different form of "abusive language") if he
> yelled at me for no reason. It's documented out there including his
> apologies.
> 
> So if you talk about abuse, then you need an abuser and a victim. So
> your argumentation falls flat because there is no victim.
Could the victim be someone else in the future, since it sets an example
that people may follow?
http://en.wikipedia.org/wiki/Silvio_Berlusconi_underage_prostitution_charges
It is called "abuse of office" or abuse of power.
> 
> I do not care about his swear words and rants at all, because I know
> that it makes him feel better.
> 
>  That's a cultural thing.
> 
> Where I grew up it's part of the culture to explode, let off steam and
> then go and have a beer together. I strongly believe this prevents
> gastric ulcer and keeps you honest. Linus and I have this kind of
> relationship. We respect each other, we trust each other and when one
> side fucks up we yell at each other and then meet at the bar for a
> drink.
> 
> Linus did NOT abuse me in his latest rant. He simply told me in a very
> strong language that he's grumpy because I violated his trust. And
> that's legitimate. It's also legitimate to do that in public because
> it documents that the top level maintainers are not impeccable. And it
> sets a clear expectation bar for those who want to become maintainers
> of any level.
> 
> Aside of that I completely agree with Linus, that this policital
> correctness crusades are merily creating more subtle and hard to fight
> forms of real abuse.
> 
> I observe that every other day in big corporates, which have written
> down code of conducts and a gazillion of rules for interaction; they
> just foster dishonesty and other fallacies.
> 
> I really prefer the honest slap from Linus than dealing with

Re: [Ksummit-2013-discuss] [ATTEND] How to act on LKML

2013-07-17 Thread CAI Qian



- Original Message -
> From: "Sarah Sharp" 
> To: "CAI Qian" 
> Cc: "Trond Myklebust" , "Ric Wheeler" 
> , "David Lang"
> , ksummit-2013-disc...@lists.linuxfoundation.org, "Greg 
> Kroah-Hartman" ,
> "Darren Hart" , "Ingo Molnar" , 
> "Olivier Galibert" ,
> "Linux Kernel Mailing List" , "stable" 
> , "Linus Torvalds"
> , "Willy Tarreau" 
> Sent: Wednesday, July 17, 2013 10:48:49 PM
> Subject: Re: [Ksummit-2013-discuss] [ATTEND] How to act on LKML
> 
> On Wed, Jul 17, 2013 at 03:36:36AM -0400, CAI Qian wrote:
> > > On Tue, 2013-07-16 at 19:31 -0400, Ric Wheeler wrote:
> > > > On 07/16/2013 07:12 PM, Sarah Sharp wrote:
> > > > > On Tue, Jul 16, 2013 at 06:54:59PM -0400, Steven Rostedt wrote:
> > > > >> On Tue, 2013-07-16 at 15:43 -0700, Sarah Sharp wrote:
> > > > > In order to make our community better, we need to figure out where
> > > > > the
> > > > > baseline of "good" behavior is.  We need to define what behavior we
> > > > > want
> > > > > from both maintainers and patch submitters.  E.g. "No regressions"
> > > > > and
> > > > > "don't break userspace" and "no personal attacks".  That needs to be
> > > > > written down somewhere, and it isn't.  If it's documented somewhere,
> > > > > point me to the file in Documentation.  Hint: it's not there.
> > > > >
> > > > > That is the problem.
> > > > >
> > > > > Sarah Sharp
> > > > 
> > > > The problem you are pointing out - and it is a problem - makes us less
> > > > effective
> > > > as a community.
> > > 
> > > Not really. Most of the people who already work as part of this
> > > community are completely used to it. We've created the environment, and
> > > have no problems with it.
> > > 
> > > Where it could possibly be a problem is when it comes to recruiting
> > > _new_ members to our community. Particularly so given that some
> > > journalists take a special pleasure in reporting particularly juicy
> > > comments and antics. That would tend to scare off a lot of gun-shy
> > > newbies.
> > >
> > > On the other hand, it might tend to bias our recruitment toward people
> > > of a more "special" disposition. Perhaps we finally need the services of
> > > a social scientist to help us find out...
> >
> > Does that sound like there are not going to have enough direct/thick skin
> > new kernel developers around to maintain the future Linux community? Maybe
> > just need a better pipeline for people comfortable for this culture?
> 
> No, we don't need a better pipeline for people who can "put up with
> shit".  We need a better pipeline for people who can work together
> civilly, and still get shit done.
> 
> I'm working on getting a pipeline of women into kernel development,
> through the FOSS Outreach Program for Women.  They slowly get introduced
> to Linux development culture, starting with a very friendly separate
> mailing list and IRC channel, and finally moving to work with a kernel
> mentor on a bigger project on the main Linux kernel development lists.
> We have seven women participating this round, and I suspect we'll have
> even more the next round.
> 
> So deal with it.  You're going to have a lot more women in the kernel
> community, and not all of them will be willing to put up with verbal
> abuse.  If you want to attract top talent that also happen to be women
> or racial minorities, the verbal abuse needs to stop.
Maybe we need something like this?
http://us.battle.net/en/community/conduct
> 
> Sarah Sharp

Re: [Ksummit-2013-discuss] [ATTEND] How to act on LKML

2013-07-17 Thread CAI Qian



- Original Message -
> From: "Trond Myklebust" 
> To: "Ric Wheeler" 
> Cc: "Sarah Sharp" , "David Lang" 
> ,
> ksummit-2013-disc...@lists.linuxfoundation.org, "Greg Kroah-Hartman" 
> , "Darren Hart"
> , "Ingo Molnar" , "Olivier 
> Galibert" , "Linux Kernel
> Mailing List" , "stable" 
> , "Linus Torvalds"
> , "Willy Tarreau" 
> Sent: Wednesday, July 17, 2013 7:53:30 AM
> Subject: Re: [Ksummit-2013-discuss] [ATTEND] How to act on LKML
> 
> On Tue, 2013-07-16 at 19:31 -0400, Ric Wheeler wrote:
> > On 07/16/2013 07:12 PM, Sarah Sharp wrote:
> > > On Tue, Jul 16, 2013 at 06:54:59PM -0400, Steven Rostedt wrote:
> > >> On Tue, 2013-07-16 at 15:43 -0700, Sarah Sharp wrote:
> > >>
> > >>> Yes, that's true.  Some kernel developers are better at moderating
> > >>> their
> > >>> comments and tone towards individuals who are "sensitive".  Others
> > >>> simply don't give a shit.  So we need to figure out how to meet
> > >>> somewhere in the middle, in order to establish a baseline of civility.
> > >> I have to ask this because I'm thick, and don't really understand,
> > >> but ...
> > >>
> > >> What problem exactly are we trying to solve here?
> > > Personal attacks are not cool Steve.  Some people simply don't care if a
> > > verbal tirade is directed at them.  Others do not want anyone to attack
> > > them personally, but they're fine with people attacking their code.
> > >
> > > Bystanders that don't understand the kernel community structure are
> > > discouraged from contributing because they don't want to be verbally
> > > abused, and they really don't want to see either personal attacks or
> > > intense belittling, demeaning comments about code.
> > >
> > > In order to make our community better, we need to figure out where the
> > > baseline of "good" behavior is.  We need to define what behavior we want
> > > from both maintainers and patch submitters.  E.g. "No regressions" and
> > > "don't break userspace" and "no personal attacks".  That needs to be
> > > written down somewhere, and it isn't.  If it's documented somewhere,
> > > point me to the file in Documentation.  Hint: it's not there.
> > >
> > > That is the problem.
> > >
> > > Sarah Sharp
> > 
> > The problem you are pointing out - and it is a problem - makes us less
> > effective
> > as a community.
> 
> Not really. Most of the people who already work as part of this
> community are completely used to it. We've created the environment, and
> have no problems with it.
> 
> Where it could possibly be a problem is when it comes to recruiting
> _new_ members to our community. Particularly so given that some
> journalists take a special pleasure in reporting particularly juicy
> comments and antics. That would tend to scare off a lot of gun-shy
> newbies.
> On the other hand, it might tend to bias our recruitment toward people
> of a more "special" disposition. Perhaps we finally need the services of
> a social scientist to help us find out...
Does that mean there are not going to be enough direct, thick-skinned new
kernel developers around to maintain the future Linux community? Maybe we
just need a better pipeline for people who are comfortable with this culture?
> 
> --
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> trond.mykleb...@netapp.com
> www.netapp.com

Re: [ 00/19] 3.10.1-stable review

2013-07-17 Thread CAI Qian



- Original Message -
> From: "Joe Perches" 
> To: "NeilBrown" 
> Cc: "Steven Rostedt" , "J. Bruce Fields" 
> , "Linus Torvalds"
> , "Sarah Sharp" 
> , "Ingo Molnar" ,
> "Guenter Roeck" , "Greg Kroah-Hartman" 
> , "Dave Jones"
> , "Linux Kernel Mailing List" 
> , "Andrew Morton"
> , "stable" , "Darren Hart" 
> 
> Sent: Tuesday, July 16, 2013 7:50:52 AM
> Subject: Re: [ 00/19] 3.10.1-stable review
> 
> On Tue, 2013-07-16 at 09:42 +1000, NeilBrown wrote:
> > Being "polite" without being "nice" is quite possible.
> > It even has a name:  Diplomacy.
> 
> And we all know how circular/indirect/implied/useless
> some of those diplomatic conversations can be.
Modern humans are more diplomatic than ancient barbarians. Will the trend
continue?
> 
> Just remember to bring a 'Big Stick' and don't be shy
> when it's necessary to display it.
> 

Re: [ 00/19] 3.10.1-stable review

2013-07-17 Thread CAI Qian

- Original Message -
 From: Thomas Gleixner t...@linutronix.de
 To: Sarah Sharp sarah.a.sh...@linux.intel.com
 Cc: Linus Torvalds torva...@linux-foundation.org, Ingo Molnar 
 mi...@kernel.org, Guenter Roeck
 li...@roeck-us.net, Greg Kroah-Hartman gre...@linuxfoundation.org, 
 Steven Rostedt rost...@goodmis.org,
 Dave Jones da...@redhat.com, Linux Kernel Mailing List 
 linux-kernel@vger.kernel.org, Andrew Morton
 a...@linux-foundation.org, stable sta...@vger.kernel.org, Darren Hart 
 dvh...@linux.intel.com
 Sent: Thursday, July 18, 2013 8:42:16 AM
 Subject: Re: [ 00/19] 3.10.1-stable review

 On Mon, 15 Jul 2013, Sarah Sharp wrote:
  On Mon, Jul 15, 2013 at 12:07:56PM -0700, Linus Torvalds wrote:
   On Mon, Jul 15, 2013 at 11:46 AM, Sarah Sharp
   sarah.a.sh...@linux.intel.com wrote:

Bullshit.  I've seen you be polite, and explain to clueless maintainers
why there's no way you can revert their merge that caused regressions,
and ask them to fix it without resorting to tearing them down
emotionally:

   Oh, I'll be polite when it's called for.

   But when people who know better send me crap, I'll curse at them.

   I suspect you'll notice me cursing *way* more at top developers than
   random people on the list. I expect more from them, and conversely
   I'll be a lot more upset when they do something that I really think
   was not great.

   For example, my latest cursing explosion was for the x86 maintainers,
   and it comes from the fact that I *know* they know to do better. The
   x86 tip pulls have generally been through way more testing than most
   other pulls I get (not just compiling, but even booting randconfigs
   etc). So when an x86 pull request comes in that clearly missed that
   expected level of quality, I go to town.

  Good lord.  So anyone that is one of your top maintainers could be
  exposed to your verbal abuse just because they should have known
  better?

 I'm one of the victims of Linus' latest verbal abuse. :)

 Just for the record. I got grilled by Linus several times over the
 last years and I can't remember a single instance where it was
 unjustified. When I see such a mail in my inbox, I know that I fucked
 up royally and all I do is to figure out what I broke this time and
 fix it. I don't give a rat's ass about his abusive language. See
 below.

  exposed to your verbal abuse just because they should have known
  better?

 You know what should have known better stands for?

 It stands for violating trust.

 Linus simply has to trust his top level maintainers, because he
 cannot review, audit and check 10k patches which flow into his tree
 every merge window himself.

 So if he finds out that someone who has his ultimate trust sends him a
 pile of crap, he tells that person in his own unmisunderstandable way
 that he's not amused.

  You know what the definition of an abuser is?  Someone that seeks out
  victims that they know will just take it and keep the abuse between
  the two of them.  They pick victims that won't fight back or report the
  abuse.

 IOW, I'm a typical victim of abuse.

 Let me clarify that.

 The person who gets away with picking me for this kind of abuse has
 not been born yet. And Linus knows very well, that he gets the full
 pack back from me (in some different form of abusive language) if he
 yelled at me for no reason. It's documented out there including his
 apologies.

 So if you talk about abuse, then you need an abuser and a victim. So
 your argumentation falls flat because there is no victim.
Could the victim be someone else in the future, since it sets an example that
people may follow?
http://en.wikipedia.org/wiki/Silvio_Berlusconi_underage_prostitution_charges
It is called abuse of office, or abuse of power.

 I do not care about his swear words and rants at all, because I know
 that it makes him feel better.

  That's a cultural thing.

 Where I grew up it's part of the culture to explode, let off steam and
 then go and have a beer together. I strongly believe this prevents
 gastric ulcer and keeps you honest. Linus and I have this kind of
 relationship. We respect each other, we trust each other and when one
 side fucks up we yell at each other and then meet at the bar for a
 drink.

 Linus did NOT abuse me in his latest rant. He simply told me in a very
 strong language that he's grumpy because I violated his trust. And
 that's legitimate. It's also legitimate to do that in public because
 it documents that the top level maintainers are not impeccable. And it
 sets a clear expectation bar for those who want to become maintainers
 of any level.

 Aside of that I completely agree with Linus, that these political
 correctness crusades are merely creating more subtle and hard to fight
 forms of real abuse.

 I observe that every other day in big corporates, which have written
 down codes of conduct and a gazillion of rules for interaction; they
 just foster dishonesty and other

Re: [ 00/19] 3.10.1-stable review

2013-07-17 Thread CAI Qian

- Original Message -
 From: Steven Rostedt rost...@goodmis.org
 To: CAI Qian caiq...@redhat.com
 Cc: Thomas Gleixner t...@linutronix.de, Sarah Sharp 
 sarah.a.sh...@linux.intel.com, Linus Torvalds
 torva...@linux-foundation.org, Ingo Molnar mi...@kernel.org, Guenter 
 Roeck li...@roeck-us.net, Greg
 Kroah-Hartman gre...@linuxfoundation.org, Dave Jones da...@redhat.com, 
 Linux Kernel Mailing List
 linux-kernel@vger.kernel.org, Andrew Morton a...@linux-foundation.org, 
 stable sta...@vger.kernel.org,
 Darren Hart dvh...@linux.intel.com
 Sent: Thursday, July 18, 2013 11:47:34 AM
 Subject: Re: [ 00/19] 3.10.1-stable review

 On Wed, 2013-07-17 at 23:16 -0400, CAI Qian wrote:

   So if you talk about abuse, then you need an abuser and a victim. So
   your argumentation falls flat because there is no victim.
  Could the victim be someone else in the future, since it sets an example that
  people may follow?
  http://en.wikipedia.org/wiki/Silvio_Berlusconi_underage_prostitution_charges
  It is called abuse of office, or abuse of power.

 Wow! You are now comparing Linus to a Prime Minister that has paid
 underage prostitutes for sex?
I apologize that this led to a misunderstanding. I just happened to read in
the news that the underage girl does not feel like a victim either, while
the law still considers it abuse. As another example, those BBC child
abusers took ages to track down, probably because those children did not
feel like victims at the time either.

Please don't get me wrong. I compared neither Linus to those child abusers
nor Thomas to those children. I simply pointed out that there is also some
common sense that needs to be considered.

 That's pretty low.

 What Linus does is not an abuse of power, it's a protection of his baby.
 He created Linux, and although today he's not the one writing the code,
 he is ultimately the front man responsible for the kernel.

 Think about it. If Linux does something horrible, Linus is the one that
 takes the most blame. That's a HUGE responsibility. Linus has the most
 to lose if Linux becomes crap.

 Not only does Linus have to check on code, he must also dictate policy.
 Which means dealing with different people, and how they work. If someone
 gets lazy and uses his trust to get something whacky in, Linus takes the
 blame for it if that happens. Thus, to prevent people from taking
 advantage of his trust, he has to be hard on them to make sure he can
 keep their trust.

 Linus takes his job seriously. He may joke and name his kernel after
 90's operating systems, but that's just to make the job more fun. But to
 keep the job, he needs to be a hard ass.

 The few times he's yelled at me, he always did it with a bit of comedy
 and wit. That makes the harsh yelling not so bad, and I actually got a
 chuckle out of it. But I also took the harsh yelling in a way that I had
 better not do that again.

 This is the big leagues folks. You think major league baseball managers
 are nice to their players?

 You just walked 4 players. That's not good. Keep this up I'll have to
 take you out of the team.

   vs

 What the f*ck is wrong with you. Get your head out of your @ss and start
 throwing the ball over the God damn plate before I throw your @ss out of
 this field!

 They both relay basically the same thing. The first one is nice and
 polite but states that bad things will happen if they keep it up. The
 second is quite harsh (although never calling the person a name), and
 will probably wake the person up and change his game. Which one of those
 tones do you think successful baseball managers use?

 Sometimes tone *does* matter. You want quality from the top maintainers,
 and they start to slack, you can't just treat them like this is a grade
 school sport. Results matter. You want them to understand that this is
 serious and cursing someone out gives that person that feeling.

 -- Steve


Re: [ 00/19] 3.10.1-stable review

2013-07-17 Thread CAI Qian

- Original Message -
 From: Steven Rostedt rost...@goodmis.org
 To: CAI Qian caiq...@redhat.com
 Cc: Thomas Gleixner t...@linutronix.de, Sarah Sharp 
 sarah.a.sh...@linux.intel.com, Linus Torvalds
 torva...@linux-foundation.org, Ingo Molnar mi...@kernel.org, Guenter 
 Roeck li...@roeck-us.net, Greg
 Kroah-Hartman gre...@linuxfoundation.org, Dave Jones da...@redhat.com, 
 Linux Kernel Mailing List
 linux-kernel@vger.kernel.org, Andrew Morton a...@linux-foundation.org, 
 stable sta...@vger.kernel.org,
 Darren Hart dvh...@linux.intel.com
 Sent: Thursday, July 18, 2013 11:47:34 AM
 Subject: Re: [ 00/19] 3.10.1-stable review

 On Wed, 2013-07-17 at 23:16 -0400, CAI Qian wrote:

   So if you talk about abuse, then you need an abuser and a victim. So
   your argumentation falls flat because there is no victim.
  Could the victim be someone else in the future, since it sets an example that
  people may follow?
  http://en.wikipedia.org/wiki/Silvio_Berlusconi_underage_prostitution_charges
  It is called abuse of office, or abuse of power.

 Wow! You are now comparing Linus to a Prime Minister that has paid
 underage prostitutes for sex?

 That's pretty low.

 What Linus does is not an abuse of power, it's a protection of his baby.
 He created Linux, and although today he's not the one writing the code,
 he is ultimately the front man responsible for the kernel.
Surely Linus has great responsibility, but couldn't every powerful
person/organization tell the same story? Berlusconi has a country to take
care of; Jimmy Savile has a television kingdom to manage; NSA needs to
protect world peace etc.

 Think about it. If Linux does something horrible, Linus is the one that
 takes the most blame. That's a HUGE responsibility. Linus has the most
 to lose if Linux becomes crap.

 Not only does Linus have to check on code, he must also dictate policy.
 Which means dealing with different people, and how they work. If someone
 gets lazy and uses his trust to get something whacky in, Linus takes the
 blame for it if that happens. Thus, to prevent people from taking
 advantage of his trust, he has to be hard on them to make sure he can
 keep their trust.

 Linus takes his job seriously. He may joke and name his kernel after
 90's operating systems, but that's just to make the job more fun. But to
 keep the job, he needs to be a hard ass.

 The few times he's yelled at me, he always did it with a bit of comedy
 and wit. That makes the harsh yelling not so bad, and I actually got a
 chuckle out of it. But I also took the harsh yelling in a way that I had
 better not do that again.

 This is the big leagues folks. You think major league baseball managers
 are nice to their players?

 You just walked 4 players. That's not good. Keep this up I'll have to
 take you out of the team.

   vs

 What the f*ck is wrong with you. Get your head out of your @ss and start
 throwing the ball over the God damn plate before I throw your @ss out of
 this field!

 They both relay basically the same thing. The first one is nice and
 polite but states that bad things will happen if they keep it up. The
 second is quite harsh (although never calling the person a name), and
 will probably wake the person up and change his game. Which one of those
 tones do you think successful baseball managers use?

 Sometimes tone *does* matter. You want quality from the top maintainers,
 and they start to slack, you can't just treat them like this is a grade
 school sport. Results matter. You want them to understand that this is
 serious and cursing someone out gives that person that feeling.

 -- Steve


kernel BUG at mm/slub.c:3352!

2013-06-06 Thread CAI Qian

18] 
.xfs_log_commit_cil+0x188/0x600 [xfs] 
[ 1104.320175] [c003f22a7660] [d23baa48] .xfs_trans_commit+0x168/0x300 [xfs]
[ 1104.320217] [c003f22a7710] [d23662c4] .xfs_create+0x5e4/0x620 [xfs]
[ 1104.320259] [c003f22a7830] [d235d8fc] .xfs_vn_mknod+0x8c/0x230 [xfs]
[ 1104.320267] [c003f22a7900] [c0222b90] .vfs_create+0xf0/0x180 
[ 1104.320274] [c003f22a79b0] [c0225bcc] .do_last+0x9ec/0xdf0 
[ 1104.320280] [c003f22a7ad0] [c02260bc] .path_openat+0xec/0x5c0 
[ 1104.320287] [c003f22a7bf0] [c02269e0] .do_filp_open+0x40/0xb0 
[ 1104.320294] [c003f22a7d10] [c0210c30] .do_sys_open+0x140/0x250 
[ 1104.320300] [c003f22a7dc0] [c0210d98] .SyS_creat+0x18/0x30 
[ 1104.320308] [c003f22a7e30] [c0009e54] syscall_exit+0x0/0x98 
[ 1104.320313] Instruction dump: 
[ 1104.320317] 7ce95214 e9070008 7fc9502a e9270010 2fbe 41de0088 2fa9 3b20
[ 1104.320329] 419e007c e95c0022 e93c 79290720 <7f1e502a> 0b09 0b19 3920
[ 1104.320342] ---[ end trace c320e07d73bae693 ]--- 
[ 1104.329423]  
CAI Qian

kernel BUG at mm/slub.c:3352!

2013-06-06 Thread CAI Qian

] 
[ 1104.320130] [c003f22a7560] [d23c1018] .xfs_log_commit_cil+0x188/0x600 [xfs]
[ 1104.320175] [c003f22a7660] [d23baa48] .xfs_trans_commit+0x168/0x300 [xfs]
[ 1104.320217] [c003f22a7710] [d23662c4] .xfs_create+0x5e4/0x620 [xfs]
[ 1104.320259] [c003f22a7830] [d235d8fc] .xfs_vn_mknod+0x8c/0x230 [xfs]
[ 1104.320267] [c003f22a7900] [c0222b90] .vfs_create+0xf0/0x180 
[ 1104.320274] [c003f22a79b0] [c0225bcc] .do_last+0x9ec/0xdf0 
[ 1104.320280] [c003f22a7ad0] [c02260bc] .path_openat+0xec/0x5c0 
[ 1104.320287] [c003f22a7bf0] [c02269e0] .do_filp_open+0x40/0xb0 
[ 1104.320294] [c003f22a7d10] [c0210c30] .do_sys_open+0x140/0x250 
[ 1104.320300] [c003f22a7dc0] [c0210d98] .SyS_creat+0x18/0x30 
[ 1104.320308] [c003f22a7e30] [c0009e54] syscall_exit+0x0/0x98 
[ 1104.320313] Instruction dump: 
[ 1104.320317] 7ce95214 e9070008 7fc9502a e9270010 2fbe 41de0088 2fa9 3b20
[ 1104.320329] 419e007c e95c0022 e93c 79290720 7f1e502a 0b09 0b19 3920
[ 1104.320342] ---[ end trace c320e07d73bae693 ]--- 
[ 1104.329423]  
CAI Qian

Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running xfstests)

2013-06-03 Thread CAI Qian


> Cai, I did ask you for the information that would have answered this
> question:
> 
> > >   3. if you can't reproduce it like that, does it reproduce on
> > > an xfstest run on a pristine system? If so, what command
> > > line are you running, and what are the filesystem
> > > configurations?
> 
> So, I need xfstests command line and the xfs_info output from the
> filesystems in use at the time this problem occurs..
Here you are.
[root@hp-z210-01 xfstests-dev]# a=`grep ' swap' /etc/fstab | cut -f 1 -d ' '`
[root@hp-z210-01 xfstests-dev]# b=`grep ' /home' /etc/fstab | cut -f 1 -d ' '`
[root@hp-z210-01 xfstests-dev]# swapoff -a
[root@hp-z210-01 xfstests-dev]# umount /home
[root@hp-z210-01 xfstests-dev]# echo "swap = $a"
swap = /dev/mapper/rhel_hp--z210--01-swap
[root@hp-z210-01 xfstests-dev]# echo "home = $b"
home = /dev/mapper/rhel_hp--z210--01-home
[root@hp-z210-01 xfstests-dev]# export TEST_DEV=$a
[root@hp-z210-01 xfstests-dev]# export TEST_DIR=/mnt/testarea/test
[root@hp-z210-01 xfstests-dev]# export SCRATCH_DEV=$b
[root@hp-z210-01 xfstests-dev]# export SCRATCH_MNT=/mnt/testarea/scratch
[root@hp-z210-01 xfstests-dev]# mkdir -p /mnt/testarea/test
[root@hp-z210-01 xfstests-dev]# mkdir -p /mnt/testarea/scratch
[root@hp-z210-01 xfstests-dev]# 
[root@hp-z210-01 xfstests-dev]# mkfs.xfs -f $a
meta-data=/dev/mapper/rhel_hp--z210--01-swap isize=256    agcount=4, agsize=251904 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=1007616, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hp-z210-01 xfstests-dev]# mkfs.xfs -f $b
meta-data=/dev/mapper/rhel_hp--z210--01-home isize=256    agcount=4, agsize=11701504 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=46806016, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=22854, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@hp-z210-01 xfstests-dev]# 
[root@hp-z210-01 xfstests-dev]# mount /dev/mapper/rhel_hp--z210--01-home /mnt/testarea/scratch
[root@hp-z210-01 xfstests-dev]# 
[root@hp-z210-01 xfstests-dev]# mount /dev/mapper/rhel_hp--z210--01-swap /mnt/testarea/test
[root@hp-z210-01 xfstests-dev]# xfs_info $a
meta-data=/dev/mapper/rhel_hp--z210--01-swap isize=256    agcount=4, agsize=251904 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1007616, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hp-z210-01 xfstests-dev]# xfs_info $b
meta-data=/dev/mapper/rhel_hp--z210--01-home isize=256    agcount=4, agsize=11701504 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=46806016, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=22854, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hp-z210-01 xfstests-dev]# ./check 20
FSTYP -- xfs (non-debug)
PLATFORM  -- Linux/x86_64 hp-z210-01 3.9.4
MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/rhel_hp--z210--01-home
MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/mapper/rhel_hp--z210--01-home /mnt/testarea/scratch
020 
CAI Qian
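
The interactive session above can be condensed into two small helpers. This
is only a sketch, not a script from the thread: the function names
fstab_device and xfstests_env and the optional fstab-path argument are
assumptions made for illustration. It compares the mount-point field exactly
instead of grepping for a substring, and it assumes the swap entry uses
"swap" as its mount-point field.

```shell
#!/bin/sh
# Sketch only: helper names and the optional fstab path are hypothetical,
# they are not part of xfstests.

# Print the device (field 1) of the first non-comment fstab entry whose
# mount point (field 2) equals $1. $2 is the fstab path (default /etc/fstab).
fstab_device() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $1; exit }' "${2:-/etc/fstab}"
}

# Print the four exports used in the session, given the test device ($1)
# and the scratch device ($2). The mount points are the ones from the thread.
xfstests_env() {
    printf 'export TEST_DEV=%s\n' "$1"
    printf 'export TEST_DIR=/mnt/testarea/test\n'
    printf 'export SCRATCH_DEV=%s\n' "$2"
    printf 'export SCRATCH_MNT=/mnt/testarea/scratch\n'
}
```

With these defined, a=$(fstab_device swap) and b=$(fstab_device /home)
followed by eval "$(xfstests_env "$a" "$b")" would set up the same variables.
The swapoff, umount and mkfs.xfs -f steps are left out on purpose, because
they destroy the contents of both devices.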

Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running xfstests)

2013-06-03 Thread CAI Qian



- Original Message -
> From: "Dave Chinner" 
> To: "CAI Qian" 
> Cc: x...@oss.sgi.com, sta...@vger.kernel.org, "LKML" 
> , "linux-mm" 
> Sent: Monday, June 3, 2013 12:00:38 PM
> Subject: Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running 
> xfstests)
> 
> On Sun, Jun 02, 2013 at 11:04:11PM -0400, CAI Qian wrote:
> > 
> > > There's memory corruption all over the place.  It is most likely
> > > that trinity is causing this - its purpose is to trigger corruption
> > > issues, but they aren't always immediately seen.  If you can trigger
> > > this xfs trace without trinity having been run and without all the
> > > RCU/idle/scheduler/cgroup issues occurring at the same time, then
> > > it's likely to be caused by XFS. But right now, I'd say XFS is just
> > > an innocent bystander caught in the crossfire. There's nothing I can
> > > do from an XFS perspective to track this down...
> > OK, this can be reproduced by just running LTP and then xfstests without
> > trinity at all...
> 
> Cai, can you be more precise about what is triggering it?  LTP and
> xfstests do a large amount of stuff, and stack traces do not
> help narrow down the cause at all.  Can you provide the following
> information and perform the following steps:
> 
>   1. What xfstest is tripping over it?
Test #20.
>   2. Can you reproduce it just by running that one specific test
> on a pristine system (i.e. freshly mkfs'd filesystems,
> immediately after boot)
Yes, it was reproduced without LTP at all.
[   98.534402] XFS (dm-0): Mounting Filesystem
[   98.586673] XFS (dm-0): Ending clean mount
[   99.741704] XFS (dm-2): Mounting Filesystem
[  100.117248] XFS (dm-2): Ending clean mount
[  100.723228] XFS (dm-0): Mounting Filesystem
[  100.775965] XFS (dm-0): Ending clean mount
[  101.980250] BUG: unable to handle kernel NULL pointer dereference at 0098
[  101.988136] IP: [] tg_load_down+0x4c/0x80
[  101.993737] PGD 0 
[  101.995769] Oops: 0002 [#1] SMP 
[  101.999038] Modules linked in: lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm hp_wmi sparse_keymap rfkill iTCO_wdt e1000e pcspkr snd_page_alloc iTCO_vendor_support mei ptp pps_core lpc_ich i2c_i801 snd_timer mfd_core microcode(+) snd soundcore xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif nouveau video mxm_wmi i2c_algo_bit drm_kms_helper ahci ata_generic ttm libahci pata_acpi drm i2c_core libata wmi dm_mirror dm_region_hash dm_log dm_mod
[  102.075355] CPU 2 
[  102.077197] Pid: 356, comm: kworker/2:2 Not tainted 3.9.4 #1 Hewlett-Packard HP Z210 Workstation/1587h
[  102.086691] RIP: 0010:[]  [] tg_load_down+0x4c/0x80
[  102.094705] RSP: 0018:880078307c78  EFLAGS: 00010002
[  102.100020] RAX: 0001f2b5a618ed0f RBX: 0001 RCX: 068a
[  102.107157] RDX:  RSI: 0001 RDI: 8800772ceee8
[  102.114293] RBP: 880078307c78 R08: 0008 R09: 88007d094400
[  102.121422] R10: 0344 R11: 0001 R12: 81c78560
[  102.128552] R13: 8108c460 R14:  R15: 8800772ceee8
[  102.135682] FS:  () GS:88007d10() knlGS:
[  102.143776] CS:  0010 DS:  ES:  CR0: 80050033
[  102.149524] CR2: 0098 CR3: 018fa000 CR4: 000407e0
[  102.156654] DR0:  DR1:  DR2: 
[  102.163785] DR3:  DR6: 0ff0 DR7: 0400
[  102.170915] Process kworker/2:2 (pid: 356, threadinfo 880078306000, task 88007736b580)
[  102.179527] Stack:
[  102.181545]  880078307cc0 810926b2 81098c60 8800772cf008
[  102.189005]  880079b96f00 069c 880079b96ee8 00014400
[  102.196464]  88007d094400 880078307db0 8109f773 88007cc10480
[  102.203920] Call Trace:
[  102.206372]  [] walk_tg_tree_from+0x32/0xe0
[  102.212118]  [] ? task_waking_fair+0x20/0x20
[  102.217955]  [] load_balance+0x2a3/0x7d0
[  102.223444]  [] ? update_rq_clock.part.67+0x1c/0x170
[  102.229977]  [] idle_balance+0x182/0x2f0
[  102.235468]  [] __schedule+0x7bc/0x7d0
[  102.240786]  [] schedule+0x29/0x70
[  102.245756]  [] worker_thread+0x1b4/0x3d0
[  102.251332]  [] ? __alloc_workqueue_key+0x500/0x500
[  102.25]  [] kthread+0xc0/0xd0
[  102.262662]  [] ?
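
When a report like the one above is passed around, it can help to reduce the
trace to its function names. The following is a minimal sketch that assumes
the x86 log format shown above; trace_symbols is a hypothetical helper name,
not an existing kernel script.

```shell
#!/bin/sh
# Sketch only: trace_symbols is a hypothetical helper, not a kernel tool.
# Read kernel log text on stdin and print one line per
# "symbol+0xOFF/0xSIZE" entry that follows a "] " bracket, keeping the
# "? " prefix that the log puts in front of some frames.
trace_symbols() {
    sed -n 's/^.*\] \(? \)\{0,1\}\([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]*.*$/\1\2/p'
}
```

Run as trace_symbols < console.log (a hypothetical file name), it prints
tg_load_down for the "IP:" line and then one function per call-trace line.
Frames that the kernel printed with a "?" are ones it could not confirm as
part of the call chain, so the marker is kept rather than dropped.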

Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running xfstests)

2013-06-03 Thread CAI Qian


 Cai, I did ask you for the information that would have answered this
 question:
 
 3. if you can't reproduce it like that, does it reproduce on
   an xfstest run on a pristine system? If so, what command
   line are you running, and what are the filesystem
   configurations?
 
 So, I need xfstests command line and the xfs_info output from the
 filesystems in use at the time this problem occurs..
Here you are.
[root@hp-z210-01 xfstests-dev]# a=`grep ' swap' /etc/fstab | cut -f 1 -d ' '`
[root@hp-z210-01 xfstests-dev]# b=`grep ' /home' /etc/fstab | cut -f 1 -d ' '`
[root@hp-z210-01 xfstests-dev]# swapoff -a
[root@hp-z210-01 xfstests-dev]# umount /home
[root@hp-z210-01 xfstests-dev]# echo swap = $a
swap = /dev/mapper/rhel_hp--z210--01-swap
[root@hp-z210-01 xfstests-dev]# echo home = $b
home = /dev/mapper/rhel_hp--z210--01-home
[root@hp-z210-01 xfstests-dev]# export TEST_DEV=$a
[root@hp-z210-01 xfstests-dev]# export TEST_DIR=/mnt/testarea/test
[root@hp-z210-01 xfstests-dev]# export SCRATCH_DEV=$b
[root@hp-z210-01 xfstests-dev]# export SCRATCH_MNT=/mnt/testarea/scratch
[root@hp-z210-01 xfstests-dev]# mkdir -p /mnt/testarea/test
[root@hp-z210-01 xfstests-dev]# mkdir -p /mnt/testarea/scratch
[root@hp-z210-01 xfstests-dev]# 
[root@hp-z210-01 xfstests-dev]# mkfs.xfs -f $a
meta-data=/dev/mapper/rhel_hp--z210--01-swap isize=256    agcount=4, agsize=251904 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=1007616, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hp-z210-01 xfstests-dev]# mkfs.xfs -f $b
meta-data=/dev/mapper/rhel_hp--z210--01-home isize=256    agcount=4, agsize=11701504 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=46806016, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=22854, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@hp-z210-01 xfstests-dev]# 
[root@hp-z210-01 xfstests-dev]# mount /dev/mapper/rhel_hp--z210--01-home /mnt/testarea/scratch
[root@hp-z210-01 xfstests-dev]# 
[root@hp-z210-01 xfstests-dev]# mount /dev/mapper/rhel_hp--z210--01-swap /mnt/testarea/test
[root@hp-z210-01 xfstests-dev]# xfs_info $a
meta-data=/dev/mapper/rhel_hp--z210--01-swap isize=256    agcount=4, agsize=251904 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1007616, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hp-z210-01 xfstests-dev]# xfs_info $b
meta-data=/dev/mapper/rhel_hp--z210--01-home isize=256    agcount=4, agsize=11701504 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=46806016, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=22854, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hp-z210-01 xfstests-dev]# ./check 20
FSTYP -- xfs (non-debug)
PLATFORM  -- Linux/x86_64 hp-z210-01 3.9.4
MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/rhel_hp--z210--01-home
MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/mapper/rhel_hp--z210--01-home /mnt/testarea/scratch
020 crashed immediately...
CAI Qian

Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running xfstests)

2013-06-03 Thread CAI Qian



- Original Message -
 From: Dave Chinner da...@fromorbit.com
 To: CAI Qian caiq...@redhat.com
 Cc: x...@oss.sgi.com, sta...@vger.kernel.org, LKML 
 linux-kernel@vger.kernel.org, linux-mm linux...@kvack.org
 Sent: Monday, June 3, 2013 12:00:38 PM
 Subject: Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running 
 xfstests)
 
 On Sun, Jun 02, 2013 at 11:04:11PM -0400, CAI Qian wrote:
  
   There's memory corruption all over the place.  It is most likely
   that trinity is causing this - its purpose is to trigger corruption
   issues, but they aren't always immediately seen.  If you can trigger
   this xfs trace without trinity having been run and without all the
   RCU/idle/scheduler/cgroup issues occurring at the same time, then
   it's likely to be caused by XFS. But right now, I'd say XFS is just
   an innocent bystander caught in the crossfire. There's nothing I can
   do from an XFS perspective to track this down...
  OK, this can be reproduced by just running LTP and then xfstests without
  trinity at all...
 
 Cai, can you be more precise about what is triggering it?  LTP and
 xfstests do a large amount of stuff, and stack traces do not
 help narrow down the cause at all.  Can you provide the following
 information and perform the following steps:
 
   1. What xfstest is tripping over it?
Test #20.
   2. Can you reproduce it just by running that one specific test
 on a pristine system (i.e. freshly mkfs'd filesystems,
 immediately after boot)
Yes, it was reproduced without LTP at all.
[   98.534402] XFS (dm-0): Mounting Filesystem
[   98.586673] XFS (dm-0): Ending clean mount
[   99.741704] XFS (dm-2): Mounting Filesystem
[  100.117248] XFS (dm-2): Ending clean mount
[  100.723228] XFS (dm-0): Mounting Filesystem
[  100.775965] XFS (dm-0): Ending clean mount
[  101.980250] BUG: unable to handle kernel NULL pointer dereference at 0098
[  101.988136] IP: [81098cac] tg_load_down+0x4c/0x80
[  101.993737] PGD 0 
[  101.995769] Oops: 0002 [#1] SMP 
[  101.999038] Modules linked in: lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm hp_wmi sparse_keymap rfkill iTCO_wdt e1000e pcspkr snd_page_alloc iTCO_vendor_support mei ptp pps_core lpc_ich i2c_i801 snd_timer mfd_core microcode(+) snd soundcore xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif nouveau video mxm_wmi i2c_algo_bit drm_kms_helper ahci ata_generic ttm libahci pata_acpi drm i2c_core libata wmi dm_mirror dm_region_hash dm_log dm_mod
[  102.075355] CPU 2 
[  102.077197] Pid: 356, comm: kworker/2:2 Not tainted 3.9.4 #1 Hewlett-Packard HP Z210 Workstation/1587h
[  102.086691] RIP: 0010:[81098cac]  [81098cac] tg_load_down+0x4c/0x80
[  102.094705] RSP: 0018:880078307c78  EFLAGS: 00010002
[  102.100020] RAX: 0001f2b5a618ed0f RBX: 0001 RCX: 068a
[  102.107157] RDX:  RSI: 0001 RDI: 8800772ceee8
[  102.114293] RBP: 880078307c78 R08: 0008 R09: 88007d094400
[  102.121422] R10: 0344 R11: 0001 R12: 81c78560
[  102.128552] R13: 8108c460 R14:  R15: 8800772ceee8
[  102.135682] FS:  () GS:88007d10() 
knlGS:
[  102.143776] CS:  0010 DS:  ES:  CR0: 80050033
[  102.149524] CR2: 0098 CR3: 018fa000 CR4: 000407e0
[  102.156654] DR0:  DR1:  DR2: 
[  102.163785] DR3:  DR6: 0ff0 DR7: 0400
[  102.170915] Process kworker/2:2 (pid: 356, threadinfo 880078306000, task 
88007736b580)
[  102.179527] Stack:
[  102.181545]  880078307cc0 810926b2 81098c60 
8800772cf008
[  102.189005]  880079b96f00 069c 880079b96ee8 
00014400
[  102.196464]  88007d094400 880078307db0 8109f773 
88007cc10480
[  102.203920] Call Trace:
[  102.206372]  [810926b2] walk_tg_tree_from+0x32/0xe0
[  102.212118]  [81098c60] ? task_waking_fair+0x20/0x20
[  102.217955]  [8109f773] load_balance+0x2a3/0x7d0
[  102.223444]  [8108fa7c] ? update_rq_clock.part.67+0x1c/0x170
[  102.229977]  [810a0142] idle_balance+0x182/0x2f0
[  102.235468]  [8160f1ac] __schedule+0x7bc/0x7d0
[  102.240786]  [8160f1e9] schedule+0x29/0x70
[  102.245756]  [8107f404] worker_thread+0x1b4/0x3d0
[  102.251332]  [8107f250] ? __alloc_workqueue_key+0x500/0x500
[  102.25

Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running xfstests)

2013-06-02 Thread CAI Qian

xattr_set+0x42/0x70 [xfs] 
[ 7267.598595]  [] generic_setxattr+0x62/0x80 
[ 7267.604252]  [] __vfs_setxattr_noperm+0x63/0x1b0 
[ 7267.610429]  [] vfs_setxattr+0xb5/0xc0 
[ 7267.615738]  [] setxattr+0x126/0x1c0 
[ 7267.620875]  [] ? kmem_cache_free+0x1cd/0x1e0 
[ 7267.626791]  [] ? final_putname+0x22/0x50 
[ 7267.632361]  [] ? putname+0x2b/0x40 
[ 7267.637412]  [] ? user_path_at_empty+0x5f/0x90 
[ 7267.643415]  [] ? __sb_start_write+0x49/0x100 
[ 7267.649331]  [] sys_lsetxattr+0x8f/0xd0 
[ 7267.654727]  [] system_call_fastpath+0x16/0x1b 
[ 7267.660726] Code: c1 01 48 83 c0 02 48 83 c7 08 49 83 c0 20 41 81 f9 00 01 
00 00 74 4c 4c 8b 13 0f b7 30 0f b7 88 00 02 00 00 0f b7 90 00 04 00 00 <4d> 8b 
12 66 c1 ee 02 66 c1 e9 02 66 c1 ea 02 66 41 81 3a 6f 90  
[ 7267.680665] RIP  [] nv50_crtc_lut_load+0x98/0x110 
[nouveau] 
[ 7267.687835]  RSP  
[ 7267.691324] ---[ end trace 0ac6265371f9a5bf ]--- 
[ 7287.146356] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 
3 
[ 7288.185689] Shutting down cpus with NMI 
[ 7288.189526] drm_kms_helper: panic occurred, switching back to text console 
CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body

Re: 3.9.4 Oops running xfstests (WAS Re: 3.9.3: Oops running xfstests)

2013-06-02 Thread CAI Qian


 There's memory corruption all over the place.  It is most likely
 that trinity is causing this - its purpose is to trigger corruption
 issues, but they aren't always immediately seen.  If you can trigger
 this xfs trace without trinity having been run and without all the
 RCU/idle/scheduler/cgroup issues occurring at the same time, then
 it's likely to be caused by XFS. But right now, I'd say XFS is just
 an innocent bystander caught in the crossfire. There's nothing I can
 do from an XFS perspective to track this down...
OK, this can be reproduced by just running LTP and then xfstests without
trinity at all...
[  302.311213] XFS (dm-0): Mounting Filesystem 
[  302.608320] XFS (dm-0): Ending clean mount 
[  303.625760] XFS (dm-2): Mounting Filesystem 
[  303.674648] XFS (dm-2): Ending clean mount 
[  304.247740] XFS (dm-0): Mounting Filesystem 
[  304.563899] XFS (dm-0): Ending clean mount 
[  305.118268] BUG: unable to handle kernel paging request at 8801f7067000 
[  305.156637] IP: [813022fa] memmove+0x4a/0x1a0 
[  305.185560] PGD 1ddf067 PUD 20bdf9067 PMD 1f9b56063 PTE 8001f7067161 
[  305.222852] Oops: 0003 [#1] SMP  
[  305.238742] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) 
nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) 
ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) 
iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) 
nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) 
ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) 
iptable_filter(F) ip_tables(F) sg(F) iTCO_wdt(F) e1000e(F) ixgbe(F) 
iTCO_vendor_support(F) ptp(F) mdio(F) dca(F) serio_raw(F) hpwdt(F) pcspkr(F) 
pps_core(F) hpilo(F) lpc_ich(F) mfd_core(F) microcode(F) xfs(F) libcrc32c(F) 
sd_mod(F) mgag200(F) ata_generic(F) crc_t10dif(F) i2c_algo_bit(F) pata_acpi(F) 
drm_kms_helper(F) ttm(F) ata_piix(F) drm(F) hpsa(F) libata(F) i2c_core(F) 
dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) 
[  305.615528] CPU 3  
[  305.624597] Pid: 19138, comm: attr Tainted: GF3.9.4 #1 HP 
ProLiant DL120 G7 
[  305.667290] RIP: 0010:[813022fa]  [813022fa] 
memmove+0x4a/0x1a0 
[  305.707799] RSP: 0018:8801da2a7ad8  EFLAGS: 00010282 
[  305.736413] RAX: 8801f7034540 RBX: 8801d3bba000 RCX: 
d00b 
[  305.774576] RDX: fffcd4f8 RSI: 8801f70673f8 RDI: 
8801f7067000 
[  305.811143] RBP: 8801da2a7b30 R08: 3ee6e8057eed854d R09: 
9066aaebe17ba23c 
[  305.847038] R10: 0d8b3a7401f8834f R11: 74c08510ffdf89f6 R12: 
8801f7034520 
[  305.883189] R13: 03d8 R14: 1800 R15: 
007b 
[  305.921160] FS:  7fbdabf4e740() GS:88020b46() 
knlGS: 
[  305.961370] CS:  0010 DS:  ES:  CR0: 80050033 
[  305.990403] CR2: 8801f7067000 CR3: 0001d3d9 CR4: 
000407e0 
[  306.025585] DR0:  DR1:  DR2: 
 
[  306.061610] DR3:  DR6: 0ff0 DR7: 
0400 
[  306.097768] Process attr (pid: 19138, threadinfo 8801da2a6000, task 
8801dfdb3580) 
[  306.138073] Stack: 
[  306.148523]  a01691a1   
8801f7034540 
[  306.187280]  8801f7034520 0018 8801f7034520 
 
[  306.227244]  8801e2425cd0 8801d207e880 8801d3e6ad20 
8801da2a7b68 
[  306.267164] Call Trace: 
[  306.280421]  [a01691a1] ? xfs_attr_leaf_moveents.isra.2+0x91/0x280 
[xfs] 
[  306.321077]  [a0169467] xfs_attr_leaf_compact+0xd7/0x130 [xfs] 
[  306.355110]  [a016aa2e] xfs_attr_leaf_add+0xce/0x170 [xfs] 
[  306.386559]  [a0166850] xfs_attr_leaf_addname+0xc0/0x3d0 [xfs] 
[  306.420459]  [a0174d4e] ? xfs_bmap_one_block+0x3e/0xa0 [xfs] 
[  306.453034]  [a016778c] xfs_attr_set_int+0x30c/0x420 [xfs] 
[  306.484498]  [811be9f4] ? setxattr+0xa4/0x1c0 
[  306.511064]  [a0167d1f] xfs_attr_set+0x7f/0x90 [xfs] 
[  306.539972]  [a015da12] xfs_xattr_set+0x42/0x70 [xfs] 
[  306.570403]  [811bdef2] generic_setxattr+0x62/0x80 
[  306.598333]  [811be743] __vfs_setxattr_noperm+0x63/0x1b0 
[  306.629324]  [811be945] vfs_setxattr+0xb5/0xc0 
[  306.655470]  [811bea76] setxattr+0x126/0x1c0 
[  306.681365]  [8118358d] ? kmem_cache_free+0x1cd/0x1e0 
[  306.710684]  [811a72b2] ? final_putname+0x22/0x50 
[  306.740728]  [811a74cb] ? putname+0x2b/0x40 
[  306.767873]  [811ab96f] ? user_path_at_empty+0x5f/0x90 
[  306.799781]  [8119e5c9] ? __sb_start_write+0x49/0x100 
[  306.831920]  [811bedef] sys_lsetxattr+0x8f/0xd0 
[  306.861316]  [81619359] system_call_fastpath+0x16/0x1b 
[  306.894531] Code: 00 00 48 81 fa a8 02 00 00 72 05 40 38 fe 74 41 48 83 ea 
20 48 83 ea 20 4c 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d
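
The number after "Oops:" in the x86 reports above ("Oops: 0003" here, "Oops: 0002" in the tg_load_down report) is the page-fault error code, printed in hex. A minimal Python sketch that decodes its low bits; the names PF_BITS and decode_oops are hypothetical and only for illustration:

```python
# Low bits of the x86 page-fault error code printed after "Oops:".
# Each entry: (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops(code_hex):
    """Return a list of human-readable meanings for a hex error code."""
    code = int(code_hex, 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


if __name__ == "__main__":
    for code in ("0002", "0003"):
        print(code, "->", ", ".join(decode_oops(code)))
```

Read this way, 0002 is a kernel-mode write to a page that is not present (the NULL dereference), and 0003 is a kernel-mode write to a page that is present but not writable (the memmove fault).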

Re: 3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

2013-05-23 Thread CAI Qian

OK, here is clearer stack output from the run.
CAI Qian

+ ./check
FSTYP -- xfs (non-debug)
PLATFORM  -- Linux/s390x ibm-z10-23 3.9.3

001  29s
002  3s
003  2s
004  [not run] this test requires a valid $SCRATCH_DEV
005  2s
006  9s
007  10s
008  7s
009  [not run] this test requires a valid $SCRATCH_DEV
010  [not run] dbtest was not built for this platform
011  9s
012  10s
013  35s
014  5s
015  [not run] this test requires a valid $SCRATCH_DEV
016  [not run] this test requires a valid $SCRATCH_DEV
017  [not run] this test requires a valid $SCRATCH_DEV
018  [not run] this test requires a valid $SCRATCH_DEV
019  [not run] this test requires a valid $SCRATCH_DEV
020 


[ 1316.571227] XFS (dm-0): Mounting Filesystem
[ 1316.697803] XFS (dm-0): Ending clean mount
[ 1318.080615] XFS (dm-0): Ending clean mount
[ 1348.791125] XFS (dm-0): Mounting Filesystem
[ 1348.989166] XFS (dm-0): Ending clean mount
[ 1353.335478] XFS (dm-0): Mounting Filesystem
[ 1353.496364] XFS (dm-0): Ending clean mount
[ 1357.495427] XFS (dm-0): Mounting Filesystem
[ 1357.676971] XFS (dm-0): Ending clean mount
[ 1361.646399] XFS (dm-0): Mounting Filesystem
[ 1361.890426] XFS (dm-0): Ending clean mount
[ 1371.798944] XFS (dm-0): Mounting Filesystem
[ 1371.976922] XFS (dm-0): Ending clean mount
[ 1384.559103] XFS (dm-0): Mounting Filesystem
[ 1384.725657] XFS (dm-0): Ending clean mount
[ 1393.131347] XFS (dm-0): Mounting Filesystem
[ 1393.357927] XFS (dm-0): Ending clean mount
[ 1407.282708] XFS (dm-0): Mounting Filesystem
[ 1407.745176] XFS (dm-0): Ending clean mount
[ 1422.927074] XFS (dm-0): Mounting Filesystem
[ 1423.136266] XFS (dm-0): Ending clean mount
[ 1425.500910] XFS (dm-0): Mounting Filesystem
[ 1425.608851] XFS (dm-0): Ending clean mount
[ 1450.978110] XFS (dm-0): Mounting Filesystem
[ 1451.255368] XFS (dm-0): Ending clean mount
[ 1453.603742] XFS (dm-0): Mounting Filesystem
[ 1453.680657] XFS (dm-0): Ending clean mount
[ 1456.262266] XFS (dm-0): Mounting Filesystem
[ 1456.330515] XFS (dm-0): Ending clean mount
[ 1457.053767] XFS (dm-0): Mounting Filesystem
[ 1457.107258] XFS (dm-0): Ending clean mount
[ 1462.049374] XFS (dm-0): Mounting Filesystem
[ 1462.111389] XFS (dm-0): Ending clean mount
[ 1471.109589] ODEBUG: deactivate not available (active state 0) object type: ti
mer_list hint: process_timeout+0x0/0x8
[ 1471.109683] ------------[ cut here ]------------
[ 1471.109688] WARNING: at lib/debugobjects.c:260
[ 1471.109692] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F)
nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6ta
ble_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(
F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F)
 nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F)
 ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F
) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F
) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_l
og(F) dm_mod(F)
[ 1471.109848] CPU: 0 Tainted: GF3.9.3 #2
[ 1471.109858] Process swapper/0 (pid: 0, task: 00a2b4d0, ksp: 0
0a17d28)
[ 1471.109868] Krnl PSW : 0404c0018000 0046c84a (debug_print_object+
0xca/0xd8)
[ 1471.114762]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:
3
Krnl GPRS:  00a2b4d0 0067 0101f708
[ 1471.114769]0046c846 84a4d448 0086936a 000
001040700
[ 1471.114773]01a0f290 0401 00874cf8 000
000a395d8
[ 1471.114777]0195f820 0001bd20 0046c846 000
1bc20
[ 1471.114792] Krnl Code: 0046c83a: e3441004lg  %r4,0(%r
4,%r1)
   0046c840: c0e500139f88   brasl   %r14,6e0750
  #0046c846: a7f40001   brc 15,46c848
  >0046c84a: a7f4ffc2   brc 15,46c7ce
   0046c84e: a729   lghi%r2,0
   0046c852: a7f4ffd7   brc 15,46c800
   0046c856: 0707   bcr 0,%r7
   0046c858: ebaff0680024   stmg%r10,%r15,104(%r15)
[ 1471.114825] Call Trace:
[ 1471.114828] ([<0046c846>] debug_print_object+0xc6/0xd8)
[ 1471.114833]  [<0046d35c>] debug_object_deactivate+0x15c/0x160
[ 1471.114838]  [<00148244>] run_timer_softirq+0x180/0x464
[ 1471.114843]  [<0013d8d6>] __do_softirq+0x112/0x42c
[ 1471.114847]  [<0013ddf8>] irq_exit+0xc8/0xe8
[ 1471.114851]  [<0010d55e>] do_extint+0x25e/0x318
[ 1471.114859]  [<006f0d90>] ext_skip+0x40/0x44
[ 1471.114866]  [<006f05d6>] vtime_st

Re: 3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

2013-05-23 Thread CAI Qian

OK, here is clearer stack output from the run.
CAI Qian

+ ./check
FSTYP -- xfs (non-debug)
PLATFORM  -- Linux/s390x ibm-z10-23 3.9.3

001  29s
002  3s
003  2s
004  [not run] this test requires a valid $SCRATCH_DEV
005  2s
006  9s
007  10s
008  7s
009  [not run] this test requires a valid $SCRATCH_DEV
010  [not run] dbtest was not built for this platform
011  9s
012  10s
013  35s
014  5s
015  [not run] this test requires a valid $SCRATCH_DEV
016  [not run] this test requires a valid $SCRATCH_DEV
017  [not run] this test requires a valid $SCRATCH_DEV
018  [not run] this test requires a valid $SCRATCH_DEV
019  [not run] this test requires a valid $SCRATCH_DEV
020 


[ 1316.571227] XFS (dm-0): Mounting Filesystem
[ 1316.697803] XFS (dm-0): Ending clean mount
[ 1318.080615] XFS (dm-0): Ending clean mount
[ 1348.791125] XFS (dm-0): Mounting Filesystem
[ 1348.989166] XFS (dm-0): Ending clean mount
[ 1353.335478] XFS (dm-0): Mounting Filesystem
[ 1353.496364] XFS (dm-0): Ending clean mount
[ 1357.495427] XFS (dm-0): Mounting Filesystem
[ 1357.676971] XFS (dm-0): Ending clean mount
[ 1361.646399] XFS (dm-0): Mounting Filesystem
[ 1361.890426] XFS (dm-0): Ending clean mount
[ 1371.798944] XFS (dm-0): Mounting Filesystem
[ 1371.976922] XFS (dm-0): Ending clean mount
[ 1384.559103] XFS (dm-0): Mounting Filesystem
[ 1384.725657] XFS (dm-0): Ending clean mount
[ 1393.131347] XFS (dm-0): Mounting Filesystem
[ 1393.357927] XFS (dm-0): Ending clean mount
[ 1407.282708] XFS (dm-0): Mounting Filesystem
[ 1407.745176] XFS (dm-0): Ending clean mount
[ 1422.927074] XFS (dm-0): Mounting Filesystem
[ 1423.136266] XFS (dm-0): Ending clean mount
[ 1425.500910] XFS (dm-0): Mounting Filesystem
[ 1425.608851] XFS (dm-0): Ending clean mount
[ 1450.978110] XFS (dm-0): Mounting Filesystem
[ 1451.255368] XFS (dm-0): Ending clean mount
[ 1453.603742] XFS (dm-0): Mounting Filesystem
[ 1453.680657] XFS (dm-0): Ending clean mount
[ 1456.262266] XFS (dm-0): Mounting Filesystem
[ 1456.330515] XFS (dm-0): Ending clean mount
[ 1457.053767] XFS (dm-0): Mounting Filesystem
[ 1457.107258] XFS (dm-0): Ending clean mount
[ 1462.049374] XFS (dm-0): Mounting Filesystem
[ 1462.111389] XFS (dm-0): Ending clean mount
[ 1471.109589] ODEBUG: deactivate not available (active state 0) object type: ti
mer_list hint: process_timeout+0x0/0x8
[ 1471.109683] ------------[ cut here ]------------
[ 1471.109688] WARNING: at lib/debugobjects.c:260
[ 1471.109692] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F)
nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6ta
ble_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(
F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F)
 nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F)
 ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F
) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F
) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_l
og(F) dm_mod(F)
[ 1471.109848] CPU: 0 Tainted: GF3.9.3 #2
[ 1471.109858] Process swapper/0 (pid: 0, task: 00a2b4d0, ksp: 0
0a17d28)
[ 1471.109868] Krnl PSW : 0404c0018000 0046c84a (debug_print_object+
0xca/0xd8)
[ 1471.114762]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:
3
Krnl GPRS:  00a2b4d0 0067 0101f708
[ 1471.114769]0046c846 84a4d448 0086936a 000
001040700
[ 1471.114773]01a0f290 0401 00874cf8 000
000a395d8
[ 1471.114777]0195f820 0001bd20 0046c846 000
1bc20
[ 1471.114792] Krnl Code: 0046c83a: e3441004lg  %r4,0(%r
4,%r1)
   0046c840: c0e500139f88   brasl   %r14,6e0750
  #0046c846: a7f40001   brc 15,46c848
  0046c84a: a7f4ffc2   brc 15,46c7ce
   0046c84e: a729   lghi%r2,0
   0046c852: a7f4ffd7   brc 15,46c800
   0046c856: 0707   bcr 0,%r7
   0046c858: ebaff0680024   stmg%r10,%r15,104(%r15)
[ 1471.114825] Call Trace:
[ 1471.114828] ([0046c846] debug_print_object+0xc6/0xd8)
[ 1471.114833]  [0046d35c] debug_object_deactivate+0x15c/0x160
[ 1471.114838]  [00148244] run_timer_softirq+0x180/0x464
[ 1471.114843]  [0013d8d6] __do_softirq+0x112/0x42c
[ 1471.114847]  [0013ddf8] irq_exit+0xc8/0xe8
[ 1471.114851]  [0010d55e] do_extint+0x25e/0x318
[ 1471.114859]  [006f0d90] ext_skip+0x40/0x44
[ 1471.114866]  [006f05d6] vtime_stop_cpu+0x52/0xbc
[ 1471.114870] ([006f05b4] vtime_stop_cpu+0x30/0xbc

3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

2013-05-22 Thread CAI Qian

Original report:
http://oss.sgi.com/archives/xfs/2013-05/msg00683.html

Also seen on Power7:
http://marc.info/?l=linux-kernel&m=136927904900692&w=2

CAI Qian

- Original Message -
> From: "Dave Chinner" 
> To: "CAI Qian" 
> Cc: "LKML" , sta...@vger.kernel.org, 
> x...@oss.sgi.com
> Sent: Thursday, May 23, 2013 11:46:11 AM
> Subject: Re: 3.9.2: xfstests triggered panic
> 
> On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
> > - Original Message -
> > > From: "Dave Chinner" 
> > > To: "CAI Qian" 
> > > Cc: "LKML" , sta...@vger.kernel.org,
> > > x...@oss.sgi.com
> > > Sent: Wednesday, May 22, 2013 5:53:00 PM
> > > Subject: Re: 3.9.2: xfstests triggered panic
> > > 
> > > On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > > > Reproduced on almost all s390x guests by running xfstests.
> > > > 
> > > > [14634.396658] XFS (dm-1): Mounting Filesystem
> > > > [14634.525522] XFS (dm-1): Ending clean mount
> > > > [14640.413007]  [<0017c6d4>] idle_balance+0x1a0/0x340
> > > > [14640.413010]  [<0063303e>] __schedule+0xa22/0xaf0
> > > > [14640.428279]  [<00630da6>] schedule_timeout+0x186/0x2c0
> > > > [14640.428289]  [<001cf864>] rcu_gp_kthread+0x1bc/0x298
> > > > [14640.428300]  [<00158c5a>] kthread+0xe6/0xec
> > > > [14640.428304]  [<00634de6>] kernel_thread_starter+0x6/0xc
> > > > [14640.428308]  [<00634de0>] kernel_thread_starter+0x0/0xc
> > > > [14640.428311] Last Breaking-Event-Address:
> > > > [14640.428314]  [<0016bd76>] walk_tg_tree_from+0x3a/0xf4
> > > > [14640.428319]  list_add corruption. next->prev should be prev
> > > > (0918), but was   (null). (next=  (null)).
> > > 
> > > Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> > > code. This kind of implies a stack corruption
> > > 
> > > > Sometimes, this pops up,
> > > > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > > > 
> > > > or this,
> > > > [15316.154171] XFS (dm-1): Mounting Filesystem
> > > > [15316.255796] XFS (dm-1): Ending clean mount
> > > > [15320.364246]    006367a2: e310b0080004   lg   %r1,8(%r11)
> > > > [15320.364249]    006367a8: 41101010   la   %r1,16(%r1)
> > > > [15320.364251]    006367ac: e3301004   lg   %r3,0(%r1)
> > > > [15320.364252] Call Trace:
> > > > [15320.364252] Last Breaking-Event-Address:
> > > > [15320.364253]  [<>] Kernel stack overflow.
> > > > [15320.364308] CPU: 0 Tainted: GF   W 3.9.2 #1
> > > > [15320.364309] Process rhts-test-runne (pid: 625, task: 3dccc890,
> > > > ksp: 0
> > > 
> > >  and there you go - a stack overflow. Your kernel stack size is
> > > too small.
> > > 
> > > I'd suggest that you need 16k stacks on s390 - IIRC every function
> > > call has 128 byte stack frame, and there are call chains 70-80
> > > functions deep in the storage stack...
> > Hmm, I am unsure how to set to 16k stack there
> 
> Are you building a 64 bit s390 kernel or a 32 bit kernel? 32 bit
> kernels only have an 8k stack size, 64 bit kernels are 16k (see
> arch/s390/Makefile).
> 
> $ git grep STACK_SIZE arch/s390 |head -2
> arch/s390/Makefile:STACK_SIZE   := 8192
> arch/s390/Makefile:STACK_SIZE   := 16384
> 
> As it is, the stack frame usage is worse than I thought:
> 
> $ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96  /*
> size of minimum stack frame */
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160  /*
> size of minimum stack frame */
> 
> Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So 16k
> stack size is going to have big troubles with a 70-80 function deep
> call chain.
> 
> As for powerpc:
> 
> arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256
> 
> Yeah, same issue.
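
As a rough check of the arithmetic in the quoted text, here is a minimal Python sketch. It uses only the stack sizes and per-frame overheads quoted above and the estimated call depth of 70-80; the function names are hypothetical, and the result is a lower bound because it ignores locals and spilled registers entirely:

```python
# Figures quoted in the thread (arch/s390); call depths are the estimate
# given there, not a measurement.
STACK_SIZE = {"s390 (32-bit)": 8192, "s390 (64-bit)": 16384}
FRAME_OVERHEAD = {"s390 (32-bit)": 96, "s390 (64-bit)": 160}


def min_stack_use(depth, frame_overhead):
    """Lower bound on stack use: every frame costs at least the overhead."""
    return depth * frame_overhead


def report(depths=(70, 80)):
    """Return (arch, depth, minimum bytes used, bytes left) for each case."""
    rows = []
    for arch, size in STACK_SIZE.items():
        for depth in depths:
            used = min_stack_use(depth, FRAME_OVERHEAD[arch])
            rows.append((arch, depth, used, size - used))
    return rows


if __name__ == "__main__":
    for arch, depth, used, left in report():
        print(f"{arch}: depth {depth}: at least {used} bytes in frames, "
              f"{left} bytes left for everything else")
```

For the 64-bit case, 80 frames at 160 bytes already take 12800 of the 16384 bytes, which leaves under 45 bytes per frame for local variables. Taking the 256-byte powerpc figure at face value, the same depth would need 20480 bytes of frame overhead alone.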
> 
> But, seriously, these stack traces are meaningless to anyone not
> familiar with s390 or power7 - they indicate a problem detected
> in the idle loop, not wherever the stack overran.
> 
> Can you please work with the s390/power7 people to obtain whatever
> stack it was that overflowed, and we can go from there.
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.9.2: xfstests triggered panic

2013-05-22 Thread CAI Qian



- Original Message -
> From: "Dave Chinner" 
> To: "CAI Qian" 
> Cc: "LKML" , sta...@vger.kernel.org, 
> x...@oss.sgi.com
> Sent: Thursday, May 23, 2013 11:46:11 AM
> Subject: Re: 3.9.2: xfstests triggered panic
> 
> On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
> > - Original Message -
> > > From: "Dave Chinner" 
> > > To: "CAI Qian" 
> > > Cc: "LKML" , sta...@vger.kernel.org,
> > > x...@oss.sgi.com
> > > Sent: Wednesday, May 22, 2013 5:53:00 PM
> > > Subject: Re: 3.9.2: xfstests triggered panic
> > > 
> > > On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > > > Reproduced on almost all s390x guests by running xfstests.
> > > > 
> > > > [14634.396658] XFS (dm-1): Mounting Filesystem
> > > > [14634.525522] XFS (dm-1): Ending clean mount
> > > > [14640.413007]  [<0017c6d4>] idle_balance+0x1a0/0x340
> > > > [14640.413010]  [<0063303e>] __schedule+0xa22/0xaf0
> > > > [14640.428279]  [<00630da6>] schedule_timeout+0x186/0x2c0
> > > > [14640.428289]  [<001cf864>] rcu_gp_kthread+0x1bc/0x298
> > > > [14640.428300]  [<00158c5a>] kthread+0xe6/0xec
> > > > [14640.428304]  [<00634de6>] kernel_thread_starter+0x6/0xc
> > > > [14640.428308]  [<00634de0>] kernel_thread_starter+0x0/0xc
> > > > [14640.428311] Last Breaking-Event-Address:
> > > > [14640.428314]  [<0016bd76>] walk_tg_tree_from+0x3a/0xf4
> > > > [14640.428319]  list_add corruption. next->prev should be prev
> > > > (0918), but was   (null). (next=  (null)).
> > > 
> > > Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> > > code. This kind of implies a stack corruption
> > > 
> > > > Sometimes, this pops up,
> > > > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > > > 
> > > > or this,
> > > > [15316.154171] XFS (dm-1): Mounting Filesystem
> > > > [15316.255796] XFS (dm-1): Ending clean mount
> > > > [15320.364246]    006367a2: e310b0080004   lg   %r1,8(%r11)
> > > > [15320.364249]    006367a8: 41101010   la   %r1,16(%r1)
> > > > [15320.364251]    006367ac: e3301004   lg   %r3,0(%r1)
> > > > [15320.364252] Call Trace:
> > > > [15320.364252] Last Breaking-Event-Address:
> > > > [15320.364253]  [<>] Kernel stack overflow.
> > > > [15320.364308] CPU: 0 Tainted: GF   W 3.9.2 #1
> > > > [15320.364309] Process rhts-test-runne (pid: 625, task: 3dccc890,
> > > > ksp: 0
> > > 
> > >  and there you go - a stack overflow. Your kernel stack size is
> > > too small.
> > > 
> > > I'd suggest that you need 16k stacks on s390 - IIRC every function
> > > call has 128 byte stack frame, and there are call chains 70-80
> > > functions deep in the storage stack...
> > Hmm, I am unsure how to set to 16k stack there
> 
> Are you building a 64 bit s390 kernel or a 32 bit kernel? 32 bit
> kernels only have an 8k stack size, 64 bit kernels are 16k (see
> arch/s390/Makefile).
It is 64-bit.
> 
> $ git grep STACK_SIZE arch/s390 |head -2
> arch/s390/Makefile:STACK_SIZE   := 8192
> arch/s390/Makefile:STACK_SIZE   := 16384
> 
> As it is, the stack frame usage is worse than I thought:
> 
> $ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96  /*
> size of minimum stack frame */
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160  /*
> size of minimum stack frame */
> 
> Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So 16k
> stack size is going to have big troubles with a 70-80 function deep
> call chain.
> 
> As for powerpc:
> 
> arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256
> 
> Yeah, same issue.
> 
> But, seriously, these stack traces are meaningless to anyone not
> familiar with s390 or power7 - they indicate a problem detected
> in the idle loop, not wherever the stack overran.
> 
> Can you please work with the s390/power7 people to obtain whatever
> stack it was that overflowed, and we can go from there.
OK, will do.
CAI Qian
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
> 

Re: 3.9.2: trinity triggered oops

2013-05-22 Thread CAI Qian



- Original Message -
> From: "Greg KH" 
> To: "CAI Qian" 
> Cc: "Li Zefan" , "LKML" , 
> "Dave Jones" ,
> sta...@vger.kernel.org
> Sent: Wednesday, May 22, 2013 11:30:24 PM
> Subject: Re: 3.9.2: trinity triggered oops
> 
> On Wed, May 22, 2013 at 04:40:45PM +0800, Li Zefan wrote:
> > On 2013/5/22 16:31, CAI Qian wrote:
> > > Reproduced on a few systems.
> > > CAI Qian
> > > 
> > > created 375 sockets
> > > Generating file descriptors
> > > Added 45 filenames from /dev
> > > Added 19858 filenames from /proc
> > > Added 11816 filenames from /sys
> > > [1143] Random reseed: 1433907474
> > > trinity(1143): Randomness reseeded to 0x5577b112
> > > trinity: trinity(1143) Randomness reseeded to 0x5577b112
> > > msgrcv (70) returned ENOSYS, marking as inactive.
> > > uselib (134) returned ENOSYS, marking as inactive.
> > > [1143] Random reseed: 801659033
> > > trinity(1143): Randomness reseeded to 0x2fc85899
> > > trinity: trinity(1143) Randomness reseeded to 0x2fc85899
> > > nfsservctl (180) returned ENOSYS, marking as inactive.
> > > kcmp (312) returned ENOSYS, marking as inactive.
> > > [watchdog] 1329 iterations. [F:1158 S:168]
> > > [1143] Random reseed: 715320073
> > > trinity(1143): Randomness reseeded to 0x2aa2eb09
> > > trinity: trinity(1143) Randomness reseeded to 0x2aa2eb09
> > > [watchdog] 3567 iterations. [F:3060 S:506]
> > > [watchdog] 4953 iterations. [F:4255 S:697]
> > > [ 4508.627400] BUG: unable to handle kernel NULL pointer dereference at
> > > 0008
> > > [ 4508.670547] IP: [] newseg+0x102/0x310
> > > [ 4508.698846] PGD 18d827067 PUD 19a85f067 PMD 0
> > > [ 4508.723288] Oops:  [#1] SMP
> > > [ 4508.741135] Modules linked in: ipt_ULOG(F) scsi_transport_iscsi(F)
> > > pppoe(F) pppox(F) ppp_generic(F) slhc(F) af_key(F) nfc(F) af_802154(F)
> > > atm(F) rds(F) btrfs(F) zlib_deflate(F) raid6_pq(F) xor(F) vfat(F) fat(F)
> > > nfsv3(F) nfs_acl(F) nfsv2(F) nfs(F) lockd(F) sunrpc(F) fscache(F)
> > > nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) arc4(F) md4(F)
> > > nls_utf8(F) cifs(F) dns_resolver(F) nf_tproxy_core(F) nls_koi8_u(F)
> > > nls_cp932(F) ts_kmp(F) sctp(F) nf_conntrack_netbios_ns(F)
> > > nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F)
> > > nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F)
> > > nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F)
> > > iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F)
> > > xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F)
> > > ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F)
> > > iTCO_wdt(F) iTCO_vendor_support(F) e1000e(F) bnx2x(F) hpwdt(F) ptp(F)
> > > mdio(F) hpilo(F) serio_raw(F) lpc_ich(F) pps_core(F)!
> >  p!
> > > cspkr(F) mfd_core(F) microcode(F) xfs(F) libcrc32c(F) ata_generic(F)
> > > mgag200(F) pata_acpi(F) i2c_algo_bit(F) sd_mod(F) ata_piix(F)
> > > drm_kms_helper(F) ttm(F) crc_t10dif(F) drm(F) hpsa(F) libata(F)
> > > i2c_core(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last
> > > unloaded: brd]
> > > [ 4509.308340] CPU 3
> > > [ 4509.318654] Pid: 4068, comm: trinity-child2 Tainted: GF
> > > 3.9.2 #1 HP ProLiant DL120 G7
> > > [ 4509.363440] RIP: 0010:[]  []
> > > newseg+0x102/0x310
> > 
> > The fix has already been queued for 3.9.3.
> > 
> > commit 091d0d55b286c9340201b4ed4470be87fc568228
> > ("shm: fix null pointer deref when userspace specifies invalid hugepage
> > size")
> 
> Yes, can you please test 3.9.3 to verify that this is fixed?
Yes, I have not run into this again on 3.9.3 so far. I'll keep an eye
on it though.
CAI Qian
> 
> thanks,
> 
> greg k-h
> 

Re: 3.9.2: xfstests triggered panic

2013-05-22 Thread CAI Qian



- Original Message -
> From: "Dave Chinner" 
> To: "CAI Qian" 
> Cc: "LKML" , sta...@vger.kernel.org, 
> x...@oss.sgi.com
> Sent: Wednesday, May 22, 2013 5:53:00 PM
> Subject: Re: 3.9.2: xfstests triggered panic
> 
> On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > Reproduced on almost all s390x guests by running xfstests.
> > 
> > [14634.396658] XFS (dm-1): Mounting Filesystem
> > [14634.525522] XFS (dm-1): Ending clean mount
> > [14640.413007]  [<0017c6d4>] idle_balance+0x1a0/0x340
> > [14640.413010]  [<0063303e>] __schedule+0xa22/0xaf0
> > [14640.428279]  [<00630da6>] schedule_timeout+0x186/0x2c0
> > [14640.428289]  [<001cf864>] rcu_gp_kthread+0x1bc/0x298
> > [14640.428300]  [<00158c5a>] kthread+0xe6/0xec
> > [14640.428304]  [<00634de6>] kernel_thread_starter+0x6/0xc
> > [14640.428308]  [<00634de0>] kernel_thread_starter+0x0/0xc
> > [14640.428311] Last Breaking-Event-Address:
> > [14640.428314]  [<0016bd76>] walk_tg_tree_from+0x3a/0xf4
> > [14640.428319]  list_add corruption. next->prev should be prev
> > (0918), but was   (null). (next=  (null)).
> 
> Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> code. This kind of implies a stack corruption
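
The "list_add corruption" line in the quoted log is the kind of message the kernel's list debugging check (CONFIG_DEBUG_LIST) prints when a neighbour's pointers do not match before a new entry is linked in. A minimal Python model of that check, not the kernel code itself; Node and checked_list_add are hypothetical names:

```python
class Node:
    """One entry of a circular doubly linked list."""

    def __init__(self, name):
        self.name = name
        # A fresh node points at itself, like an empty list head.
        self.prev = self
        self.next = self


def checked_list_add(new, prev, next_):
    """Link `new` between `prev` and `next_`, verifying both neighbours."""
    if next_.prev is not prev:
        found = getattr(next_.prev, "name", None)
        raise RuntimeError(
            f"list_add corruption. next->prev should be prev ({prev.name}), "
            f"but was {found}. (next={next_.name})."
        )
    if prev.next is not next_:
        found = getattr(prev.next, "name", None)
        raise RuntimeError(
            f"list_add corruption. prev->next should be next ({next_.name}), "
            f"but was {found}. (prev={prev.name})."
        )
    next_.prev = new
    new.next = next_
    new.prev = prev
    prev.next = new


if __name__ == "__main__":
    head = Node("head")
    a = Node("a")
    checked_list_add(a, head, head.next)   # healthy list: succeeds
    head.prev = None                       # simulate a stray overwrite
    try:
        checked_list_add(Node("b"), a, a.next)
    except RuntimeError as err:
        print(err)
```

The check only reports that a pointer was already wrong when the list was next touched. It says nothing about what overwrote it, which is why the scheduler function in the trace is not necessarily the culprit.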
> 
> > Sometimes, this pops up,
> > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > 
> > or this,
> > [15316.154171] XFS (dm-1): Mounting Filesystem
> > [15316.255796] XFS (dm-1): Ending clean mount
> > [15320.364246]    006367a2: e310b0080004   lg   %r1,8(%r11)
> > [15320.364249]    006367a8: 41101010   la   %r1,16(%r1)
> > [15320.364251]    006367ac: e3301004   lg   %r3,0(%r1)
> > [15320.364252] Call Trace:
> > [15320.364252] Last Breaking-Event-Address:
> > [15320.364253]  [<>] Kernel stack overflow.
> > [15320.364308] CPU: 0 Tainted: GF   W 3.9.2 #1
> > [15320.364309] Process rhts-test-runne (pid: 625, task: 3dccc890,
> > ksp: 0
> 
>  and there you go - a stack overflow. Your kernel stack size is
> too small.
> 
> I'd suggest that you need 16k stacks on s390 - IIRC every function
> call has 128 byte stack frame, and there are call chains 70-80
> functions deep in the storage stack...
Hmm, I am unsure how to set a 16k stack there, and Power7 looks like it
has the same problem.

[14927.117017] XFS (dm-0): Mounting Filesystem 
[14927.299854] XFS (dm-0): Ending clean mount 
[14927.668909] Unable to handle kernel paging request for data at address 
0x0040 
[14927.668913] Unable to handle kernel paging request for data at address 
0x00f8 
[14927.668914] Unable to handle kernel paging request for data at address 
0x00bb 
[14927.668915] Faulting instruction address: 0xc00d1bd8 
[14927.668916] Faulting instruction address: 0xc00d1bd8 
[14927.668919] Unable to handle kernel paging request for data at address 
0x0018 
[14927.668920] Faulting instruction address: 0xc03d34b8 
[14927.668922] Oops: Kernel access of bad area, sig: 11 [#1] 
[14927.668924] SMP NR_CPUS=1024 NUMA pSeries 
[14927.668927] Modules linked in: binfmt_misc(F) tun(F) ipt_ULOG(F) rds(F) 
scsi_transport_iscsi(F) atm(F) nfc(F) pppoe(F) pppox(F) ppp_generic(F) slhc(F) 
af_802154(F) af_key(F) sctp(F) btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) 
nfsv3(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) nfnetlink_log(F) 
nfnetlink(F) bluetooth(F) rfkill(F) arc4(F) md4(F) nls_utf8(F) cifs(F) 
dns_resolver(F) nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) 
ts_kmp(F)[14927.668955] Faulting instruction address: 0xc00d1bd8 
 fuse(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) 
ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) 
nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) 
iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) 
xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) 
ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) ehea(F) 
xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F) ibmvscsi(F) scsi_transport_srp(F) 
scsi_tgt(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: 
brd] 
[14927.669041] NIP: c00d1bd8 LR: c00d1b94 CTR: c00d7e30 
[14927.669048] REGS: c001fbfb3120 TRAP: 0300   Tainted: GF 
(3.9.3) 
[14927.669053] MSR: 80009032   CR: 2828  XER: 
 
[14927.669069] SOFTE: 0 
[14927.669072] CFAR: c000908c 
[14927.669076] DAR: 00f8, DSISR: 400

3.9.2: xfstests triggered panic

2013-05-22 Thread CAI Qian

%r3,0(%r 
1) 
15320.366518¨ Call Trace: 
15320.366518¨ Last Breaking-Event-Address: 
15320.366519¨  � <>¨ Kernel stack overflow. 
15320.366541¨ CPU: 0 Tainted: GF   W3.9.2 #1 
15320.366542¨ Process rhts-test-runne (pid: 625, task: 3dccc890, ksp: 0 
00037433c78) 
15320.366543¨ Krnl PSW : 0404c0018000 00636796 (do_dat_exception+0x 
26/0x36c) 
15320.366546¨R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA: 
3 
15320.366547¨ Krnl GPRS: e500 00014138 ffbf 000 
000901110 
15320.366548¨001a9fe6 00124438 00020001 000 
1 
15320.366549¨000141d8 001a9fe6 0044 000 
141d8 
15320.366550¨3743 0063df78 00634f24 000 
14028 
15320.366557¨ Krnl Code: 0063678a: b9040082    lgr %r8,%r2 
15320.366560¨0063678e: a729ffbf 

CAI Qian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

3.9.2: trinity triggered oops

2013-05-22 Thread CAI Qian

Reproduced on a few systems.
CAI Qian

created 375 sockets 
Generating file descriptors 
Added 45 filenames from /dev 
Added 19858 filenames from /proc 
Added 11816 filenames from /sys 
[1143] Random reseed: 1433907474 
trinity(1143): Randomness reseeded to 0x5577b112 
trinity: trinity(1143) Randomness reseeded to 0x5577b112 
msgrcv (70) returned ENOSYS, marking as inactive. 
uselib (134) returned ENOSYS, marking as inactive. 
[1143] Random reseed: 801659033 
trinity(1143): Randomness reseeded to 0x2fc85899 
trinity: trinity(1143) Randomness reseeded to 0x2fc85899 
nfsservctl (180) returned ENOSYS, marking as inactive. 
kcmp (312) returned ENOSYS, marking as inactive. 
[watchdog] 1329 iterations. [F:1158 S:168] 
[1143] Random reseed: 715320073 
trinity(1143): Randomness reseeded to 0x2aa2eb09 
trinity: trinity(1143) Randomness reseeded to 0x2aa2eb09 
[watchdog] 3567 iterations. [F:3060 S:506] 
[watchdog] 4953 iterations. [F:4255 S:697] 
[ 4508.627400] BUG: unable to handle kernel NULL pointer dereference at 
0008 
[ 4508.670547] IP: [81286682] newseg+0x102/0x310 
[ 4508.698846] PGD 18d827067 PUD 19a85f067 PMD 0  
[ 4508.723288] Oops:  [#1] SMP  
[ 4508.741135] Modules linked in: ipt_ULOG(F) scsi_transport_iscsi(F) pppoe(F) 
pppox(F) ppp_generic(F) slhc(F) af_key(F) nfc(F) af_802154(F) atm(F) rds(F) 
btrfs(F) zlib_deflate(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F) 
nfsv2(F) nfs(F) lockd(F) sunrpc(F) fscache(F) nfnetlink_log(F) nfnetlink(F) 
bluetooth(F) rfkill(F) arc4(F) md4(F) nls_utf8(F) cifs(F) dns_resolver(F) 
nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) sctp(F) 
nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) 
ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) 
nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) 
iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) 
xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) 
ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) 
iTCO_wdt(F) iTCO_vendor_support(F) e1000e(F) bnx2x(F) hpwdt(F) ptp(F) mdio(F) 
hpilo(F) serio_raw(F) lpc_ich(F) pps_core(F) pcspkr(F) mfd_core(F) microcode(F) 
xfs(F) libcrc32c(F) ata_generic(F) mgag200(F) pata_acpi(F) i2c_algo_bit(F) 
sd_mod(F) ata_piix(F) drm_kms_helper(F) ttm(F) crc_t10dif(F) drm(F) hpsa(F) 
libata(F) i2c_core(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last 
unloaded: brd] 
[ 4509.308340] CPU 3  
[ 4509.318654] Pid: 4068, comm: trinity-child2 Tainted: GF3.9.2 #1 
HP ProLiant DL120 G7 
[ 4509.363440] RIP: 0010:[81286682]  [81286682] 
newseg+0x102/0x310 
[ 4509.401795] RSP: 0018:8801ab009e88  EFLAGS: 00010246 
[ 4509.427958] RAX:  RBX: 8197b240 RCX: 
0009 
[ 4509.463783] RDX: 81d63338 RSI: 1000 RDI: 
1000 
[ 4509.499290] RBP: 8801ab009ed8 R08: 0010 R09: 
001c 
[ 4509.535044] R10:  R11: 000f R12: 
1001 
[ 4509.571219] R13: 8801b8181460 R14: 722cae77 R15: 
7df2570b 
[ 4509.607233] FS:  7fdbc6bf7740() GS:88020f46() 
knlGS: 
[ 4509.648291] CS:  0010 DS:  ES:  CR0: 80050033 
[ 4509.677448] CR2: 0008 CR3: 00019a856000 CR4: 
000407e0 
[ 4509.712725] DR0:  DR1:  DR2: 
 
[ 4509.751540] DR3:  DR6: 0ff0 DR7: 
0400 
[ 4509.789290] Process trinity-child2 (pid: 4068, threadinfo 8801ab008000, 
task 8802007b) 
[ 4509.838164] Stack: 
[ 4509.849066]  001c 0002 375653595300 
0062303735326664 
[ 4509.886160]  913099dc fffe 0001 
8197b2f8 
[ 4509.923131]  0001 8801b23ba6a8 8801ab009f40 
81282adc 
[ 4509.959682] Call Trace: 
[ 4509.971710]  [81282adc] ipcget+0x17c/0x1c0 
[ 4509.996384]  [81286f0a] sys_shmget+0x5a/0x60 
[ 4510.021725]  [81286580] ? shm_security+0x10/0x10 
[ 4510.049611]  [81286570] ? shm_close+0xd0/0xd0 
[ 4510.075500]  [812863a0] ? shm_get_unmapped_area+0x20/0x20 
[ 4510.107046]  [816189d9] system_call_fastpath+0x16/0x1b 
[ 4510.136619] Code: 00 00 0f 84 e9 00 00 00 45 89 f1 41 c1 e9 1a 45 85 c9 0f 
85 31 01 00 00 8b 05 3b 3d ae 00 48 69 c0 78 70 00 00 48 05 c0 c2 d5 81 8b 48 
08 b8 00 10 00 00 4c 89 f2 48 c1 e2 09 48 8d 7d c3 41 b8  
[ 4510.231305] RIP  [81286682] newseg+0x102/0x310 
[ 4510.258711]  RSP 8801ab009e88 
[ 4510.277488] CR2: 0008 
[watchdog] 7096 iterations. [F:6109 S:986] 
[ 4510.351897] ---[ end trace 4eaee96d0aeec2cb ]--- 
[watchdog] 7117 iterations. [F:6126 S:989] 
[watchdog] pid 4068 hasn't made progress in 30 seconds! (last:1368510503 
now:1368510533 diff:30). Stuck in syscall 29:shmget. Sending SIGKILL. 
[watchdog

Re: 3.9.2: xfstests triggered panic

2013-05-22 Thread CAI Qian



- Original Message -
 From: Dave Chinner da...@fromorbit.com
 To: CAI Qian caiq...@redhat.com
 Cc: LKML linux-kernel@vger.kernel.org, sta...@vger.kernel.org, 
 x...@oss.sgi.com
 Sent: Wednesday, May 22, 2013 5:53:00 PM
 Subject: Re: 3.9.2: xfstests triggered panic
 
 On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
  Reproduced on almost all s390x guests by running xfstests.
  
  14634.396658¨ XFS (dm-1): Mounting Filesystem
  14634.525522¨ XFS (dm-1): Ending clean mount
  14640.413007¨  0017c6d4¨ idle_balance+0x1a0/0x340
  14640.413010¨  0063303e¨ __schedule+0xa22/0xaf0
  14640.428279¨  00630da6¨ schedule_timeout+0x186/0x2c0
  14640.428289¨  001cf864¨ rcu_gp_kthread+0x1bc/0x298
  14640.428300¨  00158c5a¨ kthread+0xe6/0xec
  14640.428304¨  00634de6¨ kernel_thread_starter+0x6/0xc
  14640.428308¨  00634de0¨ kernel_thread_starter+0x0/0xc
  14640.428311¨ Last Breaking-Event-Address:
  14640.428314¨  0016bd76¨ walk_tg_tree_from+0x3a/0xf4
  14640.428319¨  list_add corruption. next-prev should be prev
  (0918
  ), but was   (null). (next=  (null)).
 
 Where's XFS in this? walk_tg_tree_from() is part of the scheduler
 code. This kind of implies a stack corruption
 
  Sometimes, this pops up,
  [16907.275002] WARNING: at kernel/rcutree.c:1960
  
  or this,
  15316.154171¨ XFS (dm-1): Mounting Filesystem
  15316.255796¨ XFS (dm-1): Ending clean mount
  15320.364246¨006367a2: e310b0080004lg
  %r1,8(%r
  11)
  15320.364249¨006367a8: 41101010la
  %r1,16(%
  r1)
  15320.364251¨006367ac: e3301004lg
  %r3,0(%r
  1)
  15320.364252¨ Call Trace:
  15320.364252¨ Last Breaking-Event-Address:
  15320.364253¨  � ¨ Kernel stack overflow.
  15320.364308¨ CPU: 0 Tainted: GF   W3.9.2 #1
  15320.364309¨ Process rhts-test-runne (pid: 625, task: 3dccc890,
  ksp: 0
 
  and there you go - a stack overflow. Your kernel stack size is
 too small.
 
 I'd suggest that you need 16k stacks on s390 - IIRC every function
 call has 128 byte stack frame, and there are call chains 70-80
 functions deep in the storage stack...
Hmm, I am unsure how to set a 16k stack there, and Power 7 looks like
it has the same problem.

[14927.117017] XFS (dm-0): Mounting Filesystem 
[14927.299854] XFS (dm-0): Ending clean mount 
[14927.668909] Unable to handle kernel paging request for data at address 
0x0040 
[14927.668913] Unable to handle kernel paging request for data at address 
0x00f8 
[14927.668914] Unable to handle kernel paging request for data at address 
0x00bb 
[14927.668915] Faulting instruction address: 0xc00d1bd8 
[14927.668916] Faulting instruction address: 0xc00d1bd8 
[14927.668919] Unable to handle kernel paging request for data at address 
0x0018 
[14927.668920] Faulting instruction address: 0xc03d34b8 
[14927.668922] Oops: Kernel access of bad area, sig: 11 [#1] 
[14927.668924] SMP NR_CPUS=1024 NUMA pSeries 
[14927.668927] Modules linked in: binfmt_misc(F) tun(F) ipt_ULOG(F) rds(F) 
scsi_transport_iscsi(F) atm(F) nfc(F) pppoe(F) pppox(F) ppp_generic(F) slhc(F) 
af_802154(F) af_key(F) sctp(F) btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) 
nfsv3(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) nfnetlink_log(F) 
nfnetlink(F) bluetooth(F) rfkill(F) arc4(F) md4(F) nls_utf8(F) cifs(F) 
dns_resolver(F) nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) 
ts_kmp(F)[14927.668955] Faulting instruction address: 0xc00d1bd8 
 fuse(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) 
ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) 
nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) 
iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) 
xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) 
ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) ehea(F) 
xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F) ibmvscsi(F) scsi_transport_srp(F) 
scsi_tgt(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: 
brd] 
[14927.669041] NIP: c00d1bd8 LR: c00d1b94 CTR: c00d7e30 
[14927.669048] REGS: c001fbfb3120 TRAP: 0300   Tainted: GF 
(3.9.3) 
[14927.669053] MSR: 80009032 SF,EE,ME,IR,DR,RI  CR: 2828  XER: 
 
[14927.669069] SOFTE: 0 
[14927.669072] CFAR: c000908c 
[14927.669076] DAR: 00f8, DSISR: 4000 
[14927.669080] TASK = c001fbf14880[0] 'swapper/2' THREAD: c001fbfb 
CPU: 2 
GPR00: c00d1b94 c001fbfb33a0 c10f3038 0d939e66add6  
GPR04:  0001001651f2 0099 c0af3038  
GPR08: c1163038 0002 00b8 000c3420953d115d  
GPR12: 4822 ced90800 c001fbfb3f90 0eee7bc0  
GPR16

Re: 3.9.2: trinity triggered oops

2013-05-22 Thread CAI Qian

- Original Message -
 From: Greg KH gre...@linuxfoundation.org
 To: CAI Qian caiq...@redhat.com
 Cc: Li Zefan lize...@huawei.com, LKML linux-kernel@vger.kernel.org, 
 Dave Jones da...@redhat.com,
 sta...@vger.kernel.org
 Sent: Wednesday, May 22, 2013 11:30:24 PM
 Subject: Re: 3.9.2: trinity triggered oops

 On Wed, May 22, 2013 at 04:40:45PM +0800, Li Zefan wrote:
  On 2013/5/22 16:31, CAI Qian wrote:
   Reproduced on a few systems.
   CAI Qian

   created 375 sockets
   Generating file descriptors
   Added 45 filenames from /dev
   Added 19858 filenames from /proc
   Added 11816 filenames from /sys
   [1143] Random reseed: 1433907474
   trinity(1143): Randomness reseeded to 0x5577b112
   trinity: trinity(1143) Randomness reseeded to 0x5577b112
   msgrcv (70) returned ENOSYS, marking as inactive.
   uselib (134) returned ENOSYS, marking as inactive.
   [1143] Random reseed: 801659033
   trinity(1143): Randomness reseeded to 0x2fc85899
   trinity: trinity(1143) Randomness reseeded to 0x2fc85899
   nfsservctl (180) returned ENOSYS, marking as inactive.
   kcmp (312) returned ENOSYS, marking as inactive.
   [watchdog] 1329 iterations. [F:1158 S:168]
   [1143] Random reseed: 715320073
   trinity(1143): Randomness reseeded to 0x2aa2eb09
   trinity: trinity(1143) Randomness reseeded to 0x2aa2eb09
   [watchdog] 3567 iterations. [F:3060 S:506]
   [watchdog] 4953 iterations. [F:4255 S:697]
   [ 4508.627400] BUG: unable to handle kernel NULL pointer dereference at
   0008
   [ 4508.670547] IP: [81286682] newseg+0x102/0x310
   [ 4508.698846] PGD 18d827067 PUD 19a85f067 PMD 0
   [ 4508.723288] Oops:  [#1] SMP
   [ 4508.741135] Modules linked in: ipt_ULOG(F) scsi_transport_iscsi(F)
   pppoe(F) pppox(F) ppp_generic(F) slhc(F) af_key(F) nfc(F) af_802154(F)
   atm(F) rds(F) btrfs(F) zlib_deflate(F) raid6_pq(F) xor(F) vfat(F) fat(F)
   nfsv3(F) nfs_acl(F) nfsv2(F) nfs(F) lockd(F) sunrpc(F) fscache(F)
   nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) arc4(F) md4(F)
   nls_utf8(F) cifs(F) dns_resolver(F) nf_tproxy_core(F) nls_koi8_u(F)
   nls_cp932(F) ts_kmp(F) sctp(F) nf_conntrack_netbios_ns(F)
   nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F)
   nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F)
   nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F)
   iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F)
   xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F)
   ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F)
   iTCO_wdt(F) iTCO_vendor_support(F) e1000e(F) bnx2x(F) hpwdt(F) ptp(F)
   mdio(F) hpilo(F) serio_raw(F) lpc_ich(F) pps_core(F)
   pcspkr(F) mfd_core(F) microcode(F) xfs(F) libcrc32c(F) ata_generic(F)
   mgag200(F) pata_acpi(F) i2c_algo_bit(F) sd_mod(F) ata_piix(F)
   drm_kms_helper(F) ttm(F) crc_t10dif(F) drm(F) hpsa(F) libata(F)
   i2c_core(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last
   unloaded: brd]
   [ 4509.308340] CPU 3
   [ 4509.318654] Pid: 4068, comm: trinity-child2 Tainted: GF
   3.9.2 #1 HP ProLiant DL120 G7
   [ 4509.363440] RIP: 0010:[81286682]  [81286682]
   newseg+0x102/0x310

  The fix has already been queued for 3.9.3.

  commit 091d0d55b286c9340201b4ed4470be87fc568228
  (shm: fix null pointer deref when userspace specifies invalid hugepage
  size)
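
For readers unfamiliar with the fix, here is a minimal Python sketch of how a hugepage size is encoded in the shmget() flags. SHM_HUGETLB, SHM_HUGE_SHIFT and SHM_HUGE_MASK are the values from the Linux shm.h header. The set of supported sizes (2 MiB and 1 GiB, typical for x86-64) and the helper names are assumptions for illustration, and treating R14 from the oops above as the shmflg argument is an inference, not something stated in this thread.

```python
SHM_HUGETLB = 0o4000     # segment should use huge pages
SHM_HUGE_SHIFT = 26      # log2(page size) is stored in the bits above this shift
SHM_HUGE_MASK = 0x3F     # six bits wide

# Assumption: log2 of the hugepage sizes available on the machine (2 MiB, 1 GiB).
SUPPORTED_LOG2_SIZES = {21, 30}

# Assumption: R14 in the register dump holds the shmflg argument.
FLAGS_FROM_OOPS = 0x722CAE77


def requested_log2_size(shmflg):
    """Return the log2 hugepage size encoded in shmflg (0 means 'default')."""
    return (shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK


def is_valid_request(shmflg, supported=SUPPORTED_LOG2_SIZES):
    """Hypothetical validity check: reject hugepage sizes the machine lacks."""
    if not shmflg & SHM_HUGETLB:
        return True  # not a hugepage request, nothing to validate
    log2_size = requested_log2_size(shmflg)
    return log2_size == 0 or log2_size in supported


print(requested_log2_size(FLAGS_FROM_OOPS), is_valid_request(FLAGS_FROM_OOPS))
```

Under those assumptions the flags ask for a 2^28 byte (256 MiB) page, which matches no supported size. That is the kind of invalid request the commit title describes.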

 Yes, can you please test 3.9.3 to verify that this is fixed?
Yes, I have not run into this again on 3.9.3 so far. I'll keep an eye
on it though.
CAI Qian

 thanks,

 greg k-h


Re: 3.9.2: xfstests triggered panic

2013-05-22 Thread CAI Qian



- Original Message -
 From: Dave Chinner da...@fromorbit.com
 To: CAI Qian caiq...@redhat.com
 Cc: LKML linux-kernel@vger.kernel.org, sta...@vger.kernel.org, 
 x...@oss.sgi.com
 Sent: Thursday, May 23, 2013 11:46:11 AM
 Subject: Re: 3.9.2: xfstests triggered panic
 
 On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
  - Original Message -
   From: Dave Chinner da...@fromorbit.com
   To: CAI Qian caiq...@redhat.com
   Cc: LKML linux-kernel@vger.kernel.org, sta...@vger.kernel.org,
   x...@oss.sgi.com
   Sent: Wednesday, May 22, 2013 5:53:00 PM
   Subject: Re: 3.9.2: xfstests triggered panic
   
   On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
Reproduced on almost all s390x guests by running xfstests.

14634.396658¨ XFS (dm-1): Mounting Filesystem
14634.525522¨ XFS (dm-1): Ending clean mount
14640.413007¨  0017c6d4¨ idle_balance+0x1a0/0x340
14640.413010¨  0063303e¨ __schedule+0xa22/0xaf0
14640.428279¨  00630da6¨ schedule_timeout+0x186/0x2c0
14640.428289¨  001cf864¨ rcu_gp_kthread+0x1bc/0x298
14640.428300¨  00158c5a¨ kthread+0xe6/0xec
14640.428304¨  00634de6¨ kernel_thread_starter+0x6/0xc
14640.428308¨  00634de0¨ kernel_thread_starter+0x0/0xc
14640.428311¨ Last Breaking-Event-Address:
14640.428314¨  0016bd76¨ walk_tg_tree_from+0x3a/0xf4
14640.428319¨  list_add corruption. next-prev should be prev
(0918
), but was   (null). (next=  (null)).
   
   Where's XFS in this? walk_tg_tree_from() is part of the scheduler
   code. This kind of implies a stack corruption
   
Sometimes, this pops up,
[16907.275002] WARNING: at kernel/rcutree.c:1960

or this,
15316.154171¨ XFS (dm-1): Mounting Filesystem
15316.255796¨ XFS (dm-1): Ending clean mount
15320.364246¨006367a2: e310b0080004lg
%r1,8(%r
11)
15320.364249¨006367a8: 41101010la
%r1,16(%
r1)
15320.364251¨006367ac: e3301004lg
%r3,0(%r
1)
15320.364252¨ Call Trace:
15320.364252¨ Last Breaking-Event-Address:
15320.364253¨  � ¨ Kernel stack overflow.
15320.364308¨ CPU: 0 Tainted: GF   W3.9.2 #1
15320.364309¨ Process rhts-test-runne (pid: 625, task:
3dccc890,
ksp: 0
   
    and there you go - a stack overflow. Your kernel stack size is
   too small.
   
   I'd suggest that you need 16k stacks on s390 - IIRC every function
   call has 128 byte stack frame, and there are call chains 70-80
   functions deep in the storage stack...
  Hmm, I am unsure how to set a 16k stack there
 
 Are you building a 64 bit s390 kernel or a 32 bit kernel? 32 bit
 kernels only have an 8k stack size, 64 bit kernels are 16k (see
 arch/s390/Makefile).
It is 64-bit.
 
 $ git grep STACK_SIZE arch/s390 |head -2
 arch/s390/Makefile:STACK_SIZE   := 8192
 arch/s390/Makefile:STACK_SIZE   := 16384
 
 As it is, the stack frame usage is worse than I thought:
 
 $ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
 arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96  /*
 size of minimum stack frame */
 arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160  /*
 size of minimum stack frame */
 
 Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So 16k
 stack size is going to have big troubles with a 70-80 function deep
 call chain.
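
The arithmetic behind that estimate can be sketched in a few lines of Python. The stack sizes and frame overheads are the ones grepped above; the function names are illustrative only, and only the minimum per-frame overhead is counted, so real usage would be higher.

```python
# Figures quoted in this thread (arch/s390/Makefile and asm/ptrace.h).
STACK_SIZE = {"s390 (32 bit)": 8192, "s390x (64 bit)": 16384}
STACK_FRAME_OVERHEAD = {"s390 (32 bit)": 96, "s390x (64 bit)": 160}


def min_frame_bytes(depth, overhead):
    """Bytes used by `depth` nested calls if every frame is the minimum size."""
    return depth * overhead


def report(arch, depth):
    """Return (bytes used by minimal frames, bytes left for everything else)."""
    used = min_frame_bytes(depth, STACK_FRAME_OVERHEAD[arch])
    return used, STACK_SIZE[arch] - used


for arch in STACK_SIZE:
    for depth in (70, 80):
        used, left = report(arch, depth)
        print(f"{arch}: depth {depth}: {used} bytes of frames, {left} bytes left")
```

On a 64 bit kernel, 80 minimal frames already take 12800 of the 16384 bytes, leaving 3584 bytes for every local variable in the whole chain.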
 
 As for powerpc:
 
 arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256
 
 Yeah, same issue.
 
 But, seriously, these stack traces are meaningless to anyone not
 familiar with s390 or power7 - they indicate a problem detected
 in the idle loop, not wherever the stack overran.
 
 Can you please work with the s390/power7 people to obtain whatever
 stack it was that overflowed, and we can go from there.
OK, I will do that.
CAI Qian
 
 Cheers,
 
 Dave.
 --
 Dave Chinner
 da...@fromorbit.com
 

3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

2013-05-22 Thread CAI Qian

Original report:
http://oss.sgi.com/archives/xfs/2013-05/msg00683.html

Also seen on Power7:
http://marc.info/?l=linux-kernel&m=136927904900692&w=2

CAI Qian

- Original Message -
 From: Dave Chinner da...@fromorbit.com
 To: CAI Qian caiq...@redhat.com
 Cc: LKML linux-kernel@vger.kernel.org, sta...@vger.kernel.org, 
 x...@oss.sgi.com
 Sent: Thursday, May 23, 2013 11:46:11 AM
 Subject: Re: 3.9.2: xfstests triggered panic
 
 On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
  - Original Message -
   From: Dave Chinner da...@fromorbit.com
   To: CAI Qian caiq...@redhat.com
   Cc: LKML linux-kernel@vger.kernel.org, sta...@vger.kernel.org,
   x...@oss.sgi.com
   Sent: Wednesday, May 22, 2013 5:53:00 PM
   Subject: Re: 3.9.2: xfstests triggered panic
   
   On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
Reproduced on almost all s390x guests by running xfstests.

14634.396658¨ XFS (dm-1): Mounting Filesystem
14634.525522¨ XFS (dm-1): Ending clean mount
14640.413007¨  0017c6d4¨ idle_balance+0x1a0/0x340
14640.413010¨  0063303e¨ __schedule+0xa22/0xaf0
14640.428279¨  00630da6¨ schedule_timeout+0x186/0x2c0
14640.428289¨  001cf864¨ rcu_gp_kthread+0x1bc/0x298
14640.428300¨  00158c5a¨ kthread+0xe6/0xec
14640.428304¨  00634de6¨ kernel_thread_starter+0x6/0xc
14640.428308¨  00634de0¨ kernel_thread_starter+0x0/0xc
14640.428311¨ Last Breaking-Event-Address:
14640.428314¨  0016bd76¨ walk_tg_tree_from+0x3a/0xf4
14640.428319¨  list_add corruption. next-prev should be prev
(0918
), but was   (null). (next=  (null)).
   
   Where's XFS in this? walk_tg_tree_from() is part of the scheduler
   code. This kind of implies a stack corruption
   
Sometimes, this pops up,
[16907.275002] WARNING: at kernel/rcutree.c:1960

or this,
15316.154171¨ XFS (dm-1): Mounting Filesystem
15316.255796¨ XFS (dm-1): Ending clean mount
15320.364246¨006367a2: e310b0080004lg
%r1,8(%r
11)
15320.364249¨006367a8: 41101010la
%r1,16(%
r1)
15320.364251¨006367ac: e3301004lg
%r3,0(%r
1)
15320.364252¨ Call Trace:
15320.364252¨ Last Breaking-Event-Address:
15320.364253¨  � ¨ Kernel stack overflow.
15320.364308¨ CPU: 0 Tainted: GF   W3.9.2 #1
15320.364309¨ Process rhts-test-runne (pid: 625, task:
3dccc890,
ksp: 0
   
    and there you go - a stack overflow. Your kernel stack size is
   too small.
   
   I'd suggest that you need 16k stacks on s390 - IIRC every function
   call has 128 byte stack frame, and there are call chains 70-80
   functions deep in the storage stack...
  Hmm, I am unsure how to set a 16k stack there
 
 Are you building a 64 bit s390 kernel or a 32 bit kernel? 32 bit
 kernels only have an 8k stack size, 64 bit kernels are 16k (see
 arch/s390/Makefile).
 
 $ git grep STACK_SIZE arch/s390 |head -2
 arch/s390/Makefile:STACK_SIZE   := 8192
 arch/s390/Makefile:STACK_SIZE   := 16384
 
 As it is, the stack frame usage is worse than I thought:
 
 $ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
 arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96  /*
 size of minimum stack frame */
 arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160  /*
 size of minimum stack frame */
 
 Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So 16k
 stack size is going to have big troubles with a 70-80 function deep
 call chain.
 
 As for powerpc:
 
 arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256
 
 Yeah, same issue.
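
To put a number on the powerpc case, here is a rough Python sketch that takes the 256 byte STACKFRAMESIZE quoted above at face value. The 16 KiB kernel stack size is an assumption, since the thread does not state it for ppc64, and the helper names are illustrative only.

```python
# Assumption: a 16 KiB kernel stack on ppc64; the thread does not give the size.
ASSUMED_STACK_SIZE = 16384
STACKFRAMESIZE = 256  # arch/powerpc/include/asm/ppc_asm.h, as quoted above


def max_depth(stack_size, frame_size):
    """Deepest call chain that fits if every frame is exactly frame_size."""
    return stack_size // frame_size


def overshoot(depth, stack_size, frame_size):
    """Bytes by which `depth` frames exceed the stack (0 if they fit)."""
    return max(0, depth * frame_size - stack_size)


print(max_depth(ASSUMED_STACK_SIZE, STACKFRAMESIZE))
for depth in (70, 80):
    print(depth, overshoot(depth, ASSUMED_STACK_SIZE, STACKFRAMESIZE))
```

With these figures the stack holds at most 64 such frames, so a 70 to 80 function deep chain would not fit even before counting local variables.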
 
 But, seriously, these stack traces are meaningless to anyone not
 familiar with s390 or power7 - they indicate a problem detected
 in the idle loop, not wherever the stack overran.
 
 Can you please work with the s390/power7 people to obtain whatever
 stack it was that overflowed, and we can go from there.
 
 Cheers,
 
 Dave.
 --
 Dave Chinner
 da...@fromorbit.com
 

Re: 3.9.0: WARNING: at drivers/base/core.c:575

2013-05-09 Thread CAI Qian



- Original Message -
> From: "Borislav Petkov" 
> To: "CAI Qian" 
> Cc: "Srivatsa S. Bhat" , "LKML" 
> , mche...@redhat.com,
> "Greg KH" , ba...@ti.com
> Sent: Tuesday, May 7, 2013 9:28:29 PM
> Subject: Re: 3.9.0: WARNING: at drivers/base/core.c:575
> 
> On Tue, May 07, 2013 at 01:52:54PM +0530, Srivatsa S. Bhat wrote:
> > For the x86-64 case, does the patch posted here fix the issue?
> > http://marc.info/?l=linux-edac&m=136731542432210&w=2
> 
> CAI, can I have your Tested-by before I pick it up?
Sure, and the system is allocated to someone else at the moment, so
I'll test it out as soon as possible.
CAI Qian
> 
> Thanks.
> 
> --
> Regards/Gruss,
> Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --
> 

Re: 3.9.0: WARNING: at drivers/base/core.c:575

2013-05-07 Thread CAI Qian

OK, I also saw it on x86-64:

[   18.305143] WARNING: at drivers/base/core.c:575 
device_create_file+0x82/0xa0() 
[   18.313208] Write permission without 'store' 
[   18.317985] Modules linked in: i5000_edac(F+) iTCO_vendor_support(F) 
coretemp(F) edac_core(F) kvm_intel(F) lpc_ich(F) kvm(F) i5k_amb(F) ptp(F) 
i2c_i801(F) shpchp(F) mfd_core(F) pps_core(F) microcode(F) pcspkr(F) xfs(F) 
libcrc32c(F) sr_mod(F) sd_mod(F) crc_t10dif(F) cdrom(F) ata_generic(F) 
pata_acpi(F) mgag200(F) syscopyarea(F) sysfillrect(F) sysimgblt(F) ata_piix(F) 
i2c_algo_bit(F) drm_kms_helper(F) ttm(F) drm(F) libata(F) i2c_core(F) 
dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) 
[   18.366804] CPU: 0 PID: 355 Comm: systemd-udevd Tainted: GF
3.9.0+ #1 
[   18.375065] Hardware name: NEC Express800/120Rg-1 [N8100-1242]/MS-9172-02S, 
BIOS 1.0.5S42 10/05/2006 
[   18.385360]  0009 880069917938 815f105e 
880069917978 
[   18.393800]  8104e4f0 8800663099c8 a03304a0 
8800661cea40 
[   18.402136] Request for unknown module key 'Magrathea: Glacier signing key: 
cf3c1f5ed7be276234c447dbf4a4a5a83a249db4' err -11 
[   18.402241]   8800661cea40  
8800699179d8 
[   18.402242] Call Trace: 
[   18.402247]  [ dump_stack+0x19/0x1b 
[   18.402251]  [] warn_slowpath_common+0x70/0xa0 
[   18.402252]  [] warn_slowpath_fmt+0x46/0x50 
[   18.402254]  [] device_create_file+0x82/0xa0 
[   18.402260]  [] edac_create_sysfs_mci_device+0x3b8/0x550 
[edac_core] 
[   18.402263]  [] edac_mc_add_mc+0xf4/0x260 [edac_core] 
[   18.402265]  [] i5000_probe1+0x859/0xb20 [i5000_edac] 
[   18.402267]  [] i5000_init_one+0x31/0x40 [i5000_edac] 
[   18.402271]  [] local_pci_probe+0x4b/0x80 
[   18.402272]  [] pci_device_probe+0x111/0x120 
[   18.402275]  [] driver_probe_device+0x8b/0x390 
[   18.402276]  [] __driver_attach+0xab/0xb0 
[   18.402278]  [] ? driver_probe_device+0x390/0x390 
[   18.402279]  [] bus_for_each_dev+0x5d/0xa0
[   18.402281]  [] driver_attach+0x1e/0x20 
[   18.402282]  [] bus_add_driver+0x11e/0x2a0 
[   18.402284]  [] ? 0xa002efff 
[   18.402285]  [] ? 0xa002efff 
[   18.402286]  [] driver_register+0x77/0x170 
[   18.402288]  [] ? 0xa002efff 
[   18.402289]  [] ? 0xa002efff 
[   18.402 [] __pci_register_driver+0x4c/0x50 
[   18.402292]  [] i5000_init+0x31/0x1000 [i5000_edac] 
[   18.402295]  [] do_one_initcall+0xea/0x1a0 
[   18.402298]  [] load_module+0x11e4/0x1b00
[   18.402300]  [] ? ddebug_proc_open+0xc0/0xc0 
[   18.402303]  [] ? page_fault+0x22/0x30 
[   18.402305]  [] SyS_init_module+0xd7/0x120 
[   18.402307]  [] system_call_fastpath+0x16/0x1b
[   18.402308] ---[ end trace 3eb6a6f51fb8cafb ]--- 
[   18.402406] [ cut here ] 

- Original Message -
> From: "CAI Qian" 
> To: "LKML" 
> Sent: Tuesday, May 7, 2013 3:25:58 PM
> Subject: 3.9.0: WARNING: at drivers/base/core.c:575
> 
> Never saw any of those messages were floating in any of the RC testing, but
> now happened in 3.9 GA on Power 7 systems.
> 
> [0.329753] EEH: devices created
> [0.340203] atomic64 test passed
> [0.340407] NET: Registered protocol family 16
> [0.340457] EEH: No capable adapters found
> [0.340609] IBM eBus Device Driver
> [0.358825] Write permission without 'store'
> [0.358852] [ cut here ]
> [0.358859] WARNING: at drivers/base/core.c:575
> [0.358866] Modules linked in:
> [0.358877] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1
> [0.358887] task: c003f8bc ti: c003f8c0 task.ti:
> c003f8c0
> [0.358897] NIP: c04a7f44 LR: c04a7f40 CTR:
> 01766760
> [0.358906] REGS: c003f8c03780 TRAP: 0700   Not tainted  (3.9.0+)
> [0.358914] MSR: 80029032   CR: 28139d24  XER:
> 0001
> [0.358939] SOFTE: 1
> [0.358944] CFAR: c074427c
> [0.358950]
> GPR00: c04a7f40 c003f8c03a00 c1117e50 0020
> GPR04:   160c9615 
> GPR08: c1064af0   3fef
> GPR12: 28139d22 c7f0 c000bb30 
> GPR16:   c0ff1c78 0001
> GPR20:    c117eb08
> GPR24: 00a6 c0af0028  c0ff21a0
> GPR28: 0002 c1500030 c11b8200 c1500030
> [0.359070] NIP [c04a7f44] .device_create_file+0xa4/0xe0
> [0.359080] LR [c04a7f40] .device_create_file+0xa0/0xe0
> [0.359088] Call Trace:
> [0.359095] [c003f8c03a00] [c04a7f40]
> .device_create_file+0xa0/0xe0 (unreliable)
> [0.359112] [c003f

3.9.0: WARNING: at drivers/base/core.c:575

2013-05-07 Thread CAI Qian

I never saw any of these messages during RC testing, but they now appear
in 3.9 GA on Power 7 systems.

[0.329753] EEH: devices created
[0.340203] atomic64 test passed
[0.340407] NET: Registered protocol family 16
[0.340457] EEH: No capable adapters found
[0.340609] IBM eBus Device Driver
[0.358825] Write permission without 'store'
[0.358852] [ cut here ]
[0.358859] WARNING: at drivers/base/core.c:575
[0.358866] Modules linked in:
[0.358877] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1
[0.358887] task: c003f8bc ti: c003f8c0 task.ti: 
c003f8c0
[0.358897] NIP: c04a7f44 LR: c04a7f40 CTR: 01766760
[0.358906] REGS: c003f8c03780 TRAP: 0700   Not tainted  (3.9.0+)
[0.358914] MSR: 80029032   CR: 28139d24  XER: 
0001
[0.358939] SOFTE: 1
[0.358944] CFAR: c074427c
[0.358950] 
GPR00: c04a7f40 c003f8c03a00 c1117e50 0020 
GPR04:   160c9615  
GPR08: c1064af0   3fef 
GPR12: 28139d22 c7f0 c000bb30  
GPR16:   c0ff1c78 0001 
GPR20:    c117eb08 
GPR24: 00a6 c0af0028  c0ff21a0 
GPR28: 0002 c1500030 c11b8200 c1500030 
[0.359070] NIP [c04a7f44] .device_create_file+0xa4/0xe0
[0.359080] LR [c04a7f40] .device_create_file+0xa0/0xe0
[0.359088] Call Trace:
[0.359095] [c003f8c03a00] [c04a7f40] 
.device_create_file+0xa0/0xe0 (unreliable)
[0.359112] [c003f8c03a90] [c074daf8] 
.register_cpu_online+0x21c/0x284
[0.359126] [c003f8c03b30] [c0a78308] .topology_init+0x130/0x200
[0.359138] [c003f8c03c10] [c000b4d4] 
.do_one_initcall+0x144/0x1f0
[0.359150] [c003f8c03cd0] [c0a746a8] 
.kernel_init_freeable+0x23c/0x32c
[0.359163] [c003f8c03db0] [c000bb4c] .kernel_init+0x1c/0x130
[0.359175] [c003f8c03e30] [c000a164] 
.ret_from_kernel_thread+0x64/0x80
[0.359184] Instruction dump:
[0.359190] 3863e610 4829c2f1 6000 0fe0 e8810070 4bb0 6000 
3c62ff8c 
[0.359211] f8810070 3863e5e8 4829c2cd 6000 <0fe0> e8810070 a1240008 
4b80 
[0.359241] ---[ end trace c949de754c984735 ]---

CAI Qian
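
For context, the "Write permission without 'store'" text is printed by a sanity check in device_create_file(): a sysfs attribute whose mode has a write bit set, but which has no store callback, trips the warning. The sketch below models that check in Python. It is a user-space illustration only, not kernel code; the function name attribute_warnings is hypothetical, and the matching read-side check ("Read permission without 'show'") is included on the assumption that the kernel pairs the two tests.

```python
import stat

# Any-write and any-read permission masks, mirroring the kernel's
# S_IWUGO (0222) and S_IRUGO (0444) helper macros.
S_IWUGO = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
S_IRUGO = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def attribute_warnings(mode, has_show, has_store):
    """Return the warnings a sysfs attribute with this mode would trigger.

    mode      -- permission bits of the attribute, e.g. 0o644
    has_show  -- True if the attribute provides a read ('show') callback
    has_store -- True if the attribute provides a write ('store') callback
    """
    warnings = []
    if (mode & S_IWUGO) and not has_store:
        warnings.append("Write permission without 'store'")
    if (mode & S_IRUGO) and not has_show:
        warnings.append("Read permission without 'show'")
    return warnings
```

Under this model a read-only attribute declared with mode 0644 and no store callback yields exactly the message in the log above, while the same attribute declared with mode 0444 yields none.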

3.9.0: panic during boot - kernel BUG at include/linux/gfp.h:323!

2013-05-06 Thread CAI Qian

I never saw this in any of the 3.9 RC releases, but now see it on
multiple systems:

[0.878873] Performance Events: AMD PMU driver. 
[0.884837] ... version:0 
[0.890248] ... bit width:  48 
[0.895815] ... generic registers:  4 
[0.901048] ... value mask:  
[0.908207] ... max period: 7fff 
[0.915165] ... fixed-purpose events:   0 
[0.920620] ... event mask: 000f 
[0.928031] [ cut here ] 
[0.934231] kernel BUG at include/linux/gfp.h:323! 
[0.940581] invalid opcode:  [#1] SMP  
[0.945982] Modules linked in: 
[0.950048] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1 
[0.957877] Hardware name: ProLiant BL465c G7, BIOS A19 12/10/2011 
[1.066325] task: 880234608000 ti: 880234602000 task.ti: 
880234602000 
[1.076603] RIP: 0010:[8117495d]  [8117495d] 
new_slab+0x2ad/0x340 
[1.087043] RSP: :880234603bf8  EFLAGS: 00010246 
[1.094067] RAX:  RBX: 880237404b40 RCX: 
00d0 
[1.103565] RDX: 0001 RSI: 0003 RDI: 
002052d0 
[1.113071] RBP: 880234603c28 R08:  R09: 
0001 
[1.122461] R10: 0001 R11: 812e3aa8 R12: 
0001 
[1.132025] R13: 8802378161c0 R14: 00030027 R15: 
40d0 
[1.141532] FS:  () GS:88023780() 
knlGS: 
[1.152306] CS:  0010 DS:  ES:  CR0: 8005003b 
[1.160004] CR2: 88043fdff000 CR3: 018d5000 CR4: 
07f0 
[1.169519] DR0:  DR1:  DR2: 
 
[1.179009] DR3:  DR6: 0ff0 DR7: 
0400 
[1.188383] Stack: 
[1.191088]  880234603c28 0001 00d0 
8802378161c0 
[1.200825]  880237404b40 880237404b40 880234603d28 
815edba1 
[1.21ea0008dd0300 880237816140  88023740e1c0 
[1.519233] Call Trace: 
[1.522392]  [815edba1] __slab_alloc+0x330/0x4f2 
[1.529758]  [812e3aa8] ? alloc_cpumask_var_node+0x28/0x90 
[1.538126]  [81a0bd6e] ? wq_numa_init+0xc8/0x1be 
[1.545642]  [81174b25] kmem_cache_alloc_node_trace+0xa5/0x200 
[1.554480]  [812e3aa8] ? alloc_cpumask_var_node+0x28/0x90
[1.662913]  [812e3aa8] alloc_cpumask_var_node+0x28/0x90 
[1.671224]  [81a0bdb3] wq_numa_init+0x10d/0x1be 
[1.678483]  [81a0be64] ? wq_numa_init+0x1be/0x1be 
[1.686085]  [81a0bec8] init_workqueues+0x64/0x341 
[1.693537]  [8107b687] ? smpboot_register_percpu_thread+0xc7/0xf0 
[1.702970]  [81a0ac4a] ? ftrace_define_fields_softirq+0x32/0x32 
[1.712039]  [81a0be64] ? wq_numa_init+0x1be/0x1be 
[1.719683]  [810002ea] do_one_initcall+0xea/0x1a0 
[1.727162]  [819f1f31] kernel_init_freeable+0xb7/0x1ec 
[1.735316]  [815d50d0] ? rest_init+0x80/0x80 
[1.742121]  [815d50de] kernel_init+0xe/0xf0 
[1.748950]  [815ff89c] ret_from_fork+0x7c/0xb0 
[1.756443]  [815d50d0] ? rest_init+0x80/0x80 
[1.763250] Code: 45  84 ac 00 00 00 f0 41 80 4d 00 40 e9 f6 fe ff ff 66 0f
1f 84 00 00 00 00 00 e8 eb 4b ff ff 49 89 c5 e9 05 fe ff ff <0f> 0b 4c 8b 73 38
44 89 ff 81 cf 00 00 20 00 4c 89 f6 48 c1 ee
[2.187072] RIP  [8117495d] new_slab+0x2ad/0x340 
[2.194238]  RSP 880234603bf8 
[2.198982] ---[ end trace 43bf8bb0334e5135 ]--- 
[2.205097] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b 

CAI Qian

3.9.0: panic during boot at tick_do_broadcast

2013-05-06 Thread CAI Qian

I never saw this during testing of any of the 3.9 RC releases.

[1.023422] Intel PMU driver. 
[1.025859] perf_event_intel: PEBS disabled due to CPU errata, please 
upgrade microcode 
[1.032534] ... version:3 
[1.036078] ... bit width:  48 
[1.039506] ... generic registers:  8 
[1.042856] ... value mask:  
[1.047382] ... max period: 7fff 
[1.052229] ... fixed-purpose events:   3 
[1.055641] ... event mask: 000700ff 
[1.065070] smpboot: Booting Node   0, Processors  #1 
[1.082422] BUG: unable to handle kernel NULL pointer dereference at 
0048 
[1.089361] IP: [810a73af] tick_do_broadcast+0x6f/0xb0 
[1.094455] PGD 0  
[1.096126] Oops:  [#1] SMP  
[1.098793] Modules linked in: 
[1.101464] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1 
[1.106487] Hardware name: ProLiant DL120 G7, BIOS J01 02/01/2012
[1.510871] task: 8802041c8000 ti: 8802041c2000 task.ti: 
8802041c2000 
[1.517310] RIP: 0010:[810a73af]  [810a73af] 
tick_do_broadcast+0x6f/0xb0 
[1.524408] RSP: :88020f403d10  EFLAGS: 00010002 
[1.528867] RAX:  RBX: 88020705a0d0 RCX: 
0001 
[1.535019] RDX: 0001 RSI: 0040 RDI: 
88020705a0d0 
[1.540963] RBP: 88020f403d20 R08: 0002 R09: 
0001 
[1.546980] R10: 0002 R11: b37c R12: 
d9a0 
[1.553124] R13:  R14:  R15: 
0001 
[1.559060] FS:  () GS:88020f40() 
knlGS: 
[1.566007] CS:  0010 DS:  ES:  CR0: 80050033 
[1.570771] CR2: 0048 CR3: 018d5000 CR4: 
000407f0 
[1.576792] DR0:  DR1:  DR2: 
 
[1.582933] DR3:  DR6: 0ff0 DR7: 
0400 
[1.50] Stack: 
[1.590563]  818dca80 818e8400 88020f403d30 
810a75b1 
[1.596932]  88020f403d50 810a75d4 0  
[2.002208]  88020f403d60 81004935 88020f403db0 
810df624 
[2.008450] Call Trace: 
[2.010471]  <IRQ>
[2.012057]  [810a75b1] tick_do_periodic_broadcast+0x41/0x50 
[2.017847]  [810a75d4] tick_handle_periodic_broadcast+0x14/0x50 
[2.023640]  [81004935] timer_interrupt+0x15/0x20 
[2.028291]  [810df624] handle_irq_event_percpu+0x54/0x1e0 
[2.033810]  [8108a7be] ? task_tick_fair+0x16e/0x550 
[2.038626]  [810df7f2] handle_irq_event+0x42/0x70 
[2.043418]  [810e20ff] handle_edge_irq+0x6f/0x110 
[2.048508]  [8100415f] handle_irq+0xbf/0x150 
[2.052945]  [810569c1] ? irq_enter+0x51/0x90 
[2.057362]  [8108915c] ? update_curr+0xec/0x170 
[2.062044]  [816011fa] do_IRQ+0x5a/0xe0 
[2.066110]  [815f78ea] common_interrupt+0x6a/0x6a 
[2.070846]  [810567a4] ? __do_softirq+0x94/0x220 
[2.07]  [81056761] ? __do_softirq+0x51/0x220 
[2.080335]  [81056aa5] irq_exit+0xa5/0xb0 
[2.084518]  [816012ee] smp_apic_timer_interrupt+0x6e/0x99 
[2.089842]  [8160057a] apic_timer_interrupt+0x6a/0x70 
[2.095022]  <EOI>
[2.096597]  [815e0b30] ? native_cpu_up+0x157.309461]  
[815e21ea] _cpu_up+0xc0/0x133 
[2.505201]  [815e2334] cpu_up+0xd7/0xea 
[2.509184]  [81a0eec3] smp_init+0x76/0xa6 
[2.513717]  [819f1f55] kernel_init_freeable+0xdb/0x1ec 
[2.518789]  [815d50d0] ? rest_init+0x80/0x80 
[2.523207]  [815d50de] kernel_init+0xe/0xf0 
[2.527647]  [815ff89c] ret_from_fork+0x7c/0xb0 
[2.532127]  [815d50d0] ? rest_init+0x80/0x80 
[2.536569] Code: c3 0f 1f 00 48 63 35 15 8e 92 00 48 89 df 49 c7 c4 a0 d9
00 00 e8 52 e0 24 00 89 c0 48 89 df 48 8b 04 c5 e0 43 9c 81 4a 8b 04 20 <ff> 50
48 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 40 00 f0 0f b3 07
[2.552724] RIP  [810a73af] tick_do_broadcast+0x6f/0xb0 
[2.558161]  RSP 88020f403d10 
[2.561059] CR2: 0048 
[2.563901] ---[ end trace e29222d88d06c928 ]--- 
[2.567755] Kernel panic - not syncing: Fatal exception in interrupt 
[3.600013] Shutting down cpus with NMI 

CAI Qian

3.9.0: system deadlock at semctl running selinux testsuite

2013-05-06 Thread CAI Qian

:  R12: 
 
[ 3242.675012] R13: 818cbfd8 R14: 818cbfd8 R15: 
 
[ 3242.675012] FS:  () GS:88007fc0() 
knlGS: 
[ 3242.675012] CS:  0010 DS:  ES:  CR0: 8005003b 
[ 3242.675012] CR2: 7fbee5878420 CR3: 7ab49000 CR4: 
06f0 
[ 3242.675012] DR0:  DR1:  DR2: 
 
[ 3242.675012] DR3:  DR6: 0ff0 DR7: 
0400 
[ 3242.675012] Stack: 
[ 3242.675012]  818cbfd8 818cbfd8 818cbf08 
8100ac1e 
[ 3242.675012]  818cbf58 8109eae9 818cbfd8 
81a8e2e0 
[ 3242.675012]  88007ffa4780  81a86020 
81a8e2e0 
[ 3242.675012] Call Trace: 
[ 3242.675012]  [8100ac1e] arch_cpu_idle+0x1e/0x30 
[ 3242.675012]  [8109eae9] cpu_startup_entry+0x89/0x210 
[ 3242.675012]  [815d50c7] rest_init+0x77/0x80 
[ 3242.675012]  [819f1e6d] start_kernel+0x3f9/0x406 
[ 3242.675012]  [819f1873] ? repair_env_string+0x5e/0x5e 
[ 3242.675012]  [819f15a3] x86_64_start_reservations+0x2a/0x2c 
[ 3242.675012]  [819f1673] x86_64_start_kernel+0xce/0xd2 
[ 3242.675012] Code: 89 e5 e8 48 eb 06 00 5d c3 66 0f 1f 44 00 00 0f 1f 44 00
00 55 48 89 e5 41 54 65 44 8b 24 25 1c b0 00 00 53 0f 1f 44 00 00 fb f4 <65> 44
8b 24 25 1c b0 00 00 0f 1f 44 00 00 5b 41 5c 5d c3 90 e8

CAI Qian 

Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

2013-04-21 Thread CAI Qian

Hi Ted,

- Original Message -
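> From: "Theodore Ts'o" <ty...@mit.edu>
> To: "CAI Qian" <caiq...@redhat.com>
> Cc: "Eric Whitney" <enwli...@gmail.com>, "Dmitry Monakhov"
> <dmonak...@openvz.org>, "Christian Kujau" <li...@nerdbynature.de>,
> "LKML" <linux-kernel@vger.kernel.org>, "linux-s390"
> <linux-s...@vger.kernel.org>, "Steve Best" <sb...@redhat.com>,
> linux-e...@vger.kernel.org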
> Sent: Saturday, April 20, 2013 11:19:45 PM
> Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
> 
> On Mon, Apr 08, 2013 at 11:05:11PM -0400, CAI Qian wrote:
> > I can help run xfstests for ext4 dev tree on x64, Power7, Z10 and
> > KVM platforms with back-storage like SAN/multipath, iSCSI and FCoE.
> > I plan to run this weekly and setup a wiki page to update the testing
> > status by every Friday.
> 
> Hi CAI,
> 
> Sorry for not getting back to you sooner; I was at Collaboration
> Summit and LSF/MM last week.
> 
> It would be great if you could help run xfstests on the ext4 dev tree
> on various platforms.  We don't have any coverage on Power7 or
> s390/Z10 at the moment, so that would be especially welcome.  Coverage
> on alternate storage backends can be interesting in finding timing
> problems so they would be valuable as well.
> 
> If you have any Itanium platforms, that would be great too, since we
> don't have that today.
Unfortunately, getting those ia64 systems up and running with the upstream
kernel requires significant effort. I'd leave that for now unless it becomes
very important.
> 
> The various ext4 configurations which I test can be found in the
> kvm-autorun/conf directory in my xfstests-bld git repository:
> 
> git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git
> 
> (This repository is a convenient setup to do build xfstests in a
> hermetic environment, and convenience scripts to run xfstests under
> kvm, and scripts on the host OS kick off the kvm test run and parse
> the test output afterwards.)
> 
> Thanks for offering to test the dev branch!
OK, I will check that out. BTW, git.kernel.org has been failing for me very
often these days; almost all of my test runs hit this:
+ git clone http://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git -b dev
Cloning into 'ext4'...
fatal: The remote end hung up unexpectedly
fatal: recursion detected in die handler

So retrying will take a while, because the test systems here always start
from a clean environment, i.e., they re-install the OS and then re-clone the
tree. :\
CAI Qian
> 
> - Ted
> 
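
Since the clone failure described above looks transient, a clean-environment test harness could wrap the clone step in a retry loop instead of failing the whole run. The following is a minimal sketch in Python under that assumption. The function name clone_with_retry, the attempt count, and the linear backoff are illustrative choices, not something taken from this thread; the run and sleep parameters exist only so the loop can be exercised without network access.

```python
import shutil
import subprocess
import time


def clone_with_retry(url, branch, dest, attempts=5, delay=30.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run `git clone` until it succeeds or `attempts` tries are used up.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails.
    """
    cmd = ["git", "clone", url, "-b", branch, dest]
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        # Drop any partial checkout so the next attempt starts clean.
        shutil.rmtree(dest, ignore_errors=True)
        if attempt < attempts:
            sleep(delay * attempt)  # linear backoff: delay, 2*delay, ...
    raise RuntimeError(f"git clone failed after {attempts} attempts: {url}")
```

For the case in this message the call would be clone_with_retry("http://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git", "dev", "ext4"). Retrying only the clone avoids repeating the OS re-install when the failure is on the server side.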

[BUG] Fatal exception in interrupt - nf_nat_cleanup_conntrack during IPv6 tests

2013-04-09 Thread CAI Qian

(F) ebtables(F) 
ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) 
coretemp(F) ixgbe(F) kvm_intel(F) kvm(F) ptp(F) iTCO_wdt(F) 
iTCO_vendor_support(F) crc32c_intel(F) pps_core(F) mdio(F) 
ghash_clmulni_intel(F) e1000e(F) lpc_ich(F) dca(F) hpilo(F) hpwdt(F) 
mfd_core(F) pcspkr(F) serio_raw(F) microcode(F) xfs(F) libcrc32c(F) 
ata_generic(F) mgag200(F) i2c_algo_bit(F) pata_acpi(F) drm_kms_helper(F) 
sd_mod(F) crc_t10dif(F) ttm(F) ata_piix(F) drm(F) hpsa(F) libata(F) i2c_core(F) 
dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: iptable_nat] 
[ 3601.922124] CPU 3  
[ 3601.931628] Pid: 63862, comm: grep Tainted: GF3.8.5+ #1 HP 
ProLiant DL120 G7 
[ 3601.973462] RIP: 0010:[]  [  (null)]   
(null) 
[ 3602.010902] RSP: :8801b04efcf0  EFLAGS: 00010046 
[ 3602.037249] RAX: 81631ac0 RBX: 8801ffaf5010 RCX: 
1048f4fc 
[ 3602.073506] RDX: 0005 RSI: 8801ffaf5010 RDI: 
880202c74180 
[ 3602.112346] RBP: 8801b04efd18 R08:  R09: 
 
[ 3602.150489] R10:  R11: bbe2 R12: 
880202c74180 
[ 3602.189458] R13: 0005 R14: 0246 R15: 
880202c74180 
[ 3602.227532] FS:  7fe849139740() GS:880202c6() 
knlGS: 
[ 3602.270418] CS:  0010 DS:  ES:  CR0: 80050033 
[ 3602.298692] CR2:  CR3: 0001ea35d000 CR4: 
000407e0 
[ 3602.334682] DR0:  DR1:  DR2: 
 
[ 3602.370618] DR3:  DR6: 0ff0 DR7: 
0400 
[ 3602.407160] Process grep (pid: 63862, threadinfo 8801b04ee000, task 
8801ff9fb560) 
[ 3602.448654] Stack: 
[ 3602.458587]  81094856 8801fb5406c0 8801ffaf5010 
880202c74180 
[ 3602.496119]   8801b04efd28 810951f3 
8801b04efd48 
[ 3602.533501]  8109564c 8801ffaf5010 8801ffaf57f4 
8801b04efda8 
[ 3602.569906] Call Trace: 
[ 3602.582433]  [81094856] ? enqueue_task+0x66/0x80 
[ 3602.609676]  [810951f3] activate_task+0x23/0x30 
[ 3602.636830]  [8109564c] ttwu_do_activate.constprop.78+0x3c/0x70 
[ 3602.672787]  [81097cbc] try_to_wake_up+0x1dc/0x2d0 
[ 3602.703479]  [81071710] ? __internal_add_timer+0x130/0x130 
[ 3602.737967]  [81097e17] wake_up_process+0x27/0x50 
[ 3602.767596]  [8107171e] process_timeout+0xe/0x10 
[ 3602.796510]  [81070c9a] call_timer_fn+0x3a/0x120 
[ 3602.823533]  [81071710] ? __internal_add_timer+0x130/0x130 
[ 3602.855717]  [81072a2e] run_timer_softirq+0x1fe/0x2b0 
[ 3602.885014]  [8106a8f0] __do_softirq+0xd0/0x210 
[ 3602.912611]  [8101b883] ? native_sched_clock+0x13/0x80 
[ 3602.942503]  [8161bddc] call_softirq+0x1c/0x30 
[ 3602.969510]  [810162a5] do_softirq+0x75/0xb0 
[ 3602.994957]  [8106abc5] irq_exit+0xb5/0xc0 
[ 3603.020036]  [8161c75e] smp_apic_timer_interrupt+0x6e/0x99 
[ 3603.051018]  [8161b69d] apic_timer_interrupt+0x6d/0x80 
[ 3603.081602] Code:  Bad RIP value. 
[ 3603.097893] RIP  [  (null)]   (null) 
[ 3603.123740]  RSP 8801b04efcf0 
[ 3603.140928] CR2:  
[ 3603.157190] ---[ end trace 62e555f1b47d35f6 ]--- 
[ 3603.181234] Kernel panic - not syncing: Fatal exception in interrupt 

CAI Qian

Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

2013-04-08 Thread CAI Qian

Hello Ted,

- Original Message -
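> From: "Theodore Ts'o" <ty...@mit.edu>
> To: "Eric Whitney" <enwli...@gmail.com>
> Cc: "Dmitry Monakhov" <dmonak...@openvz.org>, "Christian Kujau"
> <li...@nerdbynature.de>, "CAI Qian" <caiq...@redhat.com>,
> "LKML" <linux-kernel@vger.kernel.org>, "linux-s390"
> <linux-s...@vger.kernel.org>, "Steve Best" <sb...@redhat.com>,
> linux-e...@vger.kernel.org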
> Sent: Wednesday, April 3, 2013 10:41:14 PM
> Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
> 
> On Wed, Apr 03, 2013 at 10:34:06AM -0400, Eric Whitney wrote:
> > 
> > The TI OMAP4 processor on my Pandaboard test system is little endian.
> 
> Ah... so basically, we need to find a test platform which allows us to
> boot arbitrary kernels and allows us to have root access (which means
> it's unlikely we'll be able to do this via remote access) and which
> doesn't have exotic power requirements (which as far as I know rules
> out pSeries and zSeries systems)
> 
> It would also be nice if we could run tests in finite time, which
> probably rules out the Hercules emulator (it runs at one-tenth zSeries
> processor speeds, which doesn't win speed competitions by default, and
> I suspect their storage speeds are even worse).
> 
> Anyone else have any suggestions?  Or anyone willing to help us run
> ext4 regression tests on the ext4 dev tree, so we can find these
> problems before we merge into mainline?
I can help run xfstests against the ext4 dev tree on x64, Power7, Z10 and
KVM platforms, with backing storage such as SAN/multipath, iSCSI and FCoE.
I plan to run this weekly and set up a wiki page with the testing status,
updated every Friday.
CAI Qian
> 
> Thanks,
> 
>   - Ted
> 

Re: bisected! (WAS Re: s390x: kernel BUG at fs/ext4/inode.c:1591!)

2013-04-03 Thread CAI Qian


> > [Text Documents:disable-es_lookup_extent.patch]
With this patch, I cannot reproduce it any more.

CAI Qian

Re: NULL pointer at kset_find_obj

2013-04-02 Thread CAI Qian



- Original Message -
> From: "David Howells" 
> To: "CAI Qian" , ru...@rustcorp.com.au
> Cc: dhowe...@redhat.com, "LKML" 
> Sent: Wednesday, April 3, 2013 1:38:50 AM
> Subject: Re: NULL pointer at kset_find_obj
> 
> CAI Qian  wrote:
> 
> > Just booted the latest mainline,
> > 
> > [   35.217698] Request for unknown module key 'Magrathea: Glacier signing
> > key: 8b7774b08bc4ee9637073434c10f0823f6fbe523' err -11
> 
> Can you check back earlier in the dmesg to see whether the kernel tried to
> load the key?  -11 is presumably -EAGAIN - in which case no such key was
> found
> (rather than there being a cached lookup failure which is what -ENOKEY would
> indicate).  It is possible that you encountered the key-not-yet-valid problem
> due to your h/w clock showing a value prior to the start date on the key.
Hmm, I'm not sure how to check that, but here is the full log prior to the panic:
http://people.redhat.com/qcai/stable/log.key

CAI Qian
>  
> > [   35.218511] BUG: unable to handle kernel paging request at
> > a03093f0
> > [   35.218521] IP: [81304710] kset_find_obj+0x30/0x80
> > ...
> > [   35.218575] Call Trace:
> > [   35.218583]  [810c827d] load_module+0xb0d/0x1b00
> > [   35.218587]  [81321880] ? ddebug_proc_open+0xc0/0xc0
> > [   35.218593]  [81628cd8] ? page_fault+0x28/0x30
> > [   35.218596]  [810c9347] sys_init_module+0xd7/0x120
> > [   35.218601]  [81630d59] system_call_fastpath+0x16/0x1b
> 
> I think this bit should be waved in front of Rusty.  It looks like it might
> be
> a bug in error handling code.
> 
> David
> 
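
The distinction David draws between the two error codes can be sketched in
user-space C. This is only an illustration, not kernel code: the helper names
key_lookup_err_name() and describe_key_err() are hypothetical, and ENOKEY is a
Linux-specific errno value, so it is guarded with #ifdef. On Linux, EAGAIN is
11, which is why "err -11" reads as -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given a negative errno value as printed in the log
 * (for example -11), name the two codes discussed in this thread.
 */
const char *key_lookup_err_name(int err)
{
    switch (-err) {
    case EAGAIN:
        return "EAGAIN";    /* per the reply above: no such key was found */
#ifdef ENOKEY
    case ENOKEY:
        return "ENOKEY";    /* per the reply above: cached lookup failure */
#endif
    default:
        return "other";
    }
}

/* Hypothetical helper: format "err <n> = -<NAME> (<strerror text>)" into buf. */
void describe_key_err(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "err %d = -%s (%s)",
             err, key_lookup_err_name(err), strerror(-err));
}
```

Calling describe_key_err(-EAGAIN, buf, sizeof(buf)) fills buf with the code
name plus the platform's strerror() text, which is a quick way to decode a
bare "err -11" when reading a dmesg line like the one quoted above.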

Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-02 Thread CAI Qian



- Original Message -
> From: "Jens Axboe" 
> To: "CAI Qian" 
> Cc: "Dave Chinner" , x...@oss.sgi.com, "LKML" 
> 
> Sent: Tuesday, April 2, 2013 5:00:47 PM
> Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running 
> xfstests case #78]
> 
> On Tue, Apr 02 2013, CAI Qian wrote:
> > 
> > 
> > - Original Message -
> > > From: "Jens Axboe" 
> > > To: "Dave Chinner" 
> > > Cc: "CAI Qian" , x...@oss.sgi.com, "LKML"
> > > 
> > > Sent: Tuesday, April 2, 2013 3:30:35 PM
> > > Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5
> > > running xfstests case #78]
> > > 
> > > On Tue, Apr 02 2013, Jens Axboe wrote:
> > > > On Tue, Apr 02 2013, Dave Chinner wrote:
> > > > > [Added jens Axboe to CC]
> > > > > 
> > > > > On Tue, Apr 02, 2013 at 02:08:49AM -0400, CAI Qian wrote:
> > > > > > Seen on almost all servers, ranging across x64, ppc64 and s390x,
> > > > > > with kernel 3.9-rc5 and xfsprogs-3.1.10. I never hit this on
> > > > > > 3.9-rc4, so it looks like something new broke it. The log, with
> > > > > > sysrq debug info, is here:
> > > > > > http://people.redhat.com/qcai/stable/log
> > > > 
> > > > CAI Qian, can you try and back the below out and test again?
> > > 
> > > Nevermind, it's clearly that one. The below should improve the
> > > situation, but it's not pretty. A better fix would be to allow
> > > auto-deletion even if PART_NO_SCAN is set.
> > Jens, when I compiled mainline (up to fefcdbe) with this patch,
> > the build failed:
> 
> Looks like I sent the wrong one, updated below.
The patch works well. Thanks!
CAI Qian
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index fe5f640..faa3afa 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1057,14 +1057,15 @@ static int loop_clr_fd(struct loop_device *lo)
>  		struct disk_part_iter piter;
>  		struct hd_struct *part;
>  
> -		mutex_lock_nested(&bdev->bd_mutex, 1);
> -		invalidate_partition(bdev->bd_disk, 0);
> -		disk_part_iter_init(&piter, bdev->bd_disk,
> -					DISK_PITER_INCL_EMPTY);
> -		while ((part = disk_part_iter_next(&piter)))
> -			delete_partition(bdev->bd_disk, part->partno);
> -		disk_part_iter_exit(&piter);
> -		mutex_unlock(&bdev->bd_mutex);
> +		if (mutex_trylock(&bdev->bd_mutex)) {
> +			invalidate_partition(bdev->bd_disk, 0);
> +			disk_part_iter_init(&piter, bdev->bd_disk,
> +						DISK_PITER_INCL_EMPTY);
> +			while ((part = disk_part_iter_next(&piter)))
> +				delete_partition(bdev->bd_disk, part->partno);
> +			disk_part_iter_exit(&piter);
> +			mutex_unlock(&bdev->bd_mutex);
> +		}
>  	}
>  
>  	/*
> 
> --
> Jens Axboe
> 
> 
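
The patch above swaps a blocking mutex_lock_nested() for mutex_trylock(), so
loop_clr_fd() skips the partition cleanup when bd_mutex is already held
instead of waiting for it. A minimal user-space sketch of the same pattern
with POSIX threads follows. It is an analogy under stated assumptions, not the
kernel code: bd_mutex here is a hypothetical stand-in for bdev->bd_mutex,
cleanups_done is a placeholder for the partition teardown, and the return
conventions differ (the kernel's mutex_trylock() takes one argument and
returns 1 on success and 0 on contention, while pthread_mutex_trylock()
returns 0 on success and EBUSY when the lock is held).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for bdev->bd_mutex. */
static pthread_mutex_t bd_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many times the placeholder cleanup body ran. */
static int cleanups_done;

/*
 * Try to run the cleanup under the lock without blocking.
 * Returns 1 if the cleanup ran, 0 if the lock was busy and it was skipped.
 */
int cleanup_partitions_if_unlocked(void)
{
    int rc = pthread_mutex_trylock(&bd_mutex);

    if (rc == EBUSY)
        return 0;       /* lock already held: skip instead of waiting */
    assert(rc == 0);    /* any other error is unexpected in this sketch */

    cleanups_done++;    /* placeholder for the partition teardown */

    rc = pthread_mutex_unlock(&bd_mutex);
    assert(rc == 0);
    return 1;
}
```

The trade-off matches Jens's remark that the change is "not pretty": when the
lock is contended, the cleanup is silently skipped, so the caller avoids the
hang at the cost of possibly leaving the work undone.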

Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-02 Thread CAI Qian



- Original Message -
> From: "Jens Axboe" 
> To: "Dave Chinner" 
> Cc: "CAI Qian" , x...@oss.sgi.com, "LKML" 
> 
> Sent: Tuesday, April 2, 2013 3:30:35 PM
> Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running 
> xfstests case #78]
> 
> On Tue, Apr 02 2013, Jens Axboe wrote:
> > On Tue, Apr 02 2013, Dave Chinner wrote:
> > > [Added jens Axboe to CC]
> > > 
> > > On Tue, Apr 02, 2013 at 02:08:49AM -0400, CAI Qian wrote:
> > > > Seen on almost all servers, ranging across x64, ppc64 and s390x,
> > > > with kernel 3.9-rc5 and xfsprogs-3.1.10. I never hit this on
> > > > 3.9-rc4, so it looks like something new broke it. The log, with
> > > > sysrq debug info, is here:
> > > > http://people.redhat.com/qcai/stable/log
> > 
> > CAI Qian, can you try and back the below out and test again?
> 
> Nevermind, it's clearly that one. The below should improve the
> situation, but it's not pretty. A better fix would be to allow
> auto-deletion even if PART_NO_SCAN is set.
Jens, when I compiled mainline (up to fefcdbe) with this patch,
the build failed:

drivers/block/loop.c: In function ‘loop_clr_fd’:
drivers/block/loop.c:1067:3: error: too many arguments to function ‘mutex_trylock’
In file included from include/linux/notifier.h:13:0,
                 from include/linux/memory_hotplug.h:6,
                 from include/linux/mmzone.h:771,
                 from include/linux/gfp.h:4,
                 from include/linux/kmod.h:22,
                 from include/linux/module.h:13,
                 from drivers/block/loop.c:52:
include/linux/mutex.h:168:12: note: declared here
drivers/block/loop.c: At top level:
drivers/block/loop.c:1084:2: warning: data definition has no type or storage class [enabled by default]
drivers/block/loop.c:1084:2: warning: type defaults to ‘int’ in declaration of ‘fput’ [-Wimplicit-int]
drivers/block/loop.c:1084:2: warning: parameter names (without types) in function declaration [enabled by default]
drivers/block/loop.c:1084:2: error: conflicting types for ‘fput’
In file included from drivers/block/loop.c:56:0:
include/linux/file.h:14:13: note: previous declaration of ‘fput’ was here
drivers/block/loop.c:1085:2: error: expected identifier or ‘(’ before ‘return’
drivers/block/loop.c:1086:1: error: expected identifier or ‘(’ before ‘}’ token
  CC      crypto/gf128mul.o
  CC      lib/sort.o
drivers/block/loop.c: In function ‘loop_clr_fd’:
drivers/block/loop.c:1076:2: warning: control reaches end of non-void function [-Wreturn-type]
  CC      lib/parser.o
  CC [M]  sound/pci/atiixp.o
make[2]: *** [drivers/block/loop.o] Error 1

CAI Qian
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index fe5f640..d6c5764 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1057,14 +1057,15 @@ static int loop_clr_fd(struct loop_device *lo)
>  		struct disk_part_iter piter;
>  		struct hd_struct *part;
>  
> -		mutex_lock_nested(&bdev->bd_mutex, 1);
> -		invalidate_partition(bdev->bd_disk, 0);
> -		disk_part_iter_init(&piter, bdev->bd_disk,
> -					DISK_PITER_INCL_EMPTY);
> -		while ((part = disk_part_iter_next(&piter)))
> -			delete_partition(bdev->bd_disk, part->partno);
> -		disk_part_iter_exit(&piter);
> -		mutex_unlock(&bdev->bd_mutex);
> +		if (mutex_trylock(&bdev->bd_mutex, 1))
> +			invalidate_partition(bdev->bd_disk, 0);
> +			disk_part_iter_init(&piter, bdev->bd_disk,
> +						DISK_PITER_INCL_EMPTY);
> +			while ((part = disk_part_iter_next(&piter)))
> +				delete_partition(bdev->bd_disk, part->partno);
> +			disk_part_iter_exit(&piter);
> +			mutex_unlock(&bdev->bd_mutex);
> +		}
>  	}
>  
>  	/*
> 
> --
> Jens Axboe
> 
> 

xfs deadlock on 3.9-rc5 running xfstests case #78

2013-04-02 Thread CAI Qian

Seen on almost all servers, ranging across x64, ppc64 and s390x, with kernel
3.9-rc5 and xfsprogs-3.1.10. I never hit this on 3.9-rc4, so it looks like
something new broke it. The log, with sysrq debug info, is here:
http://people.redhat.com/qcai/stable/log

CAI Qian

NULL pointer at kset_find_obj

2013-04-02 Thread CAI Qian

 07 89 c2 c1 ea 10 66 39 c2 74 0e 0f 1f 40 00 f3 90 0f b7 
07 66 39 d0 75 f6 5d c3 0f 1f 40 00 66 66 66 66 90 55 48  
[   60.348015] BUG: soft lockup - CPU#3 stuck for 23s! [systemd-udevd:454] 
[   60.348015] Modules linked in: bnx2(F+) dcdbas(F+) edac_mce_amd(F+) kvm(F+) 
microcode(F+) i2c_nforce2(F) edac_core(F) k8temp(F) shpchp(F) serio_raw(F) 
pcspkr(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F) sr_mod(F) cdrom(F) 
mptsas(F) radeon(F) i2c_algo_bit(F) scsi_transport_sas(F) drm_kms_helper(F) 
mptscsih(F) ttm(F) drm(F) mptbase(F) i2c_core(F) usb_storage(F) dm_mirror(F) 
dm_region_hash(F) dm_log(F) dm_mod(F) 
[   60.348015] CPU 3  
[   60.348015] Pid: 454, comm: systemd-udevd Tainted: GF D  3.9.0-rc5+ 
#1 Dell Inc. PowerEdge M605/0NC596 
[   60.348015] RIP: 0010:[81628375]  [81628375] 
_raw_spin_lock+0x25/0x30 
[   60.348015] RSP: 0018:88011b0bbd68  EFLAGS: 0293 
[   60.348015] RAX: 013b RBX: 81648fc0 RCX: 
 
[   60.348015] RDX: 013d RSI: a02e5398 RDI: 
88011c13ba08 
[   60.348015] RBP: 88011b0bbd68 R08: 8000 R09: 
7fff 
[   60.348015] R10: 0001 R11: c900047b3172 R12: 
8802 
[   60.348015] R13:  R14:  R15: 
 
[   60.348015] FS:  7ffe5be08880() GS:88011fc8() 
knlGS: 
[   60.348015] CS:  0010 DS:  ES:  CR0: 8005003b 
[   60.348015] CR2: 7ffe5bc54000 CR3: 00011b09d000 CR4: 
07e0 
[   60.348015] DR0:  DR1:  DR2: 
 
[   60.348015] DR3:  DR6: 0ff0 DR7: 
0400 
[   60.348015] Process systemd-udevd (pid: 454, threadinfo 88011b0ba000, 
task 88011a1c) 
[   60.348015] Stack: 
[   60.348015]  88011b0bbd98 813046fc  
a02e5398 
[   60.348015]   a02e5380 88011b0bbed8 
810c827d 
[   60.348015]  81321880 c900047b6fff 88011d00c120 
c900047b7000 
[   60.348015] Call Trace: 
[   60.348015]  [813046fc] kset_find_obj+0x1c/0x80 
[   60.348015]  [810c827d] load_module+0xb0d/0x1b00 
[   60.348015]  [81321880] ? ddebug_proc_open+0xc0/0xc0 
[   60.348015]  [81628cd8] ? page_fault+0x28/0x30 
[   60.348015]  [810c9347] sys_init_module+0xd7/0x120 
[   60.348015]  [81630d59] system_call_fastpath+0x16/0x1b 
[   60.348015] Code: 90 90 90 90 90 90 66 66 66 66 90 55 b8 00 00 01 00 48 89 
e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 0e 0f 1f 40 00 f3 90 0f b7 07 66 39 
d0 75 f6 5d c3 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 66  
[   60.200012] RDX: 013e RSI: a033a278 RDI: 
88011c13ba08 
[   60.200012] RBP: 88007a90fd68 R08: 8000 R09: 
7fff 
[   60.200012] R10: 0001 R11: c90004813a72 R12: 
00d0 
[   60.200012] R13:  R14: 0001 R15: 
88011ffa1340 
[   60.200012] FS:  7ffe5be08880() GS:88011fc0() 
knlGS: 
[   60.200012] CS:  0010 DS:  ES:  CR0: 80050033 
[   60.200012] CR2: 7ffe5bc54000 CR3: 6a5f6000 CR4: 
07e0 
[   60.200012] DR0:  DR1:  DR2: 
 
[   60.200012] DR3:  DR6: 0ff0 DR7: 
0400 
[   60.200012] Process systemd-udevd (pid: 357, threadinfo 88007a90e000, 
task 88006a5c1ac0) 
[   60.200012] Stack: 
[   60.200012]  88007a90fd98 813046fc  
a033a278 
[   60.200012]   a033a260 88007a90fed8 
810c827d 
[   60.200012]  81321880 c9000481dfff 88011d00c128 
c9000481e000 
[   60.200012] Call Trace: 
[   60.200012]  [813046fc] kset_find_obj+0x1c/0x80 
[   60.200012]  [810c827d] load_module+0xb0d/0x1b00 
[   60.200012]  [81321880] ? ddebug_proc_open+0xc0/0xc0 
[   60.200012]  [81628cd8] ? page_fault+0x28/0x30 
[   60.200012]  [810c9347] sys_init_module+0xd7/0x120 
[   60.200012]  [81630d59] system_call_fastpath+0x16/0x1b 

CAI Qian

bisected! (WAS Re: s390x: kernel BUG at fs/ext4/inode.c:1591!)

2013-04-01 Thread CAI Qian

Bisecting points to this commit as the culprit:

0e401101db49959f5783f6ee9e676124b5a183ac
ext4: fix memory leakage in mext_check_coverage

The following log was captured with Dmitry's debug patch applied:

CAI Qian

Ý  101.408610¨ ES cache assertation failed for inode: 753 es_cached ex Ý56/5/744
81/20¨ != found ex Ý56/5/3396400/0¨ retval 0 flags 5
Ý  209.858899¨ ES cache assertation failed for inode: 384 es_cached ex Ý57/7/332
82/20¨ != found ex Ý57/7/3396400/0¨ retval 0 flags 5
Ý  209.860656¨ ES cache assertation failed for inode: 384 es_cached ex Ý25/1/332
50/20¨ != found ex Ý25/1/0/0¨ retval 0 flags 0
Ý  209.893587¨ ES cache assertation failed for inode: 384 es_cached ex Ý22/1/332
47/20¨ != found ex Ý22/1/34838/1000¨ retval 1 flags 0
Ý  209.913482¨ ES cache assertation failed for inode: 384 es_cached ex Ý27/1/329
40/20¨ != found ex Ý27/1/0/0¨ retval 0 flags 0
Ý  209.919950¨ ES cache assertation failed for inode: 384 es_cached ex Ý59/5/338
48/20¨ != found ex Ý59/5/3396400/0¨ retval 0 flags 5
Ý  209.931856¨ ES cache assertation failed for inode: 384 es_cached ex Ý7/1/3292
0/20¨ != found ex Ý7/1/35879/20¨ retval 1 flags 43
Ý  209.969282¨ ES cache assertation failed for inode: 384 es_cached ex Ý35/1/361
97/20¨ != found ex Ý35/1/36197/1000¨ retval 1 flags 0
Ý  209.969290¨ ES cache assertation failed for inode: 384 es_cached ex Ý48/1/362
10/20¨ != found ex Ý48/1/0/0¨ retval 0 flags 0
Ý  209.980724¨ ES cache assertation failed for inode: 384 es_cached ex Ý13/4/334
89/20¨ != found ex Ý13/4/2161372/0¨ retval 0 flags 5
Ý  209.980744¨ ES cache assertation failed for inode: 384 es_cached ex Ý61/3/335
37/20¨ != found ex Ý61/3/3396400/0¨ retval 0 flags 5
Ý  209.983848¨ ES cache assertation failed for inode: 384 es_cached ex Ý44/2/335
20/20¨ != found ex Ý44/2/36216/20¨ retval 2 flags 43
Ý  210.020041¨ ES cache assertation failed for inode: 384 es_cached ex Ý61/3/341
91/20¨ != found ex Ý61/3/3396400/0¨ retval 0 flags 5
Ý  210.050100¨ ES cache assertation failed for inode: 384 es_cached ex Ý22/11/34
565/20¨ != found ex Ý22/11/3396400/0¨ retval 0 flags 5
Ý  210.053271¨ ES cache assertation failed for inode: 384 es_cached ex Ý15/1/334
90/20¨ != found ex Ý15/1/33579/1000¨ retval 1 flags 1
Ý  210.053275¨ mpage_da_submit_io failed block=33490 != b_blocknr=33579
Ý  210.053277¨ ino:384 lbkl:15, b_state=0x1023, b_size=4096
Ý  210.053320¨ Ý cut here ¨
Ý  210.053323¨ kernel BUG at fs/ext4/inode.c:1639!
Ý  210.053402¨ illegal operation: 0001 Ý#1¨ SMP
Ý  210.053405¨ Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast
 ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ipt
able_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defra
g_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tab
les iptable_filter ip_tables sg qeth_l2 vmur xfs libcrc32c dasd_fba_mod dasd_eck
d_mod lcs dasd_mod qeth ctcm qdio ccwgroup fsm dm_mirror dm_region_hash dm_log d
m_mod
[  210.053434] CPU: 0 Not tainted 3.8.0-rc3+ #16
[  210.053436] Process fsx (pid: 20565, task: 2c358000, ksp: 2c0
8f480)
[  210.053439] Krnl PSW : 0704f0018000 003033e8 (mpage_da_submit_io
0x3d4/0x408)
[  210.053450]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 EA:
3
Krnl GPRS: 0015 0001 0030 031b4508
[  210.053455]003033e4  1000 000
71000
[  210.053457]2c08fa98 03d100a8c6c0 2c08fb68 000
f
[  210.053460]82d2 2204d068 003033e4 000
02c08f970
[  210.053473] Krnl Code: 003033d8: c02000215447larl%r2,72dc
66
   003033de: c0e50016788f   brasl   %r14,5d24fc
  #003033e4: a7f40001   brc 15,3033e6
  >003033e8: a7f40001   brc 15,3033ea
   003033ec: a7f40001   brc 15,3033ee
   003033f0: 4120f0e8   la  %r2,232(%r15)
   003033f4: a718   lhi %r1,0
   003033f8: 5010f0d8   st  %r1,216(%r15)
[  210.053497] Call Trace:
[  210.053498] ([<003033e4>] mpage_da_submit_io+0x3d0/0x408)
[  210.053501]  [<00309a48>] mpage_da_map_and_submit+0x150/0x41c
[  210.053505]  [<0030a212>] write_cache_pages_da+0x4fe/0x530
[  210.053509]  [<0030a584>] ext4_da_writepages+0x340/0x628
[  210.053512]  [<002024d2>] __filemap_fdatawrite_range+0x6e/0x7c
[  210.053518]  [<002025fc>] filemap_write_and_wait_range+0x54/0x8c
[  210.053521]  [<002fe0f8>] ext4_sync_file+0x7c/0x3d8
[  210.053524]  [<0023c932>] SyS_msync+0x14e/0x1d8
[  210.053528]  [<005de66e>] sysc_tracego+0x14/0x1a
[  210.053533]  [<03fffd0e1240>] 0x3fffd0e1240
[  210.053536] Last Breaking-Event-Address:
[  210
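
Editorial sketch (not part of the original message): the extent-status reports in the log above are easier to compare once each one is parsed into its two extent tuples. The Python below is a minimal, hedged example of doing that for a single report that has been rejoined onto one line, with the extent tuples written in square brackets. The field names (lblk, length, pblk, status) are an assumption about what the four slash-separated values mean; the log itself does not label them. The helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    # Field meanings are assumed, not stated in the log.
    lblk: int     # assumed: logical block
    length: int   # assumed: length in blocks
    pblk: int     # assumed: physical block
    status: str   # last field, kept as raw text


class Mismatch(NamedTuple):
    inode: int
    cached: Extent
    found: Extent
    retval: int
    flags: str


_EXTENT = r"\[(\d+)/(\d+)/(\d+)/(\w+)\]"
_LINE = re.compile(
    r"ES cache assertation failed for inode: (\d+) "
    r"es_cached ex " + _EXTENT + r" != found ex " + _EXTENT
    + r" retval (-?\d+) flags (\w+)"
)


def parse_mismatch(line: str) -> Optional[Mismatch]:
    """Parse one rejoined report line; return None if it does not match."""
    m = _LINE.search(line)
    if m is None:
        return None
    g = m.groups()
    cached = Extent(int(g[1]), int(g[2]), int(g[3]), g[4])
    found = Extent(int(g[5]), int(g[6]), int(g[7]), g[8])
    return Mismatch(int(g[0]), cached, found, int(g[9]), g[10])


# The report stamped 210.053271 in the log, rejoined onto one line.
sample = (
    "[  210.053271] ES cache assertation failed for inode: 384 "
    "es_cached ex [15/1/33490/20] != found ex [15/1/33579/1000] "
    "retval 1 flags 1"
)
report = parse_mismatch(sample)
if report is not None:
    print(report.inode, report.cached.pblk, report.found.pblk)
```

For this report the third fields are 33490 (cached) and 33579 (found), the same two values that appear in the following "mpage_da_submit_io failed block=33490 != b_blocknr=33579" line.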

Re: s390x: kernel BUG at fs/ext4/inode.c:1591!

2013-04-01 Thread CAI Qian



- Original Message -
> From: "Dmitry Monakhov" 
> To: "CAI Qian" , "Theodore Ts'o" 
> Cc: "LKML" , "linux-s390" 
> , "Steve Best"
> , linux-e...@vger.kernel.org
> Sent: Monday, April 1, 2013 2:07:35 PM
> Subject: Re: s390x: kernel BUG at fs/ext4/inode.c:1591!
> 
> On Fri, 29 Mar 2013 05:27:02 -0400 (EDT), CAI Qian 
> wrote:
> > 
> I spent half of the weekend trying to create an s390x guest image,
> without any success. Could you please share it?
Well, do you have access to something like an IBM System z machine?
http://en.wikipedia.org/wiki/IBM_System_z10

It can also be reproduced on System p:
http://en.wikipedia.org/wiki/IBM_System_p

I have never seen it on x86 so far, though.
CAI Qian
> > 
> > - Original Message -
> > > From: "Theodore Ts'o" 
> > > To: "CAI Qian" 
> > > Cc: "LKML" , "linux-s390"
> > > , "Steve Best"
> > > , linux-e...@vger.kernel.org
> > > Sent: Thursday, March 28, 2013 8:05:17 PM
> > > Subject: Re: s390x: kernel BUG at fs/ext4/inode.c:1591!
> > > 
> > > On Thu, Mar 28, 2013 at 02:40:33AM -0400, CAI Qian wrote:
> > > > System hung when running xfstests-dev 013 test case on an s390x
> > > > guest. Never saw
> > > > this on 3.9-rc3 before but need to double-check. Any idea?
> > > > 
> > > > [ 1113.795759] [ cut here ]
> > > > [ 1113.795771] kernel BUG at fs/ext4/inode.c:1591!
> > > 
> > > Thanks for the report.  What kernel version did this come from?  Was
> > > it 3.9-rc4?  (line 1591 for 3.9-rc3 doesn't contain a BUG_ON).
> > Yes, the latest mainline.
> > > 
> > > If it is indeed 3.9-rc4, it would be helpful, since you can reproduce
> > > the problem, to insert a debugging printk which fires when
> > > bh->b_blocknr != pblock before the BUG_ON, and have it print the
> > > b_blocknr and pblock values.
> > bh->b_blocknr=100346
> > pblock=66797
> > 
> > Bisecting results so far,
> > git bisect start
> > # good: [a937536b868b8369b98967929045f1df54234323] Linux 3.9-rc3
> > git bisect good a937536b868b8369b98967929045f1df54234323
> > # bad: [9064171268d838b8f283fe111ef086b9479d059a] Merge tag 'for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes
> > git bisect bad 9064171268d838b8f283fe111ef086b9479d059a
> > # bad: [38d78e587d4960d0db94add518d27ee74bad2301] mqueue: sys_mq_open: do
> > not call mnt_drop_write() if read-only
> > git bisect bad 38d78e587d4960d0db94add518d27ee74bad2301
> > # good: [e7489622d3603b7d161b484dcd340d9f678b0c7a] Merge tag 'arm64-fixes'
> > of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
> > git bisect good e7489622d3603b7d161b484dcd340d9f678b0c7a
> > # good: [172a271b5e090da7468c66b9ccbcdb3d929eed75] Merge branch 'drm-fixes'
> > of git://people.freedesktop.org/~airlied/linux
> > git bisect good 172a271b5e090da7468c66b9ccbcdb3d929eed75
> > # good: [0a7e453103b9718d357688b83bb968ee108cc874] Merge branch 'next' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
> > git bisect good 0a7e453103b9718d357688b83bb968ee108cc874
> > # bad: [0e401101db49959f5783f6ee9e676124b5a183ac] ext4: fix memory leakage
> > in mext_check_coverage
> > git bisect bad 0e401101db49959f5783f6ee9e676124b5a183ac
> > CAI Qian
> > > 
> > > Thanks,
> > > 
> > >   - Ted
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
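
Editorial sketch (not part of the original thread): the quoted "Bisecting results so far" log can be summarized mechanically. The Python below is a minimal example, assuming the log consists of "git bisect good <sha>" and "git bisect bad <sha>" command lines, possibly prefixed with email quote markers. It collects the marks in order and reports the most recently marked bad commit, which is only the current boundary since the bisection in the thread was not finished. The function and variable names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches the command lines of a bisect log; comment lines ("# good: ...") are skipped.
_MARK = re.compile(r"^git bisect (good|bad) ([0-9a-f]{7,40})$")


def bisect_marks(log_text: str) -> List[Tuple[str, str]]:
    """Return (verdict, sha) pairs in the order they appear in the log."""
    marks = []
    for raw in log_text.splitlines():
        line = raw.lstrip("> ").strip()  # drop email quote prefixes
        m = _MARK.match(line)
        if m:
            marks.append((m.group(1), m.group(2)))
    return marks


def latest_bad(marks: List[Tuple[str, str]]) -> Optional[str]:
    """Most recently marked bad commit, or None if nothing was marked bad."""
    for verdict, sha in reversed(marks):
        if verdict == "bad":
            return sha
    return None


# Command lines copied from the quoted bisect log in the thread.
log = """\
> > git bisect start
> > git bisect good a937536b868b8369b98967929045f1df54234323
> > git bisect bad 9064171268d838b8f283fe111ef086b9479d059a
> > git bisect bad 38d78e587d4960d0db94add518d27ee74bad2301
> > git bisect good e7489622d3603b7d161b484dcd340d9f678b0c7a
> > git bisect good 172a271b5e090da7468c66b9ccbcdb3d929eed75
> > git bisect good 0a7e453103b9718d357688b83bb968ee108cc874
> > git bisect bad 0e401101db49959f5783f6ee9e676124b5a183ac
"""
marks = bisect_marks(log)
last_bad = latest_bad(marks)
print(len(marks), "marks; latest bad:", last_bad)
```

In the quoted log, the commit with that hash is annotated as "ext4: fix memory leakage in mext_check_coverage".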
