Re: Crash of 3.12-rc2 BUG: unable to handle kernel NULL pointer dereference

2013-09-27 Thread Russell King - ARM Linux
On Fri, Sep 27, 2013 at 10:04:44AM -0600, Bjorn Helgaas wrote:
> [+cc Thomas, Russell]

Someone is doing something quite bad in the kernel, and as yet I've not
figured out a way to track it down.

The issue is this: someone is kfree'ing a kobject before its release
function has been called, and the memory is being re-used.  The problem
is that when the last reference has been dropped with the debug enabled,
the kobject is linked into the timer lists for the delayed work.  When
the timer lists get run, they're found to be corrupted.
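
To make the lifetime rule above concrete, here is a rough userspace sketch, not kernel code. Every name in it (toy_kobj, toy_put, toy_run_pending and so on) is invented for illustration, and a plain linked list stands in for the timer lists. It shows only that once the last reference is dropped with a deferred release, the object itself is still linked on the pending list, so its memory has to stay valid until the release callback has run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kobject; not the kernel's struct kobject. */
struct toy_kobj {
    int refcount;
    void (*release)(struct toy_kobj *self);
    struct toy_kobj *next_pending;   /* models the embedded delayed work */
};

static struct toy_kobj *pending_head; /* models the timer list */
static int released_count;

static void toy_init(struct toy_kobj *k, void (*release)(struct toy_kobj *))
{
    k->refcount = 1;
    k->release = release;
    k->next_pending = NULL;
}

/* Drop a reference. On the last put the release is deferred, which
 * links the object itself into the pending list. */
static void toy_put(struct toy_kobj *k)
{
    assert(k->refcount > 0);
    if (--k->refcount == 0) {
        k->next_pending = pending_head;
        pending_head = k;
    }
}

/* Count the objects still waiting for their release callback. */
static int toy_pending_count(void)
{
    int n = 0;
    for (struct toy_kobj *k = pending_head; k; k = k->next_pending)
        n++;
    return n;
}

/* Runs later, like the timer firing: unlink each object, then release it. */
static void toy_run_pending(void)
{
    while (pending_head) {
        struct toy_kobj *k = pending_head;
        pending_head = k->next_pending;  /* read the link before release frees k */
        k->release(k);
    }
}

/* Correct owner behaviour: the memory is freed only from release. */
static void toy_release_free(struct toy_kobj *k)
{
    released_count++;
    free(k);
}

/* The broken pattern would be "toy_put(k); free(k);" in the caller:
 * k would still be on the pending list, and once the allocator reuses
 * the memory, walking the list reads garbage. */
```

A caller would allocate the object, call toy_init() with toy_release_free, and never free the object directly after toy_put().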

The obvious solution to this is to move the delayed work out of the
kobject into a separately allocated structure.  That would work if
x86 didn't register kobjects very early in boot, before the memory
allocators were up and running.
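
A minimal userspace sketch of that idea and of its limitation follows. It is an illustration under assumptions, not a proposed patch: the names (struct deferred, obj_init, early_alloc, allocator_ready) are hypothetical, and allocating the node at init time is just one possible choice. The separately allocated node, rather than the object, sits on the pending list, so the list linkage survives if the object's memory is reused, although calling release on a freed object would of course still be a bug. The catch is that initialization now needs a working allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct obj;

/* Separately allocated node: this, not the object, goes on the list. */
struct deferred {
    struct deferred *next;
    struct obj *target;
};

/* Hypothetical object type standing in for a kobject. */
struct obj {
    int refcount;
    struct deferred *work;            /* allocated up front, at init time */
    void (*release)(struct obj *self);
};

static bool allocator_ready;          /* models "allocators up and running" */
static struct deferred *pending;      /* models the timer list */
static int released;

/* Allocation fails while the (simulated) allocator is not yet available. */
static void *early_alloc(size_t n)
{
    return allocator_ready ? malloc(n) : NULL;
}

/* Returns 0 on success, -1 if the deferred node cannot be allocated. */
static int obj_init(struct obj *o, void (*release)(struct obj *))
{
    o->work = early_alloc(sizeof(*o->work));
    if (!o->work)
        return -1;
    o->work->next = NULL;
    o->work->target = o;
    o->refcount = 1;
    o->release = release;
    return 0;
}

/* On the last put, queue the separate node instead of the object. */
static void obj_put(struct obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        o->work->next = pending;
        pending = o->work;
    }
}

/* Runs later: unlink each node, call the object's release, free the node. */
static void run_pending(void)
{
    while (pending) {
        struct deferred *d = pending;
        pending = d->next;
        d->target->release(d->target);
        free(d);
    }
}

/* Release callback for objects that were not heap-allocated. */
static void count_release(struct obj *o)
{
    (void)o;
    released++;
}
```

With allocator_ready false, obj_init() fails, which is the early-boot problem described above.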

Frankly, I've no idea how to solve this.  So I regard x86 as just being
difficult and broken.  :)

If anyone has any ideas, then I'm all ears.
http://www.annhuey.com/ed-pix/fa_i-pix/I%27m-All-Ears.jpg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Crash of 3.12-rc2 BUG: unable to handle kernel NULL pointer dereference

2013-09-27 Thread Bjorn Helgaas
[+cc Thomas, Russell]

On Fri, Sep 27, 2013 at 7:13 AM, Zdenek Kabelac wrote:
> On 27.9.2013 13:57, Zdenek Kabelac wrote:
>
>> Hi
>>
>>
>> I'm trying to use the -rc2 kernel, but I'm getting kernel panics quite
>> often:
>>
>> Here is a BUG trace from kvm running this kernel:
>> (I'm building the kernel with some kernel debug checks)
>> (The kernel runs in 64bit qemu with a 32bit Debian environment)
>> linux-vanilla git: 4b97280675f45c1650ee4e388bd711ecbb18c4b4
>> (on top of that there are a few minor unrelated patches)
>>
>>
>> [  235.631952] loop: module loaded
>> [  235.971853] bio: create slab bio-1 at 1
>> [  237.355014] bio: create slab bio-2 at 2
>> [  237.671371] BUG: unable to handle kernel NULL pointer dereference at 0018
>> [  237.674537] IP: [8105a008] get_next_timer_interrupt+0x168/0x250
>> [  237.674537] PGD 16939067 PUD 14257067 PMD 0
>> [  237.674537] Oops:  [#1] PREEMPT SMP
>> [  237.674537] Modules linked in: loop dm_thin_pool dm_persistent_data
>
>
>
> Here is the same trace from my native hardware, a Lenovo T61:
>
> I suspect the new debug option CONFIG_DEBUG_KOBJECT_RELEASE
> (which I've recently enabled).
>
> I've also noticed there are much older reports of this problem,
> e.g. https://lkml.org/lkml/2013/3/9/3
>
> I can trigger this bug very easily (it makes 3.12-rc2 unusable for my desktop).

Yep, I see this crash 100% of the time with v3.12-rc2 and
CONFIG_DEBUG_KOBJECT_RELEASE=y with this qemu invocation and attached
q35-chipset.cfg:

/usr/local/bin/qemu-system-x86_64 -M q35 -readconfig ./q35-chipset.cfg
-enable-kvm -m 512 -drive file=ubuntu.img,if=none,id=mydisk -device
ide-drive,drive=mydisk,bus=ide.0 -nographic -monitor
telnet:localhost:7001,server,nowait,nodelay -kernel
~/linux/arch/x86/boot/bzImage -append "console=ttyS0,115200n8
root=/dev/sda1 ignore_loglevel printk.time=n"


q35-chipset.cfg
Description: Binary data



Crash of 3.12-rc2 BUG: unable to handle kernel NULL pointer dereference

2013-09-27 Thread Zdenek Kabelac

Hi


I'm trying to use the -rc2 kernel, but I'm getting kernel panics quite
often:


Here is a BUG trace from kvm running this kernel:
(I'm building the kernel with some kernel debug checks)
(The kernel runs in 64bit qemu with a 32bit Debian environment)
linux-vanilla git: 4b97280675f45c1650ee4e388bd711ecbb18c4b4
(on top of that there are a few minor unrelated patches)


[  235.631952] loop: module loaded
[  235.971853] bio: create slab bio-1 at 1
[  237.355014] bio: create slab bio-2 at 2
[  237.671371] BUG: unable to handle kernel NULL pointer dereference at 
0018

[  237.674537] IP: [8105a008] get_next_timer_interrupt+0x168/0x250
[  237.674537] PGD 16939067 PUD 14257067 PMD 0
[  237.674537] Oops:  [#1] PREEMPT SMP
[  237.674537] Modules linked in: loop dm_thin_pool dm_persistent_data 
dm_bufio dm_bio_prison libcrc32c nfsv4 nfs nfsd auth_rpcgss oid_registry 
exportfs nfs_acl lockd sunrpc autofs4 fuse dm_crypt dm_mod uhci_hcd ehci_hcd 
virtio_net serio_raw usbcore floppy i2c_piix4 i2c_core usb_common evdev
[  237.674537] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
3.12.0-rc2-00088-gfcbfc0d #163

[  237.674537] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  237.674537] task: 81a114c0 ti: 81a0 task.ti: 
81a0
[  237.674537] RIP: 0010:[8105a008]  [8105a008] 
get_next_timer_interrupt+0x168/0x250

[  237.674537] RSP: :81a01e50  EFLAGS: 00010013
[  237.674537] RAX:  RBX: b6f5 RCX: 0001
[  237.674537] RDX: 0001 RSI: 81dfc598 RDI: 00b7
[  237.674537] RBP: 81a01e98 R08: 0001 R09: 0037
[  237.674537] R10: 0037 R11: 81dfc228 R12: 00013fffb6f4
[  237.674537] R13: 81dfb1c0 R14: 81a01e58 R15: 81a01e70
[  237.674537] FS:  () GS:88001fc0() 
knlGS:

[  237.674537] CS:  0010 DS:  ES:  CR0: 8005003b
[  237.674537] CR2: 0018 CR3: 1c799000 CR4: 06f0
[  237.674537] Stack:
[  237.674537]  81dfc228 81dfc628 81dfca28 
81dfce28
[  237.674537]   0037564f1895 b6f5 
88001fc0d080
[  237.674537]  88001fc0de40 81a01f00 810bdce5 
002d6bc39000

[  237.674537] Call Trace:
[  237.674537]  [810bdce5] __tick_nohz_idle_enter+0x2e5/0x550
[  237.674537]  [810bdf91] tick_nohz_idle_enter+0x41/0x70
[  237.674537]  [810ac89c] cpu_startup_entry+0x3c/0x400
[  237.674537]  [8158bce2] rest_init+0x132/0x140
[  237.674537]  [8158bbb5] ? rest_init+0x5/0x140
[  237.674537]  [81cb1e49] start_kernel+0x3c2/0x3cf
[  237.674537]  [81cb188f] ? repair_env_string+0x5c/0x5c
[  237.674537]  [81cb15a3] x86_64_start_reservations+0x2a/0x2c
[  237.674537]  [81cb1696] x86_64_start_kernel+0xf1/0xf4
[  237.674537] Code: 89 fa 41 83 e2 3f 45 89 d1 66 2e 0f 1f 84 00 00 00 00 00 
49 63 f1 48 c1 e6 04 4c 01 de 48 8b 06 48 39 f0 74 25 66 0f 1f 44 00 00 f6 
40 18 01 75 11 48 8b 48 10 41 b8 01 00 00 00 48 39 d1 48 0f

[  237.674537] RIP  [8105a008] get_next_timer_interrupt+0x168/0x250
[  237.674537]  RSP 81a01e50
[  237.674537] CR2: 0018
[  237.674537] ---[ end trace 4cd6f72a56546bde ]---




Kernel config:


CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_RCU_FAST_NO_HZ=y
CONFIG_TREE_RCU_TRACE=y
CONFIG_IKCONFIG=m
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_SWAP_ENABLED=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
CONFIG_MM_OWNER=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_UID16=y
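CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y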

Re: Crash of 3.12-rc2 BUG: unable to handle kernel NULL pointer dereference

2013-09-27 Thread Zdenek Kabelac

On 27.9.2013 13:57, Zdenek Kabelac wrote:

> Hi
>
>
> I'm trying to use the -rc2 kernel, but I'm getting kernel panics quite
> often:
>
> Here is a BUG trace from kvm running this kernel:
> (I'm building the kernel with some kernel debug checks)
> (The kernel runs in 64bit qemu with a 32bit Debian environment)
> linux-vanilla git: 4b97280675f45c1650ee4e388bd711ecbb18c4b4
> (on top of that there are a few minor unrelated patches)
>
>
> [  235.631952] loop: module loaded
> [  235.971853] bio: create slab bio-1 at 1
> [  237.355014] bio: create slab bio-2 at 2
> [  237.671371] BUG: unable to handle kernel NULL pointer dereference at 0018
> [  237.674537] IP: [8105a008] get_next_timer_interrupt+0x168/0x250
> [  237.674537] PGD 16939067 PUD 14257067 PMD 0
> [  237.674537] Oops:  [#1] PREEMPT SMP
> [  237.674537] Modules linked in: loop dm_thin_pool dm_persistent_data



Here is the same trace from my native hardware, a Lenovo T61:

I suspect the new debug option CONFIG_DEBUG_KOBJECT_RELEASE
(which I've recently enabled).

I've also noticed there are much older reports of this problem,
e.g. https://lkml.org/lkml/2013/3/9/3

I can trigger this bug very easily (it makes 3.12-rc2 unusable for my desktop).


[  120.327263] bio: create slab bio-1 at 1
[  120.633731] bio: create slab bio-2 at 2
[  120.662856] BUG: unable to handle kernel NULL pointer dereference at 
0018

[  120.666137] IP: [8105a008] get_next_timer_interrupt+0x168/0x250
[  120.666137] PGD 0
[  120.666137] Oops:  [#1] PREEMPT SMP
[  120.666137] Modules linked in: dm_thin_pool dm_persistent_data dm_bufio 
dm_bio_prison dm_mod libcrc32c ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
xt_CHECKSUM iptable_mangle xt_tcpudp tun bridge stp llc ipv6 ip6_tables 
iptable_filter ip_tables ebtable_nat ebtables x_tables bnep btusb bluetooth 
hid_generic usbhid hid snd_hda_codec_analog arc4 iTCO_wdt iTCO_vendor_support 
coretemp iwl3945 kvm_intel iwlegacy kvm mac80211 snd_hda_intel snd_hda_codec 
snd_seq microcode snd_seq_device sdhci_pci r852 cfg80211 sm_common psmouse 
nand sdhci i2c_i801 e1000e nand_ecc snd_pcm nand_ids i2c_core serio_raw r592 
mmc_core mtd lpc_ich memstick mfd_core ptp snd_page_alloc snd_timer 
thinkpad_acpi pps_core wmi nvram snd soundcore evdev binfmt_misc nfsd 
auth_rpcgss oid_registry exportfs nfs_acl lockd loop sunrpc pcmcia sr_mod 
cdrom yenta_socket ehci_pci uhci_hcd ehci_hcd usbcore usb_common video 
backlight autofs4
[  120.666137] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 
3.12.0-rc2-00088-gfcbfc0d #163
[  120.666137] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 
03/18/2011
[  120.666137] task: 81a114c0 ti: 81a0 task.ti: 
81a0
[  120.666137] RIP: 0010:[8105a008]  [8105a008] 
get_next_timer_interrupt+0x168/0x250

[  120.666137] RSP: 0018:81a01e50  EFLAGS: 00010013
[  120.666137] RAX:  RBX: 2dd6 RCX: 
[  120.666137] RDX:  RSI: 81dfc508 RDI: 002e
[  120.666137] RBP: 81a01e98 R08: 0001 R09: 002e
[  120.666137] R10: 002e R11: 81dfc228 R12: 00013fff2dd5
[  120.666137] R13: 81dfb1c0 R14: 81a01e58 R15: 81a01e70
[  120.666137] FS:  () GS:88013720() 
knlGS:

[  120.666137] CS:  0010 DS:  ES:  CR0: 8005003b
[  120.666137] CR2: 0018 CR3: 0001341c3000 CR4: 07f0
[  120.666137] Stack:
[  120.666137]  81dfc228 81dfc628 81dfca28 
81dfce28
[  120.666137]   001c18108669 2dd6 
88013720d080
[  120.666137]  88013720de40 81a01f00 810bdce5 
001b31c77648

[  120.666137] Call Trace:
[  120.666137]  [810bdce5] __tick_nohz_idle_enter+0x2e5/0x550
[  120.666137]  [810bdf91] tick_nohz_idle_enter+0x41/0x70
[  120.666137]  [810ac89c] cpu_startup_entry+0x3c/0x400
[  120.666137]  [8158bce2] rest_init+0x132/0x140
[  120.666137]  [8158bbb5] ? rest_init+0x5/0x140
[  120.666137]  [81cb1e49] start_kernel+0x3c2/0x3cf
[  120.666137]  [81cb188f] ? repair_env_string+0x5c/0x5c
[  120.666137]  [81cb15a3] x86_64_start_reservations+0x2a/0x2c
[  120.666137]  [81cb1696] x86_64_start_kernel+0xf1/0xf4
[  120.666137] Code: 89 fa 41 83 e2 3f 45 89 d1 66 2e 0f 1f 84 00 00 00 00 00 
49 63 f1 48 c1 e6 04 4c 01 de 48 8b 06 48 39 f0 74 25 66 0f 1f 44 00 00 f6 
40 18 01 75 11 48 8b 48 10 41 b8 01 00 00 00 48 39 d1 48 0f

[  120.666137] RIP  [8105a008] get_next_timer_interrupt+0x168/0x250
[  120.666137]  RSP 81a01e50
[  120.666137] CR2: 0018
[  120.666137] ---[ end trace c4429f55908a7532 ]---
[  120.666137] Kernel panic - not syncing: Attempted to kill the idle task!
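[  121.005821] BUG: spinlock lockup suspected on CPU#0, swapper/0/0
[  121.005821]  lock: boot_tvec_bases+0x0/0x2080, .magic: dead4ead, .owner: 
swapper/0/0, .owner_cpu: 0
[  121.005821] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  D W 
3.12.0-rc2-00088-gfcbfc0d #163
[  121.005821] Hardware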
