Re: [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity
On Sun, Mar 7, 2021 at 11:09 AM Hillf Danton wrote:
>
> On Sun, 7 Mar 2021 08:46:19 +0100 Dmitry Vyukov wrote:
> > On Sun, Mar 7, 2021 at 3:15 AM Hillf Danton wrote:
> > >
> > > Dmitry can you shed some light on the tricks to config kasan to print
> > > Call Trace as the reports with the leading [syzbot] on the subject line
> > > do?
> >
> > +kasan-dev
> >
> > Hi Hillf,
> >
> > KASAN prints stack traces always unconditionally. There is nothing you
> > need to do at all.
>
> Got it, thanks.
>
> > Do you have any reports w/o stack traces?
>
> No, but I saw different formats in Call Trace prints.
>
> Below from [1] is the instance without file name and line number printed,
> while both info help spot the cause of the reported issue.

KASAN always prints stack traces w/o file:line info, like any other
kernel bug detection facility. Kernel itself never symbolizes reports.
In case of syzkaller, syzkaller will symbolize reports and add
file:line info. The main config it requires is CONFIG_DEBUG_INFO.

You may see syzkaller kernel configuration guide here:
https://github.com/google/syzkaller/blob/master/docs/linux/kernel_configs.md
Or fragments that are actually used to generate syzbot configs in this
dir (the guide above may be out-of-date):
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/base.yml
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/debug.yml
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/kasan.yml
Or a complete syzbot config here:
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-apparmor-kasan.config

> I was running syzkaller and I found the following issue :
>
> Head Commit : b1313fe517ca3703119dcc99ef3bbf75ab42bcfb ( v5.10.4 )
> Git Tree : stable
> Console Output :
> [ 242.769080] INFO: task repro:2639 blocked for more than 120 seconds.
> [ 242.769096] Not tainted 5.10.4 #8
> [ 242.769103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 242.769112] task:repro state:D stack:0 pid: 2639 ppid: 2638 flags:0x0004
> [ 242.769126] Call Trace:
> [ 242.769148] __schedule+0x28d/0x7e0
> [ 242.769162] ? __percpu_counter_sum+0x75/0x90
> [ 242.769175] schedule+0x4f/0xc0
> [ 242.769187] __io_uring_task_cancel+0xad/0xf0
> [ 242.769198] ? wait_woken+0x80/0x80
> [ 242.769210] bprm_execve+0x67/0x8a0
> [ 242.769223] do_execveat_common+0x1d2/0x220
> [ 242.769235] __x64_sys_execveat+0x5d/0x70
> [ 242.769249] do_syscall_64+0x38/0x90
> [ 242.769260] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [1] https://lore.kernel.org/lkml/CAGyP=7cfm6bje7x2pn9yuptqgt5uqywm4avmoivayqpjg1p...@mail.gmail.com/
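As a rough illustration of the symbolization step discussed above, the frames of a raw Call Trace can be pulled out as symbol+offset strings, which is the form passed to scripts/faddr2line elsewhere in this thread. This is only a sketch and not how syzkaller does it; the regular expression and the helper names parse_frames and faddr2line_arg are assumptions made for the example.

```python
import re

# One call-trace frame: an optional "? " marker (a frame the unwinder is
# not sure about), symbol+offset/size, and an optional trailing "[module]".
FRAME_RE = re.compile(
    r"(?P<guess>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[A-Za-z_]\w*)\])?"
)


def parse_frames(trace_text):
    """Return one dict per frame found in a raw kernel call trace."""
    frames = []
    for m in FRAME_RE.finditer(trace_text):
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "reliable": m.group("guess") is None,
            "module": m.group("module"),
        })
    return frames


def faddr2line_arg(frame):
    """Format a frame as a symbol+offset argument (hypothetical helper)."""
    return "{}+{:#x}".format(frame["symbol"], frame["offset"])


# Three frames copied from the hung-task report quoted above.
sample = """\
[ 242.769148] __schedule+0x28d/0x7e0
[ 242.769162] ? __percpu_counter_sum+0x75/0x90
[ 242.769187] __io_uring_task_cancel+0xad/0xf0
"""

for frame in parse_frames(sample):
    if frame["reliable"]:
        print(faddr2line_arg(frame))
```

The printed strings still need a vmlinux built with CONFIG_DEBUG_INFO to be resolved to file:line.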
Re: [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity
On Sun, Mar 7, 2021 at 3:15 AM Hillf Danton wrote:
>
> On Fri, 5 Mar 2021 18:01:04 +0800 Ming Lei wrote:
> > On Fri, Mar 05, 2021 at 10:32:04AM +0100, Paolo Valente wrote:
> > > I'm thinking of a way to debug this too. The symptom may hint at a
> > > use-after-free. Could you enable KASAN in your tests? (On the flip
> > > side, I know this might change timings, thereby making the fault
> > > disappear).
> >
> > I have asked our QE to reproduce the issue with debug kernel, which may
> > take a while. And I can't trigger it in my box.
> >
> > BTW, for the 2nd 'kernel NULL pointer dereference', the RIP points to:
> >
> > (gdb) l *(__bfq_deactivate_entity+0x5b)
> > 0x814c31cb is in __bfq_deactivate_entity (block/bfq-wf2q.c:1181).
> > 1176     * bfq_group_set_parent has already been invoked for the group
> > 1177     * represented by entity. Therefore, the field
> > 1178     * entity->sched_data has been set, and we can safely use it.
> > 1179     */
> > 1180    st = bfq_entity_service_tree(entity);
> > 1181    is_in_service = entity == sd->in_service_entity;
> > 1182
> > 1183    bfq_calc_finish(entity, entity->service);
> > 1184
> > 1185    if (is_in_service)
> >
> > Seems entity->sched_data points to NULL.
>
> Hi Ming,
>
> Thanks for your report.
>
> Given the invalid pointer cannot explain line 1180, you are reporting
> a different issue from what Mike reported, and we can do nothing now
> for both without a reproducer.
>
> Dmitry can you shed some light on the tricks to config kasan to print
> Call Trace as the reports with the leading [syzbot] on the subject line do?

+kasan-dev

Hi Hillf,

KASAN prints stack traces always unconditionally. There is nothing you
need to do at all.
Do you have any reports w/o stack traces?

"[syzbot]" is prepended by syzbot code. If you want some prefix, you
would need to prepend it manually.
> > > Thanks, > > > Paolo > > > > > > > Il giorno 5 mar 2021, alle ore 10:27, Ming Lei > > > > ha scritto: > > > > > > > > Hello Hillf, > > > > > > > > Thanks for the debug patch. > > > > > > > > On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton wrote: > > > >> > > > >> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote: > > > >>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov > > > >>> wrote: > > > > > > Paolo, Jens I am sorry for the noise. > > > But today I hit the kernel panic and git blame said that you have > > > created the file in which happened panic (this I saw from trace) > > > > > > $ /usr/src/kernels/`uname -r`/scripts/faddr2line > > > /lib/debug/lib/modules/`uname -r`/vmlinux > > > __bfq_deactivate_entity+0x15a > > > __bfq_deactivate_entity+0x15a/0x240: > > > bfq_gt at block/bfq-wf2q.c:20 > > > (inlined by) bfq_insert at block/bfq-wf2q.c:381 > > > (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621 > > > (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203 > > > > > > https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203 > > > > > > $ head /sys/block/*/queue/scheduler > > > ==> /sys/block/nvme0n1/queue/scheduler <== > > > [none] mq-deadline kyber bfq > > > > > > ==> /sys/block/sda/queue/scheduler <== > > > mq-deadline kyber [bfq] none > > > > > > ==> /sys/block/zram0/queue/scheduler <== > > > none > > > > > > Trace: > > > general protection fault, probably for non-canonical address > > > 0x46b1b0f0d8856e4a: [#1] SMP NOPTI > > > CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW > > > - --- 5.9.0-0.rc8.28.fc34.x86_64 #1 > > > Hardware name: System manufacturer System Product Name/ROG STRIX > > > X570-I GAMING, BIOS 2606 08/13/2020 > > > Workqueue: kblockd blk_mq_run_work_fn > > > RIP: 0010:__bfq_deactivate_entity+0x15a/0x240 > > > Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d > > > 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> > > > 8b > > > 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 
48 0f 4f d6 > > > RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002 > > > RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a > > > RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb > > > RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: > > > R10: 0018 R11: 0018 R12: 8dc904927150 > > > R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88 > > > FS: () GS:8dc90e0c() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0 > > > Call Trace: > > > bfq_deactivate_entity+0x4f/0xc0 > > > >>> > > > >>> Hello, > > > >>> > > > >>> The same stack trace was observed in RH internal tes
Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
On Fri, Mar 05, 2021 at 10:32:04AM +0100, Paolo Valente wrote:
> I'm thinking of a way to debug this too. The symptom may hint at a
> use-after-free. Could you enable KASAN in your tests? (On the flip
> side, I know this might change timings, thereby making the fault
> disappear).

I have asked our QE to reproduce the issue with debug kernel, which may
take a while. And I can't trigger it in my box.

BTW, for the 2nd 'kernel NULL pointer dereference', the RIP points to:

(gdb) l *(__bfq_deactivate_entity+0x5b)
0x814c31cb is in __bfq_deactivate_entity (block/bfq-wf2q.c:1181).
1176     * bfq_group_set_parent has already been invoked for the group
1177     * represented by entity. Therefore, the field
1178     * entity->sched_data has been set, and we can safely use it.
1179     */
1180    st = bfq_entity_service_tree(entity);
1181    is_in_service = entity == sd->in_service_entity;
1182
1183    bfq_calc_finish(entity, entity->service);
1184
1185    if (is_in_service)

Seems entity->sched_data points to NULL.

>
> Thanks,
> Paolo
>
> > On 5 Mar 2021, at 10:27, Ming Lei wrote:
> >
> > Hello Hillf,
> >
> > Thanks for the debug patch.
> >
> > On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton wrote:
> >>
> >> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote:
> >>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov wrote:
> >
> > Paolo, Jens I am sorry for the noise.
> But today I hit the kernel panic and git blame said that you have > created the file in which happened panic (this I saw from trace) > > $ /usr/src/kernels/`uname -r`/scripts/faddr2line > /lib/debug/lib/modules/`uname -r`/vmlinux > __bfq_deactivate_entity+0x15a > __bfq_deactivate_entity+0x15a/0x240: > bfq_gt at block/bfq-wf2q.c:20 > (inlined by) bfq_insert at block/bfq-wf2q.c:381 > (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621 > (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203 > > https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203 > > $ head /sys/block/*/queue/scheduler > ==> /sys/block/nvme0n1/queue/scheduler <== > [none] mq-deadline kyber bfq > > ==> /sys/block/sda/queue/scheduler <== > mq-deadline kyber [bfq] none > > ==> /sys/block/zram0/queue/scheduler <== > none > > Trace: > general protection fault, probably for non-canonical address > 0x46b1b0f0d8856e4a: [#1] SMP NOPTI > CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW > - --- 5.9.0-0.rc8.28.fc34.x86_64 #1 > Hardware name: System manufacturer System Product Name/ROG STRIX > X570-I GAMING, BIOS 2606 08/13/2020 > Workqueue: kblockd blk_mq_run_work_fn > RIP: 0010:__bfq_deactivate_entity+0x15a/0x240 > Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d > 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b > 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6 > RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002 > RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a > RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb > RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: > R10: 0018 R11: 0018 R12: 8dc904927150 > R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88 > FS: () GS:8dc90e0c() > knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0 > Call Trace: > bfq_deactivate_entity+0x4f/0xc0 > >>> > >>> Hello, > >>> > >>> The same stack trace was observed in RH internal test too, and kernel > >>> is 
5.11.0-0.rc6, > >>> but there isn't reproducer yet. > >>> > >>> > >>> -- > >>> Ming Lei > >> > >> Add some debug info. > >> > >> --- x/block/bfq-wf2q.c > >> +++ y/block/bfq-wf2q.c > >> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq > >> > >>entity->on_st_or_in_serv = false; > >>st->wsum -= entity->weight; > >> - if (bfqq && !is_in_service) > >> + if (bfqq && !is_in_service) { > >> + WARN_ON(entity->tree != NULL); > >>bfq_put_queue(bfqq); > >> + } > >> } > >> > >> /** > >> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct > >> * bfqq gets freed here. > >> */ > >>int ref = in_serv_bfqq->ref; > >> + WARN_ON(in_serv_entity->tree != NULL); > >>bfq_put_queue(in_serv_bfqq); > >>if (ref == 1) > >>return true; > > > > This kernel oops isn't easy to be reproduced, and we have got another crash > > report[1] too, still on __bfq_deactivate_entity(),
Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
I'm thinking of a way to debug this too. The symptom may hint at a use-after-free. Could you enable KASAN in your tests? (On the flip side, I know this might change timings, thereby making the fault disappear). Thanks, Paolo > Il giorno 5 mar 2021, alle ore 10:27, Ming Lei ha > scritto: > > Hello Hillf, > > Thanks for the debug patch. > > On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton wrote: >> >> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote: >>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov >>> wrote: Paolo, Jens I am sorry for the noise. But today I hit the kernel panic and git blame said that you have created the file in which happened panic (this I saw from trace) $ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/vmlinux __bfq_deactivate_entity+0x15a __bfq_deactivate_entity+0x15a/0x240: bfq_gt at block/bfq-wf2q.c:20 (inlined by) bfq_insert at block/bfq-wf2q.c:381 (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621 (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203 https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203 $ head /sys/block/*/queue/scheduler ==> /sys/block/nvme0n1/queue/scheduler <== [none] mq-deadline kyber bfq ==> /sys/block/sda/queue/scheduler <== mq-deadline kyber [bfq] none ==> /sys/block/zram0/queue/scheduler <== none Trace: general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: [#1] SMP NOPTI CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW - --- 5.9.0-0.rc8.28.fc34.x86_64 #1 Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020 Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:__bfq_deactivate_entity+0x15a/0x240 Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6 RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002 RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a 
RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: R10: 0018 R11: 0018 R12: 8dc904927150 R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88 FS: () GS:8dc90e0c() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0 Call Trace: bfq_deactivate_entity+0x4f/0xc0 >>> >>> Hello, >>> >>> The same stack trace was observed in RH internal test too, and kernel >>> is 5.11.0-0.rc6, >>> but there isn't reproducer yet. >>> >>> >>> -- >>> Ming Lei >> >> Add some debug info. >> >> --- x/block/bfq-wf2q.c >> +++ y/block/bfq-wf2q.c >> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq >> >>entity->on_st_or_in_serv = false; >>st->wsum -= entity->weight; >> - if (bfqq && !is_in_service) >> + if (bfqq && !is_in_service) { >> + WARN_ON(entity->tree != NULL); >>bfq_put_queue(bfqq); >> + } >> } >> >> /** >> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct >> * bfqq gets freed here. >> */ >>int ref = in_serv_bfqq->ref; >> + WARN_ON(in_serv_entity->tree != NULL); >>bfq_put_queue(in_serv_bfqq); >>if (ref == 1) >>return true; > > This kernel oops isn't easy to be reproduced, and we have got another crash > report[1] too, still on __bfq_deactivate_entity(), and not easy to > trigger. Can your > debug patch cover the report[1]? If not, feel free to add more debug messages, > then I will try to reproduce the two. > > [1] another kernel oops log on __bfq_deactivate_entity > > [ 899.790606] systemd-sysv-generator[25205]: SysV service > '/etc/rc.d/init.d/anamon' lacks a native systemd unit file. > Automatically generating a unit file for compatibility. Please update > package to include a native systemd unit file, in order to make it > more safe and robust. 
> [ 901.937047] BUG: kernel NULL pointer dereference, address: > [ 901.944005] #PF: supervisor read access in kernel mode > [ 901.949143] #PF: error_code(0x) - not-present page > [ 901.954285] PGD 0 P4D 0 > [ 901.956824] Oops: [#1] SMP NOPTI > [ 901.960490] CPU: 13 PID: 22966 Comm: kworker/13:0 Tainted: G > IX - --- 5.11.0-1.el9.x86_64 #1 > [ 901.970829] Hardware name: Dell Inc. PowerEdge R740xd/0WXD1Y, BIOS > 2.5.4 01/13/2020 > [ 901.978480] Workqueue: cgwb_release cgwb_release_workfn > [ 901.983705] RIP: 0010:__bfq_deactivate_e
Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
Hello Hillf, Thanks for the debug patch. On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton wrote: > > On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote: > > On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov > > wrote: > > > > > > Paolo, Jens I am sorry for the noise. > > > But today I hit the kernel panic and git blame said that you have > > > created the file in which happened panic (this I saw from trace) > > > > > > $ /usr/src/kernels/`uname -r`/scripts/faddr2line > > > /lib/debug/lib/modules/`uname -r`/vmlinux > > > __bfq_deactivate_entity+0x15a > > > __bfq_deactivate_entity+0x15a/0x240: > > > bfq_gt at block/bfq-wf2q.c:20 > > > (inlined by) bfq_insert at block/bfq-wf2q.c:381 > > > (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621 > > > (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203 > > > > > > https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203 > > > > > > $ head /sys/block/*/queue/scheduler > > > ==> /sys/block/nvme0n1/queue/scheduler <== > > > [none] mq-deadline kyber bfq > > > > > > ==> /sys/block/sda/queue/scheduler <== > > > mq-deadline kyber [bfq] none > > > > > > ==> /sys/block/zram0/queue/scheduler <== > > > none > > > > > > Trace: > > > general protection fault, probably for non-canonical address > > > 0x46b1b0f0d8856e4a: [#1] SMP NOPTI > > > CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW > > > - --- 5.9.0-0.rc8.28.fc34.x86_64 #1 > > > Hardware name: System manufacturer System Product Name/ROG STRIX > > > X570-I GAMING, BIOS 2606 08/13/2020 > > > Workqueue: kblockd blk_mq_run_work_fn > > > RIP: 0010:__bfq_deactivate_entity+0x15a/0x240 > > > Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d > > > 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b > > > 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6 > > > RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002 > > > RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a > > > RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb > 
> > RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: > > > R10: 0018 R11: 0018 R12: 8dc904927150 > > > R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88 > > > FS: () GS:8dc90e0c() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0 > > > Call Trace: > > > bfq_deactivate_entity+0x4f/0xc0 > > > > Hello, > > > > The same stack trace was observed in RH internal test too, and kernel > > is 5.11.0-0.rc6, > > but there isn't reproducer yet. > > > > > > -- > > Ming Lei > > Add some debug info. > > --- x/block/bfq-wf2q.c > +++ y/block/bfq-wf2q.c > @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq > > entity->on_st_or_in_serv = false; > st->wsum -= entity->weight; > - if (bfqq && !is_in_service) > + if (bfqq && !is_in_service) { > + WARN_ON(entity->tree != NULL); > bfq_put_queue(bfqq); > + } > } > > /** > @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct > * bfqq gets freed here. > */ > int ref = in_serv_bfqq->ref; > + WARN_ON(in_serv_entity->tree != NULL); > bfq_put_queue(in_serv_bfqq); > if (ref == 1) > return true; This kernel oops isn't easy to be reproduced, and we have got another crash report[1] too, still on __bfq_deactivate_entity(), and not easy to trigger. Can your debug patch cover the report[1]? If not, feel free to add more debug messages, then I will try to reproduce the two. [1] another kernel oops log on __bfq_deactivate_entity [ 899.790606] systemd-sysv-generator[25205]: SysV service '/etc/rc.d/init.d/anamon' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust. 
[ 901.937047] BUG: kernel NULL pointer dereference, address: [ 901.944005] #PF: supervisor read access in kernel mode [ 901.949143] #PF: error_code(0x) - not-present page [ 901.954285] PGD 0 P4D 0 [ 901.956824] Oops: [#1] SMP NOPTI [ 901.960490] CPU: 13 PID: 22966 Comm: kworker/13:0 Tainted: G IX - --- 5.11.0-1.el9.x86_64 #1 [ 901.970829] Hardware name: Dell Inc. PowerEdge R740xd/0WXD1Y, BIOS 2.5.4 01/13/2020 [ 901.978480] Workqueue: cgwb_release cgwb_release_workfn [ 901.983705] RIP: 0010:__bfq_deactivate_entity+0x5b/0x240 [ 901.989016] Code: b8 30 00 00 00 75 18 48 81 ff 88 00 00 00 74 0f 0f b7 47 8a 83 e8 01 48 8d 04 40 48 c1 e0 04 4c 8b 73 68 48 63 73 40 48 89 df <4d> 8b 3e 4d 8d 64 06 10 e8 48 f0 ff ff 49 39 df 0f 84 87 01 00 00 [ 902.007763] RSP: 0018:b77107f0bd98 EFLAGS: 00010002 [ 902.012986] RAX: 002fffd0 RBX: 9
Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov wrote:
>
> Paolo, Jens I am sorry for the noise.
> But today I hit the kernel panic and git blame said that you have
> created the file in which happened panic (this I saw from trace)
>
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/vmlinux __bfq_deactivate_entity+0x15a
> __bfq_deactivate_entity+0x15a/0x240:
> bfq_gt at block/bfq-wf2q.c:20
> (inlined by) bfq_insert at block/bfq-wf2q.c:381
> (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
> (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
>
> https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
>
> $ head /sys/block/*/queue/scheduler
> ==> /sys/block/nvme0n1/queue/scheduler <==
> [none] mq-deadline kyber bfq
>
> ==> /sys/block/sda/queue/scheduler <==
> mq-deadline kyber [bfq] none
>
> ==> /sys/block/zram0/queue/scheduler <==
> none
>
> Trace:
> general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: [#1] SMP NOPTI
> CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW - --- 5.9.0-0.rc8.28.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
> Workqueue: kblockd blk_mq_run_work_fn
> RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
> Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
> RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
> RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
> RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
> RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09:
> R10: 0018 R11: 0018 R12: 8dc904927150
> R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
> FS: () GS:8dc90e0c() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
> Call Trace:
> bfq_deactivate_entity+0x4f/0xc0

Hello,

The same stack trace was observed in RH internal test too, and kernel
is 5.11.0-0.rc6, but there isn't reproducer yet.

--
Ming Lei
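For readers following the register dump: the faulting bytes "<48> 8b 48 28" decode to a 64-bit load from 0x28(%rax), and RAX holds 46b1b0f0d8856e4a, which is the address named in the fault message. The sketch below shows why that value is non-canonical. It assumes 48-bit virtual addresses (4-level paging); the function name is_canonical is made up for this example.

```python
def is_canonical(addr, va_bits=48):
    """Return True if a 64-bit address is canonical for va_bits of virtual address.

    An address is canonical when bit (va_bits - 1) and every bit above it
    are either all zero or all one.
    """
    upper = addr >> (va_bits - 1)               # bit va_bits-1 and everything above
    all_ones = (1 << (64 - (va_bits - 1))) - 1  # same number of bits, all set
    return upper == 0 or upper == all_ones


rax = 0x46B1B0F0D8856E4A  # RAX from the report above
print(hex(rax), "canonical:", is_canonical(rax))
```

A pointer with this bit pattern cannot come from a valid kernel or user mapping, which fits the use-after-free or corruption theory raised elsewhere in the thread but does not prove it.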
Re: [bugreport] [5.10-rc1] Oops: 0000 [#1] SMP NOPTI bug which always starts as page allocation failure
On Tue, Nov 3, 2020 at 4:05 PM Mikhail Gavrilov wrote:
>
> Hi folks.
> I observed a set of bugs that is hard to reproduce.
> It always starts with:
> 1) kworker/u64:2: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
> continues with:
> 2) WARNING: CPU: 21 PID: 806649 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7505 amdgpu_dm_atomic_commit_tail+0x23bd/0x24e0 [amdgpu]
> and ends with:
> 3) BUG: unable to handle page fault for address: 00012488
> This is annoying because it leads to a complete computer hang.

Possibly fixed with this patch?
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-5.10&id=0689dcf3e4d6b89cc2087139561dc12b60461dca

Alex

>
> Example of one log:
>
> [11561.927250] kworker/u64:10: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
> [11561.927472] CPU: 18 PID: 39985 Comm: kworker/u64:10 Not tainted 5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1
> [11561.927475] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
> [11561.927485] Workqueue: events_unbound commit_work [drm_kms_helper]
> [11561.927489] Call Trace:
> [11561.927496] dump_stack+0x8b/0xb0
> [11561.927501] warn_alloc.cold+0x75/0xd9
> [11561.927507] ? _cond_resched+0x16/0x50
> [11561.927512] ? __alloc_pages_direct_compact+0x159/0x180
> [11561.927518] __alloc_pages_slowpath.constprop.0+0x103f/0x1070
> [11561.927531] __alloc_pages_nodemask+0x37d/0x400
> [11561.927538] kmalloc_order+0x33/0xc0
> [11561.927542] kmalloc_order_trace+0x19/0x110
> [11561.927614] dc_create_state+0x26/0x60 [amdgpu]
> [11561.927677] amdgpu_dm_atomic_commit_tail+0x1cee/0x24e0 [amdgpu]
> [11561.927686] ? find_busiest_group+0x33/0x350
> [11561.927698] ? __lock_acquire+0x3b0/0x21f0
> [11561.927707] ? lock_acquire+0xc8/0x400
> [11561.927710] ?
wait_for_completion_timeout+0x3b/0xf0 > [11561.927715] ? mark_held_locks+0x50/0x80 > [11561.927719] ? lockdep_hardirqs_on_prepare+0xff/0x180 > [11561.927722] ? _raw_spin_unlock_irq+0x24/0x40 > [11561.927726] ? _raw_spin_unlock_irq+0x24/0x40 > [11561.927729] ? wait_for_completion_timeout+0xdb/0xf0 > [11561.927740] commit_tail+0x94/0x130 [drm_kms_helper] > [11561.927745] process_one_work+0x27d/0x5b0 > [11561.927753] worker_thread+0x55/0x3c0 > [11561.927756] ? process_one_work+0x5b0/0x5b0 > [11561.927760] kthread+0x13a/0x150 > [11561.927763] ? __kthread_bind_mask+0x60/0x60 > [11561.927769] ret_from_fork+0x22/0x30 > [11561.927809] Mem-Info: > [11561.927816] active_anon:933848 inactive_anon:4558268 isolated_anon:118 > active_file:154021 inactive_file:80446 isolated_file:0 > unevictable:1586 dirty:32469 writeback:700 > slab_reclaimable:185330 slab_unreclaimable:176202 > mapped:514440 shmem:592199 pagetables:81732 bounce:0 > free:99082 free_pcp:2104 free_cma:0 > [11561.927820] Node 0 active_anon:3735392kB inactive_anon:18233072kB > active_file:616084kB inactive_file:321784kB unevictable:6344kB > isolated(anon):472kB isolated(file):0kB mapped:2057760kB > dirty:129876kB writeback:2800kB shmem:2368796kB shmem_thp: 0kB > shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:8kB > kernel_stack:96608kB all_unreclaimable? 
no > [11561.927824] Node 0 DMA free:11800kB min:32kB low:44kB high:56kB > reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB > active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB > present:15992kB managed:15900kB mlocked:0kB pagetables:0kB bounce:0kB > free_pcp:0kB local_pcp:0kB free_cma:0kB > [11561.927829] lowmem_reserve[]: 0 3136 31809 31809 31809 > [11561.927839] Node 0 DMA32 free:142632kB min:26264kB low:29472kB > high:32680kB reserved_highatomic:0KB active_anon:131568kB > inactive_anon:1625184kB active_file:57556kB inactive_file:13532kB > unevictable:0kB writepending:2428kB present:3317760kB > managed:3317572kB mlocked:0kB pagetables:25624kB bounce:0kB > free_pcp:1764kB local_pcp:0kB free_cma:0kB > [11561.927844] lowmem_reserve[]: 0 0 28673 28673 28673 > [11561.927854] Node 0 Normal free:241896kB min:240300kB low:269660kB > high:299020kB reserved_highatomic:2048KB active_anon:3603472kB > inactive_anon:16607812kB active_file:558660kB inactive_file:308056kB > unevictable:6344kB writepending:130596kB present:30133248kB > managed:29370624kB mlocked:6344kB pagetables:301304kB bounce:0kB > free_pcp:6656kB local_pcp:60kB free_cma:0kB > [11561.927859] lowmem_reserve[]: 0 0 0 0 0 > [11561.927871] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB > (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB > (M) = 11800kB > [11561.927900] Node 0 DMA32: 15432*4kB (UME) 4963*8kB (UME) 2169*16kB > (UME) 201*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB > 0*4096kB = 142568kB > [11561.927923] Node 0 Normal: 49027*4kB (UMEH) 5656*8kB (MH) 20*1
[bugreport] [5.10-rc1] Oops: 0000 [#1] SMP NOPTI bug which always starts as page allocation failure
Hi folks.
I observed a set of bugs that is hard to reproduce.
It always starts with:
1) kworker/u64:2: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
continues with:
2) WARNING: CPU: 21 PID: 806649 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7505 amdgpu_dm_atomic_commit_tail+0x23bd/0x24e0 [amdgpu]
and ends with:
3) BUG: unable to handle page fault for address: 00012488
This is annoying because it leads to a complete computer hang.

Example of one log:

[11561.927250] kworker/u64:10: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[11561.927472] CPU: 18 PID: 39985 Comm: kworker/u64:10 Not tainted 5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1
[11561.927475] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
[11561.927485] Workqueue: events_unbound commit_work [drm_kms_helper]
[11561.927489] Call Trace:
[11561.927496] dump_stack+0x8b/0xb0
[11561.927501] warn_alloc.cold+0x75/0xd9
[11561.927507] ? _cond_resched+0x16/0x50
[11561.927512] ? __alloc_pages_direct_compact+0x159/0x180
[11561.927518] __alloc_pages_slowpath.constprop.0+0x103f/0x1070
[11561.927531] __alloc_pages_nodemask+0x37d/0x400
[11561.927538] kmalloc_order+0x33/0xc0
[11561.927542] kmalloc_order_trace+0x19/0x110
[11561.927614] dc_create_state+0x26/0x60 [amdgpu]
[11561.927677] amdgpu_dm_atomic_commit_tail+0x1cee/0x24e0 [amdgpu]
[11561.927686] ? find_busiest_group+0x33/0x350
[11561.927698] ? __lock_acquire+0x3b0/0x21f0
[11561.927707] ? lock_acquire+0xc8/0x400
[11561.927710] ? wait_for_completion_timeout+0x3b/0xf0
[11561.927715] ? mark_held_locks+0x50/0x80
[11561.927719] ? lockdep_hardirqs_on_prepare+0xff/0x180
[11561.927722] ? _raw_spin_unlock_irq+0x24/0x40
[11561.927726] ? _raw_spin_unlock_irq+0x24/0x40
[11561.927729] ?
wait_for_completion_timeout+0xdb/0xf0 [11561.927740] commit_tail+0x94/0x130 [drm_kms_helper] [11561.927745] process_one_work+0x27d/0x5b0 [11561.927753] worker_thread+0x55/0x3c0 [11561.927756] ? process_one_work+0x5b0/0x5b0 [11561.927760] kthread+0x13a/0x150 [11561.927763] ? __kthread_bind_mask+0x60/0x60 [11561.927769] ret_from_fork+0x22/0x30 [11561.927809] Mem-Info: [11561.927816] active_anon:933848 inactive_anon:4558268 isolated_anon:118 active_file:154021 inactive_file:80446 isolated_file:0 unevictable:1586 dirty:32469 writeback:700 slab_reclaimable:185330 slab_unreclaimable:176202 mapped:514440 shmem:592199 pagetables:81732 bounce:0 free:99082 free_pcp:2104 free_cma:0 [11561.927820] Node 0 active_anon:3735392kB inactive_anon:18233072kB active_file:616084kB inactive_file:321784kB unevictable:6344kB isolated(anon):472kB isolated(file):0kB mapped:2057760kB dirty:129876kB writeback:2800kB shmem:2368796kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:8kB kernel_stack:96608kB all_unreclaimable? 
no [11561.927824] Node 0 DMA free:11800kB min:32kB low:44kB high:56kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15900kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [11561.927829] lowmem_reserve[]: 0 3136 31809 31809 31809 [11561.927839] Node 0 DMA32 free:142632kB min:26264kB low:29472kB high:32680kB reserved_highatomic:0KB active_anon:131568kB inactive_anon:1625184kB active_file:57556kB inactive_file:13532kB unevictable:0kB writepending:2428kB present:3317760kB managed:3317572kB mlocked:0kB pagetables:25624kB bounce:0kB free_pcp:1764kB local_pcp:0kB free_cma:0kB [11561.927844] lowmem_reserve[]: 0 0 28673 28673 28673 [11561.927854] Node 0 Normal free:241896kB min:240300kB low:269660kB high:299020kB reserved_highatomic:2048KB active_anon:3603472kB inactive_anon:16607812kB active_file:558660kB inactive_file:308056kB unevictable:6344kB writepending:130596kB present:30133248kB managed:29370624kB mlocked:6344kB pagetables:301304kB bounce:0kB free_pcp:6656kB local_pcp:60kB free_cma:0kB [11561.927859] lowmem_reserve[]: 0 0 0 0 0 [11561.927871] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11800kB [11561.927900] Node 0 DMA32: 15432*4kB (UME) 4963*8kB (UME) 2169*16kB (UME) 201*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 142568kB [11561.927923] Node 0 Normal: 49027*4kB (UMEH) 5656*8kB (MH) 20*16kB (H) 10*32kB (H) 2*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 242380kB [11561.927951] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [11561.927954] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [11561.927956] 847580 total pagecache pages [11561.927967] 19862 pages in swap cache [11561.927970
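To put the "order:5" failure in context, the buddy free-list line for zone Normal in the log above can be checked with a little arithmetic. The sketch below assumes 4 KiB base pages (the usual x86_64 case); the helper names order_size_kb and parse_free_list are made up for this example.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB base pages


def order_size_kb(order):
    """Size in kB of one contiguous block of the given buddy order."""
    return PAGE_KB << order


def parse_free_list(line):
    """Return {block_size_kb: count} from a 'N*SIZEkB' buddy free-list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


# Zone Normal free list copied from the log above.
normal = ("49027*4kB (UMEH) 5656*8kB (MH) 20*16kB (H) 10*32kB (H) "
          "2*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB "
          "0*4096kB = 242380kB")

free = parse_free_list(normal)
need_kb = order_size_kb(5)
big_enough = sum(n for size, n in free.items() if size >= need_kb)
print("order 5 needs", need_kb, "kB contiguous;",
      big_enough, "free blocks of that size or larger in zone Normal")
```

The per-size counts add up to the 242380kB total printed by the kernel. The only blocks of 128 kB or more in this zone are the two tagged (H), the high-atomic reserve, so the zone is badly fragmented for an order-5 GFP_KERNEL request even though about 236 MB is nominally free.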
[bugreport] [5.10] DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww) We 'forgot' to unlock everything else first?
Hi folks.
I have observed this issue since 5.3, and it still happens with 5.10 git. The warning reproduces 100% reliably when I launch "Wolfenstein: Youngblood"; the version of Mesa doesn't matter.

[73690.883948] ------------[ cut here ]------------
[73690.883953] DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww)
[73690.883963] WARNING: CPU: 30 PID: 194867 at kernel/locking/mutex.c:327 __ww_mutex_lock.constprop.0+0xe96/0xef0
[73690.883966] Modules linked in: tun snd_seq_dummy snd_hrtimer uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_realtek mt76x2u mt76x2_common snd_hda_codec_generic mt76x02_usb ledtrig_audio snd_hda_codec_hdmi mt76_usb mt76x02_lib snd_hda_intel uvcvideo iwlmvm snd_intel_dspcfg mt76 gspca_zc3xx snd_hda_codec gspca_main joydev videobuf2_vmalloc snd_usb_audio btusb edac_mce_amd videobuf2_memops snd_hda_core videobuf2_v4l2 snd_usbmidi_lib kvm_amd btrtl videobuf2_common btbcm snd_hwdep
[73690.884036] snd_rawmidi mac80211 btintel snd_seq videodev snd_seq_device eeepc_wmi libarc4 bluetooth kvm xpad ff_memless snd_pcm mc iwlwifi asus_wmi irqbypass sparse_keymap ecdh_generic rapl ecc sp5100_tco video wmi_bmof snd_timer pcspkr snd k10temp i2c_piix4 soundcore cfg80211 rfkill acpi_cpufreq binfmt_misc zram ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ghash_clmulni_intel ccp igb nvme dca i2c_algo_bit nvme_core wmi pinctrl_amd fuse
[73690.884094] CPU: 30 PID: 194867 Comm:
Youngblood_x64v Not tainted 5.10.0-0.rc0.20201014gitb5fc7a89e58b.42.fc34.x86_64 #1 [73690.884097] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020 [73690.884100] RIP: 0010:__ww_mutex_lock.constprop.0+0xe96/0xef0 [73690.884103] Code: f2 89 5b 9e 48 c7 c7 1d bb 59 9e e8 ef f6 f8 ff 0f 0b e9 2a fc ff ff 48 c7 c6 d4 89 5b 9e 48 c7 c7 1d bb 59 9e e8 d5 f6 f8 ff <0f> 0b e9 e9 fe ff ff 83 3d 44 3d 81 02 00 75 07 48 83 7d 28 00 75 [73690.884106] RSP: 0018:a1c5d079f8f0 EFLAGS: 00010286 [73690.884108] RAX: 0032 RBX: 0001 RCX: 8c650a7db178 [73690.884111] RDX: ffd8 RSI: 0027 RDI: 8c650a7db170 [73690.884112] RBP: a1c5d079fc38 R08: R09: [73690.884114] R10: a1c5d079f720 R11: 8c652e2fffe8 R12: 8c600cd42990 [73690.884116] R13: 8c5f055f R14: 8c600cd42a00 R15: [73690.884119] FS: 060e3640() GS:8c650a60() knlGS:00013ffc [73690.884121] CS: 0010 DS: ES: CR0: 80050033 [73690.884122] CR2: 7fe25431d010 CR3: 00011916e000 CR4: 00350ee0 [73690.884124] Call Trace: [73690.884136] ? ttm_mem_evict_first+0x212/0x4f0 [ttm] [73690.884139] ? __schedule+0x345/0xa80 [73690.884144] ww_mutex_lock_interruptible+0x43/0xb0 [73690.884149] ttm_mem_evict_first+0x212/0x4f0 [ttm] [73690.884157] ttm_bo_mem_space+0x30f/0x340 [ttm] [73690.884164] ttm_bo_validate+0x12b/0x1d0 [ttm] [73690.884169] ? sched_clock+0x5/0x10 [73690.884261] amdgpu_cs_bo_validate+0x8b/0x190 [amdgpu] [73690.884350] amdgpu_cs_list_validate+0x10e/0x150 [amdgpu] [73690.884435] amdgpu_cs_ioctl+0x7f4/0x1ed0 [amdgpu] [73690.884531] ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu] [73690.884550] drm_ioctl_kernel+0x8c/0xe0 [drm] [73690.884563] drm_ioctl+0x20f/0x3a0 [drm] [73690.884623] ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu] [73690.884625] ? sched_clock+0x5/0x10 [73690.884628] ? sched_clock_cpu+0xc/0xb0 [73690.884631] ? lockdep_hardirqs_on_prepare+0xff/0x180 [73690.884632] ? 
_raw_spin_unlock_irqrestore+0x41/0x50 [73690.884684] amdgpu_drm_ioctl+0x49/0x80 [amdgpu] [73690.884688] __x64_sys_ioctl+0x83/0xb0 [73690.884691] do_syscall_64+0x33/0x40 [73690.884693] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [73690.884695] RIP: 0033:0x7fe3209e64cb [73690.884697] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d b9 0c 00 f7 d8 64 89 01 48 [73690.884699] RSP: 002b:060db248 EFLAGS: 0246 ORIG_RAX: 0010 [73690.884701] RAX: ffda RBX: 060db2d0 RCX: 7fe3209e64cb [73690.884702] RDX: 060db2d0 RSI: c0186444 RDI: 00d4 [73690.884703] RBP: c0186444 R08: 7fe1bd653780 R09: 060db290 [73690.884705] R10: R11: 0246 R12: 7fe17d
Re: [bugreport] [5.10] warning at net/netfilter/nf_tables_api.c:622
On Fri, 16 Oct 2020 at 12:11, Mikhail Gavrilov wrote: > > Hi folks, > today I joined to testing Kernel 5.10 and see that every boot happens > this warning: > > [ 22.180180] [ cut here ] > [ 22.180193] WARNING: CPU: 28 PID: 1205 at > net/netfilter/nf_tables_api.c:622 nft_chain_parse_hook+0x224/0x330 > [nf_tables] > [ 22.180194] Modules linked in: nf_tables nfnetlink ip6table_filter > ip6_tables iptable_filter cmac bnep sunrpc vfat fat > snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > snd_hda_codec_hdmi mt76x2u mt76x2_common mt76x02_usb iwlmvm mt76_usb > uvcvideo snd_hda_intel mt76x02_lib gspca_zc3xx snd_intel_dspcfg btusb > gspca_main videobuf2_vmalloc btrtl mt76 edac_mce_amd snd_hda_codec > btbcm videobuf2_memops btintel kvm_amd snd_usb_audio videobuf2_v4l2 > snd_hda_core mac80211 kvm bluetooth snd_usbmidi_lib joydev > videobuf2_common iwlwifi snd_seq xpad snd_hwdep ff_memless videodev > snd_rawmidi snd_seq_device libarc4 eeepc_wmi snd_pcm ecdh_generic > irqbypass asus_wmi mc rapl sparse_keymap ecc snd_timer sp5100_tco > video cfg80211 wmi_bmof pcspkr snd k10temp i2c_piix4 soundcore rfkill > acpi_cpufreq binfmt_misc zram ip_tables hid_logitech_hidpp > hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm drm_kms_helper > crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ccp > ghash_clmulni_intel igb nvme dca nvme_core > [ 22.180273] i2c_algo_bit wmi pinctrl_amd fuse > [ 22.180279] CPU: 28 PID: 1205 Comm: ebtables Not tainted > 5.10.0-0.rc0.20201014gitb5fc7a89e58b.41.fc34.x86_64 #1 > [ 22.180281] Hardware name: System manufacturer System Product > Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020 > [ 22.180289] RIP: 0010:nft_chain_parse_hook+0x224/0x330 [nf_tables] > [ 22.180292] Code: a0 14 00 00 be ff ff ff ff e8 68 82 e1 e4 85 c0 > 0f 85 21 fe ff ff 0f 0b bf 0a 00 00 00 e8 14 60 97 ff 84 c0 0f 84 1f > fe ff ff <0f> 0b e9 18 fe ff ff 48 85 f6 74 61 4c 89 ef e8 78 d0 ff ff > 48 89 > [ 22.180294] RSP: 0018:a9850214f780 EFLAGS: 00010202 > [ 22.180296] RAX: 0001 
RBX: a9850214f810 RCX: > > [ 22.180297] RDX: a9850214f810 RSI: RDI: > c0851c20 > [ 22.180299] RBP: 0007 R08: 0001 R09: > a9850214f847 > [ 22.180300] R10: R11: 0007 R12: > a9850214fa88 > [ 22.180301] R13: a6fdfcc0 R14: a9850214fa88 R15: > 993c5c12c800 > [ 22.180304] FS: 7ff92ed99540() GS:993c8a20() > knlGS: > [ 22.180305] CS: 0010 DS: ES: CR0: 80050033 > [ 22.180307] CR2: 7ff92ed1e000 CR3: 0007d3714000 CR4: > 00350ee0 > [ 22.180308] Call Trace: > [ 22.180319] ? __rhashtable_lookup+0x11d/0x210 [nf_tables] > [ 22.180329] nf_tables_addchain.constprop.0+0xab/0x5e0 [nf_tables] > [ 22.180337] ? nft_chain_lookup.part.0+0x12c/0x1e0 [nf_tables] > [ 22.180344] ? get_order+0x20/0x20 [nf_tables] > [ 22.180350] ? nft_chain_hash+0x30/0x30 [nf_tables] > [ 22.180356] ? nft_dump_register+0x40/0x40 [nf_tables] > [ 22.180368] nf_tables_newchain+0x54d/0x730 [nf_tables] > [ 22.180376] nfnetlink_rcv_batch+0x2a4/0x950 [nfnetlink] > [ 22.180385] ? lock_acquire+0x175/0x400 > [ 22.180387] ? lock_release+0x1e7/0x400 > [ 22.180391] ? cred_has_capability.isra.0+0x68/0x100 > [ 22.180395] ? __nla_validate_parse+0x4f/0x8d0 > [ 22.180401] nfnetlink_rcv+0x115/0x130 [nfnetlink] > [ 22.180407] netlink_unicast+0x16d/0x230 > [ 22.180426] netlink_sendmsg+0x23f/0x460 > [ 22.180431] sock_sendmsg+0x5e/0x60 > [ 22.180434] sys_sendmsg+0x231/0x270 > [ 22.180438] ? import_iovec+0x17/0x20 > [ 22.180440] ? sendmsg_copy_msghdr+0x5c/0x80 > [ 22.180444] ___sys_sendmsg+0x75/0xb0 > [ 22.180450] ? cred_has_capability.isra.0+0x68/0x100 > [ 22.180452] ? lock_acquire+0x175/0x400 > [ 22.180454] ? lock_acquire+0x93/0x400 > [ 22.180457] ? lock_release+0x1e7/0x400 > [ 22.180459] ? lock_release+0x1e7/0x400 > [ 22.180462] ? trace_hardirqs_on+0x1b/0xe0 > [ 22.180465] ? sock_setsockopt+0xdf/0x1010 > [ 22.180467] ? __local_bh_enable_ip+0x82/0xd0 > [ 22.180470] ? 
sock_setsockopt+0xdf/0x1010 > [ 22.180473] __sys_sendmsg+0x49/0x80 > [ 22.180480] do_syscall_64+0x33/0x40 > [ 22.180483] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 22.180486] RIP: 0033:0x7ff92efdb087 > [ 22.180488] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 > 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 > 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 > 24 10 > [ 22.180490] RSP: 002b:7fff54436b38 EFLAGS: 0246 ORIG_RAX: > 002e > [ 22.180492] RAX: ffda RBX: 7fff54436b40 RCX: > 7ff92efdb087 > [ 22.180494] RDX: RSI: 7fff54437be0 RDI: > 0003 > [ 22.180495] RBP: 7fff544381e0 R08: 0004 R09: > 55b281bcf1d0 > [ 22.1804
[bugreport] [5.10] warning at net/netfilter/nf_tables_api.c:622
Hi folks,
today I started testing kernel 5.10 and see this warning on every boot:

[ 22.180180] ------------[ cut here ]------------
[ 22.180193] WARNING: CPU: 28 PID: 1205 at net/netfilter/nf_tables_api.c:622 nft_chain_parse_hook+0x224/0x330 [nf_tables]
[ 22.180194] Modules linked in: nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi mt76x2u mt76x2_common mt76x02_usb iwlmvm mt76_usb uvcvideo snd_hda_intel mt76x02_lib gspca_zc3xx snd_intel_dspcfg btusb gspca_main videobuf2_vmalloc btrtl mt76 edac_mce_amd snd_hda_codec btbcm videobuf2_memops btintel kvm_amd snd_usb_audio videobuf2_v4l2 snd_hda_core mac80211 kvm bluetooth snd_usbmidi_lib joydev videobuf2_common iwlwifi snd_seq xpad snd_hwdep ff_memless videodev snd_rawmidi snd_seq_device libarc4 eeepc_wmi snd_pcm ecdh_generic irqbypass asus_wmi mc rapl sparse_keymap ecc snd_timer sp5100_tco video cfg80211 wmi_bmof pcspkr snd k10temp i2c_piix4 soundcore rfkill acpi_cpufreq binfmt_misc zram ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ccp ghash_clmulni_intel igb nvme dca nvme_core
[ 22.180273] i2c_algo_bit wmi pinctrl_amd fuse
[ 22.180279] CPU: 28 PID: 1205 Comm: ebtables Not tainted 5.10.0-0.rc0.20201014gitb5fc7a89e58b.41.fc34.x86_64 #1
[ 22.180281] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
[ 22.180289] RIP: 0010:nft_chain_parse_hook+0x224/0x330 [nf_tables]
[ 22.180292] Code: a0 14 00 00 be ff ff ff ff e8 68 82 e1 e4 85 c0 0f 85 21 fe ff ff 0f 0b bf 0a 00 00 00 e8 14 60 97 ff 84 c0 0f 84 1f fe ff ff <0f> 0b e9 18 fe ff ff 48 85 f6 74 61 4c 89 ef e8 78 d0 ff ff 48 89
[ 22.180294] RSP: 0018:a9850214f780 EFLAGS: 00010202
[ 22.180296] RAX: 0001 RBX: a9850214f810 RCX:
[ 22.180297] RDX: a9850214f810 RSI: RDI: c0851c20
[ 22.180299] RBP: 0007 R08: 0001 R09: a9850214f847 [
22.180300] R10: R11: 0007 R12: a9850214fa88 [ 22.180301] R13: a6fdfcc0 R14: a9850214fa88 R15: 993c5c12c800 [ 22.180304] FS: 7ff92ed99540() GS:993c8a20() knlGS: [ 22.180305] CS: 0010 DS: ES: CR0: 80050033 [ 22.180307] CR2: 7ff92ed1e000 CR3: 0007d3714000 CR4: 00350ee0 [ 22.180308] Call Trace: [ 22.180319] ? __rhashtable_lookup+0x11d/0x210 [nf_tables] [ 22.180329] nf_tables_addchain.constprop.0+0xab/0x5e0 [nf_tables] [ 22.180337] ? nft_chain_lookup.part.0+0x12c/0x1e0 [nf_tables] [ 22.180344] ? get_order+0x20/0x20 [nf_tables] [ 22.180350] ? nft_chain_hash+0x30/0x30 [nf_tables] [ 22.180356] ? nft_dump_register+0x40/0x40 [nf_tables] [ 22.180368] nf_tables_newchain+0x54d/0x730 [nf_tables] [ 22.180376] nfnetlink_rcv_batch+0x2a4/0x950 [nfnetlink] [ 22.180385] ? lock_acquire+0x175/0x400 [ 22.180387] ? lock_release+0x1e7/0x400 [ 22.180391] ? cred_has_capability.isra.0+0x68/0x100 [ 22.180395] ? __nla_validate_parse+0x4f/0x8d0 [ 22.180401] nfnetlink_rcv+0x115/0x130 [nfnetlink] [ 22.180407] netlink_unicast+0x16d/0x230 [ 22.180426] netlink_sendmsg+0x23f/0x460 [ 22.180431] sock_sendmsg+0x5e/0x60 [ 22.180434] sys_sendmsg+0x231/0x270 [ 22.180438] ? import_iovec+0x17/0x20 [ 22.180440] ? sendmsg_copy_msghdr+0x5c/0x80 [ 22.180444] ___sys_sendmsg+0x75/0xb0 [ 22.180450] ? cred_has_capability.isra.0+0x68/0x100 [ 22.180452] ? lock_acquire+0x175/0x400 [ 22.180454] ? lock_acquire+0x93/0x400 [ 22.180457] ? lock_release+0x1e7/0x400 [ 22.180459] ? lock_release+0x1e7/0x400 [ 22.180462] ? trace_hardirqs_on+0x1b/0xe0 [ 22.180465] ? sock_setsockopt+0xdf/0x1010 [ 22.180467] ? __local_bh_enable_ip+0x82/0xd0 [ 22.180470] ? 
sock_setsockopt+0xdf/0x1010 [ 22.180473] __sys_sendmsg+0x49/0x80 [ 22.180480] do_syscall_64+0x33/0x40 [ 22.180483] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 22.180486] RIP: 0033:0x7ff92efdb087 [ 22.180488] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 22.180490] RSP: 002b:7fff54436b38 EFLAGS: 0246 ORIG_RAX: 002e [ 22.180492] RAX: ffda RBX: 7fff54436b40 RCX: 7ff92efdb087 [ 22.180494] RDX: RSI: 7fff54437be0 RDI: 0003 [ 22.180495] RBP: 7fff544381e0 R08: 0004 R09: 55b281bcf1d0 [ 22.180496] R10: 7fff54437bcc R11: 0246 R12: 7000 [ 22.180497] R13: 0001 R14: 7fff54436b50 R15: 7fff54438200 [ 22.180503] irq event stamp: 0 [ 22.180505] hardirqs last enabled at (0): [<>] 0x0 [
[bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
Paolo, Jens,

I am sorry for the noise, but today I hit a kernel panic, and git blame says that you created the file in which the panic happened (I saw this from the trace).

$ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/vmlinux __bfq_deactivate_entity+0x15a
__bfq_deactivate_entity+0x15a/0x240:
bfq_gt at block/bfq-wf2q.c:20
(inlined by) bfq_insert at block/bfq-wf2q.c:381
(inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
(inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203

https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203

$ head /sys/block/*/queue/scheduler
==> /sys/block/nvme0n1/queue/scheduler <==
[none] mq-deadline kyber bfq

==> /sys/block/sda/queue/scheduler <==
mq-deadline kyber [bfq] none

==> /sys/block/zram0/queue/scheduler <==
none

Trace:
general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW - --- 5.9.0-0.rc8.28.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09:
R10: 0018 R11: 0018 R12: 8dc904927150
R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
FS: () GS:8dc90e0c() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
Call Trace:
 bfq_deactivate_entity+0x4f/0xc0
 bfq_del_bfqq_busy+0xbf/0x170
 __bfq_bfqq_expire+0x95/0xc0
 bfq_bfqq_expire+0x3c5/0x9a0
 ? bfq_active_extract+0x8e/0x140
 bfq_dispatch_request+0x438/0x1070
 __blk_mq_do_dispatch_sched+0x1c7/0x290
 ? dequeue_entity+0xa4/0x420
 __blk_mq_sched_dispatch_requests+0x129/0x180
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x49/0x110
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
Modules linked in: tun snd_seq_dummy snd_hrtimer uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat mt76x2u snd_hda_codec_realtek mt76x2_common mt76x02_usb snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi mt76_usb mt76x02_lib edac_mce_amd iwlmvm snd_hda_intel mt76 snd_intel_dspcfg kvm_amd mac80211 gspca_zc3xx snd_usb_audio snd_hda_codec gspca_main uvcvideo btusb snd_usbmidi_lib iwlwifi snd_hda_core videobuf2_vmalloc kvm videobuf2_memops btrtl snd_rawmidi videobuf2_v4l2 snd_hwdep btbcm snd_seq btintel videobuf2_common eeepc_wmi irqbypass snd_seq_device asus_wmi xpad bluetooth joydev sparse_keymap libarc4 rapl cfg80211 ff_memless snd_pcm videodev video pcspkr wmi_bmof sp5100_tco snd_timer mc k10temp i2c_piix4 snd ecdh_generic ecc soundcore rfkill acpi_cpufreq binfmt_misc zram ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm ccp igb ghash_clmulni_intel nvme nvme_core dca i2c_algo_bit wmi pinctrl_amd fuse
---[ end trace 09deb55d1b05f40c ]---

Full system log:
https://pastebin.com/6cKHZzAi
Full kernel log: https://pastebin.com/316HjHit

Unfortunately, I do not know how to reproduce this bug. I was not doing anything unusual on the computer when it happened. I can provide any useful information for further investigation.

--
Best Regards,
Mike Gavrilov.
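For readers unfamiliar with the "non-canonical address" wording in the fault message above: on x86-64 the upper bits of a virtual address must all be copies of the highest implemented address bit. The following is a minimal sketch of that check, assuming 48-bit virtual addresses (4-level paging); `is_canonical` is a hypothetical helper written for illustration and is not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns true when addr is
 * canonical for a CPU implementing va_bits of virtual address space.
 * Bits [63 : va_bits - 1] must be either all zeros or all ones.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);

	uint64_t upper = addr >> (va_bits - 1);
	uint64_t all_ones = UINT64_MAX >> (va_bits - 1);

	return upper == 0 || upper == all_ones;
}
```

With `va_bits` set to 48 (an assumption about the reporting machine), the faulting value 0x46b1b0f0d8856e4a fails this check because its top 17 bits are neither all zeros nor all ones, which is consistent with a corrupted pointer having been dereferenced.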
Re: [Kgdb-bugreport] [PATCH] serial: qcom_geni_serial: Fix recent kdb hang
On Tue, Aug 11, 2020 at 09:21:22AM -0700, Doug Anderson wrote: > Hi, > > On Tue, Aug 11, 2020 at 4:54 AM Akash Asthana wrote: > > > > > > On 8/11/2020 2:56 AM, Doug Anderson wrote: > > > Hi, > > > > > > On Mon, Aug 10, 2020 at 5:32 AM Akash Asthana > > > wrote: > > >> Hi Doug, > > >> > > >> On 8/7/2020 10:49 AM, Douglas Anderson wrote: > > >>> The commit e42d6c3ec0c7 ("serial: qcom_geni_serial: Make kgdb work > > >>> even if UART isn't console") worked pretty well and I've been doing a > > >>> lot of debugging with it. However, recently I typed "dmesg" in kdb > > >>> and then held the space key down to scroll through the pagination. My > > >>> device hung. This was repeatable and I found that it was introduced > > >>> with the aforementioned commit. > > >>> > > >>> It turns out that there are some strange boundary cases in geni where > > >>> in some weird situations it will signal RX_LAST but then will put 0 in > > >>> RX_LAST_BYTE. This means that the entire last FIFO entry is valid. > > >> IMO that means we received a word in RX_FIFO and it is the last word > > >> hence RX_LAST bit is set. > > > What you say would make logical sense, but it's not how I have > > > observed geni to work. See below. > > > > > > > > >> RX_LAST_BYTE is 0 means none of the bytes are valid in the last word. > > > This would imply that qcom_geni_serial_handle_rx() is also broken > > > though, wouldn't it? Specifically imagine that WORD_CNT is 1 and > > > RX_LAST is set and RX_LAST_BYTE_VALID is true. Here's the logic from > > > that function: > > > > > >total_bytes = BYTES_PER_FIFO_WORD * (word_cnt - 1); > > >if (last_word_partial && last_word_byte_cnt) > > > total_bytes += last_word_byte_cnt; > > >else > > > total_bytes += BYTES_PER_FIFO_WORD; > > >port->handle_rx(uport, total_bytes, drop); > > > > > > As you can see that logic will set "total_bytes" to 4 in the case I'm > > > talking about. 
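The byte-count logic quoted above from qcom_geni_serial_handle_rx() can be tried in isolation. This is a minimal sketch, not driver code: `rx_total_bytes` is a hypothetical standalone function, and the value 4 for `BYTES_PER_FIFO_WORD` is assumed from the discussion in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the thread: each RX FIFO word carries 4 bytes. */
#define BYTES_PER_FIFO_WORD 4u

/*
 * Hypothetical standalone copy of the quoted logic: number of bytes
 * handed to the RX handler for a transfer of word_cnt FIFO words.
 */
static uint32_t rx_total_bytes(uint32_t word_cnt, bool last_word_partial,
			       uint32_t last_word_byte_cnt)
{
	uint32_t total_bytes;

	assert(word_cnt >= 1);

	/* Every word except the last is always full. */
	total_bytes = BYTES_PER_FIFO_WORD * (word_cnt - 1);

	/* A "partial" last word with a byte count of 0 counts as full. */
	if (last_word_partial && last_word_byte_cnt)
		total_bytes += last_word_byte_cnt;
	else
		total_bytes += BYTES_PER_FIFO_WORD;

	return total_bytes;
}
```

Calling `rx_total_bytes(1, true, 0)` yields 4, which is the case Doug describes: RX_LAST is set, the last-byte count is 0, and the existing code already treats the whole final word as valid.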
> > > > Yeah IMO as per theory this should also be corrected but since you have > > already pulled out few experiment to prove garbage data issue(which I > > was suspecting) is not seen. > > > > It's already consistent with existing logic and it behaves well > > practically . So the changes could be merge. Meanwhile I am checking > > with HW team to get clarity. > > > > > > > > > > >> In such scenario we should just read RX_FIFO buffer (to empty it), > > >> discard the word and return NO_POLL_CHAR. Something like below. > > >> > > >> - > > >> > > >> else > > >> private_data->poll_cached_bytes_cnt = 4; > > >> > > >> private_data->poll_cached_bytes = > > >> readl(uport->membase + SE_GENI_RX_FIFOn); > > >> } > > >> > > >> +if (!private_data->poll_cached_bytes_cnt) > > >> + return NO_POLL_CHAR; > > >> private_data->poll_cached_bytes_cnt--; > > >> ret = private_data->poll_cached_bytes & 0xff; > > >> - > > >> > > >> Please let me know whether above code helps. > > > Your code will avoid the hang. Yes. ...but it will drop bytes. I > > > devised a quick-n-dirty test. Here's a test of your code: > > I assumed those as invalid bytes and don't wanted to read them so yeah > > dropping of bytes was expected. > > > > > > https://crrev.com/c/2346886 > > > > > > ...and here's a test of my code: > > > > > > https://crrev.com/c/2346884 > > > > > > I had to keep a buffer around since it's hard to debug the serial > > > driver. In both cases I put "DOUG" into the buffer when I detect this > > > case. If my theory about how geni worked was wrong then we should > > > expect to see some garbage in the buffer right after the DOUG, right? > > > ...but my code gets the alphabet in nice sequence. Your code drops 4 > > > bytes. > > Yeah I was expecting garbage data. 
> > > > > > > > > NOTE: while poking around with the above two test patches I found it > > > was pretty easy to get geni to drop bytes / hit overflow cases and > > > also to insert bogus 0 bytes in the stream (I believe these are > > > related). I was able to reproduce this: > > > * With ${SUBJECT} patch in place. > > > * With your proposed patch. > > > * With the recent "geni" patches reverted (in other words back to 1 > > > byte per FIFO entry). > > > > > > It's not terribly surprising that we're overflowing since I believe > > > kgdb isn't too keen to read characters at the same time it's writing. > > > That doesn't explain the weird 0-bytes that geni seemed to be > > > inserting, but at least it would explain the overflows. However, even > > > after I fixed this I _still_ was getting problems. Specifically geni > > >
Re: [5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]
On 7/13/20 11:02 AM, Mikhail Gavrilov wrote:
> On Mon, 13 Jul 2020 at 12:11, Mikhail Gavrilov wrote:
>>
>> On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov wrote:
>>>
>>> Hi folks.
>>> While testing 5.8 RCs I found that the kernel log is flooded with the message
>>> "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
>>> tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
>>> Kernel 5.7 does not have this problem.
>>
>> Maxim, I suppose you left `WARN_ON(!wpa->ia.ap.num_pages);` in for debugging
>> purposes?
>> This line is now hit often when I start the container.
>>
>
> That's odd, but I can't send an email to the author of the commit:
> mpatlasov wasn't found at virtuozzo.com.

The reported problem is not fixed yet in 5.8-rc kernels.
Please take a look at
https://lkml.org/lkml/2020/7/13/265

Thank you,
Vasily Averin
Re: [5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]
On Mon, 13 Jul 2020 at 12:11, Mikhail Gavrilov wrote:
>
> On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov wrote:
> >
> > Hi folks.
> > While testing 5.8 RCs I found that the kernel log is flooded with the message
> > "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
> > tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
> > Kernel 5.7 does not have this problem.
>
> Maxim, I suppose you left `WARN_ON(!wpa->ia.ap.num_pages);` in for debugging
> purposes?
> This line is now hit often when I start the container.
>

That's odd, but I can't send an email to the author of the commit:
mpatlasov wasn't found at virtuozzo.com.

--
Best Regards,
Mike Gavrilov.
Re: [5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]
On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov wrote:
>
> Hi folks.
> While testing 5.8 RCs I found that the kernel log is flooded with the message
> "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
> tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
> Kernel 5.7 does not have this problem.

Maxim, I suppose you left `WARN_ON(!wpa->ia.ap.num_pages);` in for debugging purposes?
This line is now hit often when I start the container.

--
Best Regards,
Mike Gavrilov.
[5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]
Hi folks.
While testing 5.8 RCs I found that the kernel log is flooded with the message "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
Kernel 5.7 does not have this problem.

[92414.864536] ------------[ cut here ]------------
[92414.864648] WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]
[92414.864652] Modules linked in: snd_seq_dummy snd_hrtimer uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_logitech_hidpp gspca_zc3xx gspca_main videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 videobuf2_common mt76x2u mt76x2_common videodev mt76x02_usb mt76_usb mt76x02_lib xpad mc mt76 hid_logitech_dj ff_memless snd_hda_codec_realtek snd_hda_codec_generic iwlmvm ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec
[92414.864697] mac80211 snd_hda_core edac_mce_amd amd_energy snd_hwdep btusb btrtl btbcm snd_seq kvm_amd libarc4 btintel snd_seq_device bluetooth kvm snd_pcm iwlwifi eeepc_wmi asus_wmi snd_timer ecdh_generic irqbypass ecc snd sparse_keymap rapl cfg80211 video wmi_bmof pcspkr soundcore sp5100_tco k10temp i2c_piix4 rfkill acpi_cpufreq binfmt_misc ip_tables amdgpu iommu_v2 gpu_sched ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm igb ghash_clmulni_intel ccp nvme dca xhci_pci nvme_core xhci_pci_renesas i2c_algo_bit wmi pinctrl_amd fuse
[92414.864738] CPU: 28 PID: 211236 Comm: sed Not tainted
5.8.0-0.rc4.20200709git0bddd227f3dc.1.fc33.x86_64 #1 [92414.864742] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 1407 04/02/2020 [92414.864749] RIP: 0010:tree_insert+0xaf/0xc0 [fuse] [92414.864753] Code: 80 c8 00 00 00 49 c7 80 d0 00 00 00 00 00 00 00 49 c7 80 d8 00 00 00 00 00 00 00 48 89 39 e9 78 35 5f d7 0f 0b eb a5 0f 0b c3 <0f> 0b e9 71 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 [92414.864757] RSP: 0018:b9b08b66f970 EFLAGS: 00010246 [92414.864761] RAX: 001c RBX: b9b08b66fac8 RCX: 8c6318c6318c6319 [92414.864765] RDX: RSI: RDI: 9beac944fce8 [92414.864768] RBP: eef599772a80 R08: 9bee81360d00 R09: [92414.864772] R10: 9beac944fce8 R11: R12: eef584fe7b80 [92414.864775] R13: 9beac944f800 R14: 9beac944fd98 R15: 9bee81360d00 [92414.864780] FS: 7f98023da840() GS:9bf1bda0() knlGS: [92414.864783] CS: 0010 DS: ES: CR0: 80050033 [92414.864787] CR2: 7ffc5071f080 CR3: 30a0c000 CR4: 003406e0 [92414.864790] Call Trace: [92414.864798] fuse_writepages_fill+0x5cc/0x690 [fuse] [92414.864810] write_cache_pages+0x225/0x560 [92414.864819] ? fuse_writepages+0xe0/0xe0 [fuse] [92414.864828] ? rcu_read_lock_sched_held+0x3f/0x80 [92414.864835] ? trace_kmalloc+0xf2/0x120 [92414.864842] ? __kmalloc+0x136/0x270 [92414.864848] ? fuse_writepages+0x5e/0xe0 [fuse] [92414.864857] fuse_writepages+0x7d/0xe0 [fuse] [92414.864867] do_writepages+0x28/0xb0 [92414.864876] __writeback_single_inode+0x60/0x6b0 [92414.864884] writeback_single_inode+0xa7/0x140 [92414.864890] write_inode_now+0x8b/0xb0 [92414.864904] fuse_do_setattr+0x42f/0x770 [fuse] [92414.864914] ? _raw_spin_unlock+0x1f/0x30 [92414.864921] ? 
fuse_do_getattr+0x149/0x2c0 [fuse] [92414.864946] fuse_setattr+0x99/0x140 [fuse] [92414.864954] notify_change+0x333/0x4a0 [92414.864964] chown_common+0xec/0x190 [92414.864978] ksys_fchown+0x6c/0xb0 [92414.864985] __x64_sys_fchown+0x16/0x20 [92414.864990] do_syscall_64+0x52/0xb0 [92414.864995] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [92414.865000] RIP: 0033:0x7f9801cc0cd7 [92414.865003] Code: Bad RIP value. [92414.865007] RSP: 002b:7ffc506abb18 EFLAGS: 0206 ORIG_RAX: 005d [92414.865011] RAX: ffda RBX: 7ffc506abba0 RCX: 7f9801cc0cd7 [92414.865014] RDX: RSI: RDI: 0004 [92414.865018] RBP: 0004 R08: 01cb9e70 R09: 7f98023da840 [92414.865021] R10: 7ffc506ab5a0 R11: 0206 R12: 0003 [92414.865025] R13: 7ffc506acde1 R14: R15: [92414.865040] irq event stamp: 7637 [92414.865045] hardirqs last enabled at (7645): [] console_unlock+0x4b7/0x6c0 [92414.865049] hardirqs last disabled at (7652): [] console_unlock+0xad/0x6c0 [92414.865103] softirqs last enab
Re: [Kgdb-bugreport] [PATCH v3] kdb: Remove the misfeature 'KDBFLAGS'
On Thu, May 21, 2020 at 03:21:25PM +0800, Wei Li wrote:
> Currently, 'KDBFLAGS' is an internal variable of kdb, it is combined
> by 'KDBDEBUG' and state flags. It will be shown only when 'KDBDEBUG'
> is set, and the user can define an environment variable named 'KDBFLAGS'
> too. These are puzzling indeed.
>
> After communication with Daniel, it seems that 'KDBFLAGS' is a misfeature.
> So let's replace 'KDBFLAGS' with 'KDBDEBUG' to just show the value we
> wrote into. After this modification, we can use `md4c1 kdb_flags` instead,
> to observe the state flags.
>
> Suggested-by: Daniel Thompson
> Signed-off-by: Wei Li

Applied. Thanks.


Daniel.

> ---
> v2 -> v3:
> - Change to replace the internal env 'KDBFLAGS' with 'KDBDEBUG'.
> v1 -> v2:
> - Fix lack of braces.
>
>  kernel/debug/kdb/kdb_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index 4fc43fb17127..392029287083 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -418,8 +418,7 @@ int kdb_set(int argc, const char **argv)
>  				    argv[2]);
>  			return 0;
>  		}
> -		kdb_flags = (kdb_flags &
> -			     ~(KDB_DEBUG_FLAG_MASK << KDB_DEBUG_FLAG_SHIFT))
> +		kdb_flags = (kdb_flags & ~KDB_DEBUG(MASK))
>  			| (debugflags << KDB_DEBUG_FLAG_SHIFT);
>
>  		return 0;
> @@ -2081,7 +2080,8 @@ static int kdb_env(int argc, const char **argv)
>  	}
>
>  	if (KDB_DEBUG(MASK))
> -		kdb_printf("KDBFLAGS=0x%x\n", kdb_flags);
> +		kdb_printf("KDBDEBUG=0x%x\n",
> +			(kdb_flags & KDB_DEBUG(MASK)) >> KDB_DEBUG_FLAG_SHIFT);
>
>  	return 0;
>  }
> --
> 2.17.1
>
>
>
> ___
> Kgdb-bugreport mailing list
> kgdb-bugrep...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
Re: [Kgdb-bugreport] [PATCH v3] kdb: Remove the misfeature 'KDBFLAGS'
On Thu, May 21, 2020 at 03:21:25PM +0800, Wei Li wrote:
> Currently, 'KDBFLAGS' is an internal variable of kdb, it is combined
> by 'KDBDEBUG' and state flags. It will be shown only when 'KDBDEBUG'
> is set, and the user can define an environment variable named 'KDBFLAGS'
> too. These are puzzling indeed.
>
> After communication with Daniel, it seems that 'KDBFLAGS' is a misfeature.
> So let's replace 'KDBFLAGS' with 'KDBDEBUG' to just show the value we
> wrote into. After this modification, we can use `md4c1 kdb_flags` instead,
> to observe the state flags.
>
> Suggested-by: Daniel Thompson
> Signed-off-by: Wei Li
> ---
> v2 -> v3:
> - Change to replace the internal env 'KDBFLAGS' with 'KDBDEBUG'.
> v1 -> v2:
> - Fix lack of braces.
>
>  kernel/debug/kdb/kdb_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index 4fc43fb17127..392029287083 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -418,8 +418,7 @@ int kdb_set(int argc, const char **argv)
>  				    argv[2]);
>  			return 0;
>  		}
> -		kdb_flags = (kdb_flags &
> -			     ~(KDB_DEBUG_FLAG_MASK << KDB_DEBUG_FLAG_SHIFT))
> +		kdb_flags = (kdb_flags & ~KDB_DEBUG(MASK))
>  			| (debugflags << KDB_DEBUG_FLAG_SHIFT);
>
>  		return 0;
> @@ -2081,7 +2080,8 @@ static int kdb_env(int argc, const char **argv)
>  	}
>
>  	if (KDB_DEBUG(MASK))
> -		kdb_printf("KDBFLAGS=0x%x\n", kdb_flags);
> +		kdb_printf("KDBDEBUG=0x%x\n",
> +			(kdb_flags & KDB_DEBUG(MASK)) >> KDB_DEBUG_FLAG_SHIFT);

For this expression to work correctly, kdb_flags needs to be unsigned
(otherwise we get an arithmetic right shift and mis-report when KDBDEBUG
== 0xfff).

This is just FYI, I think I can fix this up when applying...


Daniel.
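Daniel's point about signedness can be illustrated outside the kernel. The sketch below assumes a 16-bit debug field stored in the top half of a 32-bit flags word; `DEBUG_FLAG_MASK`, `DEBUG_FLAG_SHIFT`, and both helper functions are hypothetical stand-ins, not the kdb definitions. Right-shifting a negative signed value is implementation-defined in C; common compilers such as GCC and Clang shift arithmetically, copying the sign bit into the vacated positions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: 16-bit field in bits 31..16. */
#define DEBUG_FLAG_MASK  0xffffu
#define DEBUG_FLAG_SHIFT 16

static_assert(DEBUG_FLAG_SHIFT < 32, "shift must fit in a 32-bit word");

/* Unsigned flags: the shift is logical, so the field comes back as stored. */
static uint32_t debug_field_unsigned(uint32_t flags)
{
	return (flags & (DEBUG_FLAG_MASK << DEBUG_FLAG_SHIFT)) >> DEBUG_FLAG_SHIFT;
}

/*
 * Signed flags: if bit 31 is set the masked value is negative, and an
 * arithmetic right shift fills the upper bits with ones, so the result
 * no longer equals the stored field. The behaviour is
 * implementation-defined, which is why an unsigned type is safer here.
 */
static uint32_t debug_field_signed(int32_t flags)
{
	int32_t masked = flags & (int32_t)(DEBUG_FLAG_MASK << DEBUG_FLAG_SHIFT);

	return (uint32_t)(masked >> DEBUG_FLAG_SHIFT);
}
```

The two helpers agree while bit 31 is clear. Once the top bit of the field is set, the signed variant typically reports extra high bits, which is the kind of mis-report described above.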
Re: [Kgdb-bugreport] [PATCH v3 04/11] kgdb: Delay "kgdbwait" to dbg_late_init() by default
On Thu, Apr 30, 2020 at 09:35:30AM -0700, Doug Anderson wrote: > Hi, > > On Thu, Apr 30, 2020 at 8:49 AM Daniel Thompson > wrote: > > > > On Tue, Apr 28, 2020 at 02:13:44PM -0700, Douglas Anderson wrote: > > > Using kgdb requires at least some level of architecture-level > > > initialization. If nothing else, it relies on the architecture to > > > pass breakpoints / crashes onto kgdb. > > > > > > On some architectures this all works super early, specifically it > > > starts working at some point in time before Linux parses > > > early_params's. On other architectures it doesn't. A survey of a few > > > platforms: > > > > > > a) x86: Presumably it all works early since "ekgdboc" is documented to > > >work here. > > > b) arm64: Catching crashes works; with a simple patch breakpoints can > > >also be made to work. > > > c) arm: Nothing in kgdb works until > > >paging_init() -> devicemaps_init() -> early_trap_init() > > > > > > Let's be conservative and, by default, process "kgdbwait" (which tells > > > the kernel to drop into the debugger ASAP at boot) a bit later at > > > dbg_late_init() time. If an architecture has tested it and wants to > > > re-enable super early debugging, they can select the > > > ARCH_HAS_EARLY_DEBUG KConfig option. We'll do this for x86 to start. > > > It should be noted that dbg_late_init() is still called quite early in > > > the system. > > > > > > Note that this patch doesn't affect when kgdb runs its init. If kgdb > > > is set to initialize early it will still initialize when parsing > > > early_param's. This patch _only_ inhibits the initial breakpoint from > > > "kgdbwait". This means: > > > > > > * Without any extra patches arm64 platforms will at least catch > > > crashes after kgdb inits. > > > * arm platforms will catch crashes (and could handle a hardcoded > > > kgdb_breakpoint()) any time after early_trap_init() runs, even > > > before dbg_late_init(). 
> > > > > > Signed-off-by: Douglas Anderson > > > Cc: Thomas Gleixner > > > Cc: Ingo Molnar > > > Cc: Borislav Petkov > > > Reviewed-by: Greg Kroah-Hartman > > > > It looks like this patch is triggering some warnings from the existing > > defconfigs (both x86 and arm64). It looks like this: > > > > --- > > wychelm$ make defconfig > > GEN Makefile > > *** Default configuration is based on 'x86_64_defconfig' > > > > WARNING: unmet direct dependencies detected for ARCH_HAS_EARLY_DEBUG > > Depends on [n]: KGDB [=n] > > Selected by [y]: > > - X86 [=y] > > > > WARNING: unmet direct dependencies detected for ARCH_HAS_EARLY_DEBUG > > Depends on [n]: KGDB [=n] > > Selected by [y]: > > - X86 [=y] > > Ah, thanks! I hadn't noticed those. I think it'd be easy to just > change the relevant patches to just "select ARCH_HAS_EARLY_DEBUG if > KGDB". If you agree that's a good fix and are willing, I'd be happy > if you just added it to the relevant patches when applying. If not, I > can post a v4. Happy with the approach to fix this. Given the follow-on discussion from the end of last week, I suspect there probably needs to be a v4 anyway, so perhaps the last question, whether to apply a fixup, is moot at this point? Daniel.
Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries
On Tue, May 28, 2019 at 10:58:03AM +0500, Mikhail Gavrilov wrote: > On Mon, 27 May 2019 at 21:16, Mikhail Gavrilov > wrote: > > > > I have bisected the issue. I hope it helps to understand what happened on my > > computer. > > > > Why does no one answer? > Even if the problem is known and already fixed, it would be nice to > know that the 10 days I spent searching for the problem commit were not in vain > and that someone reads my messages. Sorry, I didn't see your earlier messages; I'm not sure why. In any case, yes, it's a known issue, and it's fixed in 5.2-rc2. The fix was commit 0a944e8a6c66. - Ted
Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries
On Mon, 27 May 2019 at 21:16, Mikhail Gavrilov wrote: > > I am bisected issue. I hope it help understand what is happened on my > computer. > > $ git bisect log > git bisect start > # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1 > git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd > # bad: [7e9890a3500d95c01511a4c45b7e7192dfa47ae2] Merge tag > 'ovl-update-5.2' of > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs > git bisect bad 7e9890a3500d95c01511a4c45b7e7192dfa47ae2 > # good: [80f232121b69cc69a31ccb2b38c1665d770b0710] Merge > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next > git bisect good 80f232121b69cc69a31ccb2b38c1665d770b0710 > # good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag > 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm > git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c > # good: [ea5aee6d97fd2d4499b1eebc233861c1def70f06] Merge tag > 'clk-for-linus' of > git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux > git bisect good ea5aee6d97fd2d4499b1eebc233861c1def70f06 > # good: [47782361aca21a32ad4198f1b72f1655a7c9f7e5] Merge tag > 'tag-chrome-platform-for-v5.2' of > ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux > git bisect good 47782361aca21a32ad4198f1b72f1655a7c9f7e5 > # bad: [55472bae5331f33582d9f0e8919fed8bebcda0da] Merge tag > 'linux-watchdog-5.2-rc1' of > git://www.linux-watchdog.org/linux-watchdog > git bisect bad 55472bae5331f33582d9f0e8919fed8bebcda0da > # good: [4dbf09fea60d158e60a30c419e0cfa1ea138dd57] Merge tag > 'mtd/for-5.2' of > ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux > git bisect good 4dbf09fea60d158e60a30c419e0cfa1ea138dd57 > # good: [44affc086e6d5ea868c1184cdc5e1159e90ffb71] watchdog: > ts4800_wdt: Convert to use device managed functions and other > improvements > git bisect good 44affc086e6d5ea868c1184cdc5e1159e90ffb71 > # good: [5c09980d9f9de2dc6b255f4f0229aeff0eb2c723] watchdog: > imx_sc_wdt: drop warning 
after calling watchdog_init_timeout > git bisect good 5c09980d9f9de2dc6b255f4f0229aeff0eb2c723 > # good: [345f16251063bcef5828f17fe90aa7f7a5019aab] watchdog: Improve > Kconfig entry ordering and dependencies > git bisect good 345f16251063bcef5828f17fe90aa7f7a5019aab > # good: [988bec41318f3fa897e2f8af271bd456936d6caf] ubifs: orphan: > Handle xattrs like files > git bisect good 988bec41318f3fa897e2f8af271bd456936d6caf > # good: [a65d10f3ce657aa4542b5de78933053f6d1a9e97] ubifs: Drop > unnecessary setting of zbr->znode > git bisect good a65d10f3ce657aa4542b5de78933053f6d1a9e97 > # good: [a9f0bda567e32a2b44165b067adfc4a4f56d1815] watchdog: Enforce > that at least one pretimeout governor is enabled > git bisect good a9f0bda567e32a2b44165b067adfc4a4f56d1815 > # bad: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge tag > 'upstream-5.2-rc1' of > ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs > git bisect bad d7a02fa0a8f9ec1b81d57628ca9834563208ef33 > # good: [04d37e5a8b1fad2d625727af3d738c6fd9491720] ubi: wl: Fix > uninitialized variable > git bisect good 04d37e5a8b1fad2d625727af3d738c6fd9491720 > # first bad commit: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge > tag 'upstream-5.2-rc1' of > ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs > Why does no one answer? Even if the problem is known and already fixed, it would be nice to know that the 10 days I spent searching for the problem commit were not in vain and that someone reads my messages. -- Best Regards, Mike Gavrilov.
Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries
On Sat, 18 May 2019 at 16:07, Mikhail Gavrilov wrote: > > It happened again today. > > [18018.969636] EXT4-fs error (device nvme0n1p2): ext4_find_extent:908: > inode #8: comm jbd2/nvme0n1p2-: pblk 23101439 bad header/extent: > invalid extent entries - magic f30a, entries 8, max 340(340), depth > 0(0) > [18018.970071] jbd2_journal_bmap: journal block not found at offset > 4799 on nvme0n1p2-8 > [18018.970076] Aborting journal on device nvme0n1p2-8. > [18018.970269] EXT4-fs error (device nvme0n1p2): > ext4_journal_check_start:61: Detected aborted journal > [18018.970316] EXT4-fs (nvme0n1p2): Remounting filesystem read-only > I have bisected the issue. I hope it helps to understand what happened on my computer. $ git bisect log git bisect start # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1 git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd # bad: [7e9890a3500d95c01511a4c45b7e7192dfa47ae2] Merge tag 'ovl-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs git bisect bad 7e9890a3500d95c01511a4c45b7e7192dfa47ae2 # good: [80f232121b69cc69a31ccb2b38c1665d770b0710] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next git bisect good 80f232121b69cc69a31ccb2b38c1665d770b0710 # good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c # good: [ea5aee6d97fd2d4499b1eebc233861c1def70f06] Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux git bisect good ea5aee6d97fd2d4499b1eebc233861c1def70f06 # good: [47782361aca21a32ad4198f1b72f1655a7c9f7e5] Merge tag 'tag-chrome-platform-for-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux git bisect good 47782361aca21a32ad4198f1b72f1655a7c9f7e5 # bad: [55472bae5331f33582d9f0e8919fed8bebcda0da] Merge tag 'linux-watchdog-5.2-rc1' of git://www.linux-watchdog.org/linux-watchdog git bisect bad
55472bae5331f33582d9f0e8919fed8bebcda0da # good: [4dbf09fea60d158e60a30c419e0cfa1ea138dd57] Merge tag 'mtd/for-5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux git bisect good 4dbf09fea60d158e60a30c419e0cfa1ea138dd57 # good: [44affc086e6d5ea868c1184cdc5e1159e90ffb71] watchdog: ts4800_wdt: Convert to use device managed functions and other improvements git bisect good 44affc086e6d5ea868c1184cdc5e1159e90ffb71 # good: [5c09980d9f9de2dc6b255f4f0229aeff0eb2c723] watchdog: imx_sc_wdt: drop warning after calling watchdog_init_timeout git bisect good 5c09980d9f9de2dc6b255f4f0229aeff0eb2c723 # good: [345f16251063bcef5828f17fe90aa7f7a5019aab] watchdog: Improve Kconfig entry ordering and dependencies git bisect good 345f16251063bcef5828f17fe90aa7f7a5019aab # good: [988bec41318f3fa897e2f8af271bd456936d6caf] ubifs: orphan: Handle xattrs like files git bisect good 988bec41318f3fa897e2f8af271bd456936d6caf # good: [a65d10f3ce657aa4542b5de78933053f6d1a9e97] ubifs: Drop unnecessary setting of zbr->znode git bisect good a65d10f3ce657aa4542b5de78933053f6d1a9e97 # good: [a9f0bda567e32a2b44165b067adfc4a4f56d1815] watchdog: Enforce that at least one pretimeout governor is enabled git bisect good a9f0bda567e32a2b44165b067adfc4a4f56d1815 # bad: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs git bisect bad d7a02fa0a8f9ec1b81d57628ca9834563208ef33 # good: [04d37e5a8b1fad2d625727af3d738c6fd9491720] ubi: wl: Fix uninitialized variable git bisect good 04d37e5a8b1fad2d625727af3d738c6fd9491720 # first bad commit: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs -- Best Regards, Mike Gavrilov.
Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries
Excerpts from Mikhail Gavrilov's message of May 18, 2019 7:07 am: > On Sat, 18 May 2019 at 11:44, Mikhail Gavrilov > wrote: >> [28616.429757] EXT4-fs error (device nvme0n1p2): ext4_find_extent:908: >> inode #8: comm jbd2/nvme0n1p2-: pblk 23101439 bad header/extent: >> invalid extent entries - magic f30a, entries 8, max 340(340), depth >> 0(0) I had a similar problem today: EXT4-fs error (device dm-0): ext4_find_extent:908: inode #8: comm jbd2/dm-0-8: pblk 117997567 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0) I am using dm-crypt on SATA disk.
[bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries
Hi folks. Yesterday I updated the kernel to 5.2 (git commit 7e9890a3500d). I always leave the computer running at night. This morning I found that the computer had hung. I connected via ssh and looked at the kernel log. There I saw strange records which I had never seen before: [28616.429757] EXT4-fs error (device nvme0n1p2): ext4_find_extent:908: inode #8: comm jbd2/nvme0n1p2-: pblk 23101439 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0) [28616.430602] jbd2_journal_bmap: journal block not found at offset 4383 on nvme0n1p2-8 [28616.430610] Aborting journal on device nvme0n1p2-8. [28616.432474] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal [28616.432489] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal [28616.432551] EXT4-fs (nvme0n1p2): Remounting filesystem read-only [28616.432690] EXT4-fs (nvme0n1p2): ext4_writepages: jbd2_start: 9223372036854775791 pages, ino 3285789; err -30 [28616.432692] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal After rebooting the computer and running fsck, the system seems to be working. But I am afraid that it could happen again and I may lose all my data. How serious is this error and what does it mean? Is it a bug in kernel 5.2 or not? -- Best Regards, Mike Gavrilov.
Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel
On 12/05/2017 10:42 AM, Randy Dunlap wrote: > On 12/05/2017 06:55 AM, Daniel Thompson wrote: >> On 05/12/17 14:37, Jason Wessel wrote: >>> I have a series of 50+ patches for kgdb/kdb/usb which have never been published. I am not saying that we actually need any of those patches, but it would be nice to let the community decide, and we can see if there is anything worth merging into the next cycle or future work with other maintainers. My kernel.org tree stopped working a long time ago, probably from inactivity. I'll see if that can get restored in the next few days, or I'll use my github tree and send the unpublished work to the mailing list as an RFC. >> >> I, for one, would be interested to see these. > > Me also. I have 3 kdb patches that I just made. If you have some patches please do send them along to the list. I have added Daniel as an additional maintainer for when I am not around. We are open for business again now that my kernel.org tree accepts my tag signing again. It will take some time to go through these unpublished patches to see what is actually relevant, but I'll be posting some of them to the mailing list reasonably soon. Cheers, Jason. ps While on the topic of debuggers... I was thinking it might be interesting to have a gdb-serial stub in an FPGA for debugging the kernel, not unlike what was done with the firewire debugger that Andi Kleen worked on long ago. I am not exactly sure what kind of run control options exist there, but in terms of accessing memory it would certainly be plausible. One option I know is plausible for run control is a small kernel interrupt handler for the run control interface, based on the fact that some FPGAs can show up like a PCI device. While I haven't been directly working in upstream linux in the last year or two, I still do plenty of debugging of full systems with simulators, hardware, and now FPGAs too. :-)
Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel
On 12/05/2017 06:55 AM, Daniel Thompson wrote: > On 05/12/17 14:37, Jason Wessel wrote: >> On 12/05/2017 08:09 AM, Lee Jones wrote: >>> On Tue, 05 Dec 2017, Daniel Thompson wrote: >>> ... with many, many thanks for Jason for all his hard work. Cc: Jason Wessel Signed-off-by: Daniel Thompson --- Notes: Over the years Jason has become increasingly hard to get hold off and I think he must now be regarded as inactive. Patches in kgdb-next (mine as it happens) have been there for over a year without a corresponding pull request and a couple of architecture specific kgdb fixes have ended up missing a release cycle (or two) as the architecture maintainer waits for an Acked-by from Jason. In the past I've had to rely on Andrew M. to land my own changes to kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649 ("x86/debug: Handle warnings before the notifier chain, to fix KGDB crash"). That I was sharing surrogate acks convinced me we need a change here and I've offered Jason help via private e-mail without reply. So, I really would prefer it it if this patch listed me as a co-maintainer or, failing that, as least had Jason's blessing... but it doesn't. I certainly suggest this patch takes a long time in review, and if it doesn't attract Jason's attention then I can only reiterate what is says in the commit log: Thanks Jason! MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) >>> >>> It looks like Jason has been inactive in all aspects of upstream >>> maintainership and as a contributor for well over a year now. >> >> I have not been working directly on upstream kernel contributions for quite >> some time. It doesn't mean I haven't been involved with kernel development. >> Patches that I have reviewed or suggested to other developers generally >> don't bare my name. 
I wouldn't mind trying to take a slightly more gradual >> passing of the baton and add Daniel as co-maintainer for a while before I >> retire from kernel work and merge myself away in the coming years. :-) > > Great to hear from you again! I shall consider this patch nacked or the time > being ;-)... and if you are happy with help from me I shall leave it to you > to propose an update to MAINTAINERS. > > >> I have a series of 50+ patches for kgdb/kdb/usb which have never been >> published. I am not saying that we actually need any of those patches, but >> it would be nice to let the community decide, and we can see if there is >> anything worth merging into the next cycle or future work with other >> maintainers. My kernel.org tree stopped working a long time ago, probably >> from inactivity. I'll see if that can get restored in the next few days, or >> I'll use my github tree and send the unpublished work to the mailing list as >> an RFC. > > I, for one, would be interested to see these. Me also. I have 3 kdb patches that I just made. -- ~Randy
Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel
On 05/12/17 14:37, Jason Wessel wrote: On 12/05/2017 08:09 AM, Lee Jones wrote: On Tue, 05 Dec 2017, Daniel Thompson wrote: ... with many, many thanks for Jason for all his hard work. Cc: Jason Wessel Signed-off-by: Daniel Thompson --- Notes: Over the years Jason has become increasingly hard to get hold off and I think he must now be regarded as inactive. Patches in kgdb-next (mine as it happens) have been there for over a year without a corresponding pull request and a couple of architecture specific kgdb fixes have ended up missing a release cycle (or two) as the architecture maintainer waits for an Acked-by from Jason. In the past I've had to rely on Andrew M. to land my own changes to kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649 ("x86/debug: Handle warnings before the notifier chain, to fix KGDB crash"). That I was sharing surrogate acks convinced me we need a change here and I've offered Jason help via private e-mail without reply. So, I really would prefer it it if this patch listed me as a co-maintainer or, failing that, as least had Jason's blessing... but it doesn't. I certainly suggest this patch takes a long time in review, and if it doesn't attract Jason's attention then I can only reiterate what is says in the commit log: Thanks Jason! MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) It looks like Jason has been inactive in all aspects of upstream maintainership and as a contributor for well over a year now. I have not been working directly on upstream kernel contributions for quite some time. It doesn't mean I haven't been involved with kernel development. Patches that I have reviewed or suggested to other developers generally don't bare my name. I wouldn't mind trying to take a slightly more gradual passing of the baton and add Daniel as co-maintainer for a while before I retire from kernel work and merge myself away in the coming years. :-) Great to hear from you again! 
I shall consider this patch nacked or the time being ;-)... and if you are happy with help from me I shall leave it to you to propose an update to MAINTAINERS. I have a series of 50+ patches for kgdb/kdb/usb which have never been published. I am not saying that we actually need any of those patches, but it would be nice to let the community decide, and we can see if there is anything worth merging into the next cycle or future work with other maintainers. My kernel.org tree stopped working a long time ago, probably from inactivity. I'll see if that can get restored in the next few days, or I'll use my github tree and send the unpublished work to the mailing list as an RFC. I, for one, would be interested to see these. Daniel.
Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel
On Tue, 05 Dec 2017, Jason Wessel wrote: > On 12/05/2017 08:09 AM, Lee Jones wrote: > > On Tue, 05 Dec 2017, Daniel Thompson wrote: > > > > > ... with many, many thanks for Jason for all his hard work. > > > > > > Cc: Jason Wessel > > > Signed-off-by: Daniel Thompson > > > --- > > > > > > Notes: > > > Over the years Jason has become increasingly hard to get hold off > > > and I think he must now be regarded as inactive. > > > Patches in kgdb-next (mine as it happens) have been there for over a > > > year without a corresponding pull request and a couple of > > > architecture > > > specific kgdb fixes have ended up missing a release cycle (or two) as > > > the architecture maintainer waits for an Acked-by from Jason. > > > In the past I've had to rely on Andrew M. to land my own changes to > > > kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649 > > > ("x86/debug: Handle warnings before the notifier chain, to fix KGDB > > > crash"). That I was sharing surrogate acks convinced me we need a > > > change here and I've offered Jason help via private e-mail without > > > reply. > > > So, I really would prefer it it if this patch listed me as a > > > co-maintainer or, failing that, as least had Jason's blessing... but > > > it doesn't. I certainly suggest this patch takes a long time in > > > review, and if it doesn't attract Jason's attention then I can only > > > reiterate what is says in the commit log: Thanks Jason! > > > > > > MAINTAINERS | 3 +-- > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > It looks like Jason has been inactive in all aspects of upstream > > maintainership and as a contributor for well over a year now. > > I have not been working directly on upstream kernel contributions > for quite some time. It doesn't mean I haven't been involved with > kernel development. Patches that I have reviewed or suggested to > other developers generally don't bare my name. 
I wouldn't mind > trying to take a slightly more gradual passing of the baton and add > Daniel as co-maintainer for a while before I retire from kernel work > and merge myself away in the coming years. :-) > > I have a series of 50+ patches for kgdb/kdb/usb which have never > been published. I am not saying that we actually need any of those > patches, but it would be nice to let the community decide, and we > can see if there is anything worth merging into the next cycle or > future work with other maintainers. My kernel.org tree stopped > working a long time ago, probably from inactivity. I'll see if that > can get restored in the next few days, or I'll use my github tree > and send the unpublished work to the mailing list as an RFC. And > for what it is worth if none of this happens by the end of 4.16, by > all means Daniel has my blessing to be the sole maintainer. Thanks for your reply Jason. Sounds like a perfectly reasonable way forward. -- Lee Jones Linaro STMicroelectronics Landing Team Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel
On 12/05/2017 08:09 AM, Lee Jones wrote: On Tue, 05 Dec 2017, Daniel Thompson wrote: ... with many, many thanks for Jason for all his hard work. Cc: Jason Wessel Signed-off-by: Daniel Thompson --- Notes: Over the years Jason has become increasingly hard to get hold off and I think he must now be regarded as inactive. Patches in kgdb-next (mine as it happens) have been there for over a year without a corresponding pull request and a couple of architecture specific kgdb fixes have ended up missing a release cycle (or two) as the architecture maintainer waits for an Acked-by from Jason. In the past I've had to rely on Andrew M. to land my own changes to kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649 ("x86/debug: Handle warnings before the notifier chain, to fix KGDB crash"). That I was sharing surrogate acks convinced me we need a change here and I've offered Jason help via private e-mail without reply. So, I really would prefer it it if this patch listed me as a co-maintainer or, failing that, as least had Jason's blessing... but it doesn't. I certainly suggest this patch takes a long time in review, and if it doesn't attract Jason's attention then I can only reiterate what is says in the commit log: Thanks Jason! MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) It looks like Jason has been inactive in all aspects of upstream maintainership and as a contributor for well over a year now. I have not been working directly on upstream kernel contributions for quite some time. It doesn't mean I haven't been involved with kernel development. Patches that I have reviewed or suggested to other developers generally don't bare my name. I wouldn't mind trying to take a slightly more gradual passing of the baton and add Daniel as co-maintainer for a while before I retire from kernel work and merge myself away in the coming years. :-) I have a series of 50+ patches for kgdb/kdb/usb which have never been published. 
I am not saying that we actually need any of those patches, but it would be nice to let the community decide, and we can see if there is anything worth merging into the next cycle or future work with other maintainers. My kernel.org tree stopped working a long time ago, probably from inactivity. I'll see if that can get restored in the next few days, or I'll use my github tree and send the unpublished work to the mailing list as an RFC. And for what it is worth if none of this happens by the end of 4.16, by all means Daniel has my blessing to be the sole maintainer. Many thanks to Daniel for his contributions! Cheers, Jason.
Re: [PATCH] [BUGREPORT] media: v4l: omap_vout: vrfb: initialize DMA flags
Arnd, sorry for the delayed response, I was away w/o internet connection for the past weeks. On 2017-07-10 14:18, Arnd Bergmann wrote: > Passing uninitialized flags into device_prep_interleaved_dma is clearly > a bad idea, and we get a compiler warning for it: > > drivers/media/platform/omap/omap_vout_vrfb.c: In function > 'omap_vout_prepare_vrfb': > drivers/media/platform/omap/omap_vout_vrfb.c:273:5: error: 'flags' may be > used uninitialized in this function [-Werror=maybe-uninitialized] I can not explain why I have missed this. > It seems that the OMAP dmaengine ignores the flags, but we should > pick the right ones anyway. Unfortunately I don't know what they > should be, so I just picked the most common flags. Please set the > right flags here and fold the modified patch. The flags are fine. > > Fixes: 6a1560ecaa8c ("media: v4l: omap_vout: vrfb: Convert to dmaengine") > Signed-off-by: Arnd Bergmann Acked-by: Peter Ujfalusi > --- > drivers/media/platform/omap/omap_vout_vrfb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/media/platform/omap/omap_vout_vrfb.c > b/drivers/media/platform/omap/omap_vout_vrfb.c > index 45a553d4f5b2..fed28b6bbbc0 100644 > --- a/drivers/media/platform/omap/omap_vout_vrfb.c > +++ b/drivers/media/platform/omap/omap_vout_vrfb.c > @@ -233,7 +233,7 @@ int omap_vout_prepare_vrfb(struct omap_vout_device *vout, > struct videobuf_buffer *vb) > { > struct dma_async_tx_descriptor *tx; > - enum dma_ctrl_flags flags; > + enum dma_ctrl_flags flags = DMA_PREP_INTERRUPT | DMA_CTRL_ACK; > struct dma_chan *chan = vout->vrfb_dma_tx.chan; > struct dma_device *dmadev = chan->device; > struct dma_interleaved_template *xt = vout->vrfb_dma_tx.xt; > - Péter
[PATCH] [BUGREPORT] media: v4l: omap_vout: vrfb: initialize DMA flags
Passing uninitialized flags into device_prep_interleaved_dma is clearly a bad idea, and we get a compiler warning for it: drivers/media/platform/omap/omap_vout_vrfb.c: In function 'omap_vout_prepare_vrfb': drivers/media/platform/omap/omap_vout_vrfb.c:273:5: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized] It seems that the OMAP dmaengine ignores the flags, but we should pick the right ones anyway. Unfortunately I don't know what they should be, so I just picked the most common flags. Please set the right flags here and fold the modified patch. Fixes: 6a1560ecaa8c ("media: v4l: omap_vout: vrfb: Convert to dmaengine") Signed-off-by: Arnd Bergmann --- drivers/media/platform/omap/omap_vout_vrfb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/platform/omap/omap_vout_vrfb.c b/drivers/media/platform/omap/omap_vout_vrfb.c index 45a553d4f5b2..fed28b6bbbc0 100644 --- a/drivers/media/platform/omap/omap_vout_vrfb.c +++ b/drivers/media/platform/omap/omap_vout_vrfb.c @@ -233,7 +233,7 @@ int omap_vout_prepare_vrfb(struct omap_vout_device *vout, struct videobuf_buffer *vb) { struct dma_async_tx_descriptor *tx; - enum dma_ctrl_flags flags; + enum dma_ctrl_flags flags = DMA_PREP_INTERRUPT | DMA_CTRL_ACK; struct dma_chan *chan = vout->vrfb_dma_tx.chan; struct dma_device *dmadev = chan->device; struct dma_interleaved_template *xt = vout->vrfb_dma_tx.xt; -- 2.9.0
Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()
On Thu, Oct 27, 2016 at 10:02:10PM -0400, Al Viro wrote: > ... and frankly, backporting 548acf19234d would be my preference. It's a bit > more intrusive than needed (_ASM_EXTABLE_FAULT is used only in > memcpy_mcsafe(), > which is used only by pmem and it's the only reason for passing the trap > number to fixup_exception()), but AFAICS it's fairly safe. Objections? I've grabbed 548acf19234d for 4.1, thanks! -- Thanks, Sasha
Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()
On Fri, Oct 28, 2016 at 08:49:58PM +0100, Al Viro wrote: > On Fri, Oct 28, 2016 at 11:21:24AM -0700, Linus Torvalds wrote: > > > End result: either commit 1c109fabbd51 shouldn't be backported (it's > > really not that important - if people properly check the exception > > error results it shouldn't matter), or you need to also backport > > 548acf19234d as Al suggested. > > > > I'd be inclined to say "don't backport 1c109fabbd51", but it's really > > a judgment call. > > *nod* > > FWIW, that infoleak _does_ allow to leak an uninitialized word into > coredump (in sigreturn the value from uninitialized local variable is > copied into pt_regs of process and when we eventually check that error > has happened and hit the sucker with SIGSEGV, that value gets stored into > the coredump), but in the worst case that's 64 bits leaked from fixed depth > in the kernel stack of attacker's process, with fixed call chain. > > I very much doubt that it's escalatable to anything practically interesting. > If spender et.al. can come up with a usable way to escalate that, I would be > quite surprised (and would love to see the details), but hey, it might be > possible. More likely possibility is that the bug is harmless in practice. Hm, I think I'll backport 548acf19234d to 4.4-stable, as people have shown that leaking anything can be used in odd ways that they shouldn't be, just to be "safe" :) thanks for the heads up. greg k-h
Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()
On Fri, Oct 28, 2016 at 11:21:24AM -0700, Linus Torvalds wrote: > End result: either commit 1c109fabbd51 shouldn't be backported (it's > really not that important - if people properly check the exception > error results it shouldn't matter), or you need to also backport > 548acf19234d as Al suggested. > > I'd be inclined to say "don't backport 1c109fabbd51", but it's really > a judgment call. *nod* FWIW, that infoleak _does_ allow to leak an uninitialized word into coredump (in sigreturn the value from uninitialized local variable is copied into pt_regs of process and when we eventually check that error has happened and hit the sucker with SIGSEGV, that value gets stored into the coredump), but in the worst case that's 64 bits leaked from fixed depth in the kernel stack of attacker's process, with fixed call chain. I very much doubt that it's escalatable to anything practically interesting. If spender et.al. can come up with a usable way to escalate that, I would be quite surprised (and would love to see the details), but hey, it might be possible. More likely possibility is that the bug is harmless in practice.
Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()
On Fri, Oct 28, 2016 at 12:40:33PM -0400, Joe Korty wrote:
> Backporting 548acf19234d to 4.1.35 does indeed fix the
> issue. However, it is not clear to me _why_ it works,
> so it might be better that someone else push the backport
> to stable.

Because the trick used in fixup_exception() prior to that commit depended upon the handler being very close to the faulting instruction.

# define _ASM_EXTABLE_EX(from,to)				\
	.pushsection "__ex_table","a" ;				\
	.balign 8 ;						\
	.long (from) - . ;					\
	.long (to) - . + 0x7ffffff0 ;				\
	.popsection

puts a recognizable value (handler + offset a bit under 2G) into ->fixup and in fixup_exception() we had

	if (fixup->fixup - fixup->insn >= 0x7ffffff0 - 4) {
		/* Special hack for uaccess_err */
		current_thread_info()->uaccess_err = 1;
		new_ip -= 0x7ffffff0;
	}

checking that the value in ->fixup is just below 2G from the faulting instruction. So _ASM_EXTABLE_EX relied upon the handler being very close to the faulting insn, and worked only because all of its uses had been "set the ->uaccess_err and continue immediately past the faulting insn".

When the kludge in fixup_exception() was eliminated (check what it and _ASM_EXTABLE_EX do these days), this restriction disappeared, so the mainline commit had no problems. The backport to 4.1 ran afoul of that restriction, with the results you've observed - this "handler + constant offset" had _not_ been recognized as that magic and had been interpreted as the handler being about 2G away from its actual location. That's where the bogus RIP in your oopsen has come from.
Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()
On Fri, Oct 28, 2016 at 9:40 AM, Joe Korty wrote:
>
> Backporting 548acf19234d to 4.1.35 does indeed fix the
> issue. However, it is not clear to me _why_ it works,
> so it might be better that someone else push the backport
> to stable.

The problem is that the old _ASM_EXTABLE_EX hackery ends up being this code in fixup_exception() back in 4.1 (and later).

	if (fixup->fixup - fixup->insn >= 0x7ffffff0 - 4) {
		/* Special hack for uaccess_err */
		current_thread_info()->uaccess_err = 1;
		new_ip -= 0x7ffffff0;
	}

and it really does depend very intimately on the relationship of the "fixup" address (fixup->fixup) with the instruction that took the fault (fixup->insn).

Now, back in the original 4.1 days, that fixup-vs-insn relationship was trivially always the case, since __get_user_asm_ex() always just made the fixup be to fall through to the next instruction.

However, when commit 1c109fabbd51 ("fix minor infoleak in get_user_ex()") was backported, now the fixup for __get_user_asm_ex() ends up being in a different section entirely (".section .fixup"), and the close relationship between the faulting instruction and the fixup instruction went away.

End result: commit 1c109fabbd51 really effectively and very subtly depends on commit 548acf19234d (introduced in v4.6) that gets rid of the special hack.

Adding "stable" to the cc, because this might well affect other stable backports than 4.1.

End result: either commit 1c109fabbd51 shouldn't be backported (it's really not that important - if people properly check the exception error results it shouldn't matter), or you need to also backport 548acf19234d as Al suggested.

I'd be inclined to say "don't backport 1c109fabbd51", but it's really a judgment call.

Linus
[4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()
On Fri, Oct 28, 2016 at 01:03:55AM +0100, Al Viro wrote: > On Thu, Oct 27, 2016 at 03:32:10PM -0400, Joe Korty wrote: [oops in 4.1.35, bisected to 319fe1151940] > > The following test program can be used to trigger the problem: > > > > /* gcc -m32 c.c -o c */ > > #define _GNU_SOURCE > > #include > > #include > > #include > > #include > > #include > > > > #define rt_sigqueueinfo 178 > > > > int main(int argc, char **argv) { > > int stat = syscall(rt_sigqueueinfo, 0, 0, 0, 0, 0, 0); > > printf("syscall(%d): stat: %d, errno: %d\n", > >rt_sigqueueinfo, stat, errno); > > return 0; > > } > > > > This is under 4.1.35 on x86_64. > > AFAICS, it steps on _ASM_EXTABLE_EX being more brittle in 4.1 - it pretty > much has to have the handler on the next insn after the faulting one, or > the resulting extable entry won't be recognized. This > "x86/mm: Expand the exception table logic to allow new handling options" > in mainline is where that requirement has disappeared. I think we > ought to use the plain _ASM_EXTABLE and just call something that would > set current_thread_info()->uaccess_err directly from the fixup code there. > That, or backport the commit switching to less brittle extables. ... and frankly, backporting 548acf19234d would be my preference. It's a bit more intrusive than needed (_ASM_EXTABLE_FAULT is used only in memcpy_mcsafe(), which is used only by pmem and it's the only reason for passing the trap number to fixup_exception()), but AFAICS it's fairly safe. Objections?
Re: BUGreport: fix minor infoleak in get_user_ex()
On Thu, Oct 27, 2016 at 03:32:10PM -0400, Joe Korty wrote: > Hi Al, > I don't know if this is worth fixing or not, but I thought > I would mention it in case it was. > > A git bisect search shows that the commit: > > commit 319fe11519401e8a5db191a0a93aa2c1d7bb59f4 > Author: Al Viro > Date: Thu Sep 15 02:35:29 2016 +0100 > > causes some malformed rt_sigqueueinfo syscalls, executed under > x86_64 kernels running compat mode programs, to oops with > the following message: > > [ 66.054786] BUG: unable to handle kernel paging request at 020573eb > [ 66.061793] IP: [<020573eb>] 0x20573eb > [ 66.066251] PGD 122263067 PUD 120a0c067 PMD 0 > [ 66.070745] Oops: 0010 [#1] PREEMPT SMP > [ 66.074717] Modules linked in: > [ 66.077789] CPU: 7 PID: 5496 Comm: cc Not tainted 4.1.35 #1 > [ 66.083365] Hardware name: Supermicro H8DM8-2/H8DM8-2, BIOS 080014 > 10/22/2009 > [ 66.090582] task: 88006b044400 ti: 88006b30 task.ti: > 88006b30 > [ 66.098067] RIP: 0010:[<020573eb>] [<020573eb>] 0x20573eb > [ 66.104961] RSP: 0018:88006b303e98 EFLAGS: 00010246 > [ 66.110269] RAX: 7fffef80 RBX: RCX: > > [ 66.117399] RDX: 88006b304000 RSI: RDI: > 88006b303ea8 > [ 66.124528] RBP: 88006b303e98 R08: R09: > > [ 66.131667] R10: R11: 0246 R12: > > [ 66.138805] R13: 88006b303ea8 R14: R15: > > [ 66.145935] FS: 77fca740() GS:880127d8(0063) > knlGS:f7df06c0 > [ 66.154027] CS: 0010 DS: 002b ES: 002b CR0: 8005003b > [ 66.159777] CR2: 020573eb CR3: 00011f874000 CR4: > 06e0 > [ 66.166906] Stack: > [ 66.168919] 88006b303f48 8107c4b5 > > [ 66.176403] > > [ 66.183872] > > [ 66.191341] Call Trace: > [ 66.193802] [] compat_SyS_rt_sigqueueinfo+0x45/0x70 > [ 66.200340] [] cstar_dispatch+0x7/0x2a > [ 66.205755] Code: Bad RIP value. 
> [ 66.209103] RIP [<020573eb>] 0x20573eb > [ 66.213648] RSP > [ 66.217134] CR2: 020573eb > [ 66.220505] ---[ end trace 4f88266d7fd7e6d7 ]--- > The following test program can be used to trigger the problem: > > /* gcc -m32 c.c -o c */ > #define _GNU_SOURCE > #include > #include > #include > #include > #include > > #define rt_sigqueueinfo 178 > > int main(int argc, char **argv) { > int stat = syscall(rt_sigqueueinfo, 0, 0, 0, 0, 0, 0); > printf("syscall(%d): stat: %d, errno: %d\n", >rt_sigqueueinfo, stat, errno); > return 0; > } > > This is under 4.1.35 on x86_64. AFAICS, it steps on _ASM_EXTABLE_EX being more brittle in 4.1 - it pretty much has to have the handler on the next insn after the faulting one, or the resulting extable entry won't be recognized. This "x86/mm: Expand the exception table logic to allow new handling options" in mainline is where that requirement has disappeared. I think we ought to use the plain _ASM_EXTABLE and just call something that would set current_thread_info()->uaccess_err directly from the fixup code there. That, or backport the commit switching to less brittle extables.
Re: Official bugreport 4.1 kernel (audio gadget and ChipIdea)
On Tuesday, June 30, 2015 at 04:23:01 AM, Peter Chen wrote: > On Fri, Jun 26, 2015 at 07:15:18PM +0200, Sébastien Pruvost wrote: > > Hello, > > > > I'm sending this mail to report a bug concerning the latest kernel 4.1. > > > > Here is the problem (and the test I've done): > > I have firstly used the 3.10.53 kernel for my two > > sabrelites in > > > > order to use the audio gadget driver with the Dual Role ChipIdea > > Controller (in order to switch roles between my two IMX6 sabreLite). > > After loading g_audio in my two sabreLite and plugging the cable (microA > > – microB), there is an error “ci_hdrc.0 request length too big for > > isochronous snd_uac2.0 1116 Error”. > > And even after running aplay command, I still got this error and there is > > no sound getting out of the jack port. > > I've switched roles between the two boards by following this: https:// > > www.kernel.org/doc/Documentation/usb/chipidea.txt. > > This works fine with the serial driver, I can see a new serial interface > > (host side) and after switching role a new serial interfaces at device > > side. Same thing for ethernet gadget: this works fine too. But not with > > the audio gadget. In fact, there is a new audio interface at host side > > but I can not interact with it (even alsamixer doesn’t see any controls > > on this new sound card). I’ve tested that audio gadget works fine if I > > don’t use ChipIdea HighSpeed Dual Role Controller. > > > > Secondly I have tested this audio gadget with the latest > > Kernel > > > > 4.1 for my two IMX6 sabrelites (imx_v6_v7_defconfig). Now these previous > > errors are gone but there are still no sound getting out of the jack > > port (even if there are a new sound card in host side) > > It is may not a role switch problem, please check if the g_audio can > work well with an ubuntu PC (make sure your codec works well). ci_hdrc.0 request length too big for isochronous Doesn't this just mean it cannot transfer such a long buffer via ISO pipe ? 
I guess the UAC should send smaller buffers to work with the CI HDRC? Best regards, Marek Vasut -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Official bugreport 4.1 kernel (audio gadget and ChipIdea)
On Fri, Jun 26, 2015 at 07:15:18PM +0200, Sébastien Pruvost wrote: > Hello, > > I'm sending this mail to report a bug concerning the latest kernel 4.1. > > Here is the problem (and the test I've done): > > I have firstly used the 3.10.53 kernel for my two sabrelites > in > order to use the audio gadget driver with the Dual Role ChipIdea Controller > (in > order to switch roles between my two IMX6 sabreLite). > After loading g_audio in my two sabreLite and plugging the cable (microA – > microB), there is an error “ci_hdrc.0 request length too big for isochronous > snd_uac2.0 1116 Error”. > And even after running aplay command, I still got this error and there is no > sound getting out of the jack port. > I've switched roles between the two boards by following this: https:// > www.kernel.org/doc/Documentation/usb/chipidea.txt. > This works fine with the serial driver, I can see a new serial interface (host > side) and after switching role a new serial interfaces at device side. Same > thing for ethernet gadget: this works fine too. But not with the audio gadget. > In fact, there is a new audio interface at host side but I can not interact > with it (even alsamixer doesn’t see any controls on this new sound card). I’ve > tested that audio gadget works fine if I don’t use ChipIdea HighSpeed Dual > Role > Controller. > > > > Secondly I have tested this audio gadget with the latest > Kernel > 4.1 for my two IMX6 sabrelites (imx_v6_v7_defconfig). Now these previous > errors > are gone but there are still no sound getting out of the jack port (even if > there are a new sound card in host side) > It is may not a role switch problem, please check if the g_audio can work well with an ubuntu PC (make sure your codec works well). > > I think this needs a patch to fix that. > Best regards > > Sébastien Pruvost. 
> -- Best Regards, Peter Chen
Re: [Kgdb-bugreport] [RFC v5 - RESEND] debug: prevent entering debug mode on panic/exception.
On 28/01/15 10:39, Kiran Raparthy wrote: > From: Colin Cross > > debug: prevent entering debug mode on panic/exception. > > On non-developer devices, kgdb prevents the device from rebooting > after a panic. > > Incase of panics and exceptions, to allow the device to reboot, prevent > entering > debug mode to avoid getting stuck waiting for the user to interact with > debugger. > > To avoid entering the debugger on panic/exception without any extra > configuration, > panic_timeout is being used which can be set via /proc/sys/kernel/panic at > run time > and CONFIG_PANIC_TIMEOUT sets the default value. > > Setting panic_timeout indicates that the user requested machine to perform > unattended reboot after panic. We dont want to get stuck waiting for the user > input incase of panic. Some kind of changelog between the versions would have been nice. I *think* the difference between v4 and v5 was just the addition paragraph above but I had to put in extra work to check that and I'm still not 100% sure that's the only change. Also you could start billing this as a PATCH rather than an RFC. Daniel. > Cc: Jason Wessel > Cc: Andrew Morton > Cc: kgdb-bugrep...@lists.sourceforge.net > Cc: linux-kernel@vger.kernel.org > Cc: Android Kernel Team > Cc: John Stultz > Cc: Sumit Semwal > Signed-off-by: Colin Cross > [Kiran: Added context to commit message. 
> panic_timeout is used instead of break_on_panic and
> break_on_exception to honor CONFIG_PANIC_TIMEOUT
> Modified the commit as per community feedback]
> Signed-off-by: Kiran Raparthy
> Reviewed-by: Daniel Thompson
> ---
>  kernel/debug/debug_core.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 1adf62b..0012a1f 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -689,6 +689,14 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
>
>  	if (arch_kgdb_ops.enable_nmi)
>  		arch_kgdb_ops.enable_nmi(0);
> +	/*
> +	 * Avoid entering the debugger if we were triggered due to an oops
> +	 * but panic_timeout indicates the system should automatically
> +	 * reboot on panic. We don't want to get stuck waiting for input
> +	 * on such systems, especially if its "just" an oops.
> +	 */
> +	if (signo != SIGTRAP && panic_timeout)
> +		return 1;
>
>  	memset(ks, 0, sizeof(struct kgdb_state));
>  	ks->cpu = raw_smp_processor_id();
> @@ -821,6 +829,15 @@ static int kgdb_panic_event(struct notifier_block *self,
>  			    unsigned long val,
>  			    void *data)
>  {
> +	/*
> +	 * Avoid entering the debugger if we were triggered due to a panic
> +	 * We don't want to get stuck waiting for input from user in such case.
> +	 * panic_timeout indicates the system should automatically
> +	 * reboot on panic.
> +	 */
> +	if (panic_timeout)
> +		return NOTIFY_DONE;
> +
>  	if (dbg_kdb_mode)
>  		kdb_printf("PANIC: %s\n", (char *)data);
>  	kgdb_breakpoint();
>
Re: [Kgdb-bugreport] [RFC v5 - RESEND] debug: prevent entering debug mode on panic/exception.
On 28 January 2015 at 16:25, Daniel Thompson wrote: > On 28/01/15 10:39, Kiran Raparthy wrote: >> From: Colin Cross >> >> debug: prevent entering debug mode on panic/exception. >> >> On non-developer devices, kgdb prevents the device from rebooting >> after a panic. >> >> Incase of panics and exceptions, to allow the device to reboot, prevent >> entering >> debug mode to avoid getting stuck waiting for the user to interact with >> debugger. >> >> To avoid entering the debugger on panic/exception without any extra >> configuration, >> panic_timeout is being used which can be set via /proc/sys/kernel/panic at >> run time >> and CONFIG_PANIC_TIMEOUT sets the default value. >> >> Setting panic_timeout indicates that the user requested machine to perform >> unattended reboot after panic. We dont want to get stuck waiting for the user >> input incase of panic. > > Some kind of changelog between the versions would have been nice. I > *think* the difference between v4 and v5 was just the addition paragraph > above but I had to put in extra work to check that and I'm still not > 100% sure that's the only change. Since the change was related to only commit message on all the earlier patches,i didn't update the history. Anyways i'll ensure to include the history going forward. > > Also you could start billing this as a PATCH rather than an RFC. Since i didn't get any acknowledgement,i didn't upgrade it to PATCH. Anyways,i'll update and resend the patch,Thanks for the inputs. > > > Daniel. > > >> Cc: Jason Wessel >> Cc: Andrew Morton >> Cc: kgdb-bugrep...@lists.sourceforge.net >> Cc: linux-kernel@vger.kernel.org >> Cc: Android Kernel Team >> Cc: John Stultz >> Cc: Sumit Semwal >> Signed-off-by: Colin Cross >> [Kiran: Added context to commit message. 
>> panic_timeout is used instead of break_on_panic and >> break_on_exception to honor CONFIG_PANIC_TIMEOUT >> Modified the commit as per community feedback] >> Signed-off-by: Kiran Raparthy >> Reviewed-by: Daniel Thompson >> --- >> kernel/debug/debug_core.c | 17 + >> 1 file changed, 17 insertions(+) >> >> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c >> index 1adf62b..0012a1f 100644 >> --- a/kernel/debug/debug_core.c >> +++ b/kernel/debug/debug_core.c >> @@ -689,6 +689,14 @@ kgdb_handle_exception(int evector, int signo, int >> ecode, struct pt_regs *regs) >> >> if (arch_kgdb_ops.enable_nmi) >> arch_kgdb_ops.enable_nmi(0); >> + /* >> + * Avoid entering the debugger if we were triggered due to an oops >> + * but panic_timeout indicates the system should automatically >> + * reboot on panic. We don't want to get stuck waiting for input >> + * on such systems, especially if its "just" an oops. >> + */ >> + if (signo != SIGTRAP && panic_timeout) >> + return 1; >> >> memset(ks, 0, sizeof(struct kgdb_state)); >> ks->cpu = raw_smp_processor_id(); >> @@ -821,6 +829,15 @@ static int kgdb_panic_event(struct notifier_block *self, >> unsigned long val, >> void *data) >> { >> + /* >> + * Avoid entering the debugger if we were triggered due to a panic >> + * We don't want to get stuck waiting for input from user in such case. >> + * panic_timeout indicates the system should automatically >> + * reboot on panic. >> + */ >> + if (panic_timeout) >> + return NOTIFY_DONE; >> + >> if (dbg_kdb_mode) >> kdb_printf("PANIC: %s\n", (char *)data); >> kgdb_breakpoint(); >> > Regards, Kiran -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt
Hi Thomas, Thanks for your reply! >> nf_defrag_ipv4 xt_state nf_conntrack usr_cache(O) acpi_cpufreq mperf >> processor thermal_sys sg hwmon iptable_filter ip_tables x_tables ixgbe(O) >> igb(O) bonding(O) tg(O) netmgmt(O) drvinstall(PO) dal(PO) dca usb_storage(O) >> uhci_hcd ehci_hcd usbcore(O) usb_common sata_mml(O) ahci(O) libata(O) >> satahp(O) drvframe(PO) sd_mod crc_t10dif scsi_mod agetty_interface(PO) ext3 >> jbd mbcache nvram_printk(PO) os_feeddog(PO) os_debug(O) osp_proc(PO) >> os_rnvramdev(PO) vos(O) bsp(PO) os_die_handler(O) os_oom_handler(O) >> os_panic_handler(O) biosnvramdriver(O) kbox(O) >> [2012-03-26 18:55:43][ 929.252460] Pid: 17495, comm: 3th SioT Tainted: P >>O 3.4.24.15-0.11-default #1 > > You have loaded a gazillion of proprietary and out of tree modules and > your kernel is tainted 'P'. > > None of our problems. See: > > http://lwn.net/1999/0211/a/lt-binary.html > > https://lwn.net/Articles/287056/ > > I'm in a good mood today and give you some hints: > > - Ingos patch is correct and always has been for RT. > > - We had not a single bug report against this in almost 10 years. > > - File your bugs to those who abuse our work and violate our license. Actually, we applied the RT stable patch above the Greg 3.4 stable tree. No other changes in schedule filed. I will try to find out what codes cause this issue. Thanks! Yijing. > > Case closed. > > tglx > > . > -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt
>> Because this patch does not exist in the latest Linus kernel, so I
>> have not reported this issue to kernel bugzilla.
>
> This patch exists in all -RT releases up to 3.12. If there is an issue
> with it, it should be solved.
>
> If the sched bit is set and you can't get the lock later, then the tasklet
> has to be active. Finally, not getting the lock in the tasklet code
> itself means it is still occupied by the "add-to-the-list" part which
> actually can't happen according to the code.
> You said that you have an eight-way. Is this also NUMA? If so, does
> this problem happen if you disable NUMA (i.e. run only one NUMA node
> and use only the memory that is directly attached to the node)?

Hi Sebastian,

The target platform is not NUMA, there is only one node.

Thanks!
Yijing.

--
Thanks!
Yijing
Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt
On Mon, 3 Mar 2014, Yijing Wang wrote: > Hi list, >I found a tasklet related issue in linux-stable-rt 3.4. > > And after I revert following commit, the test result seems ok(test lasted > 40hours). > > commit 0d9f73fc1e7270a3f8709c59c913408153d9d9f8 This commit id does not exist in the official stable rt tree. > Author: Ingo Molnar > Date: Tue Nov 29 20:18:22 2011 -0500 > > tasklet: Prevent tasklets from going into infinite spin in RT > > I test FC driver IO in this kernel, and after a few hours test, FC IO will > abort, I found a lot of tasklet WARNING Call Trace in kernel message,like: > > [2012-03-26 18:55:43][ 929.252289] [ cut here ] > [2012-03-26 18:55:43][ 929.252312] WARNING: at kernel/softirq.c:773 > __tasklet_action+0x51/0x1a0() There is no warning at line 773 in any official linux-stable-rt 3.4. > [2012-03-26 18:55:43][ 929.252314] Hardware name: Romley > [2012-03-26 18:55:43][ 929.252316] Modules linked in: isd_fid(O) ivs_edft(O) > ivs_emp(O) ivs_xnet(O) isd_rds(O) isd_idm(O) isd_dft(O) isd_base(O) sdm(O) > isd_cmm(O) isd_ibc(O) isd_lib(O) xve_hab(PO) xve_net(PO) > xve_cls_msg_filter(PO) xve_dscp(PO) pagepool(PO) iod(O) cmm(PO) util(PO) > intel_t10(PO) itest_nid(PO) dmi(PO) bsp_adapter(PO) mpa(O) ipmi_si > ipmi_devintf ipmi_msghandler iscsi_sw(PO) iscsi_prot(O) iscsi_seg(PO) > iscsi_comm(PO) iscsi_initiator(PO) 8192cu(O) pciehp(PO) pcieaer(PO) > pciecore(PO) drvinstallthird(PO) quark(O) sal(O) pmsas(O) foe(O) lfcoe(O) > libfc(O) ib_uverbs(O) ibtgt(O) ib_srpt(O) ib_cm(O) ib_sa(O) mlx4_ib(O) > ib_umad(O) ib_mad(O) mlx4_core(O) ib_core(O) drvtom(O) cxgb4(O) drvtoecore(O) > fcdrv(PO) unflowlevel(PO) unfcommon(O) drvmml(PO) scsi_transport_fc scsi_tgt > memtest(PO) drv_iosubsys_ini(O) iocount(O) bsp_mml(PO) agetty_query(PO) > cpufreq_powersave af_packet nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter > ip6_tables xt_limit xt_tcpudp xt_multiport nf_conntr! 
> ack_ipv4 > nf_defrag_ipv4 xt_state nf_conntrack usr_cache(O) acpi_cpufreq mperf > processor thermal_sys sg hwmon iptable_filter ip_tables x_tables ixgbe(O) > igb(O) bonding(O) tg(O) netmgmt(O) drvinstall(PO) dal(PO) dca usb_storage(O) > uhci_hcd ehci_hcd usbcore(O) usb_common sata_mml(O) ahci(O) libata(O) > satahp(O) drvframe(PO) sd_mod crc_t10dif scsi_mod agetty_interface(PO) ext3 > jbd mbcache nvram_printk(PO) os_feeddog(PO) os_debug(O) osp_proc(PO) > os_rnvramdev(PO) vos(O) bsp(PO) os_die_handler(O) os_oom_handler(O) > os_panic_handler(O) biosnvramdriver(O) kbox(O) > [2012-03-26 18:55:43][ 929.252460] Pid: 17495, comm: 3th SioT Tainted: P > O 3.4.24.15-0.11-default #1 You have loaded a gazillion of proprietary and out of tree modules and your kernel is tainted 'P'. None of our problems. See: http://lwn.net/1999/0211/a/lt-binary.html https://lwn.net/Articles/287056/ I'm in a good mood today and give you some hints: - Ingos patch is correct and always has been for RT. - We had not a single bug report against this in almost 10 years. - File your bugs to those who abuse our work and violate our license. Case closed. tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt
On 03/29/2014 07:35 AM, Yijing Wang wrote:
> Hi Sebastian,

Hi Yijing,

> Thanks for your reply and help to look at it, thanks!
>
> I also check the tasklet state machine changes, and didn't find
> clue for this issue. So I Temporarily reverted Ingo's patch, without
> this patch, my test is ok.
>
> Because this patch does not exist in the latest Linus kernel, so I
> have not reported this issue to kernel bugzilla.

This patch exists in all -RT releases up to 3.12. If there is an issue with it, it should be solved.

If the sched bit is set and you can't get the lock later, then the tasklet has to be active. Finally, not getting the lock in the tasklet code itself means it is still occupied by the "add-to-the-list" part which actually can't happen according to the code.
You said that you have an eight-way. Is this also NUMA? If so, does this problem happen if you disable NUMA (i.e. run only one NUMA node and use only the memory that is directly attached to the node)?

> Finally, I would like to thank you again.
>
> Thanks!
> Yijing.

Sebastian
Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt
Hi Sebastian, Thanks for your reply and help to look at it, thanks! I also check the tasklet state machine changes, and didn't find clue for this issue. So I Temporarily reverted Ingo's patch, without this patch, my test is ok. commit 0d9f73fc1e7270a3f8709c59c913408153d9d9f8 Author: Ingo Molnar Date: Tue Nov 29 20:18:22 2011 -0500 tasklet: Prevent tasklets from going into infinite spin in RT Because this patch does not exist in the latest Linus kernel, so I have not reported this issue to kernel bugzilla. Finally, I would like to thank you again. Thanks! Yijing. On 2014/3/29 0:37, Sebastian Andrzej Siewior wrote: > * Yijing Wang | 2014-03-03 17:24:39 [+0800]: > >> [2012-03-26 18:55:43][ 929.252312] WARNING: at kernel/softirq.c:773 >> __tasklet_action+0x51/0x1a0() >> [2012-03-27 03:41:06][ 3647.886005] WARNING: at kernel/softirq.c:773 >> __tasklet_action+0x51/0x1a0() >> [2012-03-27 03:42:04][ 3705.434418] WARNING: at kernel/softirq.c:799 >> __tasklet_action+0xae/0x1a0() > >> FC card hardware ---> FC driver interrupt handler >> ->tasklet_schedule(fc driver tasklet) --->tasklet running, call >> function process FC IO data. >>here will disable FC card interrupt >> here will enable FC card interrupt again > > This looks okay. > >> We found the tasklet state is 0x1(mean state is TASKLET_STATE_SCHED),count >> is 0, before we call tasklet_schedule(). >> So the new tasklet can not add to CPU list. >> >> And I also add some dynamic debug in __tasklet_action(); after the issue >> occur, I open the dynamic debug. >> After we force the hardware reset to interrupt OS, we never found the FC >> driver tasklet running in dmesg(I identify the tasklet by its data). >> I guess the FC tasklet is not in CPU global tasklet list. > You guess correct. > >> I hope somebody can help to look at it. If I missing something, let me know. > > The tasklet is always added to the local cpu, never cross. That list is > always accessed with interrupts off. 
> With TASKLET_STATE_SCHED set, the next step is to add the task let to > the CPU's tasklet list. This isn't done if TASKLET_STATE_RUN is already > set which means __tasklet_action() is already busy serving the tasklet. > In that case it clears TASKLET_STATE_SCHED and invokes the tasklet > again. > After looking at it for a while I must say I have no idea how you > managed to keep TASKLET_STATE_SCHED set. Further, each time > TASKLET_STATE_RUN is cleared it is always with a cmpxchg() down to zero > which means TASKLET_STATE_SCHED is removed earlier. > That said, triggerring the warning at 773 is the first thing that went > wrong. After it has been added to the list, the TASKLET_STATE_RUN is > cleared again. I have no idea how it managed to remain still on except > that __tasklet_common_schedule() is invoked which is protected by the > SCHED bit… > >> Thanks! >> Yijing. > > Sebastian > > . > -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt
* Yijing Wang | 2014-03-03 17:24:39 [+0800]:

>[2012-03-26 18:55:43][ 929.252312] WARNING: at kernel/softirq.c:773 __tasklet_action+0x51/0x1a0()
>[2012-03-27 03:41:06][ 3647.886005] WARNING: at kernel/softirq.c:773 __tasklet_action+0x51/0x1a0()
>[2012-03-27 03:42:04][ 3705.434418] WARNING: at kernel/softirq.c:799 __tasklet_action+0xae/0x1a0()

>FC card hardware ---> FC driver interrupt handler ->tasklet_schedule(fc driver tasklet) --->tasklet running, call function process FC IO data.
>here will disable FC card interrupt
> here will enable FC card interrupt again

This looks okay.

>We found the tasklet state is 0x1(mean state is TASKLET_STATE_SCHED),count is
>0, before we call tasklet_schedule().
>So the new tasklet can not add to CPU list.
>
>And I also add some dynamic debug in __tasklet_action(); after the issue
>occur, I open the dynamic debug.
>After we force the hardware reset to interrupt OS, we never found the FC
>driver tasklet running in dmesg(I identify the tasklet by its data).
>I guess the FC tasklet is not in CPU global tasklet list.

Your guess is correct.

>I hope somebody can help to look at it. If I missing something, let me know.

The tasklet is always added to the local CPU's list, never to another CPU's. That list is always accessed with interrupts off. With TASKLET_STATE_SCHED set, the next step is to add the tasklet to the CPU's tasklet list. This isn't done if TASKLET_STATE_RUN is already set, which means __tasklet_action() is already busy serving the tasklet. In that case it clears TASKLET_STATE_SCHED and invokes the tasklet again.

After looking at it for a while I must say I have no idea how you managed to keep TASKLET_STATE_SCHED set. Further, each time TASKLET_STATE_RUN is cleared, it is always with a cmpxchg() down to zero, which means TASKLET_STATE_SCHED is removed earlier.

That said, triggering the warning at 773 is the first thing that went wrong. After the tasklet has been added to the list, TASKLET_STATE_RUN is cleared again. I have no idea how it managed to remain set, except that __tasklet_common_schedule() is invoked, which is protected by the SCHED bit…

>Thanks!
>Yijing.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
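The reported symptom (state 0x1, tasklet never runs again) can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: the `Tasklet` and `Cpu` classes are hypothetical, the scheduling rule mirrors the mainline `tasklet_schedule()` test-and-set on the SCHED bit, and none of the extra RUN/cmpxchg handling from the -rt patch discussed above is modelled.

```python
# Bit numbers as in include/linux/interrupt.h: SCHED is bit 0, RUN is bit 1,
# so a state value of 0x1 means "SCHED set, RUN clear".
TASKLET_STATE_SCHED = 0
TASKLET_STATE_RUN = 1


class Tasklet:
    """Toy stand-in for struct tasklet_struct (hypothetical, userspace only)."""

    def __init__(self, name):
        self.name = name
        self.state = 0

    def test_and_set(self, bit):
        """Set a bit and report whether it was already set."""
        was_set = bool(self.state & (1 << bit))
        self.state |= 1 << bit
        return was_set

    def clear(self, bit):
        self.state &= ~(1 << bit)


class Cpu:
    """Toy per-CPU tasklet list."""

    def __init__(self):
        self.pending = []

    def tasklet_schedule(self, t):
        # Only the caller that flips SCHED from 0 to 1 queues the tasklet.
        if t.test_and_set(TASKLET_STATE_SCHED):
            return False
        self.pending.append(t)
        return True

    def tasklet_action(self):
        ran = []
        work, self.pending = self.pending, []
        for t in work:
            t.test_and_set(TASKLET_STATE_RUN)
            t.clear(TASKLET_STATE_SCHED)
            ran.append(t.name)  # stands in for calling the tasklet function
            t.clear(TASKLET_STATE_RUN)
        return ran


cpu = Cpu()
fc = Tasklet("fc")

first = cpu.tasklet_schedule(fc)   # normal case: queued
ran = cpu.tasklet_action()         # the tasklet runs once

# Reproduce the reported state: SCHED is set but the tasklet is on no list.
fc.state = 1 << TASKLET_STATE_SCHED
stuck = cpu.tasklet_schedule(fc)   # refused, SCHED is already set
ran_after = cpu.tasklet_action()   # nothing left to run
```

In this model, once SCHED is set without a matching list entry, every later `tasklet_schedule()` call is a no-op, which matches the observation that the FC tasklet never shows up again.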
[BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt
Hi list, I found a tasklet related issue in linux-stable-rt 3.4. And after I revert following commit, the test result seems ok(test lasted 40hours). commit 0d9f73fc1e7270a3f8709c59c913408153d9d9f8 Author: Ingo Molnar Date: Tue Nov 29 20:18:22 2011 -0500 tasklet: Prevent tasklets from going into infinite spin in RT I test FC driver IO in this kernel, and after a few hours test, FC IO will abort, I found a lot of tasklet WARNING Call Trace in kernel message,like: [2012-03-26 18:55:43][ 929.252289] [ cut here ] [2012-03-26 18:55:43][ 929.252312] WARNING: at kernel/softirq.c:773 __tasklet_action+0x51/0x1a0() [2012-03-26 18:55:43][ 929.252314] Hardware name: Romley [2012-03-26 18:55:43][ 929.252316] Modules linked in: isd_fid(O) ivs_edft(O) ivs_emp(O) ivs_xnet(O) isd_rds(O) isd_idm(O) isd_dft(O) isd_base(O) sdm(O) isd_cmm(O) isd_ibc(O) isd_lib(O) xve_hab(PO) xve_net(PO) xve_cls_msg_filter(PO) xve_dscp(PO) pagepool(PO) iod(O) cmm(PO) util(PO) intel_t10(PO) itest_nid(PO) dmi(PO) bsp_adapter(PO) mpa(O) ipmi_si ipmi_devintf ipmi_msghandler iscsi_sw(PO) iscsi_prot(O) iscsi_seg(PO) iscsi_comm(PO) iscsi_initiator(PO) 8192cu(O) pciehp(PO) pcieaer(PO) pciecore(PO) drvinstallthird(PO) quark(O) sal(O) pmsas(O) foe(O) lfcoe(O) libfc(O) ib_uverbs(O) ibtgt(O) ib_srpt(O) ib_cm(O) ib_sa(O) mlx4_ib(O) ib_umad(O) ib_mad(O) mlx4_core(O) ib_core(O) drvtom(O) cxgb4(O) drvtoecore(O) fcdrv(PO) unflowlevel(PO) unfcommon(O) drvmml(PO) scsi_transport_fc scsi_tgt memtest(PO) drv_iosubsys_ini(O) iocount(O) bsp_mml(PO) agetty_query(PO) cpufreq_powersave af_packet nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_limit xt_tcpudp xt_multiport nf_conntr! 
ack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack usr_cache(O) acpi_cpufreq mperf processor thermal_sys sg hwmon iptable_filter ip_tables x_tables ixgbe(O) igb(O) bonding(O) tg(O) netmgmt(O) drvinstall(PO) dal(PO) dca usb_storage(O) uhci_hcd ehci_hcd usbcore(O) usb_common sata_mml(O) ahci(O) libata(O) satahp(O) drvframe(PO) sd_mod crc_t10dif scsi_mod agetty_interface(PO) ext3 jbd mbcache nvram_printk(PO) os_feeddog(PO) os_debug(O) osp_proc(PO) os_rnvramdev(PO) vos(O) bsp(PO) os_die_handler(O) os_oom_handler(O) os_panic_handler(O) biosnvramdriver(O) kbox(O) [2012-03-26 18:55:43][ 929.252460] Pid: 17495, comm: 3th SioT Tainted: P O 3.4.24.15-0.11-default #1 [2012-03-26 18:55:43][ 929.252463] Call Trace: [2012-03-26 18:55:43][ 929.252465][] ? __tasklet_action+0x51/0x1a0 [2012-03-26 18:55:43][ 929.252481] [] warn_slowpath_common+0x7a/0xb0 [2012-03-26 18:55:43][ 929.252486] [] warn_slowpath_null+0x15/0x20 [2012-03-26 18:55:43][ 929.252490] [] __tasklet_action+0x51/0x1a0 [2012-03-26 18:55:43][ 929.252494] [] tasklet_action+0x59/0x60 [2012-03-26 18:55:43][ 929.252498] [] handle_pending_softirqs+0xb0/0x170 [2012-03-26 18:55:43][ 929.252502] [] __do_softirq+0x49/0xa0 [2012-03-26 18:55:43][ 929.252513] [] call_softirq+0x1c/0x30 [2012-03-26 18:55:43][ 929.252519] [] do_softirq+0x65/0xa0 [2012-03-26 18:55:43][ 929.252523] [] irq_exit+0xc5/0xe0 [2012-03-26 18:55:43][ 929.252526] [] do_IRQ+0x64/0xe0 [2012-03-26 18:55:43][ 929.252534] [] common_interrupt+0x6a/0x6a [2012-03-26 18:55:43][ 929.252536][] ? 
_raw_spin_unlock_irqrestore+0x16/0x30 [2012-03-26 18:55:43][ 929.252565] [] SDM_SDGetDisk+0x64/0x100 [sdm] [2012-03-26 18:55:43][ 929.252575] [] SDM_FRAMESdGetDisk+0x17/0x80 [sdm] [2012-03-26 18:55:43][ 929.252585] [] SDM_ERRAddTimer+0x33/0x370 [sdm] [2012-03-26 18:55:43][ 929.252594] [] SDM_FRAMEErrAddTimer+0x17/0x80 [sdm] [2012-03-26 18:55:43][ 929.252604] [] SDM_SIOQueueReqProcess+0x67/0x7d0 [sdm] [2012-03-26 18:55:43][ 929.252612] [] SDM_SIOQueueThread+0x142/0x310 [sdm] [2012-03-26 18:55:43][ 929.252618] [] kernel_thread_helper+0x4/0x10 [2012-03-26 18:55:43][ 929.252627] [] ? SDM_SIOQueueReqProcess+0x7d0/0x7d0 [sdm] [2012-03-26 18:55:43][ 929.252632] [] ? gs_change+0x13/0x13 [2012-03-26 18:55:43][ 929.252635] ---[ end trace a82addcbe6cbf131 ]--- ...[snip]. [2012-03-27 03:41:06][ 3647.885973] [ cut here ] [2012-03-27 03:41:06][ 3647.886005] WARNING: at kernel/softirq.c:773 __tasklet_action+0x51/0x1a0() [2012-03-27 03:41:06][ 3647.886010] Hardware name: Romley [2012-03-27 03:41:06][ 3647.886012] Modules linked in: isd_fid(O) ivs_edft(O) ivs_emp(O) ivs_xnet(O) isd_rds(O) isd_idm(O) isd_dft(O) isd_base(O) sdm(O) isd_cmm(O) isd_ibc(O) isd_lib(O) xve_hab(PO) xve_net(PO) xve_cls_msg_filter(PO) xve_dscp(PO) pagepool(PO) iod(O) cmm(PO) util(PO) intel_t10(PO) itest_nid(PO) dmi(PO) bsp_adapter(PO) mpa(O) ipmi_si ipmi_devintf ipmi_msghandler iscsi_sw(PO) iscsi_prot(O) iscsi_seg(PO) iscsi_comm(PO) iscsi_initiator(PO) 8192cu(O) pciehp(PO) pcieaer(PO) pciecore(PO) drvinstallthird(PO) quark(O) sal(O) pmsas(O) foe(O) lfcoe(O) libfc(O) ib_uverbs(O) ibtgt(O) ib_srpt(O) ib_cm(O) ib_sa(O) mlx4_ib(O) ib_umad(O) ib_mad(O) mlx4_co
Re: [BUGREPORT] Linux USB 3.0
On Tue, Feb 11, 2014 at 7:45 PM, Greg KH wrote: > On Tue, Feb 11, 2014 at 07:29:47PM +0100, Markus Rechberger wrote: >> On Mon, Feb 10, 2014 at 12:15 AM, Robert Hancock >> wrote: >> > On 08/02/14 03:00 AM, Markus Rechberger wrote: >> >> >> >> On Tue, Feb 4, 2014 at 10:31 AM, David Laight >> >> wrote: >> >>> >> >>> From: Markus Rechberger >> >> >> >> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: >> >> ERROR Transfer event TRB DMA >> >> ptr >> > >> > >> > These messages might be harmless. The 3.0 kernel contains a fix for >> > Intel Panther Point xHCI hosts that suppresses those messages, commit >> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious >> > successful event." >> > >> > A later commit extends that to all xHCI 1.0 hosts, commit >> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable >> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was >> > queued for 3.11 and marked to be backported into stable kernels as old >> > as 3.0. >> >>> >> >>> >> >>> I see the same error message on the 0.96 ASMedia controller when >> >>> the rx buffers for the ax88179_178a driver cross 64k boundaries. >> >>> >> >>> So this isn't confined to 1.0 controllers. >> >>> >> >> >> >> Sarah, >> >> >> >> since there is no response yet, is there anyone at Intel dedicated at >> >> working on USB 3.0? >> >> We are also getting more and more negative USB 3.0 feedback with Linux >> > >> > >> > Still nobody appears to have provided the requested debugging information >> > that was requested. So there is not much that can be done upstream to debug >> > things based only on vague reports, especially when not using current >> > kernel >> > versions. 
>> > >> >> Next kernel crash report, this time a Synology NAS System: >> http://support.sundtek.com/index.php/topic,1511.0.html > > That kernel has a closed source kernel module loaded, no community > member can look at it, sorry, please get support from the company that > wrote that module. > I'm going to collect all XHCI issues we get here as a reference, unfortunately we're busy with our own hardware so we don't have the time to dig into USB 3.0 Kernel issues at the moment. All that can be done is collecting the feedback and maybe help to translate between German and English. So if someone wants to volunteer to fix some issues (eg Intel) just drop me a line. As Sarah indicated there are already several issues mentioned within this post. Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGREPORT] Linux USB 3.0
On Tue, Feb 11, 2014 at 07:29:47PM +0100, Markus Rechberger wrote: > On Mon, Feb 10, 2014 at 12:15 AM, Robert Hancock wrote: > > On 08/02/14 03:00 AM, Markus Rechberger wrote: > >> > >> On Tue, Feb 4, 2014 at 10:31 AM, David Laight > >> wrote: > >>> > >>> From: Markus Rechberger > >> > >> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: > >> ERROR Transfer event TRB DMA > > ptr > > > > > > These messages might be harmless. The 3.0 kernel contains a fix for > > Intel Panther Point xHCI hosts that suppresses those messages, commit > > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious > > successful event." > > > > A later commit extends that to all xHCI 1.0 hosts, commit > > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable > > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was > > queued for 3.11 and marked to be backported into stable kernels as old > > as 3.0. > >>> > >>> > >>> I see the same error message on the 0.96 ASMedia controller when > >>> the rx buffers for the ax88179_178a driver cross 64k boundaries. > >>> > >>> So this isn't confined to 1.0 controllers. > >>> > >> > >> Sarah, > >> > >> since there is no response yet, is there anyone at Intel dedicated at > >> working on USB 3.0? > >> We are also getting more and more negative USB 3.0 feedback with Linux > > > > > > Still nobody appears to have provided the requested debugging information > > that was requested. So there is not much that can be done upstream to debug > > things based only on vague reports, especially when not using current kernel > > versions. > > > > Next kernel crash report, this time a Synology NAS System: > http://support.sundtek.com/index.php/topic,1511.0.html That kernel has a closed source kernel module loaded, no community member can look at it, sorry, please get support from the company that wrote that module. 
greg k-h
Re: [BUGREPORT] Linux USB 3.0
Markus Rechberger writes:

> Next kernel crash report, this time a Synology NAS System:
> http://support.sundtek.com/index.php/topic,1511.0.html

There is no etxhci_hcd driver in the mainline kernel...

Feb 11 18:50:41 DiskStation kernel: [103740.405521] Backtrace:
Feb 11 18:50:41 DiskStation kernel: [103740.408095] [<7f2d8f2c>] (find_trb_seg+0x0/0x54 [etxhci_hcd]) from [<7f2d9ac0>] (etxhci_find_new_dequeue_state+0x5c/0x200 [etxhci_hcd])
Feb 11 18:50:41 DiskStation kernel: [103740.420389] r4:9675fd44
Feb 11 18:50:41 DiskStation kernel: [103740.423046] [<7f2d9a64>] (etxhci_find_new_dequeue_state+0x0/0x200 [etxhci_hcd]) from [<7f2d4520>] (etxhci_cleanup_stalled_ring+0x50/0x140 [etxhci_hcd])
Feb 11 18:50:41 DiskStation kernel: [103740.436749] [<7f2d44d0>] (etxhci_cleanup_stalled_ring+0x0/0x140 [etxhci_hcd]) from [<7f2d46e0>] (etxhci_endpoint_reset+0xd0/0x100 [etxhci_hcd])
Feb 11 18:50:41 DiskStation kernel: [103740.449738] r7:bc0e9830 r6:965b6360 r5:bc0e9800 r4:be34cc00
Feb 11 18:50:41 DiskStation kernel: [103740.455595] [<7f2d4610>] (etxhci_endpoint_reset+0x0/0x100 [etxhci_hcd]) from [<7f086c00>] (usb_hcd_reset_endpoint+0x2c/0x80 [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.467837] [<7f086bd4>] (usb_hcd_reset_endpoint+0x0/0x80 [usbcore]) from [<7f088ff0>] (usb_enable_endpoint+0x70/0x74 [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.479558] [<7f088f80>] (usb_enable_endpoint+0x0/0x74 [usbcore]) from [<7f08903c>] (usb_enable_interface+0x48/0x5c [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.491066] r8:0001 r7:be34cc00 r6:be8ff368 r5:0001 r4:002c
Feb 11 18:50:41 DiskStation kernel: [103740.497750] r3:0001
Feb 11 18:50:41 DiskStation kernel: [103740.500522] [<7f088ff4>] (usb_enable_interface+0x0/0x5c [usbcore]) from [<7f08942c>] (usb_set_interface+0x1c8/0x22c [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.512032] r8:be04c860 r7:bdc4d600 r6: r5:be8ff368 r4:be34cc00
Feb 11 18:50:41 DiskStation kernel: [103740.518715] r3:be8ff368
Feb 11 18:50:41 DiskStation kernel: [103740.521495] [<7f089264>] (usb_set_interface+0x0/0x22c [usbcore]) from [<7f0902c0>] (usbdev_ioctl+0xf40/0x1cac [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.532512] [<7f08f380>] (usbdev_ioctl+0x0/0x1cac [usbcore]) from [<800db114>] (do_vfs_ioctl+0xa8/0x8bc)
Feb 11 18:50:41 DiskStation kernel: [103740.542109] [<800db06c>] (do_vfs_ioctl+0x0/0x8bc) from [<800db968>] (sys_ioctl+0x40/0x64)
Feb 11 18:50:41 DiskStation kernel: [103740.550396] r9:9675e000 r8:8000e388 r7:0017 r6:80085504 r5:2f4f54f4
Feb 11 18:50:41 DiskStation kernel: [103740.557079] r4:bc10b540
Feb 11 18:50:41 DiskStation kernel: [103740.559822] [<800db928>] (sys_ioctl+0x0/0x64) from [<8000e1e0>] (ret_fast_syscall+0x0/0x30)
Feb 11 18:50:41 DiskStation kernel: [103740.568282] r7:0036 r6: r5: r4:2f4f64d8

Bjørn
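Bjørn's point (the crash is inside a driver that mainline does not ship) can be checked mechanically by listing which modules the backtrace frames belong to. The following is a minimal sketch, assuming frames in the ARM backtrace style quoted above; the helper name `modules_in_backtrace` and the two sample lines are chosen for illustration only.

```python
import re
from collections import Counter

# A frame looks like "(symbol+0x0/0x54 [module])"; the timestamp "[103740.40]"
# and the address "[<7f2d8f2c>]" are not followed by ")" and do not match.
MODULE_RE = re.compile(r"\[([A-Za-z0-9_]+)\]\)")


def modules_in_backtrace(text):
    """Count how many backtrace frames belong to each loadable module."""
    return Counter(MODULE_RE.findall(text))


sample = """\
kernel: [103740.408095] [<7f2d8f2c>] (find_trb_seg+0x0/0x54 [etxhci_hcd]) from [<7f2d9ac0>] (etxhci_find_new_dequeue_state+0x5c/0x200 [etxhci_hcd])
kernel: [103740.455595] [<7f2d4610>] (etxhci_endpoint_reset+0x0/0x100 [etxhci_hcd]) from [<7f086c00>] (usb_hcd_reset_endpoint+0x2c/0x80 [usbcore])
"""

frames = modules_in_backtrace(sample)
for module, count in frames.most_common():
    print(f"{module}: {count} frame(s)")
```

Any module name in the result that has no counterpart in the mainline tree (here `etxhci_hcd`, as opposed to the mainline `xhci_hcd`) points to a vendor driver, which is why the report has to go to the vendor.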
Re: [BUGREPORT] Linux USB 3.0
On Mon, Feb 10, 2014 at 12:15 AM, Robert Hancock wrote:
> On 08/02/14 03:00 AM, Markus Rechberger wrote:
>>
>> On Tue, Feb 4, 2014 at 10:31 AM, David Laight
>> wrote:
>>>
>>> From: Markus Rechberger
>>
>> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr
>
>
> These messages might be harmless. The 3.0 kernel contains a fix for
> Intel Panther Point xHCI hosts that suppresses those messages, commit
> ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
> successful event."
>
> A later commit extends that to all xHCI 1.0 hosts, commit
> 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
> XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was
> queued for 3.11 and marked to be backported into stable kernels as old
> as 3.0.
>>>
>>>
>>> I see the same error message on the 0.96 ASMedia controller when
>>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>>>
>>> So this isn't confined to 1.0 controllers.
>>>
>>
>> Sarah,
>>
>> since there is no response yet, is there anyone at Intel dedicated at
>> working on USB 3.0?
>> We are also getting more and more negative USB 3.0 feedback with Linux
>
>
> Still nobody appears to have provided the requested debugging information
> that was requested. So there is not much that can be done upstream to debug
> things based only on vague reports, especially when not using current kernel
> versions.
>

Next kernel crash report, this time a Synology NAS System:
http://support.sundtek.com/index.php/topic,1511.0.html
Re: [BUGREPORT] Linux USB 3.0
On 08/02/14 03:00 AM, Markus Rechberger wrote:
> On Tue, Feb 4, 2014 at 10:31 AM, David Laight wrote:
>> From: Markus Rechberger
>>>>> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr
>>>>
>>>> These messages might be harmless. The 3.0 kernel contains a fix for
>>>> Intel Panther Point xHCI hosts that suppresses those messages, commit
>>>> ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
>>>> successful event."
>>>>
>>>> A later commit extends that to all xHCI 1.0 hosts, commit
>>>> 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
>>>> XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was
>>>> queued for 3.11 and marked to be backported into stable kernels as old
>>>> as 3.0.
>>
>> I see the same error message on the 0.96 ASMedia controller when
>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>>
>> So this isn't confined to 1.0 controllers.
>
> Sarah,
>
> since there is no response yet, is there anyone at Intel dedicated at
> working on USB 3.0?
> We are also getting more and more negative USB 3.0 feedback with Linux

Still nobody appears to have provided the debugging information that was requested. So there is not much that can be done upstream to debug things based only on vague reports, especially when not using current kernel versions.
Re: [BUGREPORT] Linux USB 3.0
The next one, from just today (unfortunately it's in German):
http://support.sundtek.com/index.php/topic,1505.msg11020.html#msg11020

This user is running Ubuntu with Linux 3.13.0-8-generic. The system seems to freeze completely after some time. Since the driver uses the usbdevfs interface, the problem is in usbcore.

On Sat, Feb 8, 2014 at 10:00 AM, Markus Rechberger wrote:
> On Tue, Feb 4, 2014 at 10:31 AM, David Laight wrote:
>> From: Markus Rechberger
>>> >> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr
>>> >
>>> > These messages might be harmless. The 3.0 kernel contains a fix for
>>> > Intel Panther Point xHCI hosts that suppresses those messages, commit
>>> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
>>> > successful event."
>>> >
>>> > A later commit extends that to all xHCI 1.0 hosts, commit
>>> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
>>> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was
>>> > queued for 3.11 and marked to be backported into stable kernels as old
>>> > as 3.0.
>>
>> I see the same error message on the 0.96 ASMedia controller when
>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>>
>> So this isn't confined to 1.0 controllers.
>>
>
> Sarah,
>
> since there is no response yet, is there anyone at Intel dedicated at
> working on USB 3.0?
> We are also getting more and more negative USB 3.0 feedback with Linux
>
> Best Regards,
> Markus
Re: [BUGREPORT] Linux USB 3.0
On Tue, Feb 4, 2014 at 10:31 AM, David Laight wrote:
> From: Markus Rechberger
>> >> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr
>> >
>> > These messages might be harmless. The 3.0 kernel contains a fix for
>> > Intel Panther Point xHCI hosts that suppresses those messages, commit
>> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
>> > successful event."
>> >
>> > A later commit extends that to all xHCI 1.0 hosts, commit
>> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
>> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was
>> > queued for 3.11 and marked to be backported into stable kernels as old
>> > as 3.0.
>
> I see the same error message on the 0.96 ASMedia controller when
> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>
> So this isn't confined to 1.0 controllers.
>

Sarah,

since there has been no response yet: is there anyone at Intel dedicated to working on USB 3.0? We are also getting more and more negative feedback about USB 3.0 with Linux.

Best Regards,
Markus
RE: [BUGREPORT] Linux USB 3.0
From: Markus Rechberger
> >> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr
> >
> > These messages might be harmless. The 3.0 kernel contains a fix for
> > Intel Panther Point xHCI hosts that suppresses those messages, commit
> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
> > successful event."
> >
> > A later commit extends that to all xHCI 1.0 hosts, commit
> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was
> > queued for 3.11 and marked to be backported into stable kernels as old
> > as 3.0.

I see the same error message on the 0.96 ASMedia controller when the rx buffers for the ax88179_178a driver cross 64k boundaries.

So this isn't confined to 1.0 controllers.

David
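David's condition, a receive buffer that crosses a 64k boundary, comes down to simple address arithmetic. The sketch below shows one way to test it; the function name `crosses_64k` and the example addresses are made up for illustration and are not taken from the ax88179_178a or xHCI drivers.

```python
BOUNDARY = 64 * 1024  # 64 KiB


def crosses_64k(addr, length):
    """Return True if the buffer [addr, addr + length) spans a 64 KiB boundary."""
    if length <= 0:
        return False
    first_block = addr // BOUNDARY
    last_block = (addr + length - 1) // BOUNDARY
    return first_block != last_block


# Hypothetical buffers: (start address, length in bytes)
examples = [
    (0x0000FF00, 0x200),   # starts 256 bytes below a boundary, runs past it
    (0x00010000, 0x1000),  # starts exactly on a boundary, stays inside the block
    (0x0000F000, 0x1000),  # ends on the last byte before the boundary
]

for addr, length in examples:
    print(f"{addr:#010x} +{length:#x}: crosses={crosses_64k(addr, length)}")
```

A check like this, applied to logged buffer addresses, would show whether the error message correlates with boundary-crossing buffers on a given controller.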
Re: [BUGREPORT] Linux USB 3.0
Hi Sarah, On Mon, Jan 20, 2014 at 8:35 PM, Sarah Sharp wrote: > Hi Markus, > > I'm the xHCI driver maintainer, and it helps to Cc me on USB 3.0 bug > reports. > > On Sat, Dec 28, 2013 at 07:24:20AM +0100, Markus Rechberger wrote: >> just received following log snippset: > > Please state which kernel version you (or your customer) is running. > You've reported issues with several different kernel versions, so which > kernel are you running for this particular snippet? > >> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR >> Transfer event TRB DMA ptr > > These messages might be harmless. The 3.0 kernel contains a fix for > Intel Panther Point xHCI hosts that suppresses those messages, commit > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious > successful event." > > A later commit extends that to all xHCI 1.0 hosts, commit > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was > queued for 3.11 and marked to be backported into stable kernels as old > as 3.0. > >> the previous bug report of that user: >> https://bugzilla.kernel.org/show_bug.cgi?id=65021 xhci: complete USB freeze > > Hmm, Greg didn't assign that bug to me, so I missed it, sorry. > >> On Fri, Dec 27, 2013 at 8:59 PM, Markus Rechberger >> wrote: >> > Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately >> > we don't have such a board for testing and customer patience is >> > limited to bisect the kernel. >> > >> > Does anyone have a clue what modification could have killed USB 3.0 >> > support within those releases? >> > It does not seem to be SG support. > > 3.2 was the kernel where the Intel EHCI to xHCI port switchover code > went in. Without that code, all ports will remain under the EHCI host, > and USB 3.0 devices will work at USB 2.0 speeds. 
I suspect the USB > device triggers an issue with the xHCI driver, and 3.2 only works > because the device is on an EHCI port without the switchover code. > >> > On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger >> > wrote: >> >> I just got another USB 3.0 bugreport, the entire system crashed. That >> >> particular customer already filed a bugreport in November 2013 that >> >> his system is in a bad state when using some USB 2.0 media devices >> >> which even have opensource drivers built into the kernel. >> >> >> >> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12. >> >> The affected board is an Intel DH87RL board. > > Why are they running 3.6.12 in particular? That's not a supported > stable kernel. > our customers are using any kind of linux kernel. The drivers are using USBFS (devio.c) for interfacing with USB. It seems like you are in contact with one customer who is using the DH87RL board. Just today we got another one in our forum using 3.12.9-2-ARCH. Also Synology NAS users seem to be affected by the USB 2.0 through USB 3.0 issue. >> >> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger >> >> wrote: >> >>> A customer using a device with USBDEVFS is reporting following >> >>> backtrace (it seems to be a rather generic issue related to linux usb >> >>> 3.0 in general): >> >>> According to him this problem is reproducible as soon as he starts the >> >>> data transfer, is there anything known about that? >> >>> >> >>> He is using 3.12.0-031200-generic > > So at this point you've reported three separate bugs, all with the same > symptom, but different kernel versions? Are these all from the same bug > reporter, or a different bug reporter? > > You've got me seriously confused right now. Please keep one bug report > to one mail thread, and get the original bug reporter to start that > thread. 
If this is from one bug reporter, please state the current > kernel they are running, and send dmesg showing the issue with > CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on (you may > also need to turn on CONFIG_DYNAMIC_DEBUG in later kernels). Please > attach the dmesg as a file, since your mail client line-wraps. > >> >>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: >> >>> ERROR Transfer event TRB DMA ptr not part of current TD >> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >> >>> ERROR Transfer event TRB DMA ptr not part of current TD >> >>> D
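Sarah's request depends on the reporter's kernel being built with the named debug options. A reporter could verify that before collecting dmesg with something like the sketch below. It assumes the usual kernel `.config` text format; where that file lives (for example under /boot or as /proc/config.gz) depends on the distribution, and the helper names and sample config are hypothetical.

```python
# Options named in the request above.
WANTED = (
    "CONFIG_USB_DEBUG",
    "CONFIG_USB_XHCI_HCD_DEBUGGING",
    "CONFIG_DYNAMIC_DEBUG",
)


def parse_config(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    opts = {}
    unset_suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(unset_suffix):
            opts[line[2:-len(unset_suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def missing_debug_options(text, wanted=WANTED):
    """Return the wanted options that are neither built in (y) nor modular (m)."""
    opts = parse_config(text)
    return [name for name in wanted if opts.get(name, "n") not in ("y", "m")]


# Made-up sample config for demonstration.
sample_config = """\
CONFIG_USB_DEBUG=y
# CONFIG_USB_XHCI_HCD_DEBUGGING is not set
CONFIG_DYNAMIC_DEBUG=y
"""

missing = missing_debug_options(sample_config)
print("missing:", missing)
```

An empty result means the kernel should be able to produce the requested xHCI debug output; otherwise the kernel has to be rebuilt with the listed options first.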
Re: [BUGREPORT] Linux USB 3.0
Hi Markus, I'm the xHCI driver maintainer, and it helps to Cc me on USB 3.0 bug reports. On Sat, Dec 28, 2013 at 07:24:20AM +0100, Markus Rechberger wrote: > just received following log snippset: Please state which kernel version you (or your customer) is running. You've reported issues with several different kernel versions, so which kernel are you running for this particular snippet? > Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR > Transfer event TRB DMA ptr These messages might be harmless. The 3.0 kernel contains a fix for Intel Panther Point xHCI hosts that suppresses those messages, commit ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious successful event." A later commit extends that to all xHCI 1.0 hosts, commit 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was queued for 3.11 and marked to be backported into stable kernels as old as 3.0. > the previous bug report of that user: > https://bugzilla.kernel.org/show_bug.cgi?id=65021 xhci: complete USB freeze Hmm, Greg didn't assign that bug to me, so I missed it, sorry. > On Fri, Dec 27, 2013 at 8:59 PM, Markus Rechberger > wrote: > > Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately > > we don't have such a board for testing and customer patience is > > limited to bisect the kernel. > > > > Does anyone have a clue what modification could have killed USB 3.0 > > support within those releases? > > It does not seem to be SG support. 3.2 was the kernel where the Intel EHCI to xHCI port switchover code went in. Without that code, all ports will remain under the EHCI host, and USB 3.0 devices will work at USB 2.0 speeds. I suspect the USB device triggers an issue with the xHCI driver, and 3.2 only works because the device is on an EHCI port without the switchover code. 
> > On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger > > wrote: > >> I just got another USB 3.0 bugreport, the entire system crashed. That > >> particular customer already filed a bugreport in November 2013 that > >> his system is in a bad state when using some USB 2.0 media devices > >> which even have opensource drivers built into the kernel. > >> > >> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12. > >> The affected board is an Intel DH87RL board. Why are they running 3.6.12 in particular? That's not a supported stable kernel. > >> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger > >> wrote: > >>> A customer using a device with USBDEVFS is reporting following > >>> backtrace (it seems to be a rather generic issue related to linux usb > >>> 3.0 in general): > >>> According to him this problem is reproducible as soon as he starts the > >>> data transfer, is there anything known about that? > >>> > >>> He is using 3.12.0-031200-generic So at this point you've reported three separate bugs, all with the same symptom, but different kernel versions? Are these all from the same bug reporter, or a different bug reporter? You've got me seriously confused right now. Please keep one bug report to one mail thread, and get the original bug reporter to start that thread. If this is from one bug reporter, please state the current kernel they are running, and send dmesg showing the issue with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on (you may also need to turn on CONFIG_DYNAMIC_DEBUG in later kernels). Please attach the dmesg as a file, since your mail client line-wraps. 
> >>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: > >>> ERROR Transfer event TRB DMA ptr not part of current TD > >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: > >>> ERROR Transfer event TRB DMA ptr not part of current TD > >>> Dec 24 14:30:39 homenas kernel: last message repeated 16 times > >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: > >>> WARN Successful completion on short TX > >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: > >>> WARN Successful completion on short TX > >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: URB > >>> transfer length is wrong, xHC issue? req. len = 46080, act. len = 1382400 > >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle > >>> kernel NULL pointer dereference at 0004 > >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] > >>
Re: [BUGREPORT] Linux USB 3.0
just received following log snippset: Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.177695] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.217966] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.277473] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.317753] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.377242] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.417514] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.477000] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.517279] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.576761] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.617074] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.676581] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.716852] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.776340] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:50 solist kernel: [ 36.816589] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr Dec 27 23:23:51 solist kernel: [ 36.876117] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr the previous bug report of that user: https://bugzilla.kernel.org/show_bug.cgi?id=65021 xhci: complete USB freeze On Fri, Dec 27, 2013 at 8:59 PM, Markus Rechberger wrote: > Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately > we don't have such a board for testing and customer patience is > limited to bisect the kernel. 
> > Does anyone have a clue what modification could have killed USB 3.0 > support within those releases? > It does not seem to be SG support. > > On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger > wrote: >> I just got another USB 3.0 bugreport, the entire system crashed. That >> particular customer already filed a bugreport in November 2013 that >> his system is in a bad state when using some USB 2.0 media devices >> which even have opensource drivers built into the kernel. >> >> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12. >> The affected board is an Intel DH87RL board. >> >> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger >> wrote: >>> A customer using a device with USBDEVFS is reporting following >>> backtrace (it seems to be a rather generic issue related to linux usb >>> 3.0 in general): >>> According to him this problem is reproducible as soon as he starts the >>> data transfer, is there anything known about that? >>> >>> He is using 3.12.0-031200-generic >>> >>> >>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: >>> ERROR Transfer event TRB DMA ptr not part of current TD >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >>> ERROR Transfer event TRB DMA ptr not part of current TD >>> >>> Dec 24 14:30:39 homenas kernel: last message repeated 16 times >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >>> WARN Successful completion on short TX >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >>> WARN Successful completion on short TX >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >>> URB transfer length is wrong, xHC issue? req. len = 46080, act. 
len = >>> 1382400 >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle >>> kernel NULL pointer dereference at 0004 >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250 >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0 >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops: [#1] SMP >>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in: >>> videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) >>> vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec >>> snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh >>> snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq >>> snd_timer snd_seq_device lpc_ich sn
Re: [BUGREPORT] Linux USB 3.0
Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately we don't have such a board for testing and customer patience is limited to bisect the kernel. Does anyone have a clue what modification could have killed USB 3.0 support within those releases? It does not seem to be SG support. On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger wrote: > I just got another USB 3.0 bugreport, the entire system crashed. That > particular customer already filed a bugreport in November 2013 that > his system is in a bad state when using some USB 2.0 media devices > which even have opensource drivers built into the kernel. > > USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12. > The affected board is an Intel DH87RL board. > > On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger > wrote: >> A customer using a device with USBDEVFS is reporting following >> backtrace (it seems to be a rather generic issue related to linux usb >> 3.0 in general): >> According to him this problem is reproducible as soon as he starts the >> data transfer, is there anything known about that? >> >> He is using 3.12.0-031200-generic >> >> >> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: >> ERROR Transfer event TRB DMA ptr not part of current TD >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >> ERROR Transfer event TRB DMA ptr not part of current TD >> >> Dec 24 14:30:39 homenas kernel: last message repeated 16 times >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >> WARN Successful completion on short TX >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >> WARN Successful completion on short TX >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: >> URB transfer length is wrong, xHC issue? req. len = 46080, act. 
len = >> 1382400 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle >> kernel NULL pointer dereference at 0004 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops: [#1] SMP >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in: >> videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) >> vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec >> snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh >> snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq >> snd_timer snd_seq_device lpc_ich snd bnep rfcomm soundcore >> snd_page_alloc bluetooth mei_me mei mac_hid ppdev nfsd w83627ehf >> hwmon_vid nfs_acl auth_rpcgss coretemp nfs fscache lockd lp parport >> sunrpc raid10 raid456 async_pq async_xor async_memcpy >> async_raid6_recov async_tx raid0 multipath linear btrfs raid6_pq xor >> libcrc32c osst st raid1 tg3 mptsas firewire_ohci ptp mxm_wmi >> firewire_core ahci mptscsih pps_core crc_itu_t libahci mpt2sas mptbase >> wmi scsi_transport_sas raid_class [last unloaded: vmnet] >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CPU: 0 PID: 0 Comm: >> swapper/0 Tainted: GF O 3.12.0-031200-generic >> #201311031935 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Hardware name: To Be >> Filled By O.E.M. 
To Be Filled By O.E.M./X79 Extreme9, BIOS P3.30 >> 01/28/2013 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] task: 81c144a0 >> ti: 81c0 task.ti: 81c0 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RIP: 0010:[] [] >> finish_td+0x13f/0x250 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RSP: >> 0018:88102fc03ca8 EFLAGS: 00010046 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RAX: 880f865d2b10 >> RBX: 880f865d2b00 RCX: 0006 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RDX: 880f865d2b10 >> RSI: 0007 RDI: 0046 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RBP: 88102fc03d08 >> R08: 000a R09: >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R10: 06fd >> R11: 06fc R12: 880fd2de >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R13: 880fd32b1780 >> R14: R15: 880fd5c5f000 >> >> Dec 24 14:30:39 homenas kernel: [ 1469.822450] FS: >> () GS:88102fc0() >>
Re: [BUGREPORT] Linux USB 3.0
I just got another USB 3.0 bugreport, the entire system crashed. That particular customer already filed a bugreport in November 2013 that his system is in a bad state when using some USB 2.0 media devices which even have opensource drivers built into the kernel. USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12. The affected board is an Intel DH87RL board. On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger wrote: > A customer using a device with USBDEVFS is reporting following > backtrace (it seems to be a rather generic issue related to linux usb > 3.0 in general): > According to him this problem is reproducible as soon as he starts the > data transfer, is there anything known about that? > > He is using 3.12.0-031200-generic > > > Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: > ERROR Transfer event TRB DMA ptr not part of current TD > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: > ERROR Transfer event TRB DMA ptr not part of current TD > > Dec 24 14:30:39 homenas kernel: last message repeated 16 times > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: > WARN Successful completion on short TX > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: > WARN Successful completion on short TX > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: > URB transfer length is wrong, xHC issue? req. len = 46080, act. 
len = > 1382400 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle > kernel NULL pointer dereference at 0004 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops: [#1] SMP > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in: > videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) > vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec > snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh > snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq > snd_timer snd_seq_device lpc_ich snd bnep rfcomm soundcore > snd_page_alloc bluetooth mei_me mei mac_hid ppdev nfsd w83627ehf > hwmon_vid nfs_acl auth_rpcgss coretemp nfs fscache lockd lp parport > sunrpc raid10 raid456 async_pq async_xor async_memcpy > async_raid6_recov async_tx raid0 multipath linear btrfs raid6_pq xor > libcrc32c osst st raid1 tg3 mptsas firewire_ohci ptp mxm_wmi > firewire_core ahci mptscsih pps_core crc_itu_t libahci mpt2sas mptbase > wmi scsi_transport_sas raid_class [last unloaded: vmnet] > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] CPU: 0 PID: 0 Comm: > swapper/0 Tainted: GF O 3.12.0-031200-generic > #201311031935 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] Hardware name: To Be > Filled By O.E.M. 
To Be Filled By O.E.M./X79 Extreme9, BIOS P3.30 > 01/28/2013 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] task: 81c144a0 > ti: 81c0 task.ti: 81c0 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] RIP: 0010:[] [] > finish_td+0x13f/0x250 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] RSP: > 0018:88102fc03ca8 EFLAGS: 00010046 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] RAX: 880f865d2b10 > RBX: 880f865d2b00 RCX: 0006 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] RDX: 880f865d2b10 > RSI: 0007 RDI: 0046 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] RBP: 88102fc03d08 > R08: 000a R09: > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] R10: 06fd > R11: 06fc R12: 880fd2de > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] R13: 880fd32b1780 > R14: R15: 880fd5c5f000 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] FS: > () GS:88102fc0() > knlGS: > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] CS: 0010 DS: ES: > CR0: 80050033 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] CR2: 0004 > CR3: 01c0d000 CR4: 000407f0 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] Stack: > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] 88102fc03ce8 > 880fd0bc8000 88102fc03d00 880fd268d1a0 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] 88102fc03df4 > 00010002 880fd32b1780 880f865d2b00 > > Dec 24 14:30:39 homenas kernel: [ 1469.822450] 880fd268d1a0 > 880fd5c5f000 880fd2de 880fd2c497b0 > > Dec 24 14:30:39 home
[BUGREPORT] Linux USB 3.0
A customer using a device with USBDEVFS is reporting the following backtrace (it seems to be a rather generic issue related to Linux USB 3.0 in general). According to him, this problem is reproducible as soon as he starts the data transfer. Is there anything known about that?

He is using 3.12.0-031200-generic

Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Dec 24 14:30:39 homenas kernel: last message repeated 16 times
Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: WARN Successful completion on short TX
Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: WARN Successful completion on short TX
Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: URB transfer length is wrong, xHC issue? req. len = 46080, act. len = 1382400
Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle kernel NULL pointer dereference at 0004
Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250
Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops: [#1] SMP
Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in: videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq snd_timer snd_seq_device lpc_ich snd bnep rfcomm soundcore snd_page_alloc bluetooth mei_me mei mac_hid ppdev nfsd w83627ehf hwmon_vid nfs_acl auth_rpcgss coretemp nfs fscache lockd lp parport sunrpc raid10 raid456 async_pq async_xor async_memcpy async_raid6_recov async_tx raid0 multipath linear btrfs raid6_pq xor libcrc32c osst st raid1 tg3 mptsas firewire_ohci ptp mxm_wmi firewire_core ahci mptscsih pps_core crc_itu_t libahci mpt2sas mptbase wmi scsi_transport_sas raid_class [last unloaded: vmnet]
Dec 24 14:30:39 homenas kernel: [ 1469.822450] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF O 3.12.0-031200-generic #201311031935
Dec 24 14:30:39 homenas kernel: [ 1469.822450] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X79 Extreme9, BIOS P3.30 01/28/2013
Dec 24 14:30:39 homenas kernel: [ 1469.822450] task: 81c144a0 ti: 81c0 task.ti: 81c0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] RIP: 0010:[] [] finish_td+0x13f/0x250
Dec 24 14:30:39 homenas kernel: [ 1469.822450] RSP: 0018:88102fc03ca8 EFLAGS: 00010046
Dec 24 14:30:39 homenas kernel: [ 1469.822450] RAX: 880f865d2b10 RBX: 880f865d2b00 RCX: 0006
Dec 24 14:30:39 homenas kernel: [ 1469.822450] RDX: 880f865d2b10 RSI: 0007 RDI: 0046
Dec 24 14:30:39 homenas kernel: [ 1469.822450] RBP: 88102fc03d08 R08: 000a R09:
Dec 24 14:30:39 homenas kernel: [ 1469.822450] R10: 06fd R11: 06fc R12: 880fd2de
Dec 24 14:30:39 homenas kernel: [ 1469.822450] R13: 880fd32b1780 R14: R15: 880fd5c5f000
Dec 24 14:30:39 homenas kernel: [ 1469.822450] FS: () GS:88102fc0() knlGS:
Dec 24 14:30:39 homenas kernel: [ 1469.822450] CS: 0010 DS: ES: CR0: 80050033
Dec 24 14:30:39 homenas kernel: [ 1469.822450] CR2: 0004 CR3: 01c0d000 CR4: 000407f0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] Stack:
Dec 24 14:30:39 homenas kernel: [ 1469.822450] 88102fc03ce8 880fd0bc8000 88102fc03d00 880fd268d1a0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] 88102fc03df4 00010002 880fd32b1780 880f865d2b00
Dec 24 14:30:39 homenas kernel: [ 1469.822450] 880fd268d1a0 880fd5c5f000 880fd2de 880fd2c497b0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] Call Trace:
Dec 24 14:30:39 homenas kernel: [ 1469.822450]
Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] process_bulk_intr_td+0x116/0x2d0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] handle_tx_event+0x656/0xb50
Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] ? __queue_work+0x3b0/0x3c0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] ? call_timer_fn+0x46/0x160
Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] xhci_handle_event+0x1db/0x2a0
Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] ? run_timer_softirq+0x1b2/0x300
Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] xhci_irq+0x120/0x1f0
Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] xhci_msi_irq+0x11/0x20
Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] handle_irq_event_percpu+0x5d/0x210
Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] handle_irq_event+0x48/0x70
Dec 24 14:30:39 homenas kernel: [ 1470.31207
[perf bugreport] perf doesn't delete /tmp/perf-vdso.so.* file on exit
Perf doesn't properly clean up /tmp/perf-vdso.so-XX on exit. So these files keep accumulating in /tmp every time perf is run. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
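For illustration only, the usual userspace pattern for avoiding this kind of leftover file is to remember the generated name and unlink it from an exit handler. The sketch below is not perf's code; the file name template and all function names are hypothetical, and it tracks a single temporary file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not perf's actual code. */
static char tmp_path[64];
static int tmp_in_use;

/* Remove the temporary file, if one was created. Safe to call twice. */
static void cleanup_tmp_file(void)
{
	if (tmp_in_use) {
		unlink(tmp_path);
		tmp_in_use = 0;
	}
}

/*
 * Create a uniquely named file under /tmp and arrange for it to be
 * removed at normal process exit. Returns the open fd, or -1 on error.
 */
static int create_tmp_file(void)
{
	static int registered;
	int fd;

	assert(!tmp_in_use);	/* this sketch tracks one file only */
	strcpy(tmp_path, "/tmp/example-vdso.so-XXXXXX");
	fd = mkstemp(tmp_path);
	if (fd < 0)
		return -1;
	tmp_in_use = 1;
	if (!registered) {
		/* Runs on exit() or return from main, not on fatal signals. */
		(void)atexit(cleanup_tmp_file);
		registered = 1;
	}
	return fd;
}

/* Return 1 if the path currently exists, 0 otherwise. */
static int path_exists(const char *path)
{
	return access(path, F_OK) == 0;
}
```

An exit handler does not run when the process is killed by a signal, so a tool that wants to be thorough would also need to clean up from its signal handling path.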
Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration
Sergei Shtylyov wrote:
>>> You left powerpc-lite.patch broken with this change as it has
>>> multiple calls to kgdb8250_add_port()...
>
>> I see. But I wonder if there ever was a real need for these hooks (in
>> 2.4 times?): If I look at bamboo_early_serial_map() e.g., I find it
>> calling into early_serial_setup() which fills serial8250_ports[] - and
>> that content is now retrieved via serial8250_get_port_def() when we
>> parse the runtime or build-time provided parameters (port number &
>> baudrate).
>
>    Of course. But now the kgdb8250_add_port() calls need to be removed.

For sure. Most arch patches need to go through some refactoring anyway when preparing them for upstream. Cleaning up no longer required hooks should be no problem at this chance.

If you want to accelerate this process, please check out Jason's linux-2.6-kgdb.git for 2.6.25 and start rebasing the powerpc patch. He just recently said that support around kgdb for non-x86 would be highly welcome. And if you stumble over ppc-related issues that cannot be solved with latest kgdb design, please let us know. The sooner, the better.

Jan
Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration
Hello.

Jan Kiszka wrote:

Sorry, previous version was missing some __init[data] attributes which were dropped in an intermediate stage. Here comes an updated patch:

<---snip--->

This major refactoring of the quite complex kgdb8250 configuration does the following:

- ensures that static configurations according to SERIAL_PORT_DFNS are always loaded first
- tries to pull more accurate configuration via serial8250_get_port_def if simple-config is used
- detects empty/invalid simple-configs
- enforces KGDB_PORT_NUM <= SERIAL_8250_NR_UARTS at kconfig level
- removes kgdb8250_add_port and its hook in serial_core (calling serial8250_get_port_def in demand should provide us the same information)

You left powerpc-lite.patch broken with this change as it has multiple calls to kgdb8250_add_port()...

I see. But I wonder if there ever was a real need for these hooks (in 2.4 times?): If I look at bamboo_early_serial_map() e.g., I find it calling into early_serial_setup() which fills serial8250_ports[] - and that content is now retrieved via serial8250_get_port_def() when we parse the runtime or build-time provided parameters (port number & baudrate).

Of course. But now the kgdb8250_add_port() calls need to be removed.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>

WBR, Sergei
Re: [Kgdb-bugreport] [PATCH] beautification of debugger_active usage
Jan Kiszka wrote:
> Just a beautification of using debugger_active for checking the debugger
> state.
>
> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
>
> ---
>  arch/x86/kernel/kgdb.c |6 +++---
>  include/linux/kgdb.h |7 ++-
>  kernel/kgdb.c |8
>  kernel/sched.c |7 +--
>  4 files changed, 14 insertions(+), 14 deletions(-)
>
>

committed to:

http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=kgdb_2.6.25

Jason.
Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration
Sergei Shtylyov wrote:
> Hello.
>
> Jan Kiszka wrote:
>
>> Sorry, previous version was missing some __init[data] attributes which
>> were dropped in an intermediate stage. Here comes an updated patch:
>
>> <---snip--->
>
>> This major refactoring of the quite complex kgdb8250 configuration does
>> the following:
>
>> - ensures that static configurations according to SERIAL_PORT_DFNS are
>>   always loaded first
>> - tries to pull more accurate configuration via serial8250_get_port_def
>>   if simple-config is used
>> - detects empty/invalid simple-configs
>> - enforces KGDB_PORT_NUM <= SERIAL_8250_NR_UARTS at kconfig level
>> - removes kgdb8250_add_port and its hook in serial_core (calling
>>   serial8250_get_port_def in demand should provide us the same
>>   information)
>
>    You left powerpc-lite.patch broken with this change as it has
> multiple calls to kgdb8250_add_port()...

I see. But I wonder if there ever was a real need for these hooks (in 2.4 times?): If I look at bamboo_early_serial_map() e.g., I find it calling into early_serial_setup() which fills serial8250_ports[] - and that content is now retrieved via serial8250_get_port_def() when we parse the runtime or build-time provided parameters (port number & baudrate).

>
>> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
>
>> Index: b/drivers/serial/serial_core.c
>> ===
>> --- a/drivers/serial/serial_core.c
>> +++ b/drivers/serial/serial_core.c
> [...]
>> @@ -2370,12 +2369,6 @@ int uart_add_one_port(struct uart_driver
>>  	 */
>>  	port->flags &= ~UPF_DEAD;
>>
>> -#if defined(CONFIG_KGDB_8250)
>> -	/* Add any 8250-like ports we find later. */
>> -	if (port->type <= PORT_MAX_8250)
>> -		kgdb8250_add_port(port->line, port);
>> -#endif
>> -
>
>    I'm afraid this wasn't correct from the very start since this can add
> ports with .iotype that 8250_kgdb.c does not support. So, nothing to
> regret here...

I think a lot of cruft piled up in the kgdb patches over their long life. :)

Thanks for your feedback!

Jan
Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration
Hello.

Jan Kiszka wrote:

Sorry, previous version was missing some __init[data] attributes which were dropped in an intermediate stage. Here comes an updated patch:

<---snip--->

This major refactoring of the quite complex kgdb8250 configuration does the following:

- ensures that static configurations according to SERIAL_PORT_DFNS are always loaded first
- tries to pull more accurate configuration via serial8250_get_port_def if simple-config is used
- detects empty/invalid simple-configs
- enforces KGDB_PORT_NUM <= SERIAL_8250_NR_UARTS at kconfig level
- removes kgdb8250_add_port and its hook in serial_core (calling serial8250_get_port_def in demand should provide us the same information)

You left powerpc-lite.patch broken with this change as it has multiple calls to kgdb8250_add_port()...

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>

Index: b/drivers/serial/serial_core.c
===
--- a/drivers/serial/serial_core.c
+++ b/drivers/serial/serial_core.c
[...]
@@ -2370,12 +2369,6 @@ int uart_add_one_port(struct uart_driver
 	 */
 	port->flags &= ~UPF_DEAD;
 
-#if defined(CONFIG_KGDB_8250)
-	/* Add any 8250-like ports we find later. */
-	if (port->type <= PORT_MAX_8250)
-		kgdb8250_add_port(port->line, port);
-#endif
-

I'm afraid this wasn't correct from the very start since this can add ports with .iotype that 8250_kgdb.c does not support. So, nothing to regret here...

WBR, Sergei
Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init
Jan Kiszka wrote:
> George Anzinger wrote:
>> On 01/31/2008 01:36 AM, Jan Kiszka was caught saying:
>>> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in
>>> parse_early_param as well? It should, because my under standing of
>>> trap_init is that it's the functions to arm things like... exception
>>> handlers? And that raises the question of the deeper purpose of this
>>> check (and the invocation of kgdb_early_init from the argument parsing
>>> function). Sigh, KGDB is still a quite improvable piece of code.
>> Likely. Once you get it in the main line kernel, one would hope that
>> other arch code would be forth coming as many more "eyes" will be in play.
>
> Meanwhile I realized that there is early_trap_init - for x86-32 only! I
> assume now we are only lacking the same for x86-64 to get kgdb running
> there already during early_param-parsing.

Looks like that was the key. Thanks for pointing me at this, George.

Here the updated patch:

--snip---

This cleans up the early entry of kgdb. It introduces early_trap_init for x86-64, reloads the idt register also in the 32-bit variant, removes the now unneeded EXCEPTION_STACK_READY construction, and matures the init-state machine of kgdb.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>

---
 arch/x86/kernel/setup_64.c |4 +++
 arch/x86/kernel/traps_32.c |3 +-
 arch/x86/kernel/traps_64.c | 10 -
 include/asm-x86/kgdb.h |3 --
 include/linux/kgdb.h |7 +-
 kernel/kgdb.c | 47 +++--
 6 files changed, 37 insertions(+), 37 deletions(-)

Index: b/arch/x86/kernel/traps_32.c
===
--- a/arch/x86/kernel/traps_32.c
+++ b/arch/x86/kernel/traps_32.c
@@ -1137,12 +1137,13 @@ asmlinkage void math_emulate(long arg)
 
 #endif /* CONFIG_MATH_EMULATION */
 
-/* Some traps need to be set early. */
+/* Set of traps needed for early debugging. */
 void __init early_trap_init(void)
 {
 	set_intr_gate(1, &debug);
 	set_system_intr_gate(3, &int3); /* int3 can be called from all */
 	set_intr_gate(14, &page_fault);
+	load_idt(&idt_descr);
 }
 
 void __init trap_init(void)
Index: b/arch/x86/kernel/traps_64.c
===
--- a/arch/x86/kernel/traps_64.c
+++ b/arch/x86/kernel/traps_64.c
@@ -1129,6 +1129,15 @@ asmlinkage void math_state_restore(void)
 }
 EXPORT_SYMBOL_GPL(math_state_restore);
 
+/* Set of traps needed for early debugging. */
+void __init early_trap_init(void)
+{
+	set_intr_gate(1, &debug);
+	set_intr_gate(3, &int3);
+	set_intr_gate(14, &page_fault);
+	load_idt((const struct desc_ptr *)&idt_descr);
+}
+
 void __init trap_init(void)
 {
 	set_intr_gate(0,&divide_error);
@@ -1145,7 +1154,6 @@ void __init trap_init(void)
 	set_intr_gate(11,&segment_not_present);
 	set_intr_gate_ist(12,&stack_segment,STACKFAULT_STACK);
 	set_intr_gate(13,&general_protection);
-	set_intr_gate(14,&page_fault);
 	set_intr_gate(15,&spurious_interrupt_bug);
 	set_intr_gate(16,&coprocessor_error);
 	set_intr_gate(17,&alignment_check);
Index: b/include/linux/kgdb.h
===
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -43,7 +43,8 @@ extern struct task_struct *kgdb_contthre
 
 enum kgdb_initstate {
 	KGDB_UNINITIALIZED = 0,
-	KGDB_SEMI_INITIALIZED,
+	KGDB_ARCH_INITIALIZED,
+	KGDB_DELAYED_CONNECTION,
 	KGDB_FULLY_INITIALIZED
 };
 
@@ -290,10 +291,6 @@ int kgdb_nmihook(int cpu, void *regs);
 extern int debugger_step;
 extern atomic_t debugger_active;
 
-#ifndef EXCEPTION_STACK_READY
-# define EXCEPTION_STACK_READY() 1
-#endif
-
 #else /* !CONFIG_KGDB */
 static const atomic_t debugger_active = ATOMIC_INIT(0);
 #endif /* !CONFIG_KGDB */
Index: b/kernel/kgdb.c
===
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -2104,6 +2104,12 @@ void kgdb_unregister_io_module(struct kg
 }
 EXPORT_SYMBOL_GPL(kgdb_unregister_io_module);
 
+static void __init kgdb_initial_breakpoint(void)
+{
+	printk(KERN_CRIT "kgdb: Waiting for connection from remote gdb...\n");
+	breakpoint();
+}
+
 /*
  * This function can be called very early, either via early_param() or
  * an explicit breakpoint() early on.
@@ -2112,25 +2118,15 @@ static void __init kgdb_early_entry(void
 {
 	/* Let the architecture do any setup that it needs to. */
 	kgdb_arch_init();
-
-	/*
-	 * Don't try and do anything until the architecture is able to
-	 * setup the exception stack. In this case, it is up to the
-	 * architecture to hook in and look at us when they are ready.
-	 */
-	if (!EXCEPTION_STACK_READY()) {
-		kgdb_state = KGDB_SEMI_INITIALIZED;
-		/* any ki
Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init
George Anzinger wrote: > On 01/31/2008 01:36 AM, Jan Kiszka was caught saying: >> Jan Kiszka wrote: >>> George Anzinger wrote: On 01/30/2008 04:08 PM, Jan Kiszka was caught saying: > [Here comes a rebased version against latest x86/mm] > > In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up > and connect to the front-end already during early_param evaluation. > This > fails on x86 as the exception stack is not yet initialized, > effectively > delaying kgdbwait until late-init. I wonder how much work it would take to just set up the exception stack and proceed. After all the kgbdwait is there to help debug very early kernel code... >>> >>> In principle a valid question, but I'm not the one to answer it. I >>> would not feel very well if I had to reorder this critical setup code. >>> Look, we would have to move trap_init in start_kernel before >>> parse_early_param, and that would affect _every_ arch... > > I can not speak to other archs, but for x86 I called trap_init from the > code that caught the kgdbwait. At that time (since I retired, I have > not looked at the actual kernel code) it could be called again later by > the kernel code. I.e. I did not try to reorder the kernel bring up > code, but just added an additional call to trap_init and then only in > the case of finding a kgdbwait. > > As such, this would need to be arch specific... > >>> >> >> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in >> parse_early_param as well? It should, because my under standing of >> trap_init is that it's the functions to arm things like... exception >> handlers? And that raises the question of the deeper purpose of this >> check (and the invocation of kgdb_early_init from the argument parsing >> function). Sigh, KGDB is still a quite improvable piece of code. > > Likely. Once you get it in the main line kernel, one would hope that > other arch code would be forth coming as many more "eyes" will be in play. 
Meanwhile I realized that there is early_trap_init - for x86-32 only! I assume now we are only lacking the same for x86-64 to get kgdb running there already during early_param-parsing.

Jan
Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init
On 01/31/2008 01:36 AM, Jan Kiszka was caught saying: > Jan Kiszka wrote: >> George Anzinger wrote: >>> On 01/30/2008 04:08 PM, Jan Kiszka was caught saying: [Here comes a rebased version against latest x86/mm] In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up and connect to the front-end already during early_param evaluation. This fails on x86 as the exception stack is not yet initialized, effectively delaying kgdbwait until late-init. >>> >>> I wonder how much work it would take to just set up the exception >>> stack and proceed. After all the kgbdwait is there to help debug >>> very early kernel code... >> >> In principle a valid question, but I'm not the one to answer it. I >> would not feel very well if I had to reorder this critical setup code. >> Look, we would have to move trap_init in start_kernel before >> parse_early_param, and that would affect _every_ arch... I can not speak to other archs, but for x86 I called trap_init from the code that caught the kgdbwait. At that time (since I retired, I have not looked at the actual kernel code) it could be called again later by the kernel code. I.e. I did not try to reorder the kernel bring up code, but just added an additional call to trap_init and then only in the case of finding a kgdbwait. As such, this would need to be arch specific... >> > > BTW, do you know if EXCEPTION_STACK_READY fails for other archs in > parse_early_param as well? It should, because my under standing of > trap_init is that it's the functions to arm things like... exception > handlers? And that raises the question of the deeper purpose of this > check (and the invocation of kgdb_early_init from the argument parsing > function). Sigh, KGDB is still a quite improvable piece of code. Likely. Once you get it in the main line kernel, one would hope that other arch code would be forth coming as many more "eyes" will be in play. > > Jan > > PS: Can we move this to some public list? 
Sure, sorry I picked the wrong reply button, never intended it to be private.

--
George Anzinger [EMAIL PROTECTED]
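The approach George describes (an extra, early call to the trap setup only when kgdbwait is seen, with the regular call still happening later) can be sketched in plain userspace C. This is a toy model under stated assumptions: every name below is a hypothetical stand-in, nothing here is the kernel's real code, and the stand-in init routine is simply assumed to be harmless when called twice.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are the kernel's real symbols. */
static int traps_armed;		/* 1 once exception handlers are installed */
static int trap_init_calls;	/* how many times the init routine ran */

/* Stand-in for the arch trap setup; calling it twice is harmless here. */
static void demo_trap_init(void)
{
	trap_init_calls++;
	traps_armed = 1;
}

/*
 * Stand-in for early parameter parsing: arm the traps early only when
 * "kgdbwait" appears on the command line (plain substring match, which
 * is good enough for a sketch). Returns 1 if the debugger could be
 * entered right away, 0 otherwise.
 */
static int demo_parse_early_param(const char *cmdline)
{
	assert(cmdline != NULL);
	if (strstr(cmdline, "kgdbwait") == NULL)
		return 0;
	demo_trap_init();	/* the additional, early call */
	return traps_armed;
}

/* Stand-in for the normal boot path, which still runs the init later. */
static void demo_start_kernel(const char *cmdline)
{
	demo_parse_early_param(cmdline);
	demo_trap_init();	/* the regular call, possibly the second one */
}
```

The boot order itself is left alone, so the cost is confined to the kgdbwait case, at the price of requiring the setup routine to tolerate a repeated call on each architecture that adopts it.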
[PATCH][KGDB] Re: [Kgdb-bugreport] KGDB: 8250_kgdb warnings
Jan Kiszka wrote: > Hi Jason, > > so far I ignored this because it worked, but I know my customer will > complain later anyway: What is the deeper meaning of this warning which > shows up once per registered UART port on my (x86) boxes? > > void kgdb8250_add_port(int i, struct uart_port *serial_req) > { > #ifdef CONFIG_KGDB_SIMPLE_SERIAL > if (should_copy_rs_table) > printk(KERN_ERR "8250_kgdb: warning will over write serial" > " port definitions at kgdb init time\n"); > #endif > ... > > When I look at kgdb8250_add_platform_port, it starts with a call to > kgdb8250_copy_rs_table, and I'm wondering now if that wouldn't be more > appropriate here. > > Jan > This was the result of a race condition between the init code of the platform vs the init code in kgdb. The init code in the arch platform could register serial ports prior to the kgdb module being configured by the kernel while the kernel is processing all the __init functions. It would have been easy to fix this with another call to kgdb8250_copy_rs_table() but you cannot do that because a non-__init function cannot call an __init function. We might as well go ahead and fix the problem by adding in some checks so as not to overwrite the dynamic registrations, because eventually the SERIAL_PORT_DFNS will be gone. Below is the patch with the fix to add some saftey checks as well as to remove the warning. Cut here--- Fix the initialization of the kgdb port structure such that dynamically registered ports will not be later overwritten by the SERIAL_PORT_DFNS table. With this problem fixed, the printk about the overwriting of the kgdb serial definitions at init time can be removed. Also add in additional runtime safety checks to make sure UART_NR was statically allocated by the kernel at compile time to be large enough for all the dynamic registered ports. 

Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>

---
 drivers/serial/8250_kgdb.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

--- a/drivers/serial/8250_kgdb.c
+++ b/drivers/serial/8250_kgdb.c
@@ -53,7 +53,7 @@ static int kgdb8250_buf_out_inx;
 
 /* Old-style serial definitions, if existant, and a counter. */
 #ifdef CONFIG_KGDB_SIMPLE_SERIAL
-static int should_copy_rs_table = 1;
+static int __initdata should_copy_rs_table = 1;
 static struct serial_state old_rs_table[] __initdata = {
 #ifdef SERIAL_PORT_DFNS
 	SERIAL_PORT_DFNS
@@ -260,7 +260,10 @@ static void __init kgdb8250_copy_rs_tabl
 	if (!should_copy_rs_table)
 		return;
 
-	for (i = 0; i < ARRAY_SIZE(old_rs_table); i++) {
+	for (i = 0; i < ARRAY_SIZE(old_rs_table) && i < UART_NR; i++) {
+		if (kgdb8250_ports[i].iobase || kgdb8250_ports[i].irq ||
+		    kgdb8250_ports[i].membase)
+			continue;
 		kgdb8250_ports[i].iobase = old_rs_table[i].port;
 		kgdb8250_ports[i].irq = irq_canonicalize(old_rs_table[i].irq);
 		kgdb8250_ports[i].uartclk = old_rs_table[i].baud_base * 16;
@@ -281,7 +284,7 @@ static void __init kgdb8250_copy_rs_tabl
  */
 static void __init kgdb8250_late_init(void)
 {
-	/* Try and copy the old_rs_table. */
+	/* Setup the KGDB uart table if not already initialized */
 	kgdb8250_copy_rs_table();
 
 #if defined(CONFIG_SERIAL_8250) || defined(CONFIG_SERIAL_8250_MODULE)
@@ -303,7 +306,7 @@ static void __init kgdb8250_late_init(vo
 
 static __init int kgdb_init_io(void)
 {
-	/* Give us the basic table of uarts. */
+	/* Setup the KGDB uart table if not already initialized */
 	kgdb8250_copy_rs_table();
 
 	/* We're either a module and parse a config string, or we have a
@@ -401,11 +404,11 @@ struct kgdb_io kgdb_io_ops = {
  */
 void kgdb8250_add_port(int i, struct uart_port *serial_req)
 {
-#ifdef CONFIG_KGDB_SIMPLE_SERIAL
-	if (should_copy_rs_table)
-		printk(KERN_ERR "8250_kgdb: warning will over write serial"
-			" port definitions at kgdb init time\n");
-#endif
+	if (i >= UART_NR) {
+		printk(KERN_ERR "KGDB dynamic uart registration failed"
+			"NR_UARTS is too small");
+		return;
+	}
 
 	/* Copy the whole thing over. */
 	if (current_port != &kgdb8250_ports[i])
@@ -427,6 +430,12 @@ void __init kgdb8250_add_platform_port(i
 	/* Make sure we've got the built-in data before we override. */
 	kgdb8250_copy_rs_table();
 
+	if (i >= UART_NR) {
+		printk(KERN_ERR "KGDB dynamic uart registration failed"
+			"NR_UARTS is too small");
+		return;
+	}
+
 	kgdb8250_ports[i].iobase = p->iobase;
 	kgdb8250_ports[i].membase = p->membase;
 	kgdb8250_ports[i].irq = p->irq;
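
The idea behind the patch above (bound every index by the array size, and let the static table fill only slots that a dynamic registration has not already claimed) can be sketched in user space. This is a minimal illustration and not the driver code: struct port, struct legacy_entry, copy_legacy_table() and add_port() are hypothetical names, and UART_NR is set to an arbitrary 4 here.

```c
#include <stddef.h>

#define UART_NR 4	/* assumed array capacity, chosen only for this sketch */

/* Hypothetical stand-in for one entry of the kgdb port array. */
struct port {
	unsigned long iobase;
	void *membase;
	int irq;
};

/* Hypothetical stand-in for one entry of the static legacy table. */
struct legacy_entry {
	unsigned long port;
	int irq;
};

/*
 * Copy legacy entries into the port array, skipping any slot that already
 * holds a dynamic registration. Returns the number of slots filled.
 */
static size_t copy_legacy_table(struct port *ports,
				const struct legacy_entry *legacy,
				size_t n_legacy)
{
	size_t copied = 0;

	for (size_t i = 0; i < n_legacy && i < UART_NR; i++) {
		if (ports[i].iobase || ports[i].irq || ports[i].membase)
			continue;	/* slot was registered dynamically */
		ports[i].iobase = legacy[i].port;
		ports[i].irq = legacy[i].irq;
		copied++;
	}
	return copied;
}

/* Dynamic registration: refuse an index that does not fit the array. */
static int add_port(struct port *ports, size_t i, const struct port *req)
{
	if (i >= UART_NR)
		return -1;
	ports[i] = *req;
	return 0;
}
```

With this ordering it no longer matters whether the platform registers a port before or after the table copy runs: the copy leaves occupied slots alone, and a later add_port() simply replaces a legacy entry.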
Re: bugreport kernel panic on early stage, with HIGHMEM4G:
The patch fixes the problem. Here is the dmesg (I cut it; the remaining
part is probably not required).

0] Movable zone: 0 pages used for memmap
[0.00] DMI 2.4 present.
[0.00] ACPI: RSDP 000FE020, 0014 (r0 INTEL )
[0.00] ACPI: RSDT CFEFD038, 0050 (r1 INTEL DG965SS 6B7 113)
[0.00] ACPI: FACP CFEFC000, 0074 (r1 INTEL DG965SS 6B7 MSFT 113)
[0.00] ACPI: DSDT CFEF7000, 41AA (r1 INTEL DG965SS 6B7 MSFT 113)
[0.00] ACPI: FACS CFE9C000, 0040
[0.00] ACPI: APIC CFEF6000, 0078 (r1 INTEL DG965SS 6B7 MSFT 113)
[0.00] ACPI: WDDT CFEF5000, 0040 (r1 INTEL DG965SS 6B7 MSFT 113)
[0.00] ACPI: MCFG CFEF4000, 003C (r1 INTEL DG965SS 6B7 MSFT 113)
[0.00] ACPI: ASF! CFEF3000, 00A6 (r32 INTEL DG965SS 6B7 MSFT 113)
[0.00] ACPI: HPET CFEF2000, 0038 (r1 INTEL DG965SS 6B7 MSFT 113)
[0.00] ACPI: SSDT CFE9A000, 020C (r1 INTEL CpuPm 6B7 MSFT 113)
[0.00] ACPI: SSDT CFE99000, 0175 (r1 INTEL Cpu0Ist 6B7 MSFT 113)
[0.00] ACPI: SSDT CFE98000, 0175 (r1 INTEL Cpu1Ist 6B7 MSFT 113)
[0.00] ACPI: SSDT CFE97000, 0175 (r1 INTEL Cpu2Ist 6B7 MSFT 113)
[0.00] ACPI: SSDT CFE96000, 0175 (r1 INTEL Cpu3Ist 6B7 MSFT 113)
[0.00] ACPI: PM-Timer IO Port: 0x408
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] Processor #0 6:15 APIC version 20
[0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[0.00] Processor #1 6:15 APIC version 20
[0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
[0.00] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
[0.00] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
[0.00] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
[0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.00] ACPI: IRQ0 used by override.
[0.00] ACPI: IRQ2 used by override.
[0.00] ACPI: IRQ9 used by override.
[0.00] Enabling APIC mode: Flat. Using 1 I/O APICs
[0.00] ACPI: HPET id: 0x8086a201 base: 0xfed0
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Allocating PCI resources starting at d400 (gap: d000:2ff0)
[0.00] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1040384
[0.00] Kernel command line: root=/dev/sdb2
[0.00] mapped APIC to b000 (fee0)
[0.00] mapped IOAPIC to a000 (fec0)
[0.00] Enabling fast FPU save and restore... done.
[0.00] Enabling unmasked SIMD FPU exception support... done.
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 16384 bytes)
[0.00] Detected 2397.647 MHz processor.
[ 23.770446] Console: colour VGA+ 80x25
[ 23.770448] console [tty0] enabled
[ 23.779138] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 23.779395] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 23.788992] set_highmem_pages_init(bad_ppro:0)
[ 23.789056] sizeof(struct page):32
[ 23.789118] sizeof(struct mem_section): 8
[ 23.789180] PFN_SECTION_SHIFT: 14
[ 23.789243] mem_map:
[ 23.789304] highstart_pfn:229376 [page: c1700700]
[ 23.789367] highend_pfn: 1048576 [page: 0200]
[ 23.789431] highend_pfn-1: 1048575 [page: 01e0]
[ 23.789494] NR_MEM_SECTIONS: 64
[ 23.789555] pfn_to_section_nr(highstart_pfn):14
[ 23.789619] pfn_to_section_nr(highend_pfn): 64
[ 23.789682] pfn_to_section_nr(highend_pfn-1):63
[ 23.789745] totalhigh_pages: 0
[ 23.789807] totalram_pages:221519
[ 23.924275] WARNING: at arch/x86/mm/init_32.c:353 set_highmem_pages_init()
[ 23.924344] Pid: 0, comm: swapper Not tainted 2.6.24-rc8-git1 #1
[ 23.924409] [] show_trace_log_lvl+0x1a/0x2f
[ 23.924503] [] show_trace+0x12/0x14
[ 23.924591] [] dump_stack+0x6c/0x72
[ 23.924678] [] mem_init+0x2a7/0x596
[ 23.924768] [] start_kernel+0x271/0x2fb
[ 23.924858] [<>] _stext+0x3feff000/0x19
[ 23.924945] ===
[ 23.925007] bad pfn: 851968
[ 23.925068] totalhigh_pages:622129
[ 23.925130] totalram_pages:221519
[ 23.925194] Memory: 3373812k/4194304k available (1990k kernel code, 30976k reserved, 813k data, 208k init, 2488516k highmem)
[ 23.925300] virtual kernel memory layout:
[ 23.925301] fixmap : 0xffe15000 - 0xf000 (1960 kB)
[ 23.925302] pkmap : 0xff80 - 0xffc0 (4096 kB)
Re: bugreport kernel panic on early stage, with HIGHMEM4G:
On (15/01/08 14:13), Ingo Molnar didst pronounce:
>
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > thanks for the detailed report, i think i know what's going on. Could
> > you try the patch below, does it fix your problem?
>
> find below the fix with a more complete changelog and with no debugging
> printouts.
>

Looks good and the right thing to do. If you check the equivalent code
for DISCONTIG, it calls pfn_valid() searching for holes, so this was
expected.

Acked-by: Mel Gorman <[EMAIL PROTECTED]>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
Re: bugreport kernel panic on early stage, with HIGHMEM4G:
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> thanks for the detailed report, i think i know what's going on. Could
> you try the patch below, does it fix your problem?

find below the fix with a more complete changelog and with no debugging
printouts.

	Ingo

-->
Subject: x86: fix boot crash on HIGHMEM4G && SPARSEMEM
From: Ingo Molnar <[EMAIL PROTECTED]>

Denys Fedoryshchenko reported a bootup crash when he upgraded his system
from 3GB to 4GB RAM:

   http://lkml.org/lkml/2008/1/7/9

the bug is due to HIGHMEM4G && SPARSEMEM kernels making pfn_to_page()
return an invalid pointer when the pfn is in a memory hole. The 256 MB
PCI aperture at the end of RAM was not mapped by sparsemem, and hence
the pfn was not valid. But set_highmem_pages_init() iterated this range
without checking the pfn's validity first - crashing the bootup.

this bug was probably present in the sparsemem code ever since sparsemem
has been introduced in v2.6.13. It was masked due to HIGHMEM64G using
larger memory regions in sparsemem_32.h:

 #ifdef CONFIG_X86_PAE
 #define SECTION_SIZE_BITS       30
 #define MAX_PHYSADDR_BITS       36
 #define MAX_PHYSMEM_BITS        36
 #else
 #define SECTION_SIZE_BITS       26
 #define MAX_PHYSADDR_BITS       32
 #define MAX_PHYSMEM_BITS        32
 #endif

which creates 1GB sparsemem regions instead of 64MB sparsemem regions.
So in practice we only ever created true sparsemem holes on x86 with
HIGHMEM4G - but that was rarely used by distros.

( btw., we could probably save 2MB of mem_map[]s on X86_PAE if we
  reduced the sparsemem region size to 256 MB. )

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/mm/init_32.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -321,8 +321,13 @@ extern void set_highmem_pages_init(int);
 static void __init set_highmem_pages_init(int bad_ppro)
 {
 	int pfn;
-	for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
-		add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+	for (pfn = highstart_pfn; pfn < highend_pfn; pfn++) {
+		/*
+		 * Holes under sparsemem might not have no mem_map[]:
+		 */
+		if (pfn_valid(pfn))
+			add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+	}
 	totalram_pages += totalhigh_pages;
 }
 #endif /* CONFIG_FLATMEM */
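
The section arithmetic behind this fix can be checked with a small user-space model. This is only a sketch, not the kernel's sparsemem code: it assumes 4 KB pages (PAGE_SHIFT of 12) together with the non-PAE values quoted above, and section_present[], pfn_valid_model() and count_valid_pages() are hypothetical names. Which sections are marked present is up to the caller.

```c
#include <stdbool.h>

#define PAGE_SHIFT        12	/* assumption: 4 KB pages */
#define SECTION_SIZE_BITS 26	/* non-PAE value from sparsemem_32.h above */
#define MAX_PHYSMEM_BITS  32
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS   (1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

/* Hypothetical model of "does this section have a mem_map[]?" */
static bool section_present[NR_MEM_SECTIONS];

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* Model of pfn_valid(): the pfn must fall into a section that is present. */
static bool pfn_valid_model(unsigned long pfn)
{
	unsigned long nr = pfn_to_section_nr(pfn);

	return nr < NR_MEM_SECTIONS && section_present[nr];
}

/*
 * Walk a pfn range the way the fixed loop does: pfns that sit in a hole
 * are skipped instead of being passed on to per-page initialization.
 */
static unsigned long count_valid_pages(unsigned long start_pfn,
				       unsigned long end_pfn)
{
	unsigned long n = 0;

	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++)
		if (pfn_valid_model(pfn))
			n++;
	return n;
}
```

With these constants PFN_SECTION_SHIFT is 14 and NR_MEM_SECTIONS is 64, which matches the values printed by the debug patch in the dmesg posted in this thread. The reported "bad pfn: 851968" falls into section 52 (851968 >> 14).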
Re: bugreport kernel panic on early stage, with HIGHMEM4G:
* Denys Fedoryshchenko <[EMAIL PROTECTED]> wrote:

> Hi
>
> After physical memory upgrade from 3GB to 4GB (also it happens on 5GB)
> got kernel panic.
>
> Because it is happening on early stage and my machine doesn't contain
> serial port, i had to take photo. Kernel boots fine with 64GB highmem,
> no highmem, or highmem4G with limited memory by mem=3G. All dmesg
> attached. Also i attach dmidecode and lspci -vvv output, probably it
> will be useful.

thanks for the detailed report, i think i know what's going on. Could
you try the patch below, does it fix your problem?

this seems to be a SPARSEMEM bug which is present in v2.6.23 as well and
has probably been present ever since SPARSEMEM was added to 32-bit x86.

There's a ~256MB hole in your e820 memory map (the pci aperture), which
causes the last 4 sparsemem sections (each covering 64MB of RAM) to be
not present - and they are thus missing from the sparsemem mem_map[]
too. The highmem init code on the other hand assumes that all pages are
in the mem_map[]:

  static void __init set_highmem_pages_init(int bad_ppro)
  {
          int pfn;
          for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
                  add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);

the pfn_to_page() is unconditional and dereferences to a NULL-ish
pointer which crashes your box. highend_pfn is what got miscalculated by
256 MB, so set_highmem_pages_init() tried to reference a non-existing
struct page - but it should still be robust enough against non-existent
pages.

The patch below fixes this bug. Please also send a dmesg if you manage
to boot the box up fine, i've added a few debug printouts to confirm
this theory.

(i'll figure out whether we need to clip highend_pfn as well - but this
patch alone should be good enough to fix the crash on your box.)

	Ingo

->
Subject: x86: fix CONFIG_SPARSEMEM highmem init bug
From: Ingo Molnar <[EMAIL PROTECTED]>

fix CONFIG_SPARSEMEM highmem init bug.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/mm/init_32.c | 43 ---
 mm/sparse.c |8 +++-
 2 files changed, 47 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -321,11 +321,48 @@ extern void set_highmem_pages_init(int);
 static void __init set_highmem_pages_init(int bad_ppro)
 {
 	int pfn;
-	for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
-		add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+
+	printk("set_highmem_pages_init(bad_ppro:%d)\n", bad_ppro);
+	printk("sizeof(struct page):%d\n", sizeof(struct page));
+	printk("sizeof(struct mem_section): %d\n", sizeof(struct mem_section));
+	printk("PFN_SECTION_SHIFT: %d\n", PFN_SECTION_SHIFT);
+
+	printk("mem_map: %p\n", mem_map);
+	printk(" highstart_pfn: %9ld [page: %p]\n",
+		highstart_pfn, pfn_to_page(highstart_pfn));
+	printk("highend_pfn: %9ld [page: %p]\n",
+		highend_pfn, pfn_to_page(highend_pfn));
+	printk(" highend_pfn-1: %9ld [page: %p]\n",
+		highend_pfn-1, pfn_to_page(highend_pfn-1));
+
+	printk("NR_MEM_SECTIONS: %ld\n", NR_MEM_SECTIONS);
+	printk("pfn_to_section_nr(highstart_pfn): %9ld\n",
+		pfn_to_section_nr(highstart_pfn));
+	printk("pfn_to_section_nr(highend_pfn): %9ld\n",
+		pfn_to_section_nr(highend_pfn));
+	printk("pfn_to_section_nr(highend_pfn-1): %9ld\n",
+		pfn_to_section_nr(highend_pfn-1));
+
+	printk("totalhigh_pages: %9ld\n", totalhigh_pages);
+	printk(" totalram_pages: %9ld\n", totalram_pages);
+
+	for (pfn = highstart_pfn; pfn < highend_pfn; pfn++) {
+		if (pfn_valid(pfn))
+			add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+		else {
+			if (WARN_ON_ONCE(1)) {
+				printk("bad pfn: %d\n", pfn);
+				break;
+			}
+		}
+	}
+	printk("totalhigh_pages: %9ld\n", totalhigh_pages);
+	printk(" totalram_pages: %9ld\n", totalram_pages);
+
+
 	totalram_pages += totalhigh_pages;
 }
-#endif /* CONFIG_FLATMEM */
+#endif /* CONFIG_NUMA */
 
 #else
 #define kmap_init() do { } while (0)
Index: linux/mm/sparse.c
===
--- linux.orig/mm/sparse.c
+++ linux/mm/sparse.c
@@ -295,9 +295,13 @@ void __init sparse_init(void)
 	struct page *map;
 	unsigned long *usemap;
 
+	printk("sparse_init()\n");
 	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
-		if (!present_section_nr(pnum))
+		printk("section %2ld: ", pnum);
+		if (!present_section_nr(pnu
bugreport kernel panic on early stage, with HIGHMEM4G:
Hi

After a physical memory upgrade from 3GB to 4GB (it also happens with
5GB) I get a kernel panic.

Because it happens at an early stage and my machine has no serial port,
I had to take a photo. The kernel boots fine with highmem64G, with no
highmem, or with highmem4G when memory is limited by mem=3G. All dmesg
output is attached. I also attach dmidecode and lspci -vvv output, as it
will probably be useful.

Photo (2.8MB, sorry, just the original size from the camera):
http://www.nuclearcat.com/files/panic-07012008/img_1232.jpg

dmesg without highmem:
http://www.nuclearcat.com/files/panic-07012008/dmesg-nohighmem.txt

with highmem64G:
http://www.nuclearcat.com/files/panic-07012008/dmesg-highmem64G.txt

with highmem4G limited by mem=3G:
http://www.nuclearcat.com/files/panic-07012008/dmesg-highmem4G-memlim3G.txt

Kernel config for this specific boot:
http://www.nuclearcat.com/files/panic-07012008/config.txt

dmidecode output:
http://www.nuclearcat.com/files/panic-07012008/dmidecode.txt

lspci output:
http://www.nuclearcat.com/files/panic-07012008/lspci.txt

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Re: [Kgdb-bugreport] [PATCH -mm v2] 2.6.23-rc4-mm1: kgdboe link errors
Randy,

This patch is fine, and I am committing it to the for_mm kgdb tree. I am
also adding the "depends on NET" to the KGDBOE_NOMODULE section, which
would otherwise do a select on KGDBOE. We have to cover both cases: KGDB
built as a module and not as a module.

Thanks,
Jason.

Randy Dunlap wrote:
> On Wed, 12 Sep 2007 13:15:12 -0500 Matt Mackall wrote:
>
>> NETCONSOLE shouldn't be necessary. Otherwise this looks ok to my
>> kconfig-addled brain.
>
> Correct. Patch corrected. Thanks.
>
> ---
> From: Randy Dunlap <[EMAIL PROTECTED]>
>
> Fix kgdb build problems:
>
>   Building modules, stage 2.
> ERROR: "netpoll_cleanup" [drivers/net/kgdboe.ko] undefined!
> ERROR: "netpoll_setup" [drivers/net/kgdboe.ko] undefined!
> ERROR: "netpoll_parse_options" [drivers/net/kgdboe.ko] undefined!
> ERROR: "netpoll_poll" [drivers/net/kgdboe.ko] undefined!
> ERROR: "netpoll_send_udp" [drivers/net/kgdboe.ko] undefined!
> ERROR: "netpoll_set_trap" [drivers/net/kgdboe.ko] undefined!
> make[1]: *** [__modpost] Error 1
>
> Add 'select' for net-poll related config symbols, but make KGDBOE
> 'depend on' NET. We don't want to 'select' CONFIG_NET, but if it is
> already enabled, the 'select's will enable the rest of the needed
> interfaces.
>
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
> ---
>  lib/Kconfig.kgdb |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- linux-2.6.23-rc4-mm1.orig/lib/Kconfig.kgdb
> +++ linux-2.6.23-rc4-mm1/lib/Kconfig.kgdb
> @@ -174,9 +174,10 @@ endchoice
>
>  config KGDBOE
>  	tristate "KGDB: On ethernet" if !KGDBOE_NOMODULE
> -	depends on m && KGDB
> +	depends on m && KGDB && NET
>  	select NETPOLL
>  	select NETPOLL_TRAP
> +	select NET_POLL_CONTROLLER
>  	help
>  	  Uses the NETPOLL API to communicate with the host GDB via UDP.
>  	  In order for this to work, the ethernet interface specified must

___
Kgdb-bugreport mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
Pete/Piet Delaney wrote:
> We are getting a problem with VMware where kernel text in the scheduler
> is getting whacked with four null bytes written into the code. Thought
> I'd use the current linux-2.6-kgdb.git tree and possibly the
> CONFIG_DEBUG_RODATA patch to make kernel text readonly:
>
> https://www.x86-64.org/pipermail/patches/2007-March/003666.html
>
> I thought the kernel text was RO and gdb had to disable it to
> insert a breakpoint.

If you are going to make all the kernel text RO, then you are going to
have to add some code to the kgdb write-memory path so as to unprotect a
given page, or all the breakpoint writes are going to fail.

Alternatively you can use HW breakpoints. But I have no idea whether
your VMware-simulated hardware emulates HW breakpoint registers or not.

Jason.
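
As a rough user-space analogy for the "unprotect a given page" step, the sketch below lifts write protection on a single page with POSIX mprotect(), stores one byte, and restores the protection. It is not kgdb code and says nothing about how the kernel side would be implemented; alloc_readonly_page() and poke_readonly_byte() are hypothetical helpers, and the page is ordinary anonymous memory rather than kernel text.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one zero-filled page and make it read-only. */
static uint8_t *alloc_readonly_page(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *p;

	if (page_size <= 0)
		return NULL;
	p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	if (mprotect(p, (size_t)page_size, PROT_READ) != 0)
		return NULL;
	return p;
}

/*
 * Hypothetical helper: write one byte into a read-only page by making the
 * containing page writable for the duration of the store only.
 */
static int poke_readonly_byte(uint8_t *addr, uint8_t value)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uintptr_t page;

	if (page_size <= 0)
		return -1;
	page = (uintptr_t)addr & ~((uintptr_t)page_size - 1);

	if (mprotect((void *)page, (size_t)page_size,
		     PROT_READ | PROT_WRITE) != 0)
		return -1;
	*addr = value;		/* e.g. 0xCC, the x86 int3 opcode */
	return mprotect((void *)page, (size_t)page_size, PROT_READ);
}
```

A plain store to the page returned by alloc_readonly_page() would fault, which is the user-space counterpart of the failing breakpoint writes described above.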
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
Pete/Piet Delaney wrote:
> Why am I getting this when I do:
>
> git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git

I have only ever used:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git

Jason.
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
On Wed, 29 Aug 2007 18:19:29 -0700 Pete/Piet Delaney wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Pete/Piet Delaney wrote: > > Jason Wessel wrote: > >> Andrew Morton wrote: > >>> On Wed, 22 Aug 2007 17:44:12 -0500 > >>> Jason Wessel <[EMAIL PROTECTED]> wrote: > >>> > >>> > +while (!atomic_read(&debugger_active)); > > >>> eek. We're in the process of hunting down and eliminating exactly this > >>> construct. There have been cases where the compiler cached the > >>> atomic_read() result in a register, turning the above into an infinite > >>> loop. > >>> > >>> Plus we should never add power-burners like that into the kernel > >>> anyway. That loop should have a cpu_relax() in it. Which will also > >>> fix the > >>> compiler problem described above. > >>> > >>> > >> Agreed, and fixed with a cpu_relax. > > > >>> Thirdly, please always add a newline when coding statements like that: > >>> > >>> while (expr()) > >>> ; > >>> > >> The other instances I found of the same problem in the kgdb core are > >> fixed too. > > > >> I merged all the changes into the for_mm branch in the kgdb git tree. > > > > Where is the kgdb git tree? > > Why am I getting this when I do: > > git clone > http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git > > - > > error: Couldn't get > http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git/refs/tags/v2.6.11 > for tags/v2.6.11 > The requested URL returned error: 404 > error: Could not interpret tags/v2.6.11 as something to pull > rm: cannot remove directory > `/nethome/piet/Src/linux/git/jwessel/linux-2.6-kgdb/.git/clone-tmp': > Directory not empty > /nethome/piet/Src/linux/git/jwessel$ > - > See the URLs at the top of http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=summary and try one of those (the git one preferably). > We are getting a problem with VMware where kernel text is the schedler > is getting wacked with four null bytes into the code. 
Thought I'd use > the current linux-2.6-kgdb.git tree and possible the CONFIG_DEBUG_RODATA > patch to make kernel text readonly: > > https://www.x86-64.org/pipermail/patches/2007-March/003666.html > > I thought the kernel text was RO and gdb had to disable it to > insert a breakpoint. > > - -piet > > > > > -piet > > > >> Thanks, > >> Jason. > >> - > >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > >> the body of a message to [EMAIL PROTECTED] > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> Please read the FAQ at http://www.tux.org/lkml/ > > > > > - - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.7 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFG1hshJICwm/rv3hoRAhTGAJ46pq69zYHqRmT+yTmRx+RVh8aBtgCfdyFM > gl91xCFTy0NJxHalVXpd9Os= > =c8FZ > -END PGP SIGNATURE- > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Pete/Piet Delaney wrote: > Jason Wessel wrote: >> Andrew Morton wrote: >>> On Wed, 22 Aug 2007 17:44:12 -0500 >>> Jason Wessel <[EMAIL PROTECTED]> wrote: >>> >>> +while (!atomic_read(&debugger_active)); >>> eek. We're in the process of hunting down and eliminating exactly this >>> construct. There have been cases where the compiler cached the >>> atomic_read() result in a register, turning the above into an infinite >>> loop. >>> >>> Plus we should never add power-burners like that into the kernel >>> anyway. That loop should have a cpu_relax() in it. Which will also >>> fix the >>> compiler problem described above. >>> >>> >> Agreed, and fixed with a cpu_relax. > >>> Thirdly, please always add a newline when coding statements like that: >>> >>> while (expr()) >>> ; >>> >> The other instances I found of the same problem in the kgdb core are >> fixed too. > >> I merged all the changes into the for_mm branch in the kgdb git tree. > > Where is the kgdb git tree? Why am I getting this when I do: git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git - error: Couldn't get http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git/refs/tags/v2.6.11 for tags/v2.6.11 The requested URL returned error: 404 error: Could not interpret tags/v2.6.11 as something to pull rm: cannot remove directory `/nethome/piet/Src/linux/git/jwessel/linux-2.6-kgdb/.git/clone-tmp': Directory not empty /nethome/piet/Src/linux/git/jwessel$ - We are getting a problem with VMware where kernel text is the schedler is getting wacked with four null bytes into the code. Thought I'd use the current linux-2.6-kgdb.git tree and possible the CONFIG_DEBUG_RODATA patch to make kernel text readonly: https://www.x86-64.org/pipermail/patches/2007-March/003666.html I thought the kernel text was RO and gdb had to disable it to insert a breakpoint. - -piet > > -piet > >> Thanks, >> Jason. 
>> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to [EMAIL PROTECTED] >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > > - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG1hshJICwm/rv3hoRAhTGAJ46pq69zYHqRmT+yTmRx+RVh8aBtgCfdyFM gl91xCFTy0NJxHalVXpd9Os= =c8FZ -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Pete/Piet Delaney wrote: > Jason Wessel wrote: >> Andrew Morton wrote: >>> On Wed, 22 Aug 2007 17:44:12 -0500 >>> Jason Wessel <[EMAIL PROTECTED]> wrote: >>> >>> +while (!atomic_read(&debugger_active)); >>> eek. We're in the process of hunting down and eliminating exactly this >>> construct. There have been cases where the compiler cached the >>> atomic_read() result in a register, turning the above into an infinite >>> loop. >>> >>> Plus we should never add power-burners like that into the kernel >>> anyway. That loop should have a cpu_relax() in it. Which will also >>> fix the >>> compiler problem described above. >>> >>> >> Agreed, and fixed with a cpu_relax. > >>> Thirdly, please always add a newline when coding statements like that: >>> >>> while (expr()) >>> ; >>> >> The other instances I found of the same problem in the kgdb core are >> fixed too. > >> I merged all the changes into the for_mm branch in the kgdb git tree. > > Where is the kgdb git tree? Trying: git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git - -piet > > -piet > >> Thanks, >> Jason. 
>> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to [EMAIL PROTECTED] >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > > - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG1gnFJICwm/rv3hoRApOoAJ9BHXLsIuxDiOCaAFRfAZGwrDXATQCeLL3O bxtr3qz0soPRghPmtSZgOqc= =kQd1 -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Jason Wessel wrote: > Andrew Morton wrote: >> On Wed, 22 Aug 2007 17:44:12 -0500 >> Jason Wessel <[EMAIL PROTECTED]> wrote: >> >> >>> +while (!atomic_read(&debugger_active)); >>> >> >> eek. We're in the process of hunting down and eliminating exactly this >> construct. There have been cases where the compiler cached the >> atomic_read() result in a register, turning the above into an infinite >> loop. >> >> Plus we should never add power-burners like that into the kernel >> anyway. That loop should have a cpu_relax() in it. Which will also >> fix the >> compiler problem described above. >> >> > Agreed, and fixed with a cpu_relax. > >> Thirdly, please always add a newline when coding statements like that: >> >> while (expr()) >> ; >> > > The other instances I found of the same problem in the kgdb core are > fixed too. > > I merged all the changes into the for_mm branch in the kgdb git tree. Where is the kgdb git tree? - -piet > > Thanks, > Jason. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG1gS/JICwm/rv3hoRAhfRAJ42F3QlzGwG4aQbs9hHVMI4kJ9SWQCfXrku UGo97ByKsB9yhyIu5c+2Jh0= =welB -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
Andrew Morton wrote:
> On Wed, 22 Aug 2007 17:44:12 -0500
> Jason Wessel <[EMAIL PROTECTED]> wrote:
>
>> +	while (!atomic_read(&debugger_active));
>
> eek. We're in the process of hunting down and eliminating exactly this
> construct. There have been cases where the compiler cached the
> atomic_read() result in a register, turning the above into an infinite
> loop.
>
> Plus we should never add power-burners like that into the kernel
> anyway. That loop should have a cpu_relax() in it. Which will also
> fix the compiler problem described above.

Agreed, and fixed with a cpu_relax.

> Thirdly, please always add a newline when coding statements like that:
>
> 	while (expr())
> 		;

The other instances I found of the same problem in the kgdb core are
fixed too.

I merged all the changes into the for_mm branch in the kgdb git tree.

Thanks,
Jason.
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
On Wed, 22 Aug 2007 17:44:12 -0500 Jason Wessel <[EMAIL PROTECTED]> wrote:

> Perhaps there is a cleaner way to do the same thing and avoid the
> cmpxchg all together. I used the attached patch to eliminate the
> cmpxchg operation.
>
>
> Jason.
>
>
> [kgdb_enter_atomic.patch text/plain (2.0KB)]
> Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>
>
> ---
> kernel/kgdb.c | 18 --
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> --- a/kernel/kgdb.c
> +++ b/kernel/kgdb.c
> @@ -121,6 +121,7 @@ struct task_struct *kgdb_usethread, *kgd
>
> int debugger_step;
> atomic_t debugger_active;
> +static atomic_t kgdb_sync = ATOMIC_INIT(-1);
>
> /* Our I/O buffers. */
> static char remcom_in_buffer[BUFMAX];
> @@ -638,8 +639,14 @@ static void kgdb_wait(struct pt_regs *re
> kgdb_info[processor].task = current;
> atomic_set(&procindebug[processor], 1);
>
> + /* The master processor must be active to enter here, but this is
> + * gaurd in case the master processor had not been selected if
> + * this was an entry via nmi.
> + */
> + while (!atomic_read(&debugger_active));

eek. We're in the process of hunting down and eliminating exactly this
construct. There have been cases where the compiler cached the
atomic_read() result in a register, turning the above into an infinite
loop.

Plus we should never add power-burners like that into the kernel anyway.
That loop should have a cpu_relax() in it. Which will also fix the
compiler problem described above.

Thirdly, please always add a newline when coding statements like that:

	while (expr())
		;
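
The shape of loop being asked for in this review can be sketched in portable C11. This is a user-space approximation under stated assumptions, not kernel code: cpu_relax_hint() is a hypothetical stand-in for the kernel's cpu_relax(), written with GCC/Clang inline assembly, and wait_until_nonzero() is a made-up name.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for cpu_relax(): a pause hint on x86, and at
 * minimum a compiler barrier elsewhere (GCC/Clang syntax assumed).
 */
static inline void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/*
 * Spin until *flag becomes non-zero. The atomic load is repeated on every
 * iteration, so the value cannot be cached in a register, and the loop
 * body sits on its own lines. Returns how many times the loop spun.
 */
static unsigned long wait_until_nonzero(atomic_int *flag)
{
	unsigned long spins = 0;

	while (!atomic_load_explicit(flag, memory_order_acquire)) {
		cpu_relax_hint();
		spins++;
	}
	return spins;
}
```

If the flag is already set when the function is entered, the loop body never runs; otherwise another thread has to set the flag for the call to return.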
Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc
Andrew Morton wrote:
> On Wed, 22 Aug 2007 21:04:28 +0200 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> Got that on imac g3.
>>
>> CC kernel/kgdb.o
>> kernel/kgdb.c: In function 'kgdb_handle_exception':
>> kernel/kgdb.c:940: error: invalid lvalue in unary '&'
>> kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of '_o_'
>> kernel/kgdb.c:940: error: invalid lvalue in unary '&'
>> kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of '_n_'
>> kernel/kgdb.c:940: error: invalid lvalue in unary '&'
>> kernel/kgdb.c:940: error: invalid lvalue in unary '&'
>> kernel/kgdb.c:940: error: invalid lvalue in unary '&'
>> kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of 'type name'
>> make[1]: *** [kernel/kgdb.o] Blad 1
>> make: *** [kernel] Blad 2

Against the tip of the kernel + kgdb patches this config builds. I wonder whether the compiler, or the macros for atomic_read or cmpxchg, have changed in the -mm tree. Perhaps it is not relevant though if you read on.

> I'm not surprised.
>
>	while (cmpxchg(&atomic_read(&debugger_active), 0, (procid + 1)) != 0) {
>
> a) cmpxchg isn't available on all architectures

It was available for all the archs that kgdb had been implemented on at the time.

> b) we can't just go and take the address of atomic_read()'s return value!

Perhaps yes, perhaps no; I guess it depends on what actually gets generated... In the past the intent of this was to guard the race to become the master processor, and it looked like an attempt to do that atomically. This code had been in use for a number of years at this point.

> c) that's pretty ugly-looking stuff anyway.

Perhaps there is a cleaner way to do the same thing and avoid the cmpxchg altogether. I used the attached patch to eliminate the cmpxchg operation.

Jason.
Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>

---
 kernel/kgdb.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -121,6 +121,7 @@ struct task_struct *kgdb_usethread, *kgd
 
 int debugger_step;
 atomic_t debugger_active;
+static atomic_t kgdb_sync = ATOMIC_INIT(-1);
 
 /* Our I/O buffers. */
 static char remcom_in_buffer[BUFMAX];
@@ -638,8 +639,14 @@ static void kgdb_wait(struct pt_regs *re
 	kgdb_info[processor].task = current;
 	atomic_set(&procindebug[processor], 1);
 
+	/* The master processor must be active to enter here, but this is
+	 * gaurd in case the master processor had not been selected if
+	 * this was an entry via nmi.
+	 */
+	while (!atomic_read(&debugger_active));
+
 	/* Wait till master processor goes completely into the debugger.
-	 * FIXME: this looks racy */
+	 */
 	while (!atomic_read(&procindebug[atomic_read(&debugger_active) - 1])) {
 		int i = 10;	/* an arbitrary number */
 
@@ -973,8 +980,13 @@ int kgdb_handle_exception(int ex_vector,
 
 	/* Hold debugger_active */
 	procid = raw_smp_processor_id();
-	while (cmpxchg(&atomic_read(&debugger_active), 0, (procid + 1)) != 0) {
+	while (1) {
 		int i = 25;	/* an arbitrary number */
+		if (atomic_read(&kgdb_sync) < 0 &&
+		    atomic_inc_and_test(&kgdb_sync)) {
+			atomic_set(&debugger_active, procid + 1);
+			break;
+		}
 
 		while (--i)
 			cpu_relax();
@@ -991,6 +1003,7 @@ int kgdb_handle_exception(int ex_vector,
 	if (atomic_read(&cpu_doing_single_step) != -1 &&
 	    atomic_read(&cpu_doing_single_step) != procid) {
 		atomic_set(&debugger_active, 0);
+		atomic_set(&kgdb_sync, -1);
 		clocksource_touch_watchdog();
 		kgdb_softlock_skip[procid] = 1;
 		local_irq_restore(flags);
@@ -1557,6 +1570,7 @@ int kgdb_handle_exception(int ex_vector,
 kgdb_restore:
 	/* Free debugger_active */
 	atomic_set(&debugger_active, 0);
+	atomic_set(&kgdb_sync, -1);
 	clocksource_touch_watchdog();
 	kgdb_softlock_skip[processor] = 1;
 	local_irq_restore(flags);
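[Editor's sketch] The idea of the patch, a counter that starts at -1 and is claimed by whoever increments it to exactly 0, can be tried in userspace with C11 atomics. This is only an analog under stated assumptions, not the kgdb implementation: `sync_gate`, `active_owner`, `try_claim()` and `release_claim()` are hypothetical names, and `atomic_fetch_add(...) == -1` plays the role that atomic_inc_and_test() plays in the patch (true when the incremented value is zero).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means unclaimed, mirroring ATOMIC_INIT(-1) in the patch. */
static atomic_int sync_gate = -1;
/* 0 means nobody owns the role, otherwise owner id + 1. */
static atomic_int active_owner = 0;

/* Try to become the owner. Only the caller whose increment moves the
 * gate from -1 to 0 wins; later callers see a non-negative gate. */
static bool try_claim(int cpu_id)
{
    if (atomic_load(&sync_gate) < 0 &&
        atomic_fetch_add(&sync_gate, 1) == -1) {
        atomic_store(&active_owner, cpu_id + 1);
        return true;
    }
    return false;
}

/* Give the role back: clear the owner, then reopen the gate. */
static void release_claim(void)
{
    atomic_store(&active_owner, 0);
    atomic_store(&sync_gate, -1);
}
```

A second `try_claim()` fails until `release_claim()` resets the gate to -1, which corresponds to the two places where the patch adds atomic_set(&kgdb_sync, -1).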
Bugreport: SATA Problem: port is slow to respond
Hello there!

Here is my first bug report on the Linux kernel:

[1.] One line summary of the problem:

SATA Problem: port is slow to respond

[2.] Full description of the problem/report:

The following messages appear while booting/in dmesg [1]:

[...]
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: softreset failed (device not ready)
ata2: softreset failed, retrying in 5 secs
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: hardreset failed, retrying in 5 secs
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: reset failed, giving up
scsi2 : ahci
[...]

This of course means an extra long boot time :(

[1] http://wieland.homelinux.org/hp_dc5750/dmesg_2.6.19.3.txt

A further issue, which imho is coupled with this one, is that my CD/DVD+RW combo drive (also SATA) simply does not work. Firstly, there is no indication in dmesg that it was found. Secondly, it behaves as if dead: it does not open if I press the button and it does not blink; it can be opened until these messages appear on the screen. I need to unplug and then replug it after shutting down/rebooting so that on the next bootup the BIOS is able to find it (otherwise it still behaves as if dead)! The drive itself is OK, I guess (at least it works fine on a different OS).

I found a Debian bug report where the problem (but unfortunately not a solution that worked for me) is described:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391867

[3.] Keywords (i.e., modules, networking, kernel):

SATA, ATA

[4.] Kernel version (from /proc/version):

Linux version 2.6.19.3 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #2 SMP Thu Feb 15 02:59:14 CET 2007

[8.1.] Software (add the output of the ver_linux script here)
http://wieland.homelinux.org/hp_dc5750/ver_linux.txt

[8.2.] Processor information (from /proc/cpuinfo):
http://wieland.homelinux.org/hp_dc5750/proc_cpuinfo.txt

[8.3.] Module information (from /proc/modules):
http://wieland.homelinux.org/hp_dc5750/proc_modules.txt

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
http://wieland.homelinux.org/hp_dc5750/proc_iomem.txt
http://wieland.homelinux.org/hp_dc5750/proc_ioports.txt

[8.5.] PCI information ('lspci -vvv' as root)
http://wieland.homelinux.org/hp_dc5750/lspci-vvv.txt

[8.6.] SCSI information (from /proc/scsi/scsi)
http://wieland.homelinux.org/hp_dc5750/proc_scsi_scsi.txt

[X.] Other notes, patches, fixes, workarounds:

It also seems that the system is not capable of recognizing the HDD:

hdparm -i /dev/sda
HDIO_GET_IDENTITY failed: Inappropriate ioctl for device

A user on a German mailing list suggested using pata_atiixp instead of atiixp, but again this does not work for me:
http://lists.opensuse.org/archive/opensuse-mobile-de/2007-01/msg1.html

All the information I collected can be found at:
http://wieland.homelinux.org/hp_dc5750/

I've tried several kernel versions in the meantime, starting with the standard Debian testing i386 kernel, then different vanilla versions (2.6.18.6, 2.6.19, 2.6.19.3), which are all affected in the same way. However, 2.6.20 behaves differently: there is another error message which scrolls over the screen so fast that I'm not able to read it. The only thing I was able to catch is that it also concerns (S)ATA, looking like 'ata2.0: [...] failed [...]'

Please CC me, I'm *not* subscribed!

Greetings,
Sigmund
Re: Bugreport
* Amelia Nilsson ([EMAIL PROTECTED]) wrote:
> I've found a bug in 2.6.11.6. I have a Toshiba laptop and when I ran
> 2.6.11.6 my touchpad flipped out: it clicked everywhere when it
> wasn't supposed to click. I couldn't even move my mouse without it
> clicking all over. It works fine in 2.6.10 though. Are there any changes
> made that can affect this? (I haven't tried 2.6.11.7 yet...)

2.6.11.7 has no significant changes that should affect your touchpad. We'll need much more information to make any headway here (see REPORTING-BUGS). I've got a Toshiba laptop, and have no issues with the touchpad. I assume this is an issue just in X. Do you see any obvious difference in the Xorg.0.log when starting X on the two different kernels? Any interesting dmesg output on the failing kernel? Does booting with psmouse.proto=exps help (assuming you have CONFIG_MOUSE_PS2=y)?

thanks,
-chris
Bugreport
Hey,

I've found a bug in 2.6.11.6. I have a Toshiba laptop and when I ran 2.6.11.6 my touchpad flipped out: it clicked everywhere when it wasn't supposed to click. I couldn't even move my mouse without it clicking all over. It works fine in 2.6.10 though. Are there any changes made that can affect this? (I haven't tried 2.6.11.7 yet...)

Best regards,
Amelia Nilsson
bugreport : system unacceptably slow.
Hi,

Every once in a while, my system gets unbelievably slow. So slow that I almost can't do anything anymore. This happens only once every few months. I think it has to do with sound, because it happens when I start using sound. "top" then shows about 90% idle time, and "top" itself is using the other 10%.

This has been happening for quite a while. I already had it with a 2.2.x kernel and I just had it with the 2.4.2 kernel.

Could you tell me what I can do to give you more information? Do you think it could be in this module? I use the es1370 driver as a loadable module that gets loaded on demand. I always use a standard distribution installation. Now I have RH7.1.

With kind regards,
Edwin.

info :

[root@CC90001-A /root]# cat /proc/pci
PCI devices found:
  Bus 0, device 0, function 0:
    Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 4).
      Master Capable. Latency=64.
      Non-prefetchable 32 bit memory at 0xe000 [0xe3ff].
  Bus 0, device 1, function 0:
    PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 4).
      Master Capable. Latency=64. Min Gnt=8.
  Bus 0, device 2, function 0:
    USB Controller: Acer Laboratories Inc. [ALi] M5237 USB (rev 3).
      IRQ 9.
      Master Capable. Latency=64. Max Lat=80.
      Non-prefetchable 32 bit memory at 0xde80 [0xde800fff].
  Bus 0, device 3, function 0:
    Bridge: Acer Laboratories Inc. [ALi] M7101 PMU (rev 0).
  Bus 0, device 7, function 0:
    ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev 195).
  Bus 0, device 9, function 0:
    Multimedia audio controller: Ensoniq ES1370 [AudioPCI] (rev 1).
      IRQ 5.
      Master Capable. Latency=32. Min Gnt=12.Max Lat=128.
      I/O at 0xd800 [0xd83f].
  Bus 0, device 10, function 0:
    Ethernet controller: Winbond Electronics Corp W89C940 (rev 11).
      IRQ 7.
      I/O at 0xd400 [0xd41f].
  Bus 0, device 11, function 0:
    Ethernet controller: Digital Equipment Corporation DECchip 21041 [Tulip Pass 3] (rev 33).
      IRQ 10.
      Master Capable. Latency=32.
      I/O at 0xd000 [0xd07f].
      Non-prefetchable 32 bit memory at 0xde00 [0xde7f].
Bus 0, device 12, function 0: SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c810 (rev 35). IRQ 11. Master Capable. Latency=64. Min Gnt=8.Max Lat=64. I/O at 0xb800 [0xb8ff]. Non-prefetchable 32 bit memory at 0xdd80 [0xdd8000ff]. Bus 0, device 15, function 0: IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev 193). Master Capable. Latency=32. Min Gnt=2.Max Lat=4. I/O at 0xb400 [0xb40f]. Bus 1, device 0, function 0: VGA compatible controller: nVidia Corporation Riva TnT 128 [NV04] (rev 4). IRQ 11. Master Capable. Latency=64. Min Gnt=5.Max Lat=1. Non-prefetchable 32 bit memory at 0xdf00 [0xdfff]. Prefetchable 32 bit memory at 0xe700 [0xe7ff]. [root@CC90001-A /root]# lspci -vvv 00:00.0 Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 04) Subsystem: Acer Laboratories Inc. [ALi] ALI M1541 Aladdin V/V+ AGP System Controller Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- SERR- 00:01.0 PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 04) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- SERR- Reset- FastB2B- 00:02.0 USB Controller: Acer Laboratories Inc. [ALi] M5237 USB (rev 03) (prog-if 10 [OHCI]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- [disabled] [size=32K] 00:0b.0 Ethernet controller: Digital Equipment Corporation DECchip 21041 [Tulip Pass 3] (rev 21) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- [disabled] [size=256K] 00:0c.0 SCSI storage controller: Symbios Logic Inc. 
(formerly NCR) 53c810 (rev 23) Subsystem: Symbios Logic Inc. (formerly NCR) 8100S Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- [root@CC90001-A /root]# lsmod Module Size Used by es1370 24896 0 (autoclean) soundcore 4432 4 (autoclean) [es1370] ov511 38768 0 videodev4896 1 [ov511] autofs 11136 1 (autoclean) ne2k-pci4864 1 (autoclea
Re: bugreport: poll() timeout always takes 10ms too long
Tachino Nobuhiro wrote:
>
> Hello,
>
> At Fri, 22 Jun 2001 11:52:12 +1000,
> [EMAIL PROTECTED] wrote:
> >
> > [1.] One line summary of the problem:
> >
> > poll() timeout always takes 10ms too long
> >
> > [2.] Full description of the problem/report:
> >
> > Select() timeouts work fine. A timeout between 10n-9 and 10n ms times
> > out after 10n ms on average. Poll() timeouts between 10n-9 and 10n ms,
> > on the other hand, time out after 10(n+1) ms on average. It's always a
> > jiffy too long. This means it's impossible to set a 10ms timeout using
> > poll() even though it's possible using select(). The programs and their
> > output below [6] demonstrate this. The same behaviour occurs with
> > linux-2.2 and linux-2.4.
>
> I think this is correct behavior. The Single UNIX Specification
> describes the timeout parameter of poll() as follows,
>
>   If none of the defined events have occurred on any selected
>   file descriptor, poll() waits at least timeout milliseconds
>   for an event to occur on any of the selected file descriptors.
>
> On the other hand, select(),
>
>   If the specified condition is false for all of the specified
>   file descriptors, select() blocks, up to the specified timeout
>   interval, until the specified condition is true for at least
>   one of the specified file descriptors.

OK, it's correct behaviour. But having both poll and select time out at the time specified would also be correct behaviour. Better than that, it would be expected behaviour.

raf
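[Editor's sketch] The rounding described in the quoted report can be written down as plain arithmetic. The model below assumes a 10 ms jiffy (HZ=100), which is what the reported numbers imply; the function names are hypothetical and the code only reproduces the averages stated in the report, it does not call into the kernel.

```c
#include <assert.h>

#define JIFFY_MS 10   /* assumption: HZ=100, so one jiffy is 10 ms */

/* Round a timeout in milliseconds up to a whole number of jiffies. */
static int ceil_to_jiffy(int ms)
{
    return (ms + JIFFY_MS - 1) / JIFFY_MS * JIFFY_MS;
}

/* Reported select() behaviour: 10n-9 .. 10n ms waits about 10n ms. */
static int reported_select_ms(int ms)
{
    return ceil_to_jiffy(ms);
}

/* Reported poll() behaviour: the same rounding plus one extra jiffy. */
static int reported_poll_ms(int ms)
{
    return ceil_to_jiffy(ms) + JIFFY_MS;
}
```

Under this model a 10 ms request maps to 10 ms for select() but 20 ms for poll(), which is the complaint in the report: the shortest poll() wait is two jiffies.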
Re: bugreport: poll() timeout always takes 10ms too long
Hello,

At Fri, 22 Jun 2001 11:52:12 +1000,
[EMAIL PROTECTED] wrote:
>
> [1.] One line summary of the problem:
>
> poll() timeout always takes 10ms too long
>
> [2.] Full description of the problem/report:
>
> Select() timeouts work fine. A timeout between 10n-9 and 10n ms times
> out after 10n ms on average. Poll() timeouts between 10n-9 and 10n ms,
> on the other hand, time out after 10(n+1) ms on average. It's always a
> jiffy too long. This means it's impossible to set a 10ms timeout using
> poll() even though it's possible using select(). The programs and their
> output below [6] demonstrate this. The same behaviour occurs with
> linux-2.2 and linux-2.4.

I think this is correct behavior. The Single UNIX Specification describes the timeout parameter of poll() as follows,

  If none of the defined events have occurred on any selected
  file descriptor, poll() waits at least timeout milliseconds
  for an event to occur on any of the selected file descriptors.

On the other hand, select(),

  If the specified condition is false for all of the specified
  file descriptors, select() blocks, up to the specified timeout
  interval, until the specified condition is true for at least
  one of the specified file descriptors.
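[Editor's sketch] The "waits at least timeout milliseconds" wording can be checked with a small measurement. This is a minimal sketch, assuming a POSIX system with clock_gettime() and CLOCK_MONOTONIC; `time_poll_ms()` is a hypothetical helper, and the upper bound on the wait depends on the kernel's timer resolution, so only the lower bound is meaningful.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>

/* Milliseconds elapsed on the monotonic clock across an empty poll().
 * With no descriptors, poll() acts as a pure timeout. */
static double time_poll_ms(int msec)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    poll(NULL, 0, msec);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec) * 1000.0 +
           (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Calling `time_poll_ms(20)` should not return much less than 20 unless the call is interrupted by a signal; how far above 20 it lands is the quantity the bug report is about.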
bugreport: poll() timeout always takes 10ms too long
[1.] One line summary of the problem:

poll() timeout always takes 10ms too long

[2.] Full description of the problem/report:

Select() timeouts work fine. A timeout between 10n-9 and 10n ms times out after 10n ms on average. Poll() timeouts between 10n-9 and 10n ms, on the other hand, time out after 10(n+1) ms on average. It's always a jiffy too long. This means it's impossible to set a 10ms timeout using poll() even though it's possible using select(). The programs and their output below [6] demonstrate this. The same behaviour occurs with linux-2.2 and linux-2.4.

[3.] Keywords (i.e., modules, networking, kernel):

poll, select, timer, timeout

[4.] Kernel version (from /proc/version):

$ cat /proc/version
Linux version 2.4.0 ([EMAIL PROTECTED]) (gcc version 2.95.2 19991024 (release)) #16 Sat Jan 20 07:45:58 EST 2001

[5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt)

N/A

[6.] A small shell script or example program which triggers the problem (if possible)

--- select.c ---
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>

/* diff = end - start, borrowing one second when the microseconds underflow */
void timeval_diff(struct timeval *start, struct timeval *end, struct timeval *diff)
{
    diff->tv_sec = end->tv_sec - start->tv_sec;

    if (end->tv_usec < start->tv_usec)
        diff->tv_usec = 1000000 + end->tv_usec - start->tv_usec, --diff->tv_sec;
    else
        diff->tv_usec = end->tv_usec - start->tv_usec;
}

/* Time one select() call with the given timeout; result in milliseconds */
double time_select(int msec)
{
    struct timeval timeout[1], start[1], end[1], elapsed[1];

    timeout->tv_sec = 0;
    timeout->tv_usec = msec * 1000;

    gettimeofday(start, NULL);
    select(1, NULL, NULL, NULL, timeout);
    gettimeofday(end, NULL);
    timeval_diff(start, end, elapsed);

    return ((double)elapsed->tv_sec * 1000000.0 + (double)elapsed->tv_usec) / 1000;
}

void test_select(int msec)
{
    double min = DBL_MAX;
    double max = DBL_MIN;
    double sum = 0.0;
    int i;

    for (i = 0; i < 1000; ++i)
    {
        double elapsed = time_select(msec);
        if (elapsed < min) min = elapsed;
        if (elapsed > max) max = elapsed;
        sum += elapsed;
    }

    printf("select(%d ms) min %g ms, max %g ms, avg %g ms\n", msec, min, max, sum / 1000);
}

int main(int ac, char **av)
{
    int msec = av[1] ? atoi(av[1]) : 1;
    test_select(msec);
    return EXIT_SUCCESS;
}
---

--- poll.c ---
#include <float.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* diff = end - start, borrowing one second when the microseconds underflow */
void timeval_diff(struct timeval *start, struct timeval *end, struct timeval *diff)
{
    diff->tv_sec = end->tv_sec - start->tv_sec;

    if (end->tv_usec < start->tv_usec)
        diff->tv_usec = 1000000 + end->tv_usec - start->tv_usec, --diff->tv_sec;
    else
        diff->tv_usec = end->tv_usec - start->tv_usec;
}

/* Time one poll() call with the given timeout; result in milliseconds */
double time_poll(int msec)
{
    struct timeval start[1], end[1], elapsed[1];

    gettimeofday(start, NULL);
    poll(NULL, 0, msec);
    gettimeofday(end, NULL);
    timeval_diff(start, end, elapsed);

    return ((double)elapsed->tv_sec * 1000000.0 + (double)elapsed->tv_usec) / 1000;
}

void test_poll(int msec)
{
    double min = DBL_MAX;
    double max = DBL_MIN;
    double sum = 0.0;
    int i;

    for (i = 0; i < 1000; ++i)
    {
        double elapsed = time_poll(msec);
        if (elapsed < min) min = elapsed;
        if (elapsed > max) max = elapsed;
        sum += elapsed;
    }

    printf("poll(%d ms) min %g ms, max %g ms, avg %g ms\n", msec, min, max, sum / 1000);
}

int main(int ac, char **av)
{
    int msec = (av[1]) ? atoi(av[1]) : 1;
    test_poll(msec);
    return EXIT_SUCCESS;
}
---

--- select-output ---
$ for i in 1 5 9 10 11 15 19 20 21 25 29 30 31 35 39 40 41 45 49 50 51 1000
do
    ./select $i
done
select(1 ms) min 5.624 ms, max 10.299 ms, avg 9.99298 ms
select(5 ms) min 5.668 ms, max 10.357 ms, avg 9.99301 ms
select(9 ms) min 5.635 ms, max 10.034 ms, avg 9.993 ms
select(10 ms) min 5.683 ms, max 10.347 ms, avg 9.99306 ms
select(11 ms) min 15.663 ms, max 20.627 ms, avg 19.993 ms
select(15 ms) min 15.664 ms, max 20.331 ms, avg 19.993 ms
select(19 ms) min 15.632 ms, max 20.04 ms, avg 19.993 ms
select(20 ms) min 15.652 ms, max 20.029 ms, avg 19.993 ms
select(21 ms) min 25.661 ms, max 30.299 ms, avg 29.993 ms
select(25 ms) min 25.663 ms, max 30.085 ms, avg 29.993 ms
sele
Re: Bugreport: Kernel 2.4.x crash
Hi!

> > I have no experience with kernel debugging, but so far, I have found
> > no log entry giving me a hint and the screen is blank after the crash
>
> Could you disable console blanking (setterm -blank 0).
>
> We really need a hint where it crashed.

Over the Easter weekend I took some time for testing. One ide channel does not work with dma enabled, which is the bootup default. After about 30 seconds, the channel is switched to pio and the machine is running again.

Funny though: Before, I could not return from console blanking or reach the machine through the network.

But as for any production system, I would rather keep it running than spend downtime seeking the error.

Thank you all.

Jörn

--
Jörn Engel
mailto: [EMAIL PROTECTED]
http://wohnheim.fh-wedel.de/~joern
Re: Bugreport: Kernel 2.4.x crash
> 2. A Fileserver with an ABIT Hotrod 66 (hpt366) controller will crash within
> 5-60 minutes after boot with a 2.4.x kernel. 2.2.x works fine. No other

no problem with ext2 on hpt366 here.

> Gnu C 2.95.3

hmm.
Re: Bugreport: Kernel 2.4.x crash
On Tue, 3 Apr 2001, [iso-8859-1] Jörn Engel wrote:

I don't necessarily believe it's the hpt366, as you do. See below: (note: I've also had it running on a stock 2.4.2 kernel for a while)

> 00:08.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
> Latency: 248 (2000ns min, 2000ns max), cache line size 08
> Interrupt: pin A routed to IRQ 11
> Region 0: I/O ports at 6100
> Region 1: I/O ports at 6200
> Region 4: I/O ports at 6300
>
> 00:08.1 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
> Latency: 248 (2000ns min, 2000ns max), cache line size 08
> Interrupt: pin A routed to IRQ 11
> Region 0: I/O ports at 6400
> Region 1: I/O ports at 6500
> Region 4: I/O ports at 6600

00:13.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR-
Bugreport: Kernel 2.4.x crash
1. Kernel crash w/out error message or logfile entry

2. A fileserver with an ABIT Hotrod 66 (hpt366) controller will crash within 5-60 minutes after boot with a 2.4.x kernel. 2.2.x works fine. No other exotic hardware. Another possibility might be Reiserfs, which I use for all partitions except /.

I have no experience with kernel debugging, but so far, I have found no log entry giving me a hint and the screen is blank after the crash. There might have been some output before, but the machine is in the basement and too important for excessive testing. I have tried 2.4.2 and 2.4.3 once each.

3. ide, hpt366

4. 2.4.2, 2.4.3

5. -

6. -

7. All this information is taken from the running 2.2.18 kernel.

7.1. sh /usr/src/linux/scripts/ver_linux
-- Versions installed: (if some fields are empty or look
-- unusual then possibly you have very old versions)
Linux belfast 2.2.18 #1 Fri Feb 23 14:47:14 CET 2001 i586 unknown
Kernel modules         2.4.2
Gnu C                  2.95.3
Gnu Make               3.79.1
Binutils               2.11.90.0.1
Linux C Library        2.2.2
Dynamic linker         ldd (GNU libc) 2.2.2
Procps                 2.0.7
Mount                  2.11b
Net-tools              2.05
Console-tools          0.2.3
Sh-utils               2.0.11
Modules Loaded         sb uart401 sound soundcore

7.2. cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 5
model           : 4
model name      : Pentium MMX
stepping        : 3
cpu MHz         : 200.459
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : yes
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 mmx
bogomips        : 399.76

7.3 cat /var/log/ksymoops/20010401164317.modules (2.4.3)
sb                      2128   0 (unused)
sb_lib                 33936   0 [sb]
uart401                 6352   0 [sb_lib]
sound                  56400   0 [sb_lib uart401]
soundcore               3792   5 [sb_lib sound]
raid1                  12784   0 (unused)
raid0                   3520   0 (unused)
md                     41056   0 [raid1 raid0]

7.4. cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
0220-022f : soundblaster
02f8-02ff : serial(set)
0330-0333 : MPU-401 UART
03c0-03df : vga+
03e8-03ef : serial(auto)
03f6-03f6 : ide0
03f8-03ff : serial(set)
6100-6107 : ide2
6202-6202 : ide2
6400-6407 : ide3
6502-6502 : ide3
6700-677f : eth0
f000-f007 : ide0
f008-f00f : ide1

7.5 lspci -vvv
00:00.0 Host bridge: Intel Corporation 430HX - 82439HX TXC [Triton II] (rev 03)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-