Re: [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity

2021-03-07 Thread Dmitry Vyukov
On Sun, Mar 7, 2021 at 11:09 AM Hillf Danton  wrote:
>
> On Sun, 7 Mar 2021 08:46:19 +0100  Dmitry Vyukov wrote:
> > On Sun, Mar 7, 2021 at 3:15 AM Hillf Danton  wrote:
> > >
> > > Dmitry, can you shed some light on how to configure KASAN to print the
> > > Call Trace the way reports with a leading [syzbot] on the subject line
> > > do?
> >
> > +kasan-dev
> >
> > Hi Hillf,
> >
> > KASAN always prints stack traces unconditionally. There is nothing you
> > need to do at all.
>
> Got it, thanks.
>
> > Do you have any reports w/o stack traces?
>
> No, but I have seen different formats in Call Trace output.
>
> Below, from [1], is an instance printed without file names and line
> numbers, although both would help pinpoint the cause of the reported issue.


KASAN always prints stack traces without file:line info, like every other
kernel bug detection facility; the kernel itself never symbolizes reports.
In the case of syzkaller, syzkaller symbolizes the reports and adds
file:line info. The main config option this requires is CONFIG_DEBUG_INFO.
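As a minimal sketch of what symbolization starts from: the kernel prints each frame as symbol+offset/size, and a leading "? " marks an address the unwinder found on the stack but could not confirm as part of the call chain. A symbolizer then resolves symbol+offset to file:line using the DWARF data that CONFIG_DEBUG_INFO provides. The C below only parses one frame line; struct frame and parse_frame() are hypothetical names, and the file:line lookup itself is not shown.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed call-trace entry, e.g. "? __percpu_counter_sum+0x75/0x90". */
struct frame {
    char sym[128];          /* function name */
    unsigned long off;      /* byte offset of the return address into sym */
    unsigned long size;     /* total size of sym in bytes */
    bool unreliable;        /* leading "? ": not confirmed by the unwinder */
};

/* Returns true if 'line' has the form "[? ]sym+0xOFF/0xSIZE[ ...]". */
static bool parse_frame(const char *line, struct frame *f)
{
    while (*line == ' ')
        line++;
    f->unreliable = (strncmp(line, "? ", 2) == 0);
    if (f->unreliable)
        line += 2;
    /* %127[^+] stops at the '+' that separates the name from the offset;
     * anything after the size (such as " [amdgpu]") is ignored. */
    return sscanf(line, "%127[^+]+0x%lx/0x%lx",
                  f->sym, &f->off, &f->size) == 3;
}
```

For example, "? __percpu_counter_sum+0x75/0x90" from the hung-task trace quoted below parses as an unreliable frame 0x75 bytes into a 0x90-byte function.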

You can find the syzkaller kernel configuration guide here:
https://github.com/google/syzkaller/blob/master/docs/linux/kernel_configs.md

Or the fragments that are actually used to generate syzbot configs, in this
directory (the guide above may be out of date):
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/base.yml
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/debug.yml
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/kasan.yml

Or a complete syzbot config here:
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-apparmor-kasan.config


> >
>
> I was running syzkaller and I found the following issue:
>
> Head Commit : b1313fe517ca3703119dcc99ef3bbf75ab42bcfb ( v5.10.4 )
> Git Tree : stable
> Console Output :
> [  242.769080] INFO: task repro:2639 blocked for more than 120 seconds.
> [  242.769096]   Not tainted 5.10.4 #8
> [  242.769103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  242.769112] task:repro   state:D stack:0 pid: 2639 ppid:  2638 flags:0x0004
> [  242.769126] Call Trace:
> [  242.769148]  __schedule+0x28d/0x7e0
> [  242.769162]  ? __percpu_counter_sum+0x75/0x90
> [  242.769175]  schedule+0x4f/0xc0
> [  242.769187]  __io_uring_task_cancel+0xad/0xf0
> [  242.769198]  ? wait_woken+0x80/0x80
> [  242.769210]  bprm_execve+0x67/0x8a0
> [  242.769223]  do_execveat_common+0x1d2/0x220
> [  242.769235]  __x64_sys_execveat+0x5d/0x70
> [  242.769249]  do_syscall_64+0x38/0x90
> [  242.769260]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [1] 
> https://lore.kernel.org/lkml/CAGyP=7cfm6bje7x2pn9yuptqgt5uqywm4avmoivayqpjg1p...@mail.gmail.com/


Re: [bugreport 5.9-rc8] general protection fault in __bfq_deactivate_entity

2021-03-06 Thread Dmitry Vyukov
On Sun, Mar 7, 2021 at 3:15 AM Hillf Danton  wrote:
>
> On Fri, 5 Mar 2021 18:01:04 +0800  Ming Lei wrote:
> > On Fri, Mar 05, 2021 at 10:32:04AM +0100, Paolo Valente wrote:
> > > I'm thinking of a way to debug this too.  The symptom may hint at a
> > > use-after-free.  Could you enable KASAN in your tests?  (On the flip
> > > side, I know this might change timings, thereby making the fault
> > > disappear).
> >
> > I have asked our QE to reproduce the issue with a debug kernel, which
> > may take a while. And I can't trigger it on my box.
> >
> > BTW, for the 2nd 'kernel NULL pointer dereference', the RIP points to:
> >
> > (gdb) l *(__bfq_deactivate_entity+0x5b)
> > 0x814c31cb is in __bfq_deactivate_entity (block/bfq-wf2q.c:1181).
> > 1176   * bfq_group_set_parent has already been invoked for the group
> > 1177   * represented by entity. Therefore, the field
> > 1178   * entity->sched_data has been set, and we can safely use it.
> > 1179   */
> > 1180  st = bfq_entity_service_tree(entity);
> > 1181  is_in_service = entity == sd->in_service_entity;
> > 1182
> > 1183  bfq_calc_finish(entity, entity->service);
> > 1184
> > 1185  if (is_in_service)
> >
> > It seems entity->sched_data is NULL.
>
> Hi Ming,
>
> Thanks for your report.
>
> Given that the invalid pointer cannot explain line 1180, you are reporting
> a different issue from the one Mike reported, and we can do nothing about
> either of them without a reproducer.
>
> Dmitry, can you shed some light on how to configure KASAN to print the
> Call Trace the way reports with a leading [syzbot] on the subject line do?

+kasan-dev

Hi Hillf,

KASAN always prints stack traces unconditionally. There is nothing you
need to do at all. Do you have any reports w/o stack traces?

"[syzbot]" is prepended by the syzbot code. If you want a prefix, you
need to add it manually.



> > > Thanks,
> > > Paolo
> > >
> > > > On 5 Mar 2021, at 10:27, Ming Lei  wrote:
> > > >
> > > > Hello Hillf,
> > > >
> > > > Thanks for the debug patch.
> > > >
> > > > On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton  wrote:
> > > >>
> > > >> On Thu, 4 Mar 2021 16:42:30 +0800  Ming Lei wrote:
> > > >>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
> > > >>>  wrote:
> > > 
> > >  Paolo, Jens, I am sorry for the noise.
> > >  But today I hit a kernel panic, and git blame says that you
> > >  created the file in which the panic happened (I saw this in the trace):
> > >
> > >  $ /usr/src/kernels/`uname -r`/scripts/faddr2line \
> > >      /lib/debug/lib/modules/`uname -r`/vmlinux \
> > >      __bfq_deactivate_entity+0x15a
> > >  __bfq_deactivate_entity+0x15a/0x240:
> > >  bfq_gt at block/bfq-wf2q.c:20
> > >  (inlined by) bfq_insert at block/bfq-wf2q.c:381
> > >  (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
> > >  (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
> > > 
> > >  https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
> > > 
> > >  $ head /sys/block/*/queue/scheduler
> > >  ==> /sys/block/nvme0n1/queue/scheduler <==
> > >  [none] mq-deadline kyber bfq
> > > 
> > >  ==> /sys/block/sda/queue/scheduler <==
> > >  mq-deadline kyber [bfq] none
> > > 
> > >  ==> /sys/block/zram0/queue/scheduler <==
> > >  none
> > > 
> > >  Trace:
> > >  general protection fault, probably for non-canonical address
> > >  0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
> > >  CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW
> > >  - ---  5.9.0-0.rc8.28.fc34.x86_64 #1
> > >  Hardware name: System manufacturer System Product Name/ROG STRIX
> > >  X570-I GAMING, BIOS 2606 08/13/2020
> > >  Workqueue: kblockd blk_mq_run_work_fn
> > >  RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
> > >  Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
> > >  74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
> > >  48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
> > >  RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
> > >  RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
> > >  RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
> > >  RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: 
> > >  R10: 0018 R11: 0018 R12: 8dc904927150
> > >  R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
> > >  FS:  () GS:8dc90e0c() knlGS:
> > >  CS:  0010 DS:  ES:  CR0: 80050033
> > >  CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
> > >  Call Trace:
> > >  bfq_deactivate_entity+0x4f/0xc0
> > > >>>
> > > >>> Hello,
> > > >>>
> > > >>> The same stack trace was observed in RH internal tes
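
The quoted trace reports a general protection fault "probably for non-canonical address 0x46b1b0f0d8856e4a", and RAX and RCX both hold that value. On x86-64 with 4-level paging, only canonical addresses (bits 63..47 all equal) are valid, and touching a non-canonical one raises #GP rather than a page fault. A value like this looks more like garbage than a kernel pointer, which is what loading a pointer out of freed or corrupted memory could produce, although this check alone cannot prove that. A small sketch of the canonical-address test; is_canonical() is a hypothetical helper, and va_bits is an assumed parameter (48 for 4-level, 57 for 5-level paging):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An x86-64 address is canonical when bits 63..(va_bits - 1) are all
 * copies of bit (va_bits - 1).
 */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    uint64_t top = addr >> (va_bits - 1);                     /* the high bits */
    uint64_t all_ones = (UINT64_C(1) << (65 - va_bits)) - 1;  /* e.g. 0x1ffff */

    return top == 0 || top == all_ones;
}
```

With va_bits = 48, 0x46b1b0f0d8856e4a fails the test, while a typical kernel direct-map address such as 0xffff888000000000 passes.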

Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2021-03-05 Thread Ming Lei
On Fri, Mar 05, 2021 at 10:32:04AM +0100, Paolo Valente wrote:
> I'm thinking of a way to debug this too.  The symptom may hint at a
> use-after-free.  Could you enable KASAN in your tests?  (On the flip
> side, I know this might change timings, thereby making the fault
> disappear).

I have asked our QE to reproduce the issue with a debug kernel, which may
take a while. And I can't trigger it on my box.

BTW, for the 2nd 'kernel NULL pointer dereference', the RIP points to:

(gdb) l *(__bfq_deactivate_entity+0x5b)
0x814c31cb is in __bfq_deactivate_entity (block/bfq-wf2q.c:1181).
1176   * bfq_group_set_parent has already been invoked for the group
1177   * represented by entity. Therefore, the field
1178   * entity->sched_data has been set, and we can safely use it.
1179   */
1180  st = bfq_entity_service_tree(entity);
1181  is_in_service = entity == sd->in_service_entity;
1182
1183  bfq_calc_finish(entity, entity->service);
1184
1185  if (is_in_service)

It seems entity->sched_data is NULL.
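
The reasoning behind that conclusion is standard: if sd (entity->sched_data) is NULL, reading sd->in_service_entity loads from address 0 plus the offset of that field, so a NULL-pointer oops at a small address points at a NULL base pointer. The sketch shows the arithmetic with a hypothetical two-field struct; it does not claim to reproduce the real layout of struct bfq_sched_data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; NOT the real layout of struct bfq_sched_data. */
struct demo_sched_data {
    void *in_service_entity;
    void *next_in_service;
};

/*
 * Address touched when loading a field through 'base': base + offset.
 * With base == 0 (a NULL pointer) the result is just the field offset,
 * i.e. the small value printed after "kernel NULL pointer dereference,
 * address:".
 */
static uintptr_t field_address(uintptr_t base, size_t field_offset)
{
    return base + field_offset;
}
```

A garbage base pointer, by contrast, shows up as a fault on a wild address rather than as a NULL-pointer oops.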


> 
> Thanks,
> Paolo
> 
> > On 5 Mar 2021, at 10:27, Ming Lei  wrote:
> > 
> > Hello Hillf,
> > 
> > Thanks for the debug patch.
> > 
> > On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton  wrote:
> >> 
> >> On Thu, 4 Mar 2021 16:42:30 +0800  Ming Lei wrote:
> >>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
> >>>  wrote:
>  
>  Paolo, Jens, I am sorry for the noise.
>  But today I hit a kernel panic, and git blame says that you
>  created the file in which the panic happened (I saw this in the trace):
>
>  $ /usr/src/kernels/`uname -r`/scripts/faddr2line \
>      /lib/debug/lib/modules/`uname -r`/vmlinux \
>      __bfq_deactivate_entity+0x15a
>  __bfq_deactivate_entity+0x15a/0x240:
>  bfq_gt at block/bfq-wf2q.c:20
>  (inlined by) bfq_insert at block/bfq-wf2q.c:381
>  (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
>  (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
>  
>  https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
>  
>  $ head /sys/block/*/queue/scheduler
>  ==> /sys/block/nvme0n1/queue/scheduler <==
>  [none] mq-deadline kyber bfq
>  
>  ==> /sys/block/sda/queue/scheduler <==
>  mq-deadline kyber [bfq] none
>  
>  ==> /sys/block/zram0/queue/scheduler <==
>  none
>  
>  Trace:
>  general protection fault, probably for non-canonical address
>  0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
>  CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW
>  - ---  5.9.0-0.rc8.28.fc34.x86_64 #1
>  Hardware name: System manufacturer System Product Name/ROG STRIX
>  X570-I GAMING, BIOS 2606 08/13/2020
>  Workqueue: kblockd blk_mq_run_work_fn
>  RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
>  Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
>  74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
>  48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
>  RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
>  RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
>  RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
>  RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: 
>  R10: 0018 R11: 0018 R12: 8dc904927150
>  R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
>  FS:  () GS:8dc90e0c() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
>  Call Trace:
>  bfq_deactivate_entity+0x4f/0xc0
> >>> 
> >>> Hello,
> >>> 
> >>> The same stack trace was also observed in RH internal testing, on kernel
> >>> 5.11.0-0.rc6, but there is no reproducer yet.
> >>> 
> >>> 
> >>> --
> >>> Ming Lei
> >> 
> >> Add some debug info.
> >> 
> >> --- x/block/bfq-wf2q.c
> >> +++ y/block/bfq-wf2q.c
> >> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq
> >> 
> > >>  	entity->on_st_or_in_serv = false;
> > >>  	st->wsum -= entity->weight;
> > >> -	if (bfqq && !is_in_service)
> > >> +	if (bfqq && !is_in_service) {
> > >> +		WARN_ON(entity->tree != NULL);
> > >>  		bfq_put_queue(bfqq);
> > >> +	}
> > >>  }
> > >>
> > >>  /**
> > >> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct
> > >>  		 * bfqq gets freed here.
> > >>  		 */
> > >>  		int ref = in_serv_bfqq->ref;
> > >> +		WARN_ON(in_serv_entity->tree != NULL);
> > >>  		bfq_put_queue(in_serv_bfqq);
> > >>  		if (ref == 1)
> > >>  			return true;
> > 
> > This kernel oops isn't easy to reproduce, and we have another crash
> > report [1], also in __bfq_deactivate_entity(), 

Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2021-03-05 Thread Paolo Valente
I'm thinking of a way to debug this too.  The symptom may hint at a
use-after-free.  Could you enable KASAN in your tests?  (On the flip
side, I know this might change timings, thereby making the fault
disappear).
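
KASAN catches use-after-free by keeping shadow state for memory and checking it on each instrumented access; freed objects are also held in a quarantine for a while so they are not reused immediately. The toy model below only illustrates that idea in user space. toy_alloc(), toy_free() and toy_check_access() are hypothetical; access checks are explicit calls rather than compiler instrumentation, and freed memory is never released (a permanent quarantine). It is not how KASAN is implemented.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

enum shadow_state { SHADOW_UNUSED, SHADOW_LIVE, SHADOW_FREED };

#define MAX_OBJS 16

static void *objs[MAX_OBJS];
static enum shadow_state state[MAX_OBJS];   /* zero-initialized: SHADOW_UNUSED */

static int find_slot(const void *p)
{
    for (int i = 0; i < MAX_OBJS; i++)
        if (state[i] != SHADOW_UNUSED && objs[i] == p)
            return i;
    return -1;
}

static void *toy_alloc(size_t n)
{
    for (int i = 0; i < MAX_OBJS; i++) {
        if (state[i] == SHADOW_UNUSED) {
            objs[i] = calloc(1, n);
            if (objs[i] == NULL)
                return NULL;
            state[i] = SHADOW_LIVE;
            return objs[i];
        }
    }
    return NULL;                  /* out of tracking slots */
}

/* Poison instead of releasing, so the address cannot be handed out again
 * and a stale access is still recognisable as use-after-free. */
static void toy_free(void *p)
{
    int i = find_slot(p);

    if (i >= 0)
        state[i] = SHADOW_FREED;
}

/* Returns false (and "reports") for freed or untracked memory. */
static bool toy_check_access(const void *p)
{
    int i = find_slot(p);

    if (i >= 0 && state[i] == SHADOW_FREED) {
        fprintf(stderr, "toy: use-after-free access to %p\n", (void *)p);
        return false;
    }
    return i >= 0;
}
```

The trade-off Paolo mentions applies here too: every checked access costs extra work, which is exactly what can shift timings enough to hide a race.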

Thanks,
Paolo

> On 5 Mar 2021, at 10:27, Ming Lei  wrote:
> 
> Hello Hillf,
> 
> Thanks for the debug patch.
> 
> On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton  wrote:
>> 
>> On Thu, 4 Mar 2021 16:42:30 +0800  Ming Lei wrote:
>>> On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
>>>  wrote:
 
 Paolo, Jens, I am sorry for the noise.
 But today I hit a kernel panic, and git blame says that you
 created the file in which the panic happened (I saw this in the trace):

 $ /usr/src/kernels/`uname -r`/scripts/faddr2line \
     /lib/debug/lib/modules/`uname -r`/vmlinux \
     __bfq_deactivate_entity+0x15a
 __bfq_deactivate_entity+0x15a/0x240:
 bfq_gt at block/bfq-wf2q.c:20
 (inlined by) bfq_insert at block/bfq-wf2q.c:381
 (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
 (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
 
 https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
 
 $ head /sys/block/*/queue/scheduler
 ==> /sys/block/nvme0n1/queue/scheduler <==
 [none] mq-deadline kyber bfq
 
 ==> /sys/block/sda/queue/scheduler <==
 mq-deadline kyber [bfq] none
 
 ==> /sys/block/zram0/queue/scheduler <==
 none
 
 Trace:
 general protection fault, probably for non-canonical address
 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
 CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW
 - ---  5.9.0-0.rc8.28.fc34.x86_64 #1
 Hardware name: System manufacturer System Product Name/ROG STRIX
 X570-I GAMING, BIOS 2606 08/13/2020
 Workqueue: kblockd blk_mq_run_work_fn
 RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
 Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
 RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
 RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
 RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
 RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: 
 R10: 0018 R11: 0018 R12: 8dc904927150
 R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
 FS:  () GS:8dc90e0c() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
 Call Trace:
 bfq_deactivate_entity+0x4f/0xc0
>>> 
>>> Hello,
>>> 
>>> The same stack trace was also observed in RH internal testing, on kernel
>>> 5.11.0-0.rc6, but there is no reproducer yet.
>>> 
>>> 
>>> --
>>> Ming Lei
>> 
>> Add some debug info.
>> 
>> --- x/block/bfq-wf2q.c
>> +++ y/block/bfq-wf2q.c
>> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq
>> 
>>  	entity->on_st_or_in_serv = false;
>>  	st->wsum -= entity->weight;
>> -	if (bfqq && !is_in_service)
>> +	if (bfqq && !is_in_service) {
>> +		WARN_ON(entity->tree != NULL);
>>  		bfq_put_queue(bfqq);
>> +	}
>>  }
>>
>>  /**
>> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct
>>  		 * bfqq gets freed here.
>>  		 */
>>  		int ref = in_serv_bfqq->ref;
>> +		WARN_ON(in_serv_entity->tree != NULL);
>>  		bfq_put_queue(in_serv_bfqq);
>>  		if (ref == 1)
>>  			return true;
> 
> This kernel oops isn't easy to reproduce, and we have another crash
> report [1], also in __bfq_deactivate_entity(), which is not easy to
> trigger either. Can your debug patch cover report [1]? If not, feel free
> to add more debug messages, and I will try to reproduce both.
> 
> [1] another kernel oops log on __bfq_deactivate_entity
> 
> [  899.790606] systemd-sysv-generator[25205]: SysV service
> '/etc/rc.d/init.d/anamon' lacks a native systemd unit file.
> Automatically generating a unit file for compatibility. Please update
> package to include a native systemd unit file, in order to make it
> more safe and robust.
> [  901.937047] BUG: kernel NULL pointer dereference, address: 
> [  901.944005] #PF: supervisor read access in kernel mode
> [  901.949143] #PF: error_code(0x) - not-present page
> [  901.954285] PGD 0 P4D 0
> [  901.956824] Oops:  [#1] SMP NOPTI
> [  901.960490] CPU: 13 PID: 22966 Comm: kworker/13:0 Tainted: G
>  IX - ---  5.11.0-1.el9.x86_64 #1
> [  901.970829] Hardware name: Dell Inc. PowerEdge R740xd/0WXD1Y, BIOS
> 2.5.4 01/13/2020
> [  901.978480] Workqueue: cgwb_release cgwb_release_workfn
> [  901.983705] RIP: 0010:__bfq_deactivate_e
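
In the scheduler listing quoted above, the active scheduler for each device is the entry in square brackets: bfq on sda, none on nvme0n1, while zram0 offers only none. So, of these devices, only sda was set to use BFQ at the time of the listing. A small C sketch that extracts the active entry from one line of /sys/block/<dev>/queue/scheduler; active_scheduler() is a hypothetical helper:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Copy the active scheduler name (the entry in square brackets) from one
 * line such as "mq-deadline kyber [bfq] none". A line without brackets,
 * like zram0's "none", is taken as-is up to the first space or newline.
 * Returns false if the name does not fit in 'out'.
 */
static bool active_scheduler(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    const char *start = line;
    size_t len;

    if (open && close) {
        start = open + 1;
        len = (size_t)(close - start);
    } else {
        len = strcspn(line, " \n");
    }
    if (len + 1 > outlen)
        return false;
    memcpy(out, start, len);
    out[len] = '\0';
    return true;
}
```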

Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2021-03-05 Thread Ming Lei
Hello Hillf,

Thanks for the debug patch.

On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton  wrote:
>
> On Thu, 4 Mar 2021 16:42:30 +0800  Ming Lei wrote:
> > On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
> >  wrote:
> > >
> > > Paolo, Jens, I am sorry for the noise.
> > > But today I hit a kernel panic, and git blame says that you
> > > created the file in which the panic happened (I saw this in the trace):
> > >
> > > $ /usr/src/kernels/`uname -r`/scripts/faddr2line \
> > >     /lib/debug/lib/modules/`uname -r`/vmlinux \
> > >     __bfq_deactivate_entity+0x15a
> > > __bfq_deactivate_entity+0x15a/0x240:
> > > bfq_gt at block/bfq-wf2q.c:20
> > > (inlined by) bfq_insert at block/bfq-wf2q.c:381
> > > (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
> > > (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
> > >
> > > https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
> > >
> > > $ head /sys/block/*/queue/scheduler
> > > ==> /sys/block/nvme0n1/queue/scheduler <==
> > > [none] mq-deadline kyber bfq
> > >
> > > ==> /sys/block/sda/queue/scheduler <==
> > > mq-deadline kyber [bfq] none
> > >
> > > ==> /sys/block/zram0/queue/scheduler <==
> > > none
> > >
> > > Trace:
> > > general protection fault, probably for non-canonical address
> > > 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
> > > CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW
> > > - ---  5.9.0-0.rc8.28.fc34.x86_64 #1
> > > Hardware name: System manufacturer System Product Name/ROG STRIX
> > > X570-I GAMING, BIOS 2606 08/13/2020
> > > Workqueue: kblockd blk_mq_run_work_fn
> > > RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
> > > Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
> > > 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
> > > 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
> > > RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
> > > RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
> > > RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
> > > RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: 
> > > R10: 0018 R11: 0018 R12: 8dc904927150
> > > R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
> > > FS:  () GS:8dc90e0c() knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
> > > Call Trace:
> > >  bfq_deactivate_entity+0x4f/0xc0
> >
> > Hello,
> >
> > The same stack trace was also observed in RH internal testing, on kernel
> > 5.11.0-0.rc6, but there is no reproducer yet.
> >
> >
> > --
> > Ming Lei
>
> Add some debug info.
>
> --- x/block/bfq-wf2q.c
> +++ y/block/bfq-wf2q.c
> @@ -647,8 +647,10 @@ static void bfq_forget_entity(struct bfq
>
>  	entity->on_st_or_in_serv = false;
>  	st->wsum -= entity->weight;
> -	if (bfqq && !is_in_service)
> +	if (bfqq && !is_in_service) {
> +		WARN_ON(entity->tree != NULL);
>  		bfq_put_queue(bfqq);
> +	}
>  }
>
>  /**
> @@ -1631,6 +1633,7 @@ bool __bfq_bfqd_reset_in_service(struct
>  		 * bfqq gets freed here.
>  		 */
>  		int ref = in_serv_bfqq->ref;
> +		WARN_ON(in_serv_entity->tree != NULL);
>  		bfq_put_queue(in_serv_bfqq);
>  		if (ref == 1)
>  			return true;

This kernel oops isn't easy to reproduce, and we have another crash
report [1], also in __bfq_deactivate_entity(), which is not easy to
trigger either. Can your debug patch cover report [1]? If not, feel free
to add more debug messages, and I will try to reproduce both.
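
Hillf's debug patch quoted above adds WARN_ON(entity->tree != NULL) right before bfq_put_queue(): if a queue's reference is dropped while its entity is still linked into a service tree, the tree may be left pointing at freed memory, which would fit a later crash while walking that tree. A hedged user-space sketch of the same check-before-put pattern, with hypothetical demo_* types and a counter standing in for WARN_ON():

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for a queue and its tree. */
struct demo_tree { int nr; };

struct demo_queue {
    int ref;                    /* reference count */
    struct demo_tree *tree;     /* non-NULL while linked into a tree */
};

static int warnings;            /* counts WARN_ON-style hits */

/*
 * Drop one reference, complaining first if the object is still linked
 * into a tree: freeing it now would leave the tree with a dangling
 * pointer. Returns true if the object was freed.
 */
static bool demo_put_queue(struct demo_queue *q)
{
    if (q->tree != NULL) {
        warnings++;
        fprintf(stderr, "WARN: put while still on a tree (ref=%d)\n", q->ref);
    }
    if (--q->ref == 0) {
        free(q);
        return true;
    }
    return false;
}
```

The warning fires on every suspicious put, not only the last one, so it can flag the bad ordering even in runs where the memory is not actually freed and nothing crashes.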

[1] another kernel oops log on __bfq_deactivate_entity

[  899.790606] systemd-sysv-generator[25205]: SysV service
'/etc/rc.d/init.d/anamon' lacks a native systemd unit file.
Automatically generating a unit file for compatibility. Please update
package to include a native systemd unit file, in order to make it
more safe and robust.
[  901.937047] BUG: kernel NULL pointer dereference, address: 
[  901.944005] #PF: supervisor read access in kernel mode
[  901.949143] #PF: error_code(0x) - not-present page
[  901.954285] PGD 0 P4D 0
[  901.956824] Oops:  [#1] SMP NOPTI
[  901.960490] CPU: 13 PID: 22966 Comm: kworker/13:0 Tainted: G
  IX - ---  5.11.0-1.el9.x86_64 #1
[  901.970829] Hardware name: Dell Inc. PowerEdge R740xd/0WXD1Y, BIOS
2.5.4 01/13/2020
[  901.978480] Workqueue: cgwb_release cgwb_release_workfn
[  901.983705] RIP: 0010:__bfq_deactivate_entity+0x5b/0x240
[  901.989016] Code: b8 30 00 00 00 75 18 48 81 ff 88 00 00 00 74 0f
0f b7 47 8a 83 e8 01 48 8d 04 40 48 c1 e0 04 4c 8b 73 68 48 63 73 40
48 89 df <4d> 8b 3e 4d 8d 64 06 10 e8 48 f0 ff ff 49 39 df 0f 84 87 01
00 00
[  902.007763] RSP: 0018:b77107f0bd98 EFLAGS: 00010002
[  902.012986] RAX: 002fffd0 RBX: 9

Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2021-03-04 Thread Ming Lei
On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
 wrote:
>
> Paolo, Jens, I am sorry for the noise.
> But today I hit a kernel panic, and git blame says that you
> created the file in which the panic happened (I saw this in the trace):
>
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line \
>     /lib/debug/lib/modules/`uname -r`/vmlinux \
>     __bfq_deactivate_entity+0x15a
> __bfq_deactivate_entity+0x15a/0x240:
> bfq_gt at block/bfq-wf2q.c:20
> (inlined by) bfq_insert at block/bfq-wf2q.c:381
> (inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
> (inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203
>
> https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
>
> $ head /sys/block/*/queue/scheduler
> ==> /sys/block/nvme0n1/queue/scheduler <==
> [none] mq-deadline kyber bfq
>
> ==> /sys/block/sda/queue/scheduler <==
> mq-deadline kyber [bfq] none
>
> ==> /sys/block/zram0/queue/scheduler <==
> none
>
> Trace:
> general protection fault, probably for non-canonical address
> 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI
> CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW
> - ---  5.9.0-0.rc8.28.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 2606 08/13/2020
> Workqueue: kblockd blk_mq_run_work_fn
> RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
> Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
> 74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
> 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
> RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
> RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
> RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
> RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: 
> R10: 0018 R11: 0018 R12: 8dc904927150
> R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
> FS:  () GS:8dc90e0c() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
> Call Trace:
>  bfq_deactivate_entity+0x4f/0xc0

Hello,

The same stack trace was also observed in RH internal testing, on kernel
5.11.0-0.rc6, but there is no reproducer yet.


-- 
Ming Lei
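
faddr2line in the quoted report places the fault in bfq_gt(), inlined into bfq_insert() and bfq_idle_insert(), i.e. while inserting an entity into the idle service tree. A timestamp comparison helper of this kind is commonly written as a wraparound-safe signed difference; the sketch below shows that idiom under a hypothetical name, ts_gt(), and is not copied from block/bfq-wf2q.c. Since the comparison itself is pure integer arithmetic, a fault attributed to it presumably comes from loading a timestamp through a bad node pointer during the tree walk, consistent with the non-canonical address in the report.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if timestamp a is "after" b on a wrapping u64 counter.
 * The difference is reinterpreted as signed, so the result stays correct
 * across a wrap as long as the two values are less than 2^63 apart
 * (relies on two's-complement conversion, as the kernel does).
 */
static bool ts_gt(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) > 0;
}
```

For example, ts_gt(3, UINT64_MAX - 2) is true: once the counter wraps, 3 comes 6 ticks after UINT64_MAX - 2, whereas a plain a > b would say the opposite.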


Re: [bugreport] [5.10-rc1] Oops: 0000 [#1] SMP NOPTI bug which always starts as page allocation failure

2020-11-04 Thread Alex Deucher
On Tue, Nov 3, 2020 at 4:05 PM Mikhail Gavrilov
 wrote:
>
> Hi folks.
> I have observed a hard-to-reproduce set of bugs.
> It always starts as:
> 1) kworker/u64:2: page allocation failure: order:5,
> mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
> nodemask=(null),cpuset=/,mems_allowed=0
> continues as:
> 2) WARNING: CPU: 21 PID: 806649 at
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7505
> amdgpu_dm_atomic_commit_tail+0x23bd/0x24e0 [amdgpu]
> and ends as:
> 3) BUG: unable to handle page fault for address: 00012488
> This is annoying because it leads to a complete computer hang.

Possibly fixed with this patch?
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-5.10&id=0689dcf3e4d6b89cc2087139561dc12b60461dca

Alex


>
> Example of one log:
>
> [11561.927250] kworker/u64:10: page allocation failure: order:5,
> mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
> nodemask=(null),cpuset=/,mems_allowed=0
> [11561.927472] CPU: 18 PID: 39985 Comm: kworker/u64:10 Not tainted
> 5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1
> [11561.927475] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
> [11561.927485] Workqueue: events_unbound commit_work [drm_kms_helper]
> [11561.927489] Call Trace:
> [11561.927496]  dump_stack+0x8b/0xb0
> [11561.927501]  warn_alloc.cold+0x75/0xd9
> [11561.927507]  ? _cond_resched+0x16/0x50
> [11561.927512]  ? __alloc_pages_direct_compact+0x159/0x180
> [11561.927518]  __alloc_pages_slowpath.constprop.0+0x103f/0x1070
> [11561.927531]  __alloc_pages_nodemask+0x37d/0x400
> [11561.927538]  kmalloc_order+0x33/0xc0
> [11561.927542]  kmalloc_order_trace+0x19/0x110
> [11561.927614]  dc_create_state+0x26/0x60 [amdgpu]
> [11561.927677]  amdgpu_dm_atomic_commit_tail+0x1cee/0x24e0 [amdgpu]
> [11561.927686]  ? find_busiest_group+0x33/0x350
> [11561.927698]  ? __lock_acquire+0x3b0/0x21f0
> [11561.927707]  ? lock_acquire+0xc8/0x400
> [11561.927710]  ? wait_for_completion_timeout+0x3b/0xf0
> [11561.927715]  ? mark_held_locks+0x50/0x80
> [11561.927719]  ? lockdep_hardirqs_on_prepare+0xff/0x180
> [11561.927722]  ? _raw_spin_unlock_irq+0x24/0x40
> [11561.927726]  ? _raw_spin_unlock_irq+0x24/0x40
> [11561.927729]  ? wait_for_completion_timeout+0xdb/0xf0
> [11561.927740]  commit_tail+0x94/0x130 [drm_kms_helper]
> [11561.927745]  process_one_work+0x27d/0x5b0
> [11561.927753]  worker_thread+0x55/0x3c0
> [11561.927756]  ? process_one_work+0x5b0/0x5b0
> [11561.927760]  kthread+0x13a/0x150
> [11561.927763]  ? __kthread_bind_mask+0x60/0x60
> [11561.927769]  ret_from_fork+0x22/0x30
> [11561.927809] Mem-Info:
> [11561.927816] active_anon:933848 inactive_anon:4558268 isolated_anon:118
> active_file:154021 inactive_file:80446 isolated_file:0
> unevictable:1586 dirty:32469 writeback:700
> slab_reclaimable:185330 slab_unreclaimable:176202
> mapped:514440 shmem:592199 pagetables:81732 bounce:0
> free:99082 free_pcp:2104 free_cma:0
> [11561.927820] Node 0 active_anon:3735392kB inactive_anon:18233072kB
> active_file:616084kB inactive_file:321784kB unevictable:6344kB
> isolated(anon):472kB isolated(file):0kB mapped:2057760kB
> dirty:129876kB writeback:2800kB shmem:2368796kB shmem_thp: 0kB
> shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:8kB
> kernel_stack:96608kB all_unreclaimable? no
> [11561.927824] Node 0 DMA free:11800kB min:32kB low:44kB high:56kB
> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:15992kB managed:15900kB mlocked:0kB pagetables:0kB bounce:0kB
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [11561.927829] lowmem_reserve[]: 0 3136 31809 31809 31809
> [11561.927839] Node 0 DMA32 free:142632kB min:26264kB low:29472kB
> high:32680kB reserved_highatomic:0KB active_anon:131568kB
> inactive_anon:1625184kB active_file:57556kB inactive_file:13532kB
> unevictable:0kB writepending:2428kB present:3317760kB
> managed:3317572kB mlocked:0kB pagetables:25624kB bounce:0kB
> free_pcp:1764kB local_pcp:0kB free_cma:0kB
> [11561.927844] lowmem_reserve[]: 0 0 28673 28673 28673
> [11561.927854] Node 0 Normal free:241896kB min:240300kB low:269660kB
> high:299020kB reserved_highatomic:2048KB active_anon:3603472kB
> inactive_anon:16607812kB active_file:558660kB inactive_file:308056kB
> unevictable:6344kB writepending:130596kB present:30133248kB
> managed:29370624kB mlocked:6344kB pagetables:301304kB bounce:0kB
> free_pcp:6656kB local_pcp:60kB free_cma:0kB
> [11561.927859] lowmem_reserve[]: 0 0 0 0 0
> [11561.927871] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB
> (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB
> (M) = 11800kB
> [11561.927900] Node 0 DMA32: 15432*4kB (UME) 4963*8kB (UME) 2169*16kB
> (UME) 201*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> 0*4096kB = 142568kB
> [11561.927923] Node 0 Normal: 49027*4kB (UMEH) 5656*8kB (MH) 20*1

[bugreport] [5.10-rc1] Oops: 0000 [#1] SMP NOPTI bug which always starts as page allocation failure

2020-11-03 Thread Mikhail Gavrilov
Hi folks.
I have observed a hard-to-reproduce set of bugs.
It always starts as:
1) kworker/u64:2: page allocation failure: order:5,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
nodemask=(null),cpuset=/,mems_allowed=0
continues as:
2) WARNING: CPU: 21 PID: 806649 at
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7505
amdgpu_dm_atomic_commit_tail+0x23bd/0x24e0 [amdgpu]
and ends as:
3) BUG: unable to handle page fault for address: 00012488
This is annoying because it leads to a complete computer hang.

Example of one log:

[11561.927250] kworker/u64:10: page allocation failure: order:5,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
nodemask=(null),cpuset=/,mems_allowed=0
[11561.927472] CPU: 18 PID: 39985 Comm: kworker/u64:10 Not tainted
5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1
[11561.927475] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
[11561.927485] Workqueue: events_unbound commit_work [drm_kms_helper]
[11561.927489] Call Trace:
[11561.927496]  dump_stack+0x8b/0xb0
[11561.927501]  warn_alloc.cold+0x75/0xd9
[11561.927507]  ? _cond_resched+0x16/0x50
[11561.927512]  ? __alloc_pages_direct_compact+0x159/0x180
[11561.927518]  __alloc_pages_slowpath.constprop.0+0x103f/0x1070
[11561.927531]  __alloc_pages_nodemask+0x37d/0x400
[11561.927538]  kmalloc_order+0x33/0xc0
[11561.927542]  kmalloc_order_trace+0x19/0x110
[11561.927614]  dc_create_state+0x26/0x60 [amdgpu]
[11561.927677]  amdgpu_dm_atomic_commit_tail+0x1cee/0x24e0 [amdgpu]
[11561.927686]  ? find_busiest_group+0x33/0x350
[11561.927698]  ? __lock_acquire+0x3b0/0x21f0
[11561.927707]  ? lock_acquire+0xc8/0x400
[11561.927710]  ? wait_for_completion_timeout+0x3b/0xf0
[11561.927715]  ? mark_held_locks+0x50/0x80
[11561.927719]  ? lockdep_hardirqs_on_prepare+0xff/0x180
[11561.927722]  ? _raw_spin_unlock_irq+0x24/0x40
[11561.927726]  ? _raw_spin_unlock_irq+0x24/0x40
[11561.927729]  ? wait_for_completion_timeout+0xdb/0xf0
[11561.927740]  commit_tail+0x94/0x130 [drm_kms_helper]
[11561.927745]  process_one_work+0x27d/0x5b0
[11561.927753]  worker_thread+0x55/0x3c0
[11561.927756]  ? process_one_work+0x5b0/0x5b0
[11561.927760]  kthread+0x13a/0x150
[11561.927763]  ? __kthread_bind_mask+0x60/0x60
[11561.927769]  ret_from_fork+0x22/0x30
[11561.927809] Mem-Info:
[11561.927816] active_anon:933848 inactive_anon:4558268 isolated_anon:118
active_file:154021 inactive_file:80446 isolated_file:0
unevictable:1586 dirty:32469 writeback:700
slab_reclaimable:185330 slab_unreclaimable:176202
mapped:514440 shmem:592199 pagetables:81732 bounce:0
free:99082 free_pcp:2104 free_cma:0
[11561.927820] Node 0 active_anon:3735392kB inactive_anon:18233072kB
active_file:616084kB inactive_file:321784kB unevictable:6344kB
isolated(anon):472kB isolated(file):0kB mapped:2057760kB
dirty:129876kB writeback:2800kB shmem:2368796kB shmem_thp: 0kB
shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:8kB
kernel_stack:96608kB all_unreclaimable? no
[11561.927824] Node 0 DMA free:11800kB min:32kB low:44kB high:56kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:15992kB managed:15900kB mlocked:0kB pagetables:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB
[11561.927829] lowmem_reserve[]: 0 3136 31809 31809 31809
[11561.927839] Node 0 DMA32 free:142632kB min:26264kB low:29472kB
high:32680kB reserved_highatomic:0KB active_anon:131568kB
inactive_anon:1625184kB active_file:57556kB inactive_file:13532kB
unevictable:0kB writepending:2428kB present:3317760kB
managed:3317572kB mlocked:0kB pagetables:25624kB bounce:0kB
free_pcp:1764kB local_pcp:0kB free_cma:0kB
[11561.927844] lowmem_reserve[]: 0 0 28673 28673 28673
[11561.927854] Node 0 Normal free:241896kB min:240300kB low:269660kB
high:299020kB reserved_highatomic:2048KB active_anon:3603472kB
inactive_anon:16607812kB active_file:558660kB inactive_file:308056kB
unevictable:6344kB writepending:130596kB present:30133248kB
managed:29370624kB mlocked:6344kB pagetables:301304kB bounce:0kB
free_pcp:6656kB local_pcp:60kB free_cma:0kB
[11561.927859] lowmem_reserve[]: 0 0 0 0 0
[11561.927871] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB
(U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB
(M) = 11800kB
[11561.927900] Node 0 DMA32: 15432*4kB (UME) 4963*8kB (UME) 2169*16kB
(UME) 201*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
0*4096kB = 142568kB
[11561.927923] Node 0 Normal: 49027*4kB (UMEH) 5656*8kB (MH) 20*16kB
(H) 10*32kB (H) 2*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 242380kB
[11561.927951] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[11561.927954] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[11561.927956] 847580 total pagecache pages
[11561.927967] 19862 pages in swap cache
[11561.927970
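Each buddy line in the Mem-Info dump above gives, per allocation order, the number of free blocks times the block size. The `= NNNkB` total is the sum of those products. The sketch below recomputes the three zone totals from those counts and reports the largest order that still has a free block: 128 kB (order 5) in Normal and 32 kB (order 3) in DMA32. That is relevant to the kmalloc_order() frames at the top of the excerpt, because large kmalloc requests are served as physically contiguous pages. The order of the request itself is not visible in this excerpt. The array and function names are hypothetical.

```c
/* Free blocks per order (order 0 = 4 kB ... order 10 = 4096 kB), copied
 * from the "Node 0 <zone>:" buddy lines of the Mem-Info dump. */
static const unsigned long dma_free[11]    = {0, 1, 1, 0, 2, 1, 1, 0, 1, 1, 2};
static const unsigned long dma32_free[11]  = {15432, 4963, 2169, 201, 0, 0, 0, 0, 0, 0, 0};
static const unsigned long normal_free[11] = {49027, 5656, 20, 10, 2, 2, 0, 0, 0, 0, 0};

/* Total free memory in kB: sum over orders of count * (4 kB << order). */
static unsigned long free_kb(const unsigned long counts[11])
{
	unsigned long total = 0;

	for (int order = 0; order < 11; order++)
		total += counts[order] * (4UL << order);
	return total;
}

/* Highest order that still has at least one free block, or -1 if none. */
static int largest_free_order(const unsigned long counts[11])
{
	for (int order = 10; order >= 0; order--)
		if (counts[order])
			return order;
	return -1;
}
```

free_kb() returns 11800, 142568 and 242380 for the three arrays, which matches the totals printed by the kernel.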

[bugreport] [5.10] DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww) We 'forgot' to unlock everything else first?

2020-10-17 Thread Mikhail Gavrilov
Hi folks.
I have observed this issue since 5.3, and it still happens with 5.10 git.
The warning reproduces 100% reliably when I launch
"Wolfenstein: Youngblood"; the Mesa version doesn't matter.

[73690.883948] [ cut here ]
[73690.883953] DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww)
[73690.883963] WARNING: CPU: 30 PID: 194867 at
kernel/locking/mutex.c:327 __ww_mutex_lock.constprop.0+0xe96/0xef0
[73690.883966] Modules linked in: tun snd_seq_dummy snd_hrtimer uinput
rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp
nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set
nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac
bnep sunrpc vfat fat snd_hda_codec_realtek mt76x2u mt76x2_common
snd_hda_codec_generic mt76x02_usb ledtrig_audio snd_hda_codec_hdmi
mt76_usb mt76x02_lib snd_hda_intel uvcvideo iwlmvm snd_intel_dspcfg
mt76 gspca_zc3xx snd_hda_codec gspca_main joydev videobuf2_vmalloc
snd_usb_audio btusb edac_mce_amd videobuf2_memops snd_hda_core
videobuf2_v4l2 snd_usbmidi_lib kvm_amd btrtl videobuf2_common btbcm
snd_hwdep
[73690.884036]  snd_rawmidi mac80211 btintel snd_seq videodev
snd_seq_device eeepc_wmi libarc4 bluetooth kvm xpad ff_memless snd_pcm
mc iwlwifi asus_wmi irqbypass sparse_keymap ecdh_generic rapl ecc
sp5100_tco video wmi_bmof snd_timer pcspkr snd k10temp i2c_piix4
soundcore cfg80211 rfkill acpi_cpufreq binfmt_misc zram ip_tables
hid_logitech_hidpp hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm
drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec drm
ghash_clmulni_intel ccp igb nvme dca i2c_algo_bit nvme_core wmi
pinctrl_amd fuse
[73690.884094] CPU: 30 PID: 194867 Comm: Youngblood_x64v Not tainted
5.10.0-0.rc0.20201014gitb5fc7a89e58b.42.fc34.x86_64 #1
[73690.884097] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
[73690.884100] RIP: 0010:__ww_mutex_lock.constprop.0+0xe96/0xef0
[73690.884103] Code: f2 89 5b 9e 48 c7 c7 1d bb 59 9e e8 ef f6 f8 ff
0f 0b e9 2a fc ff ff 48 c7 c6 d4 89 5b 9e 48 c7 c7 1d bb 59 9e e8 d5
f6 f8 ff <0f> 0b e9 e9 fe ff ff 83 3d 44 3d 81 02 00 75 07 48 83 7d 28
00 75
[73690.884106] RSP: 0018:a1c5d079f8f0 EFLAGS: 00010286
[73690.884108] RAX: 0032 RBX: 0001 RCX: 8c650a7db178
[73690.884111] RDX: ffd8 RSI: 0027 RDI: 8c650a7db170
[73690.884112] RBP: a1c5d079fc38 R08:  R09: 
[73690.884114] R10: a1c5d079f720 R11: 8c652e2fffe8 R12: 8c600cd42990
[73690.884116] R13: 8c5f055f R14: 8c600cd42a00 R15: 
[73690.884119] FS:  060e3640() GS:8c650a60()
knlGS:00013ffc
[73690.884121] CS:  0010 DS:  ES:  CR0: 80050033
[73690.884122] CR2: 7fe25431d010 CR3: 00011916e000 CR4: 00350ee0
[73690.884124] Call Trace:
[73690.884136]  ? ttm_mem_evict_first+0x212/0x4f0 [ttm]
[73690.884139]  ? __schedule+0x345/0xa80
[73690.884144]  ww_mutex_lock_interruptible+0x43/0xb0
[73690.884149]  ttm_mem_evict_first+0x212/0x4f0 [ttm]
[73690.884157]  ttm_bo_mem_space+0x30f/0x340 [ttm]
[73690.884164]  ttm_bo_validate+0x12b/0x1d0 [ttm]
[73690.884169]  ? sched_clock+0x5/0x10
[73690.884261]  amdgpu_cs_bo_validate+0x8b/0x190 [amdgpu]
[73690.884350]  amdgpu_cs_list_validate+0x10e/0x150 [amdgpu]
[73690.884435]  amdgpu_cs_ioctl+0x7f4/0x1ed0 [amdgpu]
[73690.884531]  ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu]
[73690.884550]  drm_ioctl_kernel+0x8c/0xe0 [drm]
[73690.884563]  drm_ioctl+0x20f/0x3a0 [drm]
[73690.884623]  ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu]
[73690.884625]  ? sched_clock+0x5/0x10
[73690.884628]  ? sched_clock_cpu+0xc/0xb0
[73690.884631]  ? lockdep_hardirqs_on_prepare+0xff/0x180
[73690.884632]  ? _raw_spin_unlock_irqrestore+0x41/0x50
[73690.884684]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[73690.884688]  __x64_sys_ioctl+0x83/0xb0
[73690.884691]  do_syscall_64+0x33/0x40
[73690.884693]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[73690.884695] RIP: 0033:0x7fe3209e64cb
[73690.884697] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c
ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d b9 0c 00 f7 d8 64 89
01 48
[73690.884699] RSP: 002b:060db248 EFLAGS: 0246 ORIG_RAX:
0010
[73690.884701] RAX: ffda RBX: 060db2d0 RCX: 7fe3209e64cb
[73690.884702] RDX: 060db2d0 RSI: c0186444 RDI: 00d4
[73690.884703] RBP: c0186444 R08: 7fe1bd653780 R09: 060db290
[73690.884705] R10:  R11: 0246 R12: 7fe17d
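The subject line quotes a debug check in the wound/wait mutex code. As described in the ww_mutex design documentation, the rule works like this: once a lock attempt in an acquire context returns -EDEADLK, the context must release every ww_mutex it holds. It must then take the contended lock first (ww_mutex_lock_slow() in the kernel API) before locking anything else. The sketch below is a toy userspace model of only that bookkeeping. The toy_* names are hypothetical. It is not the kernel's implementation and does not model waiting, wounding or stamps.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the wound/wait debug bookkeeping; not kernel code. */
struct toy_ww_lock {
	bool held;
};

struct toy_ww_ctx {
	int acquired;                        /* locks currently held by this context */
	struct toy_ww_lock *contending_lock; /* lock that forced a back-off, if any */
	int warnings;                        /* number of rule violations seen */
};

/* A lock attempt had to back off: in the kernel API, ww_mutex_lock()
 * would return -EDEADLK at this point. */
static void toy_backoff(struct toy_ww_ctx *ctx, struct toy_ww_lock *lock)
{
	ctx->contending_lock = lock;
}

static void toy_lock(struct toy_ww_ctx *ctx, struct toy_ww_lock *lock)
{
	if (ctx->contending_lock) {
		/* After a back-off, the next lock must be the contended one... */
		if (ctx->contending_lock != lock)
			ctx->warnings++;
		/* ...and every other lock must have been released first. */
		if (ctx->acquired > 0)
			ctx->warnings++;
		ctx->contending_lock = NULL;
	}
	lock->held = true;
	ctx->acquired++;
}

static void toy_unlock(struct toy_ww_ctx *ctx, struct toy_ww_lock *lock)
{
	lock->held = false;
	ctx->acquired--;
}
```

The correct sequence (lock A, back off on B, unlock A, lock B, lock A) counts no warning. Locking a third lock C while still holding A after backing off on B trips both checks.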

Re: [bugreport] [5.10] warning at net/netfilter/nf_tables_api.c:622

2020-10-16 Thread Mikhail Gavrilov
On Fri, 16 Oct 2020 at 12:11, Mikhail Gavrilov
 wrote:
>
> Hi folks,
> today I joined in testing kernel 5.10 and see this warning on every
> boot:
>
> [   22.180180] [ cut here ]
> [   22.180193] WARNING: CPU: 28 PID: 1205 at
> net/netfilter/nf_tables_api.c:622 nft_chain_parse_hook+0x224/0x330
> [nf_tables]
> [   22.180194] Modules linked in: nf_tables nfnetlink ip6table_filter
> ip6_tables iptable_filter cmac bnep sunrpc vfat fat
> snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
> snd_hda_codec_hdmi mt76x2u mt76x2_common mt76x02_usb iwlmvm mt76_usb
> uvcvideo snd_hda_intel mt76x02_lib gspca_zc3xx snd_intel_dspcfg btusb
> gspca_main videobuf2_vmalloc btrtl mt76 edac_mce_amd snd_hda_codec
> btbcm videobuf2_memops btintel kvm_amd snd_usb_audio videobuf2_v4l2
> snd_hda_core mac80211 kvm bluetooth snd_usbmidi_lib joydev
> videobuf2_common iwlwifi snd_seq xpad snd_hwdep ff_memless videodev
> snd_rawmidi snd_seq_device libarc4 eeepc_wmi snd_pcm ecdh_generic
> irqbypass asus_wmi mc rapl sparse_keymap ecc snd_timer sp5100_tco
> video cfg80211 wmi_bmof pcspkr snd k10temp i2c_piix4 soundcore rfkill
> acpi_cpufreq binfmt_misc zram ip_tables hid_logitech_hidpp
> hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm drm_kms_helper
> crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ccp
> ghash_clmulni_intel igb nvme dca nvme_core
> [   22.180273]  i2c_algo_bit wmi pinctrl_amd fuse
> [   22.180279] CPU: 28 PID: 1205 Comm: ebtables Not tainted
> 5.10.0-0.rc0.20201014gitb5fc7a89e58b.41.fc34.x86_64 #1
> [   22.180281] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
> [   22.180289] RIP: 0010:nft_chain_parse_hook+0x224/0x330 [nf_tables]
> [   22.180292] Code: a0 14 00 00 be ff ff ff ff e8 68 82 e1 e4 85 c0
> 0f 85 21 fe ff ff 0f 0b bf 0a 00 00 00 e8 14 60 97 ff 84 c0 0f 84 1f
> fe ff ff <0f> 0b e9 18 fe ff ff 48 85 f6 74 61 4c 89 ef e8 78 d0 ff ff
> 48 89
> [   22.180294] RSP: 0018:a9850214f780 EFLAGS: 00010202
> [   22.180296] RAX: 0001 RBX: a9850214f810 RCX: 
> [   22.180297] RDX: a9850214f810 RSI:  RDI: c0851c20
> [   22.180299] RBP: 0007 R08: 0001 R09: a9850214f847
> [   22.180300] R10:  R11: 0007 R12: a9850214fa88
> [   22.180301] R13: a6fdfcc0 R14: a9850214fa88 R15: 993c5c12c800
> [   22.180304] FS:  7ff92ed99540() GS:993c8a20()
> knlGS:
> [   22.180305] CS:  0010 DS:  ES:  CR0: 80050033
> [   22.180307] CR2: 7ff92ed1e000 CR3: 0007d3714000 CR4: 00350ee0
> [   22.180308] Call Trace:
> [   22.180319]  ? __rhashtable_lookup+0x11d/0x210 [nf_tables]
> [   22.180329]  nf_tables_addchain.constprop.0+0xab/0x5e0 [nf_tables]
> [   22.180337]  ? nft_chain_lookup.part.0+0x12c/0x1e0 [nf_tables]
> [   22.180344]  ? get_order+0x20/0x20 [nf_tables]
> [   22.180350]  ? nft_chain_hash+0x30/0x30 [nf_tables]
> [   22.180356]  ? nft_dump_register+0x40/0x40 [nf_tables]
> [   22.180368]  nf_tables_newchain+0x54d/0x730 [nf_tables]
> [   22.180376]  nfnetlink_rcv_batch+0x2a4/0x950 [nfnetlink]
> [   22.180385]  ? lock_acquire+0x175/0x400
> [   22.180387]  ? lock_release+0x1e7/0x400
> [   22.180391]  ? cred_has_capability.isra.0+0x68/0x100
> [   22.180395]  ? __nla_validate_parse+0x4f/0x8d0
> [   22.180401]  nfnetlink_rcv+0x115/0x130 [nfnetlink]
> [   22.180407]  netlink_unicast+0x16d/0x230
> [   22.180426]  netlink_sendmsg+0x23f/0x460
> [   22.180431]  sock_sendmsg+0x5e/0x60
> [   22.180434]  sys_sendmsg+0x231/0x270
> [   22.180438]  ? import_iovec+0x17/0x20
> [   22.180440]  ? sendmsg_copy_msghdr+0x5c/0x80
> [   22.180444]  ___sys_sendmsg+0x75/0xb0
> [   22.180450]  ? cred_has_capability.isra.0+0x68/0x100
> [   22.180452]  ? lock_acquire+0x175/0x400
> [   22.180454]  ? lock_acquire+0x93/0x400
> [   22.180457]  ? lock_release+0x1e7/0x400
> [   22.180459]  ? lock_release+0x1e7/0x400
> [   22.180462]  ? trace_hardirqs_on+0x1b/0xe0
> [   22.180465]  ? sock_setsockopt+0xdf/0x1010
> [   22.180467]  ? __local_bh_enable_ip+0x82/0xd0
> [   22.180470]  ? sock_setsockopt+0xdf/0x1010
> [   22.180473]  __sys_sendmsg+0x49/0x80
> [   22.180480]  do_syscall_64+0x33/0x40
> [   22.180483]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   22.180486] RIP: 0033:0x7ff92efdb087
> [   22.180488] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7
> 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74
> 24 10
> [   22.180490] RSP: 002b:7fff54436b38 EFLAGS: 0246 ORIG_RAX:
> 002e
> [   22.180492] RAX: ffda RBX: 7fff54436b40 RCX: 7ff92efdb087
> [   22.180494] RDX:  RSI: 7fff54437be0 RDI: 0003
> [   22.180495] RBP: 7fff544381e0 R08: 0004 R09: 55b281bcf1d0
> [   22.1804

[bugreport] [5.10] warning at net/netfilter/nf_tables_api.c:622

2020-10-16 Thread Mikhail Gavrilov
Hi folks,
today I joined in testing kernel 5.10 and see this warning on every
boot:

[   22.180180] [ cut here ]
[   22.180193] WARNING: CPU: 28 PID: 1205 at
net/netfilter/nf_tables_api.c:622 nft_chain_parse_hook+0x224/0x330
[nf_tables]
[   22.180194] Modules linked in: nf_tables nfnetlink ip6table_filter
ip6_tables iptable_filter cmac bnep sunrpc vfat fat
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi mt76x2u mt76x2_common mt76x02_usb iwlmvm mt76_usb
uvcvideo snd_hda_intel mt76x02_lib gspca_zc3xx snd_intel_dspcfg btusb
gspca_main videobuf2_vmalloc btrtl mt76 edac_mce_amd snd_hda_codec
btbcm videobuf2_memops btintel kvm_amd snd_usb_audio videobuf2_v4l2
snd_hda_core mac80211 kvm bluetooth snd_usbmidi_lib joydev
videobuf2_common iwlwifi snd_seq xpad snd_hwdep ff_memless videodev
snd_rawmidi snd_seq_device libarc4 eeepc_wmi snd_pcm ecdh_generic
irqbypass asus_wmi mc rapl sparse_keymap ecc snd_timer sp5100_tco
video cfg80211 wmi_bmof pcspkr snd k10temp i2c_piix4 soundcore rfkill
acpi_cpufreq binfmt_misc zram ip_tables hid_logitech_hidpp
hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm drm_kms_helper
crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ccp
ghash_clmulni_intel igb nvme dca nvme_core
[   22.180273]  i2c_algo_bit wmi pinctrl_amd fuse
[   22.180279] CPU: 28 PID: 1205 Comm: ebtables Not tainted
5.10.0-0.rc0.20201014gitb5fc7a89e58b.41.fc34.x86_64 #1
[   22.180281] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 2606 08/13/2020
[   22.180289] RIP: 0010:nft_chain_parse_hook+0x224/0x330 [nf_tables]
[   22.180292] Code: a0 14 00 00 be ff ff ff ff e8 68 82 e1 e4 85 c0
0f 85 21 fe ff ff 0f 0b bf 0a 00 00 00 e8 14 60 97 ff 84 c0 0f 84 1f
fe ff ff <0f> 0b e9 18 fe ff ff 48 85 f6 74 61 4c 89 ef e8 78 d0 ff ff
48 89
[   22.180294] RSP: 0018:a9850214f780 EFLAGS: 00010202
[   22.180296] RAX: 0001 RBX: a9850214f810 RCX: 
[   22.180297] RDX: a9850214f810 RSI:  RDI: c0851c20
[   22.180299] RBP: 0007 R08: 0001 R09: a9850214f847
[   22.180300] R10:  R11: 0007 R12: a9850214fa88
[   22.180301] R13: a6fdfcc0 R14: a9850214fa88 R15: 993c5c12c800
[   22.180304] FS:  7ff92ed99540() GS:993c8a20()
knlGS:
[   22.180305] CS:  0010 DS:  ES:  CR0: 80050033
[   22.180307] CR2: 7ff92ed1e000 CR3: 0007d3714000 CR4: 00350ee0
[   22.180308] Call Trace:
[   22.180319]  ? __rhashtable_lookup+0x11d/0x210 [nf_tables]
[   22.180329]  nf_tables_addchain.constprop.0+0xab/0x5e0 [nf_tables]
[   22.180337]  ? nft_chain_lookup.part.0+0x12c/0x1e0 [nf_tables]
[   22.180344]  ? get_order+0x20/0x20 [nf_tables]
[   22.180350]  ? nft_chain_hash+0x30/0x30 [nf_tables]
[   22.180356]  ? nft_dump_register+0x40/0x40 [nf_tables]
[   22.180368]  nf_tables_newchain+0x54d/0x730 [nf_tables]
[   22.180376]  nfnetlink_rcv_batch+0x2a4/0x950 [nfnetlink]
[   22.180385]  ? lock_acquire+0x175/0x400
[   22.180387]  ? lock_release+0x1e7/0x400
[   22.180391]  ? cred_has_capability.isra.0+0x68/0x100
[   22.180395]  ? __nla_validate_parse+0x4f/0x8d0
[   22.180401]  nfnetlink_rcv+0x115/0x130 [nfnetlink]
[   22.180407]  netlink_unicast+0x16d/0x230
[   22.180426]  netlink_sendmsg+0x23f/0x460
[   22.180431]  sock_sendmsg+0x5e/0x60
[   22.180434]  sys_sendmsg+0x231/0x270
[   22.180438]  ? import_iovec+0x17/0x20
[   22.180440]  ? sendmsg_copy_msghdr+0x5c/0x80
[   22.180444]  ___sys_sendmsg+0x75/0xb0
[   22.180450]  ? cred_has_capability.isra.0+0x68/0x100
[   22.180452]  ? lock_acquire+0x175/0x400
[   22.180454]  ? lock_acquire+0x93/0x400
[   22.180457]  ? lock_release+0x1e7/0x400
[   22.180459]  ? lock_release+0x1e7/0x400
[   22.180462]  ? trace_hardirqs_on+0x1b/0xe0
[   22.180465]  ? sock_setsockopt+0xdf/0x1010
[   22.180467]  ? __local_bh_enable_ip+0x82/0xd0
[   22.180470]  ? sock_setsockopt+0xdf/0x1010
[   22.180473]  __sys_sendmsg+0x49/0x80
[   22.180480]  do_syscall_64+0x33/0x40
[   22.180483]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   22.180486] RIP: 0033:0x7ff92efdb087
[   22.180488] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7
0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74
24 10
[   22.180490] RSP: 002b:7fff54436b38 EFLAGS: 0246 ORIG_RAX:
002e
[   22.180492] RAX: ffda RBX: 7fff54436b40 RCX: 7ff92efdb087
[   22.180494] RDX:  RSI: 7fff54437be0 RDI: 0003
[   22.180495] RBP: 7fff544381e0 R08: 0004 R09: 55b281bcf1d0
[   22.180496] R10: 7fff54437bcc R11: 0246 R12: 7000
[   22.180497] R13: 0001 R14: 7fff54436b50 R15: 7fff54438200
[   22.180503] irq event stamp: 0
[   22.180505] hardirqs last  enabled at (0): [<>] 0x0
[  

[bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2020-10-09 Thread Mikhail Gavrilov
Paolo, Jens, I am sorry for the noise.
But today I hit a kernel panic, and git blame says that you
created the file in which the panic happened (I saw this from the trace):

$ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/vmlinux __bfq_deactivate_entity+0x15a
__bfq_deactivate_entity+0x15a/0x240:
bfq_gt at block/bfq-wf2q.c:20
(inlined by) bfq_insert at block/bfq-wf2q.c:381
(inlined by) bfq_idle_insert at block/bfq-wf2q.c:621
(inlined by) __bfq_deactivate_entity at block/bfq-wf2q.c:1203

https://github.com/torvalds/linux/blame/master/block/bfq-wf2q.c#L1203
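The token `__bfq_deactivate_entity+0x15a/0x240` in the trace packs three things: the function symbol, the offset of the faulting address from the start of that function, and the function's total size. faddr2line uses the vmlinux symbol table and debug info to turn symbol+offset into the file:line chain shown above. The sketch below only splits such a token and sanity-checks it. It does not symbolize anything. `struct frame` and parse_frame() are hypothetical names.

```c
#include <stdio.h>

#define SYM_MAX 128

struct frame {
	char sym[SYM_MAX];   /* function symbol */
	unsigned long off;   /* offset of the address inside sym */
	unsigned long size;  /* total size of sym in bytes */
};

/* Split "sym+0xOFF/0xSIZE" into its parts. Returns 0 on success and -1 if
 * the token does not have that shape or the offset lies outside the
 * function (which would indicate a bogus frame). */
static int parse_frame(const char *tok, struct frame *f)
{
	/* %lx accepts an optional 0x prefix, like strtoul(..., 16). */
	if (sscanf(tok, "%127[^+]+%lx/%lx", f->sym, &f->off, &f->size) != 3)
		return -1;
	return f->off < f->size ? 0 : -1;
}
```

For the frame above this yields offset 346 (0x15a) inside a 576-byte (0x240) function.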

$ head /sys/block/*/queue/scheduler
==> /sys/block/nvme0n1/queue/scheduler <==
[none] mq-deadline kyber bfq

==> /sys/block/sda/queue/scheduler <==
mq-deadline kyber [bfq] none

==> /sys/block/zram0/queue/scheduler <==
none
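In each sysfs scheduler file, the bracketed name is the scheduler currently in use. So only sda is running bfq here. nvme0n1 uses none, and zram0 shows only "none", without brackets. The sketch below extracts the active entry from one such line. active_scheduler() is a hypothetical helper, and treating a line without brackets as an error is a choice of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Copy the bracketed (active) entry of a line such as
 * "mq-deadline kyber [bfq] none" into out. Returns 0 on success, -1 if
 * there is no bracketed entry or out is too small. */
static int active_scheduler(const char *line, char *out, size_t out_len)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;

	if (!open || !close)
		return -1;
	size_t n = (size_t)(close - open - 1);
	if (n + 1 > out_len)
		return -1;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}
```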

Trace:
general protection fault, probably for non-canonical address
0x46b1b0f0d8856e4a:  [#1] SMP NOPTI
CPU: 27 PID: 1018 Comm: kworker/27:1H Tainted: GW
- ---  5.9.0-0.rc8.28.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2606 08/13/2020
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:__bfq_deactivate_entity+0x15a/0x240
Code: 48 2b 41 28 48 85 c0 7e 05 49 89 5c 24 18 49 8b 44 24 08 4d 8d
74 24 08 48 85 c0 0f 84 d6 00 00 00 48 8b 7b 28 eb 03 48 89 c8 <48> 8b
48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
RSP: 0018:adf6c0c6fc00 EFLAGS: 00010002
RAX: 46b1b0f0d8856e4a RBX: 8dc2773b5c88 RCX: 46b1b0f0d8856e4a
RDX: 8dc7d02ed0a0 RSI: 8dc7d02ed0a8 RDI: 584e64e96beb
RBP: 8dc2773b5c00 R08: 8dc9054cb938 R09: 
R10: 0018 R11: 0018 R12: 8dc904927150
R13: 0001 R14: 8dc904927158 R15: 8dc2773b5c88
FS:  () GS:8dc90e0c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
Call Trace:
 bfq_deactivate_entity+0x4f/0xc0
 bfq_del_bfqq_busy+0xbf/0x170
 __bfq_bfqq_expire+0x95/0xc0
 bfq_bfqq_expire+0x3c5/0x9a0
 ? bfq_active_extract+0x8e/0x140
 bfq_dispatch_request+0x438/0x1070
 __blk_mq_do_dispatch_sched+0x1c7/0x290
 ? dequeue_entity+0xa4/0x420
 __blk_mq_sched_dispatch_requests+0x129/0x180
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x49/0x110
 process_one_work+0x1b4/0x370
 worker_thread+0x53/0x3e0
 ? process_one_work+0x370/0x370
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
Modules linked in: tun snd_seq_dummy snd_hrtimer uinput rfcomm
xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp
nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set
nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac
bnep sunrpc vfat fat mt76x2u snd_hda_codec_realtek mt76x2_common
mt76x02_usb snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi
mt76_usb mt76x02_lib edac_mce_amd iwlmvm snd_hda_intel mt76
snd_intel_dspcfg kvm_amd mac80211 gspca_zc3xx snd_usb_audio
snd_hda_codec gspca_main uvcvideo btusb snd_usbmidi_lib iwlwifi
snd_hda_core videobuf2_vmalloc kvm videobuf2_memops btrtl snd_rawmidi
videobuf2_v4l2 snd_hwdep
 btbcm snd_seq btintel videobuf2_common eeepc_wmi irqbypass
snd_seq_device asus_wmi xpad bluetooth joydev sparse_keymap libarc4
rapl cfg80211 ff_memless snd_pcm videodev video pcspkr wmi_bmof
sp5100_tco snd_timer mc k10temp i2c_piix4 snd ecdh_generic ecc
soundcore rfkill acpi_cpufreq binfmt_misc zram ip_tables
hid_logitech_hidpp hid_logitech_dj amdgpu iommu_v2 gpu_sched ttm
drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm ccp
igb ghash_clmulni_intel nvme nvme_core dca i2c_algo_bit wmi
pinctrl_amd fuse
---[ end trace 09deb55d1b05f40c ]---
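"Non-canonical address" in the first line of the trace means the value could not be a valid x86-64 virtual address at all. This usually points to a corrupted or uninitialized pointer rather than an unmapped one. The same value, 46b1b0f0d8856e4a, appears in RAX and RCX in the register dump. The check below is a minimal sketch that assumes 4-level paging: 48-bit virtual addresses, where bits 63..47 must all be copies of bit 47. Kernels using 5-level paging sign-extend from bit 56 instead. is_canonical_48() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4-level paging: an address is canonical when bits 63..47 (17 bits)
 * are either all zeros (user half) or all ones (kernel half). */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t upper = addr >> 47;

	return upper == 0 || upper == 0x1ffff;
}
```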


Full system log: https://pastebin.com/6cKHZzAi
Full kernel log: https://pastebin.com/316HjHit

Unfortunately, I don't know how to reproduce this bug. I was not doing
anything unusual on the computer when it happened.
I can provide any info that would be useful for further investigation.


--
Best Regards,
Mike Gavrilov.


Re: [Kgdb-bugreport] [PATCH] serial: qcom_geni_serial: Fix recent kdb hang

2020-08-13 Thread Daniel Thompson
On Tue, Aug 11, 2020 at 09:21:22AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Aug 11, 2020 at 4:54 AM Akash Asthana  wrote:
> >
> >
> > On 8/11/2020 2:56 AM, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Mon, Aug 10, 2020 at 5:32 AM Akash Asthana  
> > > wrote:
> > >> Hi Doug,
> > >>
> > >> On 8/7/2020 10:49 AM, Douglas Anderson wrote:
> > >>> The commit e42d6c3ec0c7 ("serial: qcom_geni_serial: Make kgdb work
> > >>> even if UART isn't console") worked pretty well and I've been doing a
> > >>> lot of debugging with it.  However, recently I typed "dmesg" in kdb
> > >>> and then held the space key down to scroll through the pagination.  My
> > >>> device hung.  This was repeatable and I found that it was introduced
> > >>> with the aforementioned commit.
> > >>>
> > >>> It turns out that there are some strange boundary cases in geni where
> > >>> in some weird situations it will signal RX_LAST but then will put 0 in
> > >>> RX_LAST_BYTE.  This means that the entire last FIFO entry is valid.
> > >> IMO that means we received a word in RX_FIFO and it is the last word
> > >> hence RX_LAST bit is set.
> > > What you say would make logical sense, but it's not how I have
> > > observed geni to work.  See below.
> > >
> > >
> > >> RX_LAST_BYTE is 0 means none of the bytes are valid in the last word.
> > > This would imply that qcom_geni_serial_handle_rx() is also broken
> > > though, wouldn't it?  Specifically imagine that WORD_CNT is 1 and
> > > RX_LAST is set and RX_LAST_BYTE_VALID is true.  Here's the logic from
> > > that function:
> > >
> > >    total_bytes = BYTES_PER_FIFO_WORD * (word_cnt - 1);
> > >    if (last_word_partial && last_word_byte_cnt)
> > >            total_bytes += last_word_byte_cnt;
> > >    else
> > >            total_bytes += BYTES_PER_FIFO_WORD;
> > >    port->handle_rx(uport, total_bytes, drop);
> > >
> > > As you can see that logic will set "total_bytes" to 4 in the case I'm
> > > talking about.
> >
> > Yeah, IMO in theory this should also be corrected, but you have
> > already run a few experiments showing that the garbage-data issue (which I
> > was suspecting) is not seen.
> >
> > It's already consistent with the existing logic and it behaves well
> > in practice, so the changes could be merged. Meanwhile I am checking
> > with the HW team to get clarity.
> >
> > >
> > >
> > >> In such scenario we should just read RX_FIFO buffer (to empty it),
> > >> discard the word and return NO_POLL_CHAR. Something like below.
> > >>
> > >> -
> > >>
> > >>   else
> > >>   private_data->poll_cached_bytes_cnt = 4;
> > >>
> > >>   private_data->poll_cached_bytes =
> > >>   readl(uport->membase + SE_GENI_RX_FIFOn);
> > >>   }
> > >>
> > >> +if (!private_data->poll_cached_bytes_cnt)
> > >> +  return NO_POLL_CHAR;
> > >>   private_data->poll_cached_bytes_cnt--;
> > >>   ret = private_data->poll_cached_bytes & 0xff;
> > >> -
> > >>
> > >> Please let me know whether above code helps.
> > > Your code will avoid the hang.  Yes.  ...but it will drop bytes.  I
> > > devised a quick-n-dirty test.  Here's a test of your code:
> > I assumed those were invalid bytes and didn't want to read them, so yes,
> > dropping bytes was expected.
> > >
> > > https://crrev.com/c/2346886
> > >
> > > ...and here's a test of my code:
> > >
> > > https://crrev.com/c/2346884
> > >
> > > I had to keep a buffer around since it's hard to debug the serial
> > > driver.  In both cases I put "DOUG" into the buffer when I detect this
> > > case.  If my theory about how geni worked was wrong then we should
> > > expect to see some garbage in the buffer right after the DOUG, right?
> > > ...but my code gets the alphabet in nice sequence.  Your code drops 4
> > > bytes.
> > Yeah I was expecting garbage data.
> > >
> > >
> > > NOTE: while poking around with the above two test patches I found it
> > > was pretty easy to get geni to drop bytes / hit overflow cases and
> > > also to insert bogus 0 bytes in the stream (I believe these are
> > > related).  I was able to reproduce this:
> > > * With ${SUBJECT} patch in place.
> > > * With your proposed patch.
> > > * With the recent "geni" patches reverted (in other words back to 1
> > > byte per FIFO entry).
> > >
> > > It's not terribly surprising that we're overflowing since I believe
> > > kgdb isn't too keen to read characters at the same time it's writing.
> > > That doesn't explain the weird 0-bytes that geni seemed to be
> > > inserting, but at least it would explain the overflows.  However, even
> > > after I fixed this I _still_ was getting problems.  Specifically geni
> > > 
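Doug's arithmetic can be checked with a standalone copy of the quoted expression. The parameters mirror the quoted snippet. BYTES_PER_FIFO_WORD = 4 is an assumption, based on the "total_bytes to 4" result and the `poll_cached_bytes_cnt = 4` line in the thread. The word_cnt == 0 guard is an addition of this sketch, not driver code.

```c
#include <stdbool.h>

/* Assumed from the thread: each RX FIFO word carries 4 bytes. */
#define BYTES_PER_FIFO_WORD 4

/* Standalone copy of the quoted byte-count logic from
 * qcom_geni_serial_handle_rx(); the word_cnt == 0 guard is added here
 * only so the sketch never underflows. */
static unsigned int rx_total_bytes(unsigned int word_cnt, bool last_word_partial,
				   unsigned int last_word_byte_cnt)
{
	unsigned int total_bytes;

	if (word_cnt == 0)
		return 0;
	total_bytes = BYTES_PER_FIFO_WORD * (word_cnt - 1);
	if (last_word_partial && last_word_byte_cnt)
		total_bytes += last_word_byte_cnt;
	else
		total_bytes += BYTES_PER_FIFO_WORD; /* whole last word counted */
	return total_bytes;
}
```

rx_total_bytes(1, true, 0) yields 4. A partial last word whose byte count reads 0 falls through to the else branch and is treated as a full word. This is the interpretation Doug argues the poll path should also use.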

Re: [5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]

2020-07-13 Thread Vasily Averin
On 7/13/20 11:02 AM, Mikhail Gavrilov wrote:
> On Mon, 13 Jul 2020 at 12:11, Mikhail Gavrilov
>  wrote:
>>
>> On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov
>>  wrote:
>>>
>>> Hi folks.
>>> While testing the 5.8 RCs I found that the kernel log gets flooded with the message
>>> "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
>>> tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
>>> Kernel 5.7 does not have this problem.
>>
>> Maxim, I suppose you left `WARN_ON(!wpa->ia.ap.num_pages);` in for debugging
>> purposes?
>> Now this warning fires often when I start the container.
>>
> 
> That's odd, but I can't send an email to the author of the commit:
> mpatlasov wasn't found at virtuozzo.com.

The reported problem is not fixed yet in 5.8-rc kernels.
Please take a look at
https://lkml.org/lkml/2020/7/13/265

Thank you,
Vasily Averin


Re: [5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]

2020-07-13 Thread Mikhail Gavrilov
On Mon, 13 Jul 2020 at 12:11, Mikhail Gavrilov
 wrote:
>
> On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov
>  wrote:
> >
> > Hi folks.
> > While testing the 5.8 RCs I found that the kernel log gets flooded with the message
> > "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
> > tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
> > Kernel 5.7 does not have this problem.
>
> Maxim, I suppose you left `WARN_ON(!wpa->ia.ap.num_pages);` in for debugging
> purposes?
> Now this warning fires often when I start the container.
>

That's odd, but I can't send an email to the author of the commit:
mpatlasov wasn't found at virtuozzo.com.

--
Best Regards,
Mike Gavrilov.


Re: [5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]

2020-07-13 Thread Mikhail Gavrilov
On Mon, 13 Jul 2020 at 03:28, Mikhail Gavrilov
 wrote:
>
> Hi folks.
> While testing the 5.8 RCs I found that the kernel log gets flooded with the message
> "WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
> tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
> Kernel 5.7 does not have this problem.

Maxim, I suppose you left `WARN_ON(!wpa->ia.ap.num_pages);` in for debugging purposes?
Now this warning fires often when I start the container.

--
Best Regards,
Mike Gavrilov.


[5.8RC4][bugreport]WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684 tree_insert+0xaf/0xc0 [fuse]

2020-07-12 Thread Mikhail Gavrilov
Hi folks.
While testing the 5.8 RCs I found that the kernel log gets flooded with the message
"WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
tree_insert+0xaf/0xc0 [fuse]" when I start a podman container.
Kernel 5.7 does not have this problem.

[92414.864536] [ cut here ]
[92414.864648] WARNING: CPU: 28 PID: 211236 at fs/fuse/file.c:1684
tree_insert+0xaf/0xc0 [fuse]
[92414.864652] Modules linked in: snd_seq_dummy snd_hrtimer uinput
rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp
nf_conntrack_tftp tun bridge stp llc nft_objref
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
iptable_filter cmac bnep sunrpc vfat fat snd_usb_audio snd_usbmidi_lib
snd_rawmidi hid_logitech_hidpp gspca_zc3xx gspca_main
videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2
videobuf2_common mt76x2u mt76x2_common videodev mt76x02_usb mt76_usb
mt76x02_lib xpad mc mt76 hid_logitech_dj ff_memless
snd_hda_codec_realtek snd_hda_codec_generic iwlmvm ledtrig_audio
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec
[92414.864697]  mac80211 snd_hda_core edac_mce_amd amd_energy
snd_hwdep btusb btrtl btbcm snd_seq kvm_amd libarc4 btintel
snd_seq_device bluetooth kvm snd_pcm iwlwifi eeepc_wmi asus_wmi
snd_timer ecdh_generic irqbypass ecc snd sparse_keymap rapl cfg80211
video wmi_bmof pcspkr soundcore sp5100_tco k10temp i2c_piix4 rfkill
acpi_cpufreq binfmt_misc ip_tables amdgpu iommu_v2 gpu_sched ttm
drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm igb
ghash_clmulni_intel ccp nvme dca xhci_pci nvme_core xhci_pci_renesas
i2c_algo_bit wmi pinctrl_amd fuse
[92414.864738] CPU: 28 PID: 211236 Comm: sed Not tainted
5.8.0-0.rc4.20200709git0bddd227f3dc.1.fc33.x86_64 #1
[92414.864742] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 1407 04/02/2020
[92414.864749] RIP: 0010:tree_insert+0xaf/0xc0 [fuse]
[92414.864753] Code: 80 c8 00 00 00 49 c7 80 d0 00 00 00 00 00 00 00
49 c7 80 d8 00 00 00 00 00 00 00 48 89 39 e9 78 35 5f d7 0f 0b eb a5
0f 0b c3 <0f> 0b e9 71 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
00 00
[92414.864757] RSP: 0018:b9b08b66f970 EFLAGS: 00010246
[92414.864761] RAX: 001c RBX: b9b08b66fac8 RCX: 8c6318c6318c6319
[92414.864765] RDX:  RSI:  RDI: 9beac944fce8
[92414.864768] RBP: eef599772a80 R08: 9bee81360d00 R09: 
[92414.864772] R10: 9beac944fce8 R11:  R12: eef584fe7b80
[92414.864775] R13: 9beac944f800 R14: 9beac944fd98 R15: 9bee81360d00
[92414.864780] FS:  7f98023da840() GS:9bf1bda0()
knlGS:
[92414.864783] CS:  0010 DS:  ES:  CR0: 80050033
[92414.864787] CR2: 7ffc5071f080 CR3: 30a0c000 CR4: 003406e0
[92414.864790] Call Trace:
[92414.864798]  fuse_writepages_fill+0x5cc/0x690 [fuse]
[92414.864810]  write_cache_pages+0x225/0x560
[92414.864819]  ? fuse_writepages+0xe0/0xe0 [fuse]
[92414.864828]  ? rcu_read_lock_sched_held+0x3f/0x80
[92414.864835]  ? trace_kmalloc+0xf2/0x120
[92414.864842]  ? __kmalloc+0x136/0x270
[92414.864848]  ? fuse_writepages+0x5e/0xe0 [fuse]
[92414.864857]  fuse_writepages+0x7d/0xe0 [fuse]
[92414.864867]  do_writepages+0x28/0xb0
[92414.864876]  __writeback_single_inode+0x60/0x6b0
[92414.864884]  writeback_single_inode+0xa7/0x140
[92414.864890]  write_inode_now+0x8b/0xb0
[92414.864904]  fuse_do_setattr+0x42f/0x770 [fuse]
[92414.864914]  ? _raw_spin_unlock+0x1f/0x30
[92414.864921]  ? fuse_do_getattr+0x149/0x2c0 [fuse]
[92414.864946]  fuse_setattr+0x99/0x140 [fuse]
[92414.864954]  notify_change+0x333/0x4a0
[92414.864964]  chown_common+0xec/0x190
[92414.864978]  ksys_fchown+0x6c/0xb0
[92414.864985]  __x64_sys_fchown+0x16/0x20
[92414.864990]  do_syscall_64+0x52/0xb0
[92414.864995]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[92414.865000] RIP: 0033:0x7f9801cc0cd7
[92414.865003] Code: Bad RIP value.
[92414.865007] RSP: 002b:7ffc506abb18 EFLAGS: 0206 ORIG_RAX:
005d
[92414.865011] RAX: ffda RBX: 7ffc506abba0 RCX: 7f9801cc0cd7
[92414.865014] RDX:  RSI:  RDI: 0004
[92414.865018] RBP: 0004 R08: 01cb9e70 R09: 7f98023da840
[92414.865021] R10: 7ffc506ab5a0 R11: 0206 R12: 0003
[92414.865025] R13: 7ffc506acde1 R14:  R15: 
[92414.865040] irq event stamp: 7637
[92414.865045] hardirqs last  enabled at (7645): []
console_unlock+0x4b7/0x6c0
[92414.865049] hardirqs last disabled at (7652): []
console_unlock+0xad/0x6c0
[92414.865103] softirqs last  enab

Re: [Kgdb-bugreport] [PATCH v3] kdb: Remove the misfeature 'KDBFLAGS'

2020-05-28 Thread Daniel Thompson
On Thu, May 21, 2020 at 03:21:25PM +0800, Wei Li wrote:
> Currently, 'KDBFLAGS' is an internal variable of kdb that combines
> 'KDBDEBUG' and the state flags. It is shown only when 'KDBDEBUG'
> is set, and the user can also define an environment variable named
> 'KDBFLAGS'. This is puzzling indeed.
> 
> After discussion with Daniel, it seems that 'KDBFLAGS' is a misfeature.
> So let's replace 'KDBFLAGS' with 'KDBDEBUG' to show just the value that
> was written to it. After this modification, we can use `md4c1 kdb_flags`
> instead to observe the state flags.
> 
> Suggested-by: Daniel Thompson 
> Signed-off-by: Wei Li 

Applied. Thanks.


Daniel.

> ---
> v2 -> v3:
>  - Change to replace the internal env 'KDBFLAGS' with 'KDBDEBUG'.
> v1 -> v2:
>  - Fix lack of braces.
> 
>  kernel/debug/kdb/kdb_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index 4fc43fb17127..392029287083 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -418,8 +418,7 @@ int kdb_set(int argc, const char **argv)
>   argv[2]);
>   return 0;
>   }
> - kdb_flags = (kdb_flags &
> -  ~(KDB_DEBUG_FLAG_MASK << KDB_DEBUG_FLAG_SHIFT))
> + kdb_flags = (kdb_flags & ~KDB_DEBUG(MASK))
>   | (debugflags << KDB_DEBUG_FLAG_SHIFT);
>  
>   return 0;
> @@ -2081,7 +2080,8 @@ static int kdb_env(int argc, const char **argv)
>   }
>  
>   if (KDB_DEBUG(MASK))
> - kdb_printf("KDBFLAGS=0x%x\n", kdb_flags);
> + kdb_printf("KDBDEBUG=0x%x\n",
> + (kdb_flags & KDB_DEBUG(MASK)) >> KDB_DEBUG_FLAG_SHIFT);
>  
>   return 0;
>  }
> -- 
> 2.17.1
> 


Re: [Kgdb-bugreport] [PATCH v3] kdb: Remove the misfeature 'KDBFLAGS'

2020-05-21 Thread Daniel Thompson
On Thu, May 21, 2020 at 03:21:25PM +0800, Wei Li wrote:
> Currently, 'KDBFLAGS' is an internal variable of kdb that combines
> 'KDBDEBUG' and the state flags. It is shown only when 'KDBDEBUG'
> is set, and the user can also define an environment variable named
> 'KDBFLAGS'. This is puzzling indeed.
> 
> After discussion with Daniel, it seems that 'KDBFLAGS' is a misfeature.
> So let's replace 'KDBFLAGS' with 'KDBDEBUG' to show just the value that
> was written to it. After this modification, we can use `md4c1 kdb_flags`
> instead to observe the state flags.
> 
> Suggested-by: Daniel Thompson 
> Signed-off-by: Wei Li 
> ---
> v2 -> v3:
>  - Change to replace the internal env 'KDBFLAGS' with 'KDBDEBUG'.
> v1 -> v2:
>  - Fix lack of braces.
> 
>  kernel/debug/kdb/kdb_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index 4fc43fb17127..392029287083 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -418,8 +418,7 @@ int kdb_set(int argc, const char **argv)
>   argv[2]);
>   return 0;
>   }
> - kdb_flags = (kdb_flags &
> -  ~(KDB_DEBUG_FLAG_MASK << KDB_DEBUG_FLAG_SHIFT))
> + kdb_flags = (kdb_flags & ~KDB_DEBUG(MASK))
>   | (debugflags << KDB_DEBUG_FLAG_SHIFT);
>  
>   return 0;
> @@ -2081,7 +2080,8 @@ static int kdb_env(int argc, const char **argv)
>   }
>  
>   if (KDB_DEBUG(MASK))
> - kdb_printf("KDBFLAGS=0x%x\n", kdb_flags);
> + kdb_printf("KDBDEBUG=0x%x\n",
> + (kdb_flags & KDB_DEBUG(MASK)) >> KDB_DEBUG_FLAG_SHIFT);

For this expression to work correctly, kdb_flags needs to be unsigned
(otherwise we get an arithmetic right shift and mis-report when
KDBDEBUG == 0xfff).

This is just FYI, I think I can fix this up when applying...


Daniel.


Re: [Kgdb-bugreport] [PATCH v3 04/11] kgdb: Delay "kgdbwait" to dbg_late_init() by default

2020-05-04 Thread Daniel Thompson
On Thu, Apr 30, 2020 at 09:35:30AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Thu, Apr 30, 2020 at 8:49 AM Daniel Thompson
>  wrote:
> >
> > On Tue, Apr 28, 2020 at 02:13:44PM -0700, Douglas Anderson wrote:
> > > Using kgdb requires at least some level of architecture-level
> > > initialization.  If nothing else, it relies on the architecture to
> > > pass breakpoints / crashes onto kgdb.
> > >
> > > On some architectures this all works super early, specifically it
> > > starts working at some point in time before Linux parses
> > > early_params.  On other architectures it doesn't.  A survey of a few
> > > platforms:
> > >
> > > a) x86: Presumably it all works early since "ekgdboc" is documented to
> > >    work here.
> > > b) arm64: Catching crashes works; with a simple patch breakpoints can
> > >    also be made to work.
> > > c) arm: Nothing in kgdb works until
> > >    paging_init() -> devicemaps_init() -> early_trap_init()
> > >
> > > Let's be conservative and, by default, process "kgdbwait" (which tells
> > > the kernel to drop into the debugger ASAP at boot) a bit later at
> > > dbg_late_init() time.  If an architecture has tested it and wants to
> > > re-enable super early debugging, they can select the
> > > ARCH_HAS_EARLY_DEBUG Kconfig option.  We'll do this for x86 to start.
> > > It should be noted that dbg_late_init() is still called quite early in
> > > the system.
> > >
> > > Note that this patch doesn't affect when kgdb runs its init.  If kgdb
> > > is set to initialize early it will still initialize when parsing
> > > early_params.  This patch _only_ inhibits the initial breakpoint from
> > > "kgdbwait".  This means:
> > >
> > > * Without any extra patches arm64 platforms will at least catch
> > >   crashes after kgdb inits.
> > > * arm platforms will catch crashes (and could handle a hardcoded
> > >   kgdb_breakpoint()) any time after early_trap_init() runs, even
> > >   before dbg_late_init().
> > >
> > > Signed-off-by: Douglas Anderson 
> > > Cc: Thomas Gleixner 
> > > Cc: Ingo Molnar 
> > > Cc: Borislav Petkov 
> > > Reviewed-by: Greg Kroah-Hartman 
> >
> > It looks like this patch is triggering some warnings from the existing
> > defconfigs (both x86 and arm64). It looks like this:
> >
> > ---
> > wychelm$ make defconfig
> >   GEN Makefile
> > *** Default configuration is based on 'x86_64_defconfig'
> >
> > WARNING: unmet direct dependencies detected for ARCH_HAS_EARLY_DEBUG
> >   Depends on [n]: KGDB [=n]
> >   Selected by [y]:
> >   - X86 [=y]
> >
> > WARNING: unmet direct dependencies detected for ARCH_HAS_EARLY_DEBUG
> >   Depends on [n]: KGDB [=n]
> >   Selected by [y]:
> >   - X86 [=y]
> 
> Ah, thanks!  I hadn't noticed those.  I think it'd be easy to
> change the relevant patches to "select ARCH_HAS_EARLY_DEBUG if
> KGDB".  If you agree that's a good fix and are willing, I'd be happy
> if you just added it to the relevant patches when applying.  If not, I
> can post a v4.

Happy with the approach to fix this.

Given the follow-on discussion from the end of last week, I suspect there
will need to be a v4 anyway, so perhaps the question of applying a fix-up
is moot at this point?


Daniel.


Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries

2019-05-28 Thread Theodore Ts'o
On Tue, May 28, 2019 at 10:58:03AM +0500, Mikhail Gavrilov wrote:
> On Mon, 27 May 2019 at 21:16, Mikhail Gavrilov
>  wrote:
> >
> > I bisected the issue. I hope it helps to understand what happened on my
> > computer.
> >
> 
> Why does no one answer?
> Even if the problem is known and already fixed, it would be nice to
> know that I did not spend 10 days searching for the problem commit in
> vain and that someone reads my messages.

Sorry, I didn't see your earlier messages; I'm not sure why.  In any
case, yes, it's a known issue, and it's fixed in 5.2-rc2.  This fix
was commit 0a944e8a6c66.

- Ted



Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries

2019-05-27 Thread Mikhail Gavrilov
On Mon, 27 May 2019 at 21:16, Mikhail Gavrilov
 wrote:
>
> I bisected the issue. I hope it helps to understand what happened on my
> computer.
>
> $ git bisect log
> git bisect start
> # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
> git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
> # bad: [7e9890a3500d95c01511a4c45b7e7192dfa47ae2] Merge tag 'ovl-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
> git bisect bad 7e9890a3500d95c01511a4c45b7e7192dfa47ae2
> # good: [80f232121b69cc69a31ccb2b38c1665d770b0710] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect good 80f232121b69cc69a31ccb2b38c1665d770b0710
> # good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
> git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
> # good: [ea5aee6d97fd2d4499b1eebc233861c1def70f06] Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
> git bisect good ea5aee6d97fd2d4499b1eebc233861c1def70f06
> # good: [47782361aca21a32ad4198f1b72f1655a7c9f7e5] Merge tag 'tag-chrome-platform-for-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
> git bisect good 47782361aca21a32ad4198f1b72f1655a7c9f7e5
> # bad: [55472bae5331f33582d9f0e8919fed8bebcda0da] Merge tag 'linux-watchdog-5.2-rc1' of git://www.linux-watchdog.org/linux-watchdog
> git bisect bad 55472bae5331f33582d9f0e8919fed8bebcda0da
> # good: [4dbf09fea60d158e60a30c419e0cfa1ea138dd57] Merge tag 'mtd/for-5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux
> git bisect good 4dbf09fea60d158e60a30c419e0cfa1ea138dd57
> # good: [44affc086e6d5ea868c1184cdc5e1159e90ffb71] watchdog: ts4800_wdt: Convert to use device managed functions and other improvements
> git bisect good 44affc086e6d5ea868c1184cdc5e1159e90ffb71
> # good: [5c09980d9f9de2dc6b255f4f0229aeff0eb2c723] watchdog: imx_sc_wdt: drop warning after calling watchdog_init_timeout
> git bisect good 5c09980d9f9de2dc6b255f4f0229aeff0eb2c723
> # good: [345f16251063bcef5828f17fe90aa7f7a5019aab] watchdog: Improve Kconfig entry ordering and dependencies
> git bisect good 345f16251063bcef5828f17fe90aa7f7a5019aab
> # good: [988bec41318f3fa897e2f8af271bd456936d6caf] ubifs: orphan: Handle xattrs like files
> git bisect good 988bec41318f3fa897e2f8af271bd456936d6caf
> # good: [a65d10f3ce657aa4542b5de78933053f6d1a9e97] ubifs: Drop unnecessary setting of zbr->znode
> git bisect good a65d10f3ce657aa4542b5de78933053f6d1a9e97
> # good: [a9f0bda567e32a2b44165b067adfc4a4f56d1815] watchdog: Enforce that at least one pretimeout governor is enabled
> git bisect good a9f0bda567e32a2b44165b067adfc4a4f56d1815
> # bad: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
> git bisect bad d7a02fa0a8f9ec1b81d57628ca9834563208ef33
> # good: [04d37e5a8b1fad2d625727af3d738c6fd9491720] ubi: wl: Fix uninitialized variable
> git bisect good 04d37e5a8b1fad2d625727af3d738c6fd9491720
> # first bad commit: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
>



Why does no one answer?
Even if the problem is known and already fixed, it would be nice to
know that I did not spend 10 days searching for the problem commit in
vain and that someone reads my messages.


--
Best Regards,
Mike Gavrilov.


Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries

2019-05-27 Thread Mikhail Gavrilov
On Sat, 18 May 2019 at 16:07, Mikhail Gavrilov
 wrote:
>
> It happened again today.
>
> [18018.969636] EXT4-fs error (device nvme0n1p2): ext4_find_extent:908: inode #8: comm jbd2/nvme0n1p2-: pblk 23101439 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
> [18018.970071] jbd2_journal_bmap: journal block not found at offset 4799 on nvme0n1p2-8
> [18018.970076] Aborting journal on device nvme0n1p2-8.
> [18018.970269] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal
> [18018.970316] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
>

I bisected the issue. I hope it helps to understand what happened on my computer.

$ git bisect log
git bisect start
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
# bad: [7e9890a3500d95c01511a4c45b7e7192dfa47ae2] Merge tag 'ovl-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
git bisect bad 7e9890a3500d95c01511a4c45b7e7192dfa47ae2
# good: [80f232121b69cc69a31ccb2b38c1665d770b0710] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good 80f232121b69cc69a31ccb2b38c1665d770b0710
# good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# good: [ea5aee6d97fd2d4499b1eebc233861c1def70f06] Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect good ea5aee6d97fd2d4499b1eebc233861c1def70f06
# good: [47782361aca21a32ad4198f1b72f1655a7c9f7e5] Merge tag 'tag-chrome-platform-for-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
git bisect good 47782361aca21a32ad4198f1b72f1655a7c9f7e5
# bad: [55472bae5331f33582d9f0e8919fed8bebcda0da] Merge tag 'linux-watchdog-5.2-rc1' of git://www.linux-watchdog.org/linux-watchdog
git bisect bad 55472bae5331f33582d9f0e8919fed8bebcda0da
# good: [4dbf09fea60d158e60a30c419e0cfa1ea138dd57] Merge tag 'mtd/for-5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux
git bisect good 4dbf09fea60d158e60a30c419e0cfa1ea138dd57
# good: [44affc086e6d5ea868c1184cdc5e1159e90ffb71] watchdog: ts4800_wdt: Convert to use device managed functions and other improvements
git bisect good 44affc086e6d5ea868c1184cdc5e1159e90ffb71
# good: [5c09980d9f9de2dc6b255f4f0229aeff0eb2c723] watchdog: imx_sc_wdt: drop warning after calling watchdog_init_timeout
git bisect good 5c09980d9f9de2dc6b255f4f0229aeff0eb2c723
# good: [345f16251063bcef5828f17fe90aa7f7a5019aab] watchdog: Improve Kconfig entry ordering and dependencies
git bisect good 345f16251063bcef5828f17fe90aa7f7a5019aab
# good: [988bec41318f3fa897e2f8af271bd456936d6caf] ubifs: orphan: Handle xattrs like files
git bisect good 988bec41318f3fa897e2f8af271bd456936d6caf
# good: [a65d10f3ce657aa4542b5de78933053f6d1a9e97] ubifs: Drop unnecessary setting of zbr->znode
git bisect good a65d10f3ce657aa4542b5de78933053f6d1a9e97
# good: [a9f0bda567e32a2b44165b067adfc4a4f56d1815] watchdog: Enforce that at least one pretimeout governor is enabled
git bisect good a9f0bda567e32a2b44165b067adfc4a4f56d1815
# bad: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
git bisect bad d7a02fa0a8f9ec1b81d57628ca9834563208ef33
# good: [04d37e5a8b1fad2d625727af3d738c6fd9491720] ubi: wl: Fix uninitialized variable
git bisect good 04d37e5a8b1fad2d625727af3d738c6fd9491720
# first bad commit: [d7a02fa0a8f9ec1b81d57628ca9834563208ef33] Merge tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs


--
Best Regards,
Mike Gavrilov.


Re: [bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries

2019-05-18 Thread Alex Xu (Hello71)
Excerpts from Mikhail Gavrilov's message of May 18, 2019 7:07 am:
> On Sat, 18 May 2019 at 11:44, Mikhail Gavrilov
>  wrote:
>> [28616.429757] EXT4-fs error (device nvme0n1p2): ext4_find_extent:908: inode #8: comm jbd2/nvme0n1p2-: pblk 23101439 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)


I had a similar problem today:

EXT4-fs error (device dm-0): ext4_find_extent:908: inode #8: comm jbd2/dm-0-8: pblk 117997567 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)

I am using dm-crypt on a SATA disk.


[bugreport] kernel 5.2 pblk bad header/extent: invalid extent entries

2019-05-17 Thread Mikhail Gavrilov
Hi folks.
Yesterday I updated the kernel to 5.2 (git commit 7e9890a3500d).
I always leave the computer running overnight.
This morning I found that the computer had hung.
I connected via ssh and looked at the kernel log.
There I saw strange records which I had never seen before:

[28616.429757] EXT4-fs error (device nvme0n1p2): ext4_find_extent:908: inode #8: comm jbd2/nvme0n1p2-: pblk 23101439 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
[28616.430602] jbd2_journal_bmap: journal block not found at offset 4383 on nvme0n1p2-8
[28616.430610] Aborting journal on device nvme0n1p2-8.
[28616.432474] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal
[28616.432489] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal
[28616.432551] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[28616.432690] EXT4-fs (nvme0n1p2): ext4_writepages: jbd2_start: 9223372036854775791 pages, ino 3285789; err -30
[28616.432692] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal

After rebooting the computer and running fsck, the system seems to work.
But I am afraid that it could happen again and I may lose all my data.

How dangerous is this error, and what does it mean?
Is it a bug in kernel 5.2 or not?

--
Best Regards,
Mike Gavrilov.


Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel

2017-12-06 Thread Jason Wessel

On 12/05/2017 10:42 AM, Randy Dunlap wrote:
> On 12/05/2017 06:55 AM, Daniel Thompson wrote:
>> On 05/12/17 14:37, Jason Wessel wrote:
>>> I have a series of 50+ patches for kgdb/kdb/usb which have never been
>>> published.  I am not saying that we actually need any of those patches, but it
>>> would be nice to let the community decide, and we can see if there is anything
>>> worth merging into the next cycle or future work with other maintainers.   My
>>> kernel.org tree stopped working a long time ago, probably from inactivity.
>>> I'll see if that can get restored in the next few days, or I'll use my github
>>> tree and send the unpublished work to the mailing list as an RFC.
>>
>> I, for one, would be interested to see these.
>
> Me also.  I have 3 kdb patches that I just made.




If you have some patches please do send them along to the list.  I have added 
Daniel as an additional maintainer for when I am not around.

We are open for business again now that my kernel.org tree accepts my tag
signing again.  It will take some time to go through these unpublished patches
to see what is actually relevant, but I'll be posting some of them to the
mailing list reasonably soon.

Cheers,
Jason.

ps

While on the topic of debuggers...  I was thinking it might be interesting to
have a gdb-serial stub in an FPGA for debugging the kernel, not unlike what was
done with the firewire debugger that Andi Kleen worked on long ago.  I am not
exactly sure what kind of run control options exist there, but in terms of
accessing memory it would certainly be plausible.  One option I know is
plausible for run control is a small kernel interrupt handler for the run
control interface, since some FPGAs can show up as a PCI device.

While I haven't been working directly in upstream Linux in the last year or
two, I still do plenty of debugging of full systems with simulators, hardware,
and now FPGAs too.  :-)


Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel

2017-12-05 Thread Randy Dunlap
On 12/05/2017 06:55 AM, Daniel Thompson wrote:
> On 05/12/17 14:37, Jason Wessel wrote:
>> On 12/05/2017 08:09 AM, Lee Jones wrote:
>>> On Tue, 05 Dec 2017, Daniel Thompson wrote:
>>>
>>>> ... with many, many thanks to Jason for all his hard work.
>>>>
>>>> Cc: Jason Wessel 
>>>> Signed-off-by: Daniel Thompson 
>>>> ---
>>>>
>>>> Notes:
>>>>   Over the years Jason has become increasingly hard to get hold of
>>>>   and I think he must now be regarded as inactive.
>>>>   Patches in kgdb-next (mine as it happens) have been there for over a
>>>>   year without a corresponding pull request and a couple of architecture
>>>>   specific kgdb fixes have ended up missing a release cycle (or two) as
>>>>   the architecture maintainer waits for an Acked-by from Jason.
>>>>   In the past I've had to rely on Andrew M. to land my own changes to
>>>>   kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649
>>>>   ("x86/debug: Handle warnings before the notifier chain, to fix KGDB
>>>>   crash"). That I was sharing surrogate acks convinced me we need a
>>>>   change here and I've offered Jason help via private e-mail without
>>>>   reply.
>>>>   So, I really would prefer it if this patch listed me as a
>>>>   co-maintainer or, failing that, at least had Jason's blessing... but
>>>>   it doesn't. I certainly suggest this patch takes a long time in
>>>>   review, and if it doesn't attract Jason's attention then I can only
>>>>   reiterate what it says in the commit log: Thanks Jason!
>>>>
>>>>    MAINTAINERS | 3 +--
>>>>    1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> It looks like Jason has been inactive in all aspects of upstream
>>> maintainership and as a contributor for well over a year now.
>>
>> I have not been working directly on upstream kernel contributions for quite 
>> some time.  It doesn't mean I haven't been involved with kernel development. 
>>  Patches that I have reviewed or suggested to other developers generally 
>> don't bear my name.  I wouldn't mind trying to take a slightly more gradual 
>> passing of the baton and add Daniel as co-maintainer for a while before I 
>> retire from kernel work and merge myself away in the coming years. :-)
> 
> Great to hear from you again! I shall consider this patch nacked for the time 
> being ;-)... and if you are happy with help from me I shall leave it to you 
> to propose an update to MAINTAINERS.
> 
> 
>> I have a series of 50+ patches for kgdb/kdb/usb which have never been 
>> published.  I am not saying that we actually need any of those patches, but 
>> it would be nice to let the community decide, and we can see if there is 
>> anything worth merging into the next cycle or future work with other 
>> maintainers.   My kernel.org tree stopped working a long time ago, probably 
>> from inactivity.  I'll see if that can get restored in the next few days, or 
>> I'll use my github tree and send the unpublished work to the mailing list as 
>> an RFC.
> 
> I, for one, would be interested to see these.

Me also.  I have 3 kdb patches that I just made.


-- 
~Randy


Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel

2017-12-05 Thread Daniel Thompson

On 05/12/17 14:37, Jason Wessel wrote:
> On 12/05/2017 08:09 AM, Lee Jones wrote:
>> On Tue, 05 Dec 2017, Daniel Thompson wrote:
>>
>>> ... with many, many thanks to Jason for all his hard work.
>>>
>>> Cc: Jason Wessel 
>>> Signed-off-by: Daniel Thompson 
>>> ---
>>>
>>> Notes:
>>>  Over the years Jason has become increasingly hard to get hold of
>>>  and I think he must now be regarded as inactive.
>>>  Patches in kgdb-next (mine as it happens) have been there for over a
>>>  year without a corresponding pull request and a couple of architecture
>>>  specific kgdb fixes have ended up missing a release cycle (or two) as
>>>  the architecture maintainer waits for an Acked-by from Jason.
>>>  In the past I've had to rely on Andrew M. to land my own changes to
>>>  kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649
>>>  ("x86/debug: Handle warnings before the notifier chain, to fix KGDB
>>>  crash"). That I was sharing surrogate acks convinced me we need a
>>>  change here and I've offered Jason help via private e-mail without
>>>  reply.
>>>  So, I really would prefer it if this patch listed me as a
>>>  co-maintainer or, failing that, at least had Jason's blessing... but
>>>  it doesn't. I certainly suggest this patch takes a long time in
>>>  review, and if it doesn't attract Jason's attention then I can only
>>>  reiterate what it says in the commit log: Thanks Jason!
>>>
>>>   MAINTAINERS | 3 +--
>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> It looks like Jason has been inactive in all aspects of upstream
>> maintainership and as a contributor for well over a year now.
>
> I have not been working directly on upstream kernel contributions for
> quite some time.  It doesn't mean I haven't been involved with kernel
> development.  Patches that I have reviewed or suggested to other
> developers generally don't bear my name.  I wouldn't mind trying to take
> a slightly more gradual passing of the baton and add Daniel as
> co-maintainer for a while before I retire from kernel work and merge
> myself away in the coming years. :-)

Great to hear from you again! I shall consider this patch nacked for the
time being ;-)... and if you are happy with help from me I shall leave
it to you to propose an update to MAINTAINERS.


> I have a series of 50+ patches for kgdb/kdb/usb which have never been
> published.  I am not saying that we actually need any of those patches,
> but it would be nice to let the community decide, and we can see if
> there is anything worth merging into the next cycle or future work with
> other maintainers.   My kernel.org tree stopped working a long time ago,
> probably from inactivity.  I'll see if that can get restored in the next
> few days, or I'll use my github tree and send the unpublished work to
> the mailing list as an RFC.

I, for one, would be interested to see these.


Daniel.


Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel

2017-12-05 Thread Lee Jones
On Tue, 05 Dec 2017, Jason Wessel wrote:

> On 12/05/2017 08:09 AM, Lee Jones wrote:
> > On Tue, 05 Dec 2017, Daniel Thompson wrote:
> > 
> > > ... with many, many thanks to Jason for all his hard work.
> > > 
> > > Cc: Jason Wessel 
> > > Signed-off-by: Daniel Thompson 
> > > ---
> > > 
> > > Notes:
> > >  Over the years Jason has become increasingly hard to get hold of
> > >  and I think he must now be regarded as inactive.
> > >  Patches in kgdb-next (mine as it happens) have been there for over a
> > >  year without a corresponding pull request and a couple of architecture
> > >  specific kgdb fixes have ended up missing a release cycle (or two) as
> > >  the architecture maintainer waits for an Acked-by from Jason.
> > >  In the past I've had to rely on Andrew M. to land my own changes to
> > >  kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649
> > >  ("x86/debug: Handle warnings before the notifier chain, to fix KGDB
> > >  crash"). That I was sharing surrogate acks convinced me we need a
> > >  change here and I've offered Jason help via private e-mail without
> > >  reply.
> > >  So, I really would prefer it if this patch listed me as a
> > >  co-maintainer or, failing that, at least had Jason's blessing... but
> > >  it doesn't. I certainly suggest this patch takes a long time in
> > >  review, and if it doesn't attract Jason's attention then I can only
> > >  reiterate what it says in the commit log: Thanks Jason!
> > > 
> > >   MAINTAINERS | 3 +--
> > >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > It looks like Jason has been inactive in all aspects of upstream
> > maintainership and as a contributor for well over a year now.
> 
> I have not been working directly on upstream kernel contributions
> for quite some time.  It doesn't mean I haven't been involved with
> kernel development.  Patches that I have reviewed or suggested to
> other developers generally don't bear my name.  I wouldn't mind
> trying to take a slightly more gradual passing of the baton and add
> Daniel as co-maintainer for a while before I retire from kernel work
> and merge myself away in the coming years. :-) 
> 
> I have a series of 50+ patches for kgdb/kdb/usb which have never
> been published.  I am not saying that we actually need any of those
> patches, but it would be nice to let the community decide, and we
> can see if there is anything worth merging into the next cycle or
> future work with other maintainers.   My kernel.org tree stopped
> working a long time ago, probably from inactivity.  I'll see if that
> can get restored in the next few days, or I'll use my github tree
> and send the unpublished work to the mailing list as an RFC.  And
> for what it is worth if none of this happens by the end of 4.16, by
> all means Daniel has my blessing to be the sole maintainer.

Thanks for your reply Jason.

Sounds like a perfectly reasonable way forward.

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs


Re: [Kgdb-bugreport] [PATCH] MAINTAINERS: kgdb: Replace Jason with Daniel

2017-12-05 Thread Jason Wessel

On 12/05/2017 08:09 AM, Lee Jones wrote:
> On Tue, 05 Dec 2017, Daniel Thompson wrote:
>
>> ... with many, many thanks to Jason for all his hard work.
>>
>> Cc: Jason Wessel 
>> Signed-off-by: Daniel Thompson 
>> ---
>>
>> Notes:
>>  Over the years Jason has become increasingly hard to get hold of
>>  and I think he must now be regarded as inactive.
>>
>>  Patches in kgdb-next (mine as it happens) have been there for over a
>>  year without a corresponding pull request and a couple of architecture
>>  specific kgdb fixes have ended up missing a release cycle (or two) as
>>  the architecture maintainer waits for an Acked-by from Jason.
>>
>>  In the past I've had to rely on Andrew M. to land my own changes to
>>  kgdb and in the v4.14 cycle you'll find my Acked-by on b8347c219649
>>  ("x86/debug: Handle warnings before the notifier chain, to fix KGDB
>>  crash"). That I was sharing surrogate acks convinced me we need a
>>  change here and I've offered Jason help via private e-mail without
>>  reply.
>>
>>  So, I really would prefer it if this patch listed me as a
>>  co-maintainer or, failing that, at least had Jason's blessing... but
>>  it doesn't. I certainly suggest this patch takes a long time in
>>  review, and if it doesn't attract Jason's attention then I can only
>>  reiterate what it says in the commit log: Thanks Jason!
>>
>>   MAINTAINERS | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> It looks like Jason has been inactive in all aspects of upstream
> maintainership and as a contributor for well over a year now.

I have not been working directly on upstream kernel contributions for quite
some time.  It doesn't mean I haven't been involved with kernel development.
Patches that I have reviewed or suggested to other developers generally don't
bear my name.  I wouldn't mind trying to take a slightly more gradual passing
of the baton and add Daniel as co-maintainer for a while before I retire from
kernel work and merge myself away in the coming years. :-)

I have a series of 50+ patches for kgdb/kdb/usb which have never been 
published.  I am not saying that we actually need any of those patches, but it 
would be nice to let the community decide, and we can see if there is anything 
worth merging into the next cycle or future work with other maintainers.   My 
kernel.org tree stopped working a long time ago, probably from inactivity.  
I'll see if that can get restored in the next few days, or I'll use my github 
tree and send the unpublished work to the mailing list as an RFC.  And for what 
it is worth if none of this happens by the end of 4.16, by all means Daniel has 
my blessing to be the sole maintainer.

Many thanks to Daniel for his contributions!

Cheers,
Jason.


Re: [PATCH] [BUGREPORT] media: v4l: omap_vout: vrfb: initialize DMA flags

2017-07-17 Thread Peter Ujfalusi
Arnd,

Sorry for the delayed response; I was away without an internet connection
for the past few weeks.

On 2017-07-10 14:18, Arnd Bergmann wrote:
> Passing uninitialized flags into device_prep_interleaved_dma is clearly
> a bad idea, and we get a compiler warning for it:
> 
> drivers/media/platform/omap/omap_vout_vrfb.c: In function 'omap_vout_prepare_vrfb':
> drivers/media/platform/omap/omap_vout_vrfb.c:273:5: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]

I cannot explain how I missed this.

> It seems that the OMAP dmaengine ignores the flags, but we should
> pick the right ones anyway. Unfortunately I don't know what they
> should be, so I just picked the most common flags. Please set the
> right flags here and fold the modified patch.

The flags are fine.

> 
> Fixes: 6a1560ecaa8c ("media: v4l: omap_vout: vrfb: Convert to dmaengine")
> Signed-off-by: Arnd Bergmann 

Acked-by: Peter Ujfalusi 

> ---
>  drivers/media/platform/omap/omap_vout_vrfb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/platform/omap/omap_vout_vrfb.c b/drivers/media/platform/omap/omap_vout_vrfb.c
> index 45a553d4f5b2..fed28b6bbbc0 100644
> --- a/drivers/media/platform/omap/omap_vout_vrfb.c
> +++ b/drivers/media/platform/omap/omap_vout_vrfb.c
> @@ -233,7 +233,7 @@ int omap_vout_prepare_vrfb(struct omap_vout_device *vout,
>  			   struct videobuf_buffer *vb)
>  {
>  	struct dma_async_tx_descriptor *tx;
> -	enum dma_ctrl_flags flags;
> +	enum dma_ctrl_flags flags = DMA_PREP_INTERRUPT | DMA_CTRL_ACK;
>  	struct dma_chan *chan = vout->vrfb_dma_tx.chan;
>  	struct dma_device *dmadev = chan->device;
>  	struct dma_interleaved_template *xt = vout->vrfb_dma_tx.xt;
> 

- Péter


[PATCH] [BUGREPORT] media: v4l: omap_vout: vrfb: initialize DMA flags

2017-07-10 Thread Arnd Bergmann
Passing uninitialized flags into device_prep_interleaved_dma is clearly
a bad idea, and we get a compiler warning for it:

drivers/media/platform/omap/omap_vout_vrfb.c: In function 'omap_vout_prepare_vrfb':
drivers/media/platform/omap/omap_vout_vrfb.c:273:5: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]

It seems that the OMAP dmaengine ignores the flags, but we should
pick the right ones anyway. Unfortunately I don't know what they
should be, so I just picked the most common flags. Please set the
right flags here and fold the modified patch.

Fixes: 6a1560ecaa8c ("media: v4l: omap_vout: vrfb: Convert to dmaengine")
Signed-off-by: Arnd Bergmann 
---
 drivers/media/platform/omap/omap_vout_vrfb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/omap/omap_vout_vrfb.c b/drivers/media/platform/omap/omap_vout_vrfb.c
index 45a553d4f5b2..fed28b6bbbc0 100644
--- a/drivers/media/platform/omap/omap_vout_vrfb.c
+++ b/drivers/media/platform/omap/omap_vout_vrfb.c
@@ -233,7 +233,7 @@ int omap_vout_prepare_vrfb(struct omap_vout_device *vout,
 			   struct videobuf_buffer *vb)
 {
 	struct dma_async_tx_descriptor *tx;
-	enum dma_ctrl_flags flags;
+	enum dma_ctrl_flags flags = DMA_PREP_INTERRUPT | DMA_CTRL_ACK;
 	struct dma_chan *chan = vout->vrfb_dma_tx.chan;
 	struct dma_device *dmadev = chan->device;
 	struct dma_interleaved_template *xt = vout->vrfb_dma_tx.xt;
-- 
2.9.0
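To see why the one-line fix matters, here is a small userspace sketch (not kernel code). The enum is a hypothetical stand-in that assumes the kernel's `DMA_PREP_INTERRUPT` and `DMA_CTRL_ACK` occupy bits 0 and 1 of `enum dma_ctrl_flags`; `demo_prep_flags()` mirrors the patched declaration, where the value is defined on every path instead of being whatever happened to be on the stack.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for two bits of the kernel's enum dma_ctrl_flags. */
enum demo_dma_flags {
	DEMO_PREP_INTERRUPT = 1 << 0, /* request a completion interrupt/callback */
	DEMO_CTRL_ACK       = 1 << 1, /* descriptor counts as acked from the start */
};

/* Mirrors the patched code: flags is initialized at its declaration,
 * so the callee never receives an indeterminate value. */
static enum demo_dma_flags demo_prep_flags(void)
{
	enum demo_dma_flags flags = DEMO_PREP_INTERRUPT | DEMO_CTRL_ACK;

	return flags;
}

/* Example consumer that tests a single bit of the flags word. */
static bool demo_wants_interrupt(enum demo_dma_flags flags)
{
	return (flags & DEMO_PREP_INTERRUPT) != 0;
}
```

With the uninitialized version, a consumer like `demo_wants_interrupt()` would be reading an indeterminate value, which is exactly what `-Wmaybe-uninitialized` flags.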



Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()

2016-10-30 Thread Levin, Alexander
On Thu, Oct 27, 2016 at 10:02:10PM -0400, Al Viro wrote:
> ... and frankly, backporting 548acf19234d would be my preference.  It's a bit
> more intrusive than needed (_ASM_EXTABLE_FAULT is used only in memcpy_mcsafe(),
> which is used only by pmem and it's the only reason for passing the trap
> number to fixup_exception()), but AFAICS it's fairly safe.  Objections?

I've grabbed 548acf19234d for 4.1, thanks!

-- 

Thanks,
Sasha

Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()

2016-10-28 Thread Greg KH
On Fri, Oct 28, 2016 at 08:49:58PM +0100, Al Viro wrote:
> On Fri, Oct 28, 2016 at 11:21:24AM -0700, Linus Torvalds wrote:
> 
> > End result: either commit 1c109fabbd51 shouldn't be backported (it's
> > really not that important - if people properly check the exception
> > error results it shouldn't matter), or you need to also backport
> > 548acf19234d as Al suggested.
> > 
> > I'd be inclined to say "don't backport 1c109fabbd51", but it's really
> > a judgment call.
> 
> *nod*
> 
> FWIW, that infoleak _does_ allow to leak an uninitialized word into
> coredump (in sigreturn the value from uninitialized local variable is
> copied into pt_regs of process and when we eventually check that error
> has happened and hit the sucker with SIGSEGV, that value gets stored into
> the coredump), but in the worst case that's 64 bits leaked from fixed depth
> in the kernel stack of attacker's process, with fixed call chain.
> 
> I very much doubt that it's escalatable to anything practically interesting.
> If spender et.al. can come up with a usable way to escalate that, I would be
> quite surprised (and would love to see the details), but hey, it might be
> possible.  More likely possibility is that the bug is harmless in practice.

Hm, I think I'll backport 548acf19234d to 4.4-stable, as people have
shown that leaking anything can be used in odd ways that they shouldn't
be, just to be "safe" :)

thanks for the heads up.

greg k-h


Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()

2016-10-28 Thread Al Viro
On Fri, Oct 28, 2016 at 11:21:24AM -0700, Linus Torvalds wrote:

> End result: either commit 1c109fabbd51 shouldn't be backported (it's
> really not that important - if people properly check the exception
> error results it shouldn't matter), or you need to also backport
> 548acf19234d as Al suggested.
> 
> I'd be inclined to say "don't backport 1c109fabbd51", but it's really
> a judgment call.

*nod*

FWIW, that infoleak _does_ allow to leak an uninitialized word into
coredump (in sigreturn the value from uninitialized local variable is
copied into pt_regs of process and when we eventually check that error
has happened and hit the sucker with SIGSEGV, that value gets stored into
the coredump), but in the worst case that's 64 bits leaked from fixed depth
in the kernel stack of attacker's process, with fixed call chain.

I very much doubt that it's escalatable to anything practically interesting.
If spender et.al. can come up with a usable way to escalate that, I would be
quite surprised (and would love to see the details), but hey, it might be
possible.  More likely possibility is that the bug is harmless in practice.


Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()

2016-10-28 Thread Al Viro
On Fri, Oct 28, 2016 at 12:40:33PM -0400, Joe Korty wrote:

> Backporting 548acf19234d to 4.1.35 does indeed fix the
> issue.  However, it is not clear to me _why_ it works,
> so it might be better that someone else push the backport
> to stable.

Because the trick used in fixup_exception() prior to that commit depended
upon the handler being very close to the faulting instruction.
# define _ASM_EXTABLE_EX(from,to)		\
	.pushsection "__ex_table","a" ;		\
	.balign 8 ;				\
	.long (from) - . ;			\
	.long (to) - . + 0x7ffffff0 ;		\
	.popsection
puts a recognizable value (handler + offset a bit under 2G) into
->fixup and in fixup_exception() we had
	if (fixup->fixup - fixup->insn >= 0x7ffffff0 - 4) {
		/* Special hack for uaccess_err */
		current_thread_info()->uaccess_err = 1;
		new_ip -= 0x7ffffff0;
	}
checking that the value in ->fixup is just below 2G from the faulting
instruction.  So _ASM_EXTABLE_EX relied upon the handler being very close to
the faulting insn, and worked only because all of its uses had
been "set the ->uaccess_err and continue immediately past the faulting
insn".  When the kludge in fixup_exception() was eliminated
(check what it and _ASM_EXTABLE_EX do these days) this restriction
disappeared, so the mainline commit had no problems.  The backport to 4.1
ran afoul of that restriction, with the results you've observed:
this "handler + constant offset" had _not_ been recognized as that magic and
had been interpreted as the handler being about 2G away from its actual location.
That's where the bogus RIP in your oopsen came from.
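The arithmetic above can be checked with a small userspace model. This is a sketch under stated assumptions: `struct ex_entry`, `make_ex_entry()` and `old_fixup_ip()` are hypothetical names, the addresses are made up, and the 32-bit wraparound the kernel's `int` subtraction relies on is emulated with `uint32_t`. It reproduces the pre-4.6 test `fixup->fixup - fixup->insn >= 0x7ffffff0 - 4`: a handler a few bytes past the faulting instruction is recognized, while a handler placed elsewhere (as in a `.fixup` section) is not, and the resulting IP lands roughly 2G past the real handler.

```c
#include <stdint.h>

/* Offset the pre-4.6 _ASM_EXTABLE_EX added to the fixup field. */
#define EX_MAGIC 0x7ffffff0u

/* Two self-relative 32-bit offsets, as in the 4.1-era __ex_table. */
struct ex_entry {
	int32_t insn;   /* .long (from) - .            */
	int32_t fixup;  /* .long (to) - . + 0x7ffffff0 */
};

/* Emulate what the assembler emits for an entry located at 'entry';
 * "." is the address of each field, and .long truncates to 32 bits. */
static struct ex_entry make_ex_entry(uint64_t entry, uint64_t from, uint64_t to)
{
	struct ex_entry e;

	e.insn  = (int32_t)(uint32_t)(from - entry);
	e.fixup = (int32_t)(uint32_t)(to - (entry + 4) + EX_MAGIC);
	return e;
}

/* Old fixup_exception() logic. Returns the new IP and reports whether
 * the "special hack" branch (uaccess_err = 1) was taken. */
static uint64_t old_fixup_ip(uint64_t entry, struct ex_entry e, int *uaccess_err)
{
	/* ex_fixup_addr(): address of the fixup field plus its value */
	uint64_t new_ip = entry + 4 + (uint64_t)(int64_t)e.fixup;
	/* int - int in the kernel wraps; emulate it without signed overflow */
	int32_t delta = (int32_t)((uint32_t)e.fixup - (uint32_t)e.insn);

	*uaccess_err = 0;
	if (delta >= (int32_t)(EX_MAGIC - 4)) {
		*uaccess_err = 1;
		new_ip -= EX_MAGIC;
	}
	return new_ip;
}
```

With an entry at 0x1000000 and a fault at 0x400000, a handler at 0x400005 yields IP 0x400005 with `uaccess_err` set, whereas a handler at 0x600000 is not recognized and yields the bogus IP 0x805ffff0 (0x7ffffff0 past the handler).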


Re: [4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()

2016-10-28 Thread Linus Torvalds

On Fri, Oct 28, 2016 at 9:40 AM, Joe Korty  wrote:
>
> Backporting 548acf19234d to 4.1.35 does indeed fix the
> issue.  However, it is not clear to me _why_ it works,
> so it might be better that someone else push the backport
> to stable.

The problem is that the old _ASM_EXTABLE_EX hackery ends up being
this code in fixup_exception() back in 4.1 (and later):

	if (fixup->fixup - fixup->insn >= 0x7ffffff0 - 4) {
		/* Special hack for uaccess_err */
		current_thread_info()->uaccess_err = 1;
		new_ip -= 0x7ffffff0;
	}

and it really does depend very intimately on the relationship with the
"fixup" address (fixup->fixup) with the instruction that took the
fault (fixup->insn).

Now, back in the original 4.1 days, that fixup-vs-insn relationship
was trivially always the case, since __get_user_asm_ex() always just
made the fixup be to fall through to the next instruction.

However, when commit 1c109fabbd51 ("fix minor infoleak in
get_user_ex()") was backported, now the fixup for __get_user_asm_ex()
ends up being in a different section entirely (".section .fixup"), and
the close relationship between the faulting instruction and the fixup
instruction went away.

End result: commit 1c109fabbd51 effectively and very subtly depends
on commit 548acf19234d (introduced in v4.6) that gets rid of the
special hack.
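By contrast, the approach of 548acf19234d ("x86/mm: Expand the exception table logic to allow new handling options") stops inferring the handler type from the fixup offset. The model below is a simplified sketch of that idea, not the kernel's code: each entry names its handler explicitly (the mainline commit is understood to add a per-entry handler field, but `demo_entry`, `demo_handler_ext()` and the other names here are hypothetical, and absolute addresses are used for simplicity). Because nothing depends on the distance between `insn` and `fixup`, a handler in a far-away `.fixup` section works.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state standing in for thread_info->uaccess_err. */
struct demo_task { int uaccess_err; };
struct demo_regs { uint64_t ip; };

typedef bool (*demo_handler_t)(uint64_t fixup_addr, struct demo_regs *regs,
			       struct demo_task *task);

/* Plain handler: resume at the fixup address. */
static bool demo_handler_default(uint64_t fixup_addr, struct demo_regs *regs,
				 struct demo_task *task)
{
	(void)task;
	regs->ip = fixup_addr;
	return true;
}

/* "_EX"-style handler: flag the error, then resume at the fixup address,
 * wherever that address happens to be. */
static bool demo_handler_ext(uint64_t fixup_addr, struct demo_regs *regs,
			     struct demo_task *task)
{
	task->uaccess_err = 1;
	regs->ip = fixup_addr;
	return true;
}

/* The entry says which handler to run instead of hiding a tag in ->fixup. */
struct demo_entry {
	uint64_t insn;
	uint64_t fixup;
	demo_handler_t handler;
};

static bool demo_fixup_exception(const struct demo_entry *e,
				 struct demo_regs *regs, struct demo_task *task)
{
	if (regs->ip != e->insn)
		return false; /* this entry does not cover the faulting insn */
	return e->handler(e->fixup, regs, task);
}
```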

Adding "stable" to the cc, because this might well affect stable
backports other than 4.1.

End result: either commit 1c109fabbd51 shouldn't be backported (it's
really not that important - if people properly check the exception
error results it shouldn't matter), or you need to also backport
548acf19234d as Al suggested.

I'd be inclined to say "don't backport 1c109fabbd51", but it's really
a judgment call.

   Linus


[4.1 backport trouble] Re: BUGreport: fix minor infoleak in get_user_ex()

2016-10-27 Thread Al Viro
On Fri, Oct 28, 2016 at 01:03:55AM +0100, Al Viro wrote:

> On Thu, Oct 27, 2016 at 03:32:10PM -0400, Joe Korty wrote:
[oops in 4.1.35, bisected to 319fe1151940]
> > The following test program can be used to trigger the problem:
> > 
> > /* gcc -m32 c.c -o c */
> > #define _GNU_SOURCE
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > 
> > #define rt_sigqueueinfo 178
> > 
> > int main(int argc, char **argv) {
> >  int stat = syscall(rt_sigqueueinfo, 0, 0, 0, 0, 0, 0);
> >  printf("syscall(%d): stat: %d, errno: %d\n",
> >rt_sigqueueinfo, stat, errno);
> >  return 0;
> > }
> > 
> > This is under 4.1.35 on x86_64.
>
> AFAICS, it steps on _ASM_EXTABLE_EX being more brittle in 4.1 - it pretty
> much has to have the handler on the next insn after the faulting one, or
> the resulting extable entry won't be recognized.  This
> "x86/mm: Expand the exception table logic to allow new handling options"
> in mainline is where that requirement has disappeared.  I think we
> ought to use the plain _ASM_EXTABLE and just call something that would
> set current_thread_info()->uaccess_err directly from the fixup code there.
> That, or backport the commit switching to less brittle extables.

... and frankly, backporting 548acf19234d would be my preference.  It's a bit
more intrusive than needed (_ASM_EXTABLE_FAULT is used only in memcpy_mcsafe(),
which is used only by pmem and it's the only reason for passing the trap
number to fixup_exception()), but AFAICS it's fairly safe.  Objections?


Re: BUGreport: fix minor infoleak in get_user_ex()

2016-10-27 Thread Al Viro
On Thu, Oct 27, 2016 at 03:32:10PM -0400, Joe Korty wrote:
> Hi Al,
> I don't know if this is worth fixing or not, but I thought
> I would mention it in case it was.
> 
> A git bisect search shows that the commit:
> 
>   commit 319fe11519401e8a5db191a0a93aa2c1d7bb59f4
>   Author: Al Viro 
>   Date:   Thu Sep 15 02:35:29 2016 +0100
> 
> causes some malformed rt_sigqueueinfo syscalls, executed under
> x86_64 kernels running compat mode programs, to oops with
> the following message:
> 
> [   66.054786] BUG: unable to handle kernel paging request at 020573eb
> [   66.061793] IP: [<020573eb>] 0x20573eb
> [   66.066251] PGD 122263067 PUD 120a0c067 PMD 0 
> [   66.070745] Oops: 0010 [#1] PREEMPT SMP 
> [   66.074717] Modules linked in:
> [   66.077789] CPU: 7 PID: 5496 Comm: cc Not tainted 4.1.35 #1
> [   66.083365] Hardware name: Supermicro H8DM8-2/H8DM8-2, BIOS 080014  10/22/2009
> [   66.090582] task: 88006b044400 ti: 88006b30 task.ti: 88006b30
> [   66.098067] RIP: 0010:[<020573eb>]  [<020573eb>] 0x20573eb
> [   66.104961] RSP: 0018:88006b303e98  EFLAGS: 00010246
> [   66.110269] RAX: 7fffef80 RBX:  RCX: 
> [   66.117399] RDX: 88006b304000 RSI:  RDI: 88006b303ea8
> [   66.124528] RBP: 88006b303e98 R08:  R09: 
> [   66.131667] R10:  R11: 0246 R12: 
> [   66.138805] R13: 88006b303ea8 R14:  R15: 
> [   66.145935] FS:  77fca740() GS:880127d8(0063) knlGS:f7df06c0
> [   66.154027] CS:  0010 DS: 002b ES: 002b CR0: 8005003b
> [   66.159777] CR2: 020573eb CR3: 00011f874000 CR4: 06e0
> [   66.166906] Stack:
> [   66.168919]  88006b303f48 8107c4b5  
> [   66.176403]     
> [   66.183872]     
> [   66.191341] Call Trace:
> [   66.193802]  [] compat_SyS_rt_sigqueueinfo+0x45/0x70
> [   66.200340]  [] cstar_dispatch+0x7/0x2a
> [   66.205755] Code:  Bad RIP value.
> [   66.209103] RIP  [<020573eb>] 0x20573eb
> [   66.213648]  RSP 
> [   66.217134] CR2: 020573eb
> [   66.220505] ---[ end trace 4f88266d7fd7e6d7 ]---
 
> The following test program can be used to trigger the problem:
> 
> /* gcc -m32 c.c -o c */
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> #define rt_sigqueueinfo 178
> 
> int main(int argc, char **argv) {
>  int stat = syscall(rt_sigqueueinfo, 0, 0, 0, 0, 0, 0);
>  printf("syscall(%d): stat: %d, errno: %d\n",
>rt_sigqueueinfo, stat, errno);
>  return 0;
> }
> 
> This is under 4.1.35 on x86_64.

AFAICS, it steps on _ASM_EXTABLE_EX being more brittle in 4.1 - it pretty
much has to have the handler on the next insn after the faulting one, or
the resulting extable entry won't be recognized.  This
"x86/mm: Expand the exception table logic to allow new handling options"
in mainline is where that requirement has disappeared.  I think we
ought to use the plain _ASM_EXTABLE and just call something that would
set current_thread_info()->uaccess_err directly from the fixup code there.
That, or backport the commit switching to less brittle extables.
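The test program quoted above lost its `#include` lines in this archive. The following is a self-contained variant for illustration only; its header set is an assumption rather than the reporter's original, and it uses `SYS_rt_sigqueueinfo` from `<sys/syscall.h>` instead of the hard-coded 178 (which is the i386 number that `gcc -m32` resolves it to, so a 32-bit build still goes through the compat entry point seen in the trace). On a kernel without the bug, the call is expected to fail cleanly (typically with `EFAULT`, since `uinfo` is NULL) rather than oops.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue rt_sigqueueinfo(0, 0, NULL) and return errno, or 0 on success.
 * The NULL siginfo pointer forces the kernel's user-copy fault path. */
static int send_bad_sigqueueinfo(void)
{
	long stat = syscall(SYS_rt_sigqueueinfo, 0L, 0L, NULL);
	int err = (stat == -1) ? errno : 0;

	printf("syscall(%d): stat: %ld, errno: %d\n",
	       (int)SYS_rt_sigqueueinfo, stat, err);
	return err;
}
```

Build with `gcc -m32` to exercise the compat path from the report; a native 64-bit build only checks the ordinary error return.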


Re: Official bugreport 4.1 kernel (audio gadget and ChipIdea)

2015-06-30 Thread Marek Vasut
On Tuesday, June 30, 2015 at 04:23:01 AM, Peter Chen wrote:
> On Fri, Jun 26, 2015 at 07:15:18PM +0200, Sébastien Pruvost wrote:
> > Hello,
> > 
> > I'm sending this mail to report a bug concerning the latest kernel 4.1.
> > 
> > Here is the problem (and the test I've done):
> > I have firstly used the 3.10.53 kernel for my two sabrelites in
> > order to use the audio gadget driver with the Dual Role ChipIdea
> > Controller (in order to switch roles between my two IMX6 sabreLite).
> > After loading g_audio in my two sabreLite and plugging the cable (microA
> > – microB), there is an error “ci_hdrc.0 request length too big for
> > isochronous snd_uac2.0 1116 Error”.
> > And even after running aplay command, I still got this error and there is
> > no sound getting out of the jack port.
> > I've switched roles between the two boards by following this: https://
> > www.kernel.org/doc/Documentation/usb/chipidea.txt.
> > This works fine with the serial driver, I can see a new serial interface
> > (host side) and after switching role a new serial interfaces at device
> > side. Same thing for ethernet gadget: this works fine too. But not with
> > the audio gadget. In fact, there is a new audio interface at host side
> > but I can not interact with it (even alsamixer doesn’t see any controls
> > on this new sound card). I’ve tested that audio gadget works fine if I
> > don’t use ChipIdea HighSpeed Dual Role Controller.
> > 
> > Secondly I have tested this audio gadget with the latest Kernel
> > 4.1 for my two IMX6 sabrelites (imx_v6_v7_defconfig). Now these previous
> > errors are gone but there are still no sound getting out of the jack
> > port (even if there are a new sound card in host side)
> 
> It may not be a role-switch problem; please check whether g_audio
> works well with an Ubuntu PC (to make sure your codec works well).

ci_hdrc.0 request length too big for isochronous

Doesn't this just mean it cannot transfer such a long buffer via the ISO pipe?
I guess the UAC should send smaller buffers to work with the CI HDRC?
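Marek's reading can be made concrete with a sketch. It assumes the controller refuses an isochronous request longer than what the endpoint can move in one (micro)frame, i.e. `(1 + mult) * maxpacket`; the helper names are hypothetical. Only the `wMaxPacketSize` bit layout (bits 10..0 packet size, bits 12..11 additional transactions per microframe) is taken from the USB 2.0 specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes a high-speed isochronous endpoint can carry per microframe,
 * decoded from its wMaxPacketSize descriptor field. */
static size_t iso_max_bytes(uint16_t wMaxPacketSize)
{
	size_t maxpacket = wMaxPacketSize & 0x7ff;   /* bits 10..0 */
	size_t mult = (wMaxPacketSize >> 11) & 0x3;  /* bits 12..11 */

	return maxpacket * (1 + mult);
}

/* Assumed controller rule: one request must fit in one microframe. */
static bool iso_request_fits(size_t req_len, uint16_t wMaxPacketSize)
{
	return req_len <= iso_max_bytes(wMaxPacketSize);
}

/* How many requests a function driver would need if it split a buffer
 * into pieces that satisfy the rule above. */
static size_t iso_requests_needed(size_t buf_len, uint16_t wMaxPacketSize)
{
	size_t max = iso_max_bytes(wMaxPacketSize);

	return max ? (buf_len + max - 1) / max : 0;
}
```

For example, a 1024-byte endpoint with no extra transactions (0x0400) accepts at most 1024 bytes per request, so a longer buffer would have to be queued as several smaller requests.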

Best regards,
Marek Vasut
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Official bugreport 4.1 kernel (audio gadget and ChipIdea)

2015-06-29 Thread Peter Chen
On Fri, Jun 26, 2015 at 07:15:18PM +0200, Sébastien Pruvost wrote:
> Hello,
> 
> I'm sending this mail to report a bug concerning the latest kernel 4.1.
> 
> Here is the problem (and the test I've done):
> 
> I have firstly used the 3.10.53 kernel for my two sabrelites in
> order to use the audio gadget driver with the Dual Role ChipIdea Controller (in
> order to switch roles between my two IMX6 sabreLite).
> After loading g_audio in my two sabreLite and plugging the cable (microA –
> microB), there is an error “ci_hdrc.0 request length too big for isochronous
> snd_uac2.0 1116 Error”.
> And even after running aplay command, I still got this error and there is no
> sound getting out of the jack port.
> I've switched roles between the two boards by following this: https://
> www.kernel.org/doc/Documentation/usb/chipidea.txt.
> This works fine with the serial driver, I can see a new serial interface (host
> side) and after switching role a new serial interfaces at device side. Same
> thing for ethernet gadget: this works fine too. But not with the audio gadget.
> In fact, there is a new audio interface at host side but I can not interact
> with it (even alsamixer doesn’t see any controls on this new sound card). I’ve
> tested that audio gadget works fine if I don’t use ChipIdea HighSpeed Dual
> Role Controller.
> 
>  
> 
> Secondly I have tested this audio gadget with the latest Kernel
> 4.1 for my two IMX6 sabrelites (imx_v6_v7_defconfig). Now these previous errors
> are gone but there are still no sound getting out of the jack port (even if
> there are a new sound card in host side)
> 

It may not be a role-switch problem; please check whether g_audio
works well with an Ubuntu PC (to make sure your codec works well).

> 
> I think this needs a patch to fix that.
> Best regards
> 
> Sébastien Pruvost.
> 

-- 

Best Regards,
Peter Chen


Re: [Kgdb-bugreport] [RFC v5 - RESEND] debug: prevent entering debug mode on panic/exception.

2015-01-28 Thread Daniel Thompson
On 28/01/15 10:39, Kiran Raparthy wrote:
> From: Colin Cross 
> 
> debug: prevent entering debug mode on panic/exception.
> 
> On non-developer devices, kgdb prevents the device from rebooting
> after a panic.
> 
> In case of panics and exceptions, to allow the device to reboot, prevent
> entering debug mode to avoid getting stuck waiting for the user to interact
> with the debugger.
>
> To avoid entering the debugger on panic/exception without any extra
> configuration, panic_timeout is used, which can be set via
> /proc/sys/kernel/panic at run time; CONFIG_PANIC_TIMEOUT sets the default value.
>
> Setting panic_timeout indicates that the user requested the machine to perform
> an unattended reboot after a panic. We don't want to get stuck waiting for user
> input in case of a panic.

Some kind of changelog between the versions would have been nice. I
*think* the difference between v4 and v5 was just the additional paragraph
above but I had to put in extra work to check that and I'm still not
100% sure that's the only change.

Also you could start billing this as a PATCH rather than an RFC.


Daniel.


> Cc: Jason Wessel 
> Cc: Andrew Morton 
> Cc: kgdb-bugrep...@lists.sourceforge.net
> Cc: linux-kernel@vger.kernel.org
> Cc: Android Kernel Team 
> Cc: John Stultz 
> Cc: Sumit Semwal 
> Signed-off-by: Colin Cross 
> [Kiran: Added context to commit message.
> panic_timeout is used instead of break_on_panic and
> break_on_exception to honor CONFIG_PANIC_TIMEOUT
> Modified the commit as per community feedback]
> Signed-off-by: Kiran Raparthy 
> Reviewed-by: Daniel Thompson 
> ---
>  kernel/debug/debug_core.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 1adf62b..0012a1f 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -689,6 +689,14 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
>  
>   if (arch_kgdb_ops.enable_nmi)
>   arch_kgdb_ops.enable_nmi(0);
> + /*
> +  * Avoid entering the debugger if we were triggered due to an oops
> +  * but panic_timeout indicates the system should automatically
> +  * reboot on panic. We don't want to get stuck waiting for input
> +  * on such systems, especially if its "just" an oops.
> +  */
> + if (signo != SIGTRAP && panic_timeout)
> + return 1;
>  
>   memset(ks, 0, sizeof(struct kgdb_state));
>   ks->cpu = raw_smp_processor_id();
> @@ -821,6 +829,15 @@ static int kgdb_panic_event(struct notifier_block *self,
>   unsigned long val,
>   void *data)
>  {
> + /*
> +  * Avoid entering the debugger if we were triggered due to a panic
> +  * We don't want to get stuck waiting for input from user in such case.
> +  * panic_timeout indicates the system should automatically
> +  * reboot on panic.
> +  */
> + if (panic_timeout)
> + return NOTIFY_DONE;
> +
>   if (dbg_kdb_mode)
>   kdb_printf("PANIC: %s\n", (char *)data);
>   kgdb_breakpoint();
> 
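The two hunks encode a simple policy. This userspace sketch restates it as plain functions so the cases are easy to see; `kgdb_should_enter()` and `panic_notifier_should_break()` are hypothetical names, and `panic_timeout` is passed as a parameter rather than read from the kernel global.

```c
#include <signal.h>
#include <stdbool.h>

/* Exception path: the patch returns early when signo != SIGTRAP and
 * panic_timeout is nonzero, so the debugger is entered for a SIGTRAP
 * (typically a breakpoint) or when no automatic reboot is configured. */
static bool kgdb_should_enter(int signo, int panic_timeout)
{
	return signo == SIGTRAP || panic_timeout == 0;
}

/* Panic notifier path: any nonzero panic_timeout (including negative
 * values) means "reboot unattended", so do not break into the debugger. */
static bool panic_notifier_should_break(int panic_timeout)
{
	return panic_timeout == 0;
}
```

Checking `panic_timeout == 0` rather than `> 0` matches the patch's plain truth test, so a negative timeout also suppresses the debugger.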



Re: [Kgdb-bugreport] [RFC v5 - RESEND] debug: prevent entering debug mode on panic/exception.

2015-01-28 Thread Kiran Raparthy
On 28 January 2015 at 16:25, Daniel Thompson  wrote:
> On 28/01/15 10:39, Kiran Raparthy wrote:
>> From: Colin Cross 
>>
>> debug: prevent entering debug mode on panic/exception.
>>
>> On non-developer devices, kgdb prevents the device from rebooting
>> after a panic.
>>
>> In case of panics and exceptions, to allow the device to reboot, prevent
>> entering debug mode to avoid getting stuck waiting for the user to interact
>> with the debugger.
>>
>> To avoid entering the debugger on panic/exception without any extra
>> configuration, panic_timeout is used, which can be set via
>> /proc/sys/kernel/panic at run time; CONFIG_PANIC_TIMEOUT sets the default value.
>>
>> Setting panic_timeout indicates that the user requested the machine to perform
>> an unattended reboot after a panic. We don't want to get stuck waiting for user
>> input in case of a panic.
>
> Some kind of changelog between the versions would have been nice. I
> *think* the difference between v4 and v5 was just the additional paragraph
> above but I had to put in extra work to check that and I'm still not
> 100% sure that's the only change.
Since the changes in all the earlier revisions were only to the commit
message, I didn't update the history.
Anyway, I'll make sure to include the history going forward.
>
> Also you could start billing this as a PATCH rather than an RFC.
Since I didn't get any acknowledgement, I didn't upgrade it to PATCH.
Anyway, I'll update and resend the patch. Thanks for the input.

>
>
> Daniel.
>
>
>> Cc: Jason Wessel 
>> Cc: Andrew Morton 
>> Cc: kgdb-bugrep...@lists.sourceforge.net
>> Cc: linux-kernel@vger.kernel.org
>> Cc: Android Kernel Team 
>> Cc: John Stultz 
>> Cc: Sumit Semwal 
>> Signed-off-by: Colin Cross 
>> [Kiran: Added context to commit message.
>> panic_timeout is used instead of break_on_panic and
>> break_on_exception to honor CONFIG_PANIC_TIMEOUT
>> Modified the commit as per community feedback]
>> Signed-off-by: Kiran Raparthy 
>> Reviewed-by: Daniel Thompson 
>> ---
>>  kernel/debug/debug_core.c | 17 +
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
>> index 1adf62b..0012a1f 100644
>> --- a/kernel/debug/debug_core.c
>> +++ b/kernel/debug/debug_core.c
>> @@ -689,6 +689,14 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
>>
>>   if (arch_kgdb_ops.enable_nmi)
>>   arch_kgdb_ops.enable_nmi(0);
>> + /*
>> +  * Avoid entering the debugger if we were triggered due to an oops
>> +  * but panic_timeout indicates the system should automatically
>> +  * reboot on panic. We don't want to get stuck waiting for input
>> +  * on such systems, especially if its "just" an oops.
>> +  */
>> + if (signo != SIGTRAP && panic_timeout)
>> + return 1;
>>
>>   memset(ks, 0, sizeof(struct kgdb_state));
>>   ks->cpu = raw_smp_processor_id();
>> @@ -821,6 +829,15 @@ static int kgdb_panic_event(struct notifier_block *self,
>>   unsigned long val,
>>   void *data)
>>  {
>> + /*
>> +  * Avoid entering the debugger if we were triggered due to a panic
>> +  * We don't want to get stuck waiting for input from user in such case.
>> +  * panic_timeout indicates the system should automatically
>> +  * reboot on panic.
>> +  */
>> + if (panic_timeout)
>> + return NOTIFY_DONE;
>> +
>>   if (dbg_kdb_mode)
>>   kdb_printf("PANIC: %s\n", (char *)data);
>>   kgdb_breakpoint();
>>
>

Regards,
Kiran


Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt

2014-03-31 Thread Yijing Wang
Hi Thomas,
   Thanks for your reply!


>>  nf_defrag_ipv4 xt_state nf_conntrack usr_cache(O) acpi_cpufreq mperf 
>> processor thermal_sys sg hwmon iptable_filter ip_tables x_tables ixgbe(O) 
>> igb(O) bonding(O) tg(O) netmgmt(O) drvinstall(PO) dal(PO) dca usb_storage(O) 
>> uhci_hcd ehci_hcd usbcore(O) usb_common sata_mml(O) ahci(O) libata(O) 
>> satahp(O) drvframe(PO) sd_mod crc_t10dif scsi_mod agetty_interface(PO) ext3 
>> jbd mbcache nvram_printk(PO) os_feeddog(PO) os_debug(O) osp_proc(PO) 
>> os_rnvramdev(PO) vos(O) bsp(PO) os_die_handler(O) os_oom_handler(O) 
>> os_panic_handler(O) biosnvramdriver(O) kbox(O)
>> [2012-03-26 18:55:43][  929.252460] Pid: 17495, comm: 3th SioT Tainted: P   O 3.4.24.15-0.11-default #1
> 
> You have loaded a gazillion of proprietary and out of tree modules and
> your kernel is tainted 'P'.
> 
> None of our problems. See:
> 
>  http://lwn.net/1999/0211/a/lt-binary.html
> 
>  https://lwn.net/Articles/287056/
> 
> I'm in a good mood today and give you some hints:
> 
> - Ingo's patch is correct and always has been for RT.
> 
> - We have not had a single bug report against this in almost 10 years.
> 
> - File your bugs to those who abuse our work and violate our license.

Actually, we applied the RT stable patch on top of Greg's 3.4 stable tree,
with no other changes in the scheduler area.

I will try to find out what codes cause this issue.

Thanks!
Yijing.

> 
> Case closed.
> 
>  tglx
> 
> .
> 


-- 
Thanks!
Yijing



Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt

2014-03-31 Thread Yijing Wang
>> Because this patch does not exist in the latest Linus kernel, so I
>> have not reported this issue to kernel bugzilla.
> 
> This patch exists in all -RT releases up to 3.12. If there is an issue
> with it, it should be solved.
> 
> If the sched bit is set and you can't get the lock later, then the tasklet
> has to be active. Finally, not getting the lock in the tasklet code
> itself means it is still occupied by the "add-to-the-list" part, which
> actually can't happen according to the code.
> You said that you have an eight-way. Is this also NUMA? If so, does
> this problem happen if you disable NUMA (i.e. run only one NUMA node
> and use only the memory that is directly attached to the node)?

Hi Sebastian,
   The target platform is not NUMA; there is only one node.

Thanks!
Yijing.

> 
> 


-- 
Thanks!
Yijing



Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt

2014-03-31 Thread Thomas Gleixner
On Mon, 3 Mar 2014, Yijing Wang wrote:

> Hi list,
>I found a tasklet related issue in linux-stable-rt 3.4.
> 
> And after I revert following commit, the test result seems ok (test lasted
> 40 hours).
> 
> commit 0d9f73fc1e7270a3f8709c59c913408153d9d9f8

This commit id does not exist in the official stable rt tree.

> Author: Ingo Molnar 
> Date:   Tue Nov 29 20:18:22 2011 -0500
> 
> tasklet: Prevent tasklets from going into infinite spin in RT
 
> 
> I test FC driver IO in this kernel, and after a few hours test, FC IO will 
> abort, I found a lot of tasklet WARNING Call Trace in kernel message,like:
> 
> [2012-03-26 18:55:43][  929.252289] [ cut here ]
> [2012-03-26 18:55:43][  929.252312] WARNING: at kernel/softirq.c:773 __tasklet_action+0x51/0x1a0()

There is no warning at line 773 in any official linux-stable-rt 3.4.

> [2012-03-26 18:55:43][  929.252314] Hardware name: Romley
> [2012-03-26 18:55:43][  929.252316] Modules linked in: isd_fid(O) ivs_edft(O) 
> ivs_emp(O) ivs_xnet(O) isd_rds(O) isd_idm(O) isd_dft(O) isd_base(O) sdm(O) 
> isd_cmm(O) isd_ibc(O) isd_lib(O) xve_hab(PO) xve_net(PO) 
> xve_cls_msg_filter(PO) xve_dscp(PO) pagepool(PO) iod(O) cmm(PO) util(PO) 
> intel_t10(PO) itest_nid(PO) dmi(PO) bsp_adapter(PO) mpa(O) ipmi_si 
> ipmi_devintf ipmi_msghandler iscsi_sw(PO) iscsi_prot(O) iscsi_seg(PO) 
> iscsi_comm(PO) iscsi_initiator(PO) 8192cu(O) pciehp(PO) pcieaer(PO) 
> pciecore(PO) drvinstallthird(PO) quark(O) sal(O) pmsas(O) foe(O) lfcoe(O) 
> libfc(O) ib_uverbs(O) ibtgt(O) ib_srpt(O) ib_cm(O) ib_sa(O) mlx4_ib(O) 
> ib_umad(O) ib_mad(O) mlx4_core(O) ib_core(O) drvtom(O) cxgb4(O) drvtoecore(O) 
> fcdrv(PO) unflowlevel(PO) unfcommon(O) drvmml(PO) scsi_transport_fc scsi_tgt 
> memtest(PO) drv_iosubsys_ini(O) iocount(O) bsp_mml(PO) agetty_query(PO) 
> cpufreq_powersave af_packet nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter 
> ip6_tables xt_limit xt_tcpudp xt_multiport nf_conntrack_ipv4
>  nf_defrag_ipv4 xt_state nf_conntrack usr_cache(O) acpi_cpufreq mperf 
> processor thermal_sys sg hwmon iptable_filter ip_tables x_tables ixgbe(O) 
> igb(O) bonding(O) tg(O) netmgmt(O) drvinstall(PO) dal(PO) dca usb_storage(O) 
> uhci_hcd ehci_hcd usbcore(O) usb_common sata_mml(O) ahci(O) libata(O) 
> satahp(O) drvframe(PO) sd_mod crc_t10dif scsi_mod agetty_interface(PO) ext3 
> jbd mbcache nvram_printk(PO) os_feeddog(PO) os_debug(O) osp_proc(PO) 
> os_rnvramdev(PO) vos(O) bsp(PO) os_die_handler(O) os_oom_handler(O) 
> os_panic_handler(O) biosnvramdriver(O) kbox(O)
> [2012-03-26 18:55:43][  929.252460] Pid: 17495, comm: 3th SioT Tainted: P   O 3.4.24.15-0.11-default #1

You have loaded a gazillion of proprietary and out of tree modules and
your kernel is tainted 'P'.

None of our problems. See:

 http://lwn.net/1999/0211/a/lt-binary.html

 https://lwn.net/Articles/287056/

I'm in a good mood today, so I'll give you some hints:

- Ingo's patch is correct and always has been for RT.

- We have not had a single bug report against this in almost 10 years.

- File your bugs to those who abuse our work and violate our license.

Case closed.

 tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt

2014-03-31 Thread Sebastian Andrzej Siewior
On 03/29/2014 07:35 AM, Yijing Wang wrote:
> Hi Sebastian,

Hi Yijing,

>Thanks for your reply and help to look at it, thanks!
> 
> I also check the tasklet state machine changes, and didn't find
> clue for this issue. So I Temporarily reverted Ingo's patch, without
> this patch, my test is ok.
> 
> Because this patch does not exist in the latest Linus kernel, so I
> have not reported this issue to kernel bugzilla.

This patch exists in all -RT releases up to 3.12. If there is an issue
with it, it should be solved.

If the SCHED bit is set and you can't get the lock later, then the tasklet
has to be active. Finally, not getting the lock in the tasklet code
itself means it is still occupied by the "add-to-the-list" part, which
actually can't happen according to the code.
You said that you have an eight-way machine. Is it also NUMA? If so, does
this problem happen if you disable NUMA (i.e. run only one NUMA node
and use only the memory that is directly attached to that node)?

> Finally, I would like to thank you again.
> 
> Thanks!
> Yijing.

Sebastian


Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt

2014-03-28 Thread Yijing Wang
Hi Sebastian,
   Thanks for your reply and for helping to look at it!

I also checked the tasklet state machine changes and didn't find any
clue to this issue. So I temporarily reverted Ingo's patch; without
this patch, my test passes.

commit 0d9f73fc1e7270a3f8709c59c913408153d9d9f8
Author: Ingo Molnar 
Date:   Tue Nov 29 20:18:22 2011 -0500

tasklet: Prevent tasklets from going into infinite spin in RT

Because this patch does not exist in Linus's latest kernel, I have
not reported this issue to the kernel bugzilla.

Finally, I would like to thank you again.

Thanks!
Yijing.


On 2014/3/29 0:37, Sebastian Andrzej Siewior wrote:
> * Yijing Wang | 2014-03-03 17:24:39 [+0800]:
> 
>> [2012-03-26 18:55:43][  929.252312] WARNING: at kernel/softirq.c:773 
>> __tasklet_action+0x51/0x1a0()
>> [2012-03-27 03:41:06][ 3647.886005] WARNING: at kernel/softirq.c:773 
>> __tasklet_action+0x51/0x1a0()
>> [2012-03-27 03:42:04][ 3705.434418] WARNING: at kernel/softirq.c:799 
>> __tasklet_action+0xae/0x1a0()
> 
>> FC card hardware  ---> FC driver interrupt handler  
>> ->tasklet_schedule(fc driver tasklet) --->tasklet running, call 
>> function process FC IO data.
>>here will disable FC card interrupt   
>>   here will enable FC card interrupt again
> 
> This looks okay.
> 
>> We found the tasklet state is 0x1(mean state is TASKLET_STATE_SCHED),count 
>> is 0, before we call tasklet_schedule().
>> So the new tasklet can not add to CPU list.
>>
>> And I also add some dynamic debug in __tasklet_action(); after the issue 
>> occur, I open the dynamic debug.
>> After we force the hardware reset to interrupt OS, we never found the FC 
>> driver tasklet running in dmesg(I identify the tasklet by its data).
>> I guess the FC tasklet is not in CPU global tasklet list.
> You guess correct.
> 
>> I hope somebody can help to look at it. If I missing something, let me know.
> 
> The tasklet is always added to the local cpu, never cross. That list is
> always accessed with interrupts off.
> With TASKLET_STATE_SCHED set, the next step is to add the task let to
> the CPU's tasklet list. This isn't done if TASKLET_STATE_RUN is already
> set which means __tasklet_action() is already busy serving the tasklet.
> In that case it clears TASKLET_STATE_SCHED and invokes the tasklet
> again.
> After looking at it for a while I must say I have no idea how you
> managed to keep TASKLET_STATE_SCHED set. Further, each time
> TASKLET_STATE_RUN is cleared it is always with a cmpxchg() down to zero
> which means TASKLET_STATE_SCHED is removed earlier.
> That said, triggerring the warning at 773 is the first thing that went
> wrong. After it has been added to the list, the TASKLET_STATE_RUN is
> cleared again. I have no idea how it managed to remain still on except
> that __tasklet_common_schedule() is invoked which is protected by the
> SCHED bit…
> 
>> Thanks!
>> Yijing.
> 
> Sebastian
> 
> .
> 


-- 
Thanks!
Yijing



Re: [BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt

2014-03-28 Thread Sebastian Andrzej Siewior
* Yijing Wang | 2014-03-03 17:24:39 [+0800]:

>[2012-03-26 18:55:43][  929.252312] WARNING: at kernel/softirq.c:773 
>__tasklet_action+0x51/0x1a0()
>[2012-03-27 03:41:06][ 3647.886005] WARNING: at kernel/softirq.c:773 
>__tasklet_action+0x51/0x1a0()
>[2012-03-27 03:42:04][ 3705.434418] WARNING: at kernel/softirq.c:799 
>__tasklet_action+0xae/0x1a0()

>FC card hardware  ---> FC driver interrupt handler  
>->tasklet_schedule(fc driver tasklet) --->tasklet running, call 
>function process FC IO data.
>here will disable FC card interrupt
>  here will enable FC card interrupt again

This looks okay.

>We found the tasklet state is 0x1(mean state is TASKLET_STATE_SCHED),count is 
>0, before we call tasklet_schedule().
>So the new tasklet can not add to CPU list.
>
>And I also add some dynamic debug in __tasklet_action(); after the issue 
>occur, I open the dynamic debug.
>After we force the hardware reset to interrupt OS, we never found the FC 
>driver tasklet running in dmesg(I identify the tasklet by its data).
>I guess the FC tasklet is not in CPU global tasklet list.
You guessed correctly.

>I hope somebody can help to look at it. If I missing something, let me know.

The tasklet is always added to the local CPU's list, never to another
CPU's. That list is always accessed with interrupts off.
With TASKLET_STATE_SCHED set, the next step is to add the tasklet to
the CPU's tasklet list. This isn't done if TASKLET_STATE_RUN is already
set, which means __tasklet_action() is already busy serving the tasklet.
In that case it clears TASKLET_STATE_SCHED and invokes the tasklet
again.
After looking at it for a while, I must say I have no idea how you
managed to keep TASKLET_STATE_SCHED set. Further, each time
TASKLET_STATE_RUN is cleared, it is always done with a cmpxchg() down
to zero, which means TASKLET_STATE_SCHED has been removed earlier.
That said, triggering the warning at line 773 is the first thing that
went wrong. After the tasklet has been added to the list,
TASKLET_STATE_RUN is cleared again. I have no idea how it managed to
remain set, except that __tasklet_common_schedule() is invoked, which
is protected by the SCHED bit…
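To make the state transitions described above concrete, here is a minimal single-threaded C11 model of them. It is a sketch under stated assumptions, not the -RT kernel code: the names m_tasklet, m_schedule(), m_run(), resched_once() and the bit values M_SCHED/M_RUN are hypothetical, the per-CPU list is reduced to a counter, and there is no interrupt masking or real cross-CPU concurrency. It shows two points from the discussion: a reschedule that arrives while the body is running makes the final compare-and-swap from RUN to 0 fail, so the body runs again; and a SCHED bit that stays set while the tasklet sits on no list absorbs every later schedule request, which matches the reported state 0x1.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; they only mimic the roles of
 * TASKLET_STATE_SCHED and TASKLET_STATE_RUN in this model. */
#define M_SCHED 1u
#define M_RUN   2u

struct m_tasklet {
    atomic_uint state;
    int queued;  /* times the tasklet was put on the (modelled) CPU list */
    int runs;    /* times the handler body executed */
};

static void m_init(struct m_tasklet *t, unsigned state)
{
    atomic_init(&t->state, state);
    t->queued = 0;
    t->runs = 0;
}

static unsigned m_state(struct m_tasklet *t)
{
    return atomic_load(&t->state);
}

/* Only the caller that flips SCHED from 0 to 1 may queue the tasklet,
 * and it is not queued while a runner already holds RUN. */
static bool m_schedule(struct m_tasklet *t)
{
    unsigned old = atomic_fetch_or(&t->state, M_SCHED);

    if (old & M_SCHED)
        return false;   /* already pending: the request is absorbed */
    if (old & M_RUN)
        return true;    /* the runner will see SCHED and run the body again */
    t->queued++;        /* stands in for adding to the local CPU list */
    return true;
}

/* Take RUN, clear SCHED, run the body, then release RUN only with a
 * compare-and-swap from exactly RUN to 0. If SCHED was set again while
 * the body ran, the swap fails and the body runs once more. */
static void m_run(struct m_tasklet *t, void (*body)(struct m_tasklet *))
{
    atomic_fetch_or(&t->state, M_RUN);
    for (;;) {
        atomic_fetch_and(&t->state, ~M_SCHED);
        t->runs++;
        if (body)
            body(t);

        unsigned expected = M_RUN;
        if (atomic_compare_exchange_strong(&t->state, &expected, 0u))
            break;      /* state is now 0: neither SCHED nor RUN is set */
    }
}

/* Example body: reschedules itself during its first run only. */
static void resched_once(struct m_tasklet *t)
{
    if (t->runs == 1)
        m_schedule(t);
}
```

In this model, scheduling a tasklet whose state is already M_SCHED returns false and leaves queued unchanged, so nothing would ever run it again; how the real tasklet reached that state is the open question in this thread.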

>Thanks!
>Yijing.

Sebastian


[BUGREPORT] Tasklet scheduled issue in Linux 3.4.x-rt

2014-03-03 Thread Yijing Wang
Hi list,
   I found a tasklet-related issue in linux-stable-rt 3.4.

After I reverted the following commit, the test result seems OK (the test
lasted 40 hours).

commit 0d9f73fc1e7270a3f8709c59c913408153d9d9f8
Author: Ingo Molnar 
Date:   Tue Nov 29 20:18:22 2011 -0500

tasklet: Prevent tasklets from going into infinite spin in RT


I tested FC driver I/O on this kernel. After a few hours of testing, FC I/O
aborts, and I found a lot of tasklet WARNING call traces in the kernel log, like:

[2012-03-26 18:55:43][  929.252289] [ cut here ]
[2012-03-26 18:55:43][  929.252312] WARNING: at kernel/softirq.c:773 
__tasklet_action+0x51/0x1a0()
[2012-03-26 18:55:43][  929.252314] Hardware name: Romley
[2012-03-26 18:55:43][  929.252316] Modules linked in: isd_fid(O) ivs_edft(O) 
ivs_emp(O) ivs_xnet(O) isd_rds(O) isd_idm(O) isd_dft(O) isd_base(O) sdm(O) 
isd_cmm(O) isd_ibc(O) isd_lib(O) xve_hab(PO) xve_net(PO) xve_cls_msg_filter(PO) 
xve_dscp(PO) pagepool(PO) iod(O) cmm(PO) util(PO) intel_t10(PO) itest_nid(PO) 
dmi(PO) bsp_adapter(PO) mpa(O) ipmi_si ipmi_devintf ipmi_msghandler 
iscsi_sw(PO) iscsi_prot(O) iscsi_seg(PO) iscsi_comm(PO) iscsi_initiator(PO) 
8192cu(O) pciehp(PO) pcieaer(PO) pciecore(PO) drvinstallthird(PO) quark(O) 
sal(O) pmsas(O) foe(O) lfcoe(O) libfc(O) ib_uverbs(O) ibtgt(O) ib_srpt(O) 
ib_cm(O) ib_sa(O) mlx4_ib(O) ib_umad(O) ib_mad(O) mlx4_core(O) ib_core(O) 
drvtom(O) cxgb4(O) drvtoecore(O) fcdrv(PO) unflowlevel(PO) unfcommon(O) 
drvmml(PO) scsi_transport_fc scsi_tgt memtest(PO) drv_iosubsys_ini(O) 
iocount(O) bsp_mml(PO) agetty_query(PO) cpufreq_powersave af_packet 
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_limit xt_tcpudp 
xt_multiport nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack usr_cache(O) acpi_cpufreq mperf processor 
thermal_sys sg hwmon iptable_filter ip_tables x_tables ixgbe(O) igb(O) 
bonding(O) tg(O) netmgmt(O) drvinstall(PO) dal(PO) dca usb_storage(O) uhci_hcd 
ehci_hcd usbcore(O) usb_common sata_mml(O) ahci(O) libata(O) satahp(O) 
drvframe(PO) sd_mod crc_t10dif scsi_mod agetty_interface(PO) ext3 jbd mbcache 
nvram_printk(PO) os_feeddog(PO) os_debug(O) osp_proc(PO) os_rnvramdev(PO) 
vos(O) bsp(PO) os_die_handler(O) os_oom_handler(O) os_panic_handler(O) 
biosnvramdriver(O) kbox(O)
[2012-03-26 18:55:43][  929.252460] Pid: 17495, comm: 3th SioT Tainted: P   
O 3.4.24.15-0.11-default #1
[2012-03-26 18:55:43][  929.252463] Call Trace:
[2012-03-26 18:55:43][  929.252465][] ? 
__tasklet_action+0x51/0x1a0
[2012-03-26 18:55:43][  929.252481]  [] 
warn_slowpath_common+0x7a/0xb0
[2012-03-26 18:55:43][  929.252486]  [] 
warn_slowpath_null+0x15/0x20
[2012-03-26 18:55:43][  929.252490]  [] 
__tasklet_action+0x51/0x1a0
[2012-03-26 18:55:43][  929.252494]  [] 
tasklet_action+0x59/0x60
[2012-03-26 18:55:43][  929.252498]  [] 
handle_pending_softirqs+0xb0/0x170
[2012-03-26 18:55:43][  929.252502]  [] __do_softirq+0x49/0xa0
[2012-03-26 18:55:43][  929.252513]  [] call_softirq+0x1c/0x30
[2012-03-26 18:55:43][  929.252519]  [] do_softirq+0x65/0xa0
[2012-03-26 18:55:43][  929.252523]  [] irq_exit+0xc5/0xe0
[2012-03-26 18:55:43][  929.252526]  [] do_IRQ+0x64/0xe0
[2012-03-26 18:55:43][  929.252534]  [] 
common_interrupt+0x6a/0x6a
[2012-03-26 18:55:43][  929.252536][] ? 
_raw_spin_unlock_irqrestore+0x16/0x30
[2012-03-26 18:55:43][  929.252565]  [] 
SDM_SDGetDisk+0x64/0x100 [sdm]
[2012-03-26 18:55:43][  929.252575]  [] 
SDM_FRAMESdGetDisk+0x17/0x80 [sdm]
[2012-03-26 18:55:43][  929.252585]  [] 
SDM_ERRAddTimer+0x33/0x370 [sdm]
[2012-03-26 18:55:43][  929.252594]  [] 
SDM_FRAMEErrAddTimer+0x17/0x80 [sdm]
[2012-03-26 18:55:43][  929.252604]  [] 
SDM_SIOQueueReqProcess+0x67/0x7d0 [sdm]
[2012-03-26 18:55:43][  929.252612]  [] 
SDM_SIOQueueThread+0x142/0x310 [sdm]
[2012-03-26 18:55:43][  929.252618]  [] 
kernel_thread_helper+0x4/0x10
[2012-03-26 18:55:43][  929.252627]  [] ? 
SDM_SIOQueueReqProcess+0x7d0/0x7d0 [sdm]
[2012-03-26 18:55:43][  929.252632]  [] ? gs_change+0x13/0x13
[2012-03-26 18:55:43][  929.252635] ---[ end trace a82addcbe6cbf131 ]---
...[snip].
[2012-03-27 03:41:06][ 3647.885973] [ cut here ]
[2012-03-27 03:41:06][ 3647.886005] WARNING: at kernel/softirq.c:773 
__tasklet_action+0x51/0x1a0()
[2012-03-27 03:41:06][ 3647.886010] Hardware name: Romley
[2012-03-27 03:41:06][ 3647.886012] Modules linked in: isd_fid(O) ivs_edft(O) 
ivs_emp(O) ivs_xnet(O) isd_rds(O) isd_idm(O) isd_dft(O) isd_base(O) sdm(O) 
isd_cmm(O) isd_ibc(O) isd_lib(O) xve_hab(PO) xve_net(PO) xve_cls_msg_filter(PO) 
xve_dscp(PO) pagepool(PO) iod(O) cmm(PO) util(PO) intel_t10(PO) itest_nid(PO) 
dmi(PO) bsp_adapter(PO) mpa(O) ipmi_si ipmi_devintf ipmi_msghandler 
iscsi_sw(PO) iscsi_prot(O) iscsi_seg(PO) iscsi_comm(PO) iscsi_initiator(PO) 
8192cu(O) pciehp(PO) pcieaer(PO) pciecore(PO) drvinstallthird(PO) quark(O) 
sal(O) pmsas(O) foe(O) lfcoe(O) libfc(O) ib_uverbs(O) ibtgt(O) ib_srpt(O) 
ib_cm(O) ib_sa(O) mlx4_ib(O) ib_umad(O) ib_mad(O) mlx4_co

Re: [BUGREPORT] Linux USB 3.0

2014-02-11 Thread Markus Rechberger
On Tue, Feb 11, 2014 at 7:45 PM, Greg KH  wrote:
> On Tue, Feb 11, 2014 at 07:29:47PM +0100, Markus Rechberger wrote:
>> On Mon, Feb 10, 2014 at 12:15 AM, Robert Hancock  
>> wrote:
>> > On 08/02/14 03:00 AM, Markus Rechberger wrote:
>> >>
>> >> On Tue, Feb 4, 2014 at 10:31 AM, David Laight 
>> >> wrote:
>> >>>
>> >>> From: Markus Rechberger
>> >>
>> >> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0:
>> >> ERROR Transfer event TRB DMA ptr
>> >
>> >
>> > These messages might be harmless.  The 3.0 kernel contains a fix for
>> > Intel Panther Point xHCI hosts that suppresses those messages, commit
>> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
>> > successful event."
>> >
>> > A later commit extends that to all xHCI 1.0 hosts, commit
>> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
>> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
>> > queued for 3.11 and marked to be backported into stable kernels as old
>> > as 3.0.
>> >>>
>> >>>
>> >>> I see the same error message on the 0.96 ASMedia controller when
>> >>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>> >>>
>> >>> So this isn't confined to 1.0 controllers.
>> >>>
>> >>
>> >> Sarah,
>> >>
>> >> since there is no response yet, is there anyone at Intel dedicated at
>> >> working on USB 3.0?
>> >> We are also getting more and more negative USB 3.0 feedback with Linux
>> >
>> >
>> > Still nobody appears to have provided the requested debugging information
>> > that was requested. So there is not much that can be done upstream to debug
>> > things based only on vague reports, especially when not using current 
>> > kernel
>> > versions.
>> >
>>
>> Next kernel crash report, this time a Synology NAS System:
>> http://support.sundtek.com/index.php/topic,1511.0.html
>
> That kernel has a closed source kernel module loaded, no community
> member can look at it, sorry, please get support from the company that
> wrote that module.
>

I'm going to collect all xHCI issues we get here as a reference.
Unfortunately, we're busy with our own hardware, so we don't have the
time to dig into USB 3.0 kernel issues at the moment. All we can do is
collect the feedback and maybe help translate between German and
English. So if someone (e.g. Intel) wants to volunteer to fix some
issues, just drop me a line. As Sarah indicated, there are already
several issues mentioned within this post.

Markus


Re: [BUGREPORT] Linux USB 3.0

2014-02-11 Thread Greg KH
On Tue, Feb 11, 2014 at 07:29:47PM +0100, Markus Rechberger wrote:
> On Mon, Feb 10, 2014 at 12:15 AM, Robert Hancock  wrote:
> > On 08/02/14 03:00 AM, Markus Rechberger wrote:
> >>
> >> On Tue, Feb 4, 2014 at 10:31 AM, David Laight 
> >> wrote:
> >>>
> >>> From: Markus Rechberger
> >>
> >> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0:
> >> ERROR Transfer event TRB DMA ptr
> >
> >
> > These messages might be harmless.  The 3.0 kernel contains a fix for
> > Intel Panther Point xHCI hosts that suppresses those messages, commit
> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
> > successful event."
> >
> > A later commit extends that to all xHCI 1.0 hosts, commit
> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
> > queued for 3.11 and marked to be backported into stable kernels as old
> > as 3.0.
> >>>
> >>>
> >>> I see the same error message on the 0.96 ASMedia controller when
> >>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
> >>>
> >>> So this isn't confined to 1.0 controllers.
> >>>
> >>
> >> Sarah,
> >>
> >> since there is no response yet, is there anyone at Intel dedicated at
> >> working on USB 3.0?
> >> We are also getting more and more negative USB 3.0 feedback with Linux
> >
> >
> > Still nobody appears to have provided the requested debugging information
> > that was requested. So there is not much that can be done upstream to debug
> > things based only on vague reports, especially when not using current kernel
> > versions.
> >
> 
> Next kernel crash report, this time a Synology NAS System:
> http://support.sundtek.com/index.php/topic,1511.0.html

That kernel has a closed-source kernel module loaded, so no community
member can look at it, sorry. Please get support from the company that
wrote that module.

greg k-h


Re: [BUGREPORT] Linux USB 3.0

2014-02-11 Thread Bjørn Mork
Markus Rechberger  writes:

> Next kernel crash report, this time a Synology NAS System:
> http://support.sundtek.com/index.php/topic,1511.0.html

There is no etxhci_hcd driver in the mainline kernel...


Feb 11 18:50:41 DiskStation kernel: [103740.405521] Backtrace:
Feb 11 18:50:41 DiskStation kernel: [103740.408095] [<7f2d8f2c>] 
(find_trb_seg+0x0/0x54 [etxhci_hcd]) from [<7f2d9ac0>] 
(etxhci_find_new_dequeue_state+0x5c/0x200 [etxhci_hcd])
Feb 11 18:50:41 DiskStation kernel: [103740.420389]  r4:9675fd44
Feb 11 18:50:41 DiskStation kernel: [103740.423046] [<7f2d9a64>] 
(etxhci_find_new_dequeue_state+0x0/0x200 [etxhci_hcd]) from [<7f2d4520>] 
(etxhci_cleanup_stalled_ring+0x50/0x140 [etxhci_hcd])
Feb 11 18:50:41 DiskStation kernel: [103740.436749] [<7f2d44d0>] 
(etxhci_cleanup_stalled_ring+0x0/0x140 [etxhci_hcd]) from [<7f2d46e0>] 
(etxhci_endpoint_reset+0xd0/0x100 [etxhci_hcd])
Feb 11 18:50:41 DiskStation kernel: [103740.449738]  r7:bc0e9830 r6:965b6360 
r5:bc0e9800 r4:be34cc00
Feb 11 18:50:41 DiskStation kernel: [103740.455595] [<7f2d4610>] 
(etxhci_endpoint_reset+0x0/0x100 [etxhci_hcd]) from [<7f086c00>] 
(usb_hcd_reset_endpoint+0x2c/0x80 [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.467837] [<7f086bd4>] 
(usb_hcd_reset_endpoint+0x0/0x80 [usbcore]) from [<7f088ff0>] 
(usb_enable_endpoint+0x70/0x74 [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.479558] [<7f088f80>] 
(usb_enable_endpoint+0x0/0x74 [usbcore]) from [<7f08903c>] 
(usb_enable_interface+0x48/0x5c [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.491066]  r8:0001 r7:be34cc00 
r6:be8ff368 r5:0001 r4:002c
Feb 11 18:50:41 DiskStation kernel: [103740.497750] r3:0001
Feb 11 18:50:41 DiskStation kernel: [103740.500522] [<7f088ff4>] 
(usb_enable_interface+0x0/0x5c [usbcore]) from [<7f08942c>] 
(usb_set_interface+0x1c8/0x22c [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.512032]  r8:be04c860 r7:bdc4d600 
r6: r5:be8ff368 r4:be34cc00
Feb 11 18:50:41 DiskStation kernel: [103740.518715] r3:be8ff368
Feb 11 18:50:41 DiskStation kernel: [103740.521495] [<7f089264>] 
(usb_set_interface+0x0/0x22c [usbcore]) from [<7f0902c0>] 
(usbdev_ioctl+0xf40/0x1cac [usbcore])
Feb 11 18:50:41 DiskStation kernel: [103740.532512] [<7f08f380>] 
(usbdev_ioctl+0x0/0x1cac [usbcore]) from [<800db114>] (do_vfs_ioctl+0xa8/0x8bc)
Feb 11 18:50:41 DiskStation kernel: [103740.542109] [<800db06c>] 
(do_vfs_ioctl+0x0/0x8bc) from [<800db968>] (sys_ioctl+0x40/0x64)
Feb 11 18:50:41 DiskStation kernel: [103740.550396]  r9:9675e000 r8:8000e388 
r7:0017 r6:80085504 r5:2f4f54f4
Feb 11 18:50:41 DiskStation kernel: [103740.557079] r4:bc10b540
Feb 11 18:50:41 DiskStation kernel: [103740.559822] [<800db928>] 
(sys_ioctl+0x0/0x64) from [<8000e1e0>] (ret_fast_syscall+0x0/0x30)
Feb 11 18:50:41 DiskStation kernel: [103740.568282]  r7:0036 r6: 
r5: r4:2f4f64d8


Bjørn


Re: [BUGREPORT] Linux USB 3.0

2014-02-11 Thread Markus Rechberger
On Mon, Feb 10, 2014 at 12:15 AM, Robert Hancock  wrote:
> On 08/02/14 03:00 AM, Markus Rechberger wrote:
>>
>> On Tue, Feb 4, 2014 at 10:31 AM, David Laight 
>> wrote:
>>>
>>> From: Markus Rechberger
>>
>> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0:
>> ERROR Transfer event TRB DMA ptr
>
>
> These messages might be harmless.  The 3.0 kernel contains a fix for
> Intel Panther Point xHCI hosts that suppresses those messages, commit
> ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
> successful event."
>
> A later commit extends that to all xHCI 1.0 hosts, commit
> 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
> XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
> queued for 3.11 and marked to be backported into stable kernels as old
> as 3.0.
>>>
>>>
>>> I see the same error message on the 0.96 ASMedia controller when
>>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>>>
>>> So this isn't confined to 1.0 controllers.
>>>
>>
>> Sarah,
>>
>> since there is no response yet, is there anyone at Intel dedicated at
>> working on USB 3.0?
>> We are also getting more and more negative USB 3.0 feedback with Linux
>
>
> Still nobody appears to have provided the requested debugging information
> that was requested. So there is not much that can be done upstream to debug
> things based only on vague reports, especially when not using current kernel
> versions.
>

Next kernel crash report, this time a Synology NAS System:
http://support.sundtek.com/index.php/topic,1511.0.html


Re: [BUGREPORT] Linux USB 3.0

2014-02-09 Thread Robert Hancock

On 08/02/14 03:00 AM, Markus Rechberger wrote:
> On Tue, Feb 4, 2014 at 10:31 AM, David Laight  wrote:
>> From: Markus Rechberger
>>> >> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0: ERROR
>>> >> Transfer event TRB DMA ptr
>>> >
>>> > These messages might be harmless.  The 3.0 kernel contains a fix for
>>> > Intel Panther Point xHCI hosts that suppresses those messages, commit
>>> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
>>> > successful event."
>>> >
>>> > A later commit extends that to all xHCI 1.0 hosts, commit
>>> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
>>> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
>>> > queued for 3.11 and marked to be backported into stable kernels as old
>>> > as 3.0.
>>
>> I see the same error message on the 0.96 ASMedia controller when
>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>>
>> So this isn't confined to 1.0 controllers.
>>
>
> Sarah,
>
> since there is no response yet, is there anyone at Intel dedicated at
> working on USB 3.0?
> We are also getting more and more negative USB 3.0 feedback with Linux

Still nobody appears to have provided the debugging information that
was requested. So there is not much that can be done upstream to debug
things based only on vague reports, especially when not using current
kernel versions.




Re: [BUGREPORT] Linux USB 3.0

2014-02-08 Thread Markus Rechberger
The next one, just today (unfortunately it's in German):
http://support.sundtek.com/index.php/topic,1505.msg11020.html#msg11020

This user is running Ubuntu with Linux 3.13.0-8-generic.
The system seems to freeze completely after some time.
Since the driver uses the usbdevfs interface, the problem is in usbcore.

On Sat, Feb 8, 2014 at 10:00 AM, Markus Rechberger
 wrote:
> On Tue, Feb 4, 2014 at 10:31 AM, David Laight  wrote:
>> From: Markus Rechberger
>>> >> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0: 
>>> >> ERROR Transfer event TRB DMA ptr
>>> >
>>> > These messages might be harmless.  The 3.0 kernel contains a fix for
>>> > Intel Panther Point xHCI hosts that suppresses those messages, commit
>>> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
>>> > successful event."
>>> >
>>> > A later commit extends that to all xHCI 1.0 hosts, commit
>>> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
>>> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
>>> > queued for 3.11 and marked to be backported into stable kernels as old
>>> > as 3.0.
>>
>> I see the same error message on the 0.96 ASMedia controller when
>> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>>
>> So this isn't confined to 1.0 controllers.
>>
>
> Sarah,
>
> since there is no response yet, is there anyone at Intel dedicated at
> working on USB 3.0?
> We are also getting more and more negative USB 3.0 feedback with Linux
>
> Best Regards,
> Markus


Re: [BUGREPORT] Linux USB 3.0

2014-02-08 Thread Markus Rechberger
On Tue, Feb 4, 2014 at 10:31 AM, David Laight  wrote:
> From: Markus Rechberger
>> >> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0: 
>> >> ERROR Transfer event TRB DMA ptr
>> >
>> > These messages might be harmless.  The 3.0 kernel contains a fix for
>> > Intel Panther Point xHCI hosts that suppresses those messages, commit
>> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
>> > successful event."
>> >
>> > A later commit extends that to all xHCI 1.0 hosts, commit
>> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
>> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
>> > queued for 3.11 and marked to be backported into stable kernels as old
>> > as 3.0.
>
> I see the same error message on the 0.96 ASMedia controller when
> the rx buffers for the ax88179_178a driver cross 64k boundaries.
>
> So this isn't confined to 1.0 controllers.
>

Sarah,

Since there has been no response yet, is there anyone at Intel dedicated to
working on USB 3.0?
We are also getting more and more negative USB 3.0 feedback with Linux.

Best Regards,
Markus


RE: [BUGREPORT] Linux USB 3.0

2014-02-04 Thread David Laight
From: Markus Rechberger
> >> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0: ERROR 
> >> Transfer event TRB DMA ptr
> >
> > These messages might be harmless.  The 3.0 kernel contains a fix for
> > Intel Panther Point xHCI hosts that suppresses those messages, commit
> > ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
> > successful event."
> >
> > A later commit extends that to all xHCI 1.0 hosts, commit
> > 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
> > XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
> > queued for 3.11 and marked to be backported into stable kernels as old
> > as 3.0.

I see the same error message on the 0.96 ASMedia controller when
the rx buffers for the ax88179_178a driver cross 64k boundaries.

So this isn't confined to 1.0 controllers.
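For readers unfamiliar with the 64k remark: as far as we know, the xHCI specification does not allow a single TRB data buffer to span a 64 KiB boundary, so a host controller driver has to split such a buffer across several TRBs. A minimal C sketch of the boundary arithmetic follows; crosses_64k() and pieces_64k() are hypothetical helpers for illustration, not functions from the xhci-hcd driver, and the sketch says nothing about why the ASMedia controller reports the error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64 KiB: the boundary a single TRB data buffer is assumed not to span. */
#define BOUNDARY_64K ((uint64_t)1 << 16)

/* True if the byte range [addr, addr + len) spans a 64 KiB boundary. */
static bool crosses_64k(uint64_t addr, size_t len)
{
    if (len == 0)
        return false;
    uint64_t last = addr + (uint64_t)len - 1;
    return addr / BOUNDARY_64K != last / BOUNDARY_64K;
}

/* Number of pieces needed so that no piece spans a 64 KiB boundary. */
static size_t pieces_64k(uint64_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uint64_t last = addr + (uint64_t)len - 1;
    return (size_t)(last / BOUNDARY_64K - addr / BOUNDARY_64K + 1);
}
```

For example, a 0x20-byte rx buffer starting at 0xFFF0 ends at 0x1000F, so it crosses one boundary and would need two pieces.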

David





Re: [BUGREPORT] Linux USB 3.0

2014-02-03 Thread Markus Rechberger
Hi Sarah,

On Mon, Jan 20, 2014 at 8:35 PM, Sarah Sharp
 wrote:
> Hi Markus,
>
> I'm the xHCI driver maintainer, and it helps to Cc me on USB 3.0 bug
> reports.
>
> On Sat, Dec 28, 2013 at 07:24:20AM +0100, Markus Rechberger wrote:
>> just received following log snippset:
>
> Please state which kernel version you (or your customer) is running.
> You've reported issues with several different kernel versions, so which
> kernel are you running for this particular snippet?
>
>> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0: ERROR 
>> Transfer event TRB DMA ptr
>
> These messages might be harmless.  The 3.0 kernel contains a fix for
> Intel Panther Point xHCI hosts that suppresses those messages, commit
> ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
> successful event."
>
> A later commit extends that to all xHCI 1.0 hosts, commit
> 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
> XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0"  That was
> queued for 3.11 and marked to be backported into stable kernels as old
> as 3.0.
>
>> the previous bug report of that user:
>> https://bugzilla.kernel.org/show_bug.cgi?id=65021 xhci: complete USB freeze
>
> Hmm, Greg didn't assign that bug to me, so I missed it, sorry.
>
>> On Fri, Dec 27, 2013 at 8:59 PM, Markus Rechberger  
>> wrote:
>> > Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately
>> > we don't have such a board for testing and customer patience is
>> > limited to bisect the kernel.
>> >
>> > Does anyone have a clue what modification could have killed USB 3.0
>> > support within those releases?
>> > It does not seem to be SG support.
>
> 3.2 was the kernel where the Intel EHCI to xHCI port switchover code
> went in.  Without that code, all ports will remain under the EHCI host,
> and USB 3.0 devices will work at USB 2.0 speeds.  I suspect the USB
> device triggers an issue with the xHCI driver, and 3.2 only works
> because the device is on an EHCI port without the switchover code.
>
>> > On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger  
>> > wrote:
>> >> I just got another USB 3.0 bugreport, the entire system crashed. That
>> >> particular customer already filed a bugreport in November 2013 that
>> >> his system is in a bad state when using some USB 2.0 media devices
>> >> which even have opensource drivers built into the kernel.
>> >>
>> >> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12.
>> >> The affected board is an Intel DH87RL board.
>
> Why are they running 3.6.12 in particular?  That's not a supported
> stable kernel.
>

Our customers are using all kinds of Linux kernels. The drivers
use USBFS (devio.c) to interface with USB.
It seems like you are in contact with one customer who is using the
DH87RL board.
Just today we got another report in our forum from a user running
3.12.9-2-ARCH. Synology NAS users also seem to be affected by the
USB 2.0 through USB 3.0 issue.


>> >> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger
>> >>  wrote:
>> >>> A customer using a device with USBDEVFS is reporting following
>> >>> backtrace (it seems to be a rather generic issue related to linux usb
>> >>> 3.0 in general):
>> >>> According to him this problem is reproducible as soon as he starts the
>> >>> data transfer, is there anything known about that?
>> >>>
>> >>> He is using 3.12.0-031200-generic
>
> So at this point you've reported three separate bugs, all with the same
> symptom, but different kernel versions?  Are these all from the same bug
> reporter, or a different bug reporter?
>
> You've got me seriously confused right now.  Please keep one bug report
> to one mail thread, and get the original bug reporter to start that
> thread.  If this is from one bug reporter, please state the current
> kernel they are running, and send dmesg showing the issue with
> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on (you may
> also need to turn on CONFIG_DYNAMIC_DEBUG in later kernels).  Please
> attach the dmesg as a file, since your mail client line-wraps.
>
>> >>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: 
>> >>> ERROR Transfer event TRB DMA ptr not part of current TD
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: 
>> >>> ERROR Transfer event TRB DMA ptr not part of current TD
>> >>> D

Re: [BUGREPORT] Linux USB 3.0

2014-01-20 Thread Sarah Sharp
Hi Markus,

I'm the xHCI driver maintainer, and it helps to Cc me on USB 3.0 bug
reports.

On Sat, Dec 28, 2013 at 07:24:20AM +0100, Markus Rechberger wrote:
> just received following log snippset:

Please state which kernel version you (or your customer) are running.
You've reported issues with several different kernel versions, so which
kernel are you running for this particular snippet?

> Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0: ERROR 
> Transfer event TRB DMA ptr

These messages might be harmless.  The 3.0 kernel contains a fix for
Intel Panther Point xHCI hosts that suppresses those messages, commit
ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
successful event."

A later commit extends that to all xHCI 1.0 hosts, commit
07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0".  That was
queued for 3.11 and marked to be backported into stable kernels as old
as 3.0.
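As a rough illustration of what such a version-gated quirk amounts to, here is a small user-space model in C. The structure, the flag name QUIRK_SPURIOUS_SUCCESS and both helper functions are hypothetical stand-ins, not the xhci-hcd code. The sketch only assumes, as described above, that the quirk is enabled for every xHCI 1.0 host and that its effect is to stop reporting a stray event that carries a success completion code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk bit, standing in for the driver's own flag. */
#define QUIRK_SPURIOUS_SUCCESS (1u << 0)

/* Minimal model of a host controller: version register plus quirk mask. */
struct hc_model {
	uint16_t hci_version;	/* 0x0100 means xHCI 1.0 */
	uint32_t quirks;
};

/* Enable the quirk for every controller that reports xHCI 1.0 or later. */
static void hc_apply_version_quirks(struct hc_model *hc)
{
	assert(hc);
	if (hc->hci_version >= 0x0100)
		hc->quirks |= QUIRK_SPURIOUS_SUCCESS;
}

/*
 * Decide whether a transfer event that matches no pending TD should be
 * reported.  With the quirk set, such an event carrying a success
 * completion code is dropped instead of logged.
 */
static bool hc_should_log_stray_event(const struct hc_model *hc,
				      bool completion_is_success)
{
	assert(hc);
	if (completion_is_success && (hc->quirks & QUIRK_SPURIOUS_SUCCESS))
		return false;
	return true;
}
```

Keying the quirk on the version register rather than on PCI IDs is what lets a single check cover every xHCI 1.0 host, which is the extension the second commit describes.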

> the previous bug report of that user:
> https://bugzilla.kernel.org/show_bug.cgi?id=65021 xhci: complete USB freeze

Hmm, Greg didn't assign that bug to me, so I missed it, sorry.

> On Fri, Dec 27, 2013 at 8:59 PM, Markus Rechberger  
> wrote:
> > Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately
> > we don't have such a board for testing and customer patience is
> > limited to bisect the kernel.
> >
> > Does anyone have a clue what modification could have killed USB 3.0
> > support within those releases?
> > It does not seem to be SG support.

3.2 was the kernel where the Intel EHCI to xHCI port switchover code
went in.  Without that code, all ports will remain under the EHCI host,
and USB 3.0 devices will work at USB 2.0 speeds.  I suspect the USB
device triggers an issue with the xHCI driver, and 3.2 only works
because the device is on an EHCI port without the switchover code.
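Here is a toy model in C of the routing described above. It assumes that each port is either owned by EHCI or switched to xHCI, and that nothing moves to xHCI without the switchover step. All names are hypothetical. The real switchover code programs Intel-specific PCI configuration registers, which this model does not attempt to reproduce. The rates are the nominal USB 2.0 high-speed (480 Mb/s) and USB 3.0 SuperSpeed (5 Gb/s) signalling rates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum host { HOST_EHCI, HOST_XHCI };

/*
 * Ports that end up on the xHCI host: the switchable ones, but only
 * after the switchover step has run.  Otherwise EHCI keeps them all.
 */
static uint32_t ports_on_xhci(uint32_t switchable_mask, bool switchover_done)
{
	return switchover_done ? switchable_mask : 0;
}

/* Which host controller owns a given port (0..31) in this model. */
static enum host host_for_port(unsigned int port, uint32_t switchable_mask,
			       bool switchover_done)
{
	assert(port < 32);
	return ((ports_on_xhci(switchable_mask, switchover_done) >> port) & 1u)
		? HOST_XHCI : HOST_EHCI;
}

/* Nominal signalling rate, in Mb/s, a USB 3.0 device gets on that host. */
static unsigned int usb3_device_rate_mbps(enum host h)
{
	return h == HOST_XHCI ? 5000 : 480;
}
```

With `switchover_done == false`, every port maps to EHCI and a USB 3.0 device is limited to 480 Mb/s. That matches the behaviour described for kernels without the switchover code, where a bug in the xHCI path would never be exercised.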

> > On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger  
> > wrote:
> >> I just got another USB 3.0 bugreport, the entire system crashed. That
> >> particular customer already filed a bugreport in November 2013 that
> >> his system is in a bad state when using some USB 2.0 media devices
> >> which even have opensource drivers built into the kernel.
> >>
> >> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12.
> >> The affected board is an Intel DH87RL board.

Why are they running 3.6.12 in particular?  That's not a supported
stable kernel.

> >> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger
> >>  wrote:
> >>> A customer using a device with USBDEVFS is reporting following
> >>> backtrace (it seems to be a rather generic issue related to linux usb
> >>> 3.0 in general):
> >>> According to him this problem is reproducible as soon as he starts the
> >>> data transfer, is there anything known about that?
> >>>
> >>> He is using 3.12.0-031200-generic

So at this point you've reported three separate bugs, all with the same
symptom, but different kernel versions?  Are these all from the same bug
reporter, or from different bug reporters?

You've got me seriously confused right now.  Please keep one bug report
to one mail thread, and get the original bug reporter to start that
thread.  If this is from one bug reporter, please state the current
kernel they are running, and send dmesg showing the issue with
CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on (you may
also need to turn on CONFIG_DYNAMIC_DEBUG in later kernels).  Please
attach the dmesg as a file, since your mail client line-wraps.
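On kernels built with CONFIG_DYNAMIC_DEBUG, messages routed through the dynamic debug machinery can be switched on at runtime. This is done by writing a query such as "module xhci_hcd +p" to the control file, normally /sys/kernel/debug/dynamic_debug/control, which requires debugfs to be mounted and root privileges. The helper below is a minimal sketch of that write. The function name is hypothetical. The Kconfig options listed above remain build-time settings that still need a rebuilt kernel.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a dynamic-debug query such as "module xhci_hcd +p" to the given
 * control file.  On a live system the path is normally
 * /sys/kernel/debug/dynamic_debug/control.
 * Returns 0 on success, -1 on any I/O error.
 */
static int ddebug_enable_module(const char *control_path, const char *module)
{
	FILE *f;
	int ok;

	assert(control_path && module);
	f = fopen(control_path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "module %s +p\n", module) > 0;
	/* fclose() flushes the buffer, so a write error may only show up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Usage on a test machine would be `ddebug_enable_module("/sys/kernel/debug/dynamic_debug/control", "xhci_hcd")` before reproducing the problem. The resulting dmesg can then be attached as a file.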

> >>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0: 
> >>> ERROR Transfer event TRB DMA ptr not part of current TD
> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: 
> >>> ERROR Transfer event TRB DMA ptr not part of current TD
> >>> Dec 24 14:30:39 homenas kernel: last message repeated 16 times
> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: 
> >>> WARN Successful completion on short TX
> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: 
> >>> WARN Successful completion on short TX
> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0: URB 
> >>> transfer length is wrong, xHC issue? req. len = 46080, act. len = 1382400
> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle 
> >>> kernel NULL pointer dereference at 0004
> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] 
> >>

Re: [BUGREPORT] Linux USB 3.0

2013-12-27 Thread Markus Rechberger
I just received the following log snippet:

Dec 27 23:23:50 solist kernel: [   36.118245] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.177695] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.217966] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.277473] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.317753] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.377242] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.417514] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.477000] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.517279] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.576761] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.617074] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.676581] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.716852] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.776340] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:50 solist kernel: [   36.816589] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr
Dec 27 23:23:51 solist kernel: [   36.876117] xhci_hcd :00:14.0:
ERROR Transfer event TRB DMA ptr


The previous bug report from that user:
https://bugzilla.kernel.org/show_bug.cgi?id=65021 xhci: complete USB freeze

On Fri, Dec 27, 2013 at 8:59 PM, Markus Rechberger
 wrote:
> Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately
> we don't have such a board for testing and customer patience is
> limited to bisect the kernel.
>
> Does anyone have a clue what modification could have killed USB 3.0
> support within those releases?
> It does not seem to be SG support.
>
> On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger
>  wrote:
>> I just got another USB 3.0 bugreport, the entire system crashed. That
>> particular customer already filed a bugreport in November 2013 that
>> his system is in a bad state when using some USB 2.0 media devices
>> which even have opensource drivers built into the kernel.
>>
>> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12.
>> The affected board is an Intel DH87RL board.
>>
>> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger
>>  wrote:
>>> A customer using a device with USBDEVFS is reporting following
>>> backtrace (it seems to be a rather generic issue related to linux usb
>>> 3.0 in general):
>>> According to him this problem is reproducible as soon as he starts the
>>> data transfer, is there anything known about that?
>>>
>>> He is using 3.12.0-031200-generic
>>>
>>>
>>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0:
>>> ERROR Transfer event TRB DMA ptr not part of current TD
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>>> ERROR Transfer event TRB DMA ptr not part of current TD
>>>
>>> Dec 24 14:30:39 homenas kernel: last message repeated 16 times
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>>> WARN Successful completion on short TX
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>>> WARN Successful completion on short TX
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>>> URB transfer length is wrong, xHC issue? req. len = 46080, act. len =
>>> 1382400
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle
>>> kernel NULL pointer dereference at 0004
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops:  [#1] SMP
>>>
>>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in:
>>> videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF)
>>> vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec
>>> snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh
>>> snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq
>>> snd_timer snd_seq_device lpc_ich sn

Re: [BUGREPORT] Linux USB 3.0

2013-12-27 Thread Markus Rechberger
It seems the DH87RL was working with 3.2.0-55-generic-pae. Unfortunately,
we don't have such a board for testing, and the customer's patience for
bisecting the kernel is limited.

Does anyone have a clue which change between those releases could have
broken USB 3.0 support?
It does not seem to be SG support.

On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger
 wrote:
> I just got another USB 3.0 bugreport, the entire system crashed. That
> particular customer already filed a bugreport in November 2013 that
> his system is in a bad state when using some USB 2.0 media devices
> which even have opensource drivers built into the kernel.
>
> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12.
> The affected board is an Intel DH87RL board.
>
> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger
>  wrote:
>> A customer using a device with USBDEVFS is reporting following
>> backtrace (it seems to be a rather generic issue related to linux usb
>> 3.0 in general):
>> According to him this problem is reproducible as soon as he starts the
>> data transfer, is there anything known about that?
>>
>> He is using 3.12.0-031200-generic
>>
>>
>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0:
>> ERROR Transfer event TRB DMA ptr not part of current TD
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>> ERROR Transfer event TRB DMA ptr not part of current TD
>>
>> Dec 24 14:30:39 homenas kernel: last message repeated 16 times
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>> WARN Successful completion on short TX
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>> WARN Successful completion on short TX
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
>> URB transfer length is wrong, xHC issue? req. len = 46080, act. len =
>> 1382400
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle
>> kernel NULL pointer dereference at 0004
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops:  [#1] SMP
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in:
>> videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF)
>> vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec
>> snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh
>> snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq
>> snd_timer snd_seq_device lpc_ich snd bnep rfcomm soundcore
>> snd_page_alloc bluetooth mei_me mei mac_hid ppdev nfsd w83627ehf
>> hwmon_vid nfs_acl auth_rpcgss coretemp nfs fscache lockd lp parport
>> sunrpc raid10 raid456 async_pq async_xor async_memcpy
>> async_raid6_recov async_tx raid0 multipath linear btrfs raid6_pq xor
>> libcrc32c osst st raid1 tg3 mptsas firewire_ohci ptp mxm_wmi
>> firewire_core ahci mptscsih pps_core crc_itu_t libahci mpt2sas mptbase
>> wmi scsi_transport_sas raid_class [last unloaded: vmnet]
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CPU: 0 PID: 0 Comm:
>> swapper/0 Tainted: GF O 3.12.0-031200-generic
>> #201311031935
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Hardware name: To Be
>> Filled By O.E.M. To Be Filled By O.E.M./X79 Extreme9, BIOS P3.30
>> 01/28/2013
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] task: 81c144a0
>> ti: 81c0 task.ti: 81c0
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RIP: 0010:[]  []
>> finish_td+0x13f/0x250
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RSP:
>> 0018:88102fc03ca8  EFLAGS: 00010046
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RAX: 880f865d2b10
>> RBX: 880f865d2b00 RCX: 0006
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RDX: 880f865d2b10
>> RSI: 0007 RDI: 0046
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RBP: 88102fc03d08
>> R08: 000a R09: 
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R10: 06fd
>> R11: 06fc R12: 880fd2de
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R13: 880fd32b1780
>> R14:  R15: 880fd5c5f000
>>
>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] FS:
>> () GS:88102fc0()
>>

Re: [BUGREPORT] Linux USB 3.0

2013-12-27 Thread Markus Rechberger
I just got another USB 3.0 bug report; the entire system crashed. That
particular customer already filed a bug report in November 2013 saying
that his system ends up in a bad state when using some USB 2.0 media
devices, which even have open-source drivers built into the kernel.

USB 3.0 support seems to be a disaster with Linux 3.6.12.
The affected board is an Intel DH87RL.

On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger
 wrote:
> A customer using a device with USBDEVFS is reporting following
> backtrace (it seems to be a rather generic issue related to linux usb
> 3.0 in general):
> According to him this problem is reproducible as soon as he starts the
> data transfer, is there anything known about that?
>
> He is using 3.12.0-031200-generic
>
>
> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0:
> ERROR Transfer event TRB DMA ptr not part of current TD
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
> ERROR Transfer event TRB DMA ptr not part of current TD
>
> Dec 24 14:30:39 homenas kernel: last message repeated 16 times
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
> WARN Successful completion on short TX
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
> WARN Successful completion on short TX
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
> URB transfer length is wrong, xHC issue? req. len = 46080, act. len =
> 1382400
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle
> kernel NULL pointer dereference at 0004
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops:  [#1] SMP
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in:
> videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF)
> vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec
> snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh
> snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq
> snd_timer snd_seq_device lpc_ich snd bnep rfcomm soundcore
> snd_page_alloc bluetooth mei_me mei mac_hid ppdev nfsd w83627ehf
> hwmon_vid nfs_acl auth_rpcgss coretemp nfs fscache lockd lp parport
> sunrpc raid10 raid456 async_pq async_xor async_memcpy
> async_raid6_recov async_tx raid0 multipath linear btrfs raid6_pq xor
> libcrc32c osst st raid1 tg3 mptsas firewire_ohci ptp mxm_wmi
> firewire_core ahci mptscsih pps_core crc_itu_t libahci mpt2sas mptbase
> wmi scsi_transport_sas raid_class [last unloaded: vmnet]
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CPU: 0 PID: 0 Comm:
> swapper/0 Tainted: GF O 3.12.0-031200-generic
> #201311031935
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Hardware name: To Be
> Filled By O.E.M. To Be Filled By O.E.M./X79 Extreme9, BIOS P3.30
> 01/28/2013
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] task: 81c144a0
> ti: 81c0 task.ti: 81c0
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RIP: 0010:[]  []
> finish_td+0x13f/0x250
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RSP:
> 0018:88102fc03ca8  EFLAGS: 00010046
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RAX: 880f865d2b10
> RBX: 880f865d2b00 RCX: 0006
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RDX: 880f865d2b10
> RSI: 0007 RDI: 0046
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RBP: 88102fc03d08
> R08: 000a R09: 
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R10: 06fd
> R11: 06fc R12: 880fd2de
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R13: 880fd32b1780
> R14:  R15: 880fd5c5f000
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] FS:
> () GS:88102fc0()
> knlGS:
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CS:  0010 DS:  ES:
>  CR0: 80050033
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CR2: 0004
> CR3: 01c0d000 CR4: 000407f0
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Stack:
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450]  88102fc03ce8
> 880fd0bc8000 88102fc03d00 880fd268d1a0
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450]  88102fc03df4
> 00010002 880fd32b1780 880f865d2b00
>
> Dec 24 14:30:39 homenas kernel: [ 1469.822450]  880fd268d1a0
> 880fd5c5f000 880fd2de 880fd2c497b0
>
> Dec 24 14:30:39 home

[BUGREPORT] Linux USB 3.0

2013-12-24 Thread Markus Rechberger
A customer using a device with USBDEVFS is reporting the following
backtrace (it seems to be a rather generic issue related to Linux USB
3.0 in general).
According to him, this problem is reproducible as soon as he starts the
data transfer. Is anything known about that?

He is using 3.12.0-031200-generic


Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd :0f:00.0:
ERROR Transfer event TRB DMA ptr not part of current TD

Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
ERROR Transfer event TRB DMA ptr not part of current TD

Dec 24 14:30:39 homenas kernel: last message repeated 16 times

Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
WARN Successful completion on short TX

Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
WARN Successful completion on short TX

Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd :0f:00.0:
URB transfer length is wrong, xHC issue? req. len = 46080, act. len =
1382400

Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle
kernel NULL pointer dereference at 0004

Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250

Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0

Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops:  [#1] SMP

Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in:
videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF)
vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh
snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq
snd_timer snd_seq_device lpc_ich snd bnep rfcomm soundcore
snd_page_alloc bluetooth mei_me mei mac_hid ppdev nfsd w83627ehf
hwmon_vid nfs_acl auth_rpcgss coretemp nfs fscache lockd lp parport
sunrpc raid10 raid456 async_pq async_xor async_memcpy
async_raid6_recov async_tx raid0 multipath linear btrfs raid6_pq xor
libcrc32c osst st raid1 tg3 mptsas firewire_ohci ptp mxm_wmi
firewire_core ahci mptscsih pps_core crc_itu_t libahci mpt2sas mptbase
wmi scsi_transport_sas raid_class [last unloaded: vmnet]

Dec 24 14:30:39 homenas kernel: [ 1469.822450] CPU: 0 PID: 0 Comm:
swapper/0 Tainted: GF O 3.12.0-031200-generic
#201311031935

Dec 24 14:30:39 homenas kernel: [ 1469.822450] Hardware name: To Be
Filled By O.E.M. To Be Filled By O.E.M./X79 Extreme9, BIOS P3.30
01/28/2013

Dec 24 14:30:39 homenas kernel: [ 1469.822450] task: 81c144a0
ti: 81c0 task.ti: 81c0

Dec 24 14:30:39 homenas kernel: [ 1469.822450] RIP: 0010:[]  []
finish_td+0x13f/0x250

Dec 24 14:30:39 homenas kernel: [ 1469.822450] RSP:
0018:88102fc03ca8  EFLAGS: 00010046

Dec 24 14:30:39 homenas kernel: [ 1469.822450] RAX: 880f865d2b10
RBX: 880f865d2b00 RCX: 0006

Dec 24 14:30:39 homenas kernel: [ 1469.822450] RDX: 880f865d2b10
RSI: 0007 RDI: 0046

Dec 24 14:30:39 homenas kernel: [ 1469.822450] RBP: 88102fc03d08
R08: 000a R09: 

Dec 24 14:30:39 homenas kernel: [ 1469.822450] R10: 06fd
R11: 06fc R12: 880fd2de

Dec 24 14:30:39 homenas kernel: [ 1469.822450] R13: 880fd32b1780
R14:  R15: 880fd5c5f000

Dec 24 14:30:39 homenas kernel: [ 1469.822450] FS:
() GS:88102fc0()
knlGS:

Dec 24 14:30:39 homenas kernel: [ 1469.822450] CS:  0010 DS:  ES:
 CR0: 80050033

Dec 24 14:30:39 homenas kernel: [ 1469.822450] CR2: 0004
CR3: 01c0d000 CR4: 000407f0

Dec 24 14:30:39 homenas kernel: [ 1469.822450] Stack:

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  88102fc03ce8
880fd0bc8000 88102fc03d00 880fd268d1a0

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  88102fc03df4
00010002 880fd32b1780 880f865d2b00

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  880fd268d1a0
880fd5c5f000 880fd2de 880fd2c497b0

Dec 24 14:30:39 homenas kernel: [ 1469.822450] Call Trace:

Dec 24 14:30:39 homenas kernel: [ 1469.822450]

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  []
process_bulk_intr_td+0x116/0x2d0

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  [] handle_tx_event+0x656/0xb50

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  [] ? __queue_work+0x3b0/0x3c0

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  [] ? call_timer_fn+0x46/0x160

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  [] xhci_handle_event+0x1db/0x2a0

Dec 24 14:30:39 homenas kernel: [ 1469.822450]  [] ?
run_timer_softirq+0x1b2/0x300

Dec 24 14:30:39 homenas kernel: [ 1470.312076]  [] xhci_irq+0x120/0x1f0

Dec 24 14:30:39 homenas kernel: [ 1470.312076]  [] xhci_msi_irq+0x11/0x20

Dec 24 14:30:39 homenas kernel: [ 1470.312076]  []
handle_irq_event_percpu+0x5d/0x210

Dec 24 14:30:39 homenas kernel: [ 1470.312076]  [] handle_irq_event+0x48/0x70

Dec 24 14:30:39 homenas kernel: [ 1470.31207

[perf bugreport] perf doesn't delete /tmp/perf-vdso.so.* file on exit

2013-02-21 Thread Markus Trippelsdorf
Perf doesn't properly clean up /tmp/perf-vdso.so-XX on exit, so these
files accumulate in /tmp every time perf is run.
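One common way to avoid such leftovers is to create the file with mkstemp() and register an atexit() handler that unlinks it. The sketch below assumes that approach. The function names and the template path used with it are hypothetical and are not taken from perf.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Name of the temporary file, remembered so it can be removed later. */
static char tmp_path[64];

/* Remove the temporary file; safe to call more than once. */
static void tmp_cleanup(void)
{
	if (tmp_path[0] != '\0') {
		unlink(tmp_path);
		tmp_path[0] = '\0';
	}
}

/*
 * Create a unique file from 'pattern' (which must end in "XXXXXX"),
 * close it, and arrange for it to be deleted at normal process exit.
 * Returns 0 on success, -1 on failure.
 */
static int tmp_create(const char *pattern)
{
	static int registered;
	int fd;

	assert(pattern);
	snprintf(tmp_path, sizeof(tmp_path), "%s", pattern);
	fd = mkstemp(tmp_path);
	if (fd < 0) {
		tmp_path[0] = '\0';
		return -1;
	}
	close(fd);
	if (!registered && atexit(tmp_cleanup) == 0)
		registered = 1;
	return 0;
}
```

atexit() handlers do not run when the process is killed by a signal or leaves through _exit(). A tool that can be interrupted would therefore also need to remove the file on those paths.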

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration

2008-02-06 Thread Jan Kiszka
Sergei Shtylyov wrote:
>>>   You left powerpc-lite.patch broken with this change as it has
>>> multiple calls to kgdb8250_add_port()...
> 
>> I see. But I wonder if there ever was a real need for these hooks (in
>> 2.4 times?): If I look at bamboo_early_serial_map() e.g., I find it
>> calling into early_serial_setup() which fills serial8250_ports[] - and
>> that content is now retrieved via serial8250_get_port_def() when we
>> parse the runtime or build-time provided parameters (port number &
>> baudrate).
> 
>Of course. But now the kgdb8250_add_port() calls need to be removed.

For sure. Most arch patches need to go through some refactoring anyway
when preparing them for upstream. Cleaning up hooks that are no longer
required should be no problem at this point.

If you want to accelerate this process, please check out Jason's
linux-2.6-kgdb.git for 2.6.25 and start rebasing the powerpc patch. He
recently said that kgdb support for non-x86 architectures would be very
welcome. And if you stumble over ppc-related issues that cannot be
solved with the latest kgdb design, please let us know. The sooner, the better.

Jan





Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration

2008-02-06 Thread Sergei Shtylyov

Hello.

Jan Kiszka wrote:

>>> Sorry, previous version was missing some __init[data] attributes which
>>> were dropped in an intermediate stage. Here comes an updated patch:
>>>
>>> <---snip--->
>>>
>>> This major refactoring of the quite complex kgdb8250 configuration does
>>> the following:
>>>
>>>  - ensures that static configurations according to SERIAL_PORT_DFNS are
>>>    always loaded first
>>>  - tries to pull more accurate configuration via serial8250_get_port_def
>>>    if simple-config is used
>>>  - detects empty/invalid simple-configs
>>>  - enforces KGDB_PORT_NUM <= SERIAL_8250_NR_UARTS at kconfig level
>>>  - removes kgdb8250_add_port and its hook in serial_core (calling
>>>    serial8250_get_port_def on demand should provide us the same
>>>    information)

>>   You left powerpc-lite.patch broken with this change as it has
>> multiple calls to kgdb8250_add_port()...

> I see. But I wonder if there ever was a real need for these hooks (in
> 2.4 times?): If I look at bamboo_early_serial_map() e.g., I find it
> calling into early_serial_setup() which fills serial8250_ports[] - and
> that content is now retrieved via serial8250_get_port_def() when we
> parse the runtime or build-time provided parameters (port number &
> baudrate).

   Of course. But now the kgdb8250_add_port() calls need to be removed.

>>> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>

WBR, Sergei


Re: [Kgdb-bugreport] [PATCH] beautification of debugger_active usage

2008-02-06 Thread Jason Wessel
Jan Kiszka wrote:
> Just a beautification of using debugger_active for checking the debugger
> state.
>
> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
>
> ---
>  arch/x86/kernel/kgdb.c |6 +++---
>  include/linux/kgdb.h   |7 ++-
>  kernel/kgdb.c  |8 
>  kernel/sched.c |7 +--
>  4 files changed, 14 insertions(+), 14 deletions(-)
>
>   
committed to:
http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=kgdb_2.6.25

Jason.
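The patch itself is not shown here. As a generic sketch of the kind of cleanup described (routing every check of the debugger state through one helper), the C11 model below wraps a shared atomic flag in a single accessor. The names debugger_active_model, debugger_is_active(), debugger_enter() and debugger_leave(), as well as the cpu + 1 encoding, are hypothetical and are not the kgdb code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared "debugger active" counter. */
static atomic_int debugger_active_model;

/* The single place that decides what "active" means. */
static inline bool debugger_is_active(void)
{
	return atomic_load(&debugger_active_model) != 0;
}

/* Mark the debugger active; this model stores cpu + 1 so 0 means idle. */
static void debugger_enter(int cpu)
{
	assert(cpu >= 0);
	atomic_store(&debugger_active_model, cpu + 1);
}

/* Mark the debugger idle again. */
static void debugger_leave(void)
{
	atomic_store(&debugger_active_model, 0);
}
```

Call sites then read `if (debugger_is_active())` instead of open-coding the atomic read. If the encoding ever changes, only one function needs updating.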


Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration

2008-02-01 Thread Jan Kiszka
Sergei Shtylyov wrote:
> Hello.
> 
> Jan Kiszka wrote:
> 
>> Sorry, previous version was missing some __init[data] attributes which
>> were dropped in an intermediate stage. Here comes an updated patch:
> 
>> <---snip--->
> 
>> This major refactoring of the quite complex kgdb8250 configuration does
>> the following:
> 
>>  - ensures that static configurations according to SERIAL_PORT_DFNS are
>>    always loaded first
>>  - tries to pull more accurate configuration via serial8250_get_port_def
>>    if simple-config is used
>>  - detects empty/invalid simple-configs
>>  - enforces KGDB_PORT_NUM <= SERIAL_8250_NR_UARTS at kconfig level
>>  - removes kgdb8250_add_port and its hook in serial_core (calling
>>    serial8250_get_port_def on demand should provide us the same
>>    information)
> 
>You left powerpc-lite.patch broken with this change as it has
> multiple calls to kgdb8250_add_port()...

I see. But I wonder if there ever was a real need for these hooks (in
2.4 times?): If I look at bamboo_early_serial_map() e.g., I find it
calling into early_serial_setup() which fills serial8250_ports[] - and
that content is now retrieved via serial8250_get_port_def() when we
parse the runtime or build-time provided parameters (port number &
baudrate).
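For illustration, here is a minimal C sketch of validating a simple "<port>,<baud>" setting along the lines described above. Empty or malformed strings are rejected, and the port number must fit the number of configured UARTs. The parameter format, the NR_UARTS_MODEL constant and all names are assumptions for this example and are not the kgdb8250 parser. Port numbers are treated as 0-based indices.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the configured number of 8250 UARTs (an assumption). */
#define NR_UARTS_MODEL 4

struct kgdb_port_cfg {
	int port;		/* 0-based UART index */
	unsigned long baud;	/* bits per second */
};

/*
 * Parse "<port>,<baud>", e.g. "0,115200".  Rejects an empty string,
 * a missing field, trailing characters, a zero or signed baud rate,
 * and a port outside 0..NR_UARTS_MODEL-1.  Returns 0 on success and
 * fills *cfg; returns -1 and leaves *cfg untouched otherwise.
 */
static int parse_port_cfg(const char *s, struct kgdb_port_cfg *cfg)
{
	char *end;
	long port;
	unsigned long baud;

	assert(cfg);
	if (!s || *s == '\0')
		return -1;
	port = strtol(s, &end, 10);
	if (end == s || *end != ',')
		return -1;
	s = end + 1;
	if (*s < '0' || *s > '9')	/* strtoul() would accept "-5" */
		return -1;
	baud = strtoul(s, &end, 10);
	if (*end != '\0' || baud == 0)
		return -1;
	if (port < 0 || port >= NR_UARTS_MODEL)
		return -1;
	cfg->port = (int)port;
	cfg->baud = baud;
	return 0;
}
```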

> 
>> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
> 
>> Index: b/drivers/serial/serial_core.c
>> ===
>> --- a/drivers/serial/serial_core.c
>> +++ b/drivers/serial/serial_core.c
> [...]
>> @@ -2370,12 +2369,6 @@ int uart_add_one_port(struct uart_driver
>>  	 */
>>  	port->flags &= ~UPF_DEAD;
>>  
>> -#if defined(CONFIG_KGDB_8250)
>> -	/* Add any 8250-like ports we find later. */
>> -	if (port->type <= PORT_MAX_8250)
>> -		kgdb8250_add_port(port->line, port);
>> -#endif
>> -
> 
>I'm afraid this wasn't correct from the very start since this can add
> ports with .iotype that 8250_kgdb.c does not support. So, nothing to
> regret here...

I think a lot of cruft piled up in the kgdb patches over their long life. :)

Thanks for your feedback!

Jan


Re: [Kgdb-bugreport] [PATCH 4/5] KGDB-8250: refactor configuration

2008-02-01 Thread Sergei Shtylyov

Hello.

Jan Kiszka wrote:

> Sorry, previous version was missing some __init[data] attributes which
> were dropped in an intermediate stage. Here comes an updated patch:
>
> <---snip--->
>
> This major refactoring of the quite complex kgdb8250 configuration does
> the following:
>
>  - ensures that static configurations according to SERIAL_PORT_DFNS are
>    always loaded first
>  - tries to pull more accurate configuration via serial8250_get_port_def
>    if simple-config is used
>  - detects empty/invalid simple-configs
>  - enforces KGDB_PORT_NUM <= SERIAL_8250_NR_UARTS at kconfig level
>  - removes kgdb8250_add_port and its hook in serial_core (calling
>    serial8250_get_port_def on demand should provide us the same
>    information)

   You left powerpc-lite.patch broken with this change as it has multiple
calls to kgdb8250_add_port()...

> Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
>
> Index: b/drivers/serial/serial_core.c
> ===
> --- a/drivers/serial/serial_core.c
> +++ b/drivers/serial/serial_core.c

[...]

> @@ -2370,12 +2369,6 @@ int uart_add_one_port(struct uart_driver
>  	 */
>  	port->flags &= ~UPF_DEAD;
>  
> -#if defined(CONFIG_KGDB_8250)
> -	/* Add any 8250-like ports we find later. */
> -	if (port->type <= PORT_MAX_8250)
> -		kgdb8250_add_port(port->line, port);
> -#endif
> -

   I'm afraid this wasn't correct from the very start since this can add 
ports with .iotype that 8250_kgdb.c does not support. So, nothing to regret 
here...
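The concern can be illustrated with a small C sketch. Instead of registering every 8250-like port unconditionally, it keeps only the ports whose register access method the debug backend can drive. The io_type values, the struct, and the choice of which types count as supported are hypothetical and are not taken from 8250_kgdb.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register-access types, loosely modelled on UPIO_*. */
enum io_type { IO_PORT, IO_HUB6, IO_MEM, IO_MEM32, IO_OTHER };

struct port_desc {
	int line;
	enum io_type iotype;
};

/* Access methods this (hypothetical) debug backend knows how to drive. */
static bool backend_supports(enum io_type t)
{
	return t == IO_PORT || t == IO_MEM;
}

/*
 * Copy into 'out' only the ports the backend can handle, instead of
 * registering every 8250-like port blindly.  Returns the number kept.
 */
static int filter_ports(const struct port_desc *in, int n,
			struct port_desc *out)
{
	int i, kept = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		if (backend_supports(in[i].iotype))
			out[kept++] = in[i];
	return kept;
}
```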


WBR, Sergei


Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init

2008-01-31 Thread Jan Kiszka
Jan Kiszka wrote:
> George Anzinger wrote:
>> On 01/31/2008 01:36 AM,  Jan Kiszka was caught saying:
>>> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in
>>> parse_early_param as well? It should, because my understanding of
>>> trap_init is that it's the function to arm things like... exception
>>> handlers? And that raises the question of the deeper purpose of this
>>> check (and the invocation of kgdb_early_init from the argument parsing
>>> function). Sigh, KGDB is still a quite improvable piece of code.
>> Likely.  Once you get it in the mainline kernel, one would hope that
>> other arch code would be forthcoming as many more "eyes" will be in play.
> 
> Meanwhile I realized that there is early_trap_init - for x86-32 only! I
> assume now we are only lacking the same for x86-64 to get kgdb running
> there already during early_param-parsing.

Looks like that was the key. Thanks for pointing me at this, George.
Here is the updated patch:

--snip---

This cleans up the early entry of kgdb. It introduces early_trap_init
for x86-64, also reloads the IDT register in the 32-bit variant, removes
the now-unneeded EXCEPTION_STACK_READY construction, and refines kgdb's
init state machine.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>

---
 arch/x86/kernel/setup_64.c |4 +++
 arch/x86/kernel/traps_32.c |3 +-
 arch/x86/kernel/traps_64.c |   10 -
 include/asm-x86/kgdb.h |3 --
 include/linux/kgdb.h   |7 +-
 kernel/kgdb.c  |   47 +++--
 6 files changed, 37 insertions(+), 37 deletions(-)

Index: b/arch/x86/kernel/traps_32.c
===
--- a/arch/x86/kernel/traps_32.c
+++ b/arch/x86/kernel/traps_32.c
@@ -1137,12 +1137,13 @@ asmlinkage void math_emulate(long arg)
 
 #endif /* CONFIG_MATH_EMULATION */
 
-/* Some traps need to be set early. */
+/* Set of traps needed for early debugging. */
 void __init early_trap_init(void)
 {
set_intr_gate(1, &debug);
set_system_intr_gate(3, &int3); /* int3 can be called from all */
set_intr_gate(14, &page_fault);
+   load_idt(&idt_descr);
 }
 
 void __init trap_init(void)
Index: b/arch/x86/kernel/traps_64.c
===
--- a/arch/x86/kernel/traps_64.c
+++ b/arch/x86/kernel/traps_64.c
@@ -1129,6 +1129,15 @@ asmlinkage void math_state_restore(void)
 }
 EXPORT_SYMBOL_GPL(math_state_restore);
 
+/* Set of traps needed for early debugging. */
+void __init early_trap_init(void)
+{
+   set_intr_gate(1, &debug);
+   set_intr_gate(3, &int3);
+   set_intr_gate(14, &page_fault);
+   load_idt((const struct desc_ptr *)&idt_descr);
+}
+
 void __init trap_init(void)
 {
set_intr_gate(0,&divide_error);
@@ -1145,7 +1154,6 @@ void __init trap_init(void)
set_intr_gate(11,&segment_not_present);
set_intr_gate_ist(12,&stack_segment,STACKFAULT_STACK);
set_intr_gate(13,&general_protection);
-   set_intr_gate(14,&page_fault);
set_intr_gate(15,&spurious_interrupt_bug);
set_intr_gate(16,&coprocessor_error);
set_intr_gate(17,&alignment_check);
Index: b/include/linux/kgdb.h
===
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -43,7 +43,8 @@ extern struct task_struct *kgdb_contthre
 
 enum kgdb_initstate {
KGDB_UNINITIALIZED = 0,
-   KGDB_SEMI_INITIALIZED,
+   KGDB_ARCH_INITIALIZED,
+   KGDB_DELAYED_CONNECTION,
KGDB_FULLY_INITIALIZED
 };
 
@@ -290,10 +291,6 @@ int kgdb_nmihook(int cpu, void *regs);
 extern int debugger_step;
 extern atomic_t debugger_active;
 
-#ifndef EXCEPTION_STACK_READY
-# define EXCEPTION_STACK_READY()   1
-#endif
-
 #else /* !CONFIG_KGDB */
 static const atomic_t  debugger_active = ATOMIC_INIT(0);
 #endif /* !CONFIG_KGDB */
Index: b/kernel/kgdb.c
===
--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -2104,6 +2104,12 @@ void kgdb_unregister_io_module(struct kg
 }
 EXPORT_SYMBOL_GPL(kgdb_unregister_io_module);
 
+static void __init kgdb_initial_breakpoint(void)
+{
+   printk(KERN_CRIT "kgdb: Waiting for connection from remote gdb...\n");
+   breakpoint();
+}
+
 /*
  * This function can be called very early, either via early_param() or
  * an explicit breakpoint() early on.
@@ -2112,25 +2118,15 @@ static void __init kgdb_early_entry(void
 {
/* Let the architecture do any setup that it needs to. */
kgdb_arch_init();
-
-   /*
-* Don't try and do anything until the architecture is able to
-* setup the exception stack.  In this case, it is up to the
-* architecture to hook in and look at us when they are ready.
-*/
-   if (!EXCEPTION_STACK_READY()) {
-   kgdb_state = KGDB_SEMI_INITIALIZED;
-   /* any ki

Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init

2008-01-31 Thread Jan Kiszka
George Anzinger wrote:
> On 01/31/2008 01:36 AM,  Jan Kiszka was caught saying:
>> Jan Kiszka wrote:
>>> George Anzinger wrote:
 On 01/30/2008 04:08 PM,  Jan Kiszka was caught saying:
> [Here comes a rebased version against latest x86/mm]
>
> In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up
> and connect to the front-end already during early_param evaluation.
> This fails on x86 as the exception stack is not yet initialized,
> effectively delaying kgdbwait until late-init.

 I wonder how much work it would take to just set up the exception
 stack and proceed.  After all, kgdbwait is there to help debug
 very early kernel code...
>>>
>>> In principle a valid question, but I'm not the one to answer it. I
>>> would not feel very well if I had to reorder this critical setup code.
>>> Look, we would have to move trap_init in start_kernel before
>>> parse_early_param, and that would affect _every_ arch...
> 
> I cannot speak to other archs, but for x86 I called trap_init from the
> code that caught the kgdbwait.  At that time (since I retired, I have
> not looked at the actual kernel code) it could be called again later by
> the kernel code.  I.e. I did not try to reorder the kernel bring-up
> code, but just added an additional call to trap_init and then only in
> the case of finding a kgdbwait.
> 
> As such, this would need to be arch specific...
> 
>>>
>>
>> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in
>> parse_early_param as well? It should, because my understanding of
>> trap_init is that it's the function to arm things like... exception
>> handlers? And that raises the question of the deeper purpose of this
>> check (and the invocation of kgdb_early_init from the argument parsing
>> function). Sigh, KGDB is still a quite improvable piece of code.
> 
> Likely.  Once you get it in the mainline kernel, one would hope that
> other arch code would be forthcoming as many more "eyes" will be in play.

Meanwhile I realized that there is early_trap_init - for x86-32 only! I
assume now we are only lacking the same for x86-64 to get kgdb running
there already during early_param-parsing.

Jan


Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init

2008-01-31 Thread George Anzinger

On 01/31/2008 01:36 AM,  Jan Kiszka was caught saying:
> Jan Kiszka wrote:
>> George Anzinger wrote:
>>> On 01/30/2008 04:08 PM,  Jan Kiszka was caught saying:
 [Here comes a rebased version against latest x86/mm]

 In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up
 and connect to the front-end already during early_param evaluation.
 This fails on x86 as the exception stack is not yet initialized,
 effectively delaying kgdbwait until late-init.
>>>
>>> I wonder how much work it would take to just set up the exception
>>> stack and proceed.  After all, kgdbwait is there to help debug
>>> very early kernel code...
>>
>> In principle a valid question, but I'm not the one to answer it. I
>> would not feel very well if I had to reorder this critical setup code.
>> Look, we would have to move trap_init in start_kernel before
>> parse_early_param, and that would affect _every_ arch...

I cannot speak to other archs, but for x86 I called trap_init from the
code that caught the kgdbwait.  At that time (since I retired, I have
not looked at the actual kernel code) it could be called again later by
the kernel code.  I.e. I did not try to reorder the kernel bring-up
code, but just added an additional call to trap_init and then only in
the case of finding a kgdbwait.


As such, this would need to be arch specific...

>>
>
> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in
> parse_early_param as well? It should, because my understanding of
> trap_init is that it's the function to arm things like... exception
> handlers? And that raises the question of the deeper purpose of this
> check (and the invocation of kgdb_early_init from the argument parsing
> function). Sigh, KGDB is still a quite improvable piece of code.

Likely.  Once you get it in the mainline kernel, one would hope that
other arch code would be forthcoming as many more "eyes" will be in play.

>
> Jan
>
> PS: Can we move this to some public list?

Sure; sorry, I picked the wrong reply button. I never intended it to be
private.

>

--
George Anzinger   [EMAIL PROTECTED]




[PATCH][KGDB] Re: [Kgdb-bugreport] KGDB: 8250_kgdb warnings

2008-01-29 Thread Jason Wessel
Jan Kiszka wrote:
> Hi Jason,
> 
> so far I ignored this because it worked, but I know my customer will
> complain later anyway: What is the deeper meaning of this warning which
> shows up once per registered UART port on my (x86) boxes?
> 
> void kgdb8250_add_port(int i, struct uart_port *serial_req)
> {
> #ifdef CONFIG_KGDB_SIMPLE_SERIAL
>   if (should_copy_rs_table)
>   printk(KERN_ERR "8250_kgdb: warning will over write serial"
>  " port definitions at kgdb init time\n");
> #endif
> ...
> 
> When I look at kgdb8250_add_platform_port, it starts with a call to
> kgdb8250_copy_rs_table, and I'm wondering now if that wouldn't be more
> appropriate here.
> 
> Jan
> 


This was the result of a race condition between the platform init code
and the kgdb init code.  The arch platform init code could register
serial ports before the kgdb module was configured, while the kernel is
still processing all the __init functions.  It would have been easy to
fix this with another call to kgdb8250_copy_rs_table(), but you cannot
do that because a non-__init function cannot call an __init function.


We might as well go ahead and fix the problem by adding some checks so
as not to overwrite the dynamic registrations, because eventually
SERIAL_PORT_DFNS will be gone.  Below is the patch with the fix, which
adds some safety checks and removes the warning.

Cut here---


Fix the initialization of the kgdb port structure such that
dynamically registered ports will not be later overwritten by the
SERIAL_PORT_DFNS table.  With this problem fixed, the printk about the
overwriting of the kgdb serial definitions at init time can be
removed.

Also add runtime safety checks to make sure UART_NR was statically
allocated by the kernel at compile time to be large enough for all the
dynamically registered ports.

Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>

---
 drivers/serial/8250_kgdb.c |   27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

--- a/drivers/serial/8250_kgdb.c
+++ b/drivers/serial/8250_kgdb.c
@@ -53,7 +53,7 @@ static int kgdb8250_buf_out_inx;
 
 /* Old-style serial definitions, if existant, and a counter. */
 #ifdef CONFIG_KGDB_SIMPLE_SERIAL
-static int should_copy_rs_table = 1;
+static int __initdata should_copy_rs_table = 1;
 static struct serial_state old_rs_table[] __initdata = {
 #ifdef SERIAL_PORT_DFNS
SERIAL_PORT_DFNS
@@ -260,7 +260,10 @@ static void __init kgdb8250_copy_rs_tabl
if (!should_copy_rs_table)
return;
 
-   for (i = 0; i < ARRAY_SIZE(old_rs_table); i++) {
+   for (i = 0; i < ARRAY_SIZE(old_rs_table) && i < UART_NR; i++) {
+   if (kgdb8250_ports[i].iobase || kgdb8250_ports[i].irq ||
+   kgdb8250_ports[i].membase)
+   continue;
kgdb8250_ports[i].iobase = old_rs_table[i].port;
kgdb8250_ports[i].irq = irq_canonicalize(old_rs_table[i].irq);
kgdb8250_ports[i].uartclk = old_rs_table[i].baud_base * 16;
@@ -281,7 +284,7 @@ static void __init kgdb8250_copy_rs_tabl
  */
 static void __init kgdb8250_late_init(void)
 {
-   /* Try and copy the old_rs_table. */
+   /* Setup the KGDB uart table if not already initialized */
kgdb8250_copy_rs_table();
 
 #if defined(CONFIG_SERIAL_8250) || defined(CONFIG_SERIAL_8250_MODULE)
@@ -303,7 +306,7 @@ static void __init kgdb8250_late_init(vo
 
 static __init int kgdb_init_io(void)
 {
-   /* Give us the basic table of uarts. */
+   /* Setup the KGDB uart table if not already initialized */
kgdb8250_copy_rs_table();
 
/* We're either a module and parse a config string, or we have a
@@ -401,11 +404,11 @@ struct kgdb_io kgdb_io_ops = {
  */
 void kgdb8250_add_port(int i, struct uart_port *serial_req)
 {
-#ifdef CONFIG_KGDB_SIMPLE_SERIAL
-   if (should_copy_rs_table)
-   printk(KERN_ERR "8250_kgdb: warning will over write serial"
-  " port definitions at kgdb init time\n");
-#endif
+   if (i >= UART_NR) {
+   printk(KERN_ERR "KGDB dynamic uart registration failed: "
+  "NR_UARTS is too small\n");
+   return;
+   }
 
/* Copy the whole thing over. */
if (current_port != &kgdb8250_ports[i])
@@ -427,6 +430,12 @@ void __init kgdb8250_add_platform_port(i
/* Make sure we've got the built-in data before we override. */
kgdb8250_copy_rs_table();
 
+   if (i >= UART_NR) {
+   printk(KERN_ERR "KGDB dynamic uart registration failed: "
+  "NR_UARTS is too small\n");
+   return;
+   }
+
kgdb8250_ports[i].iobase = p->iobase;
kgdb8250_ports[i].membase = p->membase;
kgdb8250_ports[i].irq = p->irq;

Re: bugreport kernel panic on early stage, with HIGHMEM4G:

2008-01-17 Thread Denys Fedoryshchenko
The patch fixes the problem.
Here is the dmesg (I cut it; the remaining part is probably not required).
0]   Movable zone: 0 pages used for memmap
[0.00] DMI 2.4 present.
[0.00] ACPI: RSDP 000FE020, 0014 (r0 INTEL )
[0.00] ACPI: RSDT CFEFD038, 0050 (r1 INTEL  DG965SS   6B7   113)
[0.00] ACPI: FACP CFEFC000, 0074 (r1 INTEL  DG965SS   6B7 MSFT  113)
[0.00] ACPI: DSDT CFEF7000, 41AA (r1 INTEL  DG965SS   6B7 MSFT  113)
[0.00] ACPI: FACS CFE9C000, 0040
[0.00] ACPI: APIC CFEF6000, 0078 (r1 INTEL  DG965SS   6B7 MSFT  113)
[0.00] ACPI: WDDT CFEF5000, 0040 (r1 INTEL  DG965SS   6B7 MSFT  113)
[0.00] ACPI: MCFG CFEF4000, 003C (r1 INTEL  DG965SS   6B7 MSFT  113)
[0.00] ACPI: ASF! CFEF3000, 00A6 (r32 INTEL  DG965SS   6B7 MSFT  113)
[0.00] ACPI: HPET CFEF2000, 0038 (r1 INTEL  DG965SS   6B7 MSFT  113)
[0.00] ACPI: SSDT CFE9A000, 020C (r1 INTEL CpuPm  6B7 MSFT  113)
[0.00] ACPI: SSDT CFE99000, 0175 (r1 INTEL   Cpu0Ist  6B7 MSFT  113)
[0.00] ACPI: SSDT CFE98000, 0175 (r1 INTEL   Cpu1Ist  6B7 MSFT  113)
[0.00] ACPI: SSDT CFE97000, 0175 (r1 INTEL   Cpu2Ist  6B7 MSFT  113)
[0.00] ACPI: SSDT CFE96000, 0175 (r1 INTEL   Cpu3Ist  6B7 MSFT  113)
[0.00] ACPI: PM-Timer IO Port: 0x408
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] Processor #0 6:15 APIC version 20
[0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[0.00] Processor #1 6:15 APIC version 20
[0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
[0.00] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
[0.00] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
[0.00] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
[0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.00] ACPI: IRQ0 used by override.
[0.00] ACPI: IRQ2 used by override.
[0.00] ACPI: IRQ9 used by override.
[0.00] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[0.00] ACPI: HPET id: 0x8086a201 base: 0xfed0
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Allocating PCI resources starting at d400 (gap: d000:2ff0)
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1040384
[0.00] Kernel command line: root=/dev/sdb2
[0.00] mapped APIC to b000 (fee0)
[0.00] mapped IOAPIC to a000 (fec0)
[0.00] Enabling fast FPU save and restore... done.
[0.00] Enabling unmasked SIMD FPU exception support... done.
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 16384 bytes)
[0.00] Detected 2397.647 MHz processor.
[   23.770446] Console: colour VGA+ 80x25
[   23.770448] console [tty0] enabled
[   23.779138] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[   23.779395] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[   23.788992] set_highmem_pages_init(bad_ppro:0)
[   23.789056] sizeof(struct page):32
[   23.789118] sizeof(struct mem_section): 8
[   23.789180] PFN_SECTION_SHIFT:  14
[   23.789243] mem_map: 
[   23.789304]   highstart_pfn:229376 [page: c1700700]
[   23.789367] highend_pfn:   1048576 [page: 0200]
[   23.789431]   highend_pfn-1:   1048575 [page: 01e0]
[   23.789494] NR_MEM_SECTIONS: 64
[   23.789555] pfn_to_section_nr(highstart_pfn):14
[   23.789619] pfn_to_section_nr(highend_pfn):  64
[   23.789682] pfn_to_section_nr(highend_pfn-1):63
[   23.789745] totalhigh_pages: 0
[   23.789807]  totalram_pages:221519
[   23.924275] WARNING: at arch/x86/mm/init_32.c:353 set_highmem_pages_init()
[   23.924344] Pid: 0, comm: swapper Not tainted 2.6.24-rc8-git1 #1
[   23.924409]  [] show_trace_log_lvl+0x1a/0x2f
[   23.924503]  [] show_trace+0x12/0x14
[   23.924591]  [] dump_stack+0x6c/0x72
[   23.924678]  [] mem_init+0x2a7/0x596
[   23.924768]  [] start_kernel+0x271/0x2fb
[   23.924858]  [<>] _stext+0x3feff000/0x19
[   23.924945]  ===
[   23.925007] bad pfn: 851968
[   23.925068] totalhigh_pages:622129
[   23.925130]  totalram_pages:221519
[   23.925194] Memory: 3373812k/4194304k available (1990k kernel code, 30976k reserved, 813k data, 208k init, 2488516k highmem)
[   23.925300] virtual kernel memory layout:
[   23.925301] fixmap  : 0xffe15000 - 0xf000   (1960 kB)
[   23.925302] pkmap   : 0xff80 - 0xffc0   (4096 kB)

Re: bugreport kernel panic on early stage, with HIGHMEM4G:

2008-01-17 Thread Mel Gorman
On (15/01/08 14:13), Ingo Molnar didst pronounce:
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > Thanks for the detailed report, I think I know what's going on. Could
> > you try the patch below? Does it fix your problem?
> 
> Find below the fix with a more complete changelog and with no debugging
> printouts.
> 

Looks good and the right thing to do. If you check the equivalent code for
DISCONTIG, it calls pfn_valid() to check for holes, so this was expected.

Acked-by: Mel Gorman <[EMAIL PROTECTED]>

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: bugreport kernel panic on early stage, with HIGHMEM4G:

2008-01-15 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> Thanks for the detailed report, I think I know what's going on. Could
> you try the patch below? Does it fix your problem?

Find below the fix with a more complete changelog and with no debugging
printouts.

Ingo

-->
Subject: x86: fix boot crash on HIGHMEM4G && SPARSEMEM
From: Ingo Molnar <[EMAIL PROTECTED]>

Denys Fedoryshchenko reported a bootup crash when he upgraded
his system from 3GB to 4GB RAM:

   http://lkml.org/lkml/2008/1/7/9

The bug is due to HIGHMEM4G && SPARSEMEM kernels making pfn_to_page()
return an invalid pointer when the pfn is in a memory hole. The 256 MB
PCI aperture at the end of RAM was not mapped by sparsemem, and hence
the pfn was not valid. But set_highmem_pages_init() iterated over this
range without checking the pfn's validity first, crashing the bootup.

This bug was probably present in the sparsemem code ever since sparsemem
was introduced in v2.6.13. It was masked because HIGHMEM64G uses
larger memory regions in sparsemem_32.h:

 #ifdef CONFIG_X86_PAE
 #define SECTION_SIZE_BITS   30
 #define MAX_PHYSADDR_BITS   36
 #define MAX_PHYSMEM_BITS    36
 #else
 #define SECTION_SIZE_BITS   26
 #define MAX_PHYSADDR_BITS   32
 #define MAX_PHYSMEM_BITS    32
 #endif

which creates 1GB sparsemem regions instead of 64MB sparsemem regions. 
So in practice we only ever created true sparsemem holes on x86 with 
HIGHMEM4G - but that was rarely used by distros.

( btw., we could probably save 2MB of mem_map[]s on X86_PAE if we reduced
  the sparsemem region size to 256 MB. )

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/mm/init_32.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -321,8 +321,13 @@ extern void set_highmem_pages_init(int);
 static void __init set_highmem_pages_init(int bad_ppro)
 {
int pfn;
-   for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
-   add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+   for (pfn = highstart_pfn; pfn < highend_pfn; pfn++) {
+   /*
+* Holes under sparsemem might not have a mem_map[]:
+*/
+   if (pfn_valid(pfn))
+   add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+   }
totalram_pages += totalhigh_pages;
 }
 #endif /* CONFIG_FLATMEM */


Re: bugreport kernel panic on early stage, with HIGHMEM4G:

2008-01-15 Thread Ingo Molnar

* Denys Fedoryshchenko <[EMAIL PROTECTED]> wrote:

> Hi
> 
> After a physical memory upgrade from 3GB to 4GB (it also happens with
> 5GB), I got a kernel panic.
> 
> Because it happens at an early stage and my machine doesn't have a
> serial port, I had to take a photo. The kernel boots fine with HIGHMEM64G,
> with no highmem, or with HIGHMEM4G when memory is limited by mem=3G. All
> dmesg outputs are attached. I also attach the dmidecode and lspci -vvv
> output, as it may be useful.

Thanks for the detailed report, I think I know what's going on. Could
you try the patch below? Does it fix your problem?

This seems to be a SPARSEMEM bug which is present in v2.6.23 as well and
has probably been present ever since SPARSEMEM was added to 32-bit x86.

There's a ~256MB hole in your e820 memory map (the PCI aperture), which
causes the last 4 sparsemem sections (each covering 64MB of RAM) not to
be present - and they are thus missing from the sparsemem mem_map[]
too. The highmem init code, on the other hand, assumes that all pages
are in the mem_map[]:

 static void __init set_highmem_pages_init(int bad_ppro)
 {
int pfn;
for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);

The pfn_to_page() call is unconditional and returns a NULL-ish pointer,
which crashes your box when it is dereferenced. highend_pfn is what got
miscalculated by 256 MB, so set_highmem_pages_init() tried to reference
a non-existing struct page - but the code should still be robust enough
against non-existent pages.

The patch below fixes this bug. Please also send a dmesg if you manage
to boot the box up fine; I've added a few debug printouts to confirm
this theory. (I'll figure out whether we need to clip highend_pfn as
well, but this patch alone should be good enough to fix the crash on
your box.)

Ingo

->
Subject: x86: fix CONFIG_SPARSEMEM highmem init bug
From: Ingo Molnar <[EMAIL PROTECTED]>

fix CONFIG_SPARSEMEM highmem init bug.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/mm/init_32.c |   43 ---
 mm/sparse.c   |8 +++-
 2 files changed, 47 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -321,11 +321,48 @@ extern void set_highmem_pages_init(int);
 static void __init set_highmem_pages_init(int bad_ppro)
 {
int pfn;
-   for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
-   add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+
+   printk("set_highmem_pages_init(bad_ppro:%d)\n", bad_ppro);
+   printk("sizeof(struct page):%d\n", sizeof(struct page));
+   printk("sizeof(struct mem_section): %d\n", sizeof(struct mem_section));
+   printk("PFN_SECTION_SHIFT:  %d\n", PFN_SECTION_SHIFT);
+
+   printk("mem_map: %p\n", mem_map);
+   printk("  highstart_pfn: %9ld [page: %p]\n",
+   highstart_pfn, pfn_to_page(highstart_pfn));
+   printk("highend_pfn: %9ld [page: %p]\n",
+   highend_pfn, pfn_to_page(highend_pfn));
+   printk("  highend_pfn-1: %9ld [page: %p]\n",
+   highend_pfn-1, pfn_to_page(highend_pfn-1));
+
+   printk("NR_MEM_SECTIONS: %ld\n", NR_MEM_SECTIONS);
+   printk("pfn_to_section_nr(highstart_pfn): %9ld\n",
+   pfn_to_section_nr(highstart_pfn));
+   printk("pfn_to_section_nr(highend_pfn):   %9ld\n",
+   pfn_to_section_nr(highend_pfn));
+   printk("pfn_to_section_nr(highend_pfn-1): %9ld\n",
+   pfn_to_section_nr(highend_pfn-1));
+
+   printk("totalhigh_pages: %9ld\n", totalhigh_pages);
+   printk(" totalram_pages: %9ld\n", totalram_pages);
+
+   for (pfn = highstart_pfn; pfn < highend_pfn; pfn++) {
+   if (pfn_valid(pfn))
+   add_one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+   else {
+   if (WARN_ON_ONCE(1)) {
+   printk("bad pfn: %d\n", pfn);
+   break;
+   }
+   }
+   }
+   printk("totalhigh_pages: %9ld\n", totalhigh_pages);
+   printk(" totalram_pages: %9ld\n", totalram_pages);
+
+
totalram_pages += totalhigh_pages;
 }
-#endif /* CONFIG_FLATMEM */
+#endif /* CONFIG_NUMA */
 
 #else
 #define kmap_init() do { } while (0)
Index: linux/mm/sparse.c
===
--- linux.orig/mm/sparse.c
+++ linux/mm/sparse.c
@@ -295,9 +295,13 @@ void __init sparse_init(void)
struct page *map;
unsigned long *usemap;
 
+   printk("sparse_init()\n");
for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
-   if (!present_section_nr(pnum))
+   printk("section %2ld: ", pnum);
+   if (!present_section_nr(pnu

bugreport kernel panic on early stage, with HIGHMEM4G:

2008-01-06 Thread Denys Fedoryshchenko
Hi

After a physical memory upgrade from 3GB to 4GB (it also happens with 5GB),
I got a kernel panic.

Because it happens at an early stage and my machine doesn't have a serial
port, I had to take a photo.
The kernel boots fine with HIGHMEM64G, with no highmem, or with HIGHMEM4G
when memory is limited by mem=3G. All dmesg outputs are attached.
I also attach the dmidecode and lspci -vvv output, as it may be useful.


Photo (2.8MB, sorry; it is the original size from the camera):
http://www.nuclearcat.com/files/panic-07012008/img_1232.jpg

dmesg without highmem
http://www.nuclearcat.com/files/panic-07012008/dmesg-nohighmem.txt

with highmem64G
http://www.nuclearcat.com/files/panic-07012008/dmesg-highmem64G.txt

with highmem4G limited by mem=3G
http://www.nuclearcat.com/files/panic-07012008/dmesg-highmem4G-memlim3G.txt
Kernel config for this specific boot:
http://www.nuclearcat.com/files/panic-07012008/config.txt

dmidecode output
http://www.nuclearcat.com/files/panic-07012008/dmidecode.txt

lspci output
http://www.nuclearcat.com/files/panic-07012008/lspci.txt

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



Re: [Kgdb-bugreport] [PATCH -mm v2] 2.6.23-rc4-mm1: kgdboe link errors

2007-09-12 Thread Jason Wessel

Randy,

This patch is fine, and I am committing it to the for_mm kgdb tree.

I am also adding the "depends on NET" to the KGDBOE_NOMODULE section,
which would otherwise do a select on KGDBOE. We have to cover both the
case where KGDB is built as a module and the case where it is not.


Thanks,
Jason.

Randy Dunlap wrote:

On Wed, 12 Sep 2007 13:15:12 -0500 Matt Mackall wrote:

  

NETCONSOLE shouldn't be necessary. Otherwise this looks ok to my
kconfig-addled brain.



Correct.  Patch corrected.  Thanks.

---
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix kgdb build problems:
  Building modules, stage 2.
ERROR: "netpoll_cleanup" [drivers/net/kgdboe.ko] undefined!
ERROR: "netpoll_setup" [drivers/net/kgdboe.ko] undefined!
ERROR: "netpoll_parse_options" [drivers/net/kgdboe.ko] undefined!
ERROR: "netpoll_poll" [drivers/net/kgdboe.ko] undefined!
ERROR: "netpoll_send_udp" [drivers/net/kgdboe.ko] undefined!
ERROR: "netpoll_set_trap" [drivers/net/kgdboe.ko] undefined!
make[1]: *** [__modpost] Error 1


Add 'select' for net-poll related config symbols, but
make KGDBOE 'depend on' NET.  We don't want to 'select' CONFIG_NET,
but if it is already enabled, the 'select's will enable the rest
of the needed interfaces.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 lib/Kconfig.kgdb |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.23-rc4-mm1.orig/lib/Kconfig.kgdb
+++ linux-2.6.23-rc4-mm1/lib/Kconfig.kgdb
@@ -174,9 +174,10 @@ endchoice
 
 config KGDBOE
tristate "KGDB: On ethernet" if !KGDBOE_NOMODULE
-   depends on m && KGDB
+   depends on m && KGDB && NET
select NETPOLL
select NETPOLL_TRAP
+   select NET_POLL_CONTROLLER
help
  Uses the NETPOLL API to communicate with the host GDB via UDP.
  In order for this to work, the ethernet interface specified must

  




Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-29 Thread Jason Wessel

Pete/Piet Delaney wrote:

We are getting a problem with VMware where kernel text in the scheduler
is getting whacked with four null bytes written into the code. I thought
I'd use the current linux-2.6-kgdb.git tree and possibly the
CONFIG_DEBUG_RODATA patch to make kernel text read-only:

 https://www.x86-64.org/pipermail/patches/2007-March/003666.html

I thought the kernel text was RO and gdb had to disable it to
insert a breakpoint.

  


If you are going to make all the kernel text RO, then you are going to
have to add some code to the kgdb memory-write path to unprotect a
given page, or all the breakpoint writes are going to fail.
Alternatively, you can use HW breakpoints, but I have no idea whether
your VMware-simulated hardware emulates the HW breakpoint registers.


Jason.


Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-29 Thread Jason Wessel

Pete/Piet Delaney wrote:

Why am I getting this when I do:

git clone
http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git
  


I have only ever used:

git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git



Jason.


Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-29 Thread Randy Dunlap
On Wed, 29 Aug 2007 18:19:29 -0700 Pete/Piet Delaney wrote:

> Pete/Piet Delaney wrote:
> > Jason Wessel wrote:
> >> Andrew Morton wrote:
> >>> On Wed, 22 Aug 2007 17:44:12 -0500
> >>> Jason Wessel <[EMAIL PROTECTED]> wrote:
> >>>
> >>>  
>  +while (!atomic_read(&debugger_active));
>  
> >>> eek.  We're in the process of hunting down and eliminating exactly this
> >>> construct.  There have been cases where the compiler cached the
> >>> atomic_read() result in a register, turning the above into an infinite
> >>> loop.
> >>>
> >>> Plus we should never add power-burners like that into the kernel
> >>> anyway. That loop should have a cpu_relax() in it.  Which will also
> >>> fix the
> >>> compiler problem described above.
> >>>
> >>>   
> >> Agreed, and fixed with a cpu_relax.
> > 
> >>> Thirdly, please always add a newline when coding statements like that:
> >>>
> >>> while (expr())
> >>> ;
> >>>   
> >> The other instances I found of the same problem in the kgdb core are
> >> fixed too.
> > 
> >> I merged all the changes into the for_mm branch in the kgdb git tree.
> > 
> > Where is the kgdb git tree?
> 
> Why am I getting this when I do:
> 
> git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git
> 
> -
> 
> error: Couldn't get
> http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git/refs/tags/v2.6.11
> for tags/v2.6.11
> The requested URL returned error: 404
> error: Could not interpret tags/v2.6.11 as something to pull
> rm: cannot remove directory
> `/nethome/piet/Src/linux/git/jwessel/linux-2.6-kgdb/.git/clone-tmp':
> Directory not empty
> /nethome/piet/Src/linux/git/jwessel$
> -
> 

See the URLs at the top of
http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=summary
and try one of those (the git one preferably).


> We are getting a problem with VMware where kernel text in the scheduler
> is getting whacked with four null bytes written into the code. Thought I'd use
> the current linux-2.6-kgdb.git tree and possibly the CONFIG_DEBUG_RODATA
> patch to make kernel text read-only:
> 
>  https://www.x86-64.org/pipermail/patches/2007-March/003666.html
> 
> I thought the kernel text was RO and gdb had to disable it to
> insert a breakpoint.
> 
> -piet
> 
> > 
> > -piet
> > 
> >> Thanks,
> >> Jason.
> > 
> > 


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-29 Thread Pete/Piet Delaney
Pete/Piet Delaney wrote:
> Jason Wessel wrote:
>> Andrew Morton wrote:
>>> On Wed, 22 Aug 2007 17:44:12 -0500
>>> Jason Wessel <[EMAIL PROTECTED]> wrote:
>>>
>>>  
 +while (!atomic_read(&debugger_active));
 
>>> eek.  We're in the process of hunting down and eliminating exactly this
>>> construct.  There have been cases where the compiler cached the
>>> atomic_read() result in a register, turning the above into an infinite
>>> loop.
>>>
>>> Plus we should never add power-burners like that into the kernel
>>> anyway. That loop should have a cpu_relax() in it.  Which will also
>>> fix the
>>> compiler problem described above.
>>>
>>>   
>> Agreed, and fixed with a cpu_relax.
> 
>>> Thirdly, please always add a newline when coding statements like that:
>>>
>>> while (expr())
>>> ;
>>>   
>> The other instances I found of the same problem in the kgdb core are
>> fixed too.
> 
>> I merged all the changes into the for_mm branch in the kgdb git tree.
> 
> Where is the kgdb git tree?

Why am I getting this when I do:

git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git

-

error: Couldn't get
http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git/refs/tags/v2.6.11
for tags/v2.6.11
The requested URL returned error: 404
error: Could not interpret tags/v2.6.11 as something to pull
rm: cannot remove directory
`/nethome/piet/Src/linux/git/jwessel/linux-2.6-kgdb/.git/clone-tmp':
Directory not empty
/nethome/piet/Src/linux/git/jwessel$
-


We are getting a problem with VMware where kernel text in the scheduler
is getting whacked with four null bytes written into the code. Thought I'd use
the current linux-2.6-kgdb.git tree and possibly the CONFIG_DEBUG_RODATA
patch to make kernel text read-only:

 https://www.x86-64.org/pipermail/patches/2007-March/003666.html

I thought the kernel text was RO and gdb had to disable it to
insert a breakpoint.

-piet

> 
> -piet
> 
>> Thanks,
>> Jason.
> 
> 


Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-29 Thread Pete/Piet Delaney
Pete/Piet Delaney wrote:
> Jason Wessel wrote:
>> Andrew Morton wrote:
>>> On Wed, 22 Aug 2007 17:44:12 -0500
>>> Jason Wessel <[EMAIL PROTECTED]> wrote:
>>>
>>>  
 +while (!atomic_read(&debugger_active));
 
>>> eek.  We're in the process of hunting down and eliminating exactly this
>>> construct.  There have been cases where the compiler cached the
>>> atomic_read() result in a register, turning the above into an infinite
>>> loop.
>>>
>>> Plus we should never add power-burners like that into the kernel
>>> anyway. That loop should have a cpu_relax() in it.  Which will also
>>> fix the
>>> compiler problem described above.
>>>
>>>   
>> Agreed, and fixed with a cpu_relax.
> 
>>> Thirdly, please always add a newline when coding statements like that:
>>>
>>> while (expr())
>>> ;
>>>   
>> The other instances I found of the same problem in the kgdb core are
>> fixed too.
> 
>> I merged all the changes into the for_mm branch in the kgdb git tree.
> 
> Where is the kgdb git tree?

Trying:

git clone http://master.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb.git

-piet

> 
> -piet
> 
>> Thanks,
>> Jason.
> 
> 


Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-29 Thread Pete/Piet Delaney
Jason Wessel wrote:
> Andrew Morton wrote:
>> On Wed, 22 Aug 2007 17:44:12 -0500
>> Jason Wessel <[EMAIL PROTECTED]> wrote:
>>
>>  
>>> +while (!atomic_read(&debugger_active));
>>> 
>>
>> eek.  We're in the process of hunting down and eliminating exactly this
>> construct.  There have been cases where the compiler cached the
>> atomic_read() result in a register, turning the above into an infinite
>> loop.
>>
>> Plus we should never add power-burners like that into the kernel
>> anyway. That loop should have a cpu_relax() in it.  Which will also
>> fix the
>> compiler problem described above.
>>
>>   
> Agreed, and fixed with a cpu_relax.
> 
>> Thirdly, please always add a newline when coding statements like that:
>>
>> while (expr())
>> ;
>>   
> 
> The other instances I found of the same problem in the kgdb core are
> fixed too.
> 
> I merged all the changes into the for_mm branch in the kgdb git tree.

Where is the kgdb git tree?

-piet

> 
> Thanks,
> Jason.
> 



Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-22 Thread Jason Wessel

Andrew Morton wrote:

On Wed, 22 Aug 2007 17:44:12 -0500
Jason Wessel <[EMAIL PROTECTED]> wrote:

  

+   while (!atomic_read(&debugger_active));



eek.  We're in the process of hunting down and eliminating exactly this
construct.  There have been cases where the compiler cached the
atomic_read() result in a register, turning the above into an infinite
loop.

Plus we should never add power-burners like that into the kernel anyway. 
That loop should have a cpu_relax() in it.  Which will also fix the
compiler problem described above.

  

Agreed, and fixed with a cpu_relax.


Thirdly, please always add a newline when coding statements like that:

while (expr())
;
  


The other instances I found of the same problem in the kgdb core are 
fixed too.


I merged all the changes into the for_mm branch in the kgdb git tree.

Thanks,
Jason.


Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-22 Thread Andrew Morton
On Wed, 22 Aug 2007 17:44:12 -0500
Jason Wessel <[EMAIL PROTECTED]> wrote:

> Perhaps there is a cleaner way to do the same thing and avoid the 
> cmpxchg altogether.  I used the attached patch to eliminate the 
> cmpxchg operation.
> 
> 
> Jason.
> 
> 
> [kgdb_enter_atomic.patch  text/plain (2.0KB)]
> Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>
> 
> ---
>  kernel/kgdb.c |   18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> --- a/kernel/kgdb.c
> +++ b/kernel/kgdb.c
> @@ -121,6 +121,7 @@ struct task_struct *kgdb_usethread, *kgd
>  
>  int debugger_step;
>  atomic_t debugger_active;
> +static atomic_t kgdb_sync = ATOMIC_INIT(-1);
>  
>  /* Our I/O buffers. */
>  static char remcom_in_buffer[BUFMAX];
> @@ -638,8 +639,14 @@ static void kgdb_wait(struct pt_regs *re
>   kgdb_info[processor].task = current;
>   atomic_set(&procindebug[processor], 1);
>  
> + /* The master processor must be active to enter here, but this is
> +  * a guard in case the master processor had not been selected if
> +  * this was an entry via nmi.
> +  */
> + while (!atomic_read(&debugger_active));

eek.  We're in the process of hunting down and eliminating exactly this
construct.  There have been cases where the compiler cached the
atomic_read() result in a register, turning the above into an infinite
loop.

Plus we should never add power-burners like that into the kernel anyway. 
That loop should have a cpu_relax() in it.  Which will also fix the
compiler problem described above.

Thirdly, please always add a newline when coding statements like that:

while (expr())
;
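Andrew's two points (a read that the compiler cannot cache, plus a relax hint inside the loop) can be sketched in user space with C11 atomics. This is an illustration under assumptions, not kgdb code: relax() is a hypothetical stand-in for cpu_relax(), implemented here as a GCC/Clang compiler barrier, and wait_for_master() is a made-up name. In C11, atomic_load() already forces a fresh load on every iteration; the barrier mirrors the role cpu_relax() plays in the kernel loop.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the kernel's cpu_relax(): an empty asm with
 * a "memory" clobber, i.e. a compiler barrier (GCC/Clang syntax).
 * A real implementation would also emit a spin-wait hint such as the
 * x86 "pause" instruction.
 */
static inline void relax(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/*
 * Spin until another CPU has published a non-zero master id
 * (procid + 1) in *active, then return it. The loop body is on its
 * own line rather than a bare trailing semicolon.
 */
static int wait_for_master(atomic_int *active)
{
    int id;

    while ((id = atomic_load(active)) == 0)
        relax();
    return id;
}
```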




Re: [Kgdb-bugreport] 2.6.23-rc3-mm1: kgdb build failure on powerpc

2007-08-22 Thread Jason Wessel

Andrew Morton wrote:

On Wed, 22 Aug 2007 21:04:28 +0200
Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

  

Hello,

Got that on imac g3.

  CC  kernel/kgdb.o
kernel/kgdb.c: In function 'kgdb_handle_exception':
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of '_o_'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of '_n_'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: error: invalid lvalue in unary '&'
kernel/kgdb.c:940: warning: type defaults to 'int' in declaration of 'type name'
make[1]: *** [kernel/kgdb.o] Blad 1
make: *** [kernel] Blad 2




  
Against the tip of the kernel plus the kgdb patches, this config builds.  I 
wonder whether it is the compiler, or whether the macros for atomic_read or 
cmpxchg have changed in the -mm tree.  Perhaps it is not relevant, though, 
if you read on.

I'm not surprised.

while (cmpxchg(&atomic_read(&debugger_active), 0, (procid + 1)) != 0) {

a) cmpxchg isn't available on all architectures

  
It was available on all the archs that kgdb had been implemented on 
at the time.

b) we can't just go and take the address of atomic_read()'s return value!

  
Perhaps yes, perhaps no; I guess it depends on what actually gets 
generated...  The intent of this code was to guard against the race 
to become the master processor, and it looked like an attempt to do that 
atomically.  This code had been in use for a number of years at this point.
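Point b) is about applying compare-and-swap to the atomic object itself rather than to a value already read out of it. A minimal sketch using C11 atomic_compare_exchange_strong(), assuming the same encoding as the original loop (0 means no master, otherwise procid + 1); try_claim_master() is a hypothetical name and this is user-space C, not the kgdb code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: claim mastership only if no master is recorded
 * (*active == 0), storing procid + 1 so that CPU 0 differs from "none".
 * The compare-and-swap targets the atomic object itself; on failure,
 * 'expected' receives the current value and nothing is written.
 */
static bool try_claim_master(atomic_int *active, int procid)
{
    int expected = 0;

    return atomic_compare_exchange_strong(active, &expected, procid + 1);
}
```

A second caller sees the non-zero value and fails without overwriting it, which is the guarantee the original `cmpxchg(&atomic_read(...), ...)` construct was trying to get.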

c) that's pretty ugly-looking stuff anyway.

  


Perhaps there is a cleaner way to do the same thing and avoid the 
cmpxchg altogether.  I used the attached patch to eliminate the 
cmpxchg operation.



Jason.
Signed-off-by: Jason Wessel <[EMAIL PROTECTED]>

---
 kernel/kgdb.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

--- a/kernel/kgdb.c
+++ b/kernel/kgdb.c
@@ -121,6 +121,7 @@ struct task_struct *kgdb_usethread, *kgd
 
 int debugger_step;
 atomic_t debugger_active;
+static atomic_t kgdb_sync = ATOMIC_INIT(-1);
 
 /* Our I/O buffers. */
 static char remcom_in_buffer[BUFMAX];
@@ -638,8 +639,14 @@ static void kgdb_wait(struct pt_regs *re
kgdb_info[processor].task = current;
atomic_set(&procindebug[processor], 1);
 
+   /* The master processor must be active to enter here, but this is
+* a guard in case the master processor had not been selected if
+* this was an entry via nmi.
+*/
+   while (!atomic_read(&debugger_active));
+
/* Wait till master processor goes completely into the debugger.
-* FIXME: this looks racy */
+*/
while (!atomic_read(&procindebug[atomic_read(&debugger_active) - 1])) {
int i = 10; /* an arbitrary number */
 
@@ -973,8 +980,13 @@ int kgdb_handle_exception(int ex_vector,
/* Hold debugger_active */
procid = raw_smp_processor_id();
 
-   while (cmpxchg(&atomic_read(&debugger_active), 0, (procid + 1)) != 0) {
+   while (1) {
int i = 25; /* an arbitrary number */
+   if (atomic_read(&kgdb_sync) < 0 &&
+   atomic_inc_and_test(&kgdb_sync)) {
+   atomic_set(&debugger_active, procid + 1);
+   break;
+   }
 
while (--i)
cpu_relax();
@@ -991,6 +1003,7 @@ int kgdb_handle_exception(int ex_vector,
if (atomic_read(&cpu_doing_single_step) != -1 &&
atomic_read(&cpu_doing_single_step) != procid) {
atomic_set(&debugger_active, 0);
+   atomic_set(&kgdb_sync, -1);
clocksource_touch_watchdog();
kgdb_softlock_skip[procid] = 1;
local_irq_restore(flags);
@@ -1557,6 +1570,7 @@ int kgdb_handle_exception(int ex_vector,
  kgdb_restore:
/* Free debugger_active */
atomic_set(&debugger_active, 0);
+   atomic_set(&kgdb_sync, -1);
clocksource_touch_watchdog();
kgdb_softlock_skip[processor] = 1;
local_irq_restore(flags);
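The election in the patch relies on kgdb_sync starting at -1: atomic_inc_and_test() returns true only for the increment that brings the counter to 0, so exactly one CPU wins until the release path resets the counter to -1. A single-threaded C11 sketch of that protocol, assuming user space; inc_and_test(), try_become_master() and release_master() are hypothetical names, and the real code also spins with cpu_relax() between attempts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Mirrors the patch's kgdb_sync: -1 means "no master elected". */
static atomic_int sync_counter = -1;
/* Mirrors debugger_active: procid + 1 of the master, 0 = none. */
static atomic_int active_master = 0;

/* C11 stand-in for atomic_inc_and_test(): true if the new value is 0. */
static bool inc_and_test(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1 == 0;
}

/* One election attempt, following the loop body in the patch. */
static bool try_become_master(int procid)
{
    if (atomic_load(&sync_counter) < 0 && inc_and_test(&sync_counter)) {
        atomic_store(&active_master, procid + 1);
        return true;
    }
    return false;
}

/* Release path, as at the kgdb_restore label in the patch. */
static void release_master(void)
{
    atomic_store(&active_master, 0);
    atomic_store(&sync_counter, -1);
}
```

If two CPUs both read -1 and both increment, only the one that moves the counter from -1 to 0 wins; the other sees 1 and keeps spinning, and later arrivals skip the increment because the counter is no longer negative.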


Bugreport: SATA Problem: port is slow to respond

2007-02-19 Thread Sigmund Scheinbar
Hello there!

Here is my first bug report on the Linux kernel:

[1.] One line summary of the problem:

SATA Problem: port is slow to respond

[2.] Full description of the problem/report:

The following messages appear while booting/in dmesg [1]:

[...]
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: softreset failed (device not ready)
ata2: softreset failed, retrying in 5 secs
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: hardreset failed, retrying in 5 secs
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: reset failed, giving up
scsi2 : ahci
[...]

This of course means an extra-long boot time :(
[1] http://wieland.homelinux.org/hp_dc5750/dmesg_2.6.19.3.txt

A further issue, which IMHO is coupled with this one, is that my
CD/DVD+RW (also SATA) combo drive simply does not work.
Firstly, there is no indication in dmesg that it was found, and
secondly it behaves as if dead: it does not open when I press the
button and it does not blink; it can be opened until
these messages appear on the screen. I need to unplug and then replug
it after shutting down/rebooting so that on the next boot the BIOS
is able to find it (otherwise it still behaves as if dead)!
The drive itself is OK, I guess (at least it works fine on a different OS).

I found a Debian bug report where the problem is described
(but unfortunately not a solution that worked for me):
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391867

[3.] Keywords (i.e., modules, networking, kernel):

SATA, ATA

[4.] Kernel version (from /proc/version):

Linux version 2.6.19.3 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115
(prerelease) (Debian 4.1.1-21)) #2 SMP Thu Feb 15 02:59:14 CET 2007

[8.1.] Software (add the output of the ver_linux script here)

http://wieland.homelinux.org/hp_dc5750/ver_linux.txt

[8.2.] Processor information (from /proc/cpuinfo):

http://wieland.homelinux.org/hp_dc5750/proc_cpuinfo.txt

[8.3.] Module information (from /proc/modules):

http://wieland.homelinux.org/hp_dc5750/proc_modules.txt

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

http://wieland.homelinux.org/hp_dc5750/proc_iomem.txt
http://wieland.homelinux.org/hp_dc5750/proc_ioports.txt

[8.5.] PCI information ('lspci -vvv' as root)

http://wieland.homelinux.org/hp_dc5750/lspci-vvv.txt

[8.6.] SCSI information (from /proc/scsi/scsi)

http://wieland.homelinux.org/hp_dc5750/proc_scsi_scsi.txt

[X.] Other notes, patches, fixes, workarounds:

Also, it seems that the system is not capable of recognizing
the HDD:
hdparm -i /dev/sda
HDIO_GET_IDENTITY failed: Inappropriate ioctl for device

A user on a German mailing list suggested using pata_atiixp
instead of atiixp, but again this does not work for me:
http://lists.opensuse.org/archive/opensuse-mobile-de/2007-01/msg1.html

All the information i collected can be found at:
http://wieland.homelinux.org/hp_dc5750/

I've tried several kernel versions in the meantime, starting with
the standard Debian testing i386 kernel, then different
vanilla versions (2.6.18.6, 2.6.19, 2.6.19.3), which are all
affected in the same way. However, 2.6.20 behaves
differently: another error message scrolls over
the screen so fast that I'm not able to read it; the only
thing I was able to catch is that it also concerns (S)ATA and looks like
'ata2.0: [...] failed [...]'

Please CC me; I'm *not* subscribed!

Greetings,
Sigmund



Re: Bugreport

2005-04-12 Thread Chris Wright
* Amelia Nilsson ([EMAIL PROTECTED]) wrote:
> I've found a bug in 2.6.11.6. I have a Toshiba laptop, and when I ran
> 2.6.11.6 my touchpad flipped out: it clicked everywhere when it
> wasn't supposed to click. I couldn't even move my mouse without it
> clicking all over. It works fine in 2.6.10, though. Have any changes
> been made that could affect this? (I haven't tried 2.6.11.7 yet...)

2.6.11.7 has no significant changes that should affect your touchpad.
We'll need much more information to make any headway here (see
REPORTING-BUGS).  I've got a Toshiba laptop, and have no issues with the
touchpad.  I assume this is an issue just in X.  Do you see any obvious
difference in the Xorg.0.log when starting X on the two different kernels?
Any interesting dmesg output on the failing kernel?  Does booting with
psmouse.proto=exps help (assuming you have CONFIG_MOUSE_PS2=y)?

thanks,
-chris


Bugreport

2005-04-12 Thread Amelia Nilsson
Hey,
I've found a bug in 2.6.11.6. I have a Toshiba laptop, and when I ran 
2.6.11.6 my touchpad flipped out: it clicked everywhere when it wasn't supposed 
to click. I couldn't even move my mouse without it clicking all over. It 
works fine in 2.6.10, though. Have any changes been made that could affect this? (I 
haven't tried 2.6.11.7 yet...)

Best regards,
Amelia Nilsson


bugreport : system unacceptably slow.

2001-07-07 Thread JES

Hi,
Every once in a while, my system gets unbelievably slow. So slow that I 
almost can't do anything anymore. This happens only once every few months.
I think it has to do with sound, because it happens when I start using sound.
At that point "top" shows about 90% idle time, and "top" itself is using the remaining 10%. 
This has been happening for quite a while. I already had this with a 2.2.x kernel 
and I just had it with the 2.4.2 kernel. Could you tell me what I can do to 
give you more information? Do you think it could be in this module?

I use the es1370 driver as a loadable module that gets loaded on demand.
I always use the standard distribution installation. Currently I have RH7.1.


With kind regards,
Edwin.

info :

[root@CC90001-A /root]# cat /proc/pci
PCI devices found:
  Bus  0, device   0, function  0:
Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 4).
  Master Capable.  Latency=64.
  Non-prefetchable 32 bit memory at 0xe000 [0xe3ff].
  Bus  0, device   1, function  0:
PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 4).
  Master Capable.  Latency=64.  Min Gnt=8.
  Bus  0, device   2, function  0:
USB Controller: Acer Laboratories Inc. [ALi] M5237 USB (rev 3).
  IRQ 9.
  Master Capable.  Latency=64.  Max Lat=80.
  Non-prefetchable 32 bit memory at 0xde80 [0xde800fff].
  Bus  0, device   3, function  0:
Bridge: Acer Laboratories Inc. [ALi] M7101 PMU (rev 0).
  Bus  0, device   7, function  0:
ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev 195).
  Bus  0, device   9, function  0:
Multimedia audio controller: Ensoniq ES1370 [AudioPCI] (rev 1).
  IRQ 5.
  Master Capable.  Latency=32.  Min Gnt=12.Max Lat=128.
  I/O at 0xd800 [0xd83f].
  Bus  0, device  10, function  0:
Ethernet controller: Winbond Electronics Corp W89C940 (rev 11).
  IRQ 7.
  I/O at 0xd400 [0xd41f].
  Bus  0, device  11, function  0:
Ethernet controller: Digital Equipment Corporation DECchip 21041 [Tulip Pass 3] (rev 33).
  IRQ 10.
  Master Capable.  Latency=32.
  I/O at 0xd000 [0xd07f].
  Non-prefetchable 32 bit memory at 0xde00 [0xde7f].
  Bus  0, device  12, function  0:
SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c810 (rev 35).
  IRQ 11.
  Master Capable.  Latency=64.  Min Gnt=8.Max Lat=64.
  I/O at 0xb800 [0xb8ff].
  Non-prefetchable 32 bit memory at 0xdd80 [0xdd8000ff].
  Bus  0, device  15, function  0:
IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev 193).
  Master Capable.  Latency=32.  Min Gnt=2.Max Lat=4.
  I/O at 0xb400 [0xb40f].
  Bus  1, device   0, function  0:
VGA compatible controller: nVidia Corporation Riva TnT 128 [NV04] (rev 4).
  IRQ 11.
  Master Capable.  Latency=64.  Min Gnt=5.Max Lat=1.
  Non-prefetchable 32 bit memory at 0xdf00 [0xdfff].
  Prefetchable 32 bit memory at 0xe700 [0xe7ff].

[root@CC90001-A /root]# lspci -vvv
00:00.0 Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 04)
Subsystem: Acer Laboratories Inc. [ALi] ALI M1541 Aladdin V/V+ AGP 
System Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- 
SERR- 

00:01.0 PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 04) (prog-if 00 
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- 
SERR- Reset- FastB2B-

00:02.0 USB Controller: Acer Laboratories Inc. [ALi] M5237 USB (rev 03) 
(prog-if 10 [OHCI])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR-  [disabled] [size=32K]

00:0b.0 Ethernet controller: Digital Equipment Corporation DECchip 21041 
[Tulip Pass 3] (rev 21)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR-  [disabled] [size=256K]
 
00:0c.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c810 
(rev 23)
Subsystem: Symbios Logic Inc. (formerly NCR) 8100S
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- 


[root@CC90001-A /root]# lsmod
Module  Size  Used by
es1370 24896   0  (autoclean)
soundcore   4432   4  (autoclean) [es1370]
ov511  38768   0
videodev4896   1  [ov511]
autofs 11136   1  (autoclean)
ne2k-pci4864   1  (autoclea

Re: bugreport: poll() timeout always takes 10ms too long

2001-06-22 Thread raf

Tachino Nobuhiro wrote:

> 
>  Hello,
> 
> At Fri, 22 Jun 2001 11:52:12 +1000,
> [EMAIL PROTECTED] wrote:
> > 
> > [1.] One line summary of the problem:
> > 
> > poll() timeout always takes 10ms too long
> > 
> > [2.] Full description of the problem/report:
> > 
> > Select() timeouts work fine. A timeout between 10n-9 and 10n ms times
> > out after 10n ms on average. Poll() timeouts between 10n-9 and 10n ms,
> > on the other hand, time out after 10(n+1) ms on average. It's always a
> > jiffy too long. This means it's impossible to set a 10ms timeout using
> > poll() even though it's possible using select(). The programs and their
> > output below [6] demonstrate this. The same behaviour occurs with
> > linux-2.2 and linux-2.4.
> 
> 
>   I think this is correct behavior. The Single UNIX Specification
> describes the timeout parameter of poll() as follows:
> 
>   If none of the defined events have occurred on any selected
>   file descriptor, poll() waits at least timeout milliseconds
>   for an event to occur on any of the selected file descriptors.
> 
>   On the other hand, select(),
> 
>   If the specified condition is false for all of the specified
>   file descriptors, select() blocks, up to the specified timeout
>   interval, until the specified condition is true for at least
>   one of the specified file descriptors.

ok, it's correct behaviour.
but having both poll and select
time out at the time specified
would also be correct behaviour.
better than that, it would be
the expected behaviour.

raf




Re: bugreport: poll() timeout always takes 10ms too long

2001-06-21 Thread Tachino Nobuhiro


 Hello,

At Fri, 22 Jun 2001 11:52:12 +1000,
[EMAIL PROTECTED] wrote:
> 
> [1.] One line summary of the problem:
> 
> poll() timeout always takes 10ms too long
> 
> [2.] Full description of the problem/report:
> 
> Select() timeouts work fine. A timeout between 10n-9 and 10n ms times
> out after 10n ms on average. Poll() timeouts between 10n-9 and 10n ms,
> on the other hand, time out after 10(n+1) ms on average. It's always a
> jiffy too long. This means it's impossible to set a 10ms timeout using
> poll() even though it's possible using select(). The programs and their
> output below [6] demonstrate this. The same behaviour occurs with
> linux-2.2 and linux-2.4.


  I think this is correct behavior. The Single UNIX Specification
describes the timeout parameter of poll() as follows:

If none of the defined events have occurred on any selected
file descriptor, poll() waits at least timeout milliseconds
for an event to occur on any of the selected file descriptors.

  On the other hand, select(),

If the specified condition is false for all of the specified
file descriptors, select() blocks, up to the specified timeout
interval, until the specified condition is true for at least
one of the specified file descriptors.
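A simple arithmetic model that reproduces the numbers in the report, assuming HZ=100 (10 ms jiffies): select() sleeps for the timeout rounded up to whole jiffies, while poll() adds one more jiffy. The extra tick is what makes the "waits at least timeout milliseconds" wording safe, because a timer armed partway through the current tick and counted in whole jiffies could otherwise fire up to one tick early. This is a model consistent with the reported averages, not the 2.4 kernel's code; HZ and the function names are assumptions.

```c
#include <assert.h>

#define HZ 100  /* assumption: 10 ms per jiffy, as on the reporter's kernels */

/* Round a millisecond timeout up to whole jiffies. */
static unsigned long ms_to_jiffies_up(unsigned long ms)
{
    return (ms * HZ + 999) / 1000;
}

/* Model of select(): sleep the rounded-up number of jiffies. */
static unsigned long select_sleep_ms(unsigned long ms)
{
    return ms_to_jiffies_up(ms) * 1000 / HZ;
}

/* Model of poll(): one extra jiffy, so it can never return early. */
static unsigned long poll_sleep_ms(unsigned long ms)
{
    return (ms_to_jiffies_up(ms) + 1) * 1000 / HZ;
}
```

With this model, any timeout from 10n-9 to 10n ms gives 10n ms for select() and 10(n+1) ms for poll(), matching the "always a jiffy too long" observation.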



bugreport: poll() timeout always takes 10ms too long

2001-06-21 Thread raf

[1.] One line summary of the problem:

poll() timeout always takes 10ms too long

[2.] Full description of the problem/report:

Select() timeouts work fine. A timeout between 10n-9 and 10n ms times
out after 10n ms on average. Poll() timeouts between 10n-9 and 10n ms,
on the other hand, time out after 10(n+1) ms on average. It's always a
jiffy too long. This means it's impossible to set a 10ms timeout using
poll() even though it's possible using select(). The programs and their
output below [6] demonstrate this. The same behaviour occurs with
linux-2.2 and linux-2.4.

[3.] Keywords (i.e., modules, networking, kernel):

poll, select, timer, timeout

[4.] Kernel version (from /proc/version):

$ cat /proc/version
Linux version 2.4.0 ([EMAIL PROTECTED]) (gcc version 2.95.2 19991024 (release)) #16 Sat Jan 20 07:45:58 EST 2001

[5.] Output of Oops.. message (if applicable) with symbolic information 
 resolved (see Documentation/oops-tracing.txt)

N/A

[6.] A small shell script or example program which triggers the
 problem (if possible)

--- select.c --
#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <sys/time.h>
#include <sys/select.h>

/* diff = end - start, borrowing one second when the microseconds underflow */
void timeval_diff(struct timeval *start, struct timeval *end, struct timeval *diff)
{
    diff->tv_sec = end->tv_sec - start->tv_sec;

    if (end->tv_usec < start->tv_usec)
    {
        diff->tv_usec = 1000000 + end->tv_usec - start->tv_usec;
        --diff->tv_sec;
    }
    else
        diff->tv_usec = end->tv_usec - start->tv_usec;
}

/* Time a single select() that watches no descriptors; returns elapsed ms */
double time_select(int msec)
{
    struct timeval timeout[1], start[1], end[1], elapsed[1];

    /* Split msec so that tv_usec always stays below one second */
    timeout->tv_sec = msec / 1000;
    timeout->tv_usec = (msec % 1000) * 1000;
    gettimeofday(start, NULL);
    select(1, NULL, NULL, NULL, timeout);
    gettimeofday(end, NULL);
    timeval_diff(start, end, elapsed);
    return ((double)elapsed->tv_sec * 1000000.0 + (double)elapsed->tv_usec) / 1000;
}

/* Run 1000 timed select() calls and report min/max/average */
void test_select(int msec)
{
    double min = DBL_MAX;
    double max = 0.0; /* elapsed times are never negative */
    double sum = 0.0;
    int i;

    for (i = 0; i < 1000; ++i)
    {
        double elapsed = time_select(msec);

        if (elapsed < min)
            min = elapsed;
        if (elapsed > max)
            max = elapsed;
        sum += elapsed;
    }

    printf("select(%d ms) min %g ms, max %g ms, avg %g ms\n", msec, min, max, sum / 1000);
}

int main(int ac, char **av)
{
    int msec = (ac > 1) ? atoi(av[1]) : 1;
    test_select(msec);
    return EXIT_SUCCESS;
}
---

--- poll.c 
#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <sys/time.h>
#include <poll.h>

/* diff = end - start, borrowing one second when the microseconds underflow */
void timeval_diff(struct timeval *start, struct timeval *end, struct timeval *diff)
{
    diff->tv_sec = end->tv_sec - start->tv_sec;

    if (end->tv_usec < start->tv_usec)
    {
        diff->tv_usec = 1000000 + end->tv_usec - start->tv_usec;
        --diff->tv_sec;
    }
    else
        diff->tv_usec = end->tv_usec - start->tv_usec;
}

/* Time a single poll() that watches no descriptors; returns elapsed ms */
double time_poll(int msec)
{
    struct timeval start[1], end[1], elapsed[1];

    gettimeofday(start, NULL);
    poll(NULL, 0, msec);
    gettimeofday(end, NULL);
    timeval_diff(start, end, elapsed);
    return ((double)elapsed->tv_sec * 1000000.0 + (double)elapsed->tv_usec) / 1000;
}

/* Run 1000 timed poll() calls and report min/max/average */
void test_poll(int msec)
{
    double min = DBL_MAX;
    double max = 0.0; /* elapsed times are never negative */
    double sum = 0.0;
    int i;

    for (i = 0; i < 1000; ++i)
    {
        double elapsed = time_poll(msec);

        if (elapsed < min)
            min = elapsed;
        if (elapsed > max)
            max = elapsed;
        sum += elapsed;
    }

    printf("poll(%d ms) min %g ms, max %g ms, avg %g ms\n", msec, min, max, sum / 1000);
}

int main(int ac, char **av)
{
    int msec = (ac > 1) ? atoi(av[1]) : 1;
    test_poll(msec);
    return EXIT_SUCCESS;
}
---
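gettimeofday() follows the wall clock, so an NTP or manual clock adjustment during the loop could distort the figures. The following is a minimal sketch of timing one poll() call with POSIX clock_gettime(CLOCK_MONOTONIC), which is not affected by such adjustments. It assumes a POSIX system that provides clock_gettime(). The function name time_poll_monotonic() is hypothetical and is not part of the original programs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>

/* Time one poll() call that watches no descriptors and only waits for
 * its timeout. Uses CLOCK_MONOTONIC instead of gettimeofday().
 * Returns the elapsed time in milliseconds. */
static double time_poll_monotonic(int msec)
{
    struct timespec start, end;

    assert(msec >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    poll(NULL, 0, msec);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Seconds and nanoseconds differences, combined into milliseconds;
     * a negative nanosecond difference is handled by the double arithmetic. */
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Because poll() promises to wait at least the requested time, the value this returns should not fall below msec. The interesting quantity is how far above msec it lands.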

--- select-output -
$ for i in 1 5 9 10 11 15 19 20 21 25 29 30 31 35 39 40 41 45 49 50 51 1000 
do
./select $i
done
select(1 ms) min 5.624 ms, max 10.299 ms, avg 9.99298 ms
select(5 ms) min 5.668 ms, max 10.357 ms, avg 9.99301 ms
select(9 ms) min 5.635 ms, max 10.034 ms, avg 9.993 ms
select(10 ms) min 5.683 ms, max 10.347 ms, avg 9.99306 ms
select(11 ms) min 15.663 ms, max 20.627 ms, avg 19.993 ms
select(15 ms) min 15.664 ms, max 20.331 ms, avg 19.993 ms
select(19 ms) min 15.632 ms, max 20.04 ms, avg 19.993 ms
select(20 ms) min 15.652 ms, max 20.029 ms, avg 19.993 ms
select(21 ms) min 25.661 ms, max 30.299 ms, avg 29.993 ms
select(25 ms) min 25.663 ms, max 30.085 ms, avg 29.993 ms
sele

Re: Bugreport: Kernel 2.4.x crash

2001-04-18 Thread Jörn Engel

Hi!

> > I have no experience with kernel debugging, but so far, I have found
> > no log entry giving me a hint and the screen is blank after the crash
> 
> Could you disable console blanking (setterm -blank 0).
> 
> We really need a hint where it crashed.

Over the Easter weekend I took some time for testing. One IDE channel does
not work with DMA enabled, which is the boot-up default. After about 30
seconds, the channel is switched to PIO and the machine runs again.

Funny, though: before, I could neither return from console blanking nor reach
the machine over the network. But as with any production system, I would
rather keep it running than spend downtime hunting for the error.

Thank you all.

Jörn

-- 
Jörn Engel
mailto: [EMAIL PROTECTED]
http://wohnheim.fh-wedel.de/~joern



Re: Bugreport: Kernel 2.4.x crash

2001-04-03 Thread Mark Hahn

> 2. A Fileserver with an ABIT Hotrod 66 (hpt366) controller will crash within
> 5-60 minutes after boot with a 2.4.x kernel. 2.2.x works fine. No other

no problem with ext2 on hpt366 here.

> Gnu C  2.95.3

hmm.




Re: Bugreport: Kernel 2.4.x crash

2001-04-03 Thread John Jasen

On Tue, 3 Apr 2001, Jörn Engel wrote:

Unlike you, I don't necessarily believe it's the hpt366. See below (note:
I've also had it running on a stock 2.4.2 kernel for a while):

> 00:08.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366
> (rev 01)
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B-
> Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> SERR-  Latency: 248 (2000ns min, 2000ns max), cache line size 08
> Interrupt: pin A routed to IRQ 11
> Region 0: I/O ports at 6100
> Region 1: I/O ports at 6200
> Region 4: I/O ports at 6300
>
> 00:08.1 Unknown mass storage controller: Triones Technologies, Inc. HPT366
> (rev 01)
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B-
> Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> SERR-  Latency: 248 (2000ns min, 2000ns max), cache line size 08
> Interrupt: pin A routed to IRQ 11
> Region 0: I/O ports at 6400
> Region 1: I/O ports at 6500
> Region 4: I/O ports at 6600

00:13.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-



Bugreport: Kernel 2.4.x crash

2001-04-03 Thread Jörn Engel

1. Kernel crash w/out error message or logfile entry

2. A Fileserver with an ABIT Hotrod 66 (hpt366) controller will crash within
5-60 minutes after boot with a 2.4.x kernel. 2.2.x works fine. No other
exotic hardware. Another possible culprit is ReiserFS, which I use for all
partitions except /.
I have no experience with kernel debugging, but so far I have found no log
entry giving me a hint, and the screen is blank after the crash. There might
have been some output before, but the machine is in the basement and too
important for extensive testing.
I have tried 2.4.2 and 2.4.3 once each.

3. ide, hpt366

4. 2.4.2, 2.4.3

5. -

6. -

7. All this information is taken from the running 2.2.18 kernel.

7.1. sh /usr/src/linux/scripts/ver_linux
-- Versions installed: (if some fields are empty or look
-- unusual then possibly you have very old versions)
Linux belfast 2.2.18 #1 Fri Feb 23 14:47:14 CET 2001 i586 unknown
Kernel modules         2.4.2
Gnu C                  2.95.3
Gnu Make               3.79.1
Binutils               2.11.90.0.1
Linux C Library        2.2.2
Dynamic linker         ldd (GNU libc) 2.2.2
Procps                 2.0.7
Mount                  2.11b
Net-tools              2.05
Console-tools          0.2.3
Sh-utils               2.0.11
Modules Loaded         sb uart401 sound soundcore

7.2. cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 5
model           : 4
model name      : Pentium MMX
stepping        : 3
cpu MHz         : 200.459
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : yes
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 mmx
bogomips        : 399.76

7.3. cat /var/log/ksymoops/20010401164317.modules (2.4.3)
sb                      2128   0 (unused)
sb_lib                 33936   0 [sb]
uart401                 6352   0 [sb_lib]
sound                  56400   0 [sb_lib uart401]
soundcore               3792   5 [sb_lib sound]
raid1                  12784   0 (unused)
raid0                   3520   0 (unused)
md                     41056   0 [raid1 raid0]

7.4. cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
0220-022f : soundblaster
02f8-02ff : serial(set)
0330-0333 : MPU-401 UART
03c0-03df : vga+
03e8-03ef : serial(auto)
03f6-03f6 : ide0
03f8-03ff : serial(set)
6100-6107 : ide2
6202-6202 : ide2
6400-6407 : ide3
6502-6502 : ide3
6700-677f : eth0
f000-f007 : ide0
f008-f00f : ide1

7.5. lspci -vvv
00:00.0 Host bridge: Intel Corporation 430HX - 82439HX TXC [Triton II] (rev 03)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-


