Re: INFO: task hung in filemap_fault

2018-01-15 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> On Mon, Jan 15, 2018 at 3:25 PM, Tetsuo Handa
>  wrote:
> > Dmitry Vyukov wrote:
> >> >> I am not completely following. You previously mentioned raw.log, which
> >> >> is a collection of multiple programs, but now you seem to be talking
> >> >> about a single reproducer. When syzbot manages to reproduce the bug
> >> >> only with syzkaller program but not with a corresponding C program, it
> >> >> provides only syzkaller program. It such case it can sense to convert.
> >> >> But the case you pointed to actually contains C program. So there is
> >> >> no need to do the conversion at all... What am I missing?
> >> >>
> >> >
> >> > raw.log is not readable for me.
> >> > I want to see C program even if syzbot did not manage to reproduce the 
> >> > bug.
> >> > If C program is available, everyone can try reproducing the bug with 
> >> > manually
> >> > trimmed C program.
> >>
> >> If it did not manage to reproduce the bug, there is no C program.
> >> There is no program at all.
> >>
> >
> > What!? Then, what does raw.log contain? I want to read raw.log as C program.
> 
> 
> raw.log is not a _program_, it's hundreds of separate programs that
> were executed before the crash.

I want to know the hundreds of separate programs as C programs.
Even if there are hundreds, more recently ran programs tend to be
the culprit.

> It's also very compressed
> representation as compared to equivalent C programs. For example for
> this program:
> 
> mmap(&(0x7f00/0xfff000)=nil, 0xfff000, 0x3, 0x32,
> 0x, 0x0)
> r0 = socket$nl_generic(0x10, 0x3, 0x10)
> sendmsg$nl_generic(r0,
> &(0x7fb3e000-0x38)={&(0x7fd4a000-0xc)={0x10, 0x0, 0x0, 0x0},
> 0xc, &(0x7f007000)={&(0x7ff7c000-0x15c)={0x24, 0x1c, 0x109,
> 0x, 0x, {0x4, 0x0, 0x0},
> [@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
> 0x0, 0x0, 0x0}, 0x0)
> 
> you can get up to this amount of C code:
> https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt
> 
> that is, 700 lines of C source for 3 line program. So instead of a 1MB
> file that will be 100MB, and then it probably should be a gzip archive
> with hundreds of separate C files. There are people on this list
> complaining even about 200K of attachments. I don't see that this will
> be better and well accepted.
> 

I don't think attaching to mails is acceptable. Thus, I suggest e.g. CGI
service so that only those who want to try can obtain the C programs.


Re: INFO: task hung in filemap_fault

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 15, 2018 at 3:25 PM, Tetsuo Handa
 wrote:
> Dmitry Vyukov wrote:
>> >> I am not completely following. You previously mentioned raw.log, which
>> >> is a collection of multiple programs, but now you seem to be talking
>> >> about a single reproducer. When syzbot manages to reproduce the bug
>> >> only with syzkaller program but not with a corresponding C program, it
>> >> provides only syzkaller program. It such case it can sense to convert.
>> >> But the case you pointed to actually contains C program. So there is
>> >> no need to do the conversion at all... What am I missing?
>> >>
>> >
>> > raw.log is not readable for me.
>> > I want to see C program even if syzbot did not manage to reproduce the bug.
>> > If C program is available, everyone can try reproducing the bug with 
>> > manually
>> > trimmed C program.
>>
>> If it did not manage to reproduce the bug, there is no C program.
>> There is no program at all.
>>
>
> What!? Then, what does raw.log contain? I want to read raw.log as C program.


raw.log is not a _program_, it's hundreds of separate programs that
were executed before the crash. It's also very compressed
representation as compared to equivalent C programs. For example for
this program:

mmap(&(0x7f00/0xfff000)=nil, 0xfff000, 0x3, 0x32,
0x, 0x0)
r0 = socket$nl_generic(0x10, 0x3, 0x10)
sendmsg$nl_generic(r0,
&(0x7fb3e000-0x38)={&(0x7fd4a000-0xc)={0x10, 0x0, 0x0, 0x0},
0xc, &(0x7f007000)={&(0x7ff7c000-0x15c)={0x24, 0x1c, 0x109,
0x, 0x, {0x4, 0x0, 0x0},
[@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
0x0, 0x0, 0x0}, 0x0)

you can get up to this amount of C code:
https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt

that is, 700 lines of C source for 3 line program. So instead of a 1MB
file that will be 100MB, and then it probably should be a gzip archive
with hundreds of separate C files. There are people on this list
complaining even about 200K of attachments. I don't see that this will
be better and well accepted.


Re: INFO: task hung in filemap_fault

2018-01-15 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> >> I am not completely following. You previously mentioned raw.log, which
> >> is a collection of multiple programs, but now you seem to be talking
> >> about a single reproducer. When syzbot manages to reproduce the bug
> >> only with syzkaller program but not with a corresponding C program, it
> >> provides only syzkaller program. It such case it can sense to convert.
> >> But the case you pointed to actually contains C program. So there is
> >> no need to do the conversion at all... What am I missing?
> >>
> >
> > raw.log is not readable for me.
> > I want to see C program even if syzbot did not manage to reproduce the bug.
> > If C program is available, everyone can try reproducing the bug with 
> > manually
> > trimmed C program.
> 
> If it did not manage to reproduce the bug, there is no C program.
> There is no program at all.
> 

What!? Then, what does raw.log contain? I want to read raw.log as C program.


Re: INFO: task hung in filemap_fault

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 15, 2018 at 2:56 PM, Tetsuo Handa
 wrote:
> Dmitry Vyukov wrote:
>> > No problem. In the "tty: User triggerable soft lockup." case, I manually
>> > trimmed the reproducer at https://marc.info/?l=linux-mm&m=151368630414963 .
>> > That is,
>> >
>> >  (1) Can the problem be reproduced even if setup_tun(0, true); is 
>> > commented out?
>> >
>> >  (2) Can the problem be reproduced even if NONFAILING(A = B); is replaced 
>> > with
>> >  plain A = B; assignment?
>> >
>> >  (3) Can the problem be reproduced even if install_segv_handler(); is 
>> > commented
>> >  out?
>> >
>> >  (4) Can the problem be reproduced even if some syscalls (e.g. 
>> > __NR_memfd_create,
>> >  __NR_getsockopt, __NR_perf_event_open) are replaced with no-op?
>> >
>> > and so on. Then, I finally reached a reproducer which I sent, and the bug 
>> > was fixed.
>> >
>> > What is important is that everyone can try simplifying the reproducer 
>> > written
>> > in plain C in order to narrow down the culprit. Providing a (e.g.) CGI 
>> > service
>> > which generates plain C reproducer like gistfile1.txt will be helpful to 
>> > me.
>>
>> I am not completely following. You previously mentioned raw.log, which
>> is a collection of multiple programs, but now you seem to be talking
>> about a single reproducer. When syzbot manages to reproduce the bug
>> only with syzkaller program but not with a corresponding C program, it
>> provides only syzkaller program. It such case it can sense to convert.
>> But the case you pointed to actually contains C program. So there is
>> no need to do the conversion at all... What am I missing?
>>
>
> raw.log is not readable for me.
> I want to see C program even if syzbot did not manage to reproduce the bug.
> If C program is available, everyone can try reproducing the bug with manually
> trimmed C program.

If it did not manage to reproduce the bug, there is no C program.
There is no program at all.


Re: INFO: task hung in filemap_fault

2018-01-15 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> > No problem. In the "tty: User triggerable soft lockup." case, I manually
> > trimmed the reproducer at https://marc.info/?l=linux-mm&m=151368630414963 .
> > That is,
> >
> >  (1) Can the problem be reproduced even if setup_tun(0, true); is commented 
> > out?
> >
> >  (2) Can the problem be reproduced even if NONFAILING(A = B); is replaced 
> > with
> >  plain A = B; assignment?
> >
> >  (3) Can the problem be reproduced even if install_segv_handler(); is 
> > commented
> >  out?
> >
> >  (4) Can the problem be reproduced even if some syscalls (e.g. 
> > __NR_memfd_create,
> >  __NR_getsockopt, __NR_perf_event_open) are replaced with no-op?
> >
> > and so on. Then, I finally reached a reproducer which I sent, and the bug 
> > was fixed.
> >
> > What is important is that everyone can try simplifying the reproducer 
> > written
> > in plain C in order to narrow down the culprit. Providing a (e.g.) CGI 
> > service
> > which generates plain C reproducer like gistfile1.txt will be helpful to me.
> 
> I am not completely following. You previously mentioned raw.log, which
> is a collection of multiple programs, but now you seem to be talking
> about a single reproducer. When syzbot manages to reproduce the bug
> only with syzkaller program but not with a corresponding C program, it
> provides only syzkaller program. It such case it can sense to convert.
> But the case you pointed to actually contains C program. So there is
> no need to do the conversion at all... What am I missing?
> 

raw.log is not readable for me.
I want to see C program even if syzbot did not manage to reproduce the bug.
If C program is available, everyone can try reproducing the bug with manually
trimmed C program.


Re: INFO: task hung in filemap_fault

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 15, 2018 at 11:44 AM, Tetsuo Handa
 wrote:
>> >>We can't simply drop info from there. Do you propose to add
>> >> another attached file that contains the same info in a different
>> >> format? What is the exact format you are proposing?
>> >
>> > Plain C program which can be compiled without installing additional
>> > program/library packages (except those needed for building kernels).
>> >
>> >> Is it just
>> >> dropping the syscall name part after $ sign? Note that it's still not
>> >> C, more complex syscall generally look as follows:
>> >>
>> >> perf_event_open(&(0x7fb5a000)={0x42, 0x78, 0x1e2, 0x0,
>> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x, 0x0,
>> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> >> @perf_bp={&(0x7f00)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> >> 0x0, 0x0}, 0x0, 0x0, 0x, 0x0)
>> >> recvmmsg(0x, &(0x7f003000)=[{{0x0, 0x0,
>> >> &(0x7f002000)=[{&(0x7f00a000)=""/193, 0xc1},
>> >> {&(0x7f007000-0x3d)=""/61, 0x3d}], 0x2,
>> >> &(0x7f005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
>> >> &(0x7f003000-0x10)={0x77359400, 0x0})
>> >> bpf$PROG_LOAD(0x5, &(0x7f00)={0x1, 0x5,
>> >> &(0x7f002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> >> 0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
>> >> 0x0}}, &(0x7f004000-0xa)='syzkaller\x00', 0x3, 0xc3,
>> >> &(0x7f386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)
>> >>
>> >> Note: you can convert any syzkaller program to equivalent C code using
>> >> syz-prog2c utility that comes with syzkaller.
>> >
>> > I won't install go language into my environment for analyzing/reproducing 
>> > your
>> > reports. If syz-prog2c is provided as a CGI service (e.g. receive URL 
>> > containing
>> > raw.log and print the converted C program), I might try it.
>>
>>
>> raw.log is not a _program_, it's hundreds of separate programs that
>> were executed before the crash. It's also very compressed
>> representation as compared to equivalent C programs. For example for
>> this program:
>>
>> mmap(&(0x7f00/0xfff000)=nil, 0xfff000, 0x3, 0x32,
>> 0x, 0x0)
>> r0 = socket$nl_generic(0x10, 0x3, 0x10)
>> sendmsg$nl_generic(r0,
>> &(0x7fb3e000-0x38)={&(0x7fd4a000-0xc)={0x10, 0x0, 0x0, 0x0},
>> 0xc, &(0x7f007000)={&(0x7ff7c000-0x15c)={0x24, 0x1c, 0x109,
>> 0x, 0x, {0x4, 0x0, 0x0},
>> [@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
>> 0x0, 0x0, 0x0}, 0x0)
>>
>> you can get up to this amount of C code:
>> https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt
>>
>> that is, 700 lines of C source for 3 line program. So instead of a 1MB
>> file that will be 100MB, and then it probably should be a gzip archive
>> with hundreds of separate C files. There are people on this list
>> complaining even about 200K of attachments. I don't see that this will
>> be better and well accepted.
>>
>
> No problem. In the "tty: User triggerable soft lockup." case, I manually
> trimmed the reproducer at https://marc.info/?l=linux-mm&m=151368630414963 .
> That is,
>
>  (1) Can the problem be reproduced even if setup_tun(0, true); is commented 
> out?
>
>  (2) Can the problem be reproduced even if NONFAILING(A = B); is replaced with
>  plain A = B; assignment?
>
>  (3) Can the problem be reproduced even if install_segv_handler(); is 
> commented
>  out?
>
>  (4) Can the problem be reproduced even if some syscalls (e.g. 
> __NR_memfd_create,
>  __NR_getsockopt, __NR_perf_event_open) are replaced with no-op?
>
> and so on. Then, I finally reached a reproducer which I sent, and the bug was 
> fixed.
>
> What is important is that everyone can try simplifying the reproducer written
> in plain C in order to narrow down the culprit. Providing a (e.g.) CGI service
> which generates plain C reproducer like gistfile1.txt will be helpful to me.

I am not completely following. You previously mentioned raw.log, which
is a collection of multiple programs, but now you seem to be talking
about a single reproducer. When syzbot manages to reproduce the bug
only with syzkaller program but not with a corresponding C program, it
provides only syzkaller program. It such case it can sense to convert.
But the case you pointed to actually contains C program. So there is
no need to do the conversion at all... What am I missing?


Re: INFO: task hung in filemap_fault

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 15, 2018 at 11:44 AM, Tetsuo Handa
 wrote:
> Dmitry Vyukov wrote:
>> On Mon, Jan 8, 2018 at 11:48 AM, Tetsuo Handa
>>  wrote:
>> > Dmitry Vyukov wrote:
>> >> >> Hi Tetsuo,
>> >> >>
>> >> >> syzbot always re-runs the same workload on a new machine. If it
>> >> >> manages to reproduce the problem, it provides a reproducer. In this
>> >> >> case it didn't.
>> >> >
>> >> > Even if it did not manage to reproduce the problem, showing raw.log in
>> >> > C format is helpful for me. For example,
>> >> >
>> >> >   ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
>> >> >
>> >> > is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
>> >> > If the message were
>> >> >
>> >> >   ioctl(r3, 0x4c00, r1)
>> >> >
>> >> > more people will be able to read what the program tried to do.
>> >> > There are many operations done on loop devices, but are too hard
>> >> > for me to pick up only loop related actions.
>> >>
>> >>
>> >> Hi Tetsuo,
>> >>
>> >> The main purpose of this format is different, this is a complete
>> >> representation of programs that allows replaying them using syzkaller
>> >> tools.
>> >
>> > What is ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1) ?
>> > 0x4c00 is LOOP_SET_FD. Why LOOP_CHANGE_FD is there?
>>
>>
>> In short, it specifies exact discrimination of the syscall which
>> affects parsing of the rest of the arguments. For some syscalls
>> (ioctl/setsockopt/sendmsg) kernel has hundreds of different
>> discriminations with radically different arguments.
>> Now if you are asking why the discrimination is LOOP_CHANGE_FD, but
>> the actual command is LOOP_SET_FD, that's because this is a fuzzer,
>> it's sole purpose is to mess things in unexpected ways.
>
> ??? I can't catch what you want to say.
>
> I understand that a fuzzer intentionally tests various cases.
> My question is simple. Why don't you use actual command name like
> ioctl$LOOP_SET_FD(r3, 0x4c00, r1) ?
> Writing like ioctl$LOOP_CHANGE_FD is confusing. I consider it as a bug.

To answer this I need to dive a bit into syzkaller internals.
There are 2 main operations with programs: generation and mutations.
During these operations consts specified in syscall discriminations
always match (we don't get LOOP_CHANGE_FD which is actually
LOOP_SET_FD). However, there are some additional operations, in
particular argument changes based on intercepted comparison arguments
in kernel code. Say, we execute LOOP_CHANGE_FD and collect all
comparison arguments and then alter syscall arguments that match one
argument to another argument. That's when we can get discrimination X
which is actually discrimination Y. There is bunch of open questions
associated with this process. First of which is -- should we ever
mutate arguments that are specified as const in a discrimination.
Mutating consts is useful, because that's what allows fuzzer to get to
all of these undescribed ioctls/setsockopts/netlink messages/network
packet protocols, etc. But then, should we also change the
discrimination? What to do if 2 disciminations have completely
different arguments, which are not possible to translate? What if we
don't have a discrimination for the new value at all? What if we have
a partial const match for 2 discriminations, but no exact match for
any discrimination. Also note that it's usually more complex than this
ioctl change because consts can be buried deep into sendmsg's data.
Even in this case, the actual ioctl depends on the file type, but we
can as well have an unknown fd type, so even when we do
ioctl$LOOP_SET_FD(r3, 0x4c00, r1), that's not necessasry LOOP_SET_FD.


Re: INFO: task hung in filemap_fault

2018-01-15 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> On Mon, Jan 8, 2018 at 11:48 AM, Tetsuo Handa
>  wrote:
> > Dmitry Vyukov wrote:
> >> >> Hi Tetsuo,
> >> >>
> >> >> syzbot always re-runs the same workload on a new machine. If it
> >> >> manages to reproduce the problem, it provides a reproducer. In this
> >> >> case it didn't.
> >> >
> >> > Even if it did not manage to reproduce the problem, showing raw.log in
> >> > C format is helpful for me. For example,
> >> >
> >> >   ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
> >> >
> >> > is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
> >> > If the message were
> >> >
> >> >   ioctl(r3, 0x4c00, r1)
> >> >
> >> > more people will be able to read what the program tried to do.
> >> > There are many operations done on loop devices, but are too hard
> >> > for me to pick up only loop related actions.
> >>
> >>
> >> Hi Tetsuo,
> >>
> >> The main purpose of this format is different, this is a complete
> >> representation of programs that allows replaying them using syzkaller
> >> tools.
> >
> > What is ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1) ?
> > 0x4c00 is LOOP_SET_FD. Why LOOP_CHANGE_FD is there?
> 
> 
> In short, it specifies exact discrimination of the syscall which
> affects parsing of the rest of the arguments. For some syscalls
> (ioctl/setsockopt/sendmsg) kernel has hundreds of different
> discriminations with radically different arguments.
> Now if you are asking why the discrimination is LOOP_CHANGE_FD, but
> the actual command is LOOP_SET_FD, that's because this is a fuzzer,
> it's sole purpose is to mess things in unexpected ways.

??? I can't catch what you want to say.

I understand that a fuzzer intentionally tests various cases.
My question is simple. Why don't you use actual command name like
ioctl$LOOP_SET_FD(r3, 0x4c00, r1) ?
Writing like ioctl$LOOP_CHANGE_FD is confusing. I consider it as a bug.

> 
> 
> >>We can't simply drop info from there. Do you propose to add
> >> another attached file that contains the same info in a different
> >> format? What is the exact format you are proposing?
> >
> > Plain C program which can be compiled without installing additional
> > program/library packages (except those needed for building kernels).
> >
> >> Is it just
> >> dropping the syscall name part after $ sign? Note that it's still not
> >> C, more complex syscall generally look as follows:
> >>
> >> perf_event_open(&(0x7fb5a000)={0x42, 0x78, 0x1e2, 0x0,
> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x, 0x0,
> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> >> @perf_bp={&(0x7f00)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> >> 0x0, 0x0}, 0x0, 0x0, 0x, 0x0)
> >> recvmmsg(0x, &(0x7f003000)=[{{0x0, 0x0,
> >> &(0x7f002000)=[{&(0x7f00a000)=""/193, 0xc1},
> >> {&(0x7f007000-0x3d)=""/61, 0x3d}], 0x2,
> >> &(0x7f005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
> >> &(0x7f003000-0x10)={0x77359400, 0x0})
> >> bpf$PROG_LOAD(0x5, &(0x7f00)={0x1, 0x5,
> >> &(0x7f002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> >> 0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
> >> 0x0}}, &(0x7f004000-0xa)='syzkaller\x00', 0x3, 0xc3,
> >> &(0x7f386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> >> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)
> >>
> >> Note: you can convert any syzkaller program to equivalent C code using
> >> syz-prog2c utility that comes with syzkaller.
> >
> > I won't install go language into my environment for analyzing/reproducing 
> > your
> > reports. If syz-prog2c is provided as a CGI service (e.g. receive URL 
> > containing
> > raw.log and print the converted C program), I might try it.
> 
> 
> raw.log is not a _program_, it's hundreds of separate programs that
> were executed before the crash. It's also very compressed
> representation as compared to equivalent C programs. For example for
> this program:
> 
> mmap(&(0x7f00/0xfff000)=nil, 0xfff000, 0x3, 0x32,
> 0x, 0x0)
> r0 = socket$nl_generic(0x10, 0x3, 0x10)
> sendmsg$nl_generic(r0,
> &(0x7fb3e000-0x38)={&(0x7fd4a000-0xc)={0x10, 0x0, 0x0, 0x0},
> 0xc, &(0x7f007000)={&(0x7ff7c000-0x15c)={0x24, 0x1c, 0x109,
> 0x, 0x, {0x4, 0x0, 0x0},
> [@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
> 0x0, 0x0, 0x0}, 0x0)
> 
> you can get up to this amount of C code:
> https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt
> 
> that is, 700 lines of C source for 3 line program. So instead of a 1MB
> file that will be 100MB, and then it probably should be a gzip archive
> with hundreds of separate C files. There are people on this list
> complaining even about 200K of attachment

Re: INFO: task hung in filemap_fault

2018-01-15 Thread Dmitry Vyukov
On Mon, Jan 8, 2018 at 11:48 AM, Tetsuo Handa
 wrote:
> Dmitry Vyukov wrote:
>> >> Hi Tetsuo,
>> >>
>> >> syzbot always re-runs the same workload on a new machine. If it
>> >> manages to reproduce the problem, it provides a reproducer. In this
>> >> case it didn't.
>> >
>> > Even if it did not manage to reproduce the problem, showing raw.log in
>> > C format is helpful for me. For example,
>> >
>> >   ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
>> >
>> > is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
>> > If the message were
>> >
>> >   ioctl(r3, 0x4c00, r1)
>> >
>> > more people will be able to read what the program tried to do.
>> > There are many operations done on loop devices, but are too hard
>> > for me to pick up only loop related actions.
>>
>>
>> Hi Tetsuo,
>>
>> The main purpose of this format is different, this is a complete
>> representation of programs that allows replaying them using syzkaller
>> tools.
>
> What is ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1) ?
> 0x4c00 is LOOP_SET_FD. Why LOOP_CHANGE_FD is there?


In short, it specifies exact discrimination of the syscall which
affects parsing of the rest of the arguments. For some syscalls
(ioctl/setsockopt/sendmsg) kernel has hundreds of different
discriminations with radically different arguments.
Now if you are asking why the discrimination is LOOP_CHANGE_FD, but
the actual command is LOOP_SET_FD, that's because this is a fuzzer,
it's sole purpose is to mess things in unexpected ways.


>>We can't simply drop info from there. Do you propose to add
>> another attached file that contains the same info in a different
>> format? What is the exact format you are proposing?
>
> Plain C program which can be compiled without installing additional
> program/library packages (except those needed for building kernels).
>
>> Is it just
>> dropping the syscall name part after $ sign? Note that it's still not
>> C, more complex syscall generally look as follows:
>>
>> perf_event_open(&(0x7fb5a000)={0x42, 0x78, 0x1e2, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> @perf_bp={&(0x7f00)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0}, 0x0, 0x0, 0x, 0x0)
>> recvmmsg(0x, &(0x7f003000)=[{{0x0, 0x0,
>> &(0x7f002000)=[{&(0x7f00a000)=""/193, 0xc1},
>> {&(0x7f007000-0x3d)=""/61, 0x3d}], 0x2,
>> &(0x7f005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
>> &(0x7f003000-0x10)={0x77359400, 0x0})
>> bpf$PROG_LOAD(0x5, &(0x7f00)={0x1, 0x5,
>> &(0x7f002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
>> 0x0}}, &(0x7f004000-0xa)='syzkaller\x00', 0x3, 0xc3,
>> &(0x7f386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)
>>
>> Note: you can convert any syzkaller program to equivalent C code using
>> syz-prog2c utility that comes with syzkaller.
>
> I won't install go language into my environment for analyzing/reproducing your
> reports. If syz-prog2c is provided as a CGI service (e.g. receive URL 
> containing
> raw.log and print the converted C program), I might try it.


raw.log is not a _program_, it's hundreds of separate programs that
were executed before the crash. It's also very compressed
representation as compared to equivalent C programs. For example for
this program:

mmap(&(0x7f00/0xfff000)=nil, 0xfff000, 0x3, 0x32,
0x, 0x0)
r0 = socket$nl_generic(0x10, 0x3, 0x10)
sendmsg$nl_generic(r0,
&(0x7fb3e000-0x38)={&(0x7fd4a000-0xc)={0x10, 0x0, 0x0, 0x0},
0xc, &(0x7f007000)={&(0x7ff7c000-0x15c)={0x24, 0x1c, 0x109,
0x, 0x, {0x4, 0x0, 0x0},
[@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
0x0, 0x0, 0x0}, 0x0)

you can get up to this amount of C code:
https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt

that is, 700 lines of C source for 3 line program. So instead of a 1MB
file that will be 100MB, and then it probably should be a gzip archive
with hundreds of separate C files. There are people on this list
complaining even about 200K of attachments. I don't see that this will
be better and well accepted.


Re: INFO: task hung in filemap_fault

2018-01-08 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> >> Hi Tetsuo,
> >>
> >> syzbot always re-runs the same workload on a new machine. If it
> >> manages to reproduce the problem, it provides a reproducer. In this
> >> case it didn't.
> >
> > Even if it did not manage to reproduce the problem, showing raw.log in
> > C format is helpful for me. For example,
> >
> >   ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
> >
> > is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
> > If the message were
> >
> >   ioctl(r3, 0x4c00, r1)
> >
> > more people will be able to read what the program tried to do.
> > There are many operations done on loop devices, but are too hard
> > for me to pick up only loop related actions.
> 
> 
> Hi Tetsuo,
> 
> The main purpose of this format is different, this is a complete
> representation of programs that allows replaying them using syzkaller
> tools.

What is ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1) ?
0x4c00 is LOOP_SET_FD. Why LOOP_CHANGE_FD is there?

>We can't simply drop info from there. Do you propose to add
> another attached file that contains the same info in a different
> format? What is the exact format you are proposing?

Plain C program which can be compiled without installing additional
program/library packages (except those needed for building kernels).

> Is it just
> dropping the syscall name part after $ sign? Note that it's still not
> C, more complex syscall generally look as follows:
> 
> perf_event_open(&(0x7fb5a000)={0x42, 0x78, 0x1e2, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> @perf_bp={&(0x7f00)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0}, 0x0, 0x0, 0x, 0x0)
> recvmmsg(0x, &(0x7f003000)=[{{0x0, 0x0,
> &(0x7f002000)=[{&(0x7f00a000)=""/193, 0xc1},
> {&(0x7f007000-0x3d)=""/61, 0x3d}], 0x2,
> &(0x7f005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
> &(0x7f003000-0x10)={0x77359400, 0x0})
> bpf$PROG_LOAD(0x5, &(0x7f00)={0x1, 0x5,
> &(0x7f002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
> 0x0}}, &(0x7f004000-0xa)='syzkaller\x00', 0x3, 0xc3,
> &(0x7f386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)
> 
> Note: you can convert any syzkaller program to equivalent C code using
> syz-prog2c utility that comes with syzkaller.

I won't install go language into my environment for analyzing/reproducing your
reports. If syz-prog2c is provided as a CGI service (e.g. receive URL containing
raw.log and print the converted C program), I might try it.


Re: INFO: task hung in filemap_fault

2018-01-08 Thread Dmitry Vyukov
On Mon, Jan 1, 2018 at 4:27 PM, Tetsuo Handa
 wrote:
> I suggest syzbot to try linux.git before reporting bugs in linux-next.git.
> You know there are many duplicates caused by an invalid free in pcrypt.
> Soft lockups in ioctl(LOOP_SET_FD) are
>
> /* Avoid recursion */
> f = file;
> while (is_loop_device(f)) {
> struct loop_device *l;
>
> if (f->f_mapping->host->i_bdev == bdev)
> goto out_putf;
>
> l = f->f_mapping->host->i_bdev->bd_disk->private_data;
> if (l->lo_state == Lo_unbound) {
> error = -EINVAL;
> goto out_putf;
> }
> f = l->lo_backing_file;
> }
>
> loop which means that something (maybe memory corruption) is forming circular
> chain, and there seems to be some encryption related parameters/values in
> raw.log file. It is nice to retest a kernel without encryption related things
> and/or a kernel without known encryption related bugs.


Hi Tetsuo,

Let's forget about the single crypto bug. We can't build the system
that handles hundreds of bugs around that single bug which is fixed at
this point. What is the general improvement you are proposing?

Note that some bugs are only in linux.git, some are only in
linux-next.git, some are only in net, kvm, etc, or maybe in some
combination of these. And we generally don't know where a bug is
present and where it is not. We can try to do some assumption _if_ the
bug has a reproducer, but even then most kernel bugs are due to races
and can't be reproduced with 100% probability, or it can't be just
that the same bug can be reproduced on a different tree but requires a
slightly different reproducer. So any such assumptions won't be 100%
reliable, and any flaw in information syzbot provides usually provokes
lots of very negative reaction from kernel developers.


Re: INFO: task hung in filemap_fault

2018-01-08 Thread Dmitry Vyukov
On Mon, Jan 1, 2018 at 4:27 PM, Tetsuo Handa
 wrote:
> Dmitry Vyukov wrote:
>> On Mon, Dec 18, 2017 at 3:52 PM, Tetsuo Handa
>>  wrote:
>> > On 2017/12/18 17:43, syzbot wrote:
>> >> Hello,
>> >>
>> >> syzkaller hit the following crash on 
>> >> 6084b576dca2e898f5c101baef151f7bfdbb606d
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> >> compiler: gcc (GCC) 7.1.1 20170620
>> >> .config is attached
>> >> Raw console output is attached.
>> >>
>> >> Unfortunately, I don't have any reproducer for this bug yet.
>> >>
>> >
>> > This log has a lot of mmap() but also has Android's binder messages.
>> >
>> > r9 = syz_open_dev$binder(&(0x7f00)='/dev/binder#\x00', 0x0, 0x800)
>> >
>> > [   49.200735] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 
>> > 2 ret -22
>> > [   49.221514] binder: 9749:9755 Acquire 1 refcount change on invalid ref 
>> > 4 ret -22
>> > [   49.233325] binder: 9749:9755 Acquire 1 refcount change on invalid ref 
>> > 0 ret -22
>> > [   49.241979] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
>> > failed -1
>> > [   49.256949] binder: 9749:9755 unknown command 0
>> > [   49.262470] binder: 9749:9755 ioctl c0306201 2fd0 returned -22
>> > [   49.293365] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 
>> > 2 ret -22
>> > [   49.301297] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
>> > failed -1
>> > [   49.314146] binder: 9749:9755 Acquire 1 refcount change on invalid ref 
>> > 4 ret -22
>> > [   49.322732] binder: 9749:9755 Acquire 1 refcount change on invalid ref 
>> > 0 ret -22
>> > [   49.332063] binder: 9749:9755 Release 1 refcount change on invalid ref 
>> > 1 ret -22
>> > [   49.340796] binder: 9749:9755 Acquire 1 refcount change on invalid ref 
>> > 2 ret -22
>> > [   49.349457] binder: 9749:9755 BC_DEAD_BINDER_DONE 0001 not 
>> > found
>> > [   49.349462] binder: 9749:9755 BC_DEAD_BINDER_DONE  not 
>> > found
>> >
>> > [  246.752088] INFO: task syz-executor7:10280 blocked for more than 120 
>> > seconds.
>> >
>> > Anything that hung in 126.75 >= uptime > 6.75 can be reported at uptime = 
>> > 246.75, can't it?
>> > khungtaskd warning with 120 seconds check interval can be delayed for up 
>> > to 240 seconds.
>> >
>> > Is it possible to reproduce this problem by running the same program?
>>
>>
>> Hi Tetsuo,
>>
>> syzbot always re-runs the same workload on a new machine. If it
>> manages to reproduce the problem, it provides a reproducer. In this
>> case it didn't.
>
> Even if it did not manage to reproduce the problem, showing raw.log in
> C format is helpful for me. For example,
>
>   ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
>
> is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
> If the message were
>
>   ioctl(r3, 0x4c00, r1)
>
> more people will be able to read what the program tried to do.
> There are many operations done on loop devices, but are too hard
> for me to pick up only loop related actions.


Hi Tetsuo,

The main purpose of this format is different, this is a complete
representation of programs that allows replaying them using syzkaller
tools. We can't simply drop info from there. Do you propose to add
another attached file that contains the same info in a different
format? What is the exact format you are proposing? Is it just
dropping the syscall name part after $ sign? Note that it's still not
C, more complex syscall generally look as follows:

perf_event_open(&(0x7fb5a000)={0x42, 0x78, 0x1e2, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
@perf_bp={&(0x7f00)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0}, 0x0, 0x0, 0x, 0x0)
recvmmsg(0x, &(0x7f003000)=[{{0x0, 0x0,
&(0x7f002000)=[{&(0x7f00a000)=""/193, 0xc1},
{&(0x7f007000-0x3d)=""/61, 0x3d}], 0x2,
&(0x7f005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
&(0x7f003000-0x10)={0x77359400, 0x0})
bpf$PROG_LOAD(0x5, &(0x7f00)={0x1, 0x5,
&(0x7f002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
0x0}}, &(0x7f004000-0xa)='syzkaller\x00', 0x3, 0xc3,
&(0x7f386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)

Note: you can convert any syzkaller program to equivalent C code using
syz-prog2c utility that comes with syzkaller.


Re: INFO: task hung in filemap_fault

2018-01-01 Thread Tetsuo Handa
Dmitry Vyukov wrote:
> On Mon, Dec 18, 2017 at 3:52 PM, Tetsuo Handa
>  wrote:
> > On 2017/12/18 17:43, syzbot wrote:
> >> Hello,
> >>
> >> syzkaller hit the following crash on 
> >> 6084b576dca2e898f5c101baef151f7bfdbb606d
> >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> >> compiler: gcc (GCC) 7.1.1 20170620
> >> .config is attached
> >> Raw console output is attached.
> >>
> >> Unfortunately, I don't have any reproducer for this bug yet.
> >>
> >
> > This log has a lot of mmap() but also has Android's binder messages.
> >
> > r9 = syz_open_dev$binder(&(0x7f00)='/dev/binder#\x00', 0x0, 0x800)
> >
> > [   49.200735] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 
> > ret -22
> > [   49.221514] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 
> > ret -22
> > [   49.233325] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 
> > ret -22
> > [   49.241979] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
> > failed -1
> > [   49.256949] binder: 9749:9755 unknown command 0
> > [   49.262470] binder: 9749:9755 ioctl c0306201 2fd0 returned -22
> > [   49.293365] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 
> > ret -22
> > [   49.301297] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
> > failed -1
> > [   49.314146] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 
> > ret -22
> > [   49.322732] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 
> > ret -22
> > [   49.332063] binder: 9749:9755 Release 1 refcount change on invalid ref 1 
> > ret -22
> > [   49.340796] binder: 9749:9755 Acquire 1 refcount change on invalid ref 2 
> > ret -22
> > [   49.349457] binder: 9749:9755 BC_DEAD_BINDER_DONE 0001 not 
> > found
> > [   49.349462] binder: 9749:9755 BC_DEAD_BINDER_DONE  not 
> > found
> >
> > [  246.752088] INFO: task syz-executor7:10280 blocked for more than 120 
> > seconds.
> >
> > Anything that hung in 126.75 >= uptime > 6.75 can be reported at uptime = 
> > 246.75, can't it?
> > khungtaskd warning with 120 seconds check interval can be delayed for up to 
> > 240 seconds.
> >
> > Is it possible to reproduce this problem by running the same program?
> 
> 
> Hi Tetsuo,
> 
> syzbot always re-runs the same workload on a new machine. If it
> manages to reproduce the problem, it provides a reproducer. In this
> case it didn't.

Even if it did not manage to reproduce the problem, showing raw.log in
C format is helpful for me. For example,

  ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)

is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
If the message were

  ioctl(r3, 0x4c00, r1)

more people will be able to read what the program tried to do.
There are many operations done on loop devices, but are too hard
for me to pick up only loop related actions.

> 
> The program that triggered this is this one (number 7 matches task
> syz-executor7):
> 
> 2017/12/18 06:16:18 executing program 7:
> 
> It has only 2 mmaps. The first one is pretty standard, but the second
> one mmaps loop device:
> 
> r7 = syz_open_dev$loop(&(0x7fe58000-0xb)='/dev/loop#\x00', 0x0, 0x4102)
> mmap(&(0x7fe5b000/0x1000)=nil, 0x1000, 0x3, 0x2011, r7, 0x0)
> 
> We have a bunch of hangs around /dev/loop:
> 
> https://groups.google.com/forum/#!msg/syzkaller-bugs/qzz2v1M93O4/DjHEEvq5AQAJ
> https://groups.google.com/forum/#!msg/syzkaller-bugs/jy-bXYbRh7c/a1dQYyD9CgAJ
> https://groups.google.com/forum/#!msg/syzkaller-bugs/vjGYuMMspAU/K3oOF_eHCgAJ
> https://groups.google.com/forum/#!msg/syzkaller-bugs/BwpEc6q6gFY/5kHDMGElAgAJ
> 
> Probably related to these ones.
> +loop maintainers.
> 

I suggest syzbot to try linux.git before reporting bugs in linux-next.git.
You know there are many duplicates caused by an invalid free in pcrypt.
Soft lockups in ioctl(LOOP_SET_FD) are

/* Avoid recursion */
f = file;
while (is_loop_device(f)) {
struct loop_device *l;

if (f->f_mapping->host->i_bdev == bdev)
goto out_putf;

l = f->f_mapping->host->i_bdev->bd_disk->private_data;
if (l->lo_state == Lo_unbound) {
error = -EINVAL;
goto out_putf;
}
f = l->lo_backing_file;
}

loop which means that something (maybe memory corruption) is forming circular
chain, and there seems to be some encryption related parameters/values in
raw.log file. It is nice to retest a kernel without encryption related things
and/or a kernel without known encryption related bugs.


Re: INFO: task hung in filemap_fault

2017-12-28 Thread Dmitry Vyukov
On Mon, Dec 18, 2017 at 3:52 PM, Tetsuo Handa
 wrote:
> On 2017/12/18 17:43, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> Unfortunately, I don't have any reproducer for this bug yet.
>>
>
> This log has a lot of mmap() but also has Android's binder messages.
>
> r9 = syz_open_dev$binder(&(0x7f00)='/dev/binder#\x00', 0x0, 0x800)
>
> [   49.200735] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 
> ret -22
> [   49.221514] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 
> ret -22
> [   49.233325] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 
> ret -22
> [   49.241979] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
> failed -1
> [   49.256949] binder: 9749:9755 unknown command 0
> [   49.262470] binder: 9749:9755 ioctl c0306201 2fd0 returned -22
> [   49.293365] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 
> ret -22
> [   49.301297] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
> failed -1
> [   49.314146] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 
> ret -22
> [   49.322732] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 
> ret -22
> [   49.332063] binder: 9749:9755 Release 1 refcount change on invalid ref 1 
> ret -22
> [   49.340796] binder: 9749:9755 Acquire 1 refcount change on invalid ref 2 
> ret -22
> [   49.349457] binder: 9749:9755 BC_DEAD_BINDER_DONE 0001 not 
> found
> [   49.349462] binder: 9749:9755 BC_DEAD_BINDER_DONE  not 
> found
>
> [  246.752088] INFO: task syz-executor7:10280 blocked for more than 120 
> seconds.
>
> Anything that hung after uptime > 46.75 can be reported at uptime = 246.75, 
> can't it?
>
> Is it possible to reproduce this problem by running the same program?


Hi Tetsuo,

syzbot always re-runs the same workload on a new machine. If it
manages to reproduce the problem, it provides a reproducer. In this
case it didn't.

The program that triggered this is this one (number 7 matches task
syz-executor7):

2017/12/18 06:16:18 executing program 7:

It has only 2 mmaps. The first one is pretty standard, but the second
one mmaps loop device:

r7 = syz_open_dev$loop(&(0x7fe58000-0xb)='/dev/loop#\x00', 0x0, 0x4102)
mmap(&(0x7fe5b000/0x1000)=nil, 0x1000, 0x3, 0x2011, r7, 0x0)

We have a bunch of hangs around /dev/loop:

https://groups.google.com/forum/#!msg/syzkaller-bugs/qzz2v1M93O4/DjHEEvq5AQAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/jy-bXYbRh7c/a1dQYyD9CgAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/vjGYuMMspAU/K3oOF_eHCgAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/BwpEc6q6gFY/5kHDMGElAgAJ

Probably related to these ones.
+loop maintainers.



>> INFO: task syz-executor7:10280 blocked for more than 120 seconds.
>>   Not tainted 4.15.0-rc3-next-20171214+ #67
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> syz-executor7   D0 10280   3310 0x0004
>> Call Trace:
>>  context_switch kernel/sched/core.c:2800 [inline]
>>  __schedule+0x30b/0xaf0 kernel/sched/core.c:3376
>>  schedule+0x2e/0x90 kernel/sched/core.c:3435
>>  io_schedule+0x11/0x40 kernel/sched/core.c:5043
>>  wait_on_page_bit_common mm/filemap.c:1099 [inline]
>>  wait_on_page_bit mm/filemap.c:1132 [inline]
>>  wait_on_page_locked include/linux/pagemap.h:530 [inline]
>>  __lock_page_or_retry+0x391/0x3e0 mm/filemap.c:1310
>>  lock_page_or_retry include/linux/pagemap.h:510 [inline]
>>  filemap_fault+0x61c/0xa70 mm/filemap.c:2532
>>  __do_fault+0x23/0xa4 mm/memory.c:3206
>>  do_read_fault mm/memory.c:3616 [inline]
>>  do_fault mm/memory.c:3716 [inline]
>>  handle_pte_fault mm/memory.c:3947 [inline]
>>  __handle_mm_fault+0x10b5/0x1930 mm/memory.c:4071
>>  handle_mm_fault+0x215/0x450 mm/memory.c:4108
>>  faultin_page mm/gup.c:502 [inline]
>>  __get_user_pages+0x1ff/0x980 mm/gup.c:699
>>  populate_vma_page_range+0xa1/0xb0 mm/gup.c:1200
>>  __mm_populate+0xcc/0x190 mm/gup.c:1250
>>  mm_populate include/linux/mm.h:2233 [inline]
>>  vm_mmap_pgoff+0x103/0x110 mm/util.c:338
>>  SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
>>  SyS_mmap_pgoff+0x215/0x2c0 mm/mmap.c:1491
>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
>>  entry_SYSCALL_64_fastpath+0x1f/0x96
>> RIP: 0033:0x452a09
>> RSP: 002b:7efce66dac58 EFLAGS: 0212 ORIG_RAX: 0009
>> RAX: ffda RBX: 0071bea0 RCX: 00452a09
>> RDX: 0003 RSI: 1000 RDI: 20e5b000
>> RBP: 0033 R08: 0016 R09: 
>> R10: 2011 R11: 0212 R12: 006ed568
>> R13:  R14: 7efce66db6d4 R15: 
>
> --
> You received th

Re: INFO: task hung in filemap_fault

2017-12-18 Thread Tetsuo Handa
On 2017/12/18 23:52, Tetsuo Handa wrote:
> On 2017/12/18 17:43, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> Unfortunately, I don't have any reproducer for this bug yet.
>>
> 
> This log has a lot of mmap() but also has Android's binder messages.
> 
> r9 = syz_open_dev$binder(&(0x7f00)='/dev/binder#\x00', 0x0, 0x800)
> 
> [   49.200735] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 
> ret -22
> [   49.221514] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 
> ret -22
> [   49.233325] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 
> ret -22
> [   49.241979] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
> failed -1
> [   49.256949] binder: 9749:9755 unknown command 0
> [   49.262470] binder: 9749:9755 ioctl c0306201 2fd0 returned -22
> [   49.293365] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 
> ret -22
> [   49.301297] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags 
> failed -1
> [   49.314146] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 
> ret -22
> [   49.322732] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 
> ret -22
> [   49.332063] binder: 9749:9755 Release 1 refcount change on invalid ref 1 
> ret -22
> [   49.340796] binder: 9749:9755 Acquire 1 refcount change on invalid ref 2 
> ret -22
> [   49.349457] binder: 9749:9755 BC_DEAD_BINDER_DONE 0001 not 
> found
> [   49.349462] binder: 9749:9755 BC_DEAD_BINDER_DONE  not 
> found
> 
> [  246.752088] INFO: task syz-executor7:10280 blocked for more than 120 
> seconds.
> 
> Anything that hung after uptime > 46.75 can be reported at uptime = 246.75, 
> can't it?

Typo. I wanted to say 126.75 >= uptime > 6.75.
khungtaskd warning with 120 seconds check interval can be delayed for up to 240 
seconds.

> 
> Is it possible to reproduce this problem by running the same program?


Re: INFO: task hung in filemap_fault

2017-12-18 Thread Tetsuo Handa
On 2017/12/18 17:43, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 

This log has a lot of mmap() but also has Android's binder messages.

r9 = syz_open_dev$binder(&(0x7f00)='/dev/binder#\x00', 0x0, 0x800)

[   49.200735] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 ret 
-22
[   49.221514] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 ret 
-22
[   49.233325] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 ret 
-22
[   49.241979] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags failed 
-1
[   49.256949] binder: 9749:9755 unknown command 0
[   49.262470] binder: 9749:9755 ioctl c0306201 2fd0 returned -22
[   49.293365] binder: 9749:9755 IncRefs 0 refcount change on invalid ref 2 ret 
-22
[   49.301297] binder: binder_mmap: 9749 205a3000-205a7000 bad vm_flags failed 
-1
[   49.314146] binder: 9749:9755 Acquire 1 refcount change on invalid ref 4 ret 
-22
[   49.322732] binder: 9749:9755 Acquire 1 refcount change on invalid ref 0 ret 
-22
[   49.332063] binder: 9749:9755 Release 1 refcount change on invalid ref 1 ret 
-22
[   49.340796] binder: 9749:9755 Acquire 1 refcount change on invalid ref 2 ret 
-22
[   49.349457] binder: 9749:9755 BC_DEAD_BINDER_DONE 0001 not found
[   49.349462] binder: 9749:9755 BC_DEAD_BINDER_DONE  not found

[  246.752088] INFO: task syz-executor7:10280 blocked for more than 120 seconds.

Anything that hung after uptime > 46.75 can be reported at uptime = 246.75, 
can't it?

Is it possible to reproduce this problem by running the same program?

> 
> INFO: task syz-executor7:10280 blocked for more than 120 seconds.
>   Not tainted 4.15.0-rc3-next-20171214+ #67
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor7   D    0 10280   3310 0x0004
> Call Trace:
>  context_switch kernel/sched/core.c:2800 [inline]
>  __schedule+0x30b/0xaf0 kernel/sched/core.c:3376
>  schedule+0x2e/0x90 kernel/sched/core.c:3435
>  io_schedule+0x11/0x40 kernel/sched/core.c:5043
>  wait_on_page_bit_common mm/filemap.c:1099 [inline]
>  wait_on_page_bit mm/filemap.c:1132 [inline]
>  wait_on_page_locked include/linux/pagemap.h:530 [inline]
>  __lock_page_or_retry+0x391/0x3e0 mm/filemap.c:1310
>  lock_page_or_retry include/linux/pagemap.h:510 [inline]
>  filemap_fault+0x61c/0xa70 mm/filemap.c:2532
>  __do_fault+0x23/0xa4 mm/memory.c:3206
>  do_read_fault mm/memory.c:3616 [inline]
>  do_fault mm/memory.c:3716 [inline]
>  handle_pte_fault mm/memory.c:3947 [inline]
>  __handle_mm_fault+0x10b5/0x1930 mm/memory.c:4071
>  handle_mm_fault+0x215/0x450 mm/memory.c:4108
>  faultin_page mm/gup.c:502 [inline]
>  __get_user_pages+0x1ff/0x980 mm/gup.c:699
>  populate_vma_page_range+0xa1/0xb0 mm/gup.c:1200
>  __mm_populate+0xcc/0x190 mm/gup.c:1250
>  mm_populate include/linux/mm.h:2233 [inline]
>  vm_mmap_pgoff+0x103/0x110 mm/util.c:338
>  SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
>  SyS_mmap_pgoff+0x215/0x2c0 mm/mmap.c:1491
>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x452a09
> RSP: 002b:7efce66dac58 EFLAGS: 0212 ORIG_RAX: 0009
> RAX: ffda RBX: 0071bea0 RCX: 00452a09
> RDX: 0003 RSI: 1000 RDI: 20e5b000
> RBP: 0033 R08: 0016 R09: 
> R10: 2011 R11: 0212 R12: 006ed568
> R13:  R14: 7efce66db6d4 R15: