Hi, Tao
On Thu, Jun 27, 2024 at 12:08 PM Tao Liu <l...@redhat.com> wrote:

> Hi Lianbo,
>
> Thanks for the patch.
>
> On Fri, Jun 14, 2024 at 3:00 PM Lianbo Jiang <liji...@redhat.com> wrote:
> >
> > Sometimes, in a production environment, there are still some vmcores
> > that are incomplete, such as partial header or the data is corrupted.
> > When crash tool attempts to parse such vmcores, it may fail as below:
> >
> >   $ ./crash --osrelease vmcore
> >   Bus error (core dumped)
> >
> > or
> >
> >   $ crash vmlinux vmcore
> >   ...
> >   Bus error (core dumped)
> >  $
> >
> > The gdb sees that crash tool reads out a null bitmap from the header in
> > this vmcore, when executing memcpy(), emits a SIGBUS error as below:
> >
> >   $ gdb /home/lijiang/src/crash/crash /tmp/core.126301
> >   Core was generated by `./crash --osrelease
> /home/lijiang/src/39317/vmcore'.
> >   Program terminated with signal SIGBUS, Bus error.
> >   #0  __memcpy_evex_unaligned_erms () at
> ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
> >   831             LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5),
> %VMM(6), %VMM(7))
> >   (gdb) bt
> >   #0  __memcpy_evex_unaligned_erms () at
> ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
> >   #1  0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at diskdump.c:820
> >   #2  0x0000000000651cf3 in is_diskdump (file=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at diskdump.c:1042
> >   #3  0x0000000000502ac9 in get_osrelease (dumpfile=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at main.c:1938
> >   #4  0x00000000004fb2e8 in main (argc=3, argv=0x7ffc59dde3a8) at
> main.c:271
> >   (gdb) frame 1
> >   #1  0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f
> "/home/lijiang/src/39317/vmcore") at diskdump.c:820
> >   820                   memcpy(dd->dumpable_bitmap, dd->bitmap +
> bitmap_len/2,
> >   (gdb) p dd->dumpable_bitmap
> >   $1 = 0x7f8e89800010 ""
> >   (gdb) p dd->bitmap
> >   $2 = 0x7f8e87e09000 ""
> >   (gdb) p dd->bitmap + bitmap_len/2
> >   $3 = 0x7f8e88a17000 ""
> >   (gdb) p *(char*)(dd->bitmap+bitmap_len/2)
> >   $4 = 0 '\000'
> >   (gdb) p bitmap_len/2
> >   $5 = 12640256
> >
> > Let's add a sanity check for such cases to avoid causing a SIGBUS error.
> >
> I didn't really understand the reason why the SIGBUS error occurs. Is
> it because bitmap_len/2 is too large(12M), which exceeded the
> dd->bitmap memory range?
>

I did not see any limitations on the bitmap memory range.

Looks like an unaligned memory address was accessed, please see here:

#0  __memcpy_evex_unaligned_erms () at
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831
  831             LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5),
%VMM(6), %VMM(7))

Maybe the vmcore is corrupted, anyway I have never seen it before this
issue.


> Why *(char*)(dd->bitmap+bitmap_len/2) == '\0' will be considered as an
> error, do we expect a value like "0xff" here?
>

No.
If it is a null string, no need to copy it(or it may be an exceptional
string because of corrupted data).


>
> I guess what we want here is to handle SIGBUS errors nicely right? why
> don't we add a SIGBUS handler and process it directly?
>

This may prevent a core dump file from being generated.
And users can easily report it with a core dump file, sometimes It's hard
to reproduce for us.

Thanks
Lianbo


> Thanks,
> Tao Liu
>
> > With the patch:
> >   $ crash -s vmlinux vmcore
> >   crash: vmcore: not a supported file format
> >   ...
> >   Enter "crash -h" for details.
> >
> >   $ crash --osrelease vmcore
> >   unknown
> >
> > Reported-by: Buland Kumar Singh <bsi...@redhat.com>
> > Signed-off-by: Lianbo Jiang <liji...@redhat.com>
> > ---
> >  diskdump.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/diskdump.c b/diskdump.c
> > index 1f7118cacfc6..c31eab01aa05 100644
> > --- a/diskdump.c
> > +++ b/diskdump.c
> > @@ -814,10 +814,12 @@ restart:
> >                 madvise(dd->bitmap, bitmap_len, MADV_WILLNEED);
> >         }
> >
> > -       if (dump_is_partial(header))
> > +       if (dump_is_partial(header)) {
> > +               if (*(char*)(dd->bitmap + bitmap_len/2) == '\0')
> > +                       goto err;
> >                 memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2,
> >                        bitmap_len/2);
> > -       else
> > +       } else
> >                 memcpy(dd->dumpable_bitmap, dd->bitmap, bitmap_len);
> >
> >         dd->data_offset
> > --
> > 2.45.1
> > --
> > Crash-utility mailing list -- devel@lists.crash-utility.osci.io
> > To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
> > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> > Contribution Guidelines: https://github.com/crash-utility/crash/wiki
>
>
--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

Reply via email to