Hi, Tao On Thu, Jun 27, 2024 at 12:08 PM Tao Liu <l...@redhat.com> wrote:
> Hi Lianbo, > > Thanks for the patch. > > On Fri, Jun 14, 2024 at 3:00 PM Lianbo Jiang <liji...@redhat.com> wrote: > > > > Sometimes, in a production environment, there are still some vmcores > > that are incomplete, such as partial header or the data is corrupted. > > When crash tool attempts to parse such vmcores, it may fail as below: > > > > $ ./crash --osrelease vmcore > > Bus error (core dumped) > > > > or > > > > $ crash vmlinux vmcore > > ... > > Bus error (core dumped) > > $ > > > > The gdb sees that crash tool reads out a null bitmap from the header in > > this vmcore, when executing memcpy(), emits a SIGBUS error as below: > > > > $ gdb /home/lijiang/src/crash/crash /tmp/core.126301 > > Core was generated by `./crash --osrelease > /home/lijiang/src/39317/vmcore'. > > Program terminated with signal SIGBUS, Bus error. > > #0 __memcpy_evex_unaligned_erms () at > ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831 > > 831 LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5), > %VMM(6), %VMM(7)) > > (gdb) bt > > #0 __memcpy_evex_unaligned_erms () at > ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831 > > #1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f > "/home/lijiang/src/39317/vmcore") at diskdump.c:820 > > #2 0x0000000000651cf3 in is_diskdump (file=0x7ffc59ddff5f > "/home/lijiang/src/39317/vmcore") at diskdump.c:1042 > > #3 0x0000000000502ac9 in get_osrelease (dumpfile=0x7ffc59ddff5f > "/home/lijiang/src/39317/vmcore") at main.c:1938 > > #4 0x00000000004fb2e8 in main (argc=3, argv=0x7ffc59dde3a8) at > main.c:271 > > (gdb) frame 1 > > #1 0x0000000000651096 in read_dump_header (file=0x7ffc59ddff5f > "/home/lijiang/src/39317/vmcore") at diskdump.c:820 > > 820 memcpy(dd->dumpable_bitmap, dd->bitmap + > bitmap_len/2, > > (gdb) p dd->dumpable_bitmap > > $1 = 0x7f8e89800010 "" > > (gdb) p dd->bitmap > > $2 = 0x7f8e87e09000 "" > > (gdb) p dd->bitmap + bitmap_len/2 > > $3 = 0x7f8e88a17000 "" > > (gdb) p *(char*)(dd->bitmap+bitmap_len/2) > > $4 = 0 '\000' > > (gdb) p bitmap_len/2 > > $5 = 12640256 > > > > Let's add a sanity check for such cases to avoid causing a SIGBUS error. > > > I didn't really understand the reason why the SIGBUS error occurs. Is > it because bitmap_len/2 is too large(12M), which exceeded the > dd->bitmap memory range? > I did not see any limitations on the bitmap memory range. Looks like an unaligned memory address was accessed, please see here: #0 __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:831 831 LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7)) Maybe the vmcore is corrupted, anyway I have never seen it before this issue. > Why *(char*)(dd->bitmap+bitmap_len/2) == '\0' will be considered as an > error, do we expect a value like "0xff" here? > No. If it is a null string, no need to copy it(or it may be an exceptional string because of corrupted data). > > I guess what we want here is to handle SIGBUS errors nicely right? why > don't we add a SIGBUS handler and process it directly? > This may prevent a core dump file from being generated. And users can easily report it with a core dump file, sometimes It's hard to reproduce for us. Thanks Lianbo > Thanks, > Tao Liu > > > With the patch: > > $ crash -s vmlinux vmcore > > crash: vmcore: not a supported file format > > ... > > Enter "crash -h" for details. > > > > $ crash --osrelease vmcore > > unknown > > > > Reported-by: Buland Kumar Singh <bsi...@redhat.com> > > Signed-off-by: Lianbo Jiang <liji...@redhat.com> > > --- > > diskdump.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/diskdump.c b/diskdump.c > > index 1f7118cacfc6..c31eab01aa05 100644 > > --- a/diskdump.c > > +++ b/diskdump.c > > @@ -814,10 +814,12 @@ restart: > > madvise(dd->bitmap, bitmap_len, MADV_WILLNEED); > > } > > > > - if (dump_is_partial(header)) > > + if (dump_is_partial(header)) { > > + if (*(char*)(dd->bitmap + bitmap_len/2) == '\0') > > + goto err; > > memcpy(dd->dumpable_bitmap, dd->bitmap + bitmap_len/2, > > bitmap_len/2); > > - else > > + } else > > memcpy(dd->dumpable_bitmap, dd->bitmap, bitmap_len); > > > > dd->data_offset > > -- > > 2.45.1 > > -- > > Crash-utility mailing list -- devel@lists.crash-utility.osci.io > > To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ > > Contribution Guidelines: https://github.com/crash-utility/crash/wiki > >
-- Crash-utility mailing list -- devel@lists.crash-utility.osci.io To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ Contribution Guidelines: https://github.com/crash-utility/crash/wiki