Hi Stephen,

On Fri, Apr 4, 2025 at 11:43 AM Stephen Brennan
<stephen.s.bren...@oracle.com> wrote:
>
> Stephen Brennan <stephen.s.bren...@oracle.com> writes:
> > Tao Liu <l...@redhat.com> writes:
> >> Hi Stephen,
> >>
> >> Thanks for reporting the issue and patch. Please check if I understood
> >> you correctly. The correct output you are expecting in your case is:
> >>
> >>     crash> sym ata_dummy_port_ops
> >>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
> >>     crash> mod -s libata
> >>          MODULE       NAME                      TEXT_BASE         SIZE
> >>  OBJECT FILE
> >>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
> >>     
> >> /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
> >>     crash> sym ata_dummy_port_ops
> >>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
> >>     crash> p/x &ata_dummy_port_ops
> >>     $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
> >> same as sym's output, right?
> >
> > Yes, that's correct.
> >
> >> If that is the case, then after applied your patch, the issue still
> >> exists on my machine:
> >>
> >> crash> sym fuse_miscdevice
> >> ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
> >> crash> mod -s fuse
> >>      MODULE       NAME                             TEXT_BASE
> >> SIZE  OBJECT FILE
> >> ffffffffc05f8dc0  fuse                          ffffffffc05da000
> >> 233472  
> >> /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> crash> sym fuse_miscdevice
> >> ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
> >> crash> p/x &fuse_miscdevice
> >> $1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.
> >
> > Indeed, I checked with fuse on Fedora (6.11.4-301.fc41.x86_64, so almost
> > the same version). And I see the same behavior with fuse_miscdevice.
> >
> > In your case and mine, the offset is 0x1d0c0, which could be a clue to
> > how this mismatch happens.
> >
> >> The .data section of fuse.ko.debug have non-zero address:
> >>
> >> $ readelf -S 
> >> /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> Section Headers:
> >>   [Nr] Name              Type             Address           Offset
> >>        Size              EntSize          Flags  Link  Info  Align
> >> ...
> >> [50] .data             NOBITS           0000000000000e20  00000100
> >>        000000000000080f  0000000000000000  WA       0     0     32
> >>
> >> Could you please re-check your patch for this, or is there something
> >> I'm missing?
> >
> > From what I can tell, the important difference here may be that the
> > symbol you selected is a static variable (which becomes a local ELF
> > symbol). Apparently, my patch only resolves the issue for *global
> > variables*, but not for local variables.
> >
> > $ eu-readelf -s /usr/lib/debug/lib/modules/$(uname 
> > -r)/kernel/fs/fuse/fuse.ko.debug | grep -P "fuse_(miscdevice|mutex)"
> >    85: 0000000000000100     80 OBJECT  LOCAL  DEFAULT       50 
> > fuse_miscdevice
> >  1041: 00000000000005c0     32 OBJECT  GLOBAL DEFAULT       50 fuse_mutex
> >
> > Here is the behavior of crash's master branch on these two symbols:
> >
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> > crash> mod -s fuse
> >      MODULE       NAME                                  TEXT_BASE         
> > SIZE  OBJECT FILE
> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   
> > 233472  
> > /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> > crash> p/x &fuse_miscdevice
> > $1 = 0xffffffffc02ddf20
> > crash> p/x &fuse_mutex
> > $2 = 0xffffffffc02fa680
> >
> > Both fuse_miscdevice, and fuse_mutex are mismatched (with different
> > offsets). And now, here's the behavior of crash with my patch:
> >
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> > crash> mod -s fuse
> >      MODULE       NAME                                  TEXT_BASE         
> > SIZE  OBJECT FILE
> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   
> > 233472  
> > /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> > crash> p/x &fuse_miscdevice
> > $1 = 0xffffffffc02ddf20
> > crash> p/x &fuse_mutex
> > $2 = 0xffffffffc02fb4a0
> >
> > Here the global fuse_mutex is correct, but the static/local
> > fuse_miscdevice is still incorrect. This indicates that there's more yet
> > to fix. (But at least my patch does part of the work!)
> >
> > I'm wondering whether this has anything to do with ELF relocations.
>
> Hello Tao,

Thanks for your detailed debug info, which is quite helpful! For your
No.1 patch which fixes the global variable relocation, it looks good.
However, for your No.2 patch in the attached file, I have some
questions.
>
> I've continued debugging and verified that the root cause for this was
> GDB performing relocations to the DWARF info incorrectly. At the bottom

1) Is this a gdb bug? If it does, then the proper way is to post No.2
patch to gdb mailing list and get it fixed there, then we can do gdb
patch backporting to crash. Frankly I haven't figured out whether it
is gdb or crash bug so far by myself...

2) I'm quoting part of the commit log in your No.2 patch has follows
(Please don't send patch as attachment next time, because I cannot
comment directly on it)

...
1. In default_symfile_offsets(), GDB detects the nonzero sh_addr field
   and fails to apply the user-supplied section offsets. The result is
   that later, in symfile_relocate_debug_section(), relocations are
   applied to the DWARF info using the wrong section addresses, which

Didn't we have corrected the .data section address in No.1 patch of
the .ko? Then why the "wrong section addresses"?

   results in invalid addresses for variables. Clearly, this has
   happened before, because crash has special-cased the "__ksymtab*"

Which line of code do you mean the special-cased __ksymtab? I didn't
find one in gdb/symfile.c.

   section names to avoid this condition. To resolve this, we simply
   drop the check for nonzero sh_addr altogether: the only ET_REL files
   we encounter should be kernel modules, so there's no real reason to
   be picky.

2. Even with that fixed, the user-supplied section addresses are
   clobbered by the addr_info_make_relative(), which subtracts out
   section offsets during its operation. To resolve this, undo the
   operation for ET_REL files where a section address was provided by
   the user (i.e. crash).
...

According to your description, it seems to me the gdb cannot handle
the nonzero sh_addr in the .ko case, but .ko isn't the only one whose
.data VMA is non-zero, see this case:

$ readelf -S /lib64/libc.so.6
...
[30] .data             PROGBITS         00000000001e8000  001e8000
       00000000000016c8  0000000000000000  WA       0     0     32
...

I don't know if gdb also fails in the userspace .so case? I mean, if
gdb can handle the user case with no problem, then probably we don't
need to hack gdb like this.

Thanks,
Tao Liu


> of this email I've attached my patch, which you can apply to the master
> branch to test out my proposed fix. The patch file also contains a
> longer explanation of what I believe the issue is.
>
> However, since it updates the gdb-16.2.patch file, it's quite difficult
> to understand. Thus, I'll also provide a diff here of the relevant GDB
> source file.
>
> I'm not certain how the GDB patches are normally updated & reviewed. I
> also recognize that this change is a bit messy, so I'd appreciate your
> feedback as well.
>
> Thank you,
> Stephen
>
> --- tmp/gdb-16.2/gdb/symfile.c  2025-04-03 15:38:26.093760270 -0700
> +++ gdb-16.2/gdb/symfile.c      2025-04-03 15:07:43.239691104 -0700
> @@ -341,8 +341,13 @@ place_section (bfd *abfd, asection *sect
>      return;
>
>    /* If the user specified an offset, honor it.  */
> -  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0)
> +  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0) {
> +    /* addr_info_make_relative() subtracts out the section VMA. But if the 
> user
> +     * specified an offset, they have already taken this into account. Add it
> +     * back in */
> +    offsets[gdb_bfd_section_index (abfd, sect)] += bfd_section_vma(sect);
>      return;
> +  }
>
>    /* Otherwise, let's try to find a place for the section.  */
>    start_addr = (lowest + align - 1) & -align;
> @@ -628,35 +633,8 @@ default_symfile_offsets (struct objfile
>    if ((bfd_get_file_flags (objfile->obfd.get ()) & (EXEC_P | DYNAMIC)) == 0)
>      {
>        bfd *abfd = objfile->obfd.get ();
> -      asection *cur_sec;
> +      asection *cur_sec = NULL;
>
> -      for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = 
> cur_sec->next)
> -       /* We do not expect this to happen; just skip this step if the
> -          relocatable file has a section with an assigned VMA.  */
> -        if (bfd_section_vma (cur_sec) != 0
> -           /*
> -            *  Kernel modules may have some non-zero VMAs, i.e., like the
> -            *  __ksymtab and __ksymtab_gpl sections in this example:
> -            *
> -            *    Section Headers:
> -            *      [Nr] Name              Type             Address           
> Offset
> -            *           Size              EntSize          Flags  Link  Info 
>  Align
> -            *      ...
> -            *      [ 8] __ksymtab         PROGBITS         0000000000000060  
> 0000ad90
> -            *           0000000000000010  0000000000000000   A       0     0 
>     16
> -            *      [ 9] .rela__ksymtab    RELA             0000000000000000  
> 0000ada0
> -            *           0000000000000030  0000000000000018          43     8 
>     8
> -            *      [10] __ksymtab_gpl     PROGBITS         0000000000000070  
> 0000add0
> -            *           00000000000001a0  0000000000000000   A       0     0 
>     16
> -            *      ...
> -            *
> -            *  but they should be treated as if they are NULL.
> -            */
> -            && strncmp (bfd_section_name (cur_sec), "__k", 3) != 0)
> -         break;
> -
> -      if (cur_sec == NULL)
> -       {
>           section_offsets &offsets = objfile->section_offsets;
>
>           /* Pick non-overlapping offsets for sections the user did not
> @@ -704,7 +682,6 @@ default_symfile_offsets (struct objfile
>                                         offsets[cur_sec->index]);
>               offsets[cur_sec->index] = 0;
>             }
> -       }
>      }
>
>    /* Remember the bfd indexes for the .text, .data, .bss and
>
>
--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

Reply via email to