Re: [patch 2/2 v2] Fix loading core files without build-ids

Mark Wielaard Mon, 27 May 2013 07:06:05 -0700

On Thu, 2013-05-23 at 21:15 +0200, Jan Kratochvil wrote:
> commit 457b771b842a0f6d23118e4b39d93f6581fb9b41
> Author: Jan Kratochvil <[email protected]>
> Date:   Thu May 23 20:19:16 2013 +0200
> 
>     Use DT_DEBUG library search first.
>     
>     libdwfl/
>     2013-05-23  Jan Kratochvil  <[email protected]>
>     
>       * argp-std.c (parse_opt) <ARGP_KEY_SUCCESS> <opt->core> <opt->e>: Set
>       executable_for_core before calling dwfl_core_file_report.


OK.

>       * core-file.c (free_r_debug_info): New function.

Since it doesn't actually free the struct itself, lets call it
clear_r_debug_info to avoid confusion.

>       (dwfl_core_file_report): Move raw segments reporting
>       lower.  New variable r_debug_info.
>       (dwfl_core_file_report) (report_module): New function.
>       (dwfl_core_file_report):  Call also report_module for EXEC_VADDR and
>       VDSO_L_LD.  Call free_r_debug_info in the end.  Return sum of LISTED
>       and SNIFFED.

OK, but see comments below.

>       * dwfl_module_build_id.c (check_notes): Move into
>       __libdwfl_find_elf_build_id.
>       (__libdwfl_find_build_id): Rename to ...
>       (__libdwfl_find_elf_build_id): ... here.  Add parameters build_id_bits,
>       build_id_elfaddr and build_id_len.  Verify MOD vs. ELF.
>       (__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
>       set, rename data_vaddr to data_elfaddr.  Do not call found_build_id.
>       (__libdwfl_find_elf_build_id): Update the check_notes caller, do not
>       adjust its data_elfaddr parameter.
>       (__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.

OK.

>       * dwfl_segment_report_module.c (dwfl_segment_report_module): New
>       parameter r_debug_info.  New variable name_is_final.  Adjust addresses
>       according to R_DEBUG_INFO->MODULE.  Check conflicts against DWFL.
>       Do not overwise NAME by SONAME if NAME_IS_FINAL.

s/overwise/overwrite/
Also see some questions below.

>       * libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
>       (struct r_debug_info_module, struct r_debug_info): New definitions.
>       (dwfl_segment_report_module, dwfl_link_map_report): Add parameter
>       r_debug_info.

OK.

>       * link_map.c: Include fcntl.h.
>       (report_r_debug): Add parameter r_debug_info, describe them in the
>       function comment.  Delete dwfl_addrmodule call and its dependent code.
>       Verify build-id before calling dwfl_report_elf, also supply
>       executable_for_core to it and skip vDSO.  Report vdso_bias and
>       vdso_l_ld when found.  Clear exec_vaddr when found.  Store
>       r_debug_info->module info when appropriate.
>       (dwfl_link_map_report): Add parameter r_debug_info.
>       (dwfl_link_map_report) (consider_phdr): Add parameter offset.  Set
>       r_debug_info.exec_vaddr when found.
>       (dwfl_link_map_report): New variable in_ok.  Try to read IN from
>       EXECUTABLE_FOR_CORE.  Update consider_phdr caller parameters.  Update
>       report_r_debug caller parameters.

Was a bit hard to follow since I didn't know much about the link-map,
and there is no real documentation it seems except the glibc sources.
But in general looks OK. See some questions below.

>     tests/
>     2013-05-23  Jan Kratochvil  <[email protected]>
>     
>       * Makefile.am (EXTRA_DIST): Add test-core-lib.so.bz2,
>       test-core.core.bz2 and test-core.exec.bz2.
>       * run-addrname-test.sh: New test for these files.
>       * run-unstrip-n.sh: Update expected output.  New test for these files.
>       * test-core-lib.so.bz2: New file.
>       * test-core.core.bz2: New file.
>       * test-core.exec.bz2: New file.

The changes in test results now make sense to me. Reporting the full
file names was arguably wrong to begin with. I don't think we should
restore that, but if people do feel this is a regression we could add a
guesser to unstrip.c itself (but I don't think we should).

>  int
>  dwfl_core_file_report (Dwfl *dwfl, Elf *elf)
>  {
> @@ -431,33 +444,85 @@ dwfl_core_file_report (Dwfl *dwfl, Elf *elf)
>    /* Now we have NT_AUXV contents.  From here on this processing could be
>       used for a live process with auxv read from /proc.  */
>  
> +  struct r_debug_info r_debug_info;
> +  memset (&r_debug_info, 0, sizeof r_debug_info);
>    int listed = dwfl_link_map_report (dwfl, auxv, auxv_size,
> -                                  dwfl_elf_phdr_memory_callback, elf);
> +                                  dwfl_elf_phdr_memory_callback, elf,
> +                                  &r_debug_info);
> +
> +  /* Call dwfl_segment_report_module, update *NDXP when appropriate.  Return
> +     true for a successfully reported module.  Otherwise return false and 
> also
> +     call free_r_debug_info.  */
> +
> +  inline bool report_module (int *ndxp, int *count, const char *name)

Document that count gets updated on success.

The clearing of r_debug_info on failure is a nice because it prevents
accidental memory leaks, but also confused me on first read of the patch
(because I forgot the comment when reading the later code).
 
>    /* We return the number of modules we found if we found any.
>       If we found none, we return -1 instead of 0 if there was an
> -     error rather than just nothing found.  If link_map handling
> -     failed, we still have the sniffed modules.  */
> -  return sniffed == 0 || listed > sniffed ? listed : sniffed;
> +     error rather than just nothing found.  */
> +  return sniffed || listed >= 0 ? listed + sniffed : listed;

What about the case of sniffed == 0 && listed == 0?
Should that return -1?

> diff --git a/libdwfl/dwfl_segment_report_module.c 
> b/libdwfl/dwfl_segment_report_module.c
> index 7cf7499..e2d0337 100644
> --- a/libdwfl/dwfl_segment_report_module.c
> +++ b/libdwfl/dwfl_segment_report_module.c
> @@ -83,7 +83,8 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const char 
> *name,
>                           Dwfl_Memory_Callback *memory_callback,
>                           void *memory_callback_arg,
>                           Dwfl_Module_Callback *read_eagerly,
> -                         void *read_eagerly_arg)
> +                         void *read_eagerly_arg,
> +                         const struct r_debug_info *r_debug_info)
>  {
>    size_t segment = ndx;
>  
> @@ -433,6 +434,43 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const 
> char *name,
>  
>    dyn_vaddr += bias;
>  
> +  /* NAME found from link map has precedence over DT_SONAME possibly read
> +     below.  */
> +  bool name_is_final = false;
> +
> +  /* Try to match up DYN_VADDR against L_LD as found in link map.
> +     Segments sniffing may guess invalid address as the first read-only 
> memory
> +     mapping may not be dumped to the core file (if ELF headers are not 
> dumped)
> +     and the ELF header is dumped first with the read/write mapping of the 
> same
> +     file at higher addresses.  */
> +  if (r_debug_info != NULL)
> +    for (const struct r_debug_info_module *module = r_debug_info->module;
> +      module != NULL; module = module->next)
> +      if (module_start <= module->l_ld && module->l_ld < module_end)
> +     {
> +       GElf_Addr fixup = module->l_ld - dyn_vaddr;
> +       if (module_start + fixup <= module->l_ld
> +           && module->l_ld < module_end + fixup)
> +         {
> +           module_start += fixup;
> +           module_end += fixup;
> +           dyn_vaddr += fixup;
> +           bias += fixup;
> +           if (module->name[0] != '\0')
> +             {
> +               name = module->name;
> +               name_is_final = true;
> +             }
> +           break;
> +         }
> +     }

fixup might need a comment. It wasn't immediately clear to me. The fixup
is the difference between the PT_LOAD address that the link-map know is
the correct one and the PT_DYNAMIC address (of which there can be only
one). And module_start points to the PT_LOAD address that was just
gotten from the phdrs. Right? Actually now that I do understand, that is
kind of what the comment above already says. I just didn't immediately
made the connection with the offset/fixup being static. Is fixup always
positive?

BTW. Why does link-map set module->name to the empty string when no name
(NULL) was found? Just convenience like here, so no NULL check is
needed, or is the empty name significant?

> diff --git a/libdwfl/link_map.c b/libdwfl/link_map.c
> @@ -352,9 +355,108 @@ report_r_debug (uint_fast8_t elfclass, uint_fast8_t 
> elfdata,
>        /* We have to find the file's phdrs to compute along with l_addr
>        what its runtime address boundaries are.  */
>  
> -      // XXX hook for sysroot
> -      mod = INTUSE(dwfl_report_elf) (dwfl, basename (name),
> -                                  name, -1, l_addr, true);
> +      Dwfl_Module *mod = NULL;
> +      if (iterations == 1 && dwfl->executable_for_core != NULL)
> +     {
> +       /* Find the main executable.
> +          FIXME: Try to also find it via build-id.  */
> +       name = dwfl->executable_for_core;
> +     }

To solve that FIXME we should do somewhat the same as is done below with
the "inlined dwfl_report_elf" snippet? And we already do try to open the
executable if we cannot read the phdrs. Maybe this can be factored out a
bit? On the other hand, we are going way beyond call of duty here just
to read a somewhat incomplete/corrupt core file. So it might be lots of
work for not much gain.

> +      if (iterations != 2 && name != NULL)
> +     {
> +       /* Find a shared library or main executable.  Do not try to
> +          find a file for vDSO (where ITERATIONS equals 2).  */

I couldn't find where the iteration == 1 => exe, iteration == 2 => vDSO
comes from. Shouldn't the second check be iterations > 2 ?

> +       /* This code is mostly inlined dwfl_report_elf.  */
> +       // XXX hook for sysroot
> +       int fd = open64 (name, O_RDONLY);
> +       if (fd >= 0)
> +         {
> +           Elf *elf;
> +           Dwfl_Error error = __libdw_open_file (&fd, &elf, true, false);
> +           if (error == DWFL_E_NOERROR)
> +             {
> +               const void *build_id_bits;
> +               GElf_Addr build_id_elfaddr;
> +               int build_id_len;
> +               bool valid = true;
> +
> +               /* FIXME: Bias L_ADDR should be computed from the prelink
> +                  state in memory (when the file got loaded), not against
> +                  the current on-disk file state as is computed below.
> +
> +                  This verification gives false positive if in-core ELF had
> +                  build-id but on-disk ELF does not have any.  But we cannot
> +                  reliably find ELF header and/or the ELF build id just from
> +                  the link map (and checking core segments is also not
> +                  reliable).  */
> +
> +               if (__libdwfl_find_elf_build_id (NULL, elf, &build_id_bits,
> +                                                &build_id_elfaddr,
> +                                                &build_id_len) > 0
> +                   && build_id_elfaddr != 0)
> +                 {
> +                   GElf_Addr build_id_vaddr = build_id_elfaddr + l_addr;
> +                   release_buffer (0);
> +                   int segndx = INTUSE(dwfl_addrsegment) (dwfl,
> +                                                          build_id_vaddr,
> +                                                          NULL);
> +                   if (! (*memory_callback) (dwfl, segndx,
> +                                             &buffer, &buffer_available,
> +                                             build_id_vaddr, build_id_len,
> +                                             memory_callback_arg)
> +                       || memcmp (build_id_bits, buffer, build_id_len) != 0)
> +                     {
> +                       /* File has valid build-id which cannot be verified
> +                          in memory.  */
> +                       valid = false;
> +                     }
> +                 }
> +
> +               if (valid)
> +                 // XXX hook for sysroot
> +                 mod = __libdwfl_report_elf (dwfl, basename (name), name,
> +                                             fd, elf, l_addr, true, true);
> +               if (mod == NULL)
> +                 {
> +                   elf_end (elf);
> +                   close (fd);
> +                 }
> +             }
> +         }
> +     }
> +      if (mod != NULL && iterations == 1)
> +     {
> +       /* Cancel reporting of raw segments for the main executable as
> +          its file has been found now.  */
> +       if (r_debug_info != NULL)
> +         r_debug_info->exec_vaddr = 0;
> +     }
> +      if (iterations == 2)
> +     {
> +       /* vDSO is only reported to the caller, it is never searched
> +          for on the disk.  */
> +       assert (mod == NULL);
> +       if (r_debug_info != NULL)
> +         {
> +           r_debug_info->vdso_bias = l_addr;
> +           r_debug_info->vdso_l_ld = l_ld;
> +         }
> +     }
> +      if (r_debug_info != NULL && mod == NULL)
> +     {
> +       /* Save link map information about valid shared library (or
> +          executable) which has not been found on disk.  */
> +       const char *base = name == NULL ? "" : basename (name);
> +       struct r_debug_info_module *module;
> +       module = malloc (sizeof (*module) + strlen (base) + 1);
> +       if (module == NULL)
> +         return release_buffer (result);
> +       module->l_ld = l_ld;
> +       strcpy (module->name, base);
> +       module->next = r_debug_info->module;
> +       r_debug_info->module = module;
> +     }

Should this last if condition have iterations > 2 && ... ?
Or do you want the executable and vdso also in the r_debug_info?

> @@ -668,8 +773,65 @@ dwfl_link_map_report (Dwfl *dwfl, const void *auxv, 
> size_t auxv_size,
>             .d_size = phnum * phent,
>             .d_buf = NULL
>           };
> -       if ((*memory_callback) (dwfl, phdr_segndx, &in.d_buf, &in.d_size,
> -                               phdr, phnum * phent, memory_callback_arg))
> +       bool in_ok = (*memory_callback) (dwfl, phdr_segndx, &in.d_buf,
> +                                        &in.d_size, phdr, phnum * phent,
> +                                        memory_callback_arg);
> +       if (! in_ok && dwfl->executable_for_core != NULL)
> +         {
> +           /* AUXV -> PHDR -> DYNAMIC
> +              Both AUXV and DYNAMIC should be always present in a core file.
> +              PHDR may be missing in core file, try to read it from
> +              EXECUTABLE_FOR_CORE to find where DYNAMIC is located in the
> +              core file.  */
> +
> +           int fd = open (dwfl->executable_for_core, O_RDONLY);
> +           Elf *elf;
> +           Dwfl_Error error = DWFL_E_ERRNO;
> +           if (fd != -1)
> +             error = __libdw_open_file (&fd, &elf, true, false);
> +           if (error != DWFL_E_NOERROR)
> +             {
> +               __libdwfl_seterrno (error);
> +               return false;
> +             }
> +           GElf_Ehdr ehdr_mem, *ehdr = gelf_getehdr (elf, &ehdr_mem);
> +           if (ehdr == NULL)
> +             {
> +               elf_end (elf);
> +               close (fd);
> +               __libdwfl_seterrno (DWFL_E_LIBELF);
> +               return false;
> +             }
> +           if (ehdr->e_phnum != phnum || ehdr->e_phentsize != phent)
> +             {
> +               elf_end (elf);
> +               close (fd);
> +               __libdwfl_seterrno (DWFL_E_BADELF);
> +               return false;
> +             }
> +           off_t off = ehdr->e_phoff;
> +           assert (in.d_buf == NULL);
> +           assert (in.d_size == phnum * phent);
> +           in.d_buf = malloc (in.d_size);
> +           if (unlikely (in.d_buf == NULL))
> +             {
> +               elf_end (elf);
> +               close (fd);
> +               __libdwfl_seterrno (DWFL_E_NOMEM);
> +               return false;
> +             }
> +           ssize_t nread = pread_retry (fd, in.d_buf, in.d_size, off);
> +           elf_end (elf);
> +           close (fd);
> +           if (nread != (ssize_t) in.d_size)
> +             {
> +               free (in.d_buf);
> +               __libdwfl_seterrno (DWFL_E_ERRNO);
> +               return false;
> +             }
> +           in_ok = true;
> +         }

This looks right. But what a lot of extra work just so some parts can be
left out of a core file. And we better hope the given executable really
matches the core file.

Even though I had some questions above I do think in general this looks
OK.

Thanks,

Mark

_______________________________________________
elfutils-devel mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/elfutils-devel

Re: [patch 2/2 v2] Fix loading core files without build-ids

Reply via email to