I maintain a database which extracts symbol information from ELF objects (among other things). I would like to enrich that with DWARF producer data, and perhaps additional DWARF information in the future.

I'd really like to avoid importing the ELF symbol information twice: once from the real object file and once from the separate debuginfo file.

The database performs content-based deduplication, so I do not have path name information during extraction. This means I cannot use file system paths to distinguish the real object from its debugging information. Both files are loaded separately, and not necessarily at the same time. I would rather not change that, because it would eventually turn into a scalability problem. I don't want to assume that *all* debuginfo data has been separated, either.

Based on the previous discussion about program interpreter reporting in readelf, there is no easy way to detect a separate debuginfo file and trigger special processing for it (e.g., skip symbol extraction and extract only DW_AT_producer data).
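For illustration only, this is the kind of heuristic I have in mind, sketched in Python. It assumes that tools such as eu-strip and objcopy --only-keep-debug turn allocated sections into SHT_NOBITS in the debuginfo file while keeping notes and the .debug_* sections, and that section headers have already been parsed. The SectionHeader tuple and the function name are hypothetical, and this has not been checked against every producer.

```python
from typing import Iterable, NamedTuple

# Constants from the ELF specification.
SHT_PROGBITS = 1
SHT_NOTE = 7
SHT_NOBITS = 8
SHF_ALLOC = 0x2


class SectionHeader(NamedTuple):
    # Hypothetical, already-parsed view of one section header.
    name: str
    sh_type: int
    sh_flags: int


def looks_like_separate_debuginfo(sections: Iterable[SectionHeader]) -> bool:
    """Heuristic: DWARF is present, but no allocated section has file contents.

    Assumption: separate debuginfo files keep allocated sections only as
    SHT_NOBITS placeholders (notes such as the build ID stay SHT_NOTE).
    """
    has_dwarf = False
    for sec in sections:
        if sec.name.startswith((".debug_", ".zdebug_")):
            has_dwarf = True
        if sec.sh_flags & SHF_ALLOC and sec.sh_type not in (SHT_NOBITS, SHT_NOTE):
            # Real code or data is present, so this is not a debuginfo-only file.
            return False
    return has_dwarf
```

An unstripped binary also has .debug_* sections, but its .text is SHT_PROGBITS, so the check rejects it and falls back to normal symbol extraction.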

Another thing that would help is a way to get exactly the same set of exported symbols from the real file and its separate debuginfo. Then I could deduplicate based on that, and processing both files would no longer matter. eu-readelf shows quite different output for the two files, so I'm not sure how to achieve that.
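If both files did yield the same symbol set, an order-independent digest of that set could serve as the deduplication key. A minimal sketch, assuming symbols have already been reduced to hypothetical (name, is_defined) pairs:

```python
import hashlib
from typing import Iterable, Tuple

Symbol = Tuple[str, bool]  # hypothetical (name, is_defined) pair


def symbol_set_fingerprint(symbols: Iterable[Symbol]) -> str:
    """Order-independent SHA-256 digest of a set of (name, is_defined) pairs."""
    h = hashlib.sha256()
    # Deduplicate and sort so that input order and repeats do not matter.
    for name, defined in sorted(set(symbols)):
        # ELF symbol names cannot contain NUL, so NUL-separated records
        # form an unambiguous encoding.
        h.update(name.encode("utf-8", "surrogateescape"))
        h.update(b"\0D\0" if defined else b"\0U\0")
    return h.hexdigest()
```

Sorting before hashing means the key does not depend on symbol table order, which can legitimately differ between the two files.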

I don't actually use eu-readelf output (although my extraction code is derived from it), and I'm open to suggestions about which sections or headers to look at to get matching output. I'm mainly interested in public symbols and undefined symbols. Internal symbols from the debugging information can be ignored for now.
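Here is a sketch of the filtering I mean. It assumes raw little-endian Elf64_Sym entries and their string table have already been read from .dynsym/.dynstr or .symtab/.strtab. The function name is hypothetical, and stripping an @VERSION suffix rests on an assumption about how some linkers name versioned symbols in .symtab. Even with this filtering, .symtab in an executable can list global symbols that never made it into .dynsym, so the two sets may still differ.

```python
import struct
from typing import Set, Tuple

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size (24 bytes).
ELF64_SYM = struct.Struct("<IBBHQQ")
STB_GLOBAL, STB_WEAK = 1, 2
STV_DEFAULT, STV_PROTECTED = 0, 3
STT_SECTION, STT_FILE = 3, 4
SHN_UNDEF = 0


def _cstr(strtab: bytes, offset: int) -> str:
    # Read a NUL-terminated name from the string table.
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("utf-8", "surrogateescape")


def public_symbols(symtab: bytes, strtab: bytes) -> Set[Tuple[str, bool]]:
    """Return {(name, is_defined)} for global/weak, default/protected symbols."""
    result = set()
    for st_name, st_info, st_other, st_shndx, _value, _size in ELF64_SYM.iter_unpack(symtab):
        bind, typ = st_info >> 4, st_info & 0xF
        vis = st_other & 0x3
        if bind not in (STB_GLOBAL, STB_WEAK) or typ in (STT_SECTION, STT_FILE):
            continue  # local symbols (including entry 0) are skipped
        if vis not in (STV_DEFAULT, STV_PROTECTED):
            continue  # hidden/internal symbols are never exported
        name = _cstr(strtab, st_name)
        if not name:
            continue
        # Assumption: drop "@VER" / "@@VER" suffixes so .symtab names match .dynsym.
        name = name.split("@", 1)[0]
        result.add((name, st_shndx != SHN_UNDEF))
    return result
```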

--
Florian Weimer / Red Hat Product Security Team
