Hi Andrey, (Thanks for upstream-tracker BTW. It has been really useful.)
On Fri, Jul 12, 2013 at 06:49:59PM +0400, Andrey Ponomarenko wrote: > I've found an O(N^2) loop in the libdw library (N - number of > symbols in the library). The dwfl_module_getsym function is called N > times, but it calls resolve_symbol function each time and it loops > through the number of symbols again. > > So `readelf --debug-dump=info shared_object.so` command is very slow > on large input objects. As an example, it takes several minutes to > analyse libOpenImageIO.so.1.1.11 (27mb), but readelf from binutils > takes just several seconds to do it. If the goal is just to have a faster eu-readelf -w than binutils readelf -w then just run it with -N and it will not do any address to symbol lookups and it should be faster. But you are right that -N/--numeric-addresses isn't the default. And by default eu-readelf will do address symbol lookups (because that makes the output much richer than plain binutils readelf gives) and those are indeed pretty slow atm. > The gelf_getsymshndx, gelf_getshdr and elf_getscn functions from > libelf are called 2 billion times and it takes about 60% of total > execution time. Another 26% of execution time are taken by the > search_table function that is called by the dwfl_module_addrsym and > loops through the number of symbols too. The dwfl_module_addrsym is > called for each symbol. So it also takes O(N^2). > > How can dwfl_module_getsym and dwfl_module_addrsym functions may be > optimised to not to loop through all symbols in the library in order > to find information for just one of them? This can make the library > to be 10 times faster. Those functions work with the "raw" .dynsyn or .symtab (in the ELF, separate .debug file or in the .gnu_debugdata). Those tables aren't sorted by address (just by whether they are local or global, we prefer globals so we search those first in the hope not to have to do a full N search) so we have to search them linearly. They are really designed for doing a quick lookup for a single address. If you really need all the addresses/symbols then iterating through them all with dwfl_module_getsymtab() and dwfl_module_getsym (), then you can create a search table for them all. So readelf.c could be considered to use dwfl_module_addrsym "wrongly" since in general it will call it for lots of different addresses. We should probably build a search table. And maybe even do that not just in readelf.c but add it to libdwfl for use with dwfl_module_addrsym directly? Cheers, Mark _______________________________________________ elfutils-devel mailing list [email protected] https://lists.fedorahosted.org/mailman/listinfo/elfutils-devel
