[Reproducible-builds] Hello!
___ Reproducible-builds mailing list Reproducible-builds@lists.alioth.debian.org http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds
[Reproducible-builds] Bug#808207: diffoscope: Filter objdump --disassemble output before diffing it
Source: diffoscope
Version: 43
Severity: wishlist

When comparing large ELF binaries, some minor differences can end up hurting the visibility of more important differences. Specifically, objdump --disassemble displays symbols+offsets for addresses it derives from IP-relative addressing, like the following:

  9d2be2: 48 8d 05 42 65 24 02  lea 0x2246542(%rip),%rax  # 2c1912b <_fini@@xul45a1+0x1d803>

In the particular case I'm looking at, though, some function ends up pushing the rest of the .text section, so that the _fini symbol (and many others, actually) move. So I end up with a *lot* of differences like:

  < 9d2be2: 48 8d 05 42 65 24 02  lea 0x2246542(%rip),%rax  # 2c1912b <_fini@@xul45a1+0x1d803>
  ---
  > 9d2be2: 48 8d 05 42 65 24 02  lea 0x2246542(%rip),%rax  # 2c1912b <_fini@@xul45a1+0x1d7e3>

(note: this is a diff I got manually, because it's easier to visualize than a copy/paste of the HTML output I got from diffoscope)

The code is the same and the address is the same, but the pseudo-symbol doesn't match. That doesn't matter, because the address actually points to some place in .rodata, and .rodata hasn't moved; only _fini and some earlier symbols have.

In another case, the symbol between angle brackets is an actual symbol (on non-stripped binaries), but the symbol name is different because GCC decided to use a different suffix[1]. For example:

  < 9d2f35: 48 8d 05 d1 5b 33 02  lea 0x2335bd1(%rip),%rax  # 2d08b0d <__FUNCTION__.10544+0x29d>
  ---
  > 9d2f35: 48 8d 05 d1 5b 33 02  lea 0x2335bd1(%rip),%rax  # 2d08b0d <__FUNCTION__.10547+0x29d>

The difference might seem interesting to note, but in fact it's not, because it will already appear in the `readelf --all` diff:

  < 17956: 02d08870 21 OBJECT LOCAL DEFAULT 16 __FUNCTION__.10544
  ---
  > 17956: 02d08870 21 OBJECT LOCAL DEFAULT 16 __FUNCTION__.10547

Anyway, those symbols between angle brackets are just adding noise that would be better left out.
I'm not sure, though, that objdump has an option to suppress those symbols (and a quick glance at the binutils source suggests there isn't one). I can only suggest sending the output of objdump through sed :-/ Something like (awful):

    @tool_required('objdump')
    @tool_required('sed')
    def cmdline(self):
        return ['sh', '-c', 'objdump --disassemble --full-contents "%s" | sed "s/<.*>//"' % self.path]

Mike

1. Example of how this can happen:

$ cat > test.c
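The sed expression in the message above could also be applied in Python, without going through a shell. The following is only a minimal sketch of that idea, not diffoscope's actual code: the names SYMBOL_ANNOTATION, strip_symbol_annotations and filter_disassembly are hypothetical, and the sample lines are the two from the report with their whitespace normalized.

```python
import re

# Whitespace followed by the first "<" through the last ">" on a line,
# i.e. the span that sed "s/<.*>//" removes, plus the whitespace before it.
# (Hypothetical name, introduced for this sketch only.)
SYMBOL_ANNOTATION = re.compile(r"\s*<.*>")


def strip_symbol_annotations(line):
    """Return one objdump output line without its <symbol+offset> part."""
    return SYMBOL_ANNOTATION.sub("", line)


def filter_disassembly(lines):
    """Lazily apply strip_symbol_annotations to an iterable of lines."""
    for line in lines:
        yield strip_symbol_annotations(line)


old = ("9d2be2: 48 8d 05 42 65 24 02 lea 0x2246542(%rip),%rax "
       "# 2c1912b <_fini@@xul45a1+0x1d803>")
new = ("9d2be2: 48 8d 05 42 65 24 02 lea 0x2246542(%rip),%rax "
       "# 2c1912b <_fini@@xul45a1+0x1d7e3>")

# Once the annotation is gone, the two lines from the report are identical.
print(strip_symbol_annotations(old) == strip_symbol_annotations(new))  # prints True
```

Like the sed version, this also removes the name from function label lines of the form "address <name>:", so a real filter would probably want to leave those lines alone.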
[Reproducible-builds] Bug#808121: Bug#808121: diffoscope: HTML output is bloated
While we are at it, let's convert HTML character entity references (which each use 6-8 characters and as many bytes in the HTML file) to actual characters (which UTF-8 encodes as 2-3 bytes). Since all diffoscope output files are peppered with abundant amounts of these things, this could reduce the file sizes by a few percent at least.

I used Python string literals instead of the actual characters in the Python file, because 1) the non-breaking and zero-width spaces would be very hard to distinguish from ordinary space and missing string content, respectively, and 2) it is impossible to be sure that every piece of software that is ever going to be used to view or edit the file would handle non-ASCII characters correctly.

--- presenters/html.py.orig 2015-12-16 19:42:25.0 +0200
+++ presenters/html.py 2015-12-17 15:10:53.654467937 +0200
@@ -290,9 +290,9 @@
 n = TABSIZE-(i%TABSIZE)
 if n == 0:
 n = TABSIZE
-t.write(''+''*(n-1))
+t.write('\xbb'+'\xa0'*(n-1))
 elif c == " " and ponct == 1:
-t.write('')
+t.write('\xb7')
 elif c == "\n" and ponct == 1:
 t.write('\')
 elif ord(c) < 32:
@@ -304,11 +304,11 @@
 i += 1
 if WORDBREAK.count(c) == 1:
-t.write('')
+t.write('\u200b')
 i = 0
 if i > LINESIZE:
 i = 0
-t.write("")
+t.write('\u200b')
 return t.getvalue()
@@ -353,7 +353,7 @@
 print_func(u'')
 else:
 s1 = ""
-print_func(u'')
+print_func(u'\xa0')
 if s2 is not None:
 print_func(u'%d ' % line2)
@@ -362,7 +362,7 @@
 print_func(u'')
 else:
 s2 = ""
-print_func(u'')
+print_func(u'\xa0')
 finally:
 print_func(u"\n", force=True)
@@ -522,7 +522,7 @@
 print_func(u"%s" % escape(difference.source2))
 anchor = '/'.join(sources[1:])
-print_func(u" " % (anchor, anchor))
+print_func(u" \xb6" % (anchor, anchor))
 print_func(u"")
 if difference.comments:
 print_func(u"%s"
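The size argument in this message can be checked with a few lines of Python. This is a rough sketch under one assumption: the characters are the ones the patch writes (\xbb, \xa0, \xb7, \u200b, \xb6), but the entity spellings paired with them here are guesses at what the replaced code used, since they are not visible in the patch as archived. REPLACEMENTS and bytes_saved are hypothetical names.

```python
# Characters written by the patch, paired with ASSUMED entity spellings.
# The entity strings are illustrative guesses, not taken from the patch.
REPLACEMENTS = {
    "&raquo;": "\xbb",     # right-pointing double angle quotation mark
    "&nbsp;": "\xa0",      # non-breaking space
    "&middot;": "\xb7",    # middle dot
    "&#8203;": "\u200b",   # zero-width space
    "&para;": "\xb6",      # pilcrow sign
}


def bytes_saved(entity, char):
    """Bytes saved per occurrence when char replaces entity in a UTF-8 file."""
    return len(entity.encode("ascii")) - len(char.encode("utf-8"))


for entity, char in REPLACEMENTS.items():
    print(entity, len(entity), len(char.encode("utf-8")), bytes_saved(entity, char))
```

With these spellings every entity is 6 to 8 bytes long and every replacement character is 2 or 3 bytes in UTF-8, which matches the figures quoted in the message.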
[Reproducible-builds] Bug#808121: Bug#808121: Bug#808121: diffoscope: HTML output is bloated
Esa Peuha:
> While we are at it, let's convert HTML character entity references
> (which each use 6-8 characters and as many bytes in the HTML file)
> to actual characters (which UTF-8 encodes as 2-3 bytes). Since all
> diffoscope output files are peppered with abundant amounts of these
> things, this could reduce the file sizes by a few percent at least.
> I used Python string literals instead of the actual characters in
> the Python file, because 1) the non-breaking and zero-width spaces
> would be very hard to distinguish from ordinary space and missing
> string content, respectively, and 2) it is impossible to be sure
> that every piece of software that is ever going to be used to view
> or edit the file would handle non-ASCII characters correctly.

Thanks for the patch. It has been committed and pushed.

I would be grateful if you could submit ready-to-merge Git changes next time (see git-format-patch(1)).

--
Lunar
lu...@debian.org
# apt-get install anarchism
[Reproducible-builds] Bug#808267: diffoscope: Redundant information in ELF comparisons
Source: diffoscope
Version: 43
Severity: normal

When comparing ELF files, the following commands are used:
- readelf --all
- readelf --debug-dump
- objdump --disassemble --full-contents

objdump --disassemble --full-contents is actually redundant in itself. For example, it will dump both a hexdump and a disassembly of the .text section. It's also redundant with the output of readelf --debug-dump because it does a hexdump of the .debug_* sections that readelf --debug-dump does a DWARF dump of.

-- System Information:
Debian Release: stretch/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.2.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
[Reproducible-builds] Bug#808267: diffoscope: Redundant information in ELF comparisons
On Fri, Dec 18, 2015 at 10:10:54AM +0900, Mike Hommey wrote:
> Source: diffoscope
> Version: 43
> Severity: normal
>
> When comparing ELF files, the following commands are used:
> - readelf --all
> - readelf --debug-dump
> - objdump --disassemble --full-contents
>
> objdump --disassemble --full-contents is actually redundant in itself. For
> example, it will dump both a hexdump and a disassembly of the .text section.
> It's also redundant with the output of readelf --debug-dump because it does a
> hexdump of the .debug_* sections that readelf --debug-dump does a DWARF dump
> of.

objdump --disassemble --full-contents also outputs a dump of e.g. .note.gnu.build-id, which is printed out in a nicer form by readelf --all.

Mike
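One way to avoid the overlap described in this thread would be to run the disassembly once and request hexdumps only for sections that no other command already covers. The sketch below is an illustration under stated assumptions, not a description of what diffoscope does: objdump_commands and ALREADY_COVERED are hypothetical names, the caller is assumed to supply the section names, and treating only .text and .note.gnu.build-id as covered is a simplification taken from the examples in the report. It relies on objdump's --disassemble, --full-contents and --section=NAME options.

```python
# Sections whose content another command already shows in a better form,
# per the examples in the report (an assumption, not an exhaustive list).
ALREADY_COVERED = {".text", ".note.gnu.build-id"}


def objdump_commands(path, section_names):
    """Build objdump command lines that avoid the redundant dumps.

    Returns one disassembly command, plus one --full-contents command per
    section that is neither a .debug_* section (decoded by
    readelf --debug-dump) nor listed in ALREADY_COVERED.
    """
    commands = [["objdump", "--disassemble", path]]
    for name in section_names:
        if name.startswith(".debug_") or name in ALREADY_COVERED:
            continue
        commands.append(["objdump", "--full-contents", "--section=" + name, path])
    return commands


for command in objdump_commands(
        "a.out", [".text", ".rodata", ".debug_info", ".note.gnu.build-id"]):
    print(" ".join(command))
```

For the four example section names, only .rodata gets its own hexdump command; the other three are left to the disassembly and to readelf.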