[Reproducible-builds] Bug#808121: Bug#808121: diffoscope: HTML output is bloated
While we are at it, let's convert HTML character entity references (each of which uses 6-8 characters, and as many bytes, in the HTML file) to actual characters (which UTF-8 encodes as 2-3 bytes). Since all diffoscope output files are peppered with these entities, this could reduce file sizes by at least a few percent.

I used escape sequences in Python string literals instead of the actual characters in the Python file, because 1) the non-breaking and zero-width spaces would be very hard to distinguish from an ordinary space and from missing string content, respectively, and 2) it is impossible to be sure that every piece of software that will ever be used to view or edit the file handles non-ASCII characters correctly.

--- presenters/html.py.orig 2015-12-16 19:42:25.0 +0200
+++ presenters/html.py 2015-12-17 15:10:53.654467937 +0200
@@ -290,9 +290,9 @@
n = TABSIZE-(i%TABSIZE)
if n == 0:
n = TABSIZE
-t.write(''+''*(n-1))
+t.write('\xbb'+'\xa0'*(n-1))
elif c == " " and ponct == 1:
-t.write('')
+t.write('\xb7')
elif c == "\n" and ponct == 1:
t.write('\')
elif ord(c) < 32:
@@ -304,11 +304,11 @@
i += 1
if WORDBREAK.count(c) == 1:
-t.write('')
+t.write('\u200b')
i = 0
if i > LINESIZE:
i = 0
-t.write("")
+t.write('\u200b')
return t.getvalue()
@@ -353,7 +353,7 @@
print_func(u'')
else:
s1 = ""
-print_func(u'')
+print_func(u'\xa0')
if s2 is not None:
print_func(u'%d ' % line2)
@@ -362,7 +362,7 @@
print_func(u'')
else:
s2 = ""
-print_func(u'')
+print_func(u'\xa0')
finally:
print_func(u"\n", force=True)
@@ -522,7 +522,7 @@
print_func(u"%s" % escape(difference.source2))
anchor = '/'.join(sources[1:])
-print_func(u" " % (anchor, anchor))
+print_func(u" \xb6" % (anchor, anchor))
print_func(u"")
if difference.comments:
print_func(u"%s"

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds
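A quick way to check the byte counts quoted above is to encode each entity and its replacement character as UTF-8. The sketch below is hypothetical: the table assumes the presenter emitted `&raquo;`, `&nbsp;`, `&middot;`, `&#8203;` and `&para;`, inferred from the characters the patch writes (the original strings did not survive in this archive), and `replace_entities()` is an illustration, not diffoscope code. Markup-significant entities such as `&lt;` and `&amp;` are deliberately left out of the table so the HTML stays well-formed.

```python
# Hypothetical mapping from the decorative entities the presenter appears
# to emit to the literal characters the patch writes instead.
ENTITY_TO_CHAR = {
    "&raquo;": "\xbb",    # tab marker
    "&nbsp;": "\xa0",     # non-breaking space
    "&middot;": "\xb7",   # visible space marker
    "&#8203;": "\u200b",  # zero-width space (line-break opportunity)
    "&para;": "\xb6",     # anchor link marker
}


def utf8_size(s: str) -> int:
    """Number of bytes s occupies in a UTF-8 encoded file."""
    return len(s.encode("utf-8"))


def replace_entities(html: str) -> str:
    """Swap the known decorative entities for literal characters.

    &lt;, &gt; and &amp; are not in the table, so markup stays escaped.
    """
    for entity, char in ENTITY_TO_CHAR.items():
        html = html.replace(entity, char)
    return html


if __name__ == "__main__":
    for entity, char in ENTITY_TO_CHAR.items():
        print(f"{entity:10} {utf8_size(entity)} bytes -> {utf8_size(char)} bytes")

    sample = "a&nbsp;&nbsp;b&middot;&lt;c&gt;&#8203;"
    print(utf8_size(sample), "->", utf8_size(replace_entities(sample)))
```

The saving only materialises if the page is decoded as UTF-8, so the output presumably needs a `<meta charset="utf-8">` declaration (or an equivalent HTTP header); otherwise a browser may render the raw bytes as mojibake.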
[Reproducible-builds] Bug#808121: Bug#808121: diffoscope: HTML output is bloated
On Wed, Dec 16, 2015 at 11:19:02AM +0100, Jérémy Bobbio wrote:
> Close: tag -1 + pending
>
> Mike Hommey:
> > Looking at the HTML in the HTML output, one can see that it is needlessly
> > large.
> >
> > Specifically, there appears to be a lot of e.g.
> > following each other, without even a separation between them. This inflates
> > the amount of memory necessary for browsers to render those pages.
>
> I've committed a fix for this specific issue. The HTML presenter borrowed
> a lot of code from diff2html, which was probably not very optimized in
> the first place. I guess the output could be vastly improved, but I'd
> rather focus on other parts of the code for now. Patches are highly welcome
> in the meantime.

Here's another easy win, attached.

Mike

diff --git a/diffoscope/presenters/html.py b/diffoscope/presenters/html.py
index d843f39..f425889 100644
--- a/diffoscope/presenters/html.py
+++ b/diffoscope/presenters/html.py
@@ -116,8 +116,9 @@ HEADER = """
tr.diffchanged td {
background: #A0
}
-span.diffchanged2 {
-background: #E0C880
+ins, del {
+background: #E0C880;
+text-decoration: none
}
span.diffponct {
color: #B08080
@@ -274,15 +275,15 @@ def linediff(s, t):
return ''.join(l1).replace(DIFFOFF + DIFFON, ''), ''.join(l2).replace(DIFFOFF + DIFFON, '')

-def convert(s, ponct=0):
+def convert(s, ponct=0, tag=''):
i = 0
t = StringIO()
for c in s:
# used by diffs
if c == DIFFON:
-t.write('')
+t.write('<%s>' % tag)
elif c == DIFFOFF:
-t.write('')
+t.write('</%s>' % tag)

# special highlighted chars
elif c == "\t" and ponct == 1:
@@ -348,7 +349,7 @@ def output_line(print_func, s1, s2):
if s1 is not None:
print_func(u'%d ' % line1)
print_func(u'')
-print_func(convert(s1, ponct=1))
+print_func(convert(s1, ponct=1, tag='del'))
print_func(u'')
else:
s1 = ""
@@ -357,7 +358,7 @@ def output_line(print_func, s1, s2):
if s2 is not None:
print_func(u'%d ' % line2)
print_func(u'')
-print_func(convert(s2, ponct=1))
+print_func(convert(s2, ponct=1, tag='ins'))
print_func(u'')
else:
s2 = ""
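The `tag` parameter added by this patch lets the same `convert()` routine wrap removed text in `<del>` and added text in `<ins>`. The sketch below is a stripped-down, hypothetical version: it drops the `ponct` handling, assumes `DIFFON`/`DIFFOFF` are the control characters `\x01`/`\x02`, and escapes every other character with `html.escape()`. The size comparison assumes the old markup was a `<span class="diffchanged2">` wrapper, as the removed CSS rule suggests.

```python
from html import escape
from io import StringIO

# Assumed marker values bracketing a changed run inside a line
# (diff2html-style control characters; not confirmed in this thread).
DIFFON = "\x01"
DIFFOFF = "\x02"


def convert(s, tag=""):
    """Render one diff line as HTML, wrapping changed runs in <tag>...</tag>.

    Hypothetical simplification of the presenter's convert(): no tab,
    space or line-wrapping handling, only marker substitution and escaping.
    """
    t = StringIO()
    for c in s:
        if c == DIFFON:
            t.write("<%s>" % tag)
        elif c == DIFFOFF:
            t.write("</%s>" % tag)
        else:
            t.write(escape(c))
    return t.getvalue()


old_line = "x = " + DIFFON + "1" + DIFFOFF
new_line = "x = " + DIFFON + "2" + DIFFOFF
print(convert(old_line, tag="del"))  # x = <del>1</del>
print(convert(new_line, tag="ins"))  # x = <ins>2</ins>

# Per-run overhead, assuming the old wrapper was a classed <span>.
OLD_MARKUP = '<span class="diffchanged2"></span>'
NEW_MARKUP = "<del></del>"
print(len(OLD_MARKUP) - len(NEW_MARKUP), "bytes saved per changed run")
```

Because browsers underline `<ins>` and strike through `<del>` by default, the patch also sets `text-decoration: none` so the new elements look like the old highlighted spans.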
[Reproducible-builds] Bug#808121: Bug#808121: diffoscope: HTML output is bloated
Close: tag -1 + pending

Mike Hommey:
> Looking at the HTML in the HTML output, one can see that it is needlessly
> large.
>
> Specifically, there appears to be a lot of e.g.
> following each other, without even a separation between them. This inflates
> the amount of memory necessary for browsers to render those pages.

I've committed a fix for this specific issue. The HTML presenter borrowed a lot of code from diff2html, which was probably not very optimized in the first place. I guess the output could be vastly improved, but I'd rather focus on other parts of the code for now. Patches are highly welcome in the meantime.

-- 
Lunar
lu...@debian.org
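The bloat reported here appears to come from changed runs being closed and immediately reopened. The `linediff()` context line visible in Mike's patch ends in `.replace(DIFFOFF + DIFFON, '')`, which avoids that by deleting every close marker that is directly followed by an open marker. The sketch below reproduces only that idea with `difflib.SequenceMatcher`; `mark_changes()` and the marker values are hypothetical, and this is not necessarily the fix that was committed.

```python
import difflib
from typing import Tuple

DIFFON = "\x01"   # assumed marker: start of a changed run
DIFFOFF = "\x02"  # assumed marker: end of a changed run


def mark_changes(a: str, b: str) -> Tuple[str, str]:
    """Mark changed characters of a and b, then merge adjacent marked runs.

    Hypothetical helper, not diffoscope's linediff().
    """
    l1, l2 = [], []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            l1.append(a[i1:i2])
            l2.append(b[j1:j2])
        else:
            # Naive per-character wrapping yields ON x OFF ON y OFF ...,
            # which would become back-to-back close/open tags in HTML.
            l1.extend(DIFFON + c + DIFFOFF for c in a[i1:i2])
            l2.extend(DIFFON + c + DIFFOFF for c in b[j1:j2])
    # Removing every OFF+ON pair fuses neighbouring runs, so each
    # contiguous change gets exactly one open and one close marker.
    return (
        "".join(l1).replace(DIFFOFF + DIFFON, ""),
        "".join(l2).replace(DIFFOFF + DIFFON, ""),
    )


old, new = mark_changes("abc", "axyc")
print(repr(old))  # 'a\x01b\x02c'
print(repr(new))  # 'a\x01xy\x02c'
```

Merging at the marker level keeps the HTML renderer simple: it can emit one tag per marker without having to look ahead for adjacent runs.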