[Reproducible-builds] Bug#808121: Bug#808121: diffoscope: HTML output is bloated

2015-12-17 Thread Esa Peuha
While we are at it, let's convert HTML character entity references
(each of which takes 6-8 characters, and as many bytes, in the HTML file)
to the actual characters (which UTF-8 encodes in 2-3 bytes). Since every
diffoscope output file is peppered with these references, this could
reduce file sizes by at least a few percent.

I used Python escape sequences rather than the literal characters in
the Python file, because 1) the non-breaking and zero-width spaces
would be very hard to distinguish from an ordinary space and from
missing string content, respectively, and 2) it is impossible to be sure
that every piece of software that will ever be used to view
or edit the file handles non-ASCII characters correctly.
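
To put rough numbers on the size claim, here is a minimal sketch that
compares each entity reference with the UTF-8 encoding of the character
it stands for. The five characters are the ones written by the patch
below; the entity spellings are an assumption (HTML 4 named entities
where Python's html.entities knows one, a decimal reference otherwise),
and entity_for() and savings() are illustrative helpers, not part of
diffoscope.

```python
from html.entities import codepoint2name

# Characters written by the patch; how the old code spelled the
# corresponding entities is an assumption made for this sketch.
CHARS = ['\xbb', '\xa0', '\xb7', '\u200b', '\xb6']


def entity_for(ch):
    """Return a named HTML 4 entity if one exists, else a decimal reference."""
    name = codepoint2name.get(ord(ch))
    return '&%s;' % name if name else '&#%d;' % ord(ch)


def savings(ch):
    """Bytes saved per occurrence by writing the UTF-8 character directly."""
    return len(entity_for(ch).encode('ascii')) - len(ch.encode('utf-8'))


for ch in CHARS:
    print('%-10s U+%04X saves %d bytes' % (entity_for(ch), ord(ch), savings(ch)))
```

The saving only holds if the page is served or declared as UTF-8; in a
single-byte encoding these characters could not all be represented.
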
--- presenters/html.py.orig 2015-12-16 19:42:25.0 +0200
+++ presenters/html.py  2015-12-17 15:10:53.654467937 +0200
@@ -290,9 +290,9 @@
 n = TABSIZE-(i%TABSIZE)
 if n == 0:
 n = TABSIZE
-t.write(''+''*(n-1))
+t.write('\xbb'+'\xa0'*(n-1))
 elif c == " " and ponct == 1:
-t.write('')
+t.write('\xb7')
 elif c == "\n" and ponct == 1:
 t.write('\')
 elif ord(c) < 32:
@@ -304,11 +304,11 @@
 i += 1
 
 if WORDBREAK.count(c) == 1:
-t.write('')
+t.write('\u200b')
 i = 0
 if i > LINESIZE:
 i = 0
-t.write("")
+t.write('\u200b')
 
 return t.getvalue()
 
@@ -353,7 +353,7 @@
 print_func(u'')
 else:
 s1 = ""
-print_func(u'')
+print_func(u'\xa0')
 
 if s2 is not None:
 print_func(u'%d ' % line2)
@@ -362,7 +362,7 @@
 print_func(u'')
 else:
 s2 = ""
-print_func(u'')
+print_func(u'\xa0')
 finally:
 print_func(u"\n", force=True)
 
@@ -522,7 +522,7 @@
 print_func(u"%s"
% escape(difference.source2))
 anchor = '/'.join(sources[1:])
-print_func(u" " % 
(anchor, anchor))
+print_func(u" \xb6" % 
(anchor, anchor))
 print_func(u"")
 if difference.comments:
 print_func(u"%s"
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

2015-12-16 Thread Mike Hommey
On Wed, Dec 16, 2015 at 11:19:02AM +0100, Jérémy Bobbio wrote:
> Close: tag -1 + pending
> 
> Mike Hommey:
> > Looking at the HTML in the HTML output, one can see that it is needlessly 
> > large.
> > 
> > Specifically, there appears to be a lot of e.g. 
> following each other, without even a separation between them. This inflates
> > the amount of memory necessary for browsers to render those pages.
> 
> I've committed a fix for this specific issue. The HTML presenter borrowed
> a lot of code from diff2html which was probably not much optimized in
> the first place. I guess the output could be vastly improved, but I'd
> rather focus on other parts of the code for now. Patches highly welcome
> in the meantime.

Here's another easy win, attached.

Mike
diff --git a/diffoscope/presenters/html.py b/diffoscope/presenters/html.py
index d843f39..f425889 100644
--- a/diffoscope/presenters/html.py
+++ b/diffoscope/presenters/html.py
@@ -116,8 +116,9 @@ HEADER = """
 tr.diffchanged td {
   background: #A0
 }
-span.diffchanged2 {
-  background: #E0C880
+ins, del {
+  background: #E0C880;
+  text-decoration: none
 }
 span.diffponct {
   color: #B08080
@@ -274,15 +275,15 @@ def linediff(s, t):
 return ''.join(l1).replace(DIFFOFF + DIFFON, ''), 
''.join(l2).replace(DIFFOFF + DIFFON, '')
 
 
-def convert(s, ponct=0):
+def convert(s, ponct=0, tag=''):
 i = 0
 t = StringIO()
 for c in s:
 # used by diffs
 if c == DIFFON:
-t.write('')
+t.write('<%s>' % tag)
 elif c == DIFFOFF:
-t.write('')
+t.write('</%s>' % tag)
 
 # special highlighted chars
 elif c == "\t" and ponct == 1:
@@ -348,7 +349,7 @@ def output_line(print_func, s1, s2):
 if s1 is not None:
 print_func(u'%d ' % line1)
 print_func(u'')
-print_func(convert(s1, ponct=1))
+print_func(convert(s1, ponct=1, tag='del'))
 print_func(u'')
 else:
 s1 = ""
@@ -357,7 +358,7 @@ def output_line(print_func, s1, s2):
 if s2 is not None:
 print_func(u'%d ' % line2)
 print_func(u'')
-print_func(convert(s2, ponct=1))
+print_func(convert(s2, ponct=1, tag='ins'))
 print_func(u'')
 else:
 s2 = ""
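
For readers skimming the patch, here is a minimal sketch of the idea:
convert() takes the element name as a parameter, so removed text is
wrapped in a short del element and added text in ins, both styled by the
single "ins, del" CSS rule. This is a simplified stand-in, not the real
function: it keeps only the DIFFON/DIFFOFF branches, and the marker
values '\x01' and '\x02' are assumptions, since the patch shows only the
names.

```python
from html import escape
from io import StringIO

# Assumed marker characters; the patch only shows the names DIFFON/DIFFOFF.
DIFFON = '\x01'
DIFFOFF = '\x02'


def convert(s, tag=''):
    """Simplified stand-in for convert(): only the DIFFON/DIFFOFF branches."""
    t = StringIO()
    for c in s:
        if c == DIFFON:
            t.write('<%s>' % tag)    # opening tag, e.g. <del> or <ins>
        elif c == DIFFOFF:
            t.write('</%s>' % tag)   # matching closing tag
        else:
            t.write(escape(c))       # escape &, <, > and quotes
    return t.getvalue()


# Left-hand (removed) and right-hand (added) versions of a line.
print(convert('foo ' + DIFFON + 'bar' + DIFFOFF, tag='del'))
print(convert('foo ' + DIFFON + 'baz' + DIFFOFF, tag='ins'))
```

Each highlighted run then costs the length of the del or ins tag pair
rather than a span with a class attribute.
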

2015-12-16 Thread Jérémy Bobbio
Close: tag -1 + pending

Mike Hommey:
> Looking at the HTML in the HTML output, one can see that it is needlessly 
> large.
> 
> Specifically, there appears to be a lot of e.g. 
> following each other, without even a separation between them. This inflates
> the amount of memory necessary for browsers to render those pages.

I've committed a fix for this specific issue. The HTML presenter borrowed
a lot of code from diff2html which was probably not much optimized in
the first place. I guess the output could be vastly improved, but I'd
rather focus on other parts of the code for now. Patches highly welcome
in the meantime.
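
As an illustration of the kind of cleanup involved, and not necessarily
the committed change itself, the linediff() context line in the
follow-up patch in this thread drops every DIFFOFF that is immediately
followed by a DIFFON, so adjacent highlighted runs become one. A minimal
sketch, assuming '\x01' and '\x02' as the marker characters and using a
hypothetical helper name:

```python
# Assumed marker characters; only the names appear in the thread.
DIFFON = '\x01'
DIFFOFF = '\x02'


def merge_adjacent(marked):
    """Collapse back-to-back highlighted runs into a single run."""
    return marked.replace(DIFFOFF + DIFFON, '')


# Three one-character runs in a row, as a per-character diff might emit.
before = ''.join(DIFFON + c + DIFFOFF for c in 'abc')
after = merge_adjacent(before)
print(before.count(DIFFON), 'runs before,', after.count(DIFFON), 'after')
```

Fewer runs means fewer elements in the generated HTML, which is what
matters for both file size and browser memory.
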

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   

