Re: More LiveCD space optimizations

2010-10-07 Thread John McCabe-Dansted
On Thu, Oct 7, 2010 at 10:05 AM, Louis Simard louis.sim...@gmail.com wrote:
 Hey :)

 Thanks for the interest in this optimisation! Unfortunately I wasn't
 pushy enough in my thread from May-June and it wasn't included in the
 Maverick LiveCD. A pending question is what to do to include the
 recompressed files into the archive's packages [1].

I think this will be discussed at UDS-N, see:
http://archives.free.net.ph/message/20101004.065026.e553efd1.en.html

 2010-10-06 16:08 GMT John McCabe-Dansted gma...@gmail.com:
 In May, Louis Simard proposed re-encoding PNG files and SVG files to
 reduce their size [Quoted 1]. I note that we can save further space by:

 1) Using advdef on the png files in addition to optipng. This is what
 optimizegraphics does, and this shrinks the pngs on the Maverick RC
 liveCD from about 100.1MB to 85.3MB providing a saving of 14.8MB.

 So it does; I didn't know about that. Reading the man file for advpng,
 it gave a warning that it was only supported for AdvanceMAME-generated
 PNG files, so I was skeptical, but it does shave off about 4% more
 filesize on average with 'advpng -z4'.

We could test each file to ensure the image is identical, perhaps
using pngtopnm, and md5sum. This would be especially important for
jpegrescan/jpgcrush, which is at version 0.0.0-1.
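
A rough sketch of that kind of check in Python, in case it is useful.
Instead of pngtopnm and md5sum it hashes the IHDR header plus the
decompressed IDAT stream, which is what a deflate-only recompressor
should leave untouched. The helper names (png_chunks, image_digest,
make_png) are made up for illustration, and the tiny generated PNG
stands in for real files. Note that this is stricter than a pixel
comparison: a tool that also changes PNG filter types (as optipng may)
would fail it even when the pixels are identical, so decoding to pixels
would still be needed there.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunks(data):
    """Yield (type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC

def image_digest(data):
    """MD5 of the IHDR header plus the *decompressed* IDAT stream."""
    ihdr = b""
    idat = []
    for ctype, payload in png_chunks(data):
        if ctype == b"IHDR":
            ihdr = payload
        elif ctype == b"IDAT":
            idat.append(payload)
    raw = zlib.decompress(b"".join(idat))
    return hashlib.md5(ihdr + raw).hexdigest()

def make_png(width, height, rows, level):
    """Build a tiny 8-bit greyscale PNG at a given zlib level (demo only)."""
    def chunk(ctype, payload):
        crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
        return (struct.pack(">I", len(payload)) + ctype + payload
                + struct.pack(">I", crc))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)  # filter type 0
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw, level)) + chunk(b"IEND", b""))

# Same pixels, two different deflate levels: the digests should agree.
rows = [[(x * y) % 256 for x in range(64)] for y in range(64)]
before = make_png(64, 64, rows, 1)
after = make_png(64, 64, rows, 9)
same_pixels = image_digest(before) == image_digest(after)
print(len(before), len(after), same_pixels)
```

In a real script, before and after would be the bytes of the file read
ahead of and following the advpng/advdef run, and a mismatch would mean
keeping the original file.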

 2) Recompressing gz files with advdef. Using advdef, we can shrink the
 gz files from 89.5MB to 84.8MB, providing a saving of 4.7MB.

 That's an interesting optimisation; I didn't really know about it
 either. However, I did use 7zip's Deflate compressor to recompress a
 .zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. The method was
 rather crude, but it did the job:

 mkdir extracted
 cd extracted
 unzip ../file.zip
 # write the new archive outside the working directory, paths stay relative
 7z a -tzip -mx=9 -mfb=258 ../file.repack.zip *
 cd ..
 rm -r extracted

You mean images_human.zip? I have a hunch that compressing that file
wouldn't actually save space on the liveCD as I can gzip it down to
3.9MB. It may be better to leave it as an uncompressed zip, and let
squashfs deal with it. Recompressing the pngs contained in the zip
sounds worthwhile though. Strangely, even running advzip -z -0
images_human.zip shrinks it by 3%, and even shrinks the corresponding
images_human.zip.gz file

Also, there are 12MB of jar files, which are basically zip files. We
can also shrink those by 5MB or so with advzip, but that doesn't seem
to shrink a .tgz of them so it may not shrink the liveCD. Since zip
files compress file by file, we may be able to save space on the
liveCD by running advzip -z -0 on them. That would expand them to
24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
space on the liveCD if squashfs is similarly efficient.
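
To illustrate the "store inside, compress outside" idea, here is a small
Python sketch under stated assumptions: the member files are synthetic
(a unique part followed by a block shared by every member), zipfile's
ZIP_STORED stands in for advzip -z -0, and lzma.compress stands in for
an lzma-compressed squashfs, which it only approximates. The names
build_zip, words and members are invented for the example.

```python
import io
import lzma
import random
import zipfile

rng = random.Random(0)
vocab = [("token%02d " % i).encode() for i in range(32)]

def words(n):
    """Return n pseudo-random tokens from a small shared vocabulary."""
    return b"".join(rng.choice(vocab) for _ in range(n))

def build_zip(members, method):
    """Return the bytes of a zip archive holding members with one method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()

# Synthetic members: a short unique part plus content common to all files,
# which per-file deflate has to encode again in every single member.
shared = words(300)
members = {"member%03d.txt" % i: words(rng.randint(20, 80)) + shared
           for i in range(100)}

deflated = build_zip(members, zipfile.ZIP_DEFLATED)
stored = build_zip(members, zipfile.ZIP_STORED)

# Outer compression sees the whole archive at once.
outer_deflated = lzma.compress(deflated, preset=9)
outer_stored = lzma.compress(stored, preset=9)

print("zip deflated:", len(deflated), "-> outer lzma:", len(outer_deflated))
print("zip stored:  ", len(stored), "-> outer lzma:", len(outer_stored))
```

The stored archive is larger on its own, but the outer compressor can
exploit redundancy across members that per-file deflate hides. Whether
real jar files behave the same way under squashfs block sizes would
still need measuring.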

 3) Recompressing jpeg files with jpegrescan. This only saves 0.5MB,
 but implementing this would add just a couple more lines of code, and
 jpegrescan does not lose any picture quality [Quoted 2].

 jpegoptim indeed performs lossless optimisation of JPEG files by
 editing Huffman tables, and it's used as the basis of jpegrescan.
 However, jpegoptim doesn't make non-progressive files progressive, as
 I understand jpegrescan does. This may make jpegoptim's optimisations
 more transparent to applications that, for some reason, can't decode
 progressive JPEGs and thus have non-progressive JPEGs in their
 packages. However, most applications should be using libjpeg anyway,
 so perhaps this point is moot.


 Together these should shrink the liveCD by over 20MB. This is without
 even considering the .xml and .svg optimizations Louis proposed.

 A further 10MB could be saved by recompressing the gz files as lzma.

 At what LZMA compression level? Default (7) or --best (9)?

--best
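
For reference, a minimal Python sketch of the gz-to-lzma recompression
with a round-trip check. It assumes preset 9 corresponds to --best and
uses the legacy .lzma container (lzma.FORMAT_ALONE); gz_to_lzma and the
sample text are made up for illustration. The sample deliberately
repeats a block that is larger than deflate's 32 KiB window, which is
one case where lzma's larger dictionary helps; real savings depend on
the files.

```python
import gzip
import lzma
import random

def gz_to_lzma(gz_bytes):
    """Re-pack a gzip stream as a legacy .lzma stream and verify the payload."""
    payload = gzip.decompress(gz_bytes)
    packed = lzma.compress(payload, format=lzma.FORMAT_ALONE, preset=9)
    if lzma.decompress(packed) != payload:
        raise RuntimeError("payload changed during recompression")
    return packed

# Synthetic stand-in for a .gz file: a block of words, repeated three times.
rng = random.Random(0)
vocab = ["file", "descriptor", "option", "process", "socket", "list",
         "open", "network", "user", "command", "output", "field",
         "name", "device", "kernel", "path"]
block = " ".join(rng.choice(vocab) for _ in range(8000)).encode()
payload = block * 3

gz_bytes = gzip.compress(payload, compresslevel=9)
packed = gz_to_lzma(gz_bytes)
print(len(payload), len(gz_bytes), len(packed))
```

This only covers the file contents; each application reading the files
would still have to be tested with the new format.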

Also, if we want to take replacing deflate with lzma to extremes, we
could replace the deflate compression in the png files with lzma. A
command that does this is advpng -z -0 $f && lzma --best $f. I found
that this could save 18.7MB. It may also degrade performance
slightly, but I doubt it would be too significant on modern CPUs.
Running unlzma on all 66MB of the .png.lzma files takes:
real    1m2.666s
user    0m6.540s
sys     0m5.610s

I think the user/sys are the relevant ones, and taking 12s to read
every png doesn't seem too bad. The main thing is that I doubt that it
would work out of the box.

If we use lzma in the squashfs, just storing them all uncompressed with
advpng -z -0 could reduce the liveCD size. Probably wouldn't help the
installed size though.

 This seems reasonable as lzma has reasonable decompression times (e.g.
 7ms to decompress a largish manpage like lsof).

 7 ms? What's your CPU? :)

Core2Duo E7200  @ 2.53GHz

 Since the liveCD is
 compressed anyway, it seems that if a file is compressed with gzip, it
 is worth compressing with lzma.  The man command already seems to
 have lzma support, but we'd want to test each application to ensure
 that it functions correctly when its .gz files are replaced with lzma
 files. We could also selectively recompress the gz 

Re: More LiveCD space optimizations

2010-10-07 Thread Martin Owens
On Fri, 2010-10-08 at 00:07 +0800, John McCabe-Dansted wrote:
 Strangely, even running advzip -z -0
 images_human.zip shrinks it by 3%, and even shrinks the corresponding
 images_human.zip.gz file 

That's not strange, that's just entropic packing principles. You've got
a bunch of assumptions that can be made about data and a bunch of
compression iterations; each makes assumptions about the nature of the
data, and some fit together better than others.

I'm keen on this work since saving space allows for all sorts of
goodies. Did we save space with any of the SVG cleaning or did that need
to be brought up to the packaging level?

Martin,


-- 
Ubuntu-devel-discuss mailing list
Ubuntu-devel-discuss@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel-discuss


Re: More LiveCD space optimizations

2010-10-07 Thread Matthias Klose
On 07.10.2010 18:07, John McCabe-Dansted wrote:

 That's an interesting optimisation; I didn't really know about it
 either. However, I did use 7zip's Deflate compressor to recompress a
 .zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. The method was
 rather crude, but it did the job:

 mkdir extracted
 cd extracted
 unzip ../file.zip
 # write the new archive outside the working directory, paths stay relative
 7z a -tzip -mx=9 -mfb=258 ../file.repack.zip *
 cd ..
 rm -r extracted

 You mean images_human.zip? I have a hunch that compressing that file
 wouldn't actually save space on the liveCD as I can gzip it down to
 3.9MB. It may be better to leave it as an uncompressed zip, and let
 squashfs deal with it. Recompressing the pngs contained in the zip
 sounds worthwhile though. Strangely, even running advzip -z -0
 images_human.zip shrinks it by 3%, and even shrinks the corresponding
 images_human.zip.gz file

 Also, there are 12MB of jar files, which are basically zip files. We
 can also shrink those by 5MB or so with advzip, but that doesn't seem
 to shrink a .tgz of them so it may not shrink the liveCD. Since zip
 files compress file by file, we may be able to save space on the
 liveCD by running advzip -z -0 on them. That would expand them to
 24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
 space on the liveCD if squashfs is similarly efficient.

how does OOo behave with the repacked zip file? is it faster, slower, does it
need more memory when it runs?  imo, changes like this should be integrated
into the package build process, and sent upstream. patches welcome.

same for jar files. are these extracted as fast as without your changes by the 
jvm? if not, then these should be left alone (and afaik there shouldn't be any 
jar files on the live CD).

   Matthias



Re: More LiveCD space optimizations

2010-10-07 Thread Louis Simard
2010-10-07 16:29 GMT Martin Owens docto...@gmail.com:
 On Fri, 2010-10-08 at 00:07 +0800, John McCabe-Dansted wrote:
 Strangely, even running advzip -z -0
 images_human.zip shrinks it by 3%, and even shrinks the corresponding
 images_human.zip.gz file

 That's not strange, that's just entropic packing principles. You've got
 a bunch of assumptions that can be made about data and a bunch of
 compression iterations; each makes assumptions about the nature of the
 data, and some fit together better than others.

 I'm keen on this work since saving space allows for all sorts of
 goodies. Did we save space with any of the SVG cleaning or did that need
 to be brought up to the packaging level?

 Martin,



Back in May, the preliminary testing I did on the LiveCD's .svg files
resulted in the finding that using Scour on them saved about 7 MB [1].
Of course, not only the LiveCD's packages use .svg files, and it would
be important to get that to other packages as well, for download
times/bandwidth use, if for no other reason. Perhaps rendering speed
would increase too, in SVG's case, but the other file formats
discussed in this thread have different characteristics.

So it needed to be brought up at the packaging level [1]. Scour will
probably itself need to be packaged too, to be included as
build-depends for packages that have SVG files (which is a lot of
application packages, since most have an SVG icon) to work well with
'apt-get source'.

[1] https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2010-May/011505.html



Re: More LiveCD space optimizations

2010-10-07 Thread Louis Simard
* LONG MESSAGE WARNING *
While I've tried to reduce the quotes and quote nesting as much as I
could, this message is still long. It is still important to read, when
you have time.

2010-10-07 16:07 GMT John McCabe-Dansted gma...@gmail.com:
 On Thu, Oct 7, 2010 at 10:05 AM, Louis Simard louis.sim...@gmail.com wrote:
 [snipped]

 I think this will be discussed at UDS-N, see:
 http://archives.free.net.ph/message/20101004.065026.e553efd1.en.html

Awesome! Will a digest of this conversation need to be posted to
ubuntu-devel only once done, continuing on ubuntu-devel-discuss for
now?

 2010-10-06 16:08 GMT John McCabe-Dansted gma...@gmail.com:
 [...] I note that we can save further space by:

 1) Using advdef on the png files in addition to optipng. This is what
 optimizegraphics does, and this shrinks the pngs on the Maverick RC
 liveCD from about 100.1MB to 85.3MB providing a saving of 14.8MB.

 We could test each file [after using advpng on them]
 to ensure the image is identical, perhaps
 using pngtopnm, and md5sum. This would be especially important for
 jpegrescan/jpgcrush, which is at version 0.0.0-1.

Good idea. I may be able to integrate this test into my script as an option.

 2) Recompressing gz files with advdef. Using advdef, we can shrink the
 gz files from 89.5MB to 84.8MB, [...] a saving of 4.7MB.

 [...] I did use 7zip's Deflate compressor to recompress a
 .zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. [...]

 You mean images_human.zip?

Yes, thanks. :) I had forgotten the name.

 I have a hunch that compressing that file
 wouldn't actually save space on the liveCD as I can gzip it down to
 3.9MB. It may be better to leave it as an uncompressed zip, and let
 squashfs deal with it.

Per that "Performance - Disk footprint" thread from ubuntu-devel
[brainstorm], we may actually want to also care about the installed
size, and use the 7zip recompression. While it's not going to be
*perfectly optimal*, reducing both the CD footprint and the installed
size by 0.5 MB using 7zip sounds better than reducing the CD footprint
by 2 MB, but increasing the installed size by more than 2 MB. And if
you managed to re-gzip the zip, squashfs will also manage to re-lzma
the zip for more savings and still a decent installed size. You should
test this again with lzma, I think.

 Recompressing the pngs contained in the zip
 sounds worthwhile though. Strangely, even running advzip -z -0
 images_human.zip shrinks it by 3%, and even shrinks the corresponding
 images_human.zip.gz file

I believe you there, only because the original situation has a
deflated container (png) within another deflated container (zip).
Counter-intuitive, but something to consider.

 Also, there are 12MB of jar files, which are basically zip files. We
 can also shrink those by 5MB or so with advzip, but that doesn't seem
 to shrink a .tgz of them so it may not shrink the liveCD. Since zip
 files compress file by file, we may be able to save space on the
 liveCD by running advzip -z -0 on them. That would expand them to
 24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
 space on the liveCD if squashfs is similarly efficient.

Later post by Matthias Klose:
 same for jar files. are these extracted as fast as without your changes by the
 jvm? if not, then these should be left alone (and afaik there shouldn't be any
 jar files on the live CD).

Aha! I completely forgot .jar files. The OpenJDK package itself may
become much smaller after this, because of the huge runtime rt.jar.
Must test and benchmark this!

I believe OpenOffice.org is a huge user of Java, so there would be
.jar files on the LiveCD from that too.

 A further 10MB could be saved by recompressing the gz files as lzma.
 At what LZMA compression level? Default (7) or --best (9)?
 --best

I just want to add that blanket recompression of gzip files as lzma
with --best could be harmful, but with small files it's probably OK.
LZMA uses a huge dictionary to do its work, which needs to be
allocated even on the decompressing side, and --best may overrun the
memory of low-end computers on larger files.
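
One way to limit that risk, sketched in Python under stated assumptions:
keep the preset 9 match settings but cap the dictionary size, since the
decoder has to allocate the dictionary the encoder declared. The
function name compress_capped and the 1 MiB cap are arbitrary choices
for illustration, and the sketch writes the .xz container through
Python's lzma module rather than calling the lzma command-line tool.

```python
import lzma

MIN_DICT_BITS = 12  # 4 KiB, the smallest dictionary liblzma accepts

def compress_capped(payload, max_dict=1 << 20):
    """Compress with preset 9 settings but a dictionary no larger than needed.

    The dictionary is the next power of two covering the payload, clamped
    to the range [4 KiB, max_dict]; max_dict is assumed to be a power of two.
    """
    needed_bits = max(MIN_DICT_BITS, (len(payload) - 1).bit_length())
    dict_size = min(max_dict, 1 << needed_bits)
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 9, "dict_size": dict_size}]
    packed = lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters)
    return packed, dict_size

# A small file only asks the decoder for a small dictionary.
sample = b"man page text " * 50
packed, used = compress_capped(sample)
print(len(sample), len(packed), used)
```

A larger cap trades decoder memory for compression ratio on big files,
so the right value would depend on the lowest-end machine being
targeted.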

 Also, if we want to take replacing deflate with lzma to extremes, we
 could replace the deflate compression in the png files with lzma. A
 command that does this is advpng -z -0 $f && lzma --best $f. I found
 that this could save 18.7MB. It may also degrade performance
 slightly, but I doubt it would be too significant on modern CPUs.
 Running unlzma on all 66MB of the .png.lzma files takes:
 real    1m2.666s
 user    0m6.540s
 sys     0m5.610s

 I think the user/sys are the relevant ones, and taking 12s to read
 every png doesn't seem too bad. The main thing is that I doubt that it
 would work out of the box.

 If we use lzma in the squashfs, just storing them all uncompressed with
 advpng -z -0 could reduce the liveCD size. Probably wouldn't help the
 installed size though.

Indeed.

 There are over a dozen different types of file to be tested (and
 there may be more than 

Re: More LiveCD space optimizations

2010-10-07 Thread Till Kamppeter
On 10/08/2010 12:22 AM, Louis Simard wrote:
 There are over a dozen different types of file to be tested (and
 there may be more than one application that wants to read them). For
 reference, I have attached them. Probably the most important thing to
 check is that printing still works, as many of the gz files seem to
 be e.g. ppd files.

 Maybe if you added it to your script and just gave the resulting iso a
 spin in a VM to see if there was obvious breakage?

 I have no printer supported by OpenPrinting PPDs to test this with,
 but a VM is exactly what I used to test SVG, XML and PNG optimisations
 in May (and realise that librsvg had a bug that needed to be worked
 around in Scour! [librsvgbug]). I'll do this, but PPDs would still need
 testing afterwards.

 A separate thread and perhaps contact people already exist for the PPD
 gzip compression ([openprinting-ppds-gzip]), and perhaps it would be
 best to communicate with these people to have them test and add
 AdvanceCOMP to their gzipping.


I have already solved the space usage of the PPDs in Maverick. Most
of the PPDs in /usr/share/ppd and also the Foomatic XML data in
/usr/share/foomatic are replaced by highly compressed PPD file archives
based on LZMA (in /usr/lib/cups/driver). This saves around 30-40 MB on
the live system, even while getting the PPDs of the former
openprinting-ppds-extra package onto the live CD. The tools for building
these archives are from a Google Summer of Code project which I
mentored for OpenPrinting. See

https://bugs.launchpad.net/bugs/493282

and

http://pypi.python.org/pypi/pyppd

The packages on the CD which have their PPDs compressed now are: 
foomatic-db, openprinting-ppds, hplip-data, splix

Till
