Re: More LiveCD space optimizations

2010-11-07 Thread Martin Pitt
Hello Martin,
Martin Owens [2010-11-07  1:36 -0400]:
> That already is written in C, it's the script that pulls in the config
> and runs the usb_modeswitch program which is written in tcl.
> 
> It should be very possible to convert it to python or vala.

No python please, it has the same poor boot time behaviour. C or vala
or anything compiled should do.

> I have to wonder what 200 udev rules all with different vendor and
> product ids does to the boot time.

Not much, since either none or just one will match on your system.
usb-modeswitch needs those long lists, but udev is good at efficient
rule matching and parsing (that's what it is for, after all).

Martin
-- 
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

-- 
Ubuntu-devel-discuss mailing list
Ubuntu-devel-discuss@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel-discuss


Re: More LiveCD space optimizations

2010-11-06 Thread Martin Owens
On Fri, 2010-11-05 at 16:20 -0400, Martin Pitt wrote:
> One thing that currently needs it is usb-modeswitch. I'd love the
> usb-modeswitch-dispatcher thing to be rewritten in C, Vala, or another
> compiled language. Not only is it holding tcl in the default install,
> but it also dramatically slows down boot. 

usb_modeswitch itself is already written in C; it's the script that pulls
in the config and runs the usb_modeswitch program that is written in Tcl.

It should be very possible to convert it to python or vala.

I have to wonder what 200 udev rules all with different vendor and
product ids does to the boot time.

Martin,




Re: More LiveCD space optimizations

2010-11-05 Thread Micah Gersten
On 10/08/2010 04:54 AM, Matthias Klose wrote:
> [ compression related discussion removed ]
>
> So maybe we can save some MB with better compression, but we can save more by 
> not including files at all.  Of course this requires inspection of the 
> packages 
> included on the liveCD.  In the past we did identify some issues and did add 
> some diagnostics to the live CD build logs [1].  Of course you can't run 
> anything and lengthen the live CD build, but some additional diagnostics 
> maybe 
> could be run.
>
> In the past we did see wasted space:
> 
> - firefox and xulrunner shipping duplicate .js files
>
Well, Firefox is no longer built on top of xulrunner, so this is
necessary, especially with the PGO optimizations if we can get them.  If
webkit has sufficient accessibility and we can port yelp to webkit, we
can drop xulrunner from the CD.



Re: More LiveCD space optimizations

2010-11-05 Thread Martin Pitt
John McCabe-Dansted [2010-10-08  0:07 +0800]:
> We could test each file to ensure the image is identical, perhaps
> using pngtopnm, and md5sum. This would be especially important for
> jpegrescan/jpgcrush, which is at version 0.0.0-1.

I use a simple test script for this kind of check, see 
https://bugs.launchpad.net/ubuntu/+source/advancecomp/+bug/671599/comments/1

Martin
-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)



Re: More LiveCD space optimizations

2010-11-05 Thread Martin Pitt
Matthias,

Matthias Klose [2010-10-08 11:54 +0200]:
>   - Packages which should not be on the CD.  Some things should not be
> on the CD at all.  Looking at the current live CD log, a typical
> candidate for this would be tcl8.4. Why is it there, and how can
> it be avoided?

One thing that currently needs it is usb-modeswitch. I'd love the
usb-modeswitch-dispatcher thing to be rewritten in C, Vala, or another
compiled language. Not only is it holding tcl in the default install,
but it also dramatically slows down boot.

>   - Localized help images. You cannot just remove the images from an
> application's help, but in the past we did ship all these localized
> help images on the CD. CC'ing Martin, don't know the current status.
> However it looks like there are some xml files which maybe should
> be part of the language packs.

Since Lucid (or so) we strip those out of the app packages and ship
them in the language packs. That already saved us a lot of space.

Martin
-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)



Re: More LiveCD space optimizations

2010-10-11 Thread John McCabe-Dansted
On Fri, Oct 8, 2010 at 2:01 AM, Matthias Klose  wrote:
>> Also, there are 12MB of jar files, which are basically zip files. We
>> can also shrink those by 5MB or so with advzip, but that doesn't seem
>> to shrink a .tgz of them so it may not shrink the liveCD. Since zip
>> files compress file by file, we may be able to save space on the
>> liveCD by running "advzip -z -0" on them. That would expand them to
>> 24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
>> space on the liveCD if squashfs is similarly efficient.
>
> how does OOo behave with the repacked zip file? is it faster, slower, does it

No, I made a script to open oowriter 100 times. It didn't find any
consistent difference in performance.

> need more memory when it runs?  imo, changes like this should be integrated 
> into

gzip needs less than 1MB to decode (or even encode). The effect on
memory usage is likely to be minimal.
http://tukaani.org/lzma/benchmarks.html

> the package build process, and sent upstream. patches welcome.
>
> same for jar files. are these extracted as fast as without your changes by the
> jvm? if not, then these should be left alone (and afaik there shouldn't be any
> jar files on the live CD).

FYI, most of the jar files come from firefox and openoffice. Firefox
refuses to start without these jar files. I doubt they are used by a
jvm.

Using the 7z deflate instead of gzip shouldn't harm decompression
time. In fact, it should improve speed slightly because there is less
compressed input to parse (if I tar up /etc and compress it with gzip
and advdef, the one compressed with advdef does in fact seem to gunzip
very slightly faster).

The other question is whether compressing decompressed jars (or
vice versa) affects performance. An Atom-based netbook with a rotational
disk seemed like a good machine to test on as it is towards the low
end of performance. Repeatedly running oowriter and firefox ten times
did not lead to any consistent performance differences (see attached
timeopen.ar). If there is any difference it would be within a few
percent. This suggests that we can decompress them (to save liveCD
space), or compress them (to save installed space) without having much
effect on performance.
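
A timing loop of the kind described above can be sketched in Python. This is only an illustration, not the contents of timeopen.ar: the function names are made up here, and it assumes the command under test exits on its own (oowriter and firefox would need some way to be closed after start-up, which is not shown).

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Reduce a list of durations to the figures worth comparing between setups."""
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
        "stdev": statistics.stdev(durations),  # needs at least two runs
    }


if __name__ == "__main__":
    # Stand-in command: a Python interpreter that exits immediately.
    print(summarise(time_command([sys.executable, "-c", "pass"], runs=3)))
```

Comparing the mean against the standard deviation for each variant is one way to judge whether a difference of "a few percent" is distinguishable from noise.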

It is also plausible that decompressing the jars saves installed space
on 'Btrfs -o compress' filesystems, but I have not tested whether
Btrfs compression heuristics automatically detect the jars as being
compressible.

-- 
John C. McCabe-Dansted


timeopen.ar
Description: Binary data


Re: More LiveCD space optimizations

2010-10-10 Thread Louis Simard
Sorry for the 4th post in a row, but I added a script that uses
AdvanceCOMP to recompress the .gz files that aren't man pages, and I
had to share my findings.

AdvanceCOMP? | ISO size (B) | Install (KiB)
-------------+--------------+--------------
          No |  711,032,832 |     2,474,660
         Yes |  707,821,568 |     2,469,568
-------------+--------------+--------------
     Savings |    3,211,264 |         5,092

The script is attached.

Due to ext4 extent allocation and the order of the files on the CD,
the reordered CD made by 98make-disc boots faster, but its installed
size is 180 MB bigger, so this new mksquashfs ordering (in
98make-disc) is a tradeoff. This new ordering is not used in the
actual CD building process, though I filed a bug for it [1].

Should I revert to the default ordering done by mksquashfs or start
using ext3 installations to compensate, for testing?

- Louis

[1] https://bugs.launchpad.net/ubuntu/+source/livecd-rootfs/+bug/589629


92gz-optimisation-experimental
Description: Binary data


Re: More LiveCD space optimizations

2010-10-08 Thread Louis Simard
Apologies for the previous attachment, it didn't have the addition for
man-page symbolic links.

I attach the proper one this time.

- Louis


ubuntu-opt.tar.gz
Description: GNU Zip compressed data


Re: More LiveCD space optimizations

2010-10-08 Thread Louis Simard
2010-10-07 16:07 GMT John McCabe-Dansted :
> On Thu, Oct 7, 2010 at 10:05 AM, Louis Simard  wrote:
>> Do you want me to add to my script any of the optimisations discussed
>> in your email? They are: Using AdvanceCOMP to recompress .png images
>> and gzipped files; using either of jpegoptim or jpegrescan to
>> losslessly recompress .jpg images; "transcoding" man pages from .gz to
>> .lzma. I'm not going to add untested optimisations yet, such as
>> transcoding *all* .gz files to .lzma.
>
> Sure. This could help with testing that these actually work ;).

It works! :) 'man' reads its files correctly, after an addition to the
script to fix the broken symlinks (for example, zfgrep.1.gz pointing
to zgrep.1.gz in /usr/share/man/man1 which became zgrep.1.lzma),
OpenOffice.org opens the Human icons correctly, all of the PNG images
compare equal using pngtopnm (although some emit a warning about the
pixel aspect ratio, i.e. non-square pixels) and Java is able to read
.jar files that have been recompressed.
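
For illustration, the symlink fix-up can be sketched in Python as follows. This is not the code from the attached script. The function name is hypothetical, and renaming the link itself to the new extension is an assumption made here so that the link name matches the compression of its target.

```python
import os


def retarget_symlinks(directory, old_ext=".gz", new_ext=".lzma"):
    """Repoint dangling symlinks whose target was renamed from old_ext to new_ext.

    Returns a list of (old link name, new link name) pairs.
    """
    fixed = []
    for name in sorted(os.listdir(directory)):
        link = os.path.join(directory, name)
        # Only dangling symlinks are of interest: exists() follows the link.
        if not os.path.islink(link) or os.path.exists(link):
            continue
        target = os.readlink(link)
        if not target.endswith(old_ext):
            continue
        new_target = target[:-len(old_ext)] + new_ext
        # Relative targets are resolved against the link's own directory.
        if not os.path.exists(os.path.join(directory, new_target)):
            continue
        # Assumption: the link is renamed too (zfgrep.1.gz -> zfgrep.1.lzma).
        new_name = name[:-len(old_ext)] + new_ext if name.endswith(old_ext) else name
        os.remove(link)
        os.symlink(new_target, os.path.join(directory, new_name))
        fixed.append((name, new_name))
    return fixed
```

Run against a directory such as /usr/share/man/man1 in the unpacked filesystem, this would turn a zfgrep.1.gz link pointing at the missing zgrep.1.gz into a zfgrep.1.lzma link pointing at zgrep.1.lzma.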

However, while recompressing the files helped the ISO size, it made
the install size grow by about 180 MB...

When | ISO size (B) | Install (KiB)
-----+--------------+--------------
 Old |  718,864,384 |     2,293,740
 New |  711,032,832 |     2,474,660

Ow! What's going on?
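
To put the two columns in the same unit, the differences can be worked out with a few lines of Python. The figures are the ones from the table; the variable names are only illustrative.

```python
# Figures from the table: ISO size in bytes, installed size in KiB.
old = {"iso_bytes": 718_864_384, "install_kib": 2_293_740}
new = {"iso_bytes": 711_032_832, "install_kib": 2_474_660}


def delta(before, after):
    """Return after - before for every key (negative means a saving)."""
    return {key: after[key] - before[key] for key in before}


change = delta(old, new)
iso_saved_mib = -change["iso_bytes"] / 2**20        # bytes -> MiB
install_growth_mib = change["install_kib"] / 2**10  # KiB -> MiB

print(f"ISO:     {change['iso_bytes']:+,} B   ({iso_saved_mib:.1f} MiB saved)")
print(f"Install: {change['install_kib']:+,} KiB ({install_growth_mib:.1f} MiB larger)")
```

The ISO shrinks by 7,831,552 bytes (roughly 7.5 MiB) while the installed system grows by 180,920 KiB (roughly 177 MiB).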

Here's the methodology I used for that result.
* The base CD is Ubuntu 10.04.1.
* Using the virtual hardware provided by VirtualBox OSE 3.1.6 r59338,
384 MB RAM, 5 GB SCSI hard drive and IDE CD-ROM drive using the ISO.
* Look at the size of the CD ISO using ls -l.
* Using 'df', look at the number of disk blocks used on the VM's hard
drive before rebooting from an installation via the GUI:
a) Language: English
b) Time zone: Europe/United Kingdom Time
c) Keyboard layout: USA
d) Disk space preparation: Manual partitioning, no swap, fill the
entire drive with an ext4 partition for /
e) Username, password and computer name: "dummy"

Attached are the scripts I used to mount the original CD, recompress
the files and make the new CD.

No CDs were harmed in the making of this email.

- Louis


ubuntu-opt.tar.gz
Description: GNU Zip compressed data


Re: More LiveCD space optimizations

2010-10-08 Thread Louis Simard
2010-10-08 09:54 GMT Matthias Klose :
> In the past we did see wasted space:
>
>  - Packages which should not be on the CD.  Some things should not be
>   on the CD at all.  Looking at the current live CD log, a typical
>   candidate for this would be tcl8.4. Why is it there, and how can
>   it be avoided?

foo2zjs made APT install that package.

$ aptitude why tcl8.4
i   tk8.4 Depends tcl8.4 (>= 8.4.16)
$ aptitude why tk8.4
i   foo2zjs Recommends tk8.4
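
The same chain can be traced mechanically over a dependency graph. The sketch below is not how aptitude works internally. It is a breadth-first search over the two relations shown above, and treating foo2zjs as the starting package is an assumption made for the example.

```python
from collections import deque

# (package, relation, dependency) edges taken from the aptitude output above.
EDGES = [
    ("tk8.4", "Depends", "tcl8.4"),
    ("foo2zjs", "Recommends", "tk8.4"),
]


def why(target, roots, edges):
    """Return the chain of (package, relation, dependency) steps leading from
    one of the root packages to target, or None if no chain exists."""
    forward = {}
    for pkg, relation, dep in edges:
        forward.setdefault(pkg, []).append((relation, dep))
    queue = deque((root, []) for root in roots)
    seen = set(roots)
    while queue:
        pkg, chain = queue.popleft()
        if pkg == target:
            return chain
        for relation, dep in forward.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, chain + [(pkg, relation, dep)]))
    return None


if __name__ == "__main__":
    for pkg, relation, dep in why("tcl8.4", ["foo2zjs"], EDGES) or []:
        print(f"{pkg} {relation} {dep}")
```

With the full package lists of the live CD as input, the same search could list every package that is only present because of a Recommends.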

I'm sure there will be other examples, though I'm not as familiar with
the LiveCD's packages as you guys at Canonical.



Re: More LiveCD space optimizations

2010-10-08 Thread Matthias Klose
[ compression related discussion removed ]

So maybe we can save some MB with better compression, but we can save more by 
not including files at all.  Of course this requires inspection of the packages 
included on the liveCD.  In the past we did identify some issues and did add 
some diagnostics to the live CD build logs [1].  Of course you can't run 
anything and lengthen the live CD build, but some additional diagnostics maybe 
could be run.

In the past we did see wasted space:

  - Packages which should not be on the CD.  Some things should not be
on the CD at all.  Looking at the current live CD log, a typical
candidate for this would be tcl8.4. Why is it there, and how can
it be avoided?

  - Large doc directories.  If a package becomes too large, maybe it is
worth to split a package into foo and foo-doc, and not ship foo-doc
on the CD (yes there are other ideas not to ship doc dirs at all).
See python-couchdb for an example. The API documentation does not
need to be on the live CD.  The same may be true for other python
packages.

  - Localized help images. You cannot just remove the images from an
application's help, but in the past we did ship all these localized
help images on the CD. CC'ing Martin, don't know the current status.
However it looks like there are some xml files which maybe should
be part of the language packs.

  - Duplicate files. While this is not that important on the live CD,
it's important for the alternate CDs.  Looking at the list of
duplicate files, I see a lot of potential in:

- all the mono packages and libraries

- broken build systems shipping doc files in every binary package.
  see the upstream changelog.gz files (e.g. gnome and OOo).

- firefox and xulrunner shipping duplicate .js files

- package specific stuff (libc6-dev having some identical libs
  in /usr/lib/xen).

- you may see and find more if you are familiar with a particular package.
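
A duplicate-file check of the kind listed above can be sketched in Python. This is an illustration only, not the diagnostic used in the live CD build logs; the function names are made up, and hashing whole files in memory is a simplification that is fine for small files.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Group regular files under root by SHA-256 digest and return the
    groups that contain more than one path."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # symlinks and special files take no extra space
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


def wasted_bytes(groups):
    """Bytes that would be saved if each group kept a single copy."""
    return sum(os.path.getsize(group[0]) * (len(group) - 1) for group in groups)
```

Pointed at an unpacked root filesystem, the groups with the largest wasted_bytes would be the first candidates to look at, for example the changelog.gz files mentioned above.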

There is potential in saving space with better compression, but IMO you can 
even 
save more with closely looking what goes on the CD (where we currently don't do 
a good job). The good thing is that both approaches don't exclude each other.

   Matthias

[1] 
http://people.canonical.com/~ubuntu-archive/livefs-build-logs/maverick/ubuntu/latest/livecd-20101007-i386.out



Re: More LiveCD space optimizations

2010-10-07 Thread Till Kamppeter
On 10/08/2010 12:22 AM, Louis Simard wrote:
>> There are over a dozen different types of file to be tested (and
>> there may be more than one application that wants to read them). For
>> reference, I have attached them. Probably the most important thing to
>> check is that printing still works, as many of the gz files seem to
>> be e.g. ppd files.
>>
>> Maybe if you added it to your script and just gave the resulting iso a
>> spin in a VM to see if there was obvious breakage?
>
> I have no printer supported by OpenPrinting PPDs to test this with,
> but a VM is exactly what I used to test SVG, XML and PNG optimisations
> in May (and realise that librsvg had a bug that needed worked around
> in Scour! [librsvgbug]). I'll do this, but PPDs would still need
> testing afterwards.
>
> A separate thread and perhaps contact people already exist for the PPD
> gzip compression ([openprinting-ppds-gzip]), and perhaps it would be
> best to communicate with these people to have them test and add
> AdvanceCOMP to their gzipping.
>

The space occupation of the PPDs I have already solved in Maverick. Most 
of the PPDs in /usr/share/ppd and also the Foomatic XML data in 
/usr/share/foomatic are replaced by highly compressed PPD file archives 
based on LZMA (in /usr/lib/cups/driver). This saves around 30-40 MB on 
the live system even getting the PPDs of the former 
openprinting-ppds-extra package onto the live CD. The tools for building 
these archives are from a Google Summer of Code project which I have 
mentored for OpenPrinting. See

https://bugs.launchpad.net/bugs/493282

and

http://pypi.python.org/pypi/pyppd

The packages on the CD which have their PPDs compressed now are: 
foomatic-db, openprinting-ppds, hplip-data, splix

Till



Re: More LiveCD space optimizations

2010-10-07 Thread Louis Simard
* LONG MESSAGE WARNING *
While I've tried to reduce the quotes and quote nesting as much as I
could, this message is still long. It is still important to read, when
you have time.

2010-10-07 16:07 GMT John McCabe-Dansted :
> On Thu, Oct 7, 2010 at 10:05 AM, Louis Simard  wrote:
>> 
>
> I think this will be discussed at UDS-N, see:
> http://archives.free.net.ph/message/20101004.065026.e553efd1.en.html

Awesome! Will a digest of this conversation need to be posted to
ubuntu-devel only once done, continuing on ubuntu-devel-discuss for
now?

>> 2010-10-06 16:08 GMT John McCabe-Dansted :
>>> [...] I note that we can save further space by:
>>>
>>> 1) Using advdef on the png files in addition to optipng. This is what
>>> optimizegraphics does, and this shrinks the pngs on the Maverick RC
>>> liveCD from about 100.1MB to 85.3MB providing a saving of 14.8MB.
>
> We could test each file [after using advpng on them]
> to ensure the image is identical, perhaps
> using pngtopnm, and md5sum. This would be especially important for
> jpegrescan/jpgcrush, which is at version 0.0.0-1.

Good idea. I may be able to integrate this test into my script as an option.

>>> 2) Recompressing gz files with advdef. Using advdef, we can shrink the
>>> gz files from 89.5MB to 84.8MB, [...] a saving of 4.7MB.
>>
>> [...] I did use 7zip's Deflate compressor to recompress a
>> .zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. [...]
>
> You mean images_human.zip?

Yes, thanks. :) I had forgotten the name.

> I have a hunch that compressing that file
> wouldn't actually save space on the liveCD as I can gzip it down to
> 3.9MB. It may be better to leave it as an uncompressed zip, and let
> squashfs deal with it.

Per that "Performance - Disk footprint" thread from ubuntu-devel
[brainstorm], we may actually want to also care about the installed
size, and use the 7zip recompression. While it's not going to be
*perfectly optimal*, reducing both the CD footprint and the installed
size by 0.5 MB using 7zip sounds better than reducing the CD footprint
by 2 MB, but increasing the installed size by more than 2 MB. And if
you managed to re-gzip the zip, squashfs will also manage to re-lzma
the zip for more savings and still a decent installed size. You should
test this again with lzma, I think.

> Recompressing the pngs contained in the zip
> sounds worthwhile though. Strangely, even running advzip -z -0
> images_human.zip shrinks it by 3%, and even shrinks the corresponding
> images_human.zip.gz file

I believe you there, only because the original situation has a
deflated container (png) within another deflated container (zip).
Counter-intuitive, but something to consider.

> Also, there are 12MB of jar files, which are basically zip files. We
> can also shrink those by 5MB or so with advzip, but that doesn't seem
> to shrink a .tgz of them so it may not shrink the liveCD. Since zip
> files compress file by file, we may be able to save space on the
> liveCD by running "advzip -z -0" on them. That would expand them to
> 24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
> space on the liveCD if squashfs is similarly efficient.


> same for jar files. are these extracted as fast as without your changes by the
> jvm? if not, then these should be left alone (and afaik there shouldn't be any
> jar files on the live CD).

Aha! I completely forgot .jar files. The OpenJDK package itself may
become much smaller after this, because of the huge runtime rt.jar.
Must test and benchmark this!

I believe OpenOffice.org is a huge user of Java, so there would be
.jar files on the LiveCD from that too.

>>> A further 10MB could be saved by recompressing the gz files as lzma.
>> At what LZMA compression level? Default (7) or --best (9)?
> --best

I just want to add that blanket recompression of gzip files as lzma
with --best could be harmful, but with small files it's probably OK.
LZMA uses a huge dictionary to do its work, which needs to be
allocated even on the decompressing side, and --best may overrun the
memory of low-end computers on larger files.
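
The decoder-side cost can be demonstrated with Python's lzma module, which wraps liblzma. This is a sketch under assumptions: the dictionary sizes and the 1 MiB limit are arbitrary values picked for the demonstration, not the settings of any lzma preset, and the helper names are made up.

```python
import lzma


def compress_alone(data, dict_size):
    """Compress data into the legacy .lzma container with an explicit
    dictionary size (in bytes)."""
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)


def decodes_within(blob, memlimit):
    """Return True if blob can be decoded while the decoder stays below
    memlimit bytes of memory, False if liblzma refuses."""
    try:
        lzma.decompress(blob, format=lzma.FORMAT_ALONE, memlimit=memlimit)
        return True
    except lzma.LZMAError:
        return False


if __name__ == "__main__":
    page = b"a small man page\n" * 1000
    small = compress_alone(page, 1 << 16)   # 64 KiB dictionary
    large = compress_alone(page, 1 << 22)   # 4 MiB dictionary
    print(decodes_within(small, 1 << 20), decodes_within(large, 1 << 20))
```

The input is the same in both cases; only the dictionary size recorded at compression time differs, and that alone decides whether the stream can be decoded under the limit.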

> Also, if we want to take replacing deflate with lzma to extremes, we
> could replace the deflate compression in the png files with lzma. A
> command that does this is "advpng -z -0 $f && lzma --best $f". I found
> that this could save 18.7MB. However, it may also degrade performance
> slightly, but I doubt it would be too significant on modern CPUs.
> Running unlzma on all 66MB of the .png.lzma files takes:
> real    1m2.666s
> user    0m6.540s
> sys     0m5.610s
>
> I think the user/sys are the relevant ones, and taking 12s to read
> every png doesn't seem too bad. The main thing is that I doubt that it
> would work out of the box.
>
> If we use lzma in the squashfs, just deflating them all with advpng -z
> -0 could reduce the liveCD size. Probably wouldn't help the installed
> size though.

Indeed.

> There are over a dozen different types of file to be tested (and
> there may be more th

Re: More LiveCD space optimizations

2010-10-07 Thread Louis Simard
2010-10-07 16:29 GMT Martin Owens :
> On Fri, 2010-10-08 at 00:07 +0800, John McCabe-Dansted wrote:
>> Strangely, even running advzip -z -0
>> images_human.zip shrinks it by 3%, and even shrinks the corresponding
>> images_human.zip.gz file
>
> That's not strange, that's just entropic packing principles. You've got
> a bunch of assumptions that can be made about data and a bunch of
> compression iterations, each make assumptions about the nature of the
> data and some are fitting together better.
>
> I'm keen on this work since saving space allows for all sorts of
> goodies. Did we save space with any of the SVG cleaning or did that need
> to be brought up to the packaging level?
>
> Martin,
>
>

Back in May, the preliminary testing I did on the LiveCD's .svg files
resulted in the finding that using Scour on them saved about 7 MB [1].
Of course, not only the LiveCD's packages use .svg files, and it would
be important to get that to other packages as well, for download
times/bandwidth use, if for any other reason. Perhaps rendering speed
would increase too, in SVG's case, but the other file formats
discussed in this thread have different characteristics.

So it needed to be brought up at the packaging level [1]. Scour will
probably itself need to be packaged too, to be included as
build-depends for packages that have SVG files (which is a lot of
application packages, since most have an SVG icon) to work well with
'apt-get source'.

[1] https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2010-May/011505.html



Re: More LiveCD space optimizations

2010-10-07 Thread Matthias Klose
On 07.10.2010 18:07, John McCabe-Dansted wrote:

>> That's an interesting optimisation; I didn't really know about it
>> either. However, I did use 7zip's Deflate compressor to recompress a
>> .zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. The method was
>> rather crude, but it did the job:
>>
>> mkdir extracted
>> cd extracted
>> unzip ../file.zip
>> 7z a -tzip -mx=9 -mfb=258 ../file.repack.zip *
>> cd ..
>> rm -r extracted
>
> You mean images_human.zip? I have a hunch that compressing that file
> wouldn't actually save space on the liveCD as I can gzip it down to
> 3.9MB. It may be better to leave it as an uncompressed zip, and let
> squashfs deal with it. Recompressing the pngs contained in the zip
> sounds worthwhile though. Strangely, even running advzip -z -0
> images_human.zip shrinks it by 3%, and even shrinks the corresponding
> images_human.zip.gz file
>
> Also, there are 12MB of jar files, which are basically zip files. We
> can also shrink those by 5MB or so with advzip, but that doesn't seem
> to shrink a .tgz of them so it may not shrink the liveCD. Since zip
> files compress file by file, we may be able to save space on the
> liveCD by running "advzip -z -0" on them. That would expand them to
> 24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
> space on the liveCD if squashfs is similarly efficient.

how does OOo behave with the repacked zip file? is it faster, slower, does it 
need more memory when it runs?  imo, changes like this should be integrated 
into 
the package build process, and sent upstream. patches welcome.

same for jar files. are these extracted as fast as without your changes by the 
jvm? if not, then these should be left alone (and afaik there shouldn't be any 
jar files on the live CD).

   Matthias



Re: More LiveCD space optimizations

2010-10-07 Thread Martin Owens
On Fri, 2010-10-08 at 00:07 +0800, John McCabe-Dansted wrote:
> Strangely, even running advzip -z -0
> images_human.zip shrinks it by 3%, and even shrinks the corresponding
> images_human.zip.gz file 

That's not strange, that's just entropic packing principles. You've got
a bunch of assumptions that can be made about data and a bunch of
compression iterations, each make assumptions about the nature of the
data and some are fitting together better.

I'm keen on this work since saving space allows for all sorts of
goodies. Did we save space with any of the SVG cleaning or did that need
to be brought up to the packaging level?

Martin,




Re: More LiveCD space optimizations

2010-10-07 Thread John McCabe-Dansted
On Thu, Oct 7, 2010 at 10:05 AM, Louis Simard  wrote:
> Hey :)
>
> Thanks for the interest in this optimisation! Unfortunately I wasn't
> pushy enough in my thread from May-June and it wasn't included in the
> Maverick LiveCD. A pending question is what to do to include the
> recompressed files into the archive's packages [1].

I think this will be discussed at UDS-N, see:
http://archives.free.net.ph/message/20101004.065026.e553efd1.en.html

> 2010-10-06 16:08 GMT John McCabe-Dansted :
>> In May, Louis Simard proposed re-encoding PNG files and SVG files to
>> reduce their size [Quoted 1]. I note that we can save further space by:
>>
>> 1) Using advdef on the png files in addition to optipng. This is what
>> optimizegraphics does, and this shrinks the pngs on the Maverick RC
>> liveCD from about 100.1MB to 85.3MB providing a saving of 14.8MB.
>
> So it does; I didn't know about that. Reading the man file for advpng,
> it gave a warning that it was only supported for AdvanceMAME-generated
> PNG files, so I was skeptical, but it does shave off about 4% more
> filesize on average with 'advpng -z4'.

We could test each file to ensure the image is identical, perhaps
using pngtopnm, and md5sum. This would be especially important for
jpegrescan/jpgcrush, which is at version 0.0.0-1.
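
Such a check can be sketched in Python by hashing whatever a decoder writes to standard output for the original and for the optimised file. This is only a sketch with made-up function names; it assumes the decoder takes the file name as its last argument and writes the decoded data to stdout.

```python
import hashlib
import subprocess


def decoded_md5(decoder_cmd, path):
    """Run decoder_cmd on path and return the MD5 hex digest of its stdout."""
    result = subprocess.run(list(decoder_cmd) + [path],
                            check=True, stdout=subprocess.PIPE)
    return hashlib.md5(result.stdout).hexdigest()


def same_decoded_content(decoder_cmd, original, optimised):
    """True if both files decode to byte-identical output."""
    return decoded_md5(decoder_cmd, original) == decoded_md5(decoder_cmd, optimised)
```

For PNG files the decoder could be ["pngtopnm"], assuming Netpbm is installed. As far as I know pngtopnm leaves out the alpha channel unless it is run with -alpha, so a thorough check would probably compare that output as well.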

>> 2) Recompressing gz files with advdef. Using advdef, we can shrink the
>> gz files from 89.5MB to 84.8MB, and provides a saving of 4.7MB.
>
> That's an interesting optimisation; I didn't really know about it
> either. However, I did use 7zip's Deflate compressor to recompress a
> .zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. The method was
> rather crude, but it did the job:
>
> mkdir extracted
> cd extracted
> unzip ../file.zip
> 7z a -tzip -mx=9 -mfb=258 ../file.repack.zip *
> cd ..
> rm -r extracted

You mean images_human.zip? I have a hunch that compressing that file
wouldn't actually save space on the liveCD as I can gzip it down to
3.9MB. It may be better to leave it as an uncompressed zip, and let
squashfs deal with it. Recompressing the pngs contained in the zip
sounds worthwhile though. Strangely, even running advzip -z -0
images_human.zip shrinks it by 3%, and even shrinks the corresponding
images_human.zip.gz file

Also, there are 12MB of jar files, which are basically zip files. We
can also shrink those by 5MB or so with advzip, but that doesn't seem
to shrink a .tgz of them so it may not shrink the liveCD. Since zip
files compress file by file, we may be able to save space on the
liveCD by running "advzip -z -0" on them. That would expand them to
24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
space on the liveCD if squashfs is similarly efficient.
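
The effect can be reproduced on synthetic data with Python's zipfile and gzip modules. This is a sketch under assumptions: the archive members are made-up text files that share most of their content, and a single gzip pass over the whole archive stands in for the .tgz (and, loosely, for squashfs block compression).

```python
import gzip
import hashlib
import io
import zipfile


def make_members(count=40, lines_per_file=128):
    """Build small text files that contain the same lines in rotated order."""
    lines = [hashlib.sha256(str(i).encode()).hexdigest()
             for i in range(lines_per_file)]
    members = {}
    for n in range(count):
        rotated = lines[n:] + lines[:n]
        members[f"file{n:02d}.txt"] = ("\n".join(rotated) + "\n").encode()
    return members


def build_zip(members, method):
    """Return the bytes of a zip archive whose members use the given method."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


members = make_members()
deflated = build_zip(members, zipfile.ZIP_DEFLATED)  # like a normal jar
stored = build_zip(members, zipfile.ZIP_STORED)      # like "advzip -z -0"
sizes = {
    "deflated zip": len(deflated),
    "stored zip": len(stored),
    "deflated zip, gzipped": len(gzip.compress(deflated)),
    "stored zip, gzipped": len(gzip.compress(stored)),
}
for label, size in sizes.items():
    print(f"{label:>22}: {size:,} bytes")
```

The stored archive is larger on its own, but the outer compressor can exploit redundancy between members that per-file deflate cannot see, so the stored archive ends up smaller once the outer layer is applied. How large the effect is on real jars depends on how much their members have in common.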

>> 3) Recompressing jpeg files with jpegrescan. This only saves 0.5MB,
>> but implementing this would add just a couple more lines of code, and
>> jpegrescan does not lose any picture quality [Quoted 2].
>
> jpegoptim indeed performs lossless optimisation of JPEG files by
> editing Huffman tables, and it's used as the basis of jpegrescan.
> However, jpegoptim doesn't make non-progressive files progressive, as
> I understand jpegrescan does. This may make jpegoptim's optimisations
> more transparent to applications that, for some reason, can't decode
> progressive JPEGs and thus have non-progressive JPEGs in their
> packages. However, most applications should be using libjpeg anyway,
> so perhaps this point is moot.
>
>>
>> Together these should shrink the liveCD by over 20MB. This is without
>> even considering the .xml and .svg optimizations Louis proposed.
>>
>> A further 10MB could be saved by recompressing the gz files as lzma.
>
> At what LZMA compression level? Default (7) or --best (9)?

--best

Also, if we want to take replacing deflate with lzma to extremes, we
could replace the deflate compression in the png files with lzma. A
command that does this is "advpng -z -0 $f && lzma --best $f". I found
that this could save 18.7MB. However, it may also degrade performance
slightly, but I doubt it would be too significant on modern CPUs.
Running unlzma on all 66MB of the .png.lzma files takes:
real    1m2.666s
user    0m6.540s
sys     0m5.610s

I think the user/sys are the relevant ones, and taking 12s to read
every png doesn't seem too bad. The main thing is that I doubt that it
would work out of the box.

If we use lzma in the squashfs, just deflating them all with advpng -z
-0 could reduce the liveCD size. Probably wouldn't help the installed
size though.

>> This seems reasonable as lzma has reasonable decompression times (e.g.
>> 7ms to decompress a largish manpage like lsof).
>
> 7 ms? What's your CPU? :)

Core2Duo E7200  @ 2.53GHz

>> Since the liveCD is
>> compressed anyway, it seems that if a file is compressed with gzip, it
>> is worth compressing with lzma.  The command "man" already seems to
>> have lzma support, but we'd want to test each application to ensure
>> that it functions correctly when its .gz files are replaced with lzma

Re: More LiveCD space optimizations

2010-10-06 Thread Louis Simard
Hey :)

Thanks for the interest in this optimisation! Unfortunately I wasn't
pushy enough in my thread from May-June and it wasn't included in the
Maverick LiveCD. A pending question is what to do to include the
recompressed files into the archive's packages [1].

2010-10-06 16:08 GMT John McCabe-Dansted :
> In May, Louis Simard proposed re-encoding PNG files and SVG files to
> reduce their size [Quoted 1]. I note that we can save further space by:
>
> 1) Using advdef on the png files in addition to optipng. This is what
> optimizegraphics does, and this shrinks the pngs on the Maverick RC
> liveCD from about 100.1MB to 85.3MB providing a saving of 14.8MB.

So it does; I didn't know about that. The man page for advpng warns
that it is only supported for AdvanceMAME-generated PNG files, so I
was skeptical, but it does shave off about 4% more filesize on
average with 'advpng -z4'.

> 2) Recompressing gz files with advdef. Using advdef, we can shrink the
> gz files from 89.5MB to 84.8MB, providing a saving of 4.7MB.

That's an interesting optimisation; I didn't really know about it
either. However, I did use 7zip's Deflate compressor to recompress a
.zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. The method was
rather crude, but it did the job:

mkdir extracted
cd extracted
unzip ../file.zip
7z a -tzip -mx=9 -mfb=258 ../file.repack.zip *
cd ..
rm -r extracted

> 3) Recompressing jpeg files with jpegrescan. This only saves 0.5MB,
> but implementing this would add just a couple more lines of code, and
> jpegrescan does not lose any picture quality [Quoted 2].

jpegoptim indeed performs lossless optimisation of JPEG files by
editing Huffman tables, and it's used as the basis of jpegrescan.
However, jpegoptim doesn't make non-progressive files progressive, as
I understand jpegrescan does. This may make jpegoptim's optimisations
more transparent to applications that, for some reason, can't decode
progressive JPEGs and thus have non-progressive JPEGs in their
packages. However, most applications should be using libjpeg anyway,
so perhaps this point is moot.

>
> Together these should shrink the liveCD by over 20MB. This is without
> even considering the .xml and .svg optimizations Louis proposed.
>
> A further 10MB could be saved by recompressing the gz files as lzma.

At what LZMA compression level? Default (7) or --best (9)?

> This seems reasonable as lzma has reasonable decompression times (e.g.
> 7ms to decompress a largish manpage like lsof).

7 ms? What's your CPU? :)

> Since the liveCD is
> compressed anyway, it seems that if a file is compressed with gzip, it
> is worth compressing with lzma.  The command "man" already seems to
> have lzma support, but we'd want to test each application to ensure
> that it functions correctly when its .gz files are replaced with lzma
> files. We could also selectively recompress the gz files, as some .gz
> files are actually smaller (by about 40 bytes) than the corresponding
> lzma file.

I hadn't considered this type of "transcoding" for the LiveCD. We may
want to test for ourselves which programs accept .lzma files in their
directories in addition to .gz. Shall you do it, shall I, or shall we
both do it? Also, is anyone else interested?

Your point about files being compressed anyway is kind of interesting:
both Deflate and LZMA output recompress very poorly, so saving bytes by
switching from one to the other makes sense. That would not shrink the
*installed size* of these man pages much, though, because of default 4
KB blocks for ext[2-4].
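
To illustrate the block-size point, here is a minimal sketch that rounds
a file size up to whole filesystem blocks. The function name
on_disk_size and the example sizes are made up for illustration; the
4096-byte default simply matches the 4 KB block size mentioned above.

```shell
# on_disk_size BYTES [BLOCK]: bytes occupied once the file is rounded up
# to whole blocks. BLOCK defaults to 4096. Hypothetical helper.
on_disk_size() {
    bytes=$1
    block=${2:-4096}
    echo $(( (bytes + block - 1) / block * block ))
}
```

Under that assumption, a 3000-byte .gz man page and a 2600-byte .lzma
version of it both come out at 4096 bytes, so the installed size would
not change even though the file itself is smaller.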

>
> Given that recoding SVG files can save 7MB [Quoted 1], simply recoding files
> could free up 37MB for the Natty LiveCD (and presumably also reduce
> the average size of debs in the repos by about 5%).
>
> [Quoted 1] 
> http://www.mail-archive.com/ubuntu-devel-discuss@lists.ubuntu.com/msg11337.html
> [Quoted 2] http://news.ycombinator.com/item?id=803839
>
> I attach the script I used to check how much space would be saved.
> This is purely for reproduction of my results, it is not integrated
> into Louis's script.

Do you want me to add to my script any of the optimisations discussed
in your email? They are: Using AdvanceCOMP to recompress .png images
and gzipped files; using either of jpegoptim or jpegrescan to
losslessly recompress .jpg images; "transcoding" man pages from .gz to
.lzma. I'm not going to add untested optimisations yet, such as
transcoding *all* .gz files to .lzma.

I'm still very interested in this, despite the lack of posting about
the subject in the last 4 months! I've just been waiting for the guys
at Debian to advise me on how to best integrate these optimisations
into packages. Perhaps I should just devise a set of suitable
build-depends additions (optipng, advancecomp, jpegoptim) and makefile
rules for .png/.jpg/.gz, then file a single bug report on all of the
packages that would benefit the most from optimisations? That way,
package maintainers could opt in rather easily.

- Louis

[

More LiveCD space optimizations

2010-10-06 Thread John McCabe-Dansted
In May, Louis Simard proposed re-encoding PNG files and SVG files to
reduce their size [1]. I note that we can save further space by:

1) Using advdef on the png files in addition to optipng. This is what
optimizegraphics does, and this shrinks the pngs on the Maverick RC
liveCD from about 100.1MB to 85.3MB providing a saving of 14.8MB.
2) Recompressing gz files with advdef. Using advdef, we can shrink the
gz files from 89.5MB to 84.8MB, providing a saving of 4.7MB.
3) Recompressing jpeg files with jpegrescan. This only saves 0.5MB,
but implementing this would add just a couple more lines of code, and
jpegrescan does not lose any picture quality [2].

Together these should shrink the liveCD by over 20MB. This is without
even considering the .xml and .svg optimizations Louis proposed.

A further 10MB could be saved by recompressing the gz files as lzma.
This seems reasonable as lzma has reasonable decompression times (e.g.
7ms to decompress a largish manpage like lsof). Since the liveCD is
compressed anyway, it seems that if a file is compressed with gzip, it
is worth compressing with lzma.  The command "man" already seems to
have lzma support, but we'd want to test each application to ensure
that it functions correctly when its .gz files are replaced with lzma
files. We could also selectively recompress the gz files, as some .gz
files are actually smaller (by about 40 bytes) than the corresponding
lzma file.
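
The selective recompression could be sketched roughly as below. This is
only an illustration, not taken from the attached script; the function
name smaller_file is made up.

```shell
# smaller_file A B: print the path of whichever file has fewer bytes.
# A is printed on a tie. Hypothetical helper.
smaller_file() {
    size_a=$(( $(wc -c < "$1") ))
    size_b=$(( $(wc -c < "$2") ))
    if [ "$size_b" -lt "$size_a" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}
```

For example, after "gzip -dc page.1.gz | lzma --best > page.1.lzma"
(file names hypothetical), "smaller_file page.1.gz page.1.lzma" names
the file worth keeping, and the .gz wins when the two are equal in size.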

Given that recoding SVG files can save 7MB [1], simply recoding files
could free up 37MB for the Natty LiveCD (and presumably also reduce
the average size of debs in the repos by about 5%).
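
For reference, the 37MB figure is just the sum of the individual savings
quoted in this message. A small sketch of the tally, using only the
numbers given above (all in MB):

```shell
# Savings quoted in this message, in MB:
#   14.8  PNG files (optipng + advdef)
#    4.7  gz files recompressed with advdef
#    0.5  JPEG files (jpegrescan)
#   10    gz files recompressed as lzma
#    7    SVG files
total=$(LC_ALL=C awk 'BEGIN { printf "%.1f", 14.8 + 4.7 + 0.5 + 10 + 7 }')
echo "$total"
```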

[1] 
http://www.mail-archive.com/ubuntu-devel-discuss@lists.ubuntu.com/msg11337.html
[2] http://news.ycombinator.com/item?id=803839

I attach the script I used to check how much space would be saved.
This is purely for reproduction of my results, it is not integrated
into Louis's script.

-- 
John C. McCabe-Dansted


lossless_recompression_iso.sh
Description: Bourne shell script