On Fri, May 16, 2008 at 10:43:08AM -0400, Daniel Kahn Gillmor wrote:
> Package: libmarc-charset-perl
> Version: 0.98-2
> Severity: normal
> I tried to install libmarc-charset-perl on a machine with a small
> disk, and it wants to chew up a full 410MB. If

> I don't know how to distribute a file as a sparse file within the .deb
> framework, but if it's possible, i'd like to learn.

The files in the .deb are actually inside a tar archive (data.tar.gz),
which gets created by dpkg-deb. Sparse files aren't handled efficiently
by default without the tar '-S' option.

While it's possible to inject this option into the build system
("TAR_OPTIONS=-S dh_builddeb"), dpkg then thinks such a file is corrupt:

 dpkg: error processing ../libmarc-charset-perl_0.98-2_i386.deb (--install):
  corrupted filesystem tarfile - corrupted package archive

This means that the sparse file would have to be encapsulated somehow
in the .deb and then installed in place with maintainer scripts. This is
certainly doable and possibly even justified with such significant space
savings.

However, I think a better solution would be to change the SDBM storage
format in MARC::Charset::Table into something else, preferably something
that's architecture-independent (see #429030) and doesn't use sparse
files. This is an upstream issue, of course, but it would help the case
if we had a better suggestion to offer :)

FWIW, the attached trivial patch changes the backend to GDBM_File while
still passing all the tests. This fixes the sparse file problem, but
it's probably not a viable alternative for upstream because gdbm linkage
is optional while sdbm comes with Perl. And it's still
architecture-dependent.

I suppose an architecture-independent alternative would be to use
nstore_fd and fd_retrieve from Storable, which means that the whole
database is slurped in for just one lookup. This is probably an
unacceptable regression.

Better suggestions welcome.
-- 
Niko Tyni [EMAIL PROTECTED]
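For illustration, the Storable idea mentioned above could look roughly like the sketch below. It is not taken from MARC::Charset: the table contents, the keys and the helper names save_table/load_table are all made up for the example. It only shows nstore_fd writing a hash in network byte order and fd_retrieve reading the whole thing back, which is exactly the "slurp everything for one lookup" cost described above.

```perl
use strict;
use warnings;
use Storable qw(nstore_fd fd_retrieve);
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real table; keys and values are invented.
my %table = (
    'example-key-1' => { ucs => 0x00E9, name => 'sample entry 1' },
    'example-key-2' => { ucs => 0x00F1, name => 'sample entry 2' },
);

# Write the whole hash in network (portable) byte order.
sub save_table {
    my ($path, $table) = @_;
    open my $fh, '>', $path or die "cannot write $path: $!";
    binmode $fh;
    nstore_fd($table, $fh) or die "nstore_fd failed for $path";
    close $fh or die "cannot close $path: $!";
    return;
}

# Read the whole hash back; there is no per-key access, so every
# lookup pays for deserializing the complete table.
sub load_table {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot read $path: $!";
    binmode $fh;
    my $table = fd_retrieve($fh);
    close $fh;
    return $table;
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

save_table($path, \%table);
my $loaded = load_table($path);
print $loaded->{'example-key-1'}{name}, "\n";
```

The resulting file is dense (no holes) and should be readable on any architecture, but the load time grows with the size of the whole table rather than with the number of lookups.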
diff --git a/lib/MARC/Charset/Table.pm b/lib/MARC/Charset/Table.pm
index db65c8a..a67862c 100644
--- a/lib/MARC/Charset/Table.pm
+++ b/lib/MARC/Charset/Table.pm
@@ -34,7 +34,7 @@ UCS code point. These keys map to a serialized MARC::Charset::Code object.
 use strict;
 use warnings;
 use POSIX;
-use SDBM_File;
+use GDBM_File;
 use MARC::Charset::Code;
 use MARC::Charset::Constants qw(:all);
 use Storable qw(freeze thaw);
@@ -49,7 +49,7 @@ sub new
 {
     my $class = shift;
     my $self = bless {}, ref($class) || $class;
-    $self->_init(O_RDONLY);
+    $self->_init(&GDBM_READER);
     return $self;
 }
 
@@ -174,7 +174,7 @@ sub brand_new
 {
     my $class = shift;
     my $self = bless {}, ref($class) || $class;
-    $self->_init(O_CREAT|O_RDWR);
+    $self->_init(&GDBM_WRCREAT);
     return $self;
 }
 
@@ -184,7 +184,7 @@ sub brand_new
 sub _init
 {
     my ($self,$opts) = @_;
-    tie my %db, 'SDBM_File', db_path(), $opts, 0644;
+    tie my %db, 'GDBM_File', db_path(), $opts, 0644;
     $self->{db} = \%db;
 }
 