On Fri, May 16, 2008 at 10:43:08AM -0400, Daniel Kahn Gillmor wrote:
> Package: libmarc-charset-perl
> Version: 0.98-2
> Severity: normal

> I tried to install libmarc-charset-perl on a machine with a small
> disk, and it wants to chew up a full 410MB.  If

> I don't know how to distribute a file as a sparse file within the .deb
> framework, but if it's possible, i'd like to learn.

The files in the .deb are actually inside a tar archive (data.tar.gz),
which gets created by dpkg-deb. tar only stores sparse files efficiently
when given the '-S' option, which isn't used by default. While it's
possible to inject this option into the build system
("TAR_OPTIONS=-S dh_builddeb"), dpkg then rejects the resulting package
as corrupt:

 dpkg: error processing ../libmarc-charset-perl_0.98-2_i386.deb (--install):
 corrupted filesystem tarfile - corrupted package archive

This means that the sparse file would have to be encapsulated somehow in
the .deb and then installed in place with maintainer scripts. This is
certainly doable and possibly even justified with such significant space
savings.
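
In case "sparse" needs illustrating: the following is just a minimal
sketch, not anything taken from the package. It writes a single byte
after a 1 MiB hole and compares the apparent size with what is allocated
on disk. The temporary file and the sizes are made up for the example,
and the 512-byte block size is the usual convention rather than a
guarantee. On a filesystem without sparse file support the two numbers
will be about the same.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary file consisting of a 1 MiB hole plus one data byte.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
seek($fh, 1024 * 1024, 0) or die "seek: $!";   # whence 0 is SEEK_SET
print {$fh} "x" or die "print: $!";
close($fh) or die "close: $!";

my @st = stat($path) or die "stat: $!";
my $apparent  = $st[7];          # file size in bytes, holes included
my $allocated = $st[12] * 512;   # allocated blocks, assuming 512-byte units

printf "apparent: %d bytes, allocated: %d bytes\n", $apparent, $allocated;
```

An archive made without '-S' stores the apparent size, which is why the
unpacked file ends up fully allocated on the target system.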

However, I think a better solution would be to change the SDBM storage
format in MARC::Charset::Table into something else, preferably something
that's architecture-independent (see #429030) and doesn't use sparse
files. This is an upstream issue, of course, but it would help the case
if we had a better suggestion to offer :)

FWIW, the attached trivial patch changes the backend to GDBM_File
while still passing all the tests. This fixes the sparse file problem,
but it's probably not a viable alternative for upstream, because
GDBM_File is an optional part of a Perl build (it needs gdbm) while
SDBM_File comes with Perl. And it's still architecture-dependent.

I suppose an architecture-independent alternative would be to use
nstore_fd and fd_retrieve from Storable, but that means the whole
database gets slurped in for just one lookup. This is probably an
unacceptable regression.
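
To make the trade-off concrete, here is a rough sketch of the Storable
approach. It is not a proposed patch: the %table contents, the key names
and the temporary file are hypothetical stand-ins for the real mapping
data. nstore_fd writes in network byte order, which is what makes the
file portable across architectures.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(nstore_fd fd_retrieve);

# Hypothetical stand-in for the real character mapping table.
my %table = (
    'example-key-1' => { ucs => 0x0041, name => 'LATIN CAPITAL LETTER A' },
    'example-key-2' => { ucs => 0x00E9, name => 'LATIN SMALL LETTER E WITH ACUTE' },
);

# Build time: serialize the whole hash in network byte order.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
nstore_fd(\%table, $fh) or die "nstore_fd failed";
close($fh) or die "close: $!";

# Run time: fd_retrieve deserializes the entire structure at once,
# even though only a single key is looked up afterwards.
open(my $in, '<', $path) or die "open: $!";
binmode $in;
my $loaded = fd_retrieve($in);
close($in);

printf "U+%04X\n", $loaded->{'example-key-1'}{ucs};
```

Unlike a tied DBM hash, nothing here is read lazily, so the cost of the
first lookup grows with the size of the table.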

Better suggestions welcome.
-- 
Niko Tyni   [EMAIL PROTECTED]
diff --git a/lib/MARC/Charset/Table.pm b/lib/MARC/Charset/Table.pm
index db65c8a..a67862c 100644
--- a/lib/MARC/Charset/Table.pm
+++ b/lib/MARC/Charset/Table.pm
@@ -34,7 +34,7 @@ UCS code point. These keys map to a serialized MARC::Charset::Code object.
 use strict;
 use warnings;
 use POSIX;
-use SDBM_File;
+use GDBM_File;
 use MARC::Charset::Code;
 use MARC::Charset::Constants qw(:all);
 use Storable qw(freeze thaw);
@@ -49,7 +49,7 @@ sub new
 {
     my $class = shift;
     my $self = bless {}, ref($class) || $class;
-    $self->_init(O_RDONLY);
+    $self->_init(&GDBM_READER);
     return $self;
 }
 
@@ -174,7 +174,7 @@ sub brand_new
 {
     my $class = shift;
     my $self = bless {}, ref($class) || $class;
-    $self->_init(O_CREAT|O_RDWR);
+    $self->_init(&GDBM_WRCREAT);
     return $self;
 }
 
@@ -184,7 +184,7 @@ sub brand_new
 sub _init 
 {
     my ($self,$opts) = @_;
-    tie my %db, 'SDBM_File', db_path(), $opts, 0644;
+    tie my %db, 'GDBM_File', db_path(), $opts, 0644;
     $self->{db} = \%db;
 }
 
