Package: libdpkg-perl
Version: 1.16.9
Severity: minor
Hi,
Iterating over the fields in a Dpkg::Control object (e.g. keys
%{$dctrl}) has an O(n^2) performance. This is apparent in the code,
where FIRSTKEY and NEXTKEY loops over the "in-order" list (hinting
performance of at least O(n*m) code):
"""
sub FIRSTKEY {
my $self = shift;
my $parent = $self->[1];
foreach (@{$parent->{'in_order'}}) {
return $_ if exists $self->[0]->{lc($_)};
}
}
sub NEXTKEY {
my ($self, $last) = @_;
my $parent = $self->[1];
my $found = 0;
foreach (@{$parent->{'in_order'}}) {
if ($found) {
return $_ if exists $self->[0]->{lc($_)};
} else {
$found = 1 if $_ eq $last;
}
}
return undef;
}
"""
The attached script can be used to reproduce the behaviour. In my case,
I had to run it with:
$ perl perf-iter.pl 400 50
$ perl perf-iter.pl 800 50
$ perl perf-iter.pl 1600 50
to see it and obtain relable results (ymmv). The O(n^2) behaviour is
visible in that whenever the number of fields double the rate will be
reduced by a factor 4. I tried comparing it to the Perl built-in
hashref, but even for 10000 fields I did not get reliable results.
My guess that the Perl hash is faster than O(n^2) - likely O(n),
but honestly I don't know.
The bug is of minor severity, because it takes an "obscence" number of
fields before the O(n^2) behaviour degrades the performance visibly in
the benchmark[1].
~Niels
[1] Over 100 fields AFAICT.
#!/usr/bin/perl
#
# $ perl perf-iter.pl 400 50
# Benchmark: timing 50 iterations of dpkg-control...
# dpkg-control: 1 wallclock secs ( 0.50 usr + 0.00 sys = 0.50 CPU) @ 100.00/s (n=50)
# Rate dpkg-control
# dpkg-control 100.0/s --
# $ perl perf-iter.pl 800 50
# Benchmark: timing 50 iterations of dpkg-control...
# dpkg-control: 2 wallclock secs ( 1.94 usr + 0.00 sys = 1.94 CPU) @ 25.77/s (n=50)
# Rate dpkg-control
# dpkg-control 25.8/s --
# $ perl perf-iter.pl 1600 50
# Benchmark: timing 50 iterations of dpkg-control...
# dpkg-control: 8 wallclock secs ( 7.70 usr + 0.00 sys = 7.70 CPU) @ 6.49/s (n=50)
# Rate dpkg-control
# dpkg-control 6.49/s --
#
use strict;
use warnings;
use Benchmark qw(cmpthese timethese);
use Dpkg::Control;
my ($fields, $count) = @ARGV;
$fields //= 2000;
$count//= 10;
my $dctrl = Dpkg::Control->new;
my $hashref = {};
for (my $i = 0; $i < $fields ; $i++) {
my $fname = "Field-$i";
$dctrl->{$fname} = "Value $i";
$hashref->{$fname} = "Value $i";
}
my $bench = timethese ($count, {
# 'hashref' => sub {
# foreach my $f (keys %{$hashref}) {
# }
# },
'dpkg-control' => sub {
foreach my $f (keys %{$dctrl}) {
}
},
});
cmpthese $bench;
exit 0;