Package: libdpkg-perl
Version: 1.16.9
Severity: minor

Hi,

Iterating over the fields in a Dpkg::Control object (e.g. keys
%{$dctrl}) has an O(n^2) performance. This is apparent in the code,
where FIRSTKEY and NEXTKEY loops over the "in-order" list (hinting
performance of at least O(n*m) code):

"""
sub FIRSTKEY {
    my $self = shift;
    my $parent = $self->[1];
    foreach (@{$parent->{'in_order'}}) {
        return $_ if exists $self->[0]->{lc($_)};
    }
}

sub NEXTKEY {
    my ($self, $last) = @_;
    my $parent = $self->[1];
    my $found = 0;
    foreach (@{$parent->{'in_order'}}) {
        if ($found) {
            return $_ if exists $self->[0]->{lc($_)};
        } else {
            $found = 1 if $_ eq $last;
        }
    }
    return undef;
}
"""

The attached script can be used to reproduce the behaviour.  In my case,
I had to run it with:

  $ perl perf-iter.pl 400 50
  $ perl perf-iter.pl 800 50
  $ perl perf-iter.pl 1600 50

to see it and obtain relable results (ymmv).  The O(n^2) behaviour is
visible in that whenever the number of fields double the rate will be
reduced by a factor 4.  I tried comparing it to the Perl built-in
hashref, but even for 10000 fields I did not get reliable results.
  My guess that the Perl hash is faster than O(n^2) - likely O(n),
but honestly I don't know.

The bug is of minor severity, because it takes an "obscence" number of
fields before the O(n^2) behaviour degrades the performance visibly in
the benchmark[1].

~Niels

[1] Over 100 fields AFAICT.
#!/usr/bin/perl
#
# $ perl perf-iter.pl 400 50
# Benchmark: timing 50 iterations of dpkg-control...
# dpkg-control:  1 wallclock secs ( 0.50 usr +  0.00 sys =  0.50 CPU) @ 100.00/s (n=50)
#                 Rate dpkg-control
# dpkg-control 100.0/s           --
# $ perl perf-iter.pl 800 50
# Benchmark: timing 50 iterations of dpkg-control...
# dpkg-control:  2 wallclock secs ( 1.94 usr +  0.00 sys =  1.94 CPU) @ 25.77/s (n=50)
#                Rate dpkg-control
# dpkg-control 25.8/s           --
# $ perl perf-iter.pl 1600 50
# Benchmark: timing 50 iterations of dpkg-control...
# dpkg-control:  8 wallclock secs ( 7.70 usr +  0.00 sys =  7.70 CPU) @  6.49/s (n=50)
#                Rate dpkg-control
# dpkg-control 6.49/s           --
#

use strict;
use warnings;

use Benchmark qw(cmpthese timethese);
use Dpkg::Control;

my ($fields, $count) = @ARGV;

$fields //= 2000;
$count//= 10;

my $dctrl = Dpkg::Control->new;
my $hashref = {};

for (my $i = 0; $i < $fields ; $i++) {
    my $fname = "Field-$i";
    $dctrl->{$fname} = "Value $i";
    $hashref->{$fname} = "Value $i";
}

my $bench = timethese ($count, {
#    'hashref' => sub {
#        foreach my $f (keys %{$hashref}) {
#        }
#    },
    'dpkg-control' => sub {
        foreach my $f (keys %{$dctrl}) {
        }
    },
});

cmpthese $bench;

exit 0;

Reply via email to