Bug#605844: lintian: Consider pre-sorting keys %{$info-index}

2010-12-03 Thread Niels Thykier
Package: lintian
Version: 2.4.4

Hi

I examined the source of lintian (pulled from git) and noticed that
lintian could probably benefit from looking at common access patterns.

In the checks, there are 13 accesses to 'keys %{$info-index}'; 11 of
these are 'sort keys %{$info-index}'. This means that the list of
files is sorted 11 times!

grep '%{$info-index}' checks/*
checks/changelog-file:foreach (sort keys %{$info-index}) {
checks/copyright-file:foreach (sort keys %{$info-index}) {
checks/etcfiles:foreach my $file (sort keys %{$info-index}) {
checks/fields:  for my $file (keys %{$info-index}) {
checks/files:for my $file (sort keys %{$info-index}) {
checks/files:foreach my $file (sort keys %{$info-index}) {
checks/files:foreach my $dir (sort keys %{$info-index}) {
checks/infofiles:foreach my $file (sort keys %{$info-index}) {
checks/manpages:foreach my $file (sort keys %{$info-index}) {
checks/menu-format:foreach my $file (sort keys %{$info-index}) {
checks/menus:for my $file (sort keys %{$info-index}) {
checks/scripts:foreach (sort keys %{$info-index}) {
checks/scripts: unless (grep { $_ =~ m/$divertrx/ } keys %{$info-index});
checks/shared-libs:for my $cur_file (sort keys %{$info-index}) {

This hurts a bit for larger packages; it might be a good idea to
supply a $info-sorted_index or so.

~Niels



-- 
To UNSUBSCRIBE, email to debian-lint-maint-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/aanlktimm3atgnh1l5xvxsft3hsfd8vrpvxx8zowsc...@mail.gmail.com



Bug#605844: lintian: Consider pre-sorting keys %{$info-index}

2010-12-03 Thread Niels Thykier
Interestingly enough this does not appear to be the current bottle
neck of lintian (or I am doing something wrong in my patch).

~Niels

$ dpkg --contents eclipse-platform-data_3.5.2-8_all.deb  | wc -l
1480

lintian 2.4.3
$ for i in {1..5} ; do time lintian -EI --pedantic
eclipse-platform-data_3.5.2-8_all.deb  ; done

real0m9.170s
user0m4.556s
sys 0m0.908s

real0m9.083s
user0m4.628s
sys 0m0.868s

real0m10.285s
user0m4.600s
sys 0m0.928s

real0m9.762s
user0m4.524s
sys 0m0.972s

real0m8.833s
user0m4.624s
sys 0m0.864s

lintian 2.4.4 (from git with patch)
$ for i in {1..5} ; do time lintian -EI --pedantic
eclipse-platform-data_3.5.2-8_all.deb  ; done

real0m7.659s
user0m4.484s
sys 0m1.024s

real0m9.412s
user0m4.556s
sys 0m0.944s

real0m8.672s
user0m4.496s
sys 0m0.992s

real0m9.090s
user0m4.496s
sys 0m0.976s

real0m9.000s
user0m4.460s
sys 0m1.052s
diff --git a/checks/changelog-file b/checks/changelog-file
index 8147095..40f4b0f 100644
--- a/checks/changelog-file
+++ b/checks/changelog-file
@@ -79,7 +79,7 @@ for my $file (sort keys %{$info-file_info}) {
 }
 
 # Read package contents  Capitalization errors are dealt with later.
-foreach (sort keys %{$info-index}) {
+foreach (@{$info-sorted_index}) {
 next unless length $_;
 # skip packages which have a /usr/share/doc/$pkg - foo symlink
 if (m,usr/share/doc/$ppkg$, and defined $info-index-{$_}-{link}) {
diff --git a/checks/copyright-file b/checks/copyright-file
index 336d0f4..267ed43 100644
--- a/checks/copyright-file
+++ b/checks/copyright-file
@@ -45,7 +45,7 @@ my $found = 0;
 my $linked = 0;
 
 # Read package contents...
-foreach (sort keys %{$info-index}) {
+foreach (@{$info-sorted_index}) {
 my $index_info = $info-index-{$_};
 if (m,usr/(share/)?doc/$ppkg/copyright(\.\S+)?$,) {
 	my $ext = $2;
diff --git a/checks/etcfiles b/checks/etcfiles
index bc61715..9a734b1 100644
--- a/checks/etcfiles
+++ b/checks/etcfiles
@@ -45,7 +45,7 @@ if (open(IN, '', $conffiles)) {
 }
 
 # Read package contents...
-foreach my $file (sort keys %{$info-index}) {
+foreach my $file (@{$info-sorted_index}) {
 my $index_info = $info-index-{$file};
 next unless $file =~ m,^etc, and $index_info-{type}=~ m/^[-h]/;
 
diff --git a/checks/fields b/checks/fields
index e5f7108..848e9ac 100644
--- a/checks/fields
+++ b/checks/fields
@@ -503,7 +503,7 @@ if (defined $info-field('installer-menu-item')) {
 my $metapackage = 0;
 if ($type eq 'binary') {
 	$metapackage = 1;
-	for my $file (keys %{$info-index}) {
+	for my $file (@{$info-sorted_index}) {
 		next if $info-index-{$file}-{type} =~ /^d/;
 		next if $file =~ m%^usr/share/doc/%;
 		next if $file =~ m%^usr/share/lintian/overrides/%;
diff --git a/checks/files b/checks/files
index 56cf40b..aa140dc 100644
--- a/checks/files
+++ b/checks/files
@@ -199,7 +199,7 @@ if ($description) {
 }
 }
 if ($is_empty) {
-for my $file (sort keys %{$info-index}) {
+for my $file (@{$info-sorted_index}) {
 # Ignore directories
 unless ($file =~ m,/$,) {
 # Skip if $file is an empty string
@@ -226,7 +226,7 @@ if ($is_empty) {
 }
 
 # Read package contents...
-foreach my $file (sort keys %{$info-index}) {
+foreach my $file (@{$info-sorted_index}) {
 next if $file eq ;
 my $index_info = $info-index-{$file};
 my $owner = $index_info-{owner} . '/' . $index_info-{group};
@@ -1204,7 +1204,7 @@ if ($pkg_section !~ m,games$, and $games  0 and $other == 0) {
 # so just ignore them.
 #
 # python-support needs a directory for each package even it might be empty
-foreach my $dir (sort keys %{$info-index}) {
+foreach my $dir (@{$info-sorted_index}) {
 next if $dir eq  or $info-index-{$dir}-{type} ne 'd';
 next if ($dir =~ m{^var/} or $dir =~ m{^etc/});
 next if $pkg eq 'base-files';
diff --git a/checks/infofiles b/checks/infofiles
index 2c5b994..872a74b 100644
--- a/checks/infofiles
+++ b/checks/infofiles
@@ -40,7 +40,7 @@ my %missing_section;
 my $has_info_file;
 
 # Read package contents...
-foreach my $file (sort keys %{$info-index}) {
+foreach my $file (@{$info-sorted_index}) {
 my $index_info = $info-index-{$file};
 my $file_info = $info-file_info-{$file};
 my $link = $index_info-{link} || '';
diff --git a/checks/manpages b/checks/manpages
index 0035bda..17912d2 100644
--- a/checks/manpages
+++ b/checks/manpages
@@ -40,7 +40,7 @@ my %link;
 my %manpage;
 
 # Read package contents...
-foreach my $file (sort keys %{$info-index}) {
+foreach my $file (@{$info-sorted_index}) {
 my $index_info = $info-index-{$file};
 my $file_info = $info-file_info-{$file};
 my $link = $index_info-{link} || '';
diff --git a/checks/menu-format b/checks/menu-format
index bbb4bc8..4e85dae 100644
--- a/checks/menu-format
+++ b/checks/menu-format
@@ -396,7 +396,7 @@ closedir MENUDIR;
 
 # Find the desktop files in the package for verification.
 my @desktop_files;

Bug#605844: lintian: Consider pre-sorting keys %{$info-index}

2010-12-03 Thread Raphael Geissert
Hi Niels!

[OT but using the opportunity: congrats! :)]

Niels Thykier wrote:
 Interestingly enough this does not appear to be the current bottle
 neck of lintian (or I am doing something wrong in my patch).

Thanks for the analysis and the suggestion (don't know if Russ is going to 
beat me at merging the patch -- Hi Russ :)

From some of my benchmarks the biggest bottle neck is at data collection 
time, objdump and file-info being the two that take the longest to be able 
to run (dependencies) and to run. Maybe it would be worth adding some 
highres timestamps to the debugging messages, to get a better idea without 
running a complete profiler.

Cheers,
-- 
Raphael Geissert - Debian Developer
www.debian.org - get.debian.net



-- 
To UNSUBSCRIBE, email to debian-lint-maint-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4cf994b9.0849960a.3494.8...@mx.google.com



Bug#605844: lintian: Consider pre-sorting keys %{$info-index}

2010-12-03 Thread Russ Allbery
Niels Thykier nthyk...@gmail.com writes:

 Interestingly enough this does not appear to be the current bottle neck
 of lintian (or I am doing something wrong in my patch).

It's probably a good change regardless, but sorting is generally very
fast.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/



-- 
To UNSUBSCRIBE, email to debian-lint-maint-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/87y686e4o3@windlord.stanford.edu