Bug#962601: manpage-section-mismatch doesn't take into account manpages with multiple binaries

2020-07-16 Thread Russ Allbery
Sergio Durigan Junior  writes:

> So, I think I found the problem, and I have a possible solution.
> Apparently, some manpages have malformed .TH headers, and Perl's
> Text::ParseWords::parse_line doesn't cope well with them.  For example,
> kde-dev-scripts's /usr/share/man/ca/man1/create_makefiles.1.gz file has:

> .TH "\FBCREATE_MAKEFILES\" "1" "8 de mar\(,c del 2003" "[FIXME: source]" 
> "[FIXME: manual]"

For the record, groff is also unhappy with that line because of the
backslash before the " after CREATE_MAKEFILES.  I'm a bit surprised that
it doesn't produce a warning, but it renders the name as
CREATE_MAKEFILES() (note the lack of section number) and gives up on
trying to display the other parts of the .TH macro.

That man page deserves some sort of Lintian tag, since the .TH line isn't
really valid.  It would be ideal if the tag were specific to the
underlying problem, but I would consider it reasonable to complain that
one cannot extract the section from that .TH line since groff cannot
either.

-- 
Russ Allbery (r...@debian.org)  



Bug#962601: manpage-section-mismatch doesn't take into account manpages with multiple binaries

2020-07-08 Thread Sergio Durigan Junior
On Wednesday, July 08 2020, Felix Lechner wrote:

> On Wed, Jul 8, 2020 at 7:41 PM Sergio Durigan Junior
>  wrote:
>>
>> I honestly don't feel like spending
>> too much time investigating Perl's internals
>
> What does it have to do with Perl's internals, please?

Well, according to ParseWords's documentation:

  https://perldoc.perl.org/Text/ParseWords.html

parse_line should take a line ($line, in our case), a delimeter (\s+, in
our case), and split the line into tokens respecting the delimeters.  It
is obviously not working the way it should when there is a backslash in
the middle of the line, which is strange.  I'm not saying there is a bug
in the implementation, but, in this specific scenario, the documentation
doesn't reflect what really happens (that is, the function doesn't
return the expected tokens).

>> What do you think?
>
> Would Text::Balanced help extract the quoted strings, and work from there?

I don't know.  As I said, Perl is not my specialty.  But the problem
here is not just removing the backslash; the real problem is that the
title of the manpage (the first element in the .TH header) contains an
\FB in the beginning, which seems to be wrong (there is no \fR in the
line, and man doesn't render the title in bold, ignoring the \fB).  Not
mentioning the backslash at the end of the title, which (erroneously)
escapes the double quote.

I don't have the time right now to play a bit with Text::Balanced and
see what comes out of it, so if you're not satisfied with the solution I
proposed, my other suggestion would be to revert the original patch (and
reintroduce the bug I fixed with it, of course) until we can figure out
how to tackle this issue.

Cheers,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
https://sergiodj.net/


signature.asc
Description: PGP signature


Bug#962601: manpage-section-mismatch doesn't take into account manpages with multiple binaries

2020-07-08 Thread Felix Lechner
Hi

On Wed, Jul 8, 2020 at 7:41 PM Sergio Durigan Junior
 wrote:
>
> I honestly don't feel like spending
> too much time investigating Perl's internals

What does it have to do with Perl's internals, please?

> What do you think?

Would Text::Balanced help extract the quoted strings, and work from there?

Kind regards
Felix Lechner



Bug#962601: manpage-section-mismatch doesn't take into account manpages with multiple binaries

2020-07-08 Thread Sergio Durigan Junior
On Wednesday, July 08 2020, Felix Lechner wrote:

> Hi Sergio,

Hey Felix,

> On Wed, Jun 10, 2020 at 9:57 AM Sergio Durigan Junior
>  wrote:
>>
>> after calling Text::ParseWords::parse_line, we check to
>> see if the first package name has a comma as the last char.  If it does,
>> then we assume that there will be at least one other package name
>> listed, and advance an index.  When we reach a package name whose last
>> char is not a comma, we then assume that the next field is the manpage
>> section number.
>
> Something in that patch is not quite working. I previously added a
> safeguard for an undefined value warning, but that was not enough:
>
> 
> https://salsa.debian.org/lintian/lintian/-/commit/d3c64d8ab40de6e38c96334e2515550df1957a4a
>
> In an archive-wide run, the modified patch still produced the warnings
> below. I show the complete list for the record, and not to intimidate
> anyone. It's no big deal.
>
> You may want to check out kde-dev-scripts, which generated a lot of warnings.

Ouch, thanks for the report, and sorry about the breakage.  My Perl-foo
is very limited.

So, I think I found the problem, and I have a possible solution.
Apparently, some manpages have malformed .TH headers, and Perl's
Text::ParseWords::parse_line doesn't cope well with them.  For example,
kde-dev-scripts's /usr/share/man/ca/man1/create_makefiles.1.gz file has:

.TH "\FBCREATE_MAKEFILES\" "1" "8 de mar\(,c del 2003" "[FIXME: source]" 
"[FIXME: manual]"

Not pretty, huh?  If we make simple Perl program to try to parse this
line:

  use Text::ParseWords;
  @words = parse_line('\s+', 0, q{.TH "\FBCREATE_MAKEFILES\" "1" "8 de mar\(,c 
del 2003" "[FIXME: source]" "[FIXME: manual]"});
  $i = 0;
  foreach (@words) {
  print "$i: <$_>\n";
  $i++;
  }

and run it, you will noticed that it doesn't return anything!  Now, if
we tweak the line a little bit, by removing some of the backslashes for
example, we start getting somewhere:

  ...
  @words = parse_line('\s+', 0, q{.TH "FBCREATE_MAKEFILES" "1" "8 de mar\(,c 
del 2003" "[FIXME: source]" "[FIXME: manual]"});
  ...

Now run it:

  $ perl parseline.pl 
  0: <.TH>
  1: 
  2: <1>
  3: <8 de mar(,c del 2003>
  4: <[FIXME: source]>
  5: <[FIXME: manual]>

So yeah, there's a problem here.  I honestly don't feel like spending
too much time investigating Perl's internals, so I think it's possible
to detect when parse_line failed and act accordingly.

I'm attaching a patch that does just that, and prevents the
warnings/failures from happening.  The idea is to check whether the size
of the @th_fields array is bigger than 0, and just perform the checks if
they are.  I also took the liberty to remove the // EMPTY part, because
it shouldn't be necessary anymore.

What do you think?

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
https://sergiodj.net/

diff --git a/checks/documentation/manual.pm b/checks/documentation/manual.pm
index b30bf6081..350dee927 100644
--- a/checks/documentation/manual.pm
+++ b/checks/documentation/manual.pm
@@ -297,21 +297,23 @@ sub files {
 next if $line =~ /^\.\\\"/; # comments .\"
 if ($line =~ /^\.TH\s/) { # header
 my @th_fields= Text::ParseWords::parse_line('\s+', 0, $line);
-my $pkgname_idx = 1;
-# Iterate over possible package names.  If there is
-# more than one, they will be separated by a comma and
-# a whitespace.  In case we find the comma, we advance
-# $pkgname_idx.
-while ((substr($th_fields[$pkgname_idx], -1) // EMPTY) eq ','){
-$pkgname_idx++;
-}
-# We're now at the last package, so we should be able
-# to obtain the manpage section number by incrementing
-# 1 to the index.
-my $th_section = $th_fields[++$pkgname_idx];
-if ($th_section && (lc($fn_section) ne lc($th_section))) {
-$self->tag('wrong-manual-section',
-"$file:$lc $fn_section != $th_section");
+if ($#th_fields > 0) {
+my $pkgname_idx = 1;
+# Iterate over possible package names.  If there is
+# more than one, they will be separated by a comma and
+# a whitespace.  In case we find the comma, we advance
+# $pkgname_idx.
+while ((substr($th_fields[$pkgname_idx], -1)) eq ','){
+$pkgname_idx++;
+}
+# We're now at the last package, so we should be able
+# to obtain the manpage section number by incrementing
+# 1 to the index.
+my $th_section = $th_fields[++$pkgname_idx];
+if ($th_section && (lc($fn_section) ne lc($th_section))) {
+

Bug#962601: manpage-section-mismatch doesn't take into account manpages with multiple binaries

2020-07-08 Thread Felix Lechner
Control: reopen -1

Hi Sergio,

On Wed, Jun 10, 2020 at 9:57 AM Sergio Durigan Junior
 wrote:
>
> after calling Text::ParseWords::parse_line, we check to
> see if the first package name has a comma as the last char.  If it does,
> then we assume that there will be at least one other package name
> listed, and advance an index.  When we reach a package name whose last
> char is not a comma, we then assume that the next field is the manpage
> section number.

Something in that patch is not quite working. I previously added a
safeguard for an undefined value warning, but that was not enough:


https://salsa.debian.org/lintian/lintian/-/commit/d3c64d8ab40de6e38c96334e2515550df1957a4a

In an archive-wide run, the modified patch still produced the warnings
below. I show the complete list for the record, and not to intimidate
anyone. It's no big deal.

You may want to check out kde-dev-scripts, which generated a lot of warnings.

Kind regards,
Felix Lechner

* * *

Warning in group allegro5/2:5.2.6.0-2: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group babeltrace2/2.0.3-2: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group blender/2.82.a+dfsg-1: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group blender/2.83.1+dfsg-2: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group checkit-tiff/0.2.3-2: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group claws-mail/3.17.5-2: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group comptext/1.0.1-4: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group comptty/1.0.1-4: Use of uninitialized value in substr
at /lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group dballe/8.6-1: Use of uninitialized value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group delay/1.0-5: Use of uninitialized value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group derivations/0.56.20180123.1-2: Use of uninitialized
value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group gadmin-proftpd/1:0.4.2-1: Use of uninitialized value
in substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm
line 305.
Warning in group git/1:2.27.0-1: Use of uninitialized value in substr
at /lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group git/1:2.27.0+next.20200625-1: Use of uninitialized
value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group git-repair/1.20200102-1: Use of uninitialized value
$th_fields[1] in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group gmic/2.4.5-1.1: Use of uninitialized value in substr
at /lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group guymager/0.8.12-1: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group kde-dev-scripts/4:18.08.0-1: Use of uninitialized
value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group libapache2-mod-qos/11.63-1: Use of uninitialized
value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group lavacli/1.0-1: Use of uninitialized value in substr
at /lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group lava/2020.06-2: Use of uninitialized value in substr
at /lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group lirc/0.10.1-6.2: Use of uninitialized value in substr
at /lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group lksctp-tools/1.0.18+dfsg-1: Use of uninitialized
value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group lxqt-config/0.14.1-4: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group manpages/5.07-1: Use of uninitialized value in substr
at /lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group manpages-ja/0.5.0.0.20180315+dfsg-1: Use of
uninitialized value in substr at
/lcl/lechner/lintian/git/checks/documentation/manual.pm line 305.
Warning in group manpages-zh/1.6.3.4-1: Use of uninitialized value in
substr at /lcl/lechner/lintian/git/checks/documentation/manual.pm line
305.
Warning in group mariadb-10.3/1:10.3.23-1: Use of uninitialized value
in substr at 

Bug#962601: manpage-section-mismatch doesn't take into account manpages with multiple binaries

2020-06-10 Thread Sergio Durigan Junior
Package: lintian
Version: 2.80.0
Severity: normal
Tags: patch

Hi,

I found a lintian limitation while working with a manpage that refers to
two different binaries.  In this scenario, lintian will mistakenly
consider the name of the second binary as the man section number, and
will issue a "manpage-section-mismatch" warning.

Consider a manpage whose header looks like:

  .TH MOUNT.CIFS, MOUNT.SMB3 8 "" "" ""
  .SH NAME
  mount.cifs, mount.smb3 \- mount using the Common Internet File System (CIFS)
  .
  .nr rst2man-indent-level 0

When we run lintian, we will see:

  W: cifs-utils: manpage-section-mismatch usr/share/man/man8/mount.cifs.8.gz:3 
8 != MOUNT.SMB3

However, according to lexgrog(1) and other sources, it is possible to
specify multiple programs in the .TH directive, as long as they are
separated by a comma and a whitespace, which is the case here.

I'd like to propose the following patch to address the problem.  The
idea is simple: after calling Text::ParseWords::parse_line, we check to
see if the first package name has a comma as the last char.  If it does,
then we assume that there will be at least one other package name
listed, and advance an index.  When we reach a package name whose last
char is not a comma, we then assume that the next field is the manpage
section number.

I tested with the problematic manpage I have here, and it works.  I also
tested with other non-problematic manpages, and they still work as well.

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
https://sergiodj.net/

From d8cbe8d82733178589c30adff2335b37b26b3c8a Mon Sep 17 00:00:00 2001
From: Sergio Durigan Junior 
Date: Wed, 10 Jun 2020 12:49:53 -0400
Subject: [PATCH] Adjust manpage-section-mismatch to accept manpages referring
 to more than one program

---
 checks/documentation/man.pm | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/checks/documentation/man.pm b/checks/documentation/man.pm
index 8092e023d..371552b6f 100644
--- a/checks/documentation/man.pm
+++ b/checks/documentation/man.pm
@@ -296,8 +296,20 @@ sub files {
 chomp $line;
 next if $line =~ /^\.\\\"/; # comments .\"
 if ($line =~ /^\.TH\s/) { # header
-my (undef, undef, $th_section, undef)
+my @th_fields
   = Text::ParseWords::parse_line('\s+', 0, $line);
+my $pkgname_idx = 1;
+# Iterate over possible package names.  If there is
+# more than one, they will be separated by a comma and
+# a whitespace.  In case we find the comma, we advance
+# $pkgname_idx.
+while (substr($th_fields[$pkgname_idx], -1) eq ',') {
+$pkgname_idx++;
+}
+# We're now at the last package, so we should be able
+# to obtain the manpage section number by incrementing
+# 1 to the index.
+my $th_section = $th_fields[++$pkgname_idx];
 if ($th_section && (lc($fn_section) ne lc($th_section))) {
 $self->tag('manpage-section-mismatch',
 "$file:$lc $fn_section != $th_section");
-- 
2.26.2



signature.asc
Description: PGP signature