Bug#1064454: debian-policy: Restrict deb822 field names more

2024-02-22 Thread gregor herrmann
On Thu, 22 Feb 2024 14:08:38 +0100, Simon Josefsson wrote:

> Would it make sense to change this to use an inclusive list of permitted
> characters instead?  How about checking the field names that is in use
> today, and construct a regexp of permitted symbols out of that?
> Starting point: [A-Za-z][A-Za-z0-9-_]*

Right, also: defining this as a regexp would be helpful for
implementation in all parsers (and I suspect there are many of them
…).


Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   


signature.asc
Description: Digital Signature


Bug#1064454: debian-policy: Restrict deb822 field names more

2024-02-22 Thread Sam Hartman
> "Niels" == Niels Thykier  writes:

Niels> Simon Josefsson:
>> Would it make sense to change this to use an inclusive list of
>> permitted characters instead?  How about checking the field names
>> that is in use today, and construct a regexp of permitted symbols
>> out of that?  Starting point: [A-Za-z][A-Za-z0-9-_]*
>> 
>> /Simon

Niels> That is an option and would work for me.

I'd support something along these lines--an inclusive description rather
than an exclusive description.



Bug#1064454: debian-policy: Restrict deb822 field names more

2024-02-22 Thread Niels Thykier

Simon Josefsson:

Would it make sense to change this to use an inclusive list of permitted
characters instead?  How about checking the field names that is in use
today, and construct a regexp of permitted symbols out of that?
Starting point: [A-Za-z][A-Za-z0-9-_]*

/Simon


That is an option and would work for me.

Though, the starting point should include "_" as allowed for first 
character for the reasons listed in my opening email.


Best regards,
Niels



Bug#1064454: debian-policy: Restrict deb822 field names more

2024-02-22 Thread Simon Josefsson
Would it make sense to change this to use an inclusive list of permitted
characters instead?  How about checking the field names that is in use
today, and construct a regexp of permitted symbols out of that?
Starting point: [A-Za-z][A-Za-z0-9-_]*

/Simon


signature.asc
Description: PGP signature


Bug#1064454: debian-policy: Restrict deb822 field names more

2024-02-22 Thread Niels Thykier

Package: debian-policy
Severity: wishlist
X-Debbugs-Cc: ni...@thykier.net
X-Debbugs-Cc: lintian-ma...@debian.org
X-Debbugs-Cc: de...@lists.debian.org
X-Debbugs-Cc: debian-d...@lists.debian.org

Hi

I am hoping we can agree to some tighter bounds on the field names 
followed by relevant tools implementing the same restrictions in our 
next versions. Ideally all deb822 parsers should follow suit; I have 
CC'ed the major ones I could think of.
 I will submit a MR for the  python-debian's parser if a change is 
agreed, which would eventually cover dak.


My rationale for this request is as follows.


The policy says the following about a field name:



The field name is composed of US-ASCII characters excluding control characters, 
space, and colon (i.e., characters in the ranges U+0021 (!) through U+0039 (9), 
and U+003B (;) through U+007E (~), inclusive). Field names must not begin with 
the comment character (U+0023 #), nor with the hyphen character (U+002D -).



Which leads to the interesting case that:

```
Package: foo
Depends: bar,
${shlib:Depends}
```

is a *valid* deb822 paragraph with 3 fields. The last being named 
`${shlib` with the value `Depends}`. I spotted this while debugging the 
python-debian parser that did not react to what I thought was a clear 
syntax error.


While the parser is technically correct given the Policy description 
above, I cannot see any case where this is a desirable outcome. Instead, 
I think we should classify this as a syntax error such that users are 
instructed to indent `${shlib:Depends}` token.


A simple fix would be to forbid `$` at the start of the field. Though, I 
think at this point it would make sense to prune more special characters 
from the list like `%&{}[]()/\\<>|$` from anywhere in the field. Note 
that we definitely need to keep `_` as it is used in debconf template 
files for translations. Both single and double underscore is used here, 
though they always come first in the field name.


The reason why I want a tighter bound is that currently, the following 
things are also considered valid field names:



|foo(>=1:2.0),
foo(>=2:2.0),
,foo(>=3:2.0)
,foo(>=4:2.0)[amd64]
)': We can make inverted crying smileys as field names!
`Foo`: Now with markdown formatting of the field name!
$(foo): Can we trick something into running shell commands?
/etc/passwd: Maybe path traversal


In all cases, I see no value to allowing these absurd field names and 
only potential downsides.



Best regards,
Niels

PS: I doubt anything actually uses field names in an unsafe manner any 
more. But we do have some history with deb822 files.


But someone will eventually try to do this. We already received a bug 
report against debhelper long ago about being able to do path traversal 
via questionable package names. Largely, this was considered a non-issue 
because dpkg in its default configuration would abort and debhelper has 
a "run arbitrary code via the upstream build system" feature anyway.


Additionally, 10+ years ago. lintian would create a file per field of 
the file it scanned. Here, if it had used perl's "2-arg" open, you might 
have been able to do something "fun". I do not remember if it did and 
the exploit is a decade overdue even if.