Bug#509935: decide whether Uploaders is parsed per RFC 5322
On Sun, Aug 03, 2014 at 04:51:28PM +1000, Stuart Prescott wrote:
> Control: block 686638 by 509935
>
> Hi!
>
> I quite like Jakub's suggestion that we use />\K\s*,\s*/ to split the list of Uploaders. It's very permissive and will suit our needs for this field but doesn't imply a large amount of overhead for parsers of the field or require parsers to deal with the full gamut of possibilities that the various RFCs would permit if we referenced only them.
>
> In practical terms, what is required now to wrap this up?

I would say:
- Review whether the current Uploaders fields are compliant with this.
- Review whether tools that parse the Uploaders field do so in a safe way.
- Actually write the proposal.

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here.
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Control: block 686638 by 509935

Hi!

I quite like Jakub's suggestion that we use />\K\s*,\s*/ to split the list of Uploaders. It's very permissive and will suit our needs for this field but doesn't imply a large amount of overhead for parsers of the field or require parsers to deal with the full gamut of possibilities that the various RFCs would permit if we referenced only them.

In practical terms, what is required now to wrap this up?

(Knowing how Uploaders should be split would then allow us to expose functionality to do this in python-debian.)

cheers
Stuart

-- 
Stuart Prescott    http://www.nanonanonano.net/   stu...@nanonanonano.net
Debian Developer   http://www.debian.org/         stu...@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7
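Editorial sketch: since the message above mentions exposing the split in python-debian, here is roughly what the rule could look like in Python. This is only an illustration under stated assumptions: the function name split_uploaders is hypothetical and is not an existing python-debian API, the example addresses are made up, and because Python's re module has no \K the sketch uses a lookbehind, (?<=>), to express "a comma that follows >".

```python
import re

# Split only on commas that directly follow the closing ">" of an address.
# (?<=>) is a lookbehind standing in for Perl's ">\K".
_UPLOADER_SEP = re.compile(r"(?<=>)\s*,\s*")


def split_uploaders(field):
    """Return the 'Name <email>' entries of an Uploaders value (hypothetical helper)."""
    field = field.strip()
    return _UPLOADER_SEP.split(field) if field else []


# Made-up field value: a comma inside the name, a space before the
# separating comma, and a folded continuation line.
uploaders = "Adam C. Powell, IV <adam@example.org> ,\n Jane Doe <jane@example.org>"
for entry in split_uploaders(uploaders):
    print(entry)
```

Because \s also matches tab and newline, the same pattern tolerates a separator that is followed by a line fold.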
Bug#509935: decide whether Uploaders is parsed per RFC 5322
* Russ Allbery <r...@debian.org>, 2011-09-08, 19:09:
>> I propose the following simple solution to this bug:
>>
>> - Let's forget about RFC 822/5322 compatibility, as it would introduce only needless complexity.
>> - Let's allow any punctuation characters in maintainer names and e-mail addresses *except* < and >. This way comma is completely disambiguated: it splits the field if and only if it's preceded by the > character. I.e. you can use the following Perl regex to split the field: />\K\s*,\s*/.
>
> Oh, hm, yeah, that would work.
>
>>> Currently, the only way to express such a name that works with our existing tools is to drop the comma, since several programs blindly split on commas when parsing the field.
>>
>> Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do you know any other tools that parse Uploaders?
>
> dak, of course, but it sounds from your message like it's already doing the right thing.

One thing it doesn't do right is that it doesn't allow for space before the comma. (We have a few packages in the archive with "> ," in the Uploaders field.)

Should other (than space) whitespace characters be allowed before/after comma as well?

> The PTS and DDPO -- I'm not sure what gets that data into those systems.

They both have their own, IMO over-engineered, parsers of Sources files.

PTS:

    def addresses_from_string(content):
        pattern = re.compile("([^>]),")
        hacked_content = pattern.sub("\\1WEWANTNOCOMMAS", content)
        msg = email.message_from_string("Header: " + hacked_content)
        hacked_list = email.Utils.getaddresses(msg.get_all("Header", []))
        list = map(lambda p: map(lambda s:string.replace(s,"WEWANTNOCOMMAS",","), p), hacked_list)
        return list

Again, PTS trips on a space before comma.

DDPO:

    my @uploaders = ($uploaders =~ /([^,@ ][^@]+@[^@]+)/g);
    $db{"com:$package"} = scalar @uploaders;
    foreach my $uploader (@uploaders) {
        my ($name, $mail);
        if ($uploader =~ /^\S+$/) {
            ($name, $mail) = ("(unknown)", $uploader);
            warn "Uploader without name: $package $uploader";
        } else {
            $uploader =~ /(.+) (.+)/ or warn "$fname:$.: syntax error in $uploader";
            ($name, $mail) = ($1, $2);
            $db{"name:$mail"} = $name;
        }
        $packages{$mail}->{$component}->{$package} = 1;
    }

DDPO doesn't allow for leading comma or @ in the maintainer's name, but that's a minor nitpick.

> UDD?

UDD uses Python's email.Utils.getaddresses(), so it will need fixing.

-- 
Jakub Wilk
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Jakub Wilk <jw...@debian.org> writes:

> One thing it doesn't do right is that it doesn't allow for space before the comma. (We have a few packages in the archive with "> ," in the Uploaders field.)

> Should other (than space) whitespace characters be allowed before/after comma as well?

The only other ones I can think of are newline and tab, which would be weird but which I believe are allowed by the syntax.

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#509935: decide whether Uploaders is parsed per RFC 5322
* Russ Allbery <r...@debian.org>, 2008-12-27, 12:27:
> Policy currently says the following about the Maintainer field, which applies by reference to the Uploaders field:
>
>     The package maintainer's name and email address. The name should come first, then the email address inside angle brackets <> (in RFC822 format).
>
>     If the maintainer's name contains a full stop then the whole field will not work directly as an email address due to a misfeature in the syntax specified in RFC822; a program using this field as an address must check for this and correct the problem if necessary (for example by putting the name in round brackets and moving it to the end, and bringing the email address forward).
>
> Most software has taken this to mean that the e-mail address should be in RFC822 format, not that the whole field should be.
>
> This is primarily posing a problem for people who have commas in their name. The main example to date is Adam C. Powell, IV, but it can happen with various other name qualifiers and honorifics.

I propose the following simple solution to this bug:

- Let's forget about RFC 822/5322 compatibility, as it would introduce only needless complexity.
- Let's allow any punctuation characters in maintainer names and e-mail addresses *except* < and >. This way comma is completely disambiguated: it splits the field if and only if it's preceded by the > character. I.e. you can use the following Perl regex to split the field: />\K\s*,\s*/.

One can easily check that this method does the right thing for parsing Uploaders fields of the existing packages: you could e.g. try this on ries:

$ zcat /srv/ftp.debian.org/mirror/dists/*/*/source/Sources.gz | grep-dctrl -ns Maintainer,Uploaders -e '' | perl -pe 's/>\K\s*,\s*/\n/g' | sort -u

Incidentally, this is (almost) the same method dak uses to split Uploaders:

$ grep -r uploaders.*split daklib/
daklib/dbconn.py:for up in u.pkg.dsc["uploaders"].replace(">, ", ">\t").split("\t"):

> Currently, the only way to express such a name that works with our existing tools is to drop the comma, since several programs blindly split on commas when parsing the field.

Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do you know any other tools that parse Uploaders?

-- 
Jakub Wilk
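Editorial sketch: the "(almost) the same method" remark can be made concrete by putting the two approaches side by side. This assumes the dak line is the replace-then-split idiom that rewrites each `>, ` boundary to a tab. The function names and the example.org addresses below are invented for illustration, and the regex is the lookbehind spelling that Python needs in place of \K.

```python
import re


def split_like_dak(field):
    # Mark each ">, " boundary with a tab, then split on the tab.
    return field.replace(">, ", ">\t").split("\t")


def split_with_regex(field):
    # Python spelling of />\K\s*,\s*/ using a lookbehind instead of \K.
    return re.split(r"(?<=>)\s*,\s*", field)


tidy = "A. Person, Jr. <a@example.org>, B Person <b@example.org>"
spaced = "A. Person, Jr. <a@example.org> , B Person <b@example.org>"

print(split_like_dak(tidy) == split_with_regex(tidy))  # both find two entries
print(len(split_like_dak(spaced)))                     # the literal ">, " is absent
print(len(split_with_regex(spaced)))                   # whitespace around "," is tolerated
```

The two agree on tidy input and differ only when whitespace appears between `>` and the comma, which is the case where the regex is the more permissive of the two.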
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Jakub Wilk <jw...@debian.org> writes:

> I propose the following simple solution to this bug:

> - Let's forget about RFC 822/5322 compatibility, as it would introduce only needless complexity.
> - Let's allow any punctuation characters in maintainer names and e-mail addresses *except* < and >. This way comma is completely disambiguated: it splits the field if and only if it's preceded by the > character. I.e. you can use the following Perl regex to split the field: />\K\s*,\s*/.

Oh, hm, yeah, that would work.

>> Currently, the only way to express such a name that works with our existing tools is to drop the comma, since several programs blindly split on commas when parsing the field.

> Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do you know any other tools that parse Uploaders?

dak, of course, but it sounds from your message like it's already doing the right thing. The PTS and DDPO -- I'm not sure what gets that data into those systems. UDD?

I think your solution sounds excellent.

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#509935: decide whether Uploaders is parsed per RFC 5322
On Sun, Jan 18, 2009 at 06:24:46PM -0800, Russ Allbery wrote:
> Thank you for the concrete wording proposal!
>
> Clint Adams <sch...@debian.org> writes:
>
>> While I think it would be fine to have a comprehensive and accurate specification, something like this could be an easy improvement. By omitting mention of RFC 822, the mandate for UTF-8 in the control file should obviate RFC 2047 encoding. Despite underspecifying things, I doubt there will be anyone trying to use email addresses of the wrong form.
>>
>> diff --git a/policy.sgml b/policy.sgml
>> index 7de382d..080229c 100644
>> --- a/policy.sgml
>> +++ b/policy.sgml
>> @@ -2582,17 +2582,14 @@ Package: libc6
>>      <p>
>>        The package maintainer's name and email address. The name
>>        should come first, then the email address inside angle
>> -      brackets <tt>&lt;&gt;</tt> (in RFC822 format).
>> +      brackets <tt>&lt;&gt;</tt>.
>>      </p>
>
> We could say that the e-mail address must be an RFC 5322 addr-spec without obs-* rules so that we don't lose the restriction on what the e-mail address should be like.
>
> I wonder if we should also prohibit domain-literal. We allow it now, but there are no uses of it in the archive.
>
>>      <p>
>> -      If the maintainer's name contains a full stop then the
>> -      whole field will not work directly as an email address due
>> -      to a misfeature in the syntax specified in RFC822; a
>> -      program using this field as an address must check for this
>> -      and correct the problem if necessary (for example by
>> -      putting the name in round brackets and moving it to the
>> -      end, and bringing the email address forward).
>> +      If the maintainer's name contains a full stop or a comma,
>> +      the entire name must either be surrounded by quotation marks
>> +      or put within round brackets and moved it to the end
>> +      (thus bringing the email address forward).
>>      </p>
>>    </sect1>
>
> We should say explicitly that the quotation marks are not part of the maintainer's name. Should we say something about whether the maintainer name can be quoted even if it doesn't contain a comma?
>
> I'd like to maintain the current allowance for not quoting the maintainer name even if it contains a full stop, despite the RFC 5322 requirement to quote addresses that contain full stops. Among other things, people who use initials in their maintainer names don't currently do the quoting and I don't really want to make those packages buggy.
>
> I think we can safely prohibit for our purposes the em...@address (Name) form. There are no occurrences of it in the archive.
>
> Whatever we say here we should probably also say in section 4.4 (the changelog specification). Maintainers should use the same form of the name and be able to do the same quoting in both places.

While I can only agree on the technical ground of this proposal, I have quite a number of scripts (including popcon) that depend on the ability to extract the maintainer name from the Maintainer/Uploaders field. I suspect other developers and debian-qa might have others. Adding quotes around the maintainer name breaks the interface somehow.

Using the full Maintainer field is often problematic because:
1) we might not want to display the email address.
2) we might want to merge entries from the same maintainer using different email addresses for different packages. (popcon goes further and checks for different capitalization).

So I would suggest we keep the format 'Name <email>' and forbid dots and commas. Developers that need them could use UTF-8 variants of those.

Alternatively, debian-policy could spell out the correct regexp to extract the Maintainer name, but there will be a lot of scripts to update.

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here.
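Editorial sketch: the two needs described above (showing the name without the address, and merging one maintainer's entries across addresses and capitalizations) could look roughly like this. It assumes each entry has the plain 'Name <email>' shape with the address last. The helper names and the addresses are made up, and this is not how popcon itself does it.

```python
import re
from collections import defaultdict

# Assumed entry shape: free-form name, optional spaces, then <email> at the end.
_ENTRY = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^<>]+)>$")


def parse_entry(entry):
    """Split one 'Name <email>' entry into (name, email)."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError("not in 'Name <email>' form: %r" % entry)
    return match.group("name"), match.group("email")


def group_by_name(entries):
    """Collect the addresses seen for each name, ignoring capitalization."""
    groups = defaultdict(set)
    for entry in entries:
        name, email = parse_entry(entry)
        groups[name.casefold()].add(email)
    return dict(groups)


print(parse_entry("Adam C. Powell, IV <adam@example.org>"))
print(group_by_name(["Jane Doe <jane@example.org>", "jane doe <jd@example.net>"]))
```

Anchoring on the trailing <email> rather than splitting on punctuation is what lets dots and commas in the name pass through untouched.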
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Bill Allombert <bill.allomb...@math.u-bordeaux1.fr> writes:

> While I can only agree on the technical ground of this proposal, I have quite a number of scripts (including popcon) that depend on the ability to extract the maintainer name from the Maintainer/Uploaders field. I suspect other developers and debian-qa might have others. Adding quotes around the maintainer name breaks the interface somehow.

> Using the full Maintainer field is often problematic because:
> 1) we might not want to display the email address.
> 2) we might want to merge entries from the same maintainer using different email addresses for different packages. (popcon goes further and checks for different capitalization).

> So I would suggest we keep the format 'Name <email>' and forbid dots and commas. Developers that need them could use UTF-8 variants of those.

Well, I really don't want to prohibit dots. We allow dots now and they don't pose any problems, other than the note in Policy that you need to put quotes around the name if you use it in an e-mail To: field (which presumably all of our software already deals with).

Your point about not wanting to change software that parses the name is well-taken. I think, though, that if we say that you may only put double-quotes around the name if there is a comma in the name and otherwise the quotes should be omitted, that would minimize the problem. Only a handful of existing maintainers would be affected (namely those maintainers who are having trouble right now), so updating software wouldn't be that urgent.

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
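Editorial sketch: the "quote only when there is a comma" rule suggested above is small enough to write down. Both helper names are hypothetical and the addresses are invented. The sketch also assumes, as proposed elsewhere in this thread, that quotes wrapping the whole name are syntax rather than part of the name, and it does not attempt to handle a name that itself contains a double quote.

```python
def format_entry(name, email):
    """Render 'Name <email>', adding double quotes only if the name has a comma."""
    shown = '"%s"' % name if "," in name else name
    return "%s <%s>" % (shown, email)


def display_name(raw_name):
    """Strip one pair of double quotes that wraps the entire name, if present."""
    if len(raw_name) >= 2 and raw_name[0] == '"' and raw_name[-1] == '"':
        return raw_name[1:-1]
    return raw_name


print(format_entry("Adam C. Powell, IV", "adam@example.org"))
print(format_entry("J. R. Hacker", "jrh@example.org"))
print(display_name('"Adam C. Powell, IV"'))
```

Under this rule a name with initials stays unquoted, so software that extracts names today would only see a change for the handful of names with commas.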
Bug#509935: decide whether Uploaders is parsed per RFC 5322
* Russ Allbery [Mon, 19 Jan 2009 12:10:55 -0800]:

> Bill Allombert <bill.allomb...@math.u-bordeaux1.fr> writes:

>> While I can only agree on the technical ground of this proposal, I have quite a number of scripts (including popcon) that depend on the ability to extract the maintainer name from the Maintainer/Uploaders field. I suspect other developers and debian-qa might have others. Adding quotes around the maintainer name breaks the interface somehow.

>> Using the full Maintainer field is often problematic because:
>> 1) we might not want to display the email address.
>> 2) we might want to merge entries from the same maintainer using different email addresses for different packages. (popcon goes further and checks for different capitalization).

>> So I would suggest we keep the format 'Name <email>' and forbid dots and commas. Developers that need them could use UTF-8 variants of those.

> Well, I really don't want to prohibit dots. We allow dots now and they don't pose any problems, other than the note in Policy that you need to put quotes around the name if you use it in an e-mail To: field (which presumably all of our software already deals with).

I think dots should be allowed, yes, and be allowed unquoted.

> Your point about not wanting to change software that parses the name is well-taken. I think, though, that if we say that you may only put double-quotes around the name if there is a comma in the name and otherwise the quotes should be omitted, that would minimize the problem. Only a handful of existing maintainers would be affected (namely those maintainers who are having trouble right now), so updating software wouldn't be that urgent.

I think we should *consider* doing without commas at all, if losing them is something we could live with. I realize that would be annoying for people that have a comma in their name, so I'm not right away saying we should forbid them.

But I really think we should consider it, because even if commas have to be quoted, you've already lost the ability to parse the Uploaders field with split /\s*,\s*/, which I think would be a loss, since that works for all other fields.

(Oh, and if we do without commas, we should do without quoting as well IMHO.)

Just my 2¢,

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org

- Are you sure we're good?
- Always.
        -- Rory and Lorelai
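Editorial sketch: the point that quoting does not rescue the plain comma split is easy to check. The field value below is invented, and naive_split is just a Python rendering of split /\s*,\s*/.

```python
import re


def naive_split(field):
    # Python rendering of Perl's split /\s*,\s*/
    return re.split(r"\s*,\s*", field)


field = '"Adam C. Powell, IV" <adam@example.org>, Jane Doe <jane@example.org>'
for piece in naive_split(field):
    print(repr(piece))
```

The quoted name is cut at its inner comma, so two uploaders come out as three fragments. Any parser that wants to honour the quotes has to do more than split.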
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Adeodato Simó <d...@net.com.org.es> writes:

> I think we should *consider* doing without commas at all, if losing them is something we could live with. I realize that would be annoying for people that have a comma in their name, so I'm not right away saying we should forbid them.

> But I really think we should consider it, because even if commas have to be quoted, you've already lost the ability to parse the Uploaders field with split /\s*,\s*/, which I think would be a loss, since that works for all other fields.

> (Oh, and if we do without commas, we should do without quoting as well IMHO.)

It would certainly make it easier for software. I have to admit to a personal bias (speaking as someone who goes by his middle name rather than his first name) in favor of fixing software to accurately recognize people's names rather than the other way around. I personally find software that refuses to recognize my name the way that I spell it to be quite obnoxious, so I'm sympathetic to people who have commas in their name.

But yes, allowing commas, even quoted, does complicate Uploaders parsing quite a bit over the current simple state.

Bill mentioned the possibility of a Unicode comma other than the ASCII comma. Does such a thing exist? It's kind of a hack, but it's also an interesting compromise. I'm not sure why there would be such a thing, though, given that there's a perfectly good comma in the ASCII range and Unicode normally doesn't duplicate code points to no purpose.

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#509935: decide whether Uploaders is parsed per RFC 5322
On Mon, Jan 19, 2009 at 12:36:56PM -0800, Russ Allbery wrote:
> Bill mentioned the possibility of a Unicode comma other than the ASCII comma. Does such a thing exist? It's kind of a hack, but it's also an interesting compromise. I'm not sure why there would be such a thing, though, given that there's a perfectly good comma in the ASCII range and Unicode normally doesn't duplicate code points to no purpose.

There are several other commas that have code points, but IMHO none of them would be an adequate fit for this given that the glyphs differ. The one with the closest glyph would be U+FE50 SMALL COMMA, but that appears to be a fullwidth character.

-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slanga...@ubuntu.com                                     vor...@debian.org
Bug#509935: decide whether Uploaders is parsed per RFC 5322
On Mon, Jan 19, 2009 at 12:36:56PM -0800, Russ Allbery wrote:
> Adeodato Simó <d...@net.com.org.es> writes:
>
>> I think we should *consider* doing without commas at all, if losing them is something we could live with. I realize that would be annoying for people that have a comma in their name, so I'm not right away saying we should forbid them.
>
>> But I really think we should consider it, because even if commas have to be quoted, you've already lost the ability to parse the Uploaders field with split /\s*,\s*/, which I think would be a loss, since that works for all other fields.
>
>> (Oh, and if we do without commas, we should do without quoting as well IMHO.)
>
> It would certainly make it easier for software. I have to admit to a personal bias (speaking as someone who goes by his middle name rather than his first name) in favor of fixing software to accurately recognize people's names rather than the other way around. I personally find software that refuses to recognize my name the way that I spell it to be quite obnoxious, so I'm sympathetic to people who have commas in their name.
>
> But yes, allowing commas, even quoted, does complicate Uploaders parsing quite a bit over the current simple state.

In any case, if commas are allowed, policy should spell out the correct regexp to parse the Uploaders field.

> Bill mentioned the possibility of a Unicode comma other than the ASCII comma. Does such a thing exist? It's kind of a hack, but it's also an interesting compromise. I'm not sure why there would be such a thing, though, given that there's a perfectly good comma in the ASCII range and Unicode normally doesn't duplicate code points to no purpose.

I have the exact opposite experience with unicode :)
U+FF0C FULLWIDTH COMMA should do the trick.

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here.
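Editorial sketch: the two code points named in this sub-thread can be inspected with Python's standard unicodedata module. The sample name is only an illustration. One caveat the sketch makes visible, and which is an observation rather than something raised in the thread, is that NFKC compatibility normalization folds U+FF0C back into the ASCII comma, so the trick only survives in tools that do not normalize that way.

```python
import unicodedata

# The ASCII comma and the two alternatives mentioned in the thread.
for ch in ("\u002c", "\ufe50", "\uff0c"):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

name = "Adam C. Powell\uff0c IV"

# A split on the ASCII comma leaves the name in one piece...
print(name.split(","))

# ...but compatibility normalization brings the ASCII comma back.
print("," in unicodedata.normalize("NFKC", name))
```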
Bug#509935: decide whether Uploaders is parsed per RFC 5322
On Wed, Jan 14, 2009 at 10:26:03PM -0800, Russ Allbery wrote:
> I'm leaning that way as well. I also don't want to require people to use RFC 2047 encoding if they have a name that doesn't fit into ASCII.
>
> Anyone have any suggestions on a good subset and description of it that isn't too complex?

While I think it would be fine to have a comprehensive and accurate specification, something like this could be an easy improvement. By omitting mention of RFC 822, the mandate for UTF-8 in the control file should obviate RFC 2047 encoding. Despite underspecifying things, I doubt there will be anyone trying to use email addresses of the wrong form.

diff --git a/policy.sgml b/policy.sgml
index 7de382d..080229c 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2582,17 +2582,14 @@ Package: libc6
     <p>
       The package maintainer's name and email address. The name
       should come first, then the email address inside angle
-      brackets <tt>&lt;&gt;</tt> (in RFC822 format).
+      brackets <tt>&lt;&gt;</tt>.
     </p>

     <p>
-      If the maintainer's name contains a full stop then the
-      whole field will not work directly as an email address due
-      to a misfeature in the syntax specified in RFC822; a
-      program using this field as an address must check for this
-      and correct the problem if necessary (for example by
-      putting the name in round brackets and moving it to the
-      end, and bringing the email address forward).
+      If the maintainer's name contains a full stop or a comma,
+      the entire name must either be surrounded by quotation marks
+      or put within round brackets and moved it to the end
+      (thus bringing the email address forward).
     </p>
   </sect1>
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Thank you for the concrete wording proposal!

Clint Adams <sch...@debian.org> writes:

> While I think it would be fine to have a comprehensive and accurate specification, something like this could be an easy improvement. By omitting mention of RFC 822, the mandate for UTF-8 in the control file should obviate RFC 2047 encoding. Despite underspecifying things, I doubt there will be anyone trying to use email addresses of the wrong form.

> diff --git a/policy.sgml b/policy.sgml
> index 7de382d..080229c 100644
> --- a/policy.sgml
> +++ b/policy.sgml
> @@ -2582,17 +2582,14 @@ Package: libc6
>      <p>
>        The package maintainer's name and email address. The name
>        should come first, then the email address inside angle
> -      brackets <tt>&lt;&gt;</tt> (in RFC822 format).
> +      brackets <tt>&lt;&gt;</tt>.
>      </p>

We could say that the e-mail address must be an RFC 5322 addr-spec without obs-* rules so that we don't lose the restriction on what the e-mail address should be like.

I wonder if we should also prohibit domain-literal. We allow it now, but there are no uses of it in the archive.

>      <p>
> -      If the maintainer's name contains a full stop then the
> -      whole field will not work directly as an email address due
> -      to a misfeature in the syntax specified in RFC822; a
> -      program using this field as an address must check for this
> -      and correct the problem if necessary (for example by
> -      putting the name in round brackets and moving it to the
> -      end, and bringing the email address forward).
> +      If the maintainer's name contains a full stop or a comma,
> +      the entire name must either be surrounded by quotation marks
> +      or put within round brackets and moved it to the end
> +      (thus bringing the email address forward).
>      </p>
>    </sect1>

We should say explicitly that the quotation marks are not part of the maintainer's name. Should we say something about whether the maintainer name can be quoted even if it doesn't contain a comma?

I'd like to maintain the current allowance for not quoting the maintainer name even if it contains a full stop, despite the RFC 5322 requirement to quote addresses that contain full stops. Among other things, people who use initials in their maintainer names don't currently do the quoting and I don't really want to make those packages buggy.

I think we can safely prohibit for our purposes the em...@address (Name) form. There are no occurrences of it in the archive.

Whatever we say here we should probably also say in section 4.4 (the changelog specification). Maintainers should use the same form of the name and be able to do the same quoting in both places.

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Russ Allbery wrote:

> Alternatively, we could document the permitted character set for the name portion of the Maintainer field and exclude commas. It's annoying to do this since commas have been supported in the past (in Maintainer, they're unambiguous) and have only become a problem in Uploaders. We could only restrict them in Uploaders, but the lack of symmetry strikes me as a bad idea.

I think it is not polite to force changes in maintainer names.

> We could also standardize a simple escaping mechanism of our own (allow double quotes, for example, but require that, if used, they surround the entire name and are stripped off by the parsing).
>
> However we resolve this, we should probably also update the reference in Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really want to support source-routed e-mail addresses or similar bizarreness in Debian control files.

Hmm, RFC5322 is not yet a standard (BTW it is not yet cited in STD1), and anyway it still uses the old semantics for compatibility (see the obs- references, e.g. the section 4.4).

IMHO we should specify a subset of RFC 822, because a full 5322 parse is IMO too complex (and BTW not so useful) to implement in all the tools.

Possibly we could require that only a subset be used in the control file, and recommend a full 5322 parse in the tools.

ciao
cate
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Giacomo A. Catenazzi <c...@debian.org> writes:
> Russ Allbery wrote:

>> We could also standardize a simple escaping mechanism of our own (allow double quotes, for example, but require that, if used, they surround the entire name and are stripped off by the parsing).

>> However we resolve this, we should probably also update the reference in Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really want to support source-routed e-mail addresses or similar bizarreness in Debian control files.

> Hmm, RFC5322 is not yet a standard (BTW it is not yet cited in STD1),

This is true, but it's essentially meaningless. It's sort of an artifact of the IETF process, but RFC 822 is for practical purposes obsolete and RFC 5322 reflects the current state of addressing standards.

> and anyway it still uses the old semantics for compatibility (see the obs- references, e.g. the section 4.4).

True. We should explicitly rule that out.

> IMHO we should specify a subset of RFC 822, because a full 5322 parse is IMO too complex (and BTW not so useful) to implement in all the tools.

> Ev. require to use only a subset in the control file, and to recommend a full 5322 parsing in the tools.

I'm leaning that way as well. I also don't want to require people to use RFC 2047 encoding if they have a name that doesn't fit into ASCII.

Anyone have any suggestions on a good subset and description of it that isn't too complex?

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/
Bug#509935: decide whether Uploaders is parsed per RFC 5322
Package: debian-policy
Version: 3.8.0.1
Severity: wishlist

I think we've discussed this before, but I didn't see an open bug, so I'll open one so that we can discuss it in one place.

Policy currently says the following about the Maintainer field, which applies by reference to the Uploaders field:

    The package maintainer's name and email address. The name should come first, then the email address inside angle brackets <> (in RFC822 format).

    If the maintainer's name contains a full stop then the whole field will not work directly as an email address due to a misfeature in the syntax specified in RFC822; a program using this field as an address must check for this and correct the problem if necessary (for example by putting the name in round brackets and moving it to the end, and bringing the email address forward).

Most software has taken this to mean that the e-mail address should be in RFC822 format, not that the whole field should be.

This is primarily posing a problem for people who have commas in their name. The main example to date is Adam C. Powell, IV, but it can happen with various other name qualifiers and honorifics. Currently, the only way to express such a name that works with our existing tools is to drop the comma, since several programs blindly split on commas when parsing the field.

The most fully technically correct approach would be to require a full RFC 5322 parse, but that adds a lot of complexity and raises the problem that there's no standard canonicalization of RFC 5322 header fields. It becomes unclear whether one should strip off double quotes, remove backslashes, remove portions in parentheses, or other things that would be logical to do from the RFC 5322 grammar.

Alternatively, we could document the permitted character set for the name portion of the Maintainer field and exclude commas. It's annoying to do this since commas have been supported in the past (in Maintainer, they're unambiguous) and have only become a problem in Uploaders. We could only restrict them in Uploaders, but the lack of symmetry strikes me as a bad idea.

We could also standardize a simple escaping mechanism of our own (allow double quotes, for example, but require that, if used, they surround the entire name and are stripped off by the parsing).

However we resolve this, we should probably also update the reference in Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really want to support source-routed e-mail addresses or similar bizarreness in Debian control files.

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

debian-policy depends on no packages.

debian-policy recommends no packages.

Versions of packages debian-policy suggests:
ii  doc-base                      0.8.18     utilities to manage online documen

-- no debconf information