Bug#509935: decide whether Uploaders is parsed per RFC 5322

2014-08-08 Thread Bill Allombert
On Sun, Aug 03, 2014 at 04:51:28PM +1000, Stuart Prescott wrote:
 Control: block 686638 by 509935
 
 Hi!
 
 I quite like Jakub's suggestion that we use />\K\s*,\s*/ to split the list of
 Uploaders. It's very permissive and will suit our needs for this field but 
 doesn't imply a large amount of overhead for parsers of the field or require 
 parsers to deal with the full gamut of possibilities that the various RFCs 
 would permit if we referenced only them.
 
 In practical terms, what is required now to wrap this up? 

I would say:

- Review whether the current Uploaders fields are compliant with this.

- Review whether tools that parse the Uploaders field do so in a
safe way.

- Actually write the proposal.

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here. 


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#509935: decide whether Uploaders is parsed per RFC 5322

2014-08-03 Thread Stuart Prescott
Control: block 686638 by 509935

Hi!

I quite like Jakub's suggestion that we use />\K\s*,\s*/ to split the list of 
Uploaders. It's very permissive and will suit our needs for this field but 
doesn't imply a large amount of overhead for parsers of the field or require 
parsers to deal with the full gamut of possibilities that the various RFCs 
would permit if we referenced only them.

In practical terms, what is required now to wrap this up? 

(Knowing how Uploaders should be split would then allow us to expose 
functionality to do this in python-debian.)

cheers
Stuart

-- 
Stuart Prescott    http://www.nanonanonano.net/   stu...@nanonanonano.net
Debian Developer   http://www.debian.org/         stu...@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7





Bug#509935: decide whether Uploaders is parsed per RFC 5322

2011-09-09 Thread Jakub Wilk

* Russ Allbery <r...@debian.org>, 2011-09-08, 19:09:
>> I propose the following simple solution to this bug:
>> - Let's forget about RFC 822/5322 compatibility, as it would introduce
>> only needless complexity.
>> - Let's allow any punctuation characters in maintainer names and
>> e-mail addresses *except* < and >.
>>
>> This way comma is completely disambiguated: it splits the field if and
>> only if it's preceded by the > character. I.e. you can use the
>> following Perl regex to split the field: />\K\s*,\s*/.
>
> Oh, hm, yeah, that would work.
>
>>> Currently, the only way to express such a name that works with our
>>> existing tools is to drop the comma, since several programs blindly
>>> split on commas when parsing the field.
>>
>> Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do
>> you know any other tools that parse Uploaders?
>
> dak, of course, but it sounds from your message like it's already doing
> the right thing.

One thing it doesn't do right is that it doesn't allow for space before
the comma. (We have a few packages in the archive with "> , " in the
Uploaders field.) Should other (than space) whitespace characters be
allowed before/after comma as well?

> The PTS and DDPO -- I'm not sure what gets that data into those
> systems.

They both have their own, IMO over-engineered, parsers of Sources files.

PTS:

  def addresses_from_string(content):
      pattern = re.compile("([^>]),")
      hacked_content = pattern.sub("\\1WEWANTNOCOMMAS", content)
      msg = email.message_from_string("Header: " + hacked_content)
      hacked_list = email.Utils.getaddresses(msg.get_all("Header", []))
      list = map(lambda p:
                 map(lambda s: string.replace(s, "WEWANTNOCOMMAS", ","), p),
                 hacked_list)
      return list

Again, PTS trips on a space before comma.

DDPO:

  my @uploaders = ($uploaders =~ /([^,@ ][^@]+@[^@]+)/g);
  $db{"com:$package"} = scalar @uploaders;
  foreach my $uploader (@uploaders) {
      my ($name, $mail);
      if ($uploader =~ /^\S+$/) {
          ($name, $mail) = ("(unknown)", $uploader);
          warn "Uploader without name: $package $uploader";
      } else {
          $uploader =~ /(.+) <(.+)>/
              or warn "$fname:$.: syntax error in $uploader";
          ($name, $mail) = ($1, $2);
          $db{"name:$mail"} = $name;
      }
      $packages{$mail}->{$component}->{$package} = 1;
  }
  }

DDPO doesn't allow for leading comma or @ in the maintainer's name, but 
that's a minor nitpick.



> UDD?


UDD uses Python's email.Utils.getaddresses(), so it will need fixing.

--
Jakub Wilk






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2011-09-09 Thread Russ Allbery
Jakub Wilk <jw...@debian.org> writes:

 One thing it doesn't do right is that it doesn't allow for space before
 the comma. (We have a few packages in the archive with "> , " in the
 Uploaders field.) Should other (than space) whitespace characters be
 allowed before/after comma as well?

The only other ones I can think of are newline and tab, which would be
weird but which I believe are allowed by the syntax.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2011-09-08 Thread Jakub Wilk

* Russ Allbery <r...@debian.org>, 2008-12-27, 12:27:

> Policy currently says the following about the Maintainer field, which
> applies by reference to the Uploaders field:
>
>    The package maintainer's name and email address. The name should come
>    first, then the email address inside angle brackets <> (in RFC822
>    format).
>
>    If the maintainer's name contains a full stop then the whole field
>    will not work directly as an email address due to a misfeature in the
>    syntax specified in RFC822; a program using this field as an address
>    must check for this and correct the problem if necessary (for example
>    by putting the name in round brackets and moving it to the end, and
>    bringing the email address forward).
>
> Most software has taken this to mean that the e-mail address should be
> in RFC822 format, not that the whole field should be.
>
> This is primarily posing a problem for people who have commas in their
> name.  The main example to date is Adam C. Powell, IV, but it can happen
> with various other name qualifiers and honorifics.


I propose the following simple solution to this bug:
- Let's forget about RFC 822/5322 compatibility, as it would introduce
only needless complexity.
- Let's allow any punctuation characters in maintainer names and e-mail
addresses *except* < and >.

This way comma is completely disambiguated: it splits the field if and
only if it's preceded by the > character. I.e. you can use the following
Perl regex to split the field: />\K\s*,\s*/.
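
For tools written in Python, the same rule (a comma separates entries only when it directly follows a closing angle bracket) might be sketched as below. This is only an illustration, not code from any existing tool: the function name `split_uploaders` and the example.org addresses are made up, and a lookbehind stands in for Perl's `\K`.

```python
import re

# A comma is a separator only when the character just before the
# (optional) whitespace is ">". The lookbehind (?<=>) keeps the ">"
# in the preceding entry, much like ">\K" does in the Perl regex.
_UPLOADERS_SEP = re.compile(r"(?<=>)\s*,\s*")


def split_uploaders(value):
    """Return the individual 'Name <email>' entries of an Uploaders value.

    Hypothetical helper: assumes names and addresses never contain
    '<' or '>' themselves, as the proposal requires.
    """
    return [entry for entry in _UPLOADERS_SEP.split(value.strip()) if entry]


# Commas inside names survive; spaces, tabs and newlines around the
# separating comma are swallowed by \s*.
field = (
    "Adam C. Powell, IV <adam@example.org> , "
    "Jane Doe <jane@example.org>,\n John Roe <john@example.org>"
)
for entry in split_uploaders(field):
    print(entry)
```

Because `\s` also matches tab and newline, a field folded over several lines is handled without extra code.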


One can easily check that this method does the right thing for parsing 
Uploaders fields of the existing packages: you could e.g. try this on 
ries:

$ zcat /srv/ftp.debian.org/mirror/dists/*/*/source/Sources.gz \
    | grep-dctrl -ns Maintainer,Uploaders -e '' \
    | perl -pe 's/>\K\s*,\s*/\n/g' | sort -u

Incidentally, this is (almost) the same method dak uses to split 
Uploaders:


$ grep -r uploaders.*split daklib/
daklib/dbconn.py:for up in u.pkg.dsc["uploaders"].replace(">, ", ">\t").split("\t"):
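
To see where the "almost" comes from, here is a small sketch that puts the two approaches side by side. `split_like_dak` only mimics the string operations in the daklib line as I read it; both function names and the sample addresses are invented for the comparison.

```python
import re


def split_like_dak(value):
    # Same idea as the daklib line: turn ">, " into ">\t", then split on tabs.
    return value.replace(">, ", ">\t").split("\t")


def split_with_regex(value):
    # The proposed rule: any whitespace may surround the separating comma.
    return re.split(r"(?<=>)\s*,\s*", value)


tidy = "A. Person, Jr. <a@example.org>, B. Person <b@example.org>"
spaced = "A. Person, Jr. <a@example.org> , B. Person <b@example.org>"

print(split_like_dak(tidy))      # two entries
print(split_like_dak(spaced))    # one entry: "> ," never matches ">, "
print(split_with_regex(spaced))  # two entries
```

The two agree whenever the separator is exactly a comma followed by one space; they differ as soon as whitespace appears before the comma.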

> Currently, the only way to express such a name that works with our
> existing tools is to drop the comma, since several programs blindly
> split on commas when parsing the field.


Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do you 
know any other tools that parse Uploaders?


--
Jakub Wilk






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2011-09-08 Thread Russ Allbery
Jakub Wilk jw...@debian.org writes:

 I propose the following simple solution to this bug:
 - Let's forget about RFC 822/5322 compatibility, as it would introduce
 only needless complexity.
 - Let's allow any punctuation characters in maintainer names and e-mail
 addresses *except* < and >.

 This way comma is completely disambiguated: it splits the field if and
 only if it's preceded by the > character. I.e. you can use the following
 Perl regex to split the field: />\K\s*,\s*/.

Oh, hm, yeah, that would work.

 Currently, the only way to express such a name that works with our
 existing tools is to drop the comma, since several programs blindly
 split on commas when parsing the field.

 Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do you
 know any other tools that parse Uploaders?

dak, of course, but it sounds from your message like it's already doing
the right thing.  The PTS and DDPO -- I'm not sure what gets that data
into those systems.  UDD?

I think your solution sounds excellent.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-19 Thread Bill Allombert
On Sun, Jan 18, 2009 at 06:24:46PM -0800, Russ Allbery wrote:
 Thank you for the concrete wording proposal!
 
 Clint Adams sch...@debian.org writes:
 
  While I think it would be fine to have a comprehensive and accurate
  specification, something like this could be an easy improvement.
 
  By omitting mention of RFC 822, the mandate for UTF-8 in the control
  file should obviate RFC 2047 encoding.
 
  Despite underspecifying things, I doubt there will be anyone trying
  to use email addresses of the wrong form.
 
  diff --git a/policy.sgml b/policy.sgml
  index 7de382d..080229c 100644
  --- a/policy.sgml
  +++ b/policy.sgml
  @@ -2582,17 +2582,14 @@ Package: libc6
      <p>
        The package maintainer's name and email address.  The name
        should come first, then the email address inside angle
  -     brackets <tt>&lt;&gt;</tt> (in RFC822 format).
  +     brackets <tt>&lt;&gt;</tt>.
      </p>
 
 We could say that the e-mail address must be an RFC 5322 addr-spec without
 obs-* rules so that we don't lose the restriction on what the e-mail
 address should be like.
 
 I wonder if we should also prohibit domain-literal.  We allow it now, but
 there are no uses of it in the archive.
 
      <p>
  -     If the maintainer's name contains a full stop then the
  -     whole field will not work directly as an email address due
  -     to a misfeature in the syntax specified in RFC822; a
  -     program using this field as an address must check for this
  -     and correct the problem if necessary (for example by
  -     putting the name in round brackets and moving it to the
  -     end, and bringing the email address forward).
  +     If the maintainer's name contains a full stop or a comma,
  +     the entire name must either be surrounded by quotation marks
  +     or put within round brackets and moved it to the end
  +     (thus bringing the email address forward).
      </p>
    </sect1>
 
 We should say explicitly that the quotation marks are not part of the
 maintainer's name.  Should we say something about whether the maintainer
 name can be quoted even if it doesn't contain a comma?
 
 I'd like to maintain the current allowance for not quoting the maintainer
 name even if it contains a full stop, despite the RFC 5322 requirement to
 quote addresses that contain full stops.  Among other things, people who
 use initials in their maintainer names don't currently do the quoting and
 I don't really want to make those packages buggy.
 
 I think we can safely prohibit for our purposes the em...@address (Name)
 form.  There are no occurrences of it in the archive.
 
 Whatever we say here we should probably also say in section 4.4 (the
 changelog specification).  Maintainers should use the same form of the
 name and be able to do the same quoting in both places.

While I can only agree on the technical ground of this proposal, I have
quite a number of scripts (including popcon) that depend on the ability
to extract the maintainer name from the Maintainer/Uploaders field. I suspect
other developers and debian-qa might have more.

Adding quotes around the maintainer name breaks the interface somehow.

Using the full Maintainer field is often problematic because:
1) we might not want to display the email address.
2) we might want to merge entries from the same maintainer using
different email addresses for different packages. (popcon goes further
and checks for different capitalization).

So I would suggest we keep the format 'Name <email>' and forbid dots and
commas. Developers that need them could use UTF-8 variants of those.

Alternatively, debian-policy could spell out the correct regexp to
extract the Maintainer name, but there will be a lot of scripts to
update.
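
As a rough idea of what such a spelled-out regexp could look like, here is a Python sketch for a single 'Name <email>' entry. It is not taken from Policy or from any script mentioned here: the pattern, the helper names `parse_entry` and `merge_by_name`, and the sample addresses are all assumptions, and it presumes that neither the name nor the address contains '<' or '>'.

```python
import re
from collections import defaultdict

# Name: everything before the last "<...>" group, trimmed.
# Email: whatever sits between the angle brackets.
_ENTRY = re.compile(r"^\s*(?P<name>[^<>]*?)\s*<(?P<email>[^<>]+)>\s*$")


def parse_entry(entry):
    """Split one 'Name <email>' entry into (name, email)."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"not in 'Name <email>' form: {entry!r}")
    return match.group("name"), match.group("email")


def merge_by_name(entries):
    """Group addresses by maintainer name, ignoring capitalization."""
    merged = defaultdict(set)
    for entry in entries:
        name, email = parse_entry(entry)
        merged[name.casefold()].add(email)
    return dict(merged)


print(parse_entry("Adam C. Powell, IV <adam@example.org>"))
print(merge_by_name([
    "Jane Doe <jane@example.org>",
    "Jane DOE <jdoe@example.net>",
]))
```

Dots and commas in the name need no special treatment under this assumption, since only the angle brackets delimit the address.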

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here. 






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-19 Thread Russ Allbery
Bill Allombert bill.allomb...@math.u-bordeaux1.fr writes:

 While I can only agree on the technical ground of this proposal, I have
 quite a number of scripts (including popcon) that depend on the ability
 to extract the maintainer name from the Maintainer/Uploaders field. I
 suspect others developers and debian-qa might have others.

 Adding quotes around the maintainer name break the interface somehow.

 Using the full Maintainer field is often problematic because:
 1) we might not want to display the email address.
 2) we might want to merge entries from the same maintainer using
 different email adresses for different packages. (popcon go farther
 and check for different capitalization).

 So I would suggest we keep the format 'Name email' and forbid dot and
 commas. Developers that need them could use UTF-8 variants of those.

Well, I really don't want to prohibit dots.  We allow dots now and they
don't pose any problems, other than the note in Policy that you need to
put quotes around the name if you use it in an e-mail To: field (which
presumably all of our software already deals with).

Your point about not wanting to change software that parses the name is
well-taken.  I think, though, that if we say that you may only put
double-quotes around the name if there is a comma in the name and
otherwise the quotes should be omitted, that would minimize the problem.
Only a handful of existing maintainers would be affected (namely those
maintainers who are having trouble right now), so updating software
wouldn't be that urgent.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-19 Thread Adeodato Simó
* Russ Allbery [Mon, 19 Jan 2009 12:10:55 -0800]:

 Bill Allombert bill.allomb...@math.u-bordeaux1.fr writes:

  While I can only agree on the technical ground of this proposal, I have
  quite a number of scripts (including popcon) that depend on the ability
  to extract the maintainer name from the Maintainer/Uploaders field. I
  suspect others developers and debian-qa might have others.

  Adding quotes around the maintainer name break the interface somehow.

  Using the full Maintainer field is often problematic because:
  1) we might not want to display the email address.
  2) we might want to merge entries from the same maintainer using
  different email adresses for different packages. (popcon go farther
  and check for different capitalization).

  So I would suggest we keep the format 'Name email' and forbid dot and
  commas. Developers that need them could use UTF-8 variants of those.

 Well, I really don't want to prohibit dots.  We allow dots now and they
 don't pose any problems, other than the note in Policy that you need to
 put quotes around the name if you use it in an e-mail To: field (which
 presumably all of our software already deals with).

I think dots should be allowed, yes, and be allowed unquoted.

 Your point about not wanting to change software that parses the name is
 well-taken.  I think, though, that if we say that you may only put
 double-quotes around the name if there is a comma in the name and
 otherwise the quotes should be omitted, that would minimize the problem.
 Only a handful of existing maintainers would be affected (namely those
 maintainers who are having trouble right now), so updating software
 wouldn't be that urgent.

I think we should *consider* doing without commas at all, if losing them is
something we could live with. I realize that would be annoying for
people that have a comma in their name, so I'm not right away saying we
should forbid them. But I really think we should consider it, because
even if commas have to be quoted, you've already lost the ability to
parse the Uploaders field with split /\s*,\s*/, which I think would be a
loss, since that works for all other fields.
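
For illustration, this is how the plain comma split behaves once a name contains a comma. The sketch is in Python rather than Perl, `naive_split` is a made-up name, and the field values are invented examples.

```python
import re


def naive_split(value):
    # Equivalent in spirit to Perl's split /\s*,\s*/ on the field value.
    return re.split(r"\s*,\s*", value.strip())


# Works for a typical comma-separated field such as a dependency list ...
print(naive_split("debhelper (>= 7), python-all"))

# ... but cuts a name in two as soon as the name itself contains a comma.
print(naive_split(
    "Adam C. Powell, IV <adam@example.org>, Jane Doe <jane@example.org>"
))
```

The second call returns three pieces instead of two, which is exactly the ambiguity being discussed.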

(Oh, and if we do without commas, we should do without quoting as well
IMHO.)

Just my 2¢,

-- 
Adeodato Simó dato at net.com.org.es
Debian Developer  adeodato at debian.org
 
- Are you sure we're good?
- Always.
-- Rory and Lorelai







Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-19 Thread Russ Allbery
Adeodato Simó d...@net.com.org.es writes:

 I think we should *consider* do without commas at all, if losing them is
 something we could live with. I realize that would be annoying for
 people that have a comma in their name, so I'm not right away saying we
 should forbid them. But I really think we should consider it, because
 even if commas have to be quoted, you've already lost the ability to
 parse the Uploaders field with split /\s*,\s*/, which I think would be a
 loss, since that works for all other fields.

 (Oh, and if we do without commas, we should do without quoting as well
 IMHO.)

It would certainly make it easier for software.

I have to admit to a personal bias (speaking as someone who goes by his
middle name rather than his first name) in favor of fixing software to
accurately recognize people's names rather than the other way around.  I
personally find software that refuses to recognize my name the way that I
spell it to be quite obnoxious, so I'm sympathetic to people who have
commas in their name.  But yes, allowing commas, even quoted, does
complicate Uploaders parsing quite a bit over the current simple state.

Bill mentioned the possibility of a Unicode comma other than the ASCII
comma.  Does such a thing exist?  It's kind of a hack, but it's also an
interesting compromise.  I'm not sure why there would be such a thing,
though, given that there's a perfectly good comma in the ASCII range and
Unicode normally doesn't duplicate code points to no purpose.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-19 Thread Steve Langasek
On Mon, Jan 19, 2009 at 12:36:56PM -0800, Russ Allbery wrote:
 Bill mentioned the possibility of a Unicode comma other than the ASCII
 comma.  Does such a thing exist?  It's kind of a hack, but it's also an
 interesting compromise.  I'm not sure why there would be such a thing,
 though, given that there's a perfectly good comma in the ASCII range and
 Unicode normally doesn't duplicate code points to no purpose.

There are several other commas that have code points, but IMHO none of them
would be an adequate fit for this given that the glyphs differ.

The one with the closest glyph would be U+FE50 SMALL COMMA, but that appears
to be a fullwidth character.

-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
Ubuntu Developer    http://www.debian.org/
slanga...@ubuntu.com vor...@debian.org






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-19 Thread Bill Allombert
On Mon, Jan 19, 2009 at 12:36:56PM -0800, Russ Allbery wrote:
 Adeodato Simó d...@net.com.org.es writes:
 
  I think we should *consider* do without commas at all, if losing them is
  something we could live with. I realize that would be annoying for
  people that have a comma in their name, so I'm not right away saying we
  should forbid them. But I really think we should consider it, because
  even if commas have to be quoted, you've already lost the ability to
  parse the Uploaders field with split /\s*,\s*/, which I think would be a
  loss, since that works for all other fields.
 
  (Oh, and if we do without commas, we should do without quoting as well
  IMHO.)
 
 It would certainly make it easier for software.
 
 I have to admit to a personal bias (speaking as someone who goes by his
 middle name rather than his first name) in favor of fixing software to
 accurately recognize people's names rather than the other way around.  I
 personally find software that refuses to recognize my name the way that I
 spell it to be quite obnoxious, so I'm sympathetic to people who have
 commas in their name.  But yes, allowing commas, even quoted, does
 complicate Uploaders parsing quite a bit over the current simple state.

In any case, if commas are allowed, policy should spell out the
correct regexp to parse the Uploaders field.

 Bill mentioned the possibility of a Unicode comma other than the ASCII
 comma.  Does such a thing exist?  It's kind of a hack, but it's also an
 interesting compromise.  I'm not sure why there would be such a thing,
 though, given that there's a perfectly good comma in the ASCII range and
 Unicode normally doesn't duplicate code points to no purpose.

I have the exact opposite experience with Unicode :)
U+FF0C FULLWIDTH COMMA should do the trick.
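
The candidates mentioned in this thread can be inspected with Python's `unicodedata` module. The sketch below only looks up what the Unicode database says about them and checks that an ASCII-comma split leaves such a name alone; the helper name and the sample field are invented.

```python
import unicodedata

# ASCII comma plus the two alternatives named in this thread.
candidates = ["\u002C", "\uFE50", "\uFF0C"]
names = {ch: unicodedata.name(ch) for ch in candidates}

for ch in candidates:
    print(f"U+{ord(ch):04X} {names[ch]} "
          f"(east asian width: {unicodedata.east_asian_width(ch)})")


def split_on_ascii_comma(value):
    # Only U+002C separates entries; other comma-like characters pass through.
    return [part.strip() for part in value.split(",")]


field = "Adam C. Powell\uFF0C IV <adam@example.org>, Jane Doe <jane@example.org>"
print(split_on_ascii_comma(field))

# Caveat: compatibility normalization maps the fullwidth comma back to
# the ASCII one, so a tool that applies NFKC would undo the workaround.
print(unicodedata.normalize("NFKC", "\uFF0C") == ",")
```

So the substitution keeps comma-splitting tools working, but it depends on nothing along the way normalizing the text.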

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here. 






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-18 Thread Clint Adams
On Wed, Jan 14, 2009 at 10:26:03PM -0800, Russ Allbery wrote:
 I'm leaning that way as well.  I also don't want to require people to use
 RFC 2047 encoding if they have a name that doesn't fit into ASCII.
 
 Anyone have any suggestions on a good subset and description of it that
 isn't too complex?

While I think it would be fine to have a comprehensive and accurate
specification, something like this could be an easy improvement.

By omitting mention of RFC 822, the mandate for UTF-8 in the control
file should obviate RFC 2047 encoding.

Despite underspecifying things, I doubt there will be anyone trying
to use email addresses of the wrong form.

diff --git a/policy.sgml b/policy.sgml
index 7de382d..080229c 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2582,17 +2582,14 @@ Package: libc6
       <p>
         The package maintainer's name and email address.  The name
         should come first, then the email address inside angle
-        brackets <tt>&lt;&gt;</tt> (in RFC822 format).
+        brackets <tt>&lt;&gt;</tt>.
       </p>
 
       <p>
-        If the maintainer's name contains a full stop then the
-        whole field will not work directly as an email address due
-        to a misfeature in the syntax specified in RFC822; a
-        program using this field as an address must check for this
-        and correct the problem if necessary (for example by
-        putting the name in round brackets and moving it to the
-        end, and bringing the email address forward).
+        If the maintainer's name contains a full stop or a comma,
+        the entire name must either be surrounded by quotation marks
+        or put within round brackets and moved it to the end
+        (thus bringing the email address forward).
       </p>
     </sect1>
 






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-18 Thread Russ Allbery
Thank you for the concrete wording proposal!

Clint Adams sch...@debian.org writes:

 While I think it would be fine to have a comprehensive and accurate
 specification, something like this could be an easy improvement.

 By omitting mention of RFC 822, the mandate for UTF-8 in the control
 file should obviate RFC 2047 encoding.

 Despite underspecifying things, I doubt there will be anyone trying
 to use email addresses of the wrong form.

 diff --git a/policy.sgml b/policy.sgml
 index 7de382d..080229c 100644
 --- a/policy.sgml
 +++ b/policy.sgml
 @@ -2582,17 +2582,14 @@ Package: libc6
     <p>
       The package maintainer's name and email address.  The name
       should come first, then the email address inside angle
 -     brackets <tt>&lt;&gt;</tt> (in RFC822 format).
 +     brackets <tt>&lt;&gt;</tt>.
     </p>

We could say that the e-mail address must be an RFC 5322 addr-spec without
obs-* rules so that we don't lose the restriction on what the e-mail
address should be like.

I wonder if we should also prohibit domain-literal.  We allow it now, but
there are no uses of it in the archive.

     <p>
 -     If the maintainer's name contains a full stop then the
 -     whole field will not work directly as an email address due
 -     to a misfeature in the syntax specified in RFC822; a
 -     program using this field as an address must check for this
 -     and correct the problem if necessary (for example by
 -     putting the name in round brackets and moving it to the
 -     end, and bringing the email address forward).
 +     If the maintainer's name contains a full stop or a comma,
 +     the entire name must either be surrounded by quotation marks
 +     or put within round brackets and moved it to the end
 +     (thus bringing the email address forward).
     </p>
   </sect1>

We should say explicitly that the quotation marks are not part of the
maintainer's name.  Should we say something about whether the maintainer
name can be quoted even if it doesn't contain a comma?

I'd like to maintain the current allowance for not quoting the maintainer
name even if it contains a full stop, despite the RFC 5322 requirement to
quote addresses that contain full stops.  Among other things, people who
use initials in their maintainer names don't currently do the quoting and
I don't really want to make those packages buggy.

I think we can safely prohibit for our purposes the em...@address (Name)
form.  There are no occurrences of it in the archive.

Whatever we say here we should probably also say in section 4.4 (the
changelog specification).  Maintainers should use the same form of the
name and be able to do the same quoting in both places.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-14 Thread Giacomo A. Catenazzi

Russ Allbery wrote:
> Alternatively, we could document the permitted character set for the name
> portion of the Maintainer field and exclude commas.  It's annoying to do
> this since commas have been supported in the past (in Maintainer, they're
> unambiguous) and have only become a problem in Uploaders.  We could only
> restrict them in Uploaders, but the lack of symmetry strikes me as a bad
> idea.

I think it is not polite to force changes in maintainer names.

> We could also standardize a simple escaping mechanism of our own (allow
> double quotes, for example, but require that, if used, they surround the
> entire name and are stripped off by the parsing).
>
> However we resolve this, we should probably also update the reference in
> Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really
> want to support source-routed e-mail addresses or similar bizarreness in
> Debian control files.

Hmm, RFC 5322 is not yet a standard (BTW it is not yet cited in STD 1),
and anyway it still uses the old semantics for compatibility (see the
obs- references, e.g. section 4.4).

IMHO we should specify a subset of RFC 822, because a full 5322 parse
is IMO too complex (and BTW not so useful) to implement in all the
tools.  Possibly require that only a subset be used in the control file,
and recommend full 5322 parsing in the tools.

ciao
cate






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2009-01-14 Thread Russ Allbery
Giacomo A. Catenazzi c...@debian.org writes:
 Russ Allbery wrote:

 We could also standardize a simple escaping mechanism of our own (allow
 double quotes, for example, but require that, if used, they surround
 the entire name and are stripped off by the parsing).

 However we resolve this, we should probably also update the reference in
 Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really
 want to support source-routed e-mail addresses or similar bizarreness
 in Debian control files.

 Hmm, RFC5322 is not yet a standard (BTW it is not yet cited in STD1),

This is true, but it's essentially meaningless.  It's sort of an artifact
of the IETF process, but RFC 822 is for practical purposes obsolete and
RFC 5322 reflects the current state of addressing standards.

 and anyway it still use the old semantic for compatibility (see the
 obs- references, e.g. the section 4.4).

True.  We should explicitly rule that out.

 IMHO we should specify a subset of RFC 822, because a full 5322 parse is
 IMO too complex (and BTW not so useful) to implement in all the tools.
 Ev. require to use only a subset in the control file, and to recommend a
 full 5322 parsing in the tools.

I'm leaning that way as well.  I also don't want to require people to use
RFC 2047 encoding if they have a name that doesn't fit into ASCII.

Anyone have any suggestions on a good subset and description of it that
isn't too complex?

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/






Bug#509935: decide whether Uploaders is parsed per RFC 5322

2008-12-27 Thread Russ Allbery
Package: debian-policy
Version: 3.8.0.1
Severity: wishlist

I think we've discussed this before, but I didn't see an open bug, so
I'll open one so that we can discuss it in one place.

Policy currently says the following about the Maintainer field, which
applies by reference to the Uploaders field:

The package maintainer's name and email address. The name should come
first, then the email address inside angle brackets <> (in RFC822
format).

If the maintainer's name contains a full stop then the whole field
will not work directly as an email address due to a misfeature in the
syntax specified in RFC822; a program using this field as an address
must check for this and correct the problem if necessary (for example
by putting the name in round brackets and moving it to the end, and
bringing the email address forward).

Most software has taken this to mean that the e-mail address should be
in RFC822 format, not that the whole field should be.

This is primarily posing a problem for people who have commas in their
name.  The main example to date is Adam C. Powell, IV, but it can happen
with various other name qualifiers and honorifics.  Currently, the only
way to express such a name that works with our existing tools is to drop
the comma, since several programs blindly split on commas when parsing the
field.

The most fully technically correct approach would be to require a full
RFC 5322 parse, but that adds a lot of complexity and raises the problem
that there's no standard canonicalization of RFC 5322 header fields.  It
becomes unclear whether one should strip off double quotes, remove
backslashes, remove portions in parentheses, or other things that would
be logical to do from the RFC 5322 grammar.

Alternatively, we could document the permitted character set for the name
portion of the Maintainer field and exclude commas.  It's annoying to do
this since commas have been supported in the past (in Maintainer, they're
unambiguous) and have only become a problem in Uploaders.  We could only
restrict them in Uploaders, but the lack of symmetry strikes me as a bad
idea.

We could also standardize a simple escaping mechanism of our own (allow
double quotes, for example, but require that, if used, they surround the
entire name and are stripped off by the parsing).

However we resolve this, we should probably also update the reference in
Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really
want to support source-routed e-mail addresses or similar bizarreness in
Debian control files.

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

debian-policy depends on no packages.

debian-policy recommends no packages.

Versions of packages debian-policy suggests:
ii  doc-base  0.8.18 utilities to manage online documen

-- no debconf information


