Re: [dmarc-ietf] Formal specification, URI
On Tue, Mar 17, 2015 at 8:23 AM, Alessandro Vesely ves...@tana.it wrote:

>> Right, which is why things like semi-colon don't need to be percent-encoded; they're already special characters in the context of a URL.
>
> So are comma and exclamation. What puzzles me is that DMARC spec treats them differently while RFC3986 does not. Comma and semicolon seem to behave the same; e.g.:

Ah, that's true. I was looking specifically at one and not all three.

At this point, since RFC7489 has just been published, I suggest you choose (perhaps with direction from the co-chairs) how to record this discrepancy for handling when the base draft gets updated. You could open an erratum report, or you could add it to the WG's tracker, or maybe they have another suggestion.

>> They aren't formally imported, and I'm not sure that's necessary here. The ABNF we have should be comprehensive over DMARC tag-value sets. The prose you cited is merely meant to convey that they follow the same style.
>
> Right. The question is if implementations can reuse DKIM parsers.

Without studying the ABNF again, I believe so. DKIM parsers separate tag-value entities at unencoded semicolons, after which the tag name and tag value are separated at unencoded equal signs. DMARC records are the same up to that point, and it's below there for ruf and rua in the DMARC case that things get interesting. Just like in the case of b= for DKIM, those two have special rules for value interpretation: make up a list of URIs using an unencoded comma as the separator.

>> Your question is "Are they equivalent?" I believe they are. Although it might be ideal to have a specification so tight that there's exactly one way to do something, in the end I don't think it's harmful to have two ways to say the same thing. It's more of a concern if there's to ways to interpret a single thing; that's when we arguably have something to fix.
>
> I tried the NOT RECOMMENDED syntax quoted above.
> Dmarcian[1] doesn't raise a brow, and RFC3986-compliant uriparser[2] ingests it smoothly. However, although I sent a test message to a gmail account, I received no report. I guess Google's implementation doesn't deploy a proper URI parser, but just looks for mailto:; followed by a plain path consisting of a single[3] addr-spec (as defined in RFC6068, i.e. w/o comments) with no query nor fragment --that's what I'd do myself, but I find no arguments in the spec that help proving that that record is bad.

I think we've wandered into implementation comparisons rather than getting the ABNF right in the specification. Or maybe a better way to say that is: I don't think fixing the discrepancy you've raised would resolve the disparity you're observing.

> The spec says a report is normally sent to each. How can a publisher express that two URIs are meant to be either-or alternatives to each other?

Is that a capability you require? I don't think that's a use case I've ever encountered.

> It may also be worth requiring the domain in addr-spec to be an A-label, as that simplifies verification and improves dn_ compression. Such an idea apparently conflicts with the example at the end of Section 6.3 of RFC6068, where the IDN is percent-encoded instead.

That is a completely different topic, something that should be taken up when we do a standards track version.

-MSK
___
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc
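The parsing order described in the message above (split at unencoded semicolons, then at the first equal sign, then split rua/ruf values at unencoded commas) can be sketched roughly as follows. This is a minimal illustration, not a validating parser: the function names `parse_tag_list` and `report_uris` are hypothetical, and the sketch assumes that any comma or semicolon still present in the record is a delimiter because URI-internal ones were percent-encoded.

```python
def parse_tag_list(record: str) -> dict[str, str]:
    """Split a DKIM-style tag-value record into a {tag: value} mapping.

    Assumption: every literal ';' is a tag separator and the first '='
    in each piece separates the tag name from its value.
    """
    tags: dict[str, str] = {}
    for piece in record.split(";"):
        piece = piece.strip()
        if not piece:
            continue  # tolerate a trailing ';' or empty pieces
        name, sep, value = piece.partition("=")
        if not sep:
            continue  # no '=' found; a strict parser might reject the record
        tags[name.strip()] = value.strip()
    return tags


def report_uris(tags: dict[str, str], tag: str) -> list[str]:
    """Return the list of URIs in a rua/ruf value, split at literal commas."""
    return [u.strip() for u in tags.get(tag, "").split(",") if u.strip()]


record = "v=DMARC1; p=none; rua=mailto:a@example.com, mailto:b@example.com;"
tags = parse_tag_list(record)
uris = report_uris(tags, "rua")
```

Only the last step is specific to DMARC; everything in `parse_tag_list` is what a DKIM-style tag-list splitter would already do, which is the reuse question raised in the thread.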
Re: [dmarc-ietf] Formal specification, URI
On Mon 16/Mar/2015 20:22:31 +0100 Murray S. Kucherawy wrote:
> On Mon, Mar 16, 2015 at 3:51 AM, Alessandro Vesely ves...@tana.it wrote:
>
>>> Section 2.2 of RFC3986 lists semi-colon as a reserved character that has to be percent-encoded in these URLs. We don't need to repeat it here, I think.
>>
>> If the spec is going to be read by ignorants like me, it's better to repeat than to omit. RFC3986 has a very wide scope, and uses phrases like "may (or may not) be defined as delimiters". It says:
>>
>>    If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
>
> Right, which is why things like semi-colon don't need to be percent-encoded; they're already special characters in the context of a URL.

So are comma and exclamation. What puzzles me is that DMARC spec treats them differently while RFC3986 does not. Comma and semicolon seem to behave the same; e.g.:

http://www.tana.it/comma,comma.txt
http://www.tana.it/comma%2ccomma.txt
http://www.tana.it/comma%25%32%63comma.txt
http://www.tana.it/semicolon;semicolon.txt
http://www.tana.it/semicolon%3bsemicolon.txt
http://www.tana.it/semicolon%25%33%62semicolon.txt

>> Comma and exclamation (which are sub-delims like semicolon) are apparently used in dmarc-uri's rule. The preceding DMARC section says:
>>
>>    DMARC records follow the extensible tag-value syntax for DNS-based key records defined in DKIM [DKIM].
>>
>> However, DKIM production rules don't seem to be formally imported. If they are imported, semicolon exclusion is implied by the definition:
>>
>>    VALCHAR = %x21-3A / %x3C-7E
>>              ; EXCLAMATION to TILDE except SEMICOLON
>
> They aren't formally imported, and I'm not sure that's necessary here. The ABNF we have should be comprehensive over DMARC tag-value sets. The prose you cited is merely meant to convey that they follow the same style.

Right. The question is if implementations can reuse DKIM parsers.

>> How about the other two questions?
>> I didn't survey but a few DMARC records, but RFC6068 exemplifies the following:
>>
>>    Also note that it is syntactically valid to specify both "to" and an "hfname" whose value is "to". That is,
>>
>>       mailto:addr1@an.example,addr2@an.example
>>
>>    is equivalent to
>>
>>       mailto:?to=addr1@an.example,addr2@an.example
>>
>>    is equivalent to
>>
>>       mailto:addr1@an.example?to=addr2@an.example
>>
>>    However, the latter form is NOT RECOMMENDED because different user agents handle this case differently. In particular, some existing clients ignore "to" hfvalues.
>>
>> Yahoo instead uses 1st level syntax:
>>
>>    rua=mailto:dmarc-yahoo-...@yahoo-inc.com, mailto:dmarc_y_...@yahoo.com;
>
> Your question is "Are they equivalent?" I believe they are. Although it might be ideal to have a specification so tight that there's exactly one way to do something, in the end I don't think it's harmful to have two ways to say the same thing. It's more of a concern if there's to ways to interpret a single thing; that's when we arguably have something to fix.

I tried the NOT RECOMMENDED syntax quoted above. Dmarcian[1] doesn't raise a brow, and RFC3986-compliant uriparser[2] ingests it smoothly. However, although I sent a test message to a gmail account, I received no report. I guess Google's implementation doesn't deploy a proper URI parser, but just looks for mailto:; followed by a plain path consisting of a single[3] addr-spec (as defined in RFC6068, i.e. w/o comments) with no query nor fragment --that's what I'd do myself, but I find no arguments in the spec that help proving that that record is bad.

[1] https://dmarcian.com/dmarc-inspector/torreinpietra.it
[2] http://uriparser.sourceforge.net/doc/html/
[3] haven't yet tried two %2c-separated addr-specs.

> The goal in allowing a comma-separated list of URLs is that you might conceivably want to put an http and a mailto URL in there, in the "try A first, then try B" sense. We need to allow for that possibility.
> We also need to account for the possibility of a comma that is inside of a URL; those are the ones that need to be encoded. Outside of a URL, they're delimiters.

The spec says a report is normally sent to each. How can a publisher express that two URIs are meant to be either-or alternatives to each other?

> Unless I'm missing something, the ABNF for DMARC allows all three of the cited examples, as well as Yahoo's use, and all four of them mean the same thing. That doesn't strike me as a bug.

Recall that it took years and an RFC revision to have mailto: URIs treated with reasonable uniformity by web browsers. Here we are specifying an entirely new kind of client, which avails of its specific URI-transmission protocol. IMHO, if we want %2c to be interpreted as addr-spec separator or otherwise, we ought to spell it loud and clear.

It may also be worth requiring the domain in addr-spec to be an A-label, as that simplifies verification and
Re: [dmarc-ietf] Formal specification, URI
Alessandro Vesely writes:

> If the spec is going to be read by ignorants like me, it's better to repeat than to omit.

-1. It's good that you read the spec, but that's not the primary purpose of the spec. It's a bad idea to repeat definitions clearly stated in another document (even in informal comments on the formal spec) when you refer to the original document; you're just asking for new ambiguity.
Re: [dmarc-ietf] Formal specification, URI
> Alessandro Vesely writes:
>
>> If the spec is going to be read by ignorants like me, it's better to repeat than to omit.
>
> -1. It's good that you read the spec, but that's not the primary purpose of the spec. It's a bad idea to repeat definitions clearly stated in another document (even in informal comments on the formal spec) when you refer to the original document; you're just asking for new ambiguity.

+1. Repeating stuff like this is in the long term a surefire source of silly states, where one repetition gets updated but not another.

The running code that comes to mind is the MIME specification, which originally had a bunch of repeated and overlapping syntax definitions. In this case it only took one revision for it to get out of sync and cause confusion.

Ned
Re: [dmarc-ietf] Formal specification, URI
On Mon, Mar 16, 2015 at 3:51 AM, Alessandro Vesely ves...@tana.it wrote:

>> Section 2.2 of RFC3986 lists semi-colon as a reserved character that has to be percent-encoded in these URLs. We don't need to repeat it here, I think.
>
> If the spec is going to be read by ignorants like me, it's better to repeat than to omit. RFC3986 has a very wide scope, and uses phrases like "may (or may not) be defined as delimiters". It says:
>
>    If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

Right, which is why things like semi-colon don't need to be percent-encoded; they're already special characters in the context of a URL.

> Comma and exclamation (which are sub-delims like semicolon) are apparently used in dmarc-uri's rule. The preceding DMARC section says:
>
>    DMARC records follow the extensible tag-value syntax for DNS-based key records defined in DKIM [DKIM].
>
> However, DKIM production rules don't seem to be formally imported. If they are imported, semicolon exclusion is implied by the definition:
>
>    VALCHAR = %x21-3A / %x3C-7E
>              ; EXCLAMATION to TILDE except SEMICOLON

They aren't formally imported, and I'm not sure that's necessary here. The ABNF we have should be comprehensive over DMARC tag-value sets. The prose you cited is merely meant to convey that they follow the same style.

> How about the other two questions? I didn't survey but a few DMARC records, but RFC6068 exemplifies the following:
>
>    Also note that it is syntactically valid to specify both "to" and an "hfname" whose value is "to". That is,
>
>       mailto:addr1@an.example,addr2@an.example
>
>    is equivalent to
>
>       mailto:?to=addr1@an.example,addr2@an.example
>
>    is equivalent to
>
>       mailto:addr1@an.example?to=addr2@an.example
>
>    However, the latter form is NOT RECOMMENDED because different user agents handle this case differently. In particular, some existing clients ignore "to" hfvalues.
> Yahoo instead uses 1st level syntax:
>
>    rua=mailto:dmarc-yahoo-...@yahoo-inc.com, mailto:dmarc_y_...@yahoo.com;

Your question is "Are they equivalent?" I believe they are. Although it might be ideal to have a specification so tight that there's exactly one way to do something, in the end I don't think it's harmful to have two ways to say the same thing. It's more of a concern if there's to ways to interpret a single thing; that's when we arguably have something to fix.

The goal in allowing a comma-separated list of URLs is that you might conceivably want to put an http and a mailto URL in there, in the "try A first, then try B" sense. We need to allow for that possibility. We also need to account for the possibility of a comma that is inside of a URL; those are the ones that need to be encoded. Outside of a URL, they're delimiters.

Unless I'm missing something, the ABNF for DMARC allows all three of the cited examples, as well as Yahoo's use, and all four of them mean the same thing. That doesn't strike me as a bug.

-MSK
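The distinction drawn above, commas inside a URL being encoded while commas between URLs act as delimiters, can be illustrated from the publisher's side. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and encoding the semicolon as well follows the proposal made earlier in the thread rather than the published comment on the dmarc-uri rule, which names only commas and exclamation points.

```python
# Characters to percent-encode inside each individual URI before joining.
# ',' and '!' come from the dmarc-uri comment quoted in the thread;
# ';' is included here as an assumption, following the proposed NEW text.
_ENCODE = {",": "%2C", "!": "%21", ";": "%3B"}


def encode_for_dmarc_list(uri: str) -> str:
    """Percent-encode characters that would otherwise act as delimiters."""
    return "".join(_ENCODE.get(ch, ch) for ch in uri)


def build_report_tag(tag: str, uris: list[str]) -> str:
    """Join already-formed URIs into one rua/ruf tag value."""
    return tag + "=" + ",".join(encode_for_dmarc_list(u) for u in uris)


value = build_report_tag(
    "rua",
    ["mailto:a@example.com?subject=x,y", "mailto:b@example.com"],
)
```

After this step the only literal comma left in the value is the one separating the two URIs, so a consumer can split on it without inspecting the URIs themselves.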
Re: [dmarc-ietf] Formal specification, URI
On Mon 16/Mar/2015 05:17:37 +0100 Murray S. Kucherawy wrote:
> On Sun, Mar 15, 2015 at 11:53 AM, Alessandro Vesely ves...@tana.it wrote:
>
>> This seems to be a bug:
>>
>> OLD:
>>
>>    dmarc-uri = URI [ "!" 1*DIGIT [ "k" / "m" / "g" / "t" ] ]
>>                ; URI is imported from [URI]; commas (ASCII
>>                ; 0x2c) and exclamation points (ASCII 0x21)
>>                ; MUST be encoded; the numeric portion MUST fit
>>                ; within an unsigned 64-bit integer
>>
>> NEW:
>>
>>    dmarc-uri = URI [ "!" 1*DIGIT [ "k" / "m" / "g" / "t" ] ]
>>                ; URI is imported from [URI]; commas (ASCII
>>                ; 0x2c), exclamation points (ASCII 0x21), and
>>                ; semicolons (ASCII 0x3b) MUST be percent-encoded;
>>                ; the numeric portion MUST fit within an unsigned
>>                ; 64-bit integer
>>
>> Is it equivalent to have, say, rua=mailto:a...@example.com%...@example.com and rua=mail...@example.com, mailto:b...@example.com?
>>
>> Is the following meant to be allowed?
>>
>>    mailto:dmarc@ietf.org?subject=Formal%20specification%2c%20URI
>
> Section 2.2 of RFC3986 lists semi-colon as a reserved character that has to be percent-encoded in these URLs. We don't need to repeat it here, I think.

If the spec is going to be read by ignorants like me, it's better to repeat than to omit. RFC3986 has a very wide scope, and uses phrases like "may (or may not) be defined as delimiters". It says:

   If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

Comma and exclamation (which are sub-delims like semicolon) are apparently used in dmarc-uri's rule. The preceding DMARC section says:

   DMARC records follow the extensible tag-value syntax for DNS-based key records defined in DKIM [DKIM].

However, DKIM production rules don't seem to be formally imported. If they are imported, semicolon exclusion is implied by the definition:

   VALCHAR = %x21-3A / %x3C-7E
             ; EXCLAMATION to TILDE except SEMICOLON

Anyway, I'd add the percent- word, lest anyone tries #44...

How about the other two questions?
I didn't survey but a few DMARC records, but RFC6068 exemplifies the following:

   Also note that it is syntactically valid to specify both "to" and an "hfname" whose value is "to". That is,

      mailto:addr1@an.example,addr2@an.example

   is equivalent to

      mailto:?to=addr1@an.example,addr2@an.example

   is equivalent to

      mailto:addr1@an.example?to=addr2@an.example

   However, the latter form is NOT RECOMMENDED because different user agents handle this case differently. In particular, some existing clients ignore "to" hfvalues.

Yahoo instead uses 1st level syntax:

   rua=mailto:dmarc-yahoo-...@yahoo-inc.com, mailto:dmarc_y_...@yahoo.com;

Ale
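To make the equivalence of the three RFC6068 forms concrete, here is a minimal sketch of a recipient extractor that collects addresses from both the mailto path and any "to" header field. The function name is hypothetical, and the sketch assumes that only literal commas separate addresses; whether %2c should also separate them is an open question this code does not settle.

```python
from urllib.parse import unquote


def mailto_recipients(uri: str) -> list[str]:
    """Collect recipients from the path and the 'to' hfvalues of a mailto URI.

    Assumption: addresses are split at literal commas only, and each
    address is percent-decoded after splitting.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "mailto":
        raise ValueError("not a mailto URI: " + uri)
    rest = rest.partition("#")[0]          # drop any fragment
    path, _, query = rest.partition("?")   # path holds the first-level 'to'
    chunks = [path] if path else []
    for field in query.split("&"):
        name, _, value = field.partition("=")
        if unquote(name).lower() == "to" and value:
            chunks.append(value)
    recipients = []
    for chunk in chunks:
        for addr in chunk.split(","):
            addr = unquote(addr).strip()
            if addr:
                recipients.append(addr)
    return recipients


forms = [
    "mailto:addr1@an.example,addr2@an.example",
    "mailto:?to=addr1@an.example,addr2@an.example",
    "mailto:addr1@an.example?to=addr2@an.example",
]
results = [mailto_recipients(f) for f in forms]
```

A consumer that reads only the path, which is the behaviour RFC6068 warns about when it says some clients ignore "to" hfvalues, would see two recipients in the first form, none in the second, and one in the third.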
Re: [dmarc-ietf] Formal specification, URI
On 03/16/2015 12:22 PM, Murray S. Kucherawy wrote:

> [...]
> The goal in allowing a comma-separated list of URLs is that you might conceivably want to put an http and a mailto URL in there, in the "try A first, then try B" sense. We need to allow for that possibility. We also need to account for the possibility of a comma that is inside of a URL; those are the ones that need to be encoded. Outside of a URL, they're delimiters.

Just to be explicit, it also allows for multiple mailto: URIs - something that is seen in the wild, though perhaps not if one looks up a half dozen DMARC records at random. But at the end of January multiple mailto: URIs could be seen in ten of the Alexa Top 100 domains, in both rua and ruf tags.

--S.
Re: [dmarc-ietf] Formal specification, URI
On Mon, Mar 16, 2015 at 12:57 PM, Steven M Jones s...@crash.com wrote:

> Just to be explicit, it also allows for multiple mailto: URIs - something that is seen in the wild, though perhaps not if one looks up a half dozen DMARC records at random. But at the end of January multiple mailto: URIs could be seen in ten of the Alexa Top 100 domains, in both rua and ruf tags.

Right, and there might be some operational reason why you want to send one message to A and B, versus a message to A and then a message to B. (Fewer calls to fork()/exec(), perhaps.)

-MSK
Re: [dmarc-ietf] Formal specification, URI
On Mon, Mar 16, 2015 at 12:22 PM, Murray S. Kucherawy superu...@gmail.com wrote:

> Your question is "Are they equivalent?" I believe they are. Although it might be ideal to have a specification so tight that there's exactly one way to do something, in the end I don't think it's harmful to have two ways to say the same thing. It's more of a concern if there's to ways to interpret a single thing; that's when we arguably have something to fix.

Sigh. s/to ways/two ways/

-MSK
Re: [dmarc-ietf] Formal specification, URI
On Sun, Mar 15, 2015 at 11:53 AM, Alessandro Vesely ves...@tana.it wrote:

> This seems to be a bug:
>
> OLD:
>
>    dmarc-uri = URI [ "!" 1*DIGIT [ "k" / "m" / "g" / "t" ] ]
>                ; URI is imported from [URI]; commas (ASCII
>                ; 0x2c) and exclamation points (ASCII 0x21)
>                ; MUST be encoded; the numeric portion MUST fit
>                ; within an unsigned 64-bit integer
>
> NEW:
>
>    dmarc-uri = URI [ "!" 1*DIGIT [ "k" / "m" / "g" / "t" ] ]
>                ; URI is imported from [URI]; commas (ASCII
>                ; 0x2c), exclamation points (ASCII 0x21), and
>                ; semicolons (ASCII 0x3b) MUST be percent-encoded;
>                ; the numeric portion MUST fit within an unsigned
>                ; 64-bit integer
>
> Is it equivalent to have, say, rua=mailto:a...@example.com%...@example.com and rua=mail...@example.com, mailto:b...@example.com?
>
> Is the following meant to be allowed?
>
>    mailto:dmarc@ietf.org?subject=Formal%20specification%2c%20URI

Section 2.2 of RFC3986 lists semi-colon as a reserved character that has to be percent-encoded in these URLs. We don't need to repeat it here, I think.

-MSK