On Aug 22, 2014, at 5:16 AM, Shane Kerr <sh...@time-travellers.org> wrote:

> Paul,
> 
> On Thu, 21 Aug 2014 09:02:10 -0700
> Paul Hoffman <paul.hoff...@vpnc.org> wrote: 
>> Andreas' and Shane's requests differ. And they both ignore the fact
>> that JSON defines strings as Unicode characters, not as octets. The
>> escaping "defined" in RFC 1035 does not say where it must be applied.
> 
> While we propose different solutions, the thing that you think we're
> ignoring is what we're actually just working around.

Yes, exactly. :-)

> Both Andreas' and my suggestions work by recognizing that ASCII is a
> subset of Unicode, and requiring that DNS JSON messages use ASCII.

Do you mean any part of messages, or only names? I hope you would not encode a 
DNSKEY with either form of escaping you used.

> 
>> Note that the definition of a string in JSON is:
>> 
>>      string = quotation-mark *char quotation-mark
>> 
>>      char = unescaped /
>>          escape (
>>              %x22 /          ; "    quotation mark  U+0022
>>              %x5C /          ; \    reverse solidus U+005C
>>              %x2F /          ; /    solidus         U+002F
>>              %x62 /          ; b    backspace       U+0008
>>              %x66 /          ; f    form feed       U+000C
>>              %x6E /          ; n    line feed       U+000A
>>              %x72 /          ; r    carriage return U+000D
>>              %x74 /          ; t    tab             U+0009
>>              %x75 4HEXDIG )  ; uXXXX                U+XXXX
>> 
>>      escape = %x5C              ; \
>> 
>>      quotation-mark = %x22      ; "
>> 
>>      unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
> 
> Yes, yes.
> 
>> Given this, and the likelihood that escaping is going to screw up
>> NAME/QNAME exactly where it will be needed the most (to get the exact
>> octets of an odd name), I think making NAME/QNAME only hold
>> hostnames, and non-hostnames must be in a different field that is
>> hex-encoded, will be the easiest to get right.
> 
> In Andreas' proposal a name of ^C would be written as "\\003", and in
> mine it would be written as "\u0003". I'm not sure why you
> think this would cause more of a chance of error than "03".

You picked the easy example. Just to be clear, for "copyright sign", Unicode 
U+00A9, you would encode this as "\u00a9", and Andreas would do so as "\\169", 
yes? And that you are only encoding single octets, not Unicode characters? If 
so, then Andreas' proposal is much better than yours, because a label that 
actually contains non-ASCII characters encoded in UTF-8 (which we still 
unfortunately see plenty of) will most likely get encoded wrong in your scheme 
but correctly in Andreas'.
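
To make the comparison concrete, here is a minimal Python sketch of one reading of the two schemes, applied per octet. The function names are made up for illustration, and the choice of which printable characters to escape is an assumption, since neither proposal is fully specified in this thread. It also shows the failure mode described above: a producer that decodes the label as UTF-8 before emitting "\uXXXX" changes the octets the consumer gets back.

```python
import json

def escape_decimal(label: bytes) -> str:
    r"""RFC 1035-style text: keep printable ASCII, write other octets as \DDD."""
    out = []
    for octet in label:
        ch = chr(octet)
        # Assumption: backslash, dot and quote are escaped too, to stay unambiguous.
        if 0x21 <= octet <= 0x7E and ch not in '\\."':
            out.append(ch)
        else:
            out.append("\\%03d" % octet)
    return "".join(out)

def to_json_decimal(label: bytes) -> str:
    # JSON then escapes the backslash itself, giving "\\DDD" in the output.
    return json.dumps(escape_decimal(label))

def to_json_u(label: bytes) -> str:
    # One code point per octet (U+0000..U+00FF); json.dumps emits \u00XX
    # for anything outside printable ASCII.
    return json.dumps(label.decode("latin-1"))

def from_json_u(text: str) -> bytes:
    # Consumer side of the per-octet \u scheme.
    return json.loads(text).encode("latin-1")

label = "\u00a9".encode("utf-8")       # copyright sign as UTF-8: octets C2 A9

print(to_json_decimal(b"\x03"))        # "\\003"
print(to_json_u(b"\x03"))              # "\u0003"
print(to_json_decimal(label))          # "\\194\\169"
print(to_json_u(label))                # "\u00c2\u00a9"

# Failure mode: the producer treats the label as Unicode text, not octets.
wrong = json.dumps(label.decode("utf-8"))
print(wrong)                           # "\u00a9"
print(from_json_u(wrong))              # b'\xa9', one octet instead of two
```

The decimal form never passes through a Unicode decoding step, which is why it is harder to get wrong in this particular way.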

> To be honest, I think a more likely scenario is that a coder consuming
> this data would not bother to look at any specifications, but build a
> quick parser, which would then break every couple days as some random
> packet has a "QNAME" instead of "hostQNAME" value show up. :)

Quite possibly true.

> 
> Further, Andreas' and my proposal both have the nice property that an
> SRV lookup would appear like "_sip._tcp.example.com", instead of 
> "5f7369702e5f7463702e6578616d706c652e636f6d".

Yes, unless we take my proposal from yesterday, which was to make QNAME/NAME be 
the name unless it is not a hostname, in which case there is a second field.
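
A rough Python sketch of that two-field idea follows. The field names "QNAME" and "QNAMEhex", the letters-digits-hyphen test for "hostname", and the choice to hex-encode the wire-format name are all assumptions made for illustration; none of them is settled in this thread.

```python
import re

# Assumption: a "hostname" label is 1-63 octets of letters, digits and
# hyphens, and does not start or end with a hyphen.
_LDH_LABEL = re.compile(rb"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def encode_qname(labels):
    """Return a dict holding either a readable name or a hex fallback."""
    if labels and all(_LDH_LABEL.fullmatch(label) for label in labels):
        return {"QNAME": b".".join(labels).decode("ascii")}
    # Anything else is emitted as the exact octets, length-prefixed per
    # label and terminated by the root label, then hex-encoded.
    wire = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return {"QNAMEhex": wire.hex()}

print(encode_qname([b"example", b"com"]))
print(encode_qname([b"_sip", b"_tcp", b"example", b"com"]))
```

Under these assumptions the SRV name lands in the hex field because of its underscores, so a consumer has to be prepared to see either field.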

> One slight advantage of my proposal over Andreas' is that a consumer
> would likely not have to do anything fancy to read the data as it was
> in the original message. (A producer might have to do some gymnastics
> to ensure that %x7F to %xFF are output properly, depending on how the
> messages are generated, I suppose.)

I think yours is more fragile in the face of Unicode. For yours, you have to 
ensure that the producer is not using any Unicode encoding scheme; Andreas' 
treats everything as octets more cleanly.

> To sum up, I think that if you're going to the bother of transforming
> DNS messages into some vaguely human-readable format, you should try to
> make it as readable as possible.

Fully agree. But I also want to have a second format where it is clearly binary 
in case the producer screws up the escaping. I'm happy to make the QNAME/NAME 
be the easier-to-read one.

--Paul Hoffman
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
