Re: [Gen-art] Genart last call review of draft-crocker-inreply-react-07

2021-01-31 Thread Dave Crocker

On 1/31/2021 2:16 PM, Dale R. Worley wrote:

Dave Crocker  writes:

On 1/27/2021 6:32 PM, Dale Worley via Datatracker wrote:

 The emoji(s) express a recipient's summary reaction to the specific
 message referenced by the accompanying In-Reply-To header field.
 [Mail-Fmt].

This is not specific as to where the In-Reply-To header is.  I assume
you want to say that it is a header of the parent multipart component
of "Reaction" part.  Or perhaps this should be forward-referenced to
the discussion in section 3.


I don't understand the concern.  An In-Reply-To header field is part of
the message header.  That is, it will be in the header of the response
message.


Given that we're dealing with multipart messages, an In-Reply-To header
could be stuck in the message header but it could also be stuck in the
headers of any part.  I don't know if it's ever done, but certainly,
it's plausible that if I include a reply which I had received as an
attachment to another email I send, the In-Reply-To header in the
received e-mail would show up as a header to the attachment part, not
my message as a whole.

In general, the situation is one of unlimited complexity.


RFC 5322's definition of the In-Reply-To field has it being optionally 
present in the message header:


     message        =  fields *( CRLF *text )    ; Everything after
                                                 ;  first null line
                                                 ;  is message body

     fields         =    dates                   ; Creation time,
                         source                  ;  author id & one
                       1*destination             ;  address required
                        *optional-field          ;  others optional


     optional-field =
                    /  "In-Reply-To"   ":"  *(phrase / msg-id)


As such, its location is not as random or varied as you seem to think.
Also note that the In-Reply-To field has a long history and is already
well-integrated into MUAs.


So, the complexity is quite limited.

My guess is that you are confusing the variable venues possible for the 
emoji-sequence with the far less variable venue of In-Reply-To.


I suppose a clarification could be added along the lines of:

OLD:
   The emoji(s) express a recipient's summary reaction to the specific
   message referenced by the accompanying In-Reply-To header field.
   [Mail-Fmt].

NEW:
   The emoji(s) express a recipient's summary reaction to the specific
   message referenced by the accompanying In-Reply-To header field, for
   the message in which they both are present. [Mail-Fmt].

If a message is nested within a message, that defines a hard reference 
boundary.  Something inside the nested message does not refer to the 
containing message, for example.
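
As a rough illustration of that boundary rule, here is a minimal Python sketch using the standard library email package. It pairs each part whose Content-Disposition is "reaction" with the In-Reply-To field of the message that directly contains it, and treats a nested message/rfc822 part as the start of a new scope. The sample message, its addresses and the helper names find_reactions and _walk are made up for this example; nothing here is taken from the draft.

```python
from email import message_from_string
from email.message import Message
from typing import Iterator, Optional, Tuple

# Hypothetical sample: an outer reply carrying a reaction part, plus an
# attached (nested) message that has its own In-Reply-To and reaction.
RAW = """\
From: alice@example.com
To: bob@example.com
Message-ID: <outer@example.com>
In-Reply-To: <original@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain; charset=utf-8
Content-Disposition: reaction
Content-Transfer-Encoding: base64

8J+RjQ==
--outer
Content-Type: message/rfc822

Message-ID: <nested@example.com>
In-Reply-To: <unrelated@example.com>
Content-Type: text/plain; charset=utf-8
Content-Disposition: reaction
Content-Transfer-Encoding: base64

4p2k
--outer--
"""


def find_reactions(msg: Message) -> Iterator[Tuple[Optional[str], Message]]:
    """Yield (In-Reply-To of the enclosing message, reaction part)."""
    # The message header sets the scope for every part inside this message.
    yield from _walk(msg, msg["In-Reply-To"])


def _walk(part: Message, scope: Optional[str]) -> Iterator[Tuple[Optional[str], Message]]:
    if part.get_content_type() == "message/rfc822":
        # Hard boundary: a nested message is examined with its own header.
        for nested in part.get_payload():
            yield from find_reactions(nested)
    elif part.is_multipart():
        for sub in part.get_payload():
            yield from _walk(sub, scope)
    elif part.get_content_disposition() == "reaction":
        yield scope, part


msg = message_from_string(RAW)
for in_reply_to, part in find_reactions(msg):
    emoji = part.get_payload(decode=True).decode("utf-8")
    print(in_reply_to, ascii(emoji))
```

In this sketch the reaction in the outer message is tied to <original@example.com>, while the reaction inside the attached message is tied only to <unrelated@example.com> and never to the outer header.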




I'm not particular what rules you want to specify, just that when I'm
looking at a part with this Content-Disposition that is somewhere in a
multipart structure (possibly without parts), that it's clear which sets
of headers I need to examine to find the In-Reply-To header.


Perhaps you could offer an example or two of messages that you see as 
creating ambiguity or other confusion?




Now I think in reality, it either has to be in the headers of the part
with disposition "reaction", or in the multipart containing that part.
But whatever the rule is, it should be stated.


see above?



 Reference to unallocated code points SHOULD NOT be treated as an
 error; associated bytes SHOULD be processed using the system default
 method for denoting an unallocated or undisplayable code point.

Code points that do not have the requisite attributes to qualify as
part of an emoji_sequence should also not be treated as an error,
although you probably want to allow the system to alternatively
display them normally (rather than as an unallocated or undisplayable
code point).


I think your comment addresses a different issue than the cited text is
meant for, but I also might be misunderstanding.

For whatever reasons, including not having been allocated by the Unicode
folks, or possibly by running an older system that thinks a code point
is not allocated, there is an issue of how the system should deal with
encountering such a code point.  The text here is merely trying to say
"do whatever you do".


The text is a constraint, though.  It *requires* (sort of) that if the
bytes in the part form a character which the receiver considers
unallocated, it *should not* reject the whole message as being
ill-formed.  The implementation has great freedom in how to display the
character, but the message as a whole "SHOULD NOT be treated as an
error".


Since this specification pertains to processing of some octets, rather 
than having anything to do with overall processing of the message, I am 
not understanding your concern.


A system that might reject an entire message because the system is 
unhappy with one or another of the octets in the message is playing it

Re: [Gen-art] Genart last call review of draft-crocker-inreply-react-07

2021-01-31 Thread Dale R. Worley
Dave Crocker  writes:
> On 1/27/2021 6:32 PM, Dale Worley via Datatracker wrote:
>> Reviewer: Dale Worley
>> Review result: Ready with Nits

First to deal with the straightforward points:

>> The emoji(s) express a recipient's summary reaction to the specific
>> message referenced by the accompanying In-Reply-To header field.
>> [Mail-Fmt].
>>
>> This is not specific as to where the In-Reply-To header is.  I assume
>> you want to say that it is a header of the parent multipart component
>> of "Reaction" part.  Or perhaps this should be forward-referenced to
>> the discussion in section 3.
>
> I don't understand the concern.  An In-Reply-To header field is part of 
> the message header.  That is, it will be in the header of the response 
> message.

Given that we're dealing with multipart messages, an In-Reply-To header
could be stuck in the message header but it could also be stuck in the
headers of any part.  I don't know if it's ever done, but certainly,
it's plausible that if I include a reply which I had received as an
attachment to another email I send, the In-Reply-To header in the
received e-mail would show up as a header to the attachment part, not
my message as a whole.

In general, the situation is one of unlimited complexity.

I'm not particular what rules you want to specify, just that when I'm
looking at a part with this Content-Disposition that is somewhere in a
multipart structure (possibly without parts), that it's clear which sets
of headers I need to examine to find the In-Reply-To header.

Now I think in reality, it either has to be in the headers of the part
with disposition "reaction", or in the multipart containing that part.
But whatever the rule is, it should be stated.

>> Reference to unallocated code points SHOULD NOT be treated as an
>> error; associated bytes SHOULD be processed using the system default
>> method for denoting an unallocated or undisplayable code point.
>>
>> Code points that do not have the requisite attributes to qualify as
>> part of an emoji_sequence should also not be treated as an error,
>> although you probably want to allow the system to alternatively
>> display them normally (rather than as an unallocated or undisplayable
>> code point).
>
> I think your comment addresses a different issue than the cited text is 
> meant for, but I also might be misunderstanding.
>
> For whatever reasons, including not having been allocated by the Unicode 
> folks, or possibly by running an older system that thinks a code point 
> is not allocated, there is an issue of how the system should deal with 
> encountering such a code point.  The text here is merely trying to say 
> "do whatever you do".

The text is a constraint, though.  It *requires* (sort of) that if the
bytes in the part form a character which the receiver considers
unallocated, it *should not* reject the whole message as being
ill-formed.  The implementation has great freedom in how to display the
character, but the message as a whole "SHOULD NOT be treated as an
error".

> A different issue is encountering a code-point, here, that is outside of 
> the emoji-sequence set. The text doesn't try to tell the receiver how to 
> process bytes that are illegal here.

Perhaps that is what you intend, and if so, the text is correct.  But it
seems to me that if the bytes form a code point that the receiver
considers to be allocated but not an emoji, it should be under the same
constraint that it should not reject the message as a whole as erroneous.
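
To make the two cases concrete, here is a minimal Python sketch of a lenient receiver. It assumes that the "system default method" for an unallocated code point is to substitute U+FFFD, and that allocated non-emoji code points are simply displayed as they are. The function name render_reaction is hypothetical, and the sketch does not check the emoji_sequence rule at all (the standard library has no emoji property data); it only shows that neither case needs to be treated as an error.

```python
import unicodedata

REPLACEMENT = "\uFFFD"  # assumed stand-in for "unallocated or undisplayable"


def render_reaction(octets: bytes) -> str:
    """Turn the octets of a reaction part into displayable text without raising."""
    # Malformed UTF-8 is replaced rather than rejected.
    text = octets.decode("utf-8", errors="replace")
    shown = []
    for ch in text:
        if unicodedata.category(ch) == "Cn":
            # Unassigned as far as this system's Unicode tables know.
            shown.append(REPLACEMENT)
        else:
            # Allocated code point, emoji or not: display it normally.
            shown.append(ch)
    return "".join(shown)


print(ascii(render_reaction("\U0001F44D".encode("utf-8"))))  # an emoji
print(ascii(render_reaction("ok".encode("utf-8"))))          # allocated, not emoji
print(ascii(render_reaction("\u0378".encode("utf-8"))))      # unassigned code point
```

An older system with older Unicode tables would classify more code points as "Cn", which is the situation the quoted text about older systems describes.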

Now for the messy part:

> The rule emoji_sequence is inherited from [Emoji-Seq].  It permits
> one or more bytes to form a single presentation image.

First, let me say I keep a rigid category distinction between
bytes/octets and characters.  And in this situation, it seems like there
are *three* layers of composition between bytes and displayed items:

- The UTF-8 encoding groups bytes into code points, which are generally
  Unicode "characters".

- The code points can be composed (by Unicode rules) into characters.
  As Barry explains, "as creating “á” from “a” plus combining acute
  accent".  But I'm not so familiar with how that is done and how that
  affects exactly what the word "character" means.  (I also do not know
  whether any emoji code point participates in Unicode composition, but
  a sender can certainly compose reactions containing code points that
  participate in composition, and there probably is no guarantee that
  Unicode will never do such a thing with emoji.)

- Groups of characters may be displayed as single images.  As Barry
  explains, "the sort of thing that’s unique to emoji, wherein the
  emojis for man followed by woman followed by boy, each of which is a
  separate emoji character that would be displayed as it seems, will
  often be rendered as a single image of a family".

Composed, these processes turn bytes/octets (the encoded form of
the "reaction" part) into a sequence of displayed images.
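
The layers above can be sketched in Python with only the standard library. The example assumes the family image is encoded as man, woman and boy joined by ZERO WIDTH JOINER (U+200D), which is how such sequences are commonly encoded. The standard library does not segment text into displayed images, so the third layer appears only as a comment.

```python
import unicodedata

# Layer 1: UTF-8 groups octets into code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F466"  # man ZWJ woman ZWJ boy
octets = family.encode("utf-8")
code_points = octets.decode("utf-8")
print(len(octets))       # 18 octets
print(len(code_points))  # 5 code points

# Layer 2: Unicode composition can merge code points into one character.
decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 code points become 1

# Layer 3: a renderer may draw the five code points of `family` as one image.
# That step depends on the display system and font, not on the decoder.
```

So a single displayed image can correspond to eighteen octets and five code points, which is why the distinction between bytes, code points and displayed items matters for the wording.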

When I w