Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Marvin W

On 10/24/19 9:40 PM, Kim Alvefur wrote:

>> We should refrain from using things like grapheme clusters in wire formats,
>> as those are subject to changes in upcoming Unicode versions and thus the
>> wire format would be understood differently depending on the Unicode version
>> implemented by the client.
>
> Doesn't this also depend on the font?


If the font does not support a certain grapheme, it may be rendered as 
multiple glyphs (🤦‍♂️ may be rendered as 🤦 and ♂️). The font rendering 
toolkit may be aware that this has been a single grapheme since Emoji 4.0 
and thus may treat it as a single unit when selecting (for copy and paste, 
i.e. not allowing the user to copy only the ♂️). If the rendering toolkit 
does allow selecting only a part of this grapheme cluster, and the user 
does so and instructs the client to make the selected text a reference, 
things get interesting again (because in the Unicode counting, you'd be 
in the middle of a character, so it would not be possible to actually do 
what the user instructed). Thus the font may be relevant for various 
UI/UX matters, and developers need to be aware of this when allowing the 
user to input text.
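
To make the "middle of a character" case concrete, here is a minimal Python sketch. It assumes the emoji discussed above is the "man facepalming" ZWJ sequence (U+1F926 U+200D U+2642 U+FE0F); the names FACEPALM_MAN and describe are made up for illustration.

```python
import unicodedata

# Assumed example: the "man facepalming" ZWJ sequence, written with escapes
# so the source does not depend on editor or font support.
FACEPALM_MAN = "\U0001F926\u200D\u2642\uFE0F"


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# One displayed emoji (with a supporting font), several code points.
for entry in describe(FACEPALM_MAN):
    print(entry)

# Python strings are indexed by code point, so a selection that ends at
# offset 1 keeps only the base emoji and drops the joiner, the sign and
# the variation selector.
partial = FACEPALM_MAN[:1]
```

Whether a toolkit lets the user produce such a partial selection is exactly the font and toolkit dependency described above.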


For output, the font is not relevant: it doesn't matter whether the 
reference link ends up rendered as a single grapheme or as two because 
the font does not support the single grapheme from the newer Unicode 
version. Of course, if the toolkit wants highlight instructions 
expressed in displayed graphemes, you'd have to deal with that, but I 
hope no toolkit does that...


Does it make sense to do an Informational XEP for Unicode handling in XEPs?

Marvin
___
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
___


Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Kim Alvefur
On Thu, Oct 24, 2019 at 08:32:04PM +0200, Marvin W wrote:
> Thus, I would vote for using codepoints.

I agree.

> The rule should just be that clients should not do that on outgoing
> data.

I agree with this as well.

> We should refrain from using things like grapheme clusters in wire formats,
> as those are subject to changes in upcoming Unicode versions and thus the
> wire format would be understood differently depending on the Unicode version
> implemented by the client.

Doesn't this also depend on the font?

-- 
Kim "Zash" Alvefur




Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Marvin W

On 10/21/19 4:06 PM, Jonathan Lennox wrote:

> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29.  ICU has support for this.


We should refrain from using things like grapheme clusters in wire 
formats, as those are subject to changes in upcoming Unicode versions 
and thus the wire format would be understood differently depending on 
the Unicode version implemented by the client.


Technically we could also agree on using a certain Unicode version now 
and for all eternity, but this sounds like a stupid concept and will 
cause people to use ICU or similar which will break eventually as the 
standard changes.


We should strive for maximum compatibility. This gives us basically 
two options: bytes and codepoints. As our encoding is fixed to UTF-8 per 
RFC 6120, both would be equally understandable by clients. However, there 
are two good reasons against bytes:
1) At some point we might want to allow the use of UTF-16 or another 
encoding. Byte counts would have to be translated when re-encoding, which 
a server is probably unable to do generically.
2) Starting a link or bold span inside a codepoint has no useful 
meaning. Depending on the tech stack used, it might cause developers 
to unintentionally allow the generation of invalidly encoded strings, 
causing all kinds of issues (including potential security impact).
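
Both reasons can be illustrated with a short Python sketch. The sample string and the helper name slice_is_valid_utf8 are assumptions for illustration only, not part of any XEP.

```python
# "café" followed by a space and one emoji outside the BMP (U+1F600).
text = "caf\u00e9 \U0001F600"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

code_points = len(text)        # same count regardless of encoding
utf8_bytes = len(utf8)         # changes if the encoding changes
utf16_units = len(utf16) // 2  # two bytes per UTF-16 code unit


def slice_is_valid_utf8(data, begin, end):
    """Return True if data[begin:end] decodes as well-formed UTF-8."""
    try:
        data[begin:end].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(code_points, utf8_bytes, utf16_units)
# A byte range that ends inside the two-byte "é" is not valid UTF-8.
print(slice_is_valid_utf8(utf8, 0, 4))
# Ending on the code point boundary is fine.
print(slice_is_valid_utf8(utf8, 0, 5))
```

With codepoint offsets the first problem disappears (the count is independent of the encoding) and the second cannot occur, since every offset is a code point boundary.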


Thus, I would vote for using codepoints. This of course raises the 
question of what happens if multiple codepoints form a single grapheme 
and a reference points inside that grapheme. The rule should just be 
that clients should not do that on outgoing data. If a client receives 
input pointing inside a grapheme, it is implementation-defined whether 
the grapheme is included, excluded or split. In practice this shouldn't 
happen, so I doubt it is really worth defining rules for it in the 
respective XEP, but that would also be an option.
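
A rough sketch of what codepoint-based ranges could look like on the client side, in Python. It assumes half-open ranges [begin, end) counted in code points, which is my assumption and not something any XEP fixes; split_by_range and to_utf16_offset are hypothetical helper names. The second helper is for toolkits that index text in UTF-16 code units.

```python
def split_by_range(text, begin, end):
    """Split text into (before, marked, after) for the assumed half-open
    code point range [begin, end)."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError("range outside of text")
    return text[:begin], text[begin:end], text[end:]


def to_utf16_offset(text, cp_offset):
    """Translate a code point offset into a UTF-16 code unit offset."""
    # Code points above U+FFFF take two UTF-16 code units (a surrogate pair).
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_offset])


message = "\U0001F600 some bold text!"
before, marked, after = split_by_range(message, 7, 11)
print(repr(marked))

# The same position expressed for a UTF-16 based toolkit.
print(to_utf16_offset(message, 7), to_utf16_offset(message, 11))
```

The wire format stays in code points; only the local conversion depends on how the toolkit indexes strings.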


By the way, the often-mentioned flag example is not consistent across 
browsers either: try https://larma.de/splitflag.html with various 
browsers and browser versions. (Bonus task: build a browser detector 
based on flag rendering.)
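
For clients that want to avoid sending offsets inside a flag, here is a small Python sketch of the check. It only covers flags built from regional indicator pairs (U+1F1E6 to U+1F1FF) and pairs them from the start of each run, as UAX #29 does; the function names are invented for this example.

```python
def is_regional_indicator(ch):
    """True for the 26 regional indicator symbols used to build flags."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def splits_flag(text, offset):
    """Return True if the code point offset falls between the two halves
    of a regional indicator pair."""
    if not 0 < offset < len(text) or not is_regional_indicator(text[offset]):
        return False
    # Count the regional indicators directly before the offset. An odd
    # count means the one just before the offset is the first half of a pair.
    run = 0
    i = offset - 1
    while i >= 0 and is_regional_indicator(text[i]):
        run += 1
        i -= 1
    return run % 2 == 1


# Regional indicators D, E, F, R: two flags, four code points.
flags = "\U0001F1E9\U0001F1EA\U0001F1EB\U0001F1F7"
print([splits_flag(flags, n) for n in range(len(flags) + 1)])
```

How a receiving client renders a split pair is still up to the browser or toolkit, as the test page shows.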


Marvin


Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Andrey Gagarin
On Mon, Oct 21, 2019 at 19:08, Jonathan Lennox wrote:

> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29.  ICU has support for this.
>

We have succeeded in implementing reference processing in three clients and
on the server side, and not one of the developers had problems calculating
the necessary positions. You just handle every emoji as one glyph.

In addition, we made an XMPP bot with which you can test different
references: markup, strings with escaped text and different media. You can
try it at xmpp:dev...@dev.xabber.com

For instance, if you have a text such as " funny comment with some bold
text!" and you want to make part of it bold, you count every symbol in
this text and end up with a message like this to send:






 funny comment with some bold text!


Each of these three emojis is counted as 1 symbol.

The client will render:
[image: Screenshot_2019-10-24 Xabber Web.png]

A more complex example with Unicode combining characters: "Test ◌⃤ BOLD
italic usual text". We count this grapheme as one character. The message
should look like this:








Test ◌⃤ BOLD italic usual text


The client will render:

[image: Screenshot_2019-10-24 Xabber Web(1).png]


-- 
Andrey Gagarin
Developer, Redsolution OÜ