Re: [Standards] Support for stickers (custom emojis)
On 10/24/19 9:40 PM, Kim Alvefur wrote:
>> We should refrain from using things like grapheme clusters in wire formats,
>> as those are subject to changes in upcoming Unicode versions and thus the
>> wire format would be understood differently depending on the Unicode version
>> implemented by the client.
> Doesn't this also depend on the font?

If the font does not support certain graphemes, it may render them as multiple glyphs (it may render 🤦‍♂️ as 🤦 and ♂️). The font rendering toolkit may know that this has been a single grapheme since Emoji 4.0 and thus treat it as one unit when selecting (for copy and paste, i.e. not allow copying only the ♂️). If the rendering toolkit does allow selecting only part of this grapheme cluster, and the user does so and instructs the client to make the selected text a reference, things get interesting again: in Unicode's counting you would be in the middle of a character, so it would not be possible to do exactly what the user instructed.

Thus the font may be relevant for various UI/UX concerns, and developers need to be aware of them when letting the user input text. For output, the font is irrelevant: it doesn't matter whether the reference ends up spanning a single grapheme or two graphemes because the font does not support that single grapheme from the newer Unicode version. Of course, if the toolkit wants highlight instructions in displayed graphemes, you'd have to deal with that, but I hope no toolkit does that...

Does it make sense to do an Informational XEP for Unicode handling in XEPs?

Marvin
___
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
___
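To make the counting concrete, here is a small Python sketch (the helper name `lengths` is hypothetical, not from any XEP) showing that the "man facepalming" emoji from the font example is one glyph on a supporting font but four codepoints, 13 UTF-8 bytes and 5 UTF-16 code units on the wire, and that a codepoint range can cover only part of it.

```python
def lengths(s: str) -> dict:
    """Length of s under the counting schemes discussed in this thread."""
    return {
        "codepoints": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        # utf-16-le adds no BOM; each code unit is 2 bytes.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }

# "Man facepalming" as a ZWJ sequence: U+1F926 FACE PALM, U+200D ZERO WIDTH
# JOINER, U+2642 MALE SIGN, U+FE0F VARIATION SELECTOR-16.
FACEPALM_MAN = "\U0001F926\u200D\u2642\uFE0F"

print(lengths(FACEPALM_MAN))
print([f"U+{ord(c):04X}" for c in FACEPALM_MAN])

# A codepoint range [0, 1) covers only the face palm, while [2, 4) covers
# only the male sign plus its variation selector (the "copy only the
# male sign" case). ascii() keeps the output printable on any console.
print(ascii(FACEPALM_MAN[0:1]), ascii(FACEPALM_MAN[2:4]))
```

Both partial ranges are perfectly valid codepoint ranges; they just do not line up with what the user sees as one character.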
Re: [Standards] Support for stickers (custom emojis)
On Thu, Oct 24, 2019 at 08:32:04PM +0200, Marvin W wrote:
> Thus, I would vote for using codepoints.

I agree.

> The rule should just be that clients should not do that on outgoing
> data.

I agree with this as well.

> We should refrain from using things like grapheme clusters in wire formats,
> as those are subject to changes in upcoming Unicode versions and thus the
> wire format would be understood differently depending on the Unicode version
> implemented by the client.

Doesn't this also depend on the font?

--
Kim "Zash" Alvefur
Re: [Standards] Support for stickers (custom emojis)
On 10/21/19 4:06 PM, Jonathan Lennox wrote:
> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29. ICU has support for this.

We should refrain from using things like grapheme clusters in wire formats, as those are subject to changes in upcoming Unicode versions and thus the wire format would be understood differently depending on the Unicode version implemented by the client. Technically we could also agree on a specific Unicode version now and for all eternity, but that sounds like a stupid concept, and it would still lead people to use ICU or similar libraries, which will eventually break as the standard changes.

We should strive for maximum compatibility. This leaves us basically two options: bytes and codepoints. As our encoding is fixed to UTF-8 per RFC 6120, both would be equally understandable by clients. However, there are two good reasons against bytes:

1) At some point we might want to allow UTF-16 or another encoding. Byte counts would have to be translated when re-encoding, which a server is probably unable to do generically.
2) There is no useful meaning to starting a link or bold text inside a codepoint. Depending on the tech stack used, this might lead developers to unintentionally allow the generation of invalidly encoded strings, causing all kinds of issues (including potential security impact).

Thus, I would vote for using codepoints.

This of course opens the question of what happens if multiple codepoints form a single grapheme and a reference points inside that grapheme. The rule should just be that clients should not do that on outgoing data. If a client receives input pointing inside a grapheme, it is implementation-defined whether the grapheme is included, excluded or split. In practice this shouldn't happen, so I doubt it is really worth defining a rule for it in the respective XEP, but that would also be an option.
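Both arguments against byte offsets can be seen in a minimal Python sketch. It assumes a made-up sample string, and the helper names are hypothetical: the same codepoint offset maps to different UTF-8 and UTF-16 offsets (so byte offsets would need rewriting on re-encoding), and a byte offset that lands inside a multi-byte sequence cuts the text into invalid UTF-8.

```python
def cp_to_utf8_offset(s: str, cp: int) -> int:
    """UTF-8 byte offset for codepoint offset cp (hypothetical helper)."""
    return len(s[:cp].encode("utf-8"))

def cp_to_utf16_offset(s: str, cp: int) -> int:
    """UTF-16 code-unit offset for codepoint offset cp (hypothetical helper)."""
    return len(s[:cp].encode("utf-16-le")) // 2

def byte_prefix_is_valid(raw: bytes, end: int) -> bool:
    """True if raw[:end] decodes as UTF-8, i.e. end is not mid-codepoint."""
    try:
        raw[:end].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Made-up sample: "cafe" with an accented e, a grinning face, then "bold".
text = "caf\u00e9 \U0001F600 bold"
start = text.index("bold")

# The codepoint offset is the same however the text is encoded ...
print("codepoints:", start)                              # 7
# ... but byte and code-unit offsets depend on the encoding.
print("UTF-8 bytes:", cp_to_utf8_offset(text, start))    # 11
print("UTF-16 units:", cp_to_utf16_offset(text, start))  # 8

raw = text.encode("utf-8")
# Byte 4 falls between the two bytes of U+00E9 (0xC3 0xA9).
print(byte_prefix_is_valid(raw, 4), byte_prefix_is_valid(raw, 5))  # False True
```

A server relaying this text as UTF-16 could pass the codepoint offset 7 through unchanged, whereas a byte offset of 11 would have to become 8 code units (16 bytes).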
By the way, the often-mentioned flag example is not consistent across browsers either; try https://larma.de/splitflag.html with various browsers and browser versions. (Bonus task: build a browser detector based on flag rendering.)

Marvin
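Flags are especially fragile because each one is a pair of regional indicator codepoints, and the UAX #29 segmentation rules pair consecutive regional indicators from the left. The Python sketch below (hypothetical helper names, a made-up DE/FR sample) shows how dropping a single codepoint re-pairs everything after it; how a given browser then draws the resulting pairs is up to its fonts and text stack.

```python
RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(region: str) -> str:
    """Build a flag sequence from a two-letter region code such as "DE"."""
    return "".join(chr(RI_A + ord(ch) - ord("A")) for ch in region.upper())

def ri_letters(s: str) -> str:
    """Show regional indicators as ASCII letters ('?' for anything else)."""
    out = []
    for ch in s:
        n = ord(ch) - RI_A
        out.append(chr(ord("A") + n) if 0 <= n < 26 else "?")
    return "".join(out)

de_fr = flag("DE") + flag("FR")
print(len(de_fr), ri_letters(de_fr))  # 4 DEFR

# Dropping the first codepoint shifts the pairing: the remaining
# indicators now pair as "EF", leaving a lone "R" at the end.
print(ri_letters(de_fr[1:]))  # EFR
```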
Re: [Standards] Support for stickers (custom emojis)
On Mon, 21 Oct 2019 at 19:08, Jonathan Lennox wrote:
> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29. ICU has support for this.

We have succeeded in implementing reference processing in three clients and on the server side, and none of the developers had problems calculating the necessary positions. You just handle every emoji as one glyph.

For instance, if you have the text " funny comment with some bold text!" and want to make part of it bold, you count every symbol in the text, and in the end you get this message to send:

funny comment with some bold text!

Each of these three emojis is counted as one symbol. The client will render:

[image: Screenshot_2019-10-24 Xabber Web.png]

A more complex example with Unicode combining characters: "Test ◌⃤ BOLD italic usual text". We count this grapheme as one character. The message should look like this:

Test ◌⃤ BOLD italic usual text

The client will render:

[image: Screenshot_2019-10-24 Xabber Web(1).png]

In addition, we made an XMPP bot with which you can test different references: markup, strings with escaped text, and different media content. You can try it here: xmpp:dev...@dev.xabber.com.

--
Andrey Gagarin
Developer, Redsolution OÜ
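The codepoint proposal and the "one emoji or combined character counts as one" approach described here give different offsets for the same text. The Python sketch below assumes the combining example is U+25CC DOTTED CIRCLE followed by the combining enclosing triangle U+20E4, and uses a deliberately crude approximation of grapheme clusters (hypothetical helper names; real code would use a full UAX #29 implementation such as ICU's break iterator). Under codepoint counting "BOLD" starts at offset 8; counting the combined character as one gives 7.

```python
import unicodedata

ZWJ = "\u200D"

def continues_cluster(prev: str, ch: str) -> bool:
    """Crude stand-in for UAX #29 (hypothetical): combining marks (Mn, Me,
    Mc, which includes variation selectors) and ZWJ attach to the previous
    character, as does whatever follows a ZWJ. Not a full segmenter."""
    return (
        unicodedata.category(ch) in ("Mn", "Me", "Mc")
        or ch == ZWJ
        or prev == ZWJ
    )

def approx_cluster_offset(s: str, cp_offset: int) -> int:
    """Translate a codepoint offset into an approximate cluster offset."""
    count = 0
    prev = ""
    for ch in s[:cp_offset]:
        if not prev or not continues_cluster(prev, ch):
            count += 1
        prev = ch
    return count

text = "Test \u25CC\u20E4 BOLD italic usual text"
start_cp = text.index("BOLD")
print(start_cp, approx_cluster_offset(text, start_cp))  # 8 7
```

A sender counting one way and a receiver counting the other would mark text shifted by one position, which is why the unit has to be fixed in the wire format.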