Re: PUA (BMP) planned characters HTML tables
On Mon, 12 Aug 2019 at 02:27, James Kass via Unicode wrote: > > On 2019-08-11 5:26 PM, [ Doug Ewell ] via Unicode wrote: > > If you are thinking of these as potential future additions to the standard, > > keep in mind that accented letters that can already be represented by a > > combination of letter + accent will not ever be encoded. This is one of the > > longest-standing principles Unicode has. People seem to be ignoring the fact that Marshallese and Latvian both use L and N with cedilla, but with completely different glyph shapes: > In January 2013, the Unicode Technical Committee discussed issues for the > representation of > Marshallese orthography. In particular, Marshallese uses the Latin script and > requires the letters l, > m, n, and o with cedilla. Latvian orthography uses the Latin script and > requires the letters g, k, l, n, > and r with comma below. For Marshallese, it is unacceptable to display > cedillas as commas below. > Conversely, for Latvian, it is unacceptable to display commas below as > cedillas. However, as fonts have been following Latvian practice for these letters (cedilla is displayed as a comma below) since before Unicode, Marshallese users cannot get their desired outcome using standard Unicode combining diacritical marks unless they apply a font specially designed for Marshallese -- which you can never guarantee if you are writing an email or posting on twitter, etc. This issue was discussed at WG2 in 2013 (https://www.unicode.org/L2/L2013/13128-latvian-marshal-adhoc.pdf), when there was a recommendation to encode precomposed letters L and N with cedilla *with no decomposition*, but that solution does not seem to have been taken up by the UTC. Andrew
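A minimal sketch using Python's standard unicodedata module shows why normalization makes the problem unavoidable. A Marshallese l or n typed as base letter + U+0327 COMBINING CEDILLA normalizes (NFC) to the same precomposed U+013C or U+0146 that Latvian uses. So once the text is normalized, a font that draws those code points with a comma below will draw the Marshallese letters the same way. Marshallese m and o with cedilla have no precomposed form and remain two code points. The helper name describe is only for illustration.

```python
import unicodedata

CEDILLA = "\u0327"  # U+0327 COMBINING CEDILLA

def describe(s: str) -> str:
    """Return the code points of s as U+XXXX strings."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Marshallese needs l, m, n and o with cedilla
for base in "lmno":
    nfc = unicodedata.normalize("NFC", base + CEDILLA)
    print(base, "+ cedilla ->", describe(nfc))
# l + cedilla -> U+013C   (same code point as Latvian l with comma below)
# m + cedilla -> U+006D U+0327
# n + cedilla -> U+0146   (same code point as Latvian n with comma below)
# o + cedilla -> U+006F U+0327

print(unicodedata.decomposition("\u013C"))  # 006C 0327
```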
Re: Fonts and Canonical Equivalence
On Sat, 10 Aug 2019 at 15:46, Richard Wordingham via Unicode wrote: > > > Just retested on Windows 10 with > > a Tibetan font that supports both sequences of vowels, and both > > sequences display correctly under Harfbuzz (as expected), but only > > vowel-below followed by vowel-above displays correctly when using > > built-in Windows rendering. > > Does vowel above before vowel below yield a dotted circle? Yes. Attached are screenshots for two real world examples, one which is logically spelled as i + u, and one as u + i: 1. ཉིུ <0F49 0F72 0F74> [nyiu] as a contraction for ཉི་ཤུ [nyi shu] "twenty" 2. བཅིུག <0F56 0F45 0F74 0F72 0F42> [bcuig] as a contraction for བཅུ་གཅིག [bcu gcig] "eleven" Andrew
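Vowel order matters for normalized text because of canonical reordering. U+0F72 TIBETAN VOWEL SIGN I has ccc=130 and U+0F74 TIBETAN VOWEL SIGN U has ccc=132, so normalization (NFC or NFD) always puts the vowel above first. A short check with Python's standard unicodedata module confirms that example 2, logically spelled u + i, normalizes to i + u. That is the vowel-above-before-vowel-below order that produces a dotted circle with the built-in Windows rendering. The helper show is illustrative.

```python
import unicodedata

def show(s: str) -> str:
    """Return the code points of s as space-separated hex."""
    return " ".join(f"{ord(c):04X}" for c in s)

print(unicodedata.combining("\u0F72"))  # 130: TIBETAN VOWEL SIGN I (above)
print(unicodedata.combining("\u0F74"))  # 132: TIBETAN VOWEL SIGN U (below)

# Example 2, "bcuig", spelled with the vowel below before the vowel above
bcuig = "\u0F56\u0F45\u0F74\u0F72\u0F42"
print(show(unicodedata.normalize("NFC", bcuig)))  # 0F56 0F45 0F72 0F74 0F42
```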
Re: Fonts and Canonical Equivalence
On Sat, 10 Aug 2019 at 08:29, Richard Wordingham via Unicode wrote: > > There are similar issues with Tibetan; some fonts do not work properly > if a vowel below (ccc=132) is separated from the base of the > consonant stack by a vowel above (ccc=130). It's not that the fonts don't work, it's that some rendering engines do not apply the OpenType features in the font that support both sequences of vowels (vowel-above followed by vowel-below, and vowel-below followed by vowel-above). Just retested on Windows 10 with a Tibetan font that supports both sequences of vowels, and both sequences display correctly under Harfbuzz (as expected), but only vowel-below followed by vowel-above displays correctly when using built-in Windows rendering. It is very frustrating that Windows cannot correctly support the display of Tibetan in normalized form, yet Harfbuzz does not have any problems. Personally, I think USE (the Universal Shaping Engine) is a failed experiment, and I wish Microsoft would simply adopt Harfbuzz as the default rendering engine. Andrew
Re: Proposal to extend the U+1F4A9 Symbol
On Sat, 1 Jun 2019 at 23:32, Doug Ewell via Unicode wrote: > > Tex wrote: > > > What I would find useful is an emoji for when my phone falls into the > > toilet. > > I would have thought ⤵ would be sufficient. Don't worry, a brand new foolproof method of defining emoji for anything in the universe using Wikidata QIDs is coming to a phone near you soon (http://www.unicode.org/L2/L2019/19082r-qid-emoji.pdf) ... oh, there is no Wikidata QID for phone dropped in the toilet. Andrew
Re: Encoding italic
On Tue, 5 Feb 2019 at 15:34, wjgo_10...@btinternet.com via Unicode wrote: > > italic version of a glyph in plain text, including a suggestion of to > which characters it could apply, would test whether such a proposal > would be accepted to go into the Document Register for the Unicode > Technical Committee to consider or just be deemed out of scope and > rejected and not considered by the Unicode Technical Committee. Just reminding you that "The initial character in a variation sequence is never a nonspacing combining mark (gc=Mn) or a canonical decomposable character" (The Unicode Standard 11.0 §23.4). This means that a variation sequence cannot be defined for any precomposed letters and diacritics, so for example you could not italicize the word "fête" by simply adding VS14 after each letter because "ê" (in NFC form) cannot act as the base for a variation sequence. You would have to first convert any text to be italicized to NFD, then apply VS14 to each non-combining character. This alone would make a VS solution unacceptable in my opinion. Andrew
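To make the cost concrete, here is a sketch of what such a scheme would involve. It assumes, hypothetically, that VS14 (U+FE0D) after a base character requested an italic glyph; no such variation sequences are defined. The text must first be decomposed to NFD, and VS14 then goes after every character that is not a combining mark. The function name italicize_with_vs is made up for this illustration.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14; the "italic" meaning is hypothetical

def italicize_with_vs(text: str) -> str:
    """Sketch of a hypothetical VS14 italic scheme.

    Variation sequences cannot start with a canonically decomposable
    character, so decompose to NFD first, then append VS14 after every
    base (non-mark) character, leaving combining marks after it.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        out.append(ch)
        if not unicodedata.category(ch).startswith("M"):
            out.append(VS14)
    return "".join(out)

word = italicize_with_vs("f\u00EAte")  # "fête" in NFC
print(" ".join(f"{ord(c):04X}" for c in word))
# 0066 FE0D 0065 FE0D 0302 0074 FE0D 0065 FE0D
```

A side effect is that NFC cannot restore ê afterwards. VS14 has ccc=0, so it sits between e and U+0302 and blocks them from recomposing.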
Re: Proposal for BiDi in terminal emulators
On Fri, 1 Feb 2019 at 22:20, Doug Ewell via Unicode wrote: > > Richard Wordingham wrote: > > > Language tagging is already available in Unicode, via the tag > > characters in the deprecated plane. > > Plane 14 isn't deprecated -- that isn't a property of planes -- and the > tag characters U+E0020 through U+E007E have been un-deprecated for use > with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are > deprecated. Cancel Tag is not deprecated any longer either (http://www.unicode.org/Public/UNIDATA/PropList.txt). Andrew
Re: Encoding italic
On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode wrote: > > This bold new concept was not mine. When I tested it > here, I was using the tag encoding recommended by the developer. Congratulations James, you've successfully interchanged tag-styled plain text over the internet with no adverse side effects. I copied your email into BabelPad and your "bold" is shown bold (see attached screenshot). Andrew
Re: Encoding italic
On Tue, 29 Jan 2019 at 10:25, Martin J. Dürst via Unicode wrote: > > The overall tag proposal had the desired effect: The original proposal > to hijack some unused bytes in UTF-8 was defeated, and the tags itself > were not actually used and therefore could be depreciated. And the tag characters (all except E0001) are now no longer deprecated. As flag tag sequences are now a thing (http://www.unicode.org/reports/tr51/#valid-emoji-tag-sequences), and are widely supported (including on Twitter), your and PV's objections to using tag characters for a plain text font styling protocol simply because they are tag characters carry zero weight. Andrew
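For reference, an emoji flag tag sequence (UTS #51) has three parts: U+1F3F4 WAVING BLACK FLAG, then tag characters spelling a lowercase subdivision code, then U+E007F CANCEL TAG. Each tag character is its ASCII counterpart plus 0xE0000. The Python sketch below builds the Scotland flag. The helper names are illustrative, and whether the flag actually displays depends on the platform.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG (tag base)
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG (terminates the sequence)

def to_tag_chars(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to tag characters (U+E0020..U+E007E)."""
    if not all(0x20 <= ord(c) <= 0x7E for c in ascii_text):
        raise ValueError("tag characters only mirror printable ASCII")
    return "".join(chr(0xE0000 + ord(c)) for c in ascii_text)

def subdivision_flag(code: str) -> str:
    """Build a flag tag sequence for a subdivision code such as 'gbsct'."""
    return BLACK_FLAG + to_tag_chars(code.lower()) + CANCEL_TAG

scotland = subdivision_flag("GBSCT")
print(" ".join(f"{ord(c):X}" for c in scotland))
# 1F3F4 E0067 E0062 E0073 E0063 E0074 E007F
```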
Re: Encoding italic (was: A last missing link)
On Thu, 24 Jan 2019 at 15:42, James Kass wrote: > > Here's a very polite reply from John Hudson from 2000, > http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML024/1042.html > ...and, over time, many of the replies to William Overington's colorful > suggestions were less than polite. But it was clear that colors were > out-of-scope for a computer plain-text encoding standard. Going off topic a little, I saw this tweet from Marijn van Putten today which shows examples of Arabic script from early Quranic manuscripts with phonetic information indicated by the use of red and green dots: https://twitter.com/PhDniX/status/1088171783461703682 I would be interested to know how those should be represented in Unicode. Andrew
Re: Encoding italic (was: A last missing link)
On Thu, 24 Jan 2019 at 13:59, James Kass via Unicode wrote: > > FAICT, the emoji repertoire is vendor-driven, just as the pre-Unicode > emoji sets were vendor driven. Pre-Unicode, if a vendor came up with > cool ideas for new emoji they added new characters to the PUA. Now that > emoji are standardized, when vendors come up with new ideas they put > them in the emoji ranges in order to preserve the standardization factor > and ensure interoperability. (That's probably over-simplified and there > are bound to be other factors involved.) I do not believe that recent (post-6.0) emoji additions are vendor-driven. There is no formal vendor representation on the ESC, and most ESC members do not work for vendors. Current emoji additions are driven by ordinary users, who are actively encouraged by the UTC to propose novel characters for encoding: http://blog.unicode.org/2018/04/submissions-open-for-2020-emoji.html http://blog.unicode.org/2016/09/emoji-deadline.html The vendors happily lap up whatever emojis the UTC throws at them, but they seem to have little interest in taking control of the emoji process. > We should no more expect the conventional Unicode character encoding > model to apply to emoji than we should expect the old-fashioned text > ranges to become vendor-driven. Why should we not expect the conventional Unicode character encoding model to apply to emoji? We were told time and time again when emoji were first proposed that they were required for encoding for interoperability with Japanese telecoms whose usage had spilled over to the internet. At that time there was no suggestion that encoding emoji was anything other than a one-off solution to a specific problem with PUA usage by different vendors, and I at least had no idea that emoji encoding would become a constant stream with an annual quota of 60+ fast-tracked user-suggested novelties. Maybe that was the hidden agenda, and I was just naïve. 
The ESC and UTC do an appallingly bad job at regulating emoji, and I would like to see the Emoji Subcommittee disbanded, and decisions on new emoji taken away from the UTC, and handed over to a consortium or committee of vendors who would be given a dedicated vendor-use emoji plane to play with (kinda like a PUA plane with pre-assigned characters with algorithmic names [VENDOR-ASSIGNED EMOJI X] which the vendors can then associate with glyphs as they see fit; and as emoji seem to evolve over time they would be free to modify and reassign glyphs as they like because the Unicode Standard would not define the meaning or glyph for any characters in this plane). Andrew
Re: Encoding italic (was: A last missing link)
On Thu, 24 Jan 2019 at 02:10, Mark E. Shoulson via Unicode wrote: > > Unicode isn't here to encode cool new ideas that would be cool and > new. It's here for writing what people already do. http://www.unicode.org/L2/L2018/18141r2-emoji-colors.pdf "Add 14 colored emoji characters for decorative and/or descriptive uses. These may be used to indicate that an emoji has a different color." No evidence has been provided that anybody is currently using colored blobs for this purpose (in fact emoji users have explicitly rejected this method for indicating emoji color: http://www.unicode.org/L2/L2018/18208-white-wine-rgi.pdf), just an assertion that it would be a good idea if emoji users could add a colored swatch to an existing emoji to indicate what color they want it to represent (note that the colored characters do not change the color of the emoji they are attached to [before or after, depending upon whether you are speaking French or English dialect of emoji], they are just intended as a visual indication of what colour you wish the emoji was). This proposal to add 14 additional colored circles, squares and hearts is a perfect example of a cool new idea for something that the authors think would be really useful, but for which there is no evidence of existing use. The UTC should have rejected it as out of scope, but we all know that rules and procedures do not apply to the Emoji Subcommittee, so in fact this cool new idea will be included in Unicode 12 in March. Andrew
Re: Encoding italic (was: A last missing link)
On Sun, 20 Jan 2019 at 03:16, James Kass via Unicode wrote: > > Possible approaches include: > > 3 - Open/Close punctuation treatment > Stateful. Works on ranges. Not currently supported in plain-text. > Could be supported in applications which can take a text string URL and > make it a clickable link. Default appearance in nonsupporting apps may > resemble existing plain-text italic kludges such as slashes. The ASCII > is already in the character string. A possibility that I don't think has been mentioned so far would be to use the existing tag characters (E0020..E007F). These are no longer deprecated, and as they are used in emoji flag tag sequences, software already needs to support them, and they should just be ignored by software that does not support them. The advantages are that no new characters need to be encoded, and they are flexible so that tag sequences for start/end of italic, bold, fraktur, double-struck, script, sans-serif styles could be defined. For example start and end of italic styling could be defined as the tag sequences <i> and </i> (E003C E0069 E003E and E003C E002F E0069 E003E). Andrew
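A minimal Python sketch of this suggestion, assuming the <i> and </i> tag sequences as proposed above. Nothing like this is standardized, and the function names are hypothetical. The code wraps a string in the italic tag sequences and approximates what software that ignores tag characters would display.

```python
TAG_BLOCK = range(0xE0020, 0xE0080)  # U+E0020..U+E007F, including CANCEL TAG

def to_tag_chars(ascii_text: str) -> str:
    """Map printable ASCII to the corresponding tag characters (+0xE0000)."""
    return "".join(chr(0xE0000 + ord(c)) for c in ascii_text)

# Hypothetical italic markers, as suggested in the message above
ITALIC_START = to_tag_chars("<i>")   # E003C E0069 E003E
ITALIC_END = to_tag_chars("</i>")    # E003C E002F E0069 E003E

def italic(text: str) -> str:
    """Wrap text in the hypothetical italic tag sequences."""
    return ITALIC_START + text + ITALIC_END

def strip_tags(text: str) -> str:
    """Drop tag characters, approximating what non-supporting software shows."""
    return "".join(c for c in text if ord(c) not in TAG_BLOCK)

styled = italic("f\u00EAte")
print(len(styled), strip_tags(styled))  # 11 fête
```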
Re: Private Use areas - Vertical Text
On Wed, 29 Aug 2018 at 11:18, wrote: > > I was using a change horizontal to vertical text feature in office, the > PUA characters being from plane 15. I tested with Word 2007, and normal PUA characters from my font were displayed with vertical orientation in a vertical text box, but Plane 15 PUA characters were rotated. I also tested with Word 2016, and both normal PUA characters and Plane 15 PUA characters were displayed with vertical orientation in a vertical text box, as you want, although there were vertical spacing issues with the Plane 15 PUA characters which suggest that the vertical metrics tables (vhea and vmtx) in the font are not being applied for Plane 15 characters (or it could be a problem with my font). Andrew
Re: Private Use areas - Vertical Text
On Wed, 29 Aug 2018 at 05:07, via Unicode wrote: > > Yes, as Richard says when CJK Zhuang text is displayed vertically whilst > the Zhuang characters in Unicode remain upright, but those with PUA > codepoints are rotated 90°. John, you did not explain by what mechanism you were trying to display vertical PUA Zhuang text. I can display vertically-oriented PUA-encoded CJKVZ ideographs in vertical layout in web pages using CSS, as demonstrated in this test page: http://www.babelstone.co.uk/Fonts/PUA_Vertical_Test.html The PUA characters display with correct orientation under Windows 10 on the Edge, Chrome and Firefox browsers. The test page only fails under IE, but we are not meant to use IE anymore anyway. Andrew
Re: Private Use areas - Vertical Text
On Tue, 28 Aug 2018 at 18:15, WORDINGHAM RICHARD via Unicode wrote: > > Unicode is doing what it can in this matter: > > (a) Zhuang PUA characters are being made individually obsolete. Not by a nebulous entity called "Unicode", or even by the Unicode Consortium per se, but by the hard work over many years by individual experts such as John Knightley. Andrew
Re: The Unicode Standard and ISO
On 8 June 2018 at 13:01, Michael Everson via Unicode wrote: > > I wonder if Mark Davis will be quick to agree with me when I say that > ISO/IEC 15897 has no use and should be withdrawn. It was reviewed and confirmed in 2017, so the next systematic review won't be until 2022. And as the standard is now under SC35, national committees mirroring SC2 may well overlook (or be unable to provide feedback to) the systematic review when it next comes around. I agree that ISO/IEC 15897 has no use, and should be withdrawn. Andrew
Re: Translating the standard
On 12 March 2018 at 07:59, Marcel Schneider via Unicode wrote: > > Likewise ISO/IEC 10646 is available in a French version No it is not, and never has been. Why don't you check your facts before making misleading statements to this list? > or at least, it should have an official French version like all ISO standards. That is also blatantly untrue. Only six of the publicly available ISO standards listed at http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html have French versions, and one has a Russian version. You will notice that there is no French version of ISO/IEC 10646. Andrew
Re: Unicode Emoji 11.0 characters now ready for adoption!
On 7 March 2018 at 22:18, Philippe Verdy via Unicode wrote: > > Additional note: the UCS will never large enough to support the personal > signatures of billions Chinese people living today or born since milleniums, > or jsut those to be born in the next century. There's a need to represent > these names using composed strings. A reasonable compositing/ligaturing > process can then present almost all of them ! CJK characters invented for writing personal names are extremely rare, and do not constitute a significant fraction of CJK ideographs proposed for encoding. The majority of unencoded modern-use characters in China (that are not systematic simplified forms of existing encoded characters) are used in place names or in Chinese dialects or for writing non-Chinese languages such as Zhuang. Andrew
Re: Unicode Emoji 11.0 characters now ready for adoption!
On 28 February 2018 at 13:22, Christoph Päper via Unicode wrote: >> >> The 157 new Emoji are now available for adoption > > But Unicode 11.0 (which all new emojis but Pirate Flag and Infinity rely > upon) is not even in beta yet. Don't even get me started on that! >> There are approximately 7,000 living human languages, >> but fewer than 100 of these languages are well-supported on computers, >> mobile phones, and other devices. Adopt-a-character donations are used >> to improve Unicode support for digitally disadvantaged languages, and to >> help preserve the world’s linguistic heritage. > > Why is the announcement mentioning those numbers of languages at all? I agree, the figures are meaningless and misleading (and intended to mislead). I could list a hundred languages that are written with the Latin script without pausing for breath. There are very very few scripts in modern daily use that are not yet encoded in the UCS, but letting out that secret will not help the Unicode Consortium to raise money from character adoption. The latest grant to Anshu from Character Adoption money is for three historic scripts (http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html). If there were still so many digitally disadvantaged languages urgently in need of script encoding then surely the Unicode Consortium would be sponsoring those as a priority rather than historic scripts. Andrew
Re: Unicode Emoji 11.0 characters now ready for adoption!
On 28 February 2018 at 10:48, Martin J. Dürst via Unicode wrote: >> >>> The 157 new Emoji are now available for adoption, to help the Unicode >>> Consortium’s work on digitally disadvantaged languages. >> >> I'm quite curious what it the relation between the new emojis and the >> digitally disadvantages languages. I see none. > > I think this was mentioned before on this list, in particular by Mark: > The money collected from character adoptions (where emoji are a prominent > target) is (mostly?) used to support work on not-yet-encoded (thus digitally > disadvantaged) scripts. See e.g. the recent announcement at > http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html. Over $250,000 has been raised from Unicode character adoptions to date. I am curious as to how much of this money has been spent, and would very much like to see annual accounts showing how much money has been received, and how much has been disbursed to whom and for what. Andrew
Re: UNICODE vehicle vanity registration?
You can use ♥⭐➕ in California. Someone has U+1F913 🤓 (https://www.instagram.com/p/BVYtIHensDu/). Andrew On 14 February 2018 at 16:24, Stephane Bortzmeyer via Unicode <unicode@unicode.org> wrote: > On Wed, Feb 14, 2018 at 09:44:06PM +0530, > Shriramana Sharma via Unicode wrote > a message of 6 lines which said: > > > Given that in the US vanity vehicle registrations with arbitrary > > alphanumeric sequences upto 7 characters are permitted (I am correct > > I hope?), I wonder who (here?) owns the UNICODE registration? > > Won't work in New York, unfortunately > > https://dmv.ny.gov/learn-about-personalized-plates > > "A character is a letter (A-Z), number (0-9) or space. Each space > counts as one character."
Re: 0027, 02BC, 2019, or a new character?
On 23 January 2018 at 00:55, James Kass via Unicode wrote: > > Regular American users simply don't type umlauts, period. Not even the president of the Unicode Consortium when referring to Christoph Päper: http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf Andrew
Re: 0027, 02BC, 2019, or a new character?
On 19 January 2018 at 13:19, Michael Everson via Unicode wrote: > > I’d go talk with him :-) I published Alice in Kazakh. He might like that. Damn, you'll have to reprint it with apostrophes now. Andrew
Re: 0027, 02BC, 2019, or a new character?
On 19 January 2018 at 09:16, Shriramana Sharma via Unicode wrote: > Wow. Somebody really needs to convey this to the Kazhaks. Else a > short-sighted decision would ruin their chances at native IDNs. Any Kazhaks > on this list? There's only one Kazakh who counts, and I'm pretty sure he's not on this list. Andrew
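The three candidates in the subject line differ in their character properties, which is what makes the choice matter for text processing. A quick look with Python's standard unicodedata module shows that only U+02BC is classified as a letter (Lm), so it is the only one that str.isalpha() treats as part of a word. Whether a given character is permitted in an IDN is decided by IDNA2008 (RFC 5892) and registry policies, which this sketch does not check.

```python
import unicodedata

for ch in ("\u0027", "\u02BC", "\u2019"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), ch.isalpha())
# U+0027 APOSTROPHE Po False
# U+02BC MODIFIER LETTER APOSTROPHE Lm True
# U+2019 RIGHT SINGLE QUOTATION MARK Pf False
```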
Re: Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation)
On 12 April 2017 at 15:58, Garth Wallace wrote: > > So has that proposal been retracted now? Once a proposal has been approved it cannot simply be retracted by the submitter. On the SC2 side, the proposed characters have been subject to ballot comments from national bodies, and no doubt they will be discussed at the WG2 meeting in Hohhot later this year. Andrew
Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation)
On 12 April 2017 at 05:12, Garth Wallace via Unicode wrote: > > Later Xiangqi proposals by Andrew West focused on > the circled ideographs and did not pursue new diagram drawing characters, > and were eventually successful. My Xiangqi proposal (http://www.unicode.org/L2/L2016/16255-n4748-xiangqi.pdf) proposed a minimal set of logical game pieces for Xiangqi/Janggi, regardless of shape (circular or octagonal) or design (traditional characters, simplified characters, cursive characters, or pictures) which I consider a font design issue, and explicitly did not seek to encode circled ideographs. My proposal was rejected, and a different proposal by Michael Everson (http://www.unicode.org/L2/L2016/16270-n4766-xiangqi.pdf) to encode all circled ideographs and negative circled ideographs attested in Xiangqi game diagrams was accepted instead. The accepted proposal for circled ideographs is a glyph encoding model not a character encoding model as for other game symbols (Chess, Dominoes, Mahjong, Playing Cards, etc.), and in my opinion it is a very bad model for several reasons. It makes the interchange of Xiangqi game data and game diagrams problematic; it hinders normal text processing operations on Xiangqi game pieces (for example, to search for a red horse piece you have to search for three different characters); and in modern computer usage Xiangqi game pieces may not be represented as simple circled ideographs, but may be coloured designs showing characters or images. It is also very likely that vendors will want to produce emoji versions of Xiangqi pieces, and these could not reasonably be considered to be glyph variants of circled ideographs. There has been some negative feedback on the circled ideographs model on the internet, and I believe that Michael has now been convinced that this model is wrong, and should be replaced by a model using logical game pieces. Andrew
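The search problem can be illustrated with a small sketch. The labels below are hypothetical placeholders standing in for separately encoded circled-ideograph characters; they are not real code points or character names. Under a glyph encoding model, any search has to know every variant that denotes the same logical piece. Under a logical model, each piece would be a single character.

```python
from enum import Enum
from typing import Dict, List, Set

class Piece(Enum):
    """Logical Xiangqi pieces (only two shown for brevity)."""
    RED_HORSE = "red horse"
    BLACK_HORSE = "black horse"

# Hypothetical glyph-model repertoire: several distinct characters map to
# the same logical piece. Labels are placeholders, not real characters.
GLYPH_TO_PIECE: Dict[str, Piece] = {
    "CIRCLED-RED-HORSE-A": Piece.RED_HORSE,
    "CIRCLED-RED-HORSE-B": Piece.RED_HORSE,
    "CIRCLED-RED-HORSE-C": Piece.RED_HORSE,
    "NEGATIVE-CIRCLED-BLACK-HORSE": Piece.BLACK_HORSE,
}

def glyphs_for(piece: Piece) -> Set[str]:
    """Every glyph-model character a search for `piece` must match."""
    return {g for g, p in GLYPH_TO_PIECE.items() if p is piece}

def count_piece(diagram: List[str], piece: Piece) -> int:
    """Count occurrences of a logical piece in a glyph-model diagram."""
    wanted = glyphs_for(piece)
    return sum(1 for g in diagram if g in wanted)

diagram = ["CIRCLED-RED-HORSE-A", "CIRCLED-RED-HORSE-C", "NEGATIVE-CIRCLED-BLACK-HORSE"]
print(len(glyphs_for(Piece.RED_HORSE)), count_piece(diagram, Piece.RED_HORSE))  # 3 2
```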