Re: PUA (BMP) planned characters HTML tables

2019-08-12 Thread Andrew West via Unicode
On Mon, 12 Aug 2019 at 02:27, James Kass via Unicode
 wrote:
>
> On 2019-08-11 5:26 PM, [ Doug Ewell ] via Unicode wrote:
> > If you are thinking of these as potential future additions to the standard, 
> > keep in mind that accented letters that can already be represented by a 
> > combination of letter + accent will not ever be encoded. This is one of the 
> > longest-standing principles Unicode has.

People seem to be ignoring the fact that Marshallese and Latvian both
use L and N with cedilla, but with completely different glyph shapes:

> In January 2013, the Unicode Technical Committee discussed issues for the
> representation of Marshallese orthography. In particular, Marshallese uses
> the Latin script and requires the letters l, m, n, and o with cedilla.
> Latvian orthography uses the Latin script and requires the letters g, k, l,
> n, and r with comma below. For Marshallese, it is unacceptable to display
> cedillas as commas below. Conversely, for Latvian, it is unacceptable to
> display commas below as cedillas.

However, as fonts have been following Latvian practice for these
letters (cedilla is displayed as a comma below) since before Unicode,
Marshallese users cannot get their desired outcome using standard
Unicode combining diacritical marks unless they apply a font specially
designed for Marshallese -- which you can never guarantee if you are
writing an email or posting on twitter, etc.

This issue was discussed at WG2 in 2013
(https://www.unicode.org/L2/L2013/13128-latvian-marshal-adhoc.pdf),
when there was a recommendation to encode precomposed letters L and N
with cedilla *with no decomposition*, but that solution does not seem
to have been taken up by the UTC.
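
To see why plain text cannot tell the two orthographies apart, here is a
minimal Python sketch using only the standard unicodedata module. It
shows that the precomposed letters named "WITH CEDILLA" decompose
canonically to letter + U+0327, so Latvian and Marshallese text end up
with the same code points and only the font decides the shape. The
choice of U+013C and U+0146 as test letters and the helper name
decompose are assumptions made for illustration.

```python
import unicodedata

def decompose(ch):
    """Return the canonical (NFD) decomposition as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# Precomposed l and n "with cedilla", used by both Latvian and Marshallese.
for ch in ("\u013C", "\u0146"):
    print(unicodedata.name(ch), "->", " ".join(decompose(ch)))

# The combining mark is formally a cedilla, whatever glyph a font draws.
print(unicodedata.name("\u0327"))
```

Because the decomposition is canonical, a precomposed letter "with no
decomposition" would be the only way to keep the Marshallese form
distinct under normalization.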

Andrew



Re: Fonts and Canonical Equivalence

2019-08-10 Thread Andrew West via Unicode
On Sat, 10 Aug 2019 at 15:46, Richard Wordingham via Unicode
 wrote:
>
> > Just retested on Windows 10 with
> > a Tibetan font that supports both sequences of vowels, and both
> > sequences display correctly under Harfbuzz (as expected), but only
> > vowel-below followed by vowel-above displays correctly when using
> > built-in Windows rendering.
>
> Does vowel above before vowel below yield a dotted circle?

Yes. Attached are screenshots for two real world examples, one which
is logically spelled as i + u, and one as u + i:

1. ཉིུ <0F49 0F72 0F74> [nyiu] as a contraction for ཉི་ཤུ [nyi shu] "twenty"

2. བཅིུག <0F56 0F45 0F74 0F72 0F42> [bcuig] as a contraction for
བཅུ་གཅིག [bcu gcig] "eleven"

Andrew


Re: Fonts and Canonical Equivalence

2019-08-10 Thread Andrew West via Unicode
On Sat, 10 Aug 2019 at 08:29, Richard Wordingham via Unicode
 wrote:
>
> There are similar issues with Tibetan; some fonts do not work properly
> if a vowel below (ccc=132) is separated from the base of the
> consonant stack by a vowel above (ccc=130).

It's not that the fonts don't work, it's that some of the rendering
engines do not apply the OpenType features in the font that support
both sequences of vowels (vowel-above followed by vowel-below, and
vowel-below followed by vowel-above). Just retested on Windows 10 with
a Tibetan font that supports both sequences of vowels, and both
sequences display correctly under Harfbuzz (as expected), but only
vowel-below followed by vowel-above displays correctly when using
built-in Windows rendering.

It is very frustrating that Windows cannot correctly support the
display of Tibetan in normalized form, yet Harfbuzz does not have any
problems. Personally, I think USE is a failed experiment, and I wish
Microsoft would simply adopt Harfbuzz as the default rendering engine.
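
To make the "normalized form" point concrete, here is a small Python
sketch (standard library only; the constant and function names are
assumptions for illustration). It shows that normalization reorders a
vowel-below + vowel-above spelling into vowel-above followed by
vowel-below, which is the order reported above as failing under the
built-in Windows rendering.

```python
import unicodedata

VOWEL_I = "\u0F72"  # TIBETAN VOWEL SIGN I (vowel above)
VOWEL_U = "\u0F74"  # TIBETAN VOWEL SIGN U (vowel below)

def code_points(s):
    """Format a string as a list of hexadecimal code points."""
    return [f"{ord(c):04X}" for c in s]

# Canonical combining classes: the lower class sorts first when normalizing.
print(unicodedata.combining(VOWEL_I), unicodedata.combining(VOWEL_U))

# Base letter + vowel below + vowel above, i.e. the "u + i" spelling.
typed = "\u0F45" + VOWEL_U + VOWEL_I
normalized = unicodedata.normalize("NFC", typed)
print(code_points(typed), "->", code_points(normalized))
```

So any process that normalizes Tibetan text will produce the sequence
that needs the extra OpenType support.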

Andrew


Re: Proposal to extend the U+1F4A9 Symbol

2019-06-01 Thread Andrew West via Unicode
On Sat, 1 Jun 2019 at 23:32, Doug Ewell via Unicode  wrote:
>
> Tex wrote:
>
> > What I would find useful is an emoji for when my phone falls into the
> > toilet.
>
> I would have thought ⤵ would be sufficient.

Don't worry, a brand new foolproof method of defining emoji for
anything in the universe using Wikidata QIDs is coming to a phone near
you soon (http://www.unicode.org/L2/L2019/19082r-qid-emoji.pdf) ...
oh, there is no Wikidata QID for phone dropped in the toilet.

Andrew



Re: Encoding italic

2019-02-05 Thread Andrew West via Unicode
On Tue, 5 Feb 2019 at 15:34, wjgo_10...@btinternet.com via Unicode
 wrote:
>
> italic version of a glyph in plain text, including a suggestion of to
> which characters it could apply, would test whether such a proposal
> would be accepted to go into the Document Register for the Unicode
> Technical Committee to consider or just be deemed out of scope and
> rejected and not considered by the Unicode Technical Committee.

Just reminding you that "The initial character in a variation sequence
is never a nonspacing combining mark (gc=Mn) or a canonical
decomposable character" (The Unicode Standard 11.0 §23.4). This means
that a variation sequence cannot be defined for any precomposed
letters and diacritics, so for example you could not italicize the
word "fête" by simply adding VS14 after each letter because "ê" (in
NFC form) cannot act as the base for a variation sequence. You would
have to first convert any text to be italicized to NFD, then apply
VS14 to each non-combining character. This alone would make a VS
solution unacceptable in my opinion.
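
A rough Python sketch of what that would involve, using only the
standard unicodedata module. The use of VS14 for italics is the
hypothetical proposal under discussion, not anything defined by
Unicode, and the function names are assumptions for illustration.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14

def can_start_variation_sequence(ch):
    """Apply the quoted rule: not gc=Mn and not canonically decomposable."""
    if unicodedata.category(ch) == "Mn":
        return False
    decomposition = unicodedata.decomposition(ch)
    # Compatibility mappings start with a <tag>; canonical ones do not.
    is_canonical = bool(decomposition) and not decomposition.startswith("<")
    return not is_canonical

def italicize(text):
    """Hypothetical VS14 styling: convert to NFD, then add VS14 after each non-mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch if unicodedata.category(ch).startswith("M") else ch + VS14
        for ch in decomposed
    )

print(can_start_variation_sequence("\u00EA"))  # precomposed e with circumflex
print([f"{ord(c):04X}" for c in italicize("f\u00EAte")])
```

The selector ends up between the base letter and its combining
circumflex, which is the awkward result described above.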

Andrew



Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Andrew West via Unicode
On Fri, 1 Feb 2019 at 22:20, Doug Ewell via Unicode  wrote:
>
> Richard Wordingham wrote:
>
> > Language tagging is already available in Unicode, via the tag
> > characters in the deprecated plane.
>
> Plane 14 isn't deprecated -- that isn't a property of planes -- and the
> tag characters U+E0020 through U+E007E have been un-deprecated for use
> with emoji flags. Only U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG are
> deprecated.

Cancel Tag is not deprecated any longer either
(http://www.unicode.org/Public/UNIDATA/PropList.txt).

Andrew


Re: Encoding italic

2019-01-29 Thread Andrew West via Unicode
On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode
 wrote:
>
> This bold new concept was not mine.  When I tested it
> here, I was using the tag encoding recommended by the developer.

Congratulations James, you've successfully interchanged tag-styled
plain text over the internet with no adverse side effects. I copied
your email into BabelPad and your "bold" is shown bold (see attached
screenshot).

Andrew


Re: Encoding italic

2019-01-29 Thread Andrew West via Unicode
On Tue, 29 Jan 2019 at 10:25, Martin J. Dürst via Unicode
 wrote:
>
> The overall tag proposal had the desired effect: The original proposal
> to hijack some unused bytes in UTF-8 was defeated, and the tags themselves
> were not actually used and therefore could be deprecated.

And the tag characters (all except E0001) are now no longer
deprecated. As flag tag sequences are now a thing
(http://www.unicode.org/reports/tr51/#valid-emoji-tag-sequences), and
are widely supported (including on Twitter), your and PV's objections
to using tag characters for a plain text font styling protocol simply
because they are tag characters carry zero weight.
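
For reference, a flag tag sequence is just a base emoji followed by tag
characters and a terminating Cancel Tag. A minimal Python sketch of how
one is assembled (the helper names are assumptions for illustration;
"gbeng" is the subdivision code used for the flag of England):

```python
TAG_BASE = 0xE0000
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG, the base of the sequence
CANCEL_TAG = "\U000E007F"  # terminates the tag sequence

def to_tag_chars(ascii_text):
    """Map printable ASCII (0x20..0x7E) onto the tag characters E0020..E007E."""
    if not all(0x20 <= ord(c) <= 0x7E for c in ascii_text):
        raise ValueError("only printable ASCII has tag equivalents")
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)

def subdivision_flag(code):
    """Build base + tag characters + Cancel Tag for a subdivision code."""
    return BLACK_FLAG + to_tag_chars(code) + CANCEL_TAG

england = subdivision_flag("gbeng")
print([f"{ord(c):04X}" for c in england])
```

Software that does not recognize the sequence is expected to ignore the
tag characters and show only the black flag.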

Andrew



Re: Encoding italic (was: A last missing link)

2019-01-24 Thread Andrew West via Unicode
On Thu, 24 Jan 2019 at 15:42, James Kass  wrote:
>
> Here's a very polite reply from John Hudson from 2000,
> http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML024/1042.html
> ...and, over time, many of the replies to William Overington's colorful
> suggestions were less than polite.  But it was clear that colors were
> out-of-scope for a computer plain-text encoding standard.

Going off topic a little, I saw this tweet from Marijn van Putten
today which shows examples of Arabic script from early Quranic
manuscripts with phonetic information indicated by the use of red and
green dots:

https://twitter.com/PhDniX/status/1088171783461703682

I would be interested to know how those should be represented in Unicode.

Andrew


Re: Encoding italic (was: A last missing link)

2019-01-24 Thread Andrew West via Unicode
On Thu, 24 Jan 2019 at 13:59, James Kass via Unicode
 wrote:
>
> AFAICT, the emoji repertoire is vendor-driven, just as the pre-Unicode
> emoji sets were vendor-driven.  Pre-Unicode, if a vendor came up with
> cool ideas for new emoji they added new characters to the PUA.  Now that
> emoji are standardized, when vendors come up with new ideas they put
> them in the emoji ranges in order to preserve the standardization factor
> and ensure interoperability.  (That's probably over-simplified and there
> are bound to be other factors involved.)

I do not believe that recent (post-6.0) emoji additions are
vendor-driven. There is no formal vendor representation on the ESC,
and most ESC members do not work for vendors. Current emoji additions
are driven by ordinary users, who are actively encouraged by the UTC
to propose novel characters for encoding:

http://blog.unicode.org/2018/04/submissions-open-for-2020-emoji.html
http://blog.unicode.org/2016/09/emoji-deadline.html

The vendors happily lap up whatever emojis the UTC throws at them, but
they seem to have little interest in taking control of the emoji
process.

> We should no more expect the conventional Unicode character encoding
> model to apply to emoji than we should expect the old-fashioned text
> ranges to become vendor-driven.

Why should we not expect the conventional Unicode character encoding
model to apply to emoji?

We were told time and time again when emoji were first proposed that
they were required for encoding for interoperability with Japanese
telecoms whose usage had spilled over to the internet. At that time
there was no suggestion that encoding emoji was anything other than a
one-off solution to a specific problem with PUA usage by different
vendors, and I at least had no idea that emoji encoding would become a
constant stream with an annual quota of 60+ fast-tracked
user-suggested novelties. Maybe that was the hidden agenda, and I was
just naïve.

The ESC and UTC do an appallingly bad job at regulating emoji, and I
would like to see the Emoji Subcommittee disbanded, and decisions on
new emoji taken away from the UTC, and handed over to a consortium or
committee of vendors who would be given a dedicated vendor-use emoji
plane to play with (kinda like a PUA plane with pre-assigned
characters with algorithmic names [VENDOR-ASSIGNED EMOJI X] which
the vendors can then associate with glyphs as they see fit; and as
emoji seem to evolve over time they would be free to modify and
reassign glyphs as they like because the Unicode Standard would not
define the meaning or glyph for any characters in this plane).

Andrew



Re: Encoding italic (was: A last missing link)

2019-01-24 Thread Andrew West via Unicode
On Thu, 24 Jan 2019 at 02:10, Mark E. Shoulson via Unicode
 wrote:
>
> Unicode isn't here to encode cool new ideas that would be cool and
> new.  It's here for writing what people already do.

http://www.unicode.org/L2/L2018/18141r2-emoji-colors.pdf

"Add 14 colored emoji characters for decorative and/or descriptive
uses. These may be used to indicate that an emoji has a different
color."

No evidence has been provided that anybody is currently using colored
blobs for this purpose (in fact emoji users have explicitly rejected
this method for indicating emoji color:
http://www.unicode.org/L2/L2018/18208-white-wine-rgi.pdf), just an
assertion that it would be a good idea if emoji users could add a
colored swatch to an existing emoji to indicate what color they want
it to represent (note that the colored characters do not change the
color of the emoji they are attached to [before or after, depending
upon whether you are speaking French or English dialect of emoji],
they are just intended as a visual indication of what colour you wish
the emoji was).

This proposal to add 14 additional colored circles, squares and hearts
is a perfect example of a cool new idea for something that the authors
think would be really useful, but for which there is no evidence of
existing use. The UTC should have rejected it as out of scope, but we
all know that rules and procedures do not apply to the Emoji
Subcommittee, so in fact this cool new idea will be included in
Unicode 12 in March.

Andrew


Re: Encoding italic (was: A last missing link)

2019-01-20 Thread Andrew West via Unicode
On Sun, 20 Jan 2019 at 03:16, James Kass via Unicode
 wrote:
>
> Possible approaches include:
>
> 3 - Open/Close punctuation treatment
> Stateful.  Works on ranges.  Not currently supported in plain-text.
> Could be supported in applications which can take a text string URL and
> make it a clickable link.  Default appearance in nonsupporting apps may
> resemble existing plain-text italic kludges such as slashes.  The ASCII
> is already in the character string.

A possibility that I don't think has been mentioned so far would be to
use the existing tag characters (E0020..E007F). These are no longer
deprecated, and as they are used in emoji flag tag sequences, software
already needs to support them, and they should just be ignored by
software that does not support them. The advantages are that no new
characters need to be encoded, and they are flexible so that tag
sequences for start/end of italic, bold, fraktur, double-struck,
script, sans-serif styles could be defined. For example start and end
of italic styling could be defined as the tag sequences <i> and </i>
(E003C E0069 E003E and E003C E002F E0069 E003E).
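
A minimal Python sketch of the idea, assuming a purely hypothetical
styling protocol in which the ASCII strings "<i>" and "</i>" are mapped
into the tag character range. Nothing here is defined by Unicode, and
all function names are made up for illustration.

```python
TAG_BASE = 0xE0000

def to_tag_chars(ascii_text):
    """Map printable ASCII onto the corresponding tag characters."""
    return "".join(chr(TAG_BASE + ord(c)) for c in ascii_text)

def code_points(s):
    """Format a string as a list of hexadecimal code points."""
    return [f"{ord(c):04X}" for c in s]

START_ITALIC = to_tag_chars("<i>")
END_ITALIC = to_tag_chars("</i>")

def italic(text):
    """Wrap text in the hypothetical start/end italic tag sequences."""
    return START_ITALIC + text + END_ITALIC

def strip_tags(text):
    """What non-supporting software would effectively show: tags ignored."""
    return "".join(c for c in text if not 0xE0020 <= ord(c) <= 0xE007F)

styled = italic("word")
print(code_points(START_ITALIC), code_points(END_ITALIC))
print(strip_tags(styled))
```

The fallback is the attraction: with the tag characters ignored, the
plain text is unchanged.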

Andrew


Re: Private Use areas - Vertical Text

2018-08-29 Thread Andrew West via Unicode
On Wed, 29 Aug 2018 at 11:18,  wrote:
>
> I was using a change horizontal to vertical text feature in office, the
> PUA characters being from plane 15.

I tested with Word 2007, and normal PUA characters from my font were
displayed with vertical orientation in a vertical text box, but Plane
15 PUA characters were rotated.

I also tested with Word 2016, and both normal PUA characters and Plane
15 PUA characters were displayed with vertical orientation in a
vertical text box, as you want, although there were vertical spacing
issues with the Plane 15 PUA characters which suggest that the
vertical metrics tables (vhea and vmtx) in the font are not being
applied for Plane 15 characters (or it could be a problem with my
font).

Andrew


Re: Private Use areas - Vertical Text

2018-08-29 Thread Andrew West via Unicode
On Wed, 29 Aug 2018 at 05:07, via Unicode  wrote:
>
> Yes, as Richard says when CJK Zhuang text is displayed vertically whilst
> the Zhuang characters in Unicode remain upright, but those with PUA
> codepoints are rotated 90°.

John, you did not explain by what mechanism you were trying to display
vertical PUA Zhuang text.

I can display vertically-oriented PUA-encoded CJKVZ ideographs in
vertical layout in web pages using CSS, as demonstrated in this test
page:

http://www.babelstone.co.uk/Fonts/PUA_Vertical_Test.html

The PUA characters display with correct orientation under Windows 10
on the Edge, Chrome and Firefox browsers. The test page only fails
under IE, but we are not meant to use IE anymore anyway.

Andrew



Re: Private Use areas - Vertical Text

2018-08-29 Thread Andrew West via Unicode
On Tue, 28 Aug 2018 at 18:15, WORDINGHAM RICHARD via Unicode
 wrote:
>
> Unicode is doing what it can in this matter:
>
> (a) Zhuang PUA characters are being made individually obsolete.

Not by a nebulous entity called "Unicode", or even by the Unicode
Consortium per se, but by the hard work over many years by individual
experts such as John Knightley.

Andrew


Re: The Unicode Standard and ISO

2018-06-08 Thread Andrew West via Unicode
On 8 June 2018 at 13:01, Michael Everson via Unicode
 wrote:
>
> I wonder if Mark Davis will be quick to agree with me when I say that
> ISO/IEC 15897 has no use and should be withdrawn.

It was reviewed and confirmed in 2017, so the next systematic review
won't be until 2022. And as the standard is now under SC35, national
committees mirroring SC2 may well overlook (or be unable to provide
feedback to) the systematic review when it next comes around. I agree
that ISO/IEC 15897 has no use, and should be withdrawn.

Andrew



Re: Translating the standard

2018-03-12 Thread Andrew West via Unicode
On 12 March 2018 at 07:59, Marcel Schneider via Unicode
 wrote:
>
> Likewise ISO/IEC 10646 is available in a French version

No it is not, and never has been.

Why don't you check your facts before making misleading statements to this list?

> or at least, it should have an official French version like all ISO standards.

That is also blatantly untrue.

Only six of the publicly available ISO standards listed at
http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
have French versions, and one has a Russian version. You will notice
that there is no French version of ISO/IEC 10646.

Andrew


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Andrew West via Unicode
On 7 March 2018 at 22:18, Philippe Verdy via Unicode
 wrote:
>
> Additional note: the UCS will never be large enough to support the personal
> signatures of billions of Chinese people living today or born over the
> millennia, or just those to be born in the next century. There's a need to represent
> these names using composed strings. A reasonable compositing/ligaturing
> process can then present almost all of them !

CJK characters invented for writing personal names are extremely rare,
and do not constitute a significant fraction of CJK ideographs
proposed for encoding. The majority of unencoded modern-use characters
in China (that are not systematic simplified forms of existing encoded
characters) are used in place names or in Chinese dialects or for
writing non-Chinese languages such as Zhuang.

Andrew


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Andrew West via Unicode
On 28 February 2018 at 13:22, Christoph Päper via Unicode
 wrote:
>>
>> The 157 new Emoji are now available for adoption
>
> But Unicode 11.0 (which all new emojis but Pirate Flag and Infinity rely 
> upon) is not even in beta yet.

Don't even get me started on that!

>> There are approximately 7,000 living human languages,
>> but fewer than 100 of these languages are well-supported on computers,
>> mobile phones, and other devices. Adopt-a-character donations are used
>> to improve Unicode support for digitally disadvantaged languages, and to
>> help preserve the world’s linguistic heritage.
>
> Why is the announcement mentioning those numbers of languages at all?

I agree, the figures are meaningless and misleading (and intended to
mislead). I could list a hundred languages that are written with the
Latin script without pausing for breath. There are very very few
scripts in modern daily use that are not yet encoded in the UCS, but
letting out that secret will not help the Unicode Consortium to raise
money from character adoption.

The latest grant to Anshu from Character Adoption money is for three
historic scripts
(http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html).
If there were still so many digitally disadvantaged languages urgently
in need of script encoding then surely the Unicode Consortium would be
sponsoring those as a priority rather than historic scripts.

Andrew



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Andrew West via Unicode
On 28 February 2018 at 10:48, Martin J. Dürst via Unicode
 wrote:
>>
>>> The 157 new Emoji are now available for adoption, to help the Unicode
>>> Consortium’s work on digitally disadvantaged languages.
>>
>> I'm quite curious what is the relation between the new emojis and the
>> digitally disadvantaged languages. I see none.
>
> I think this was mentioned before on this list, in particular by Mark:
> The money collected from character adoptions (where emoji are a prominent
> target) is (mostly?) used to support work on not-yet-encoded (thus digitally
> disadvantaged) scripts. See e.g. the recent announcement at
> http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html.
>
> Regards,   Martin.

Over $250,000 has been raised from Unicode character adoptions to
date. I am curious as to how much of this money has been spent, and
would very much like to see annual accounts showing how much money has
been received, and how much has been disbursed to whom and for what.

Andrew



Re: UNICODE vehicle vanity registration?

2018-02-14 Thread Andrew West via Unicode
You can use ♥⭐➕ in California. Someone has U+1F913 🤓 (
https://www.instagram.com/p/BVYtIHensDu/)

Andrew


On 14 February 2018 at 16:24, Stephane Bortzmeyer via Unicode <
unicode@unicode.org> wrote:

> On Wed, Feb 14, 2018 at 09:44:06PM +0530,
>  Shriramana Sharma via Unicode  wrote
>  a message of 6 lines which said:
>
> > Given that in the US vanity vehicle registrations with arbitrary
> > alphanumeric sequences up to 7 characters are permitted (I am correct
> > I hope?), I wonder who (here?) owns the UNICODE registration?
>
> Won't work in New York, unfortunately
>
> https://dmv.ny.gov/learn-about-personalized-plates
>
> "A character is a letter (A-Z), number (0-9) or space. Each space
> counts as one character."
>
>


Re: 0027, 02BC, 2019, or a new character?

2018-01-25 Thread Andrew West via Unicode
On 23 January 2018 at 00:55, James Kass via Unicode  wrote:
>
> Regular American users simply don't type umlauts, period.

Not even the president of the Unicode Consortium when referring to
Christoph Päper:

http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf

Andrew



Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Andrew West via Unicode
On 19 January 2018 at 13:19, Michael Everson via Unicode
 wrote:
>
> I’d go talk with him :-) I published Alice in Kazakh. He might like that.

Damn, you'll have to reprint it with apostrophes now.

Andrew



Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Andrew West via Unicode
On 19 January 2018 at 09:16, Shriramana Sharma via Unicode
 wrote:
> Wow. Somebody really needs to convey this to the Kazakhs. Else a
> short-sighted decision would ruin their chances at native IDNs. Any Kazakhs
> on this list?

There's only one Kazakh who counts, and I'm pretty sure he's not on this list.

Andrew


Re: Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation)

2017-04-12 Thread Andrew West via Unicode
On 12 April 2017 at 15:58, Garth Wallace  wrote:
>
> So has that proposal been retracted now?

Once a proposal has been approved it cannot simply be retracted by the
submitter. On the SC2 side, the proposed characters have been subject
to ballot comments from national bodies, and no doubt they will be
discussed at the WG2 meeting in Hohhot later this year.

Andrew


Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation)

2017-04-12 Thread Andrew West via Unicode
On 12 April 2017 at 05:12, Garth Wallace via Unicode
 wrote:
>
> Later Xiangqi proposals by Andrew West focused on
> the circled ideographs and did not pursue new diagram drawing characters,
> and were eventually successful.

My Xiangqi proposal
(http://www.unicode.org/L2/L2016/16255-n4748-xiangqi.pdf) proposed a
minimal set of logical game pieces for Xiangqi/Janggi, regardless of
shape (circular or octagonal) or design (traditional characters,
simplified characters, cursive characters, or pictures) which I
consider a font design issue, and explicitly did not seek to encode
circled ideographs. My proposal was rejected, and a different proposal
by Michael Everson
(http://www.unicode.org/L2/L2016/16270-n4766-xiangqi.pdf) to encode
all circled ideographs and negative circled ideographs attested in
Xiangqi game diagrams was accepted instead.

The accepted proposal for circled ideographs is a glyph encoding model
not a character encoding model as for other game symbols (Chess,
Dominos, Mahjong, Playing Cards, etc.), and in my opinion it is a very
bad model for several reasons. It makes the interchange of Xiangqi
game data and game diagrams problematic; it hinders normal text
processing operations on Xiangqi game pieces (for example, to search
for a red horse piece you have to search for three different
characters); and in modern computer usage Xiangqi game pieces may not
be represented as simple circled ideographs, but may be coloured
designs showing characters or images. It is also very likely that
vendors will want to produce emoji versions of Xiangqi pieces, and
these could not reasonably be considered to be glyph variants of
circled ideographs. There has been some negative feedback on the
circled ideographs model on the internet, and I believe that Michael
has now been convinced that this model is wrong, and should be
replaced by a model using logical game pieces.

Andrew