RE: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-21 Thread Peter Constable via Unicode
I suspect if you look at the JPEG and MPEG standards you'll find there is a 
normative reference to Unicode or ISO/IEC 10646. Same for standards specifying 
C, ECMAScript and other languages in which modern software is written. So, 
arguably the statement isn't much of a stretch at all.


Peter

From: Unicode  On Behalf Of Costello, Roger L. via 
Unicode
Sent: Tuesday, November 19, 2019 11:00 AM
To: unicode@unicode.org
Subject: Is the Unicode Standard "The foundation for all modern software and 
communications around the world"?

Hi Folks,

Today I received an email from the Unicode organization. The email said this: 
(italics and yellow highlighting are mine)

The Unicode Standard is the foundation for all modern software and 
communications around the world, including all modern operating systems, 
browsers, laptops, and smart phones-plus the Internet and Web (URLs, HTML, XML, 
CSS, JSON, etc.).

That is a remarkable statement! But is it entirely true? Isn't it assuming that 
everything is text? What about binary information such as JPEG, GIF, MPEG, WAV; 
those are pretty core items to the Web, right? The Unicode Standard is silent 
about them, right? Isn't the above quote a bit misleading?

/Roger


RE: New Public Review on QID emoji

2019-11-09 Thread Peter Constable via Unicode
> Yet if QID emoji are implemented by Unicode Inc. without also being 
> implemented by ISO/IEC 10646 then that could lead to future problems,

Neither Unicode Inc. nor ISO/IEC 10646 would _implement_ QID emoji. Unicode 
would provide a specification for QID emoji that software vendors could 
implement, while ISO/IEC 10646 would not define that specification. As Ken 
mentions, there are already many emoji in use interoperably based on 
specifications provided by Unicode but not by ISO/IEC 10646.

Ken's other point is also worth stressing: there are inter-op issues inherent 
to the architecture of the QID mechanism.

If adopted as a Unicode spec, any software vendor could choose to implement 
anything they might want as a QID emoji sequence, and there would be no 
guarantee that any other software would interoperate with that beyond a very 
minimum: if other software supported the QID mechanism, it would recognize the 
sequence as a QID sequence, and might handle it as a unit for segmentation 
purposes (selection, line breaking), but only render the base emoji character 
in a legible way and nothing more.
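
To illustrate the segmentation point, here is a minimal Python sketch, assuming 
the QID mechanism builds on emoji tag sequences the way existing flag tag 
sequences do (the Scotland flag is used as a stand-in; the exact QID syntax is 
what the PRI doc defines):

    def split_tag_sequences(text):
        """Group a base character plus any trailing TAG characters
        (U+E0020..U+E007E, ending with U+E007F CANCEL TAG) into one segment."""
        segments, i = [], 0
        while i < len(text):
            j = i + 1
            while j < len(text) and 0xE0020 <= ord(text[j]) <= 0xE007F:
                j += 1
            segments.append(text[i:j])
            i = j
        return segments

    # The Scotland flag is an existing RGI emoji tag sequence:
    # U+1F3F4 WAVING BLACK FLAG + TAG letters "gbsct" + CANCEL TAG.
    scotland = "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"
    for seg in split_tag_sequences("a" + scotland + "b"):
        # Software that knows the tag-sequence syntax but not this particular
        # sequence can still treat it as one unit and fall back to the base emoji.
        print(len(seg), "code points; fallback glyph:", seg[0])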

Moreover, it's possible that two vendors might implement the same QID sequence 
with significantly different appearances, enough to connote different meanings. 
(Think of issues from recent years in which the major vendors had significantly 
different appearances for the same emoji character. Then extend that 
possibility to every QID sequence that any vendor implements.) Also, it's 
entirely possible that two different vendors might implement _different_ QID 
sequences with similar appearances and semantic intent. The PRI doc mentions 
the possibility of a registry for QID sequences; a key benefit of a registry is 
that it may mitigate these non-interop risks. But the current proposal 
does not in fact provide any mitigations for these issues other than the 
possibility that a QID sequence might at some point become an RGI sequence.


Peter

-Original Message-
From: Unicode  On Behalf Of Ken Whistler via 
Unicode
Sent: Wednesday, October 30, 2019 12:19 PM
To: wjgo_10...@btinternet.com
Cc: unicode@unicode.org
Subject: Re: New Public Review on QID emoji


On 10/30/2019 10:41 AM, wjgo_10...@btinternet.com via Unicode wrote:
>
> At present I have a question to which I cannot find the answer.
>
> Is the QID emoji format, if approved by the Unicode Technical 
> Committee going to be sent to the ISO/IEC 10646 committee for 
> consideration by that committee?
No.
>
> As the QID emoji format is in a Unicode Technical Standard and does 
> not include the encoding of any new _atomic_ characters, I am 
> concerned that the answer to the above question may well be along the 
> lines of "No" maybe with some reasoning as to why not.
As you surmised.
>
> Yet will a QID emoji essentially be _de facto_ a character even if not 
> _de jure_ a character?
That distinction is effectively meaningless. There are any number of entities 
that end users perceive as "characters", which are not represented by a single 
code point in the Unicode Standard (or 10646) -- and this has been the case now 
for decades.
>
>
> Yet if QID emoji are implemented by Unicode Inc. without also being 
> implemented by ISO/IEC 10646 then that could lead to future problems, 
> notwithstanding any _de jure_ situation that QID emoji are not 
> characters, because they will be much more than Private Use characters 
> yet less than characters that are in ISO/IEC 10646.

What you are missing is that *many* emoji are already represented by sequences 
of characters. See emoji modifier sequences, emoji flag sequences, emoji ZWJ 
sequences. *None* of those are specified in 10646; they haven't been for years 
now, and never will be. And yet, there is no de jure standardization crisis here, or 
any interoperability issue for emoji arising from that situation.
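
For concreteness, the three kinds of sequences Ken mentions, shown as code 
points (a small Python sketch; the man-astronaut ZWJ sequence is the same one 
listed in emoji-zwj-sequences.txt):

    examples = {
        "emoji modifier sequence": "\U0001F44D\U0001F3FD",        # THUMBS UP SIGN + skin-tone modifier
        "emoji flag sequence":     "\U0001F1EB\U0001F1F7",        # REGIONAL INDICATOR letters F + R
        "emoji ZWJ sequence":      "\U0001F468\u200D\U0001F680",  # MAN + ZWJ + ROCKET = man astronaut
    }
    for kind, s in examples.items():
        print(kind, [f"U+{ord(c):04X}" for c in s])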

>
> I am in favour of the encoding of the QID emoji mechanism and its 
> practical application. However I wonder about what are the 
> consequences for interoperability and communication if QID emoji 
> become used - maybe quite widely - and yet the tag sequences are not 
> discernable in meaning from ISO/IEC 10646 or any related ISO/IEC 
> documents.

There may well be interoperability concerns specifically for the QID emoji 
mechanism, but that would be an issue pertaining to the architecture of that 
mechanism specifically. It isn't anything to do with the relationship between 
the Unicode Standard (and UTS #51) and ISO/IEC 10646.

--Ken





RE: The native name of Tai Viet script and language(s)

2019-08-26 Thread Peter Constable via Unicode
" As the proposal for TaiViet script to the Unicode is still on

the progress, we use the Private Use Area for TaiViet

characters (U+F000..U+F07E). "



Er... The script has been in Unicode for about 10 years, since Unicode 5.2.



The block description in 16.8 of Unicode 12 provides useful info:



https://www.unicode.org/versions/Unicode12.0.0/ch16.pdf
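
As a quick way to check whether a sample string still uses the old PUA 
encoding rather than the encoded Tai Viet block (U+AA80..U+AADF), a small 
Python sketch (the sample code points here are arbitrary picks, purely for 
illustration):

    import unicodedata

    def classify(ch):
        cp = ord(ch)
        if 0xAA80 <= cp <= 0xAADF:
            return "Tai Viet block: " + unicodedata.name(ch, "?")
        if 0xE000 <= cp <= 0xF8FF:
            return "BMP Private Use Area (legacy, pre-Unicode 5.2 encoding)"
        return unicodedata.name(ch, "unknown")

    for ch in "\uAA8F\uAABF\uF010":
        print(f"U+{ord(ch):04X}:", classify(ch))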



What may be helpful to understand is that "Tai" refers, at one level, to an 
entire language family that encompasses languages spoken from southern Thailand 
in the south to central China in the north, and from Vietnam in the east to 
eastern India in the west. "Tai" can also be used at a different level as the 
name for individual languages in that family (with either an un-aspirated /t/ 
as in /tai/, or an aspirated /tʰ/ as in /tʰai/ — and in China /t/ is usually 
written with “d”), though usually a distinguishing qualifier is added to the 
name, as in “Tai Dam” or “Dehong Dai”. Thai, aka Siamese, is a particular 
exception.



So, Tai Viet is used for writing various Tai languages in Vietnam and Laos, and 
reportedly also in Central Thailand. These are all distinct languages. IIRC, 
the script name “Tai Viet” was coined because of predominant use in Vietnam, 
not because that’s what any user community historically would call the script.



The script _is_ related to Thai script, but I’m not sure I would say it has 
“the same origin as that of Thai language/script used in Thailand”, as that is 
too simplistic a view of the historic connections: it suggests that Thai script 
and Tai Viet developed directly from the same precursor, which isn’t really 
accurate.



And the mentions of language reflect misunderstanding.



“TaiViet refers to the Tai language used by Tai people in Vietnam…”



No, it does not refer to a language at all. And “_the_ Tai language… in 
Vietnam” is misunderstanding the language situation: of over 100 languages 
spoken in Vietnam, there are 32 languages from the Tai-Kadai language family, 
and 12 from the Southwestern Tai branch, which is the branch that includes Thai 
(Siamese). To say “the language [has] the same origin as that of Thai…” isn’t 
correct in that there isn’t _one_ language involved. It would be accurate to 
say that the languages written with the Tai Viet script are closely-related to 
Thai (in the same sense that French, Spanish, Italian, etc. are closely-related 
to one another).



For more on the Southwestern Tai languages, see 
https://www.ethnologue.com/subgroups/southwestern.





Hope that’s of some help.





Peter





-Original Message-
From: Unicode  On Behalf Of Eli Zaretskii via 
Unicode
Sent: Thursday, August 22, 2019 6:46 AM
To: unicode@unicode.org
Subject: The native name of Tai Viet script and language(s)



Could someone "in the know" please help me make the Tai Viet script 
documentation in Emacs accurate?



The current short description we have is in the file lisp/language/tai-viet.el 
in the Emacs source tree.  You can see it

here:



  
http://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/language/tai-viet.el



My concern is with the text under "sample-text" (line 40) and in the 
documentation string following that (starting on line 48), which states the 
name of the script and the language expressed with Tai Viet characters.



However, that text is from long ago, before Unicode had a Tai Viet block, so it 
still uses at least one PUA character, which I think is incorrect.  In 
addition, I didn't find any place where I could copy/paste the current accurate 
name of the script and at least one of the languages that use that script.



Could someone please help me set this text straight?  Bonus points for also 
telling how to say "hello" (or any similar greeting) in one of the Tai Viet 
languages, so that we could add that to the etc/HELLO file.  (I think the 
sample-text attempts to include such a greeting, but again, I'm not sure it is 
correct.)



Thanks in advance.


RE: MODIFIER LETTER SMALL GREEK PHI in Calibri is wrong.

2019-04-17 Thread Peter Constable via Unicode
Thanks for reporting. The team responsible for the font has recorded a bug 
entry for this issue and will be working on a fix.

From: Unicode  On Behalf Of Oren Watson via Unicode
Sent: Wednesday, April 17, 2019 2:05 PM
To: unicode Unicode Discussion 
Subject: MODIFIER LETTER SMALL GREEK PHI in Calibri is wrong.

Would anyone know where to report this?
In the widely used Calibri typeface included with MS Office, the glyph shown 
for U+1D60 MODIFIER LETTER SMALL GREEK PHI, actually depicts a letter psi, not 
a phi.



RE: Private Use areas

2018-08-27 Thread Peter Constable via Unicode
This was meant to go to the list.

From: Peter Constable
Sent: Monday, August 27, 2018 12:33 PM
To: wjgo_10...@btinternet.com; jameskass...@gmail.com; 
richard.wording...@ntlworld.com; m...@kli.org; beckie...@gmail.com; 
verd...@wanadoo.fr
Subject: RE: Private Use areas

That sounds like a non-conformant use of characters in the U+24xx block.


Peter

From: Unicode <unicode-boun...@unicode.org> On Behalf Of William_J_G Overington via Unicode
Sent: Monday, August 27, 2018 2:00 AM
To: jameskass...@gmail.com; richard.wording...@ntlworld.com; m...@kli.org; 
beckie...@gmail.com; verd...@wanadoo.fr
Cc: unicode@unicode.org
Subject: Re: Private Use areas

Hi

How about the following method.

In a text file that contains text that uses Private Use Area characters, start 
the file with a sequence of Enclosed Alphanumeric characters from regular 
Unicode, that sequence containing the metadata relating to those Private Use 
Area characters as used in their present context.

http://www.unicode.org/charts/PDF/U2460.pdf

Use circled digits U+24EA, U+2460 .. U+2469, and Circled Latin letters U+24B6 
.. U+24E9.

Use U+2473 as if it were a circled space. The use of 20 to mean a space often 
occurs in web addresses. I know that there it is hexadecimal and here it is 
decimal but it has the same look of being an encoded space and so that is why I 
am suggesting using it.

Start the sequence with PUAINFO encoded using seven circled Latin letters and 
any character other than a carriage return or a line feed shows that the 
sequence has ended. The use of PUAINFO encoded using seven circled Latin 
letters at the start of the sequence is so that text using enclosed 
alphanumeric characters for another purpose would not become disrupted.

Then a suitable software application can read the text file and then, either 
automatically or after the clicking of a button, extract metadata information 
from the sequence of enclosed alphanumeric characters and not display the 
sequence of enclosed alphanumeric characters.

Maybe other circled numbers in the range 10 through to 19 would have special 
meanings.

This method would keep everything within plane zero.
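
A minimal Python sketch of the encoding step as described above (purely 
illustrative; the "MYFONT V1" payload is a made-up example of metadata):

    def to_circled(s):
        """Map ASCII letters, digits and spaces to enclosed alphanumerics,
        following the scheme described above."""
        out = []
        for ch in s:
            if ch == "0":
                out.append("\u24EA")                          # CIRCLED DIGIT ZERO
            elif "1" <= ch <= "9":
                out.append(chr(0x2460 + ord(ch) - ord("1")))  # CIRCLED DIGIT ONE ..
            elif "A" <= ch <= "Z":
                out.append(chr(0x24B6 + ord(ch) - ord("A")))  # CIRCLED LATIN CAPITAL LETTER A ..
            elif "a" <= ch <= "z":
                out.append(chr(0x24D0 + ord(ch) - ord("a")))  # CIRCLED LATIN SMALL LETTER A ..
            elif ch == " ":
                out.append("\u2473")                          # CIRCLED NUMBER TWENTY, reused as a space
            else:
                raise ValueError(f"not representable in this scheme: {ch!r}")
        return "".join(out)

    # Hypothetical header: the PUAINFO tag word followed by free-form metadata.
    print(to_circled("PUAINFO") + to_circled("MYFONT V1"))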

William Overington

Monday 27 August 2018





Original message
From : unicode@unicode.org
Date : 2018/08/21 - 23:23 (GMTDT)
To : d...@ewellic.org
Cc : unicode@unicode.org
Subject : Re: Private Use areas
On Tue, Aug 21, 2018 at 3:02 PM Doug Ewell via Unicode 
<unicode@unicode.org> wrote:
Ken Whistler wrote:

> The way forward for folks who want to do this kind thing is:
>
> 1. Define a *protocol* for reliable interchange of custom character
> property information about PUA code points.

I've often thought that would be a great idea. You can't get to steps 2
and 3 without step 1. I'd gladly participate in such a project.

As would I.










RE: Private Use areas (was: Re: Thoughts on working with the Emoji Subcommittee (was ...))

2018-08-27 Thread Peter Constable via Unicode
Layout engines that support CJK vertical layout do not rely on the 'vert' 
feature to rotate glyphs for CJK ideographs, but rather rotate the glyph 90° 
and switch to using vertical glyph metrics. The 'vert' feature is used to 
substitute vertical alternate glyphs as needed, such as for punctuation that 
isn't automatically rotated (and would probably need a differently-positioned 
alternate in any case).

Cf. UAX 50.


Peter

-Original Message-
From: Unicode  On Behalf Of Richard Wordingham via 
Unicode
Sent: Tuesday, August 21, 2018 3:02 AM
To: unicode@unicode.org
Subject: Re: Private Use areas (was: Re: Thoughts on working with the Emoji 
Subcommittee (was ...))

On Tue, 21 Aug 2018 08:53:18 +0800
via Unicode  wrote:

> On 2018-08-21 08:04, Mark E. Shoulson via Unicode wrote:

> > Still, maybe it
> > doesn't really matter much: your special-purpose font can treat any 
> > codepoint any way it likes, right?

> Not all properties come from the font. For example a Zhuang character 
> PUA font, which supplements CJK ideographs, does not rotate characters 
> 90 degrees, when change from RTL to vertical display of text.

Isn't that supposed to be treated by an OpenType feature such as 'vert'?  Or 
does the rendering stack get in the way?

However, one might need reflowing text to be about 40% WJ.

Richard.



RE: Unicode 11 Georgian uppercase vs. fonts

2018-07-27 Thread Peter Constable via Unicode
Just an observation on these issues: When the Mtavruli proposal was first 
presented to UTC, several UTC members voiced strong reservation because of the 
kind of issues mentioned for case mapping, and in particular on database 
indexing and querying. Several months later, various UTC members participated 
in a teleconference with representation from Georgian institutions, including 
IT people from Bank of Georgia and TBC Bank. During that meeting, the 
representatives of the Georgian enterprises (i) demonstrated an understanding 
of those issues and the implications, (ii) gave an indication of support from 
those enterprises and a commitment to update their applications as may be 
required, and (iii) gave indication of intent to develop a plan of action for 
preparing their institutions for this change as well as communicating that 
within Georgian industry and society. It was only after that did UTC feel it 
was viable to proceed with encoding Mtavruli characters.


Peter



From: Unicode  On Behalf Of Asmus Freytag via 
Unicode
Sent: Friday, July 27, 2018 7:01 AM
To: unicode@unicode.org
Subject: Re: Unicode 11 Georgian uppercase vs. fonts

On 7/27/2018 3:42 AM, Michael Everson via Unicode wrote:

Yes and it explains clearly that “effectively caseless Georgian” is incorrect. 
Georgian has case. Georgian uses case differently from other scripts. This is 
an orthographic distinction, not a structural one. In fact as it is also stated 
in the proposal, there are 19th-century texts which do titlecase. It’s just 
that that orthography is no longer in use and that behaviour no longer 
desirable.

"Georgian uses case differently from other scripts"

That's one of the key issues here for developers (and users) of libraries, 
because it means that any implicit assumptions about the applicability of a 
certain case transform are now broken.

This goes beyond whether fonts are actually installed now or at the end of some 
transition period, or ever: if functions like ToUpper, which used to have no 
effect on Georgian before, suddenly do - in ways that the users of the script 
do not expect, then your application is broken, from one day to the next.
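
A concrete illustration of that break (a Python sketch; Python 3.7 and later 
ship Unicode 11 case mappings, while earlier versions leave Georgian unchanged):

    s = "საქართველო"                      # "sakartvelo", in Mkhedruli
    print(s.upper())                        # all Mtavruli on Unicode 11 data; unchanged before
    assert "\u10D0".upper() in ("\u10D0",   # pre-Unicode 11: no uppercase mapping
                                "\u1C90")   # Unicode 11+: GEORGIAN MTAVRULI CAPITAL LETTER AN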

The current situation prior to the change is perhaps best characterized by 
saying that there was support for some locale differences in the way certain 
characters were mapped, but not in whether or not to do a given mapping at all.

If, as has been suggested, the use of case in Georgian is more similar to that 
of smallcaps in other scripts, then, instead of ToUpper doing a case 
transformation for Georgian, what would be needed is something like a 
"ToSmallCaps" function (a better name is needed here, because the Georgian 
letters aren't actually "small caps").

That way, the existing "ToUpper" could retain its implicit semantic of 
"uppercase transformation in those scripts where such transformations are used 
in a common way".

This would solve 1/2 of the problem, which is to prevent uppercasing where 
users of Georgian do not expect it. However, it does not work in plain text for 
the other scripts, because there, small caps are not encoded, so there's no 
plain-text solution.

To get back to Markus' original question on how to handle this for ICU: it 
seems more and more that Georgian should be exempted from standard library 
functions and that a new function needs to be added that just transforms 
Georgian and leaves all other scripts alone (or one that takes a language/locale 
parameter).

A./


RE: Unicode 11 Georgian uppercase vs. fonts

2018-07-27 Thread Peter Constable via Unicode
Alex:

Quoting you from two separate messages:

> Many Georgian scientists working with script and language are not fans of 
> "uppercase" font styles.

>With all my respect, N2608R2 is right and N4712 is wrong about case in 
>Georgian.

Can you comment, then, on N4776, in which the Georgian Minister of Education 
and Science appears to be referring to Mtavruli as “Georgian capital letters”?


Thanks
Peter


RE: Unicode 11 Georgian uppercase vs. fonts

2018-07-20 Thread Peter Constable via Unicode
IMO, the correct answer is 2, except that “all common fonts” is more sweeping 
than necessary: it’s sufficient for the fonts used for fallback in platforms 
and browsers, and the related fallback logic, to get updated. Of course, that 
takes some time, and it’s not even two months since Unicode 11 was released. 
The Georgian community understood that it would take time to get 
implementations in place, and that they would need to take measures to smooth 
over that transition — which can include having Web sites for Georgian 
businesses and institutions using fonts to match the requirements of the 
content.


Peter

From: Unicore  On Behalf Of Markus Scherer via 
Unicore
Sent: Wednesday, July 18, 2018 3:05 PM
To: unicore UnicoRe Discussion 
Cc: mark 
Subject: Unicode 11 Georgian uppercase vs. fonts


Dear fellow Unicoders,



We’ve run into some significant problems with the Georgian capital letters 
added in Unicode 11. If you have run into them yourselves, or have feedback on 
our brainstormed solutions below, we’d love to hear your thoughts.


Here's the problem. The vast majority of Georgian fonts do not yet have the new 
uppercase characters. So when any system uses case mapping to uppercase text 
(e.g. browsers interpreting CSS’s text-transform: capitalize), then the users 
of Georgian will see boxes (“tofu”) if the font they are using does not have 
the glyphs.


For example, a program constructs a web page with buttons. It uses a CSS style 
to uppercase text in buttons, as a house style. Unless the user has a very 
up-to-date font, they see tofu (boxes). If a server does backend rendering, its 
font has to be very up-to-date. We also saw this problem in a program that was 
doing titlecasing, but on the first character it used the uppercase mappings 
rather than titlecase mappings. Not the right thing to do, of course, but code 
that accidentally works (most of the time) doesn't get fixed if nobody reports 
a bug about it.


All of these will result in bad bugs in the UI, in software that formerly 
worked fine.


We brainstormed some options to fix this:


  1.  Get all call sites to change their code to not uppercase Georgian (and 
fix titlecasing to use the titlecase mappings, not the uppercase mappings). 
Since we have no control over call sites and release cycles of affected 
software, this would not help Georgian users for a long time, if ever. We’d 
eventually want to retract these changes, creating even more work.
  2.  Change all common fonts with Georgian characters to add the U11.0 ones. 
This should eventually happen but would probably take a couple of years at 
least, which does not help users in the short term.
  3.  Hack font CMAPs to just map the new characters to the glyphs of the old 
ones (see the sketch after this list). This works, but only when a programmer 
can control the fonts used, such as with server-side rendering or downloadable 
fonts.
  4.  Remove the uppercase mappings for Georgian, until the fonts catch up.

 *   Would at least have to be done in all browsers, otherwise web apps 
will still break for Georgian.
 *   A broader alternative is to do it in ICU. Because that is used by the 
majority of the browser implementations, it would solve the short-term problem 
for the browsers — and many other programs. Drawback: Non-conformant, and 
uppercasing will be inconsistent depending on who has which variant of ICU 
(with vs. without hack, on top of: with Unicode 11 vs. before Unicode 11).

*   One precedent is that in CLDR we deliberately hold back from using 
new currency characters until the font support is sufficiently widespread. 
(Wishing we'd held back the uppercase mappings in Unicode 11.0 too!)
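
For option 3, a minimal fontTools sketch of the cmap hack (the font path and 
output name are hypothetical; U+10FD..U+10FF map to U+1CBD..U+1CBF by the same 
offset):

    from fontTools.ttLib import TTFont

    font = TTFont("SomeGeorgianFont.ttf")   # hypothetical font file
    offset = 0x1C90 - 0x10D0                # distance from Mkhedruli to Mtavruli
    mapping = {cp: cp + offset
               for cp in list(range(0x10D0, 0x10FB)) + [0x10FD, 0x10FE, 0x10FF]}

    for table in font["cmap"].tables:
        if not table.isUnicode():
            continue
        for mkh, mta in mapping.items():
            if mkh in table.cmap and mta not in table.cmap:
                table.cmap[mta] = table.cmap[mkh]   # reuse the Mkhedruli glyph for Mtavruli
    font.save("SomeGeorgianFont-cmap-hack.ttf")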


Mark & Markus


RE: The Unicode Standard and ISO

2018-06-10 Thread Peter Constable via Unicode
> ... For another part it [sync with ISO/IEC 15897] failed because the 
> Consortium refused to cooperate, despite of
repeated proposals for a merger of both instances.

First, ISO/IEC 15897 is built on a data-format specification, ISO/IEC TR 14652, 
that never achieved the support needed to become an international standard, and 
has since been withdrawn. (TRs cannot remain TRs forever.) Now, JTC1/SC35 began 
work four or five years ago to create data-format specification for this, 
Approved Work Item 30112. From the outset, Unicode and the US national body 
tried repeatedly to engage with SC35 and SC35/WG5, informing them of UTS #35 
(LDML) and CLDR, but were ignored. SC35 didn’t appear to be interested a pet 
project and not in what is actually being used in industry. After several 
failed attempts, Unicode and the USNB gave up trying.

So, any suggestion that Unicode has failed to cooperate or is dropping the 
ball with regard to locale data and ISO is simply uninformed.


Peter


From: Unicode  On Behalf Of Mark Davis ?? via 
Unicode
Sent: Thursday, June 7, 2018 6:20 AM
To: Marcel Schneider 
Cc: UnicodeMailing 
Subject: Re: The Unicode Standard and ISO

A few facts.

> ... Consortium refused till now to synchronize UCA and ISO/IEC 14651.

ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could 
speak to the synchronization level in more detail, but the above statement is 
inaccurate.

> ... For another part it [sync with ISO/IEC 15897] failed because the 
> Consortium refused to cooperate, despite of
repeated proposals for a merger of both instances.

I recall no serious proposals for that.

(And in any event — very unlike the synchrony with 10646 and 14651 — ISO 15897 
brought no value to the table. Certainly nothing to outweigh the considerable 
costs of maintaining synchrony. Completely inadequate structure for modern 
system requirement, no particular industry support, and scant content: see 
Wikipedia for "The registry has not been updated since December 2001".)

Mark

On Thu, Jun 7, 2018 at 1:25 PM, Marcel Schneider via Unicode 
<unicode@unicode.org> wrote:
On Thu, 17 May 2018 09:43:28 -0700, Asmus Freytag via Unicode wrote:
>
> On 5/17/2018 8:08 AM, Martinho Fernandes via Unicode wrote:
> > Hello,
> >
> > There are several mentions of synchronization with related standards in
> > unicode.org, e.g. in https://www.unicode.org/versions/index.html, and
> > https://www.unicode.org/faq/unicode_iso.html. However, all such mentions
> > never mention anything other than ISO 10646.
>
> Because that is the standard for which there is an explicit understanding by 
> all involved
> relating to synchronization. There have been occasionally some challenging 
> differences
> in the process and procedures, but generally the synchronization is being 
> maintained,
> something that's helped by the fact that so many people are active in both 
> arenas.

Perhaps the cause-effect relationship is somewhat unclear. I think that many 
people being
active in both arenas is helped by the fact that there is a strong will to 
maintain synching.

If there were similar policies notably for ISO/IEC 14651 (collation) and 
ISO/IEC 15897
(locale data), ISO/IEC 10646 would be far from standing alone in the field of
Unicode-ISO/IEC cooperation.

>
> There are really no other standards where the same is true to the same extent.
> >
> > I was wondering which ISO standards other than ISO 10646 specify the
> > same things as the Unicode Standard, and of those, which ones are
> > actively kept in sync. This would be of importance for standardization
> > of Unicode facilities in the C++ language (ISO 14882), as reference to
> > ISO standards is generally preferred in ISO standards.
> >
> One of the areas the Unicode Standard differs from ISO 10646 is that its 
> conception
> of a character's identity implicitly contains that character's properties - 
> and those are
> standardized as well and alongside of just name and serial number.

This is probably why, to date, ISO/IEC 10646 features character

RE: The Unicode Standard and ISO

2018-05-17 Thread Peter Constable via Unicode
ISO character encoding standards are primarily focused on identifying a 
repertoire of character elements and their code point assignments in some 
encoding form. ISO developed other, legacy character-encoding standards in the 
past, but has not done so for over 20 years. All of those legacy standards can 
be mapped as a bijection to ISO 10646; in regard to character repertoires, they 
are all proper subsets of ISO 10646. 
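
For example (a small Python sketch), ISO/IEC 8859-1 round-trips losslessly 
through the first 256 code points of ISO 10646 / Unicode:

    latin1 = bytes(range(0xA0, 0x100))          # the Latin-1 supplement range
    text = latin1.decode("iso-8859-1")          # each byte maps to the same code point value
    assert [ord(c) for c in text] == list(latin1)
    assert text.encode("iso-8859-1") == latin1  # bijective, lossless round trip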

Hence, from an ISO perspective, ISO 10646 is the only standard for which 
on-going synchronization with Unicode is needed or relevant.


Peter

-Original Message-
From: Unicode  On Behalf Of Martinho Fernandes via 
Unicode
Sent: Thursday, May 17, 2018 8:08 AM
To: unicode@unicode.org
Subject: The Unicode Standard and ISO

Hello,

There are several mentions of synchronization with related standards in 
unicode.org, e.g. in https://www.unicode.org/versions/index.html, and 
https://www.unicode.org/faq/unicode_iso.html. However, all such mentions never 
mention anything other than ISO 10646.

I was wondering which ISO standards other than ISO 10646 specify the same 
things as the Unicode Standard, and of those, which ones are actively kept in 
sync. This would be of importance for standardization of Unicode facilities in 
the C++ language (ISO 14882), as reference to ISO standards is generally 
preferred in ISO standards.

--
Martinho





Thai phintuu + sara u(u)

2018-04-02 Thread Peter Constable via Unicode
Does anyone know of any attested cases in Thai script of a phintuu appearing 
together with either sara u or sara uu, _and_ with the phintuu positioned below 
the sara u(u)?


Thanks
Peter


RE: Unicode Digest, Vol 50, Issue 20

2018-02-27 Thread Peter Constable via Unicode
The OpenType spec doesn’t in any way suggest that the bits be used that 
way. It’s impossible to assert that there are no applications out there that do 
that, but I wouldn’t expect there to be many widely-used apps that do that 
today.

On the other hand, something that the bits might affect are behaviours like 
font selection / font binding. For example, if you paste plain text into a 
rich-text app, it must select a default font for that text, since it’s a 
rich-text app. Now, an obvious choice would be to use the font applied to the 
characters on either side of the insertion point. But if it turned out that 
that font didn’t support the text being pasted, that would create a rendering 
problem; so the app probably wants to avoid that. An app just might use these 
bits as a heuristic to decide whether the current font can support the text or 
not.
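
For anyone wanting to inspect those bits, a small fontTools sketch (the font 
path is hypothetical; the ulCodePageRange fields exist only in OS/2 version 1 
and later):

    from fontTools.ttLib import TTFont

    font = TTFont("SomeFont.ttf")               # hypothetical font file
    os2 = font["OS/2"]
    ranges = (os2.ulUnicodeRange1, os2.ulUnicodeRange2,
              os2.ulUnicodeRange3, os2.ulUnicodeRange4)
    bits = [i * 32 + b for i, r in enumerate(ranges) for b in range(32) if r & (1 << b)]
    print("ulUnicodeRange bits set:", bits)
    print("ulCodePageRange1:", hex(getattr(os2, "ulCodePageRange1", 0)))
    print("ulCodePageRange2:", hex(getattr(os2, "ulCodePageRange2", 0)))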

I say that Unicode-range bits probably wouldn’t affect rendering in current 
apps, though that wasn’t necessarily the case in the past. Word 97 was one of 
the very first mainstream apps to support Unicode, but it was limited in the 
scripts that were actually supported. Word 2000 was still early in terms of 
mainstream Unicode support, and still had limitations. I recall working on font 
projects for Ethiopic and Yi scripts (with SIL at the time) and needing to set 
Unicode range or codepage bits in order to get text working in Word using our 
fonts. One particular issue was a font-binding issue: Word would lump the Yi 
characters in with CJK (they’re not Western, and they’re not the few complex 
scripts that are supported, so assume they’re CJK), but wouldn’t allow the font 
to be applied until I set bits to make Word think the font supports CJK. But 
then with the Ethiopic font, there was a different effect — a rendering issue — 
that became apparent: Ethiopic characters have many different widths, but Word 
ignored the actual glyph metrics and displayed every glyph with the same width 
(the apparent assumption being that the characters are all CJK and all have the 
same width). Again, bits had to be set to make it observe the actual glyph 
metrics. IIRC, in one case I needed to set the Shift-JIS code page bit, and in 
the other case, to set a bit for one of the kana blocks.

But that was many years ago now. I can’t think of seeing Unicode-range bits 
affecting rendering in a long time.


Peter


From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Neil Patel via 
Unicode
Sent: Tuesday, February 27, 2018 8:46 AM
To: unicode@unicode.org; unicode-requ...@unicode.org
Subject: Re: Unicode Digest, Vol 50, Issue 20

Does the ulUnicodeRange bits get used to dictate rendering behavior or script 
recognition?

I am just wondering about whether the lack of bits to indicate an Adlam charset 
can cause other issues in applications.


-Neil


On Sat, Feb 24, 2018 at 1:00 PM, via Unicode 
<unicode@unicode.org> wrote:
Today's Topics:

   1. Re: metric for block coverage (Norbert Lindenberg via Unicode)


-- Forwarded message --
From: Norbert Lindenberg via Unicode <unicode@unicode.org>
To: Khaled Hosny <khaledho...@eglug.org>
Cc: James Kass <jameskass...@gmail.com>, Adam Borowski <kilob...@angband.pl>, 
Unicode Public <unicode@unicode.org>, Norbert Lindenberg <unic...@lindenbergsoftware.com>
Bcc:
Date: Fri, 23 Feb 2018 10:15:32 -0800
Subject: Re: metric for block coverage

> On Feb 18, 2018, at 3:26 , Khaled Hosny via Unicode 
> <unicode@unicode.org> wrote:
>
> On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass via Unicode wrote:
>> Adam Borowski wrote,
>>
>>> I'm looking for a way to determine a font's coverage of available scripts.
>>> It's probably reasonable to do this per Unicode block.  Also, it's a safe
>>> assumption that a font which doesn't know a codepoint can do no complex
>>> shaping of such a glyph, thus looking at just codepoints should be adequate
>>> for our purposes.
>>
>> You probably already know that basic scr

RE: metric for block coverage

2018-02-27 Thread Peter Constable via Unicode
You haven't clarified what exactly the usage is; you've only asked what it means 
to cover a script.

James Kass mentioned a font's OS/2 table. That is obsolete: as Khaled pointed 
out, there has never been a clear definition of "supported" and practice has 
been inconsistent. Moreover, the available bits were exhausted after Unicode 
5.2, and we're now working on Unicode 11. Both Apple and Microsoft have started 
to use 'dlng' and 'slng' values in the 'meta' table of OpenType fonts to convey 
what a font can and is designed to support — a distinction that the OS/2 table 
never allows for, but that is actually more useful. (I'd also point out that, 
in the upcoming Windows 10 feature update, the 'dlng' entries in fonts are used 
to determine what preview strings to use in the Fonts settings UI.) For scripts 
like Latin that have a large set of characters, most of which have infrequent 
usage, there can still be a challenge in characterizing the font, but the 
mechanism does provide flexibility in what is declared.

But again, you haven't said whether your issue is what data to put into fonts. If 
you are trying to determine whether a given font supports a particular language, 
the OS/2 and 'meta' tables provide heuristics — with 'meta' being recommended; 
but the only way to know for certain is to compare an exemplar 
character list for the particular language with the font's cmap table. But 
note, that can only tell you that a font _is able to support_ the language, 
which doesn't necessarily imply that it's actually a good choice for users of 
that language. For example, every font in Windows includes Basic Latin 
characters, but that definitely doesn't mean that the fonts are useful for an 
English speaker. This is why the 'dlng' entry in the 'meta' table was created.
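
A small fontTools sketch of both checks (the font path is hypothetical, and the 
Polish exemplar set is just for illustration):

    from fontTools.ttLib import TTFont

    font = TTFont("SomeFont.ttf")               # hypothetical font file
    cmap = font.getBestCmap()                   # {code point: glyph name}

    exemplars = set("aąbcćdeęfghijklłmnńoóprsśtuwyzźż")   # Polish letters, for illustration
    print("able to support Polish:", all(ord(c) in cmap for c in exemplars))

    if "meta" in font:                          # 'dlng'/'slng' declarations, if present
        print("dlng:", font["meta"].data.get("dlng"))
        print("slng:", font["meta"].data.get("slng"))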



Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Adam Borowski 
via Unicode
Sent: Saturday, February 17, 2018 2:18 PM
To: unicode@unicode.org
Subject: metric for block coverage

Hi!
As a part of Debian fonts team work, we're trying to improve fonts review:
ways to organize them, add metadata, pick which fonts are installed by default 
and/or recommended to users, etc.

I'm looking for a way to determine a font's coverage of available scripts. 
It's probably reasonable to do this per Unicode block.  Also, it's a safe 
assumption that a font which doesn't know a codepoint can do no complex shaping 
of such a glyph, thus looking at just codepoints should be adequate for our 
purposes.

A naïve way would be to count codepoints present in the font vs the number of 
all codepoints in the block.  Alas, there's way too much chaff for such an 
approach to be reasonable: þ or ą count the same as LATIN TURNED CAPITAL LETTER 
SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X WITH CARON.

Another idea would be giving every codepoint a weight equal to the number of 
languages which currently use such a letter.

Too bad, that wouldn't work for symbols, or for dead scripts: a good runic font 
will have a complete coverage of elder futhark, anglo-saxon, younger and 
medieval, while only a completionist would care about franks casket or 
Tolkien's inventions.

I don't think I'm the first to have this question.  Any suggestions?


ᛗᛖᛟᚹ!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄ A master species delegates.



RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
As mentioned in my initial mail, the fonts support the Basic Latin block from 
the BMP.

Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of James Kass via 
Unicode
Sent: Monday, November 13, 2017 2:54 PM
To: Unicode list 
Subject: Re: Plane-2-only string

Peter Constable wrote,

> “𠀀𠀁𠀂𠀃𠀄
> 𦬣𦬤𦬥𦬦𦬧
> 𦩒𦩓𦩔𦩕𦩖
> 𨣫𨣬𨣭𨣮𨣯”
>

“𠀀𠀁𠀂𠀃𠀄 𦬣𦬤𦬥𦬦𦬧 𦩒𦩓𦩔𦩕𦩖 𨣫𨣬𨣭𨣮𨣯”

It looks good in blocks on four separate lines, but would a typical font 
viewing or comparison tool be expected to break it down into four lines?  The 
pattern is still apparent if displayed on just one line, but separating the 
blocks with spaces or any punctuation would require BMP characters in the ExtB 
font.

“𠀀𠀁𠀂𠀃𠀄𦬣𦬤𦬥𦬦𦬧𦩒𦩓𦩔𦩕𦩖𨣫𨣬𨣭𨣮𨣯”




RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
I discussed this with one of my Chinese co-workers, and we came up with the 
following:

“𠀀𠀁𠀂𠀃𠀄
𦬣𦬤𦬥𦬦𦬧
𦩒𦩓𦩔𦩕𦩖
𨣫𨣬𨣭𨣮𨣯”

Factors in the choice of characters were:
- different radicals
- for a given radical, have a sequence of consecutive characters so people get 
the idea it's not a sentence but just a sequence of characters with related 
meanings
- radical groups increase in complexity


It's not a sentence that can be read, but there's an obvious pattern, so it's 
also not completely gibberish.
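
A quick Python check that the string really needs nothing outside plane 2:

    s = "𠀀𠀁𠀂𠀃𠀄𦬣𦬤𦬥𦬦𦬧𦩒𦩓𦩔𦩕𦩖𨣫𨣬𨣭𨣮𨣯"
    assert all(0x20000 <= ord(c) <= 0x2FFFF for c in s)   # every character is in plane 2
    print(len(s), "characters, all from plane 2")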


Peter

-Original Message-
From: James Kass [mailto:jameskass...@gmail.com] 
Sent: Monday, November 13, 2017 2:29 PM
To: Peter Constable 
Cc: Unicode list 
Subject: Re: Plane-2-only string

Peter Constable wrote,

> We don't want to add BMP characters to the ExtB fonts.

So the sample text would lack punctuation.  Given that the Supplementary 
Ideographic Plane is composed of rare and historical characters from multiple 
sources, I suspect that the short answer to Peter's original question is:  "No".



RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
Would a typical Chinese speaker be likely to recognize these as used in 
Cantonese? (I wouldn't want to have a font's sample-text string give the 
impression that it's a Cantonese font — unless it were specifically intended 
for Cantonese.)

-Original Message-
From: jenk...@apple.com [mailto:jenk...@apple.com] 
Sent: Monday, November 13, 2017 12:46 PM
To: Peter Constable 
Cc: Unicode list 
Subject: Re: Plane-2-only string

𠆩 𠌥 𠍁 𠓼 𠕄 𠝭 𠝹 𠮨 𠮩 𠯋 𠯦 𠯽 𠰋 𠰲 𠱁 𠱂 𠱃 𠱓 𠱘 𠱥 𠱷 𠱸 𠲜 𠳏 𠳕 𠳖 𠴕 𠴰 𠵇 𠵈 𠵉 𠵩 𠵯 𠵼 𠵾 𠵿 𠶜 𠶧 𠶲 𠸉 

That is an example of forty Cantonese-specific characters which are not obscene 
(that I'm aware of) from Extension B. For the curious, I've appended at the 
bottom the full list of 280 for all of Plane 2 which I was able to pull out of 
the Unihan database. I'm sure some enterprising poet can make something out of 
them.

> On Nov 13, 2017, at 11:20 AM, Peter Constable via Unicode 
>  wrote:
> 
> I’m wondering if anyone could come up with a string of 15 to 40 characters 
> _using only plane 2 characters_ that wouldn’t be gibberish?
> 
> We are considering adding sample-text strings in some of our fonts. (In 
> OpenType, the ‘name’ table can take sample-text strings using name ID 19.) 
> One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which 
> have CJK characters from plane 2 only.
> 
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and 
> MingLiU fonts: the combined glyph count exceeds the number of glyphs that can 
> be added in a single OpenType font, and so the “ExtB” fonts are used to 
> contain all of the Plane 2 characters that are supported. For example, the 
> Simsun font supports 28738 BMP characters, and no plane 2 characters, while 
> Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 
> characters. The combined glyph count exceeds 64K, so can’t go into a single 
> font.
> 
> 
> 
> Peter
> 

U+201A9 faan2   (Cant.) to play
U+20325 wu1 wu3 (Cant.) to bow, stoop
U+20341 man3(Cant.) an undesirable situation
U+204FC sip3(Cant.) a wedge; to thrust in
U+20544 nap1(Cant.) 酒𠕄, a dimple
U+2076D peng2   (Cant.) to fell, cut; to sweep away
U+20779 gaai3   (Cant.) to cut with a knife or scissors
U+20BA8 naai3   (Cant.) to tie, tow; bring along
U+20BA9 aa1 liu1(Cant.) an interjection; rare, specialized
U+20BCB jai4 jai5   (Cant.) naughty, inferior
U+20BE6 cai3(Cant.) to eat, take a meal
U+20BFD zi1 (Cant.) a final particle indicating affirmation
U+20C0B jaau1   (Cant.) left-handed
U+20C32 eot1(Cant.) to belch
U+20C41 tam3(Cant.) to fool, trick, cheat
U+20C42 dat1(Cant.) to put something or sit wherever one wishes; to 
rebuke, reproach
U+20C43 nip1(Cant.) thin, flat; poor
U+20C53 ngai1   (Cant.) to importune, beg
U+20C58 ngaak6  (Cant.) contrary, opposing, against; disobedient
U+20C65 fik1 jit6 we5   (Cant.) wrangling, a noise; fitful; a soft 
fabric with no body
U+20C77 ming1   (Cant.) small
U+20C78 san2 seon2  (Cant.) phonetic
U+20C9C zaang1  (Cant.) to owe
U+20CCF ce2 ce6 (Cant.) interjection
U+20CD5 caau3   (Cant.) to search
U+20CD6 dap6(Cant.) to strike, pound
U+20D15 miu2(Cant.) to purse the lips; to wriggle
U+20D30 gau6(Cant.) classifier for a piece or lump of something
U+20D47 keu4(Cant.) peculiar, strange
U+20D48 mui2(Cant.) to suck or chew without using the teeth
U+20D49 hong4   (Cant.) hope
U+20D69 go2 (Cant.) that
U+20D6F gwit1 gwit3 (Cant.) onomatopoetic
U+20D7C mang1 mang4 (Cant.) scars on the eyelid; phonetic
U+20D7E waak1   (Cant.) eloquent, sharp-tongued
U+20D7F pe1 pe5 (Cant.) a pair (from the Engl.); to stagger
U+20D9C zai3(Cant.) to do, work; to be willing
U+20DA7 dim6(Cant.) straight, vertical; OK; to pick up with the 
fingers; verbal aspect marker of successful completion
U+20DB2 gap6 kap6   (Cant.) to stare at; to take a big bite
U+20E09 kak1(Cant.) to block, obstruct
U+20E0A tap1(Cant.) an intensifying particle
U+20E0E naa1(Cant.) and, with
U+20E0F ge2 (Cant.) final particle
U+20E10 kam1(Cant.) to endure, last
U+20E11 soek3   (Cant.) soft, sodden
U+20E12 bou2(Cant.) 生𠸒人, a stranger
U+20E3A ngaak6  (Cant.) contrary, opposing
U+20E6D ko1 (Cant.) to call (Engl. loan-word)
U+20E73 git6(Cant.) thick, viscous, dense
U+20E77 ngo4(Cant.) to speak tirelessly
U+20E78 kam2(Cant.) to cover, close up
U+20E7A 

RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
Thanks for the suggestion. Alas, the fonts don't support that block.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Charlie Ruland 
via Unicode
Sent: Monday, November 13, 2017 12:05 PM
To: unicode@unicode.org
Subject: Re: Plane-2-only string

Many of characters in the CJK Compatibility Ideographs Supplement block are 
quite common Chinese characters, or variants thereof. You could try and build 
Chinese sentences with these characters.


On Mon, 13 Nov 2017 at 20:20 GMT+01:00 Peter Constable via Unicode wrote:
> I’m wondering if anyone could come up with a string of 15 to 40 characters 
> _using only plane 2 characters_ that wouldn’t be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In 
> OpenType, the ‘name’ table can take sample-text strings using name ID 19.) 
> One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which 
> have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and 
> MingLiU fonts: the combined glyph count exceeds the number of glyphs that can 
> be added in a single OpenType font, and so the “ExtB” fonts are used to 
> contain all of the Plane 2 characters that are supported. For example, the 
> Simsun font supports 28738 BMP characters, and no plane 2 characters, while 
> Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 
> characters. The combined glyph count exceeds 64K, so can’t go into a single 
> font.
>
>
>
> Peter




RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
Thanks. I’d need to know _at least something_ about what the characters 
signify, though, to have a sense of whether there’s anything potentially 
offensive.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Philippe Verdy 
via Unicode
Sent: Monday, November 13, 2017 11:51 AM
To: James Kass 
Cc: Unicode list 
Subject: Re: Plane-2-only string

May be this test page ?
http://www.i18nguy.com/unicode/supplementary-test.html


2017-11-13 20:38 GMT+01:00 James Kass via Unicode 
<unicode@unicode.org>:
A font's sample text can be used in place of the default "The quick
brown fox..." text which is used to illustrate the typeface in
applications which support that feature.

One approach would be to find a non-gibberish text string using some
Plane 2 characters and add the BMP glyphs to the font mapped to the
BMP PUA.  Because if only a handful of BMP CJK glyphs were added to
the font mapped to their standard code points, the font might need to
claim to support BMP CJK (when in fact it does not) in order to
display the sample text.  Or, (if standard code points are used) the
font might be auto-detected as supporting BMP CJK by some
applications, when it doesn't really support that range.

On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode
<unicode@unicode.org> wrote:
> I’m wondering if anyone could come up with a string of 15 to 40 characters 
> _using only plane 2 characters_ that wouldn’t be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In 
> OpenType, the ‘name’ table can take sample-text strings using name ID 19.) 
> One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which 
> have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and 
> MingLiU fonts: the combined glyph count exceeds the number of glyphs that can 
> be added in a single OpenType font, and so the “ExtB” fonts are used to 
> contain all of the Plane 2 characters that are supported. For example, the 
> Simsun font supports 28738 BMP characters, and no plane 2 characters, while 
> Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 
> characters. The combined glyph count exceeds 64K, so can’t go into a single 
> font.
>
>
>
> Peter



RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
We don't want to add BMP characters to the ExtB fonts.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of James Kass via 
Unicode
Sent: Monday, November 13, 2017 11:39 AM
To: Unicode list 
Subject: Re: Plane-2-only string

A font's sample text can be used in place of the default "The quick brown 
fox..." text which is used to illustrate the typeface in applications which 
support that feature.

One approach would be to find a non-gibberish text string using some Plane 2 
characters and add the BMP glyphs to the font mapped to the BMP PUA.  Because 
if only a handful of BMP CJK glyphs were added to the font mapped to their 
standard code points, the font might need to claim to support BMP CJK (when in 
fact it does not) in order to display the sample text.  Or, (if standard code 
points are used) the font might be auto-detected as supporting BMP CJK by some 
applications, when it doesn't really support that range.

On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode 
 wrote:
> I’m wondering if anyone could come up with a string of 15 to 40 characters 
> _using only plane 2 characters_ that wouldn’t be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In 
> OpenType, the ‘name’ table can take sample-text strings using name ID 19.) 
> One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which 
> have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and 
> MingLiU fonts: the combined glyph count exceeds the number of glyphs that can 
> be added in a single OpenType font, and so the “ExtB” fonts are used to 
> contain all of the Plane 2 characters that are supported. For example, the 
> Simsun font supports 28738 BMP characters, and no plane 2 characters, while 
> Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 
> characters. The combined glyph count exceeds 64K, so can’t go into a single 
> font.
>
>
>
> Peter




Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
I’m wondering if anyone could come up with a string of 15 to 40 characters 
_using only plane 2 characters_ that wouldn’t be gibberish?

We are considering adding sample-text strings in some of our fonts. (In 
OpenType, the ‘name’ table can take sample-text strings using name ID 19.) One 
particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have 
CJK characters from plane 2 only.

Background:
The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and 
MingLiU fonts: the combined glyph count exceeds the number of glyphs that can 
be added in a single OpenType font, and so the “ExtB” fonts are used to contain 
all of the Plane 2 characters that are supported. For example, the Simsun font 
supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB 
supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The 
combined glyph count exceeds 64K, so can’t go into a single font.



Peter

RE: Unicode education in Schools

2017-08-24 Thread Peter Constable via Unicode
I thought JavaScript had a UCS-2 understanding of Unicode strings. Has it 
managed to progress beyond that?


Peter


From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of David Starner 
via Unicode
Sent: Thursday, August 24, 2017 5:18 PM
To: Unicode Mailing List 
Subject: Fwd: Unicode education in Schools


-- Forwarded message -
From: David Starner <prosfil...@gmail.com>
Date: Thu, Aug 24, 2017, 6:16 PM
Subject: Re: Unicode education in Schools
To: Richard Wordingham <richard.wording...@ntlworld.com>


On Thu, Aug 24, 2017, 5:26 PM Richard Wordingham via Unicode 
<unicode@unicode.org> wrote:
Just steer them away from UTF-16!  (And vigorously prohibit the very
concept of UCS-2).

Richard.

Steer them away from reinventing the wheel. If they use Java, use Java strings. 
If they're using GTK, use strings compatible with GTK. If they're writing 
JavaScript, use JavaScript strings. There's basically no system without Unicode 
strings or that they would be better off rewriting the wheel.


RE: Unicode 10 Cover Art

2017-08-22 Thread Peter Constable via Unicode
http://blog.unicode.org/2017/08/unicode-consortium-announces-cover.html


-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Andre Schappo 
via Unicode
Sent: Monday, August 21, 2017 9:30 AM
To: Unicode 
Subject: Unicode 10 Cover Art

Unicode 10.0 Cover Design Art

Were there entries and, if yes, which won?

André Schappo






RE: Are Emoji ZWJ sequences characters?

2017-05-15 Thread Peter Constable via Unicode
Emoji sequences are not _encoded_, per se, in either Unicode or ISO/IEC 10646. 
The act of "encoding" in either of these coding standards is to assign an 
encoded representation in the encoding method of the standards for a given 
entity. In this case, that means to assign a code point. 

Specifying ZWJ sequences for representation of text elements is not encoding in 
the standard; it is simply defining an encoded representation for those text 
elements. Unicode gives some attention to this kind of thing, but ISO/IEC 
10646, not so much. For instance, you won't find anything in ISO/IEC 10646 
specifying that the encoded representation for a rakaar is < VIRAMA, RA >.
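
For instance (a small Python sketch), the Devanagari cluster प्र, in which RA 
takes its rakaar form, is simply represented as PA + VIRAMA + RA; neither 
standard "encodes" the rakaar itself:

    import unicodedata

    pra = "\u092A\u094D\u0930"   # प + ् + र, rendered with a rakaar form of RA
    for ch in pra:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))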

So, your helpful person was, indeed, helpful, giving you correct information: 
ZWJ sequences are not _characters_ and have no implications for ISO/IEC 10646.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of William_J_G 
Overington via Unicode
Sent: Monday, May 15, 2017 7:57 AM
To: unicode@unicode.org
Subject: Are Emoji ZWJ sequences characters?

I am concerned about emoji ZWJ sequences being encoded without going through 
the ISO process and whether Unicode will therefore lose synchronization with 
ISO/IEC 10646.

I have raised this by email and a very helpful person has advised me that 
encoding emoji sequences does not mean that Unicode and ISO/IEC 10646 go out of 
being synchronized because ZWJ sequences are not *characters*, and they have no 
implications for ISO/IEC 10646, noting that ISO/IEC 10646 does not define ZWJ 
sequences. 

Now I have great respect for the person who advised me. However I am a 
researcher and I opine that I need evidence.

Thus I am writing to the mailing list in the hope that there will be a 
discussion please.

http://www.unicode.org/reports/tr51/tr51-11.html
 (A proposed update document)

http://www.unicode.org/Public/emoji/5.0/emoji-zwj-sequences.txt

http://www.unicode.org/charts/PDF/U1F300.pdf

http://www.unicode.org/charts/PDF/U1F680.pdf

In tr51-11.html at 2.3 Emoji ZWJ Sequences

quote

To the user of such a system, these behave like single emoji characters, even 
though internally they are sequences.

end quote

In emoji-zwj-sequences.txt there is the following line.

1F468 200D 1F680; Emoji_ZWJ_Sequence  ; man 
astronaut 

From U1F300.pdf, 1F468 is MAN

200D is ZWJ

From U1F680.pdf, 1F680 is ROCKET

The reasoning upon which I base my concern is as follows.

0063 is c

0070 is p

0074 is t

If 0063 200D 0074 is used to specifically request a ct ligature in a display of 
some text, then the meaning of 0063 200D 0074 is the same as the meaning of 
0063 0074 and indeed a font with an OpenType table could cause a ct ligature to 
be displayed even if the sequence is 0063 0074 rather than the sequence 0063 
200D 0074 that is used where the ligature glyph is specifically requested. Thus 
the meaning of ct is not changed by using the ZWJ character.

Now the use of the ct ligature is well-known and frequent.

Suppose now that a fontmaker is making a font of his or her own and decides to 
include a glyph for a pp ligature, with a swash flourish joining and going 
beyond the lower ends of the descenders both to the left and to the right.

The fontmaker could note that the ligature might be good in a word like copper 
but might look wrong in a word like happy due to the tail on the letter y 
clashing with the rightward side of the swash flourish. So the fontmaker 
encodes 0070 200D 0070 as a pp ligature but does not encode 0070 0070 as a pp 
ligature, so that the ligature glyph is only used when specifically requested 
using a ZWJ character.

However, when the ZWJ character is used, the meaning of the pp sequence is not 
changed from the meaning when the pp sequence is not used.

Yet when 1F468 200D 1F680 is used, the meaning of the sequence is different 
from the meaning of the sequence 1F468 1F680 without the ZWJ character.

RE: PETSCII mapping?

2017-04-10 Thread Peter Constable via Unicode
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Rebecca T
Sent: Wednesday, April 5, 2017 2:26 PM

> As time goes on, “not in widespread use” will become a flimsier and flimsier
> argument against inclusion — why isn’t there a larger community of PETSCII
> enthusaists? Partially because the only way to share PETSCII is through 
> images!
> The consortium (passively or actively) prevents communication through 
> exclusion
> and then uses the lack of communication as a justification against inclusion —
> it’s a poor, tautological argument, and it won’t serve the consortium
> long-term.
>
> Simply put, we need new criteria for inclusion…

Your assertions are based on assumptions that simply aren’t valid. Unicode 
regularly encodes characters for things that are not in widespread use, and 
that fit the intended scope of the Standard. If someone can demonstrate that 
there are users who _would_ interchange texts that currently cannot be 
represented in Unicode for lack of appropriate characters, then that certainly 
can be considered. But the fact that some text element was represented in some 
legacy system does not alone comprise an adequate basis for encoding.

And as Asmus said elsewhere in this thread,
> Nothing gets decided by the UTC unless there's a proposal on the table.

Also, as Elias said,
> Wouldn't it make sense to get in touch with active Commodore 64 communities
> to find out how people deal with this today?

This is key: if there isn’t an on-going interest among some user community for 
interchanging the putative characters in Unicode, then that would weaken a case 
for encoding.


> we must weigh a character’s merits and usability on its own. (does
> it fill a gap in communication? Will it be used?)

That is already and has always been a basis on which characters get encoded in 
Unicode.


Peter


RE: Coloured Punctuation and Annotation

2017-04-10 Thread Peter Constable via Unicode
William:

Michael's scenario doesn't require a special palette index value such as you 
propose since (i) he could implement a font with alternate palettes to provide 
different colouring options of his choosing, and (ii) an app can always expose 
customization options to allow the user to customize any of the palette entries 
that are being used, even on a character-by-character basis if the app really 
wanted to.

Moreover, defining palette index 0xFFFE with a special meaning would be a 
breaking change that could negatively impact existing implementations. Also, it 
would create a potential ambiguity about what colour to use: whereas text 
drawing operations _always_ have a foreground colour specified, there is no 
convention for specifying a "first decoration colour". 

For these reasons, this is not going to happen.


Peter


-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of William_J_G 
Overington
Sent: Thursday, April 6, 2017 11:40 AM
To: ever...@evertype.com; richard.wording...@ntlworld.com; unicode@unicode.org
Subject: Re: Coloured Punctuation and Annotation

Michael Everson wrote:

> No. Here is an example of a font available in two variants. In one variant, 
> all those grey swirls are fused to the letters, and it can all be printed in 
> black or one colour ink. 

> http://cdn.myfonts.net/s/aw/original/255/0/131020.png
>  

> There is also a second set of fonts included which separates the swirls from 
> the letters, and those can be used in typesetting to get the two-colour 
> effect you see here. That can’t really be done using standard encoding. You’d 
> probably see IIVVOORRYY in the backing store for that word, with every other 
> letter being set in the letter font and the swirl font. 

Richard Wordingham mentioned the following.

> The third glyph would use 'index' 0xFFFF to specify that it be displayed in 
> the foreground colour.

If the OpenType specification were augmented so that 'index' 0xFFFE were to 
specify that the appropriate part of the glyph be displayed in the "first 
decoration colour", a colour specified in the application program and not in 
the font; and an application program were augmented so that an end user were 
able to choose first decoration colour as well as choosing foreground colour, 
then would that produce the result for which Michael is looking?

William Overington

Thursday 6 April 2017






RE: Coloured Punctuation and Annotation

2017-04-10 Thread Peter Constable via Unicode
The color palette entries (CPAL) used for COLR or SVG can potentially be 
customized by an application — whether for user customization or to fit some 
context (such as selection).

Peter

-Original Message-
From: Asmus Freytag (c) [mailto:asm...@ix.netcom.com] 
Sent: Monday, April 10, 2017 1:39 PM
To: Peter Constable ; unicode@unicode.org
Subject: Re: Coloured Punctuation and Annotation

On 4/10/2017 9:30 AM, Peter Constable wrote:
> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus 
> Freytag
> Sent: Wednesday, April 5, 2017 5:30 PM
>
>>> There are certainly MSS (in many languages) where some punctuation made of 
>>> dots have some of the dots red and some black.
>> Agreed, those would be a challenge to reproduce with standard font 
>> technology and in plain text.
> Not at all. This capability has existed in all major OS platforms for some 
> years now.

It may be in the platforms, but of the few clients I've tried this with, only 
one is reliably supporting this.

>   It is what has enabled the growth of interest in Unicode emoji, but it is 
> by no means limited to Unicode emoji: it can be used for multi-color 
> rendering of any text in ways defined within a font. The OpenType spec 
> supports this through a few techniques:
>
> - Decomposing a glyph into several glyphs that are layered (z-ordered) with 
> colour assignments.
> - Glyphs expressed as embedded colour bitmaps.
> - Glyphs expressed as embedded SVG.

Khaled gave a very nice demonstration of that on this list (which allowed me to 
test this).

>
>> But for the same reason, they are out of scope for plain text (and therefore 
>> a bit irrelevant to the current discussion).
> I agree, the rendering aspect is completely orthogonal to Unicode plain-text 
> encoding.

The problem with multicolored fonts would be the integration into font color 
selection via styling.

http://www.amirifont.org/fatiha-colored.html

If you select a section of this text, the black ink will invert as you select 
it, but the other colors remain the same, which is different from selecting a 
multicolored image or different from selecting multiple runs of fonts in 
different colors.

I wonder whether high-end tools like Indesign would be able to allow styling of 
individual color levels. For rendering emoji colors via fonts that wouldn't 
matter, but for the kind of annotated text example, it could be interesting to 
be able to tweak these layer colors.

A./

>
>
> Peter





RE: Coloured Punctuation and Annotation

2017-04-10 Thread Peter Constable via Unicode
Michael, your two-tone effect can easily be added into your first font using 
COLR and CPAL tables, so that the one font can support a monochrome rendering 
that uses glyphs in which the swirls are fused with the letters, and can also 
support a poly-chrome rendering in which those glyphs are decomposed into 
separate glyphs that get layered on top of one another in an order you specify 
with different RGBA colours.
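
For anyone curious what that amounts to in the font data, here is a rough 
Python sketch using the fontTools library to inspect such a font; the font 
file name and glyph layering are hypothetical, not taken from this thread:

    from fontTools.ttLib import TTFont

    font = TTFont("TwoToneSwash.otf")   # hypothetical COLR/CPAL font
    cpal, colr = font["CPAL"], font["COLR"]

    # Each palette is a list of RGBA entries; a font may ship several palettes,
    # and applications may override individual entries. (Palette entry index
    # 0xFFFF has the special meaning "use the current text foreground colour".)
    for i, palette in enumerate(cpal.palettes):
        print("palette", i, [(c.red, c.green, c.blue, c.alpha) for c in palette])

    # A version 0 COLR table maps each base glyph to layered glyphs, each layer
    # pointing at a palette entry.
    for base, layers in colr.ColorLayers.items():
        print(base, [(layer.name, layer.colorID) for layer in layers])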


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Michael Everson
Sent: Thursday, April 6, 2017 5:41 AM
To: unicode Unicode Discussion 
Subject: Re: Coloured Punctuation and Annotation


> On 6 Apr 2017, at 05:41, Richard Wordingham  
> wrote:
> 
> On Thu, 6 Apr 2017 01:11:09 +0100
> Michael Everson  wrote:
> 
>> On 5 Apr 2017, at 22:48, Richard Wordingham 
>>  wrote:
>> 
>>> I tried to read it from UTS#51 ‘Unicode Emoji', which is not part of TUS, 
>>> but I couldn't deduce that a font that enables U+10B99 PSALTER PAHLAVI 
>>> SECTION MARK to have exactly two (as opposed to none or four) red dots is 
>>> in breach of the guidelines therein. 
>> 
>> Kindly explain how ANY font could do this.
> 
> Is this a trick question?

No. Here is an example of a font available in two variants. In one variant, all 
those grey swirls are fused to the letters, and it can all be printed in black 
or one colour ink. 
http://cdn.myfonts.net/s/aw/original/255/0/131020.png
 

There is also a second set of fonts included which separates the swirls from 
the letters, and those can be used in typesetting to get the two-colour effect 
you see here. That can’t really be done using standard encoding. You’d probably 
see IIVVOORRYY in the backing store for that word, with every other letter 
being set in the letter font and the swirl font. 

Emoji-style colour fonts use other mechanisms for colour.

Michael Everson



RE: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final)

2017-03-31 Thread Peter Constable
William, you completely miss the point: As long as Unicode is the way to 
provide emoji to consumers, their needs and desires will not be best or fully 
met. Unicode as an AND gate is too many AND gates.



Peter



Sent from my Windows 10 phone



From: William_J_G Overington<mailto:wjgo_10...@btinternet.com>
Sent: Friday, March 31, 2017 7:50 AM
To: Peter Constable<mailto:peter...@microsoft.com>; 
unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters 
now final)



Peter Constable wrote:

> The interest of consumers, in regard to emoji, will never be best met by 
> Unicode-encoded emoji, no matter what process there is for determining what 
> should be "recommended", because consumers inevitably want emoji they 
> recommend for themselves, not what anybody else recommends.

The consumers can only choose from what is available to consumers. So what the 
Unicode Technical Committee recommends or "not-recommends" may well have a very 
significant effect upon the choices available to the consumer.

> If Sally wants an emoji to convey her thoughts on her grandson's school play, 
> or on the latest tweet from a politician, or whatever, she wants it _now_, 
> and she doesn't particularly care if you or I would recommend that emoji to 
> her or not.

Sally may not know that the Unicode Technical Committee exists. Sally may have 
bought her computer or mobile telephone and just uses it, choosing from the 
emoji available in a menu system, perhaps never realizing all of the detailed 
standards work and implementation work that took place before the device was 
manufactured. It is not that Sally is having a particular emoji recommended to 
her as such, yet if the Unicode Technical Committee "not-recommends" 
implementation of some emoji that are in the standards document, then Sally may 
never get the opportunity to choose to use those emoji.

> So, before we go talking about whether _Unicode_ is accommodating the benefit 
> of consumers, I think should be asking whether _all the popular 
> communications protocols_ are accommodating the benefit of consumers.

Well, all of the various standards needed to produce useful products are 
important. It is not a matter of one being considered before the other. For a 
particular emoji to become available in a device that is available to a 
consumer there are various stages. They are like an AND gate where all inputs 
must be true in order for the result to be true.

The Unicode Technical Committee has enormous power and influence to affect the 
future of information technology.

It works both ways. Where an encoding is made there can be progress, yet where 
an idea is rejected then there is no way forward for an interoperable plain 
text encoding to become achieved.

I submitted a document in 2015. It was determined to be out of scope and was 
not included in the Document Register and the Unicode Technical Committee did 
not consider it.

I submitted a later version and received no reply about it at all.

So I cannot make progress over an interoperable plain text encoding becoming 
implemented at the present time. Quite a number of UTC meetings have taken 
place since.

Yet the scope of Unicode is a people-made rule, it could change if people with 
influence want it to change. The UTC could consider my document and hold a 
Public Review if it chose to do so.

So, the Unicode Technical Committee has enormous power and influence to affect 
the future of information technology.

When a "not-recommendation" of what to support takes place the decision to do 
that "not-recommending" can have significant and long-lasting effects on 
progress.

William Overington

Friday 31 March 2017




"A Programmer's Introduction to Unicode"

2017-03-10 Thread Peter Constable
FYI:

http://reedbeta.com/blog/programmers-intro-to-unicode/

The visuals may be the most interesting part. E.g., in the usage heat map, 
Arabic Presentation Forms-B lights up much more than I would have expected - as 
much as a lot of emoji.



Peter


"Oh that's what you meant!: reducing emoji misunderstanding"

2016-11-17 Thread Peter Constable
Somewhat interesting: a paper from a conference in Italy a couple of months ago:

http://discovery.dundee.ac.uk/portal/en/research/oh-thats-what-you-meant(20b8923c-28da-49ed-bc78-fcc741db3187).html

I anticipated old news about misunderstanding based on presentation differences 
on the level of water gun vs. etc. But it focuses on subtleties in emotional 
reactions that different users associate with different smileys. E.g., how does 
U+1F624 “😤” compare with U+1F62C “😬”? A given user may perceive the two 
differently, and for either one a given user’s perception may differ when 
evaluating the depiction used in one app/platform versus another. They suggest 
that, if users gave a characterization of reactions to different emoji on a 
given platform (e.g., degree of emotion, how positive or negative) then an 
automated system could translate one user’s message to display an emoji to a 
second user that more closely reflects the emotion intended by the first user.




Peter


RE: The (Klingon) Empire Strikes Back

2016-11-15 Thread Peter Constable
Klingon _should not_ be encoded so long as there are open IP issues. For that 
reason, I think it would be premature to place it in the roadmap.


Peter

From: Mark E. Shoulson [mailto:m...@kli.org]
Sent: Sunday, November 13, 2016 2:10 PM
To: Mark Davis ☕️ ; Shawn Steele 

Cc: Peter Constable ; David Faulks 
; Unicode Mailing List 
Subject: Re: The (Klingon) Empire Strikes Back

On 11/10/2016 02:34 PM, Mark Davis ☕️ wrote:
The committee doesn't "tentatively approve, pending X".

But the good news is that I think it was the sense of the committee that the 
evidence of use for Klingon is now sufficient, and the rest of the proposal was 
in good shape (other than the lack of a date), so really only the IP stands in 
the way.

Fair enough.  There have, I think, been other cases of this sort of informal 
"tentative approval", usually involving someone from UTC telling the proposer, 
"your proposal is okay, but you probably need to change this..."  And that's 
about the best I could hope for at this point anyway.  So it sounds like 
(correct me if I'm wrong) there is at least unofficial recognition that pIqaD 
*should* be encoded, and that it's mainly an IP problem now (like with 
tengwar), and possibly some minor issues that maybe hadn't been addressed 
properly in the proposal.

Can we get pIqaD removed from http://www.unicode.org/roadmaps/not-the-roadmap/ 
then?  And (dare I ask) perhaps enshrined someplace in 
http://www.unicode.org/roadmaps/smp/ pending further progress with Paramount?


I would suggest that the Klingon community work towards getting Paramount to 
engage with us, so that any IP issues could be settled.

I'll see what we can come up with; have to start somewhere.  There is a VERY 
good argument to be made that Paramount doesn't actually have the right to stop 
the encoding, as you can't copyright an alphabet (as we have seen), and they 
don't have a current copyright to "Klingon" in this domain, etc., and it may 
eventually come down to these arguments.  However, I recognize that having a 
good argument on your side, and indeed even having the law on your side, does 
not guarantee smooth sailing when the other guys have a huge well-funded legal 
department on their side, and thus I understand UTC's reluctance to move 
forward without better legal direction.  But at least we can say we've made 
progress, can't we?

~mark



Mark

Mark

On Thu, Nov 10, 2016 at 10:33 AM, Shawn Steele 
mailto:shawn.ste...@microsoft.com>> wrote:
More generally, does that mean that alphabets with perceived owners will only 
be considered for encoding with permission from those owner(s)?  What if the 
ownership is ambiguous or unclear?

Getting permission may be a lot of work, or cost money, in some cases.  Will 
applications be considered pending permission, perhaps being provisionally 
approved until such permission is received?

Is there specific language that Unicode would require from owners to be 
comfortable in these cases?  It makes little sense for a submitter to go 
through a complex exercise to request permission if Unicode is not comfortable 
with the wording of the permission that is garnered.  Are there other such 
agreements that could perhaps be used as templates?

Historically, the message pIqaD supporters have heard from Unicode has been 
that pIqaD is a toy script that does not have enough use.  The new proposal 
attempts to respond to those concerns, particularly since there is more 
interest in the script now.  Now, additional (valid) concerns are being raised.

In Mark’s case it seems like it would be nice if Unicode could consider the 
rest of the proposal and either tentatively approve it pending Paramount’s 
approval, or to provide feedback as to other defects in the proposal that would 
need addressed for consideration.  Meanwhile Mark can figure out how to get 
Paramount’s agreement.

-Shawn

From: Unicode 
[mailto:unicode-boun...@unicode.org<mailto:unicode-boun...@unicode.org>] On 
Behalf Of Peter Constable
Sent: Wednesday, November 9, 2016 8:49 PM
To: Mark E. Shoulson mailto:m...@kli.org>>; David Faulks 
mailto:davidj_fau...@yahoo.ca>>
Cc: Unicode Mailing List mailto:unicode@unicode.org>>
Subject: RE: The (Klingon) Empire Strikes Back

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Mark E. Shoulson
Sent: Friday, November 4, 2016 1:18 PM
> At any rate, this isn't Unicode's problem…

You saying that potential IP issues are not Unicode’s problem does not in fact 
make it not a problem. A statement in writing from authorized Paramount 
representatives stating it would not be a problem for either Unicode, its 
members or implementers of Unicode would make it not a problem for Unicode.



Peter





RE: The (Klingon) Empire Strikes Back

2016-11-09 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Mark E. Shoulson
Sent: Friday, November 4, 2016 1:18 PM

> At any rate, this isn't Unicode's problem…

You saying that potential IP issues are not Unicode’s problem does not in fact 
make it not a problem. A statement in writing from authorized Paramount 
representatives stating it would not be a problem for either Unicode, its 
members or implementers of Unicode would make it not a problem for Unicode.



Peter


RE: Running text requirement?

2016-07-23 Thread Peter Constable
If it’s a symbol / pictograph, then UTC will want to be convinced that it’s 
needed/appropriate for use in running text. There are lots of symbols that get 
used in different kinds of presentation but that are not necessarily used in 
text. Depending on the symbol, it may or may not be obvious. It doesn’t hurt to 
include samples of attested usage in running text. But as Roozbeh says, you can 
float it first to get feedback on whether additional evidence is needed.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Roozbeh 
Pournader
Sent: Saturday, July 23, 2016 5:09 PM
To: Ken Shirriff 
Cc: Unicode Public 
Subject: Re: Running text requirement?

In my experience, no such requirement is a binary yes/no. If you have a good 
character candidate, run it by the list, or just write a proposal. UTC tends to 
look at all the merits together, instead of a list of things that should all be 
there or else there won't be a character.

On Sat, Jul 23, 2016 at 9:26 AM, Ken Shirriff 
mailto:ken.shirr...@gmail.com>> wrote:
Someone asked me about the requirement for evidence that proposed new 
characters are used in running text. I thought it was in the Symbol Guidelines 
(http://www.unicode.org/pending/symbol-guidelines.html) or the Character 
Proposals document (http://unicode.org/pending/proposals.html) but it's not 
there. Is there a written requirement for running text somewhere or is it 
"tradition"?

Ken



RE: The Hebrew Extended (Proposed) Block

2016-05-11 Thread Peter Constable
Robert, your statement seems to have an implicit assumption that the range 
0860..08FF has somehow been reserved for Hebrew. That is not the case. As 
Markus referenced elsewhere, people can refer to the Roadmap charts to see what 
is tentatively planned for a given range:

http://unicode.org/roadmaps/bmp/

If you or others are working on or considering working on a proposal for 
additional Hebrew characters, you should not make any firm assumptions about 
code point assignments until some indication of suitable ranges have been given 
by the Unicode Technical Committee and that has been added to the Roadmap.



Peter


From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Robert Wheelock
Sent: Tuesday, May 10, 2016 4:55 PM
To: unicode@unicode.org
Subject: RE: The Hebrew Extended (Proposed) Block

Hello again, y’all!

¡BAD NEWS! (CRUCIALLY IMPORTANT):  The Unicode Consortium has assigned OTHER 
characters into the U+00860-U+008FF areas in the BMP of Unicode—Malayalam 
extended additional characters for Garshuni, and more additional Arabic 
characters.

We’ll need to find a DIFFERENT subblock to plant down our Hebrew extended 
characters...  either somewhere ELSE within the BMP, or somewhere within either 
SMP areas 1 or 2.
It’ll be the same arrangement originally planned for the U+00860 area—but 
relocated and expanded upon!

·Additional characters for correct typesetting of Hebrew
·Hebrew Palestinian vowel and pronunciation points
·The small superscript signs śin and shin for the letter shin
·Hebrew Palestinian cantillation
·Hebrew Babylonian vowel and pronunciation points
·Hebrew Babylonian cantillation
·Hebrew Samaritan vowel and pronunciation points
·Additional Hebrew characters for other Jewish languages
A new TXT listing of this subblock (with the new CORRECT location) will be 
forthcoming.  STAY TUNED!




RE: Mammal emoji

2016-03-07 Thread Peter Constable
I know you’re not proposing anything and just providing info for discussion. I 
want to make sure it’s clear to others that there is no requirement for encoded 
emoji in Unicode to provide comprehensive coverage (by any measure) of any 
semantic or conceptual domain. So, if there isn’t any raccoon emoji in Unicode, 
that doesn’t imply that there must or ever will be a raccoon emoji.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of David Starner
Sent: Sunday, March 6, 2016 8:56 PM
To: Unicode Mailing List 
Subject: Mammal emoji

Seeing the presence of foxes on the upcoming emoji list, I remembered the 
Audubon Mammals (North America) app has silhouettes of mammals on the browse by 
shape tab. So let's see if they're covered:
Armored Mammals (-): Okay, we're off to a bad start. The image here is sort of 
porcupine-ish, and there's two distinct creatures under the label, the 
porcupine(-) and armadillo(-). Neither of which are in Unicode.
Bats (N): In the new list. Which is good; they're sort of iconic.
Bears(+)
Cats(+): Several varieties
Chipmunks, Squirrels and Prairie Dogs(+): Breaking down more than icons the app 
uses, there is a Chipmunk(+) emoji, no Squirrel(-) emoji; that might be an 
oversight. Prairie dogs(-) probably aren't.
Hoofed Mammals(+): Breaking it down more Bison(-), Sheep(+), Reindeer (-) (and 
that's sort of surprising), Peccary(-), Deer(N),  Moose(-) (aka Elk in Europe), 
Ox(+) (actually Muskox ... and I'm pretty sure that's a distinction Unicode 
doesn't want to worry about), Pronghorn (-) (nor antelope(-), or the actually 
related giraffe(-) and okapi(-). Probably covered by the unrelated deer.) 
Boar(+), Horse(+)
Large Rodents(-): Beaver(-), Muskrat(-), Marmot(-), Nutria(-)
Marine Mammals(+): Dolphin(+), Whale(+), Seal (-), Sea Lion(-), Walrus(-), 
Manatee(-)
Mice and Rats(+): Mouse(+), Rat(+)
Opossum(-):
Otters(-):
Rabbits and Hares(+):
Raccoons and Their Kin(-):
Shrews and Moles(-):
Voles, Lemmings, Pikas, and Pocket Gophers(-):
Weasels, Skunks and Their Kin(-): While a disparate group, badgers(-), 
skunks(-), ferrets(-), weasels(-) and wolverines(-) all have arguments for 
encoding.
Wolves, Foxes, and Coyote(+): Fox(+), Dog(+), Wolf(+), Coyote(-)
So nine icons out of the 17 have a reasonable encoding in Unicode. To cover the 
set would need an armadillo or porcupine, a beaver, a possum, an otter, a 
raccoon, a shrew, a lemming, and a weasel or skunk. Beavers (O Canada!), 
raccoons, ferrets/weasel (popular pet) and skunk (emoji uses abound) probably 
have the best encoding arguments there.
(This is not an actual proposal, but feel free to forward it on if anyone might 
want to make one. Just a discussion of a set of icons in the reflection of 
emoji.)


RE: Enclosing BANKNOTE emoji?

2016-02-09 Thread Peter Constable
I wish emojitracker had an option to see cumulative stats spanning only the 
last (say) 7 days, rather than (I assume) all time. This would be more 
representative of current usage, fixing the problem of recent introductions. 
Also, comparing the recent and long-term stats would highlight shifting trends.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Leo Broukhis
Sent: Tuesday, February 9, 2016 2:47 PM
To: Mark Davis ☕️ 
Cc: unicode Unicode Discussion 
Subject: Re: Enclosing BANKNOTE emoji?

The emojixpress.com site is useful to check which new 
emoji or combinations people actually use, but the stats are likely skewed by 
only measuring input from one platform.
Another way to look at the emojitracker.com stats:
339M people in the Eurozone : 389K uses of Euro emoji
126M people in Japan : 354K uses of Yen emoji
140M people in UK + Turkey (likely users of the Pound emoji as a stand-in for 
Lira) : 515K uses of pound emoji
The total is 605M people : 1258K uses of non-dollar emoji
Assuming the same average frequency of use, 2933K uses of the dollar emoji 
would be produced by 1411M people, out of which us + canada + mexico + 
australia   (500M) + other countries using $ as (part of) the sign for their 
currency are way less than a half. This means that substantially more than 500M 
people are using the dollar emoji by default, instead of emoji of their 
national currencies. Assuming a lesser frequency of use will result in a 
greater estimate of the affected population.
Leo
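
A quick back-of-the-envelope check of the proportion argument above, as a 
Python sketch using the figures given in the message:

    non_dollar_people = 339 + 126 + 140   # millions: Eurozone + Japan + UK/Turkey
    non_dollar_uses = 389 + 354 + 515     # thousands of non-dollar banknote emoji uses
    dollar_uses = 2933                    # thousands of dollar banknote emoji uses

    implied_people = dollar_uses * non_dollar_people / non_dollar_uses
    print(non_dollar_people, non_dollar_uses, round(implied_people))   # 605 1258 1411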


On Tue, Feb 9, 2016 at 8:51 AM, Mark Davis ☕️ 
mailto:m...@macchiato.com>> wrote:
Look at http://www.emojixpress.com/stats/. The stats are different, since they 
collect data from keyboards not twitter posts, but they have a nice button to 
view only the news emoji.

(The numbers on the new ones will be smaller, just because it takes time for 
systems to support them, and people to start using them. However, they bear out 
my predication that the most popular would be the eyes-rolling face).

Mark

On Tue, Feb 9, 2016 at 5:19 PM, Leo Broukhis 
mailto:l...@mailcom.com>> wrote:
A caveat about using emojitracker.com : it doesn't 
count newer emoji yet (e.g. U+1F37E bottle with popping cork is absent), thus, 
when they are added, their counts will be skewed.
Leo

On Tue, Feb 9, 2016 at 2:00 AM, Leo Broukhis 
mailto:l...@mailcom.com>> wrote:
Thank you for the links, quite mesmerizing!

On emojitracker.com (cumulative counts, but only on 
twitter, AFAICS), U+1F4B5 ($) had quite a respectable count of 2932622 (well 
above the middle of the page, around 70%ile), U+1F4B7 (pound) had 514536 
(around 30%ile), and U+1F4B4 and U+1F4B6 had around 353K and 388K resp. (around 
20%ile, but 10x more than the lowest counts, and about the same frequency as 
various individual clock faces).
It is quite evident that the dollar banknote emoji serves as a stand-in for at 
least half a dozen of various currencies.

On Mon, Feb 8, 2016 at 10:25 PM, Mark Davis ☕️ 
mailto:m...@macchiato.com>> wrote:
I would suggest that you first gather statistics and present statistics on how 
often the current combinations are used compared to other emoji, eg by 
consulting sources such as:

http://www.emojixpress.com/stats/
or
http://emojitracker.com/

Mark

On Mon, Feb 8, 2016 at 8:34 PM, Leo Broukhis 
mailto:l...@mailcom.com>> wrote:
There are

💴 U+01F4B4 Banknote With Yen Sign
💵 U+01F4B5 Banknote With Dollar Sign
💶 U+01F4B6 Banknote With Euro Sign
💷 U+01F4B7 Banknote With Pound Sign

This is clearly an incomplete set. It makes sense to have a generic
"enclosing banknote" emoji character which, when combined with a
currency sign, would produce the corresponding banknote, to forestall
requests for individual emoji for banknotes with remaining currency
signs.

Leo







RE: Latin glottal stop in ID in NWT, Canada

2015-10-30 Thread Peter Constable
The Aleutian islands are a long way from NWT.

I don't associate Tlingit with the Aleutians, and wasn't aware of an early 
Cyrillic orthography.  But it's also not a language of NWT. It's spoken in 
areas near the coast. My sister lives in Carcross, which is a Tlingit village. 
This is hundreds of miles from NWT.


Peter

Sent from my IBM 3277/APL

From: Richard Wordingham<mailto:richard.wording...@ntlworld.com>
Sent: ‎10/‎30/‎2015 16:37
To: Unicode Discussion<mailto:unicode@unicode.org>
Subject: Re: Latin glottal stop in ID in NWT, Canada

On Fri, 30 Oct 2015 22:03:31 +0000
Peter Constable  wrote:

> This is more plausible. The Tlingit peoples live in coastal regions,
> SW parts of Yukon Territory and Alaska. That's not what I would have
> referred to as "Northwest Territories". And it's totally not related
> to the thread, which was clearly about Northwest Territories, not
> Yukon Territory.

I think Cyrillic got into the thread by mistake.

> Can you point to information on Tlingit materials in Cyrillic script?

Google ('Tlingit Cyrillic') does a better job than me!  There's an
example linked to from the Wikipedia article
https://en.wikipedia.org/wiki/Tlingit_alphabet 'Indication of the
Pathway into the Kingdom of Heaven'.  I presume the original spelling
has been preserved.

There's an interesting account in 'Russian Orthodox Church Of Alaska And
The Aleutian Islands And Its Relation To Native American Traditions:
An Attempt At A Multicultural Society, 1794-1912' by Viacheslav
Vsevolodovich Ivanov.  It's interesting that much of the action
happened under American rule - allegedly Orthodox Christianity did well
because it wasn't American!

Richard.


RE: Latin glottal stop in ID in NWT, Canada

2015-10-30 Thread Peter Constable
This is more plausible. The Tlingit peoples live in coastal regions, SW parts 
of Yukon Territory and Alaska. That's not what I would have referred to as 
"Northwest Territories". And it's totally not related to the thread, which was 
clearly about Northwest Territories, not Yukon Territory. 

Can you point to information on Tlingit materials in Cyrillic script?


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard 
Wordingham
Sent: Friday, October 30, 2015 12:09 PM
To: Unicode Discussion 
Subject: Re: Latin glottal stop in ID in NWT, Canada

On Fri, 30 Oct 2015 06:07:36 +0000
Peter Constable  wrote:

> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of 
> Philippe Verdy Sent: Thursday, October 29, 2015 6:26 AM
> 
> > On the opposite, Native Americans HAVE used the Cyrillic script in 
> > Alaska and probably as well in North-Western territories in Canada…
> 
> In Alaska, yes, because the languages in question are, in fact, 
> Siberian languages.

I wouldn't describe Tlingit as a Siberian language.  There are some old 
Cyrillic script Christian materials in Tlingit. The Canadian connections are in 
British Columbia and Yukon.

Richard.




RE: Latin glottal stop in ID in NWT, Canada

2015-10-30 Thread Peter Constable
> If you look at a "common" map centered on the equatorial line,

Philippe, I have personal ties to northern Canada. I’m aware of the distances. 
Alaska is comparable to the combination of France, Germany, Poland, Belarus and 
Ukraine. The distances involved are comparable to migrating from the Ural 
Mountains to France.

>"North-West Territories" is only today's name of an organized Canadian 
>province.

You said your earlier reference was with a more generic meaning. But now you 
clearly misspell it when referring to the administrative entity, even after I 
gave you the correct spelling. Also, in Canada, territories are not considered 
provinces: these different types of administrative unit have distinct statuses 
in relation to the constitution and the federal government.

Russian migrants going to wherever doesn’t seem relevant to me. Yes, 
potentially they can influence other peoples, but the only kinds of migrants 
that tend to influence literacy among other people groups are missionaries, and 
I’m not aware of Russian missionaries having worked in the Northwest 
Territories.

The languages in question are spoken in coastal regions of Alaska. You either 
have to cross the width of Alaska or else cross the tall coastal mountains 
before you reach northwestern territories of Canada. It seems very unlikely to 
me, given that you’re dealing with very, very different ecological and 
climatic zones.

> there could remain old books

I could just as readily speculate that early Gauls in Normandy wrote with early 
ideographic writing. After all, it is far easier to migrate across Eurasia, 
with much less variation in climatic zones, than to go from the Alaskan coast 
to the Canadian interior.

Rather than speculate, can we just stick to documented attestations we can 
point to? Hypothetical possibilities about Cyrillic don’t seem too relevant to 
the topic of actual glottal stop usage in Canada, which is fairly well 
documented.



Peter


From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy
Sent: Friday, October 30, 2015 6:00 AM
To: Peter Constable 
Cc: Marcel Schneider ; Unicode Discussion 
; Leo Broukhis 
Subject: Re: Latin glottal stop in ID in NWT, Canada

Borders around Alaska were very fuzzy and native Americans were mobile in the 
region. It seems unavoidable that at some time some of their languages have 
been written by some missionaries and books/religious texts exchanged around.

As well, even before Alaska was sold by the Russian Empire to the USA, there 
were also many Russian migrants going to Canada and the USA via Alaska, and 
also meeting native Americans. The US and British Canadian authorities were 
not as active as they are today in those areas, and aboriginal populations (as 
well as many migrants) were certainly more autonomous and more mobile than they 
are today, and had more cultural exchanges. At that time they were still not 
small minorities as they are today, and the usage of English and French by them 
was much less common.


PS: Note that I used the term "probably". "North-West Territories" is only 
today's name of an organized Canadian province. For a long time, this area was 
not incorporated, so I used a *generic* term (with "territories" in lowercase), 
and the term I used was probably referring to the whole Arctic region, where 
native Americans travelled long distances across the seasons for their 
traditional fishery and hunting.

If you look at a "common" map centered on the equatorial line, the arctic 
region seems enormous, but look at a map centered on the pole, and consider 
what were the limits of the ice shelves in past centuries and how those 
populations were living in the area, independently of the European/American and 
Asian countries established to the south. The Arctic Ocean was an essential 
resource and people lived all around it on a quite thin border of land and on 
ice shelves with very scarce resources. They had to be mobile and received 
little help from the south. But the area was also regularly visited by European 
and Asian fishers or explorers, notably from Russia, looking for routes to the 
Atlantic or Pacific and selling products to local native populations or trying 
to bring them under some imperial rule.

There were also many more active native languages than those that remain 
today; many of them are now extinct or persist only in some old transcriptions 
written in the Latin or Cyrillic alphabets (possibly in sinograms or Mongolian 
scripts too, via Chinese or Japanese explorers, fishers and merchants from 
their former imperial regimes: there could remain old books transcribing some 
of those old arctic native languages), but these old transcriptions may have 
been carefully preserved by today's native peoples in their local communities, 
or they could remain in some museum or public library all around the Northern 
hemisphere.



RE: Latin glottal stop in ID in NWT, Canada

2015-10-29 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Philippe Verdy
Sent: Thursday, October 29, 2015 6:26 AM

> On the opposite, Native Americans HAVE used the Cyrillic script in Alaska
> and probably as well in North-Western territories in Canada…

In Alaska, yes, because the languages in question are, in fact, Siberian 
languages.

But where have you gotten the idea that Cyrillic script has been used in 
orthographies for languages spoken in Northwest Territories? I’ve never seen 
any indication of that, and I am very doubtful.

(Btw, it’s “the Northwest Territories”, not “North-Western territories”.)



Peter


RE: The scope of Unicode (from Re: How can my research become implemented in a standardized manner?)

2015-10-22 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Mark E. Shoulson
Sent: Friday, October 23, 2015 9:48 AM

> I have no idea why deposition with the British Library is in any way 
> significant or even relevant.  It's nice to mail documents to people who will 
> save them, yes.

Hmmm... If I (or anyone else) were to forward to the British Library every item 
I post to this or other public lists or fora, or anything else I'd like to have 
publicly recorded, they'll provide a permanent, public record? I would have 
expected them to be pretty selective of what things they decide to hang onto.



Peter



RE: Rights to the Emoji

2015-10-12 Thread Peter Constable
Exactly: specific designs are subject to license terms determined by the 
original designer, which are liberal in some cases and not in others. But the 
concept of a such-and-such emoji and its encoded representation are not an 
issue.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Martin J. Dürst
Sent: Sunday, October 11, 2015 9:59 PM
To: patapatachakapon . 
Cc: Shervin Afshar ; unicode@unicode.org
Subject: Re: Rights to the Emoji

You can also design your own version of the emoji you want to use. [I'm not a 
lawyer, but as far as I understand,] what's protected is the individual design, 
not the idea of a "donut" or "frowning face" emoji as such.

Regards,   Martin.

On 2015/10/12 09:51, Shervin Afshar wrote:
> Those listed in the column titled "Native" come from the operating 
> system (in your case, Mac OS X) and/or browser you are viewing that 
> page on. One can assume that the right to those belong to the entity 
> who develops those software.
>
> A safer approach for you would be to use symbols from Emoji One[1]; if 
> you can attribute that project on your products, you can use them for 
> free; if you can not do that, they require that you contact them for a 
> custom paid license [2].
>
> Also, with the paid license you are helping a project publishing 
> content under Creative Common license.
>
> [1]: http://emojione.com/
> [2]: http://emojione.com/faq#faq5
>
> ↪ Shervin
>
> On Sat, Oct 10, 2015 at 5:59 AM, patapatachakapon . < 
> bugraaydin1...@gmail.com> wrote:
>
>> Hello,
>>
>> I work for a small company in Turkey. We would like to import/sell 
>> products that have pictures of Emoji on them (such as keychains, cups 
>> etc.) , here in Turkey. The Emoji we would like to use on our 
>> products are the ones that are titled Native on the chart that I've attached 
>> to this email.
>> I would like to know whether or not it's required to buy the rights to 
>> these Emoji. Are Emoji copyrighted, or can they be used by anyone for 
>> design purposes?
>>
>> Thanks so much in advance!
>>
>



RE: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Peter Constable
> If a term were invented, you'd generally have to explain it, and you
> would do better just to remind readers what ASCII is.

+1





Peter

Sent from Outlook Mail<http://go.microsoft.com/fwlink/?LinkId=550987> for 
Windows 10





From: Richard Wordingham
Sent: Tuesday, September 22, 2015 12:51 AM
To: unicode@unicode.org
Subject: Re: Concise term for non-ASCII Unicode characters


On Sun, 20 Sep 2015 16:52:29 +0000
Peter Constable  wrote:

> You already have been using "non-ASCII Unicode", which is about as
> concise and sufficiently accurate as you'll get. There's no term
> specifically defined in any standard or conventionally used for this.

As to standards, UTS#18 'Unicode Regular Expression' Requirement RL1.2
requires the support of the 'property' it calls 'ASCII', which is
defined in Section 1.2.1 as the property of being in the range U+0000 to
U+007F. This implicitly makes 'not ASCII' a derived property held by all
the other codepoints. If you fear that your audience will think that
Latin-1 characters are ASCII, you'll just have to go for the clumsy
'not 7-bit ASCII'  and accept that there isn't an unambiguous way in
English of turning that into an adjective or noun.

If a term were invented, you'd generally have to explain it, and you
would do better just to remind readers what ASCII is.

Richard.




RE: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean Leonard
Sent: Monday, September 21, 2015 1:22 AM

> Well what I am getting at is that when writing standards documents in various 
> SDOs (or any other
> computer science text, for that matter), it is helpful to identify these 
> characters/code points.

[snip]

> However, in contexts where ASCII is getting extended or supplemented (e.g., 
> in the DNS or in e-mail), 
> one needs to be really clear that the octets 0x80 - 0xFF are Unicode 
> (specifically UTF-8, I suppose), 
> and not something else.

Well, if you are writing standards that "extend ASCII", then you need to be 
completely clear that what is being discussed is _not ASCII_. In that sense, I 
agree with Tony Jollans comments: be clear about what it is that is being 
discussed — including what coded character set, or what encoding form for what 
coded character set.


> FWIW, the term "non-ASCII" is used in e-mail address internationalization 
> ("EAI") in the IETF; its 
> opposite is "all-ASCII" (or simply "ASCII"). (RFCs 6530, 6531, 6532). The 
> term also appears in RFC 
> 2047 from November 1996 but there it has the more expansive meaning (i.e., 
> not limited or 
> targeted to Unicode).

Glancing at the Introduction for RFC 6530, it seems to have clear terminology:

" Without the extensions specified in this document, the mailbox name is 
restricted to a subset of 7-bit ASCII [RFC5321].  Though MIME [RFC2045] enables 
the transport of non-ASCII data..."

Here, "ASCII" means ASCII — the 7-bit encoding originally defined as ANSI X3.4. 
And "non-ASCII data" appears to mean data involving any characters other than 
those in the ASCII coded character set, or any data represented in any other 
encoded representation but ASCII. The term "all-ASCII" is used in section 4.2, 
but it is immediately defined: 

"In this document, an address is "all-ASCII", or just an "ASCII address", if 
every character in the address is in the ASCII character repertoire [ASCII]; an 
address is "non-ASCII", or an "i18n-address", if any character is not in the 
ASCII character repertoire."

So, it seems like they had a similar terminology need to what you describe, and 
they handled it in a satisfactory, clear way.


If what you need to describe is UTF-8 sequences of two or more bytes, then I 
would be clear that the context is Unicode UTF-8, not ASCII or any other coded 
character set / encoding form; and I would say, "Unicode UTF-8 code unit 
sequences of two to four bytes" or "Unicode UTF-8 multi-byte sequences" or 
something along those lines.
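
A small Python sketch of the distinctions being drawn here (purely 
illustrative; it is not proposed wording for any specification):

    def is_ascii(s: str) -> bool:
        """True if every character is in the ASCII repertoire (U+0000..U+007F)."""
        return all(ord(c) <= 0x7F for c in s)

    def utf8_multibyte_chars(s: str):
        """Characters whose UTF-8 encoded form takes two to four code units (bytes)."""
        return [c for c in s if len(c.encode("utf-8")) > 1]

    print(is_ascii("resume"), is_ascii("résumé"))   # True False
    print(utf8_multibyte_chars("résumé"))           # ['é', 'é']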

If you think it's a serious problem that there isn't one conventional term for 
"characters outside the ASCII repertoire" or "UTF-8 multi-code-unit encoded 
representations" (since different authors could devise different terminology 
solutions), then I suggest you submit a document to UTC explaining why it's a 
problem, documenting inconsistent or unclear terminology that's been used in 
some standards / public specifications, and requesting that Unicode formally 
define terminology for these concepts. I can't guarantee that UTC will do it, 
but I can predict with confidence that it _won't_ do anything of that nature if 
nobody submits such a document.



Peter



RE: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Peter Constable
Check here:

http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS+4-1986%5bR2012%5d


-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean Leonard
Sent: Monday, September 21, 2015 1:52 PM
To: unicode@unicode.org
Subject: Re: Concise term for non-ASCII Unicode characters

Related question as I am researching this:

How can I acquire (cheaply or free) the latest and most official copy of 
US-ASCII, namely, the version that Unicode references?

The Unicode Standard 8.0 refers to the following document:

ANSI X3.4: American National Standards Institute. Coded character set—7-bit 
American national standard code for information interchange. New York: 1986. 
(ANSI X3.4-1986).

(See page 294.)

A quick Google search did not yield results. There are public/university 
library hard copies but they are hundreds of miles away from my location.

Sean




RE: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Peter Constable
Well, if the point is to refer to characters that would require two or more 
code units in UTF-8, then _accurate_ expressions would be, "Unicode characters 
beyond the Basic Latin block" or "Unicode characters above U+007F".


Peter 

-Original Message-
From: Steve Swales [mailto:st...@swales.us] 
Sent: Sunday, September 20, 2015 11:00 AM
To: Phillips, Addison 
Cc: Peter Constable ; Sean Leonard 
; unicode@unicode.org
Subject: Re: Concise term for non-ASCII Unicode characters

Exactly. I think the reason that non-ASCII feels non-concise is that there is 
widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is 
widely confused with Windows-1252).

-steve  




Sent from my iPhone


> On Sep 20, 2015, at 10:05 AM, Phillips, Addison  wrote:
> 
> I agree, although I note that sometimes the additional (redundant) 
> specificity of "non-7-bit-ASCII characters" is needed when talking to people 
> unclear on what "ASCII" means.
> 
> Addison
> 
>> -Original Message-----
>> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter 
>> Constable
>> Sent: Sunday, September 20, 2015 9:52 AM
>> To: Sean Leonard; unicode@unicode.org
>> Subject: RE: Concise term for non-ASCII Unicode characters
>> 
>> You already have been using "non-ASCII Unicode", which is about as 
>> concise and sufficiently accurate as you'll get. There's no term 
>> specifically defined in any standard or conventionally used for this.
>> 
>> 
>> Peter
>> 
>> -Original Message-
>> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean 
>> Leonard
>> Sent: Sunday, September 20, 2015 7:48 AM
>> To: unicode@unicode.org
>> Subject: Concise term for non-ASCII Unicode characters
>> 
>> What is the most concise term for characters or code points outside 
>> of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to 
>> these as "extended characters" or "non-ASCII Unicode" but I do not 
>> find those terms precise. We are talking about the code points U+0080 
>> - U+10FFFF. I suppose that this also refers to code points/scalar 
>> values that are not formally Unicode characters, such as U+FFFF. 
>> Basically, I am looking for a concise term for values that would 
>> require multiple UTF-8 octets if encoded in UTF-8 (without referring to 
>> UTF-8 encoding specifically).
>> "Non-ASCII" is not precise enough since character sets like Shift-JIS 
>> are non- ASCII.
>> 
>> Also a citation to a relevant standard (whether Unicode or otherwise) 
>> would be helpful.
>> 
>> The terms "supplementary character" and "supplementary code point" 
>> are defined in the Unicode standard, referring to characters or code 
>> points above U+FFFF. I am looking for something like those, but for 
>> characters or code points above U+007F.
>> 
>> Thank you,
>> 
>> Sean
> 
> 



RE: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Peter Constable
You already have been using "non-ASCII Unicode", which is about as concise and 
sufficiently accurate as you'll get. There's no term specifically defined in 
any standard or conventionally used for this.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean Leonard
Sent: Sunday, September 20, 2015 7:48 AM
To: unicode@unicode.org
Subject: Concise term for non-ASCII Unicode characters

What is the most concise term for characters or code points outside of the 
US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as 
"extended characters" or "non-ASCII Unicode" but I do not find those terms 
precise. We are talking about the code points U+0080 - U+10FFFF. I suppose that 
this also refers to code points/scalar values that are not formally Unicode 
characters, such as U+FFFF. Basically, I am looking for a concise term for 
values that would require multiple UTF-8 octets if encoded in UTF-8 (without 
referring to UTF-8 encoding specifically). 
"Non-ASCII" is not precise enough since character sets like Shift-JIS are 
non-ASCII.

Also a citation to a relevant standard (whether Unicode or otherwise) would be 
helpful.

The terms "supplementary character" and "supplementary code point" are defined 
in the Unicode standard, referring to characters or code points above U+FFFF. I 
am looking for something like those, but for characters or code points above 
U+007F.

Thank you,

Sean



RE: [somewhat off topic] straw poll

2015-09-11 Thread Peter Constable
I did not intend to create a disturbance. Nor did I intend to do anything that 
might possibly be perceived as seeking action from the list administrator. (I 
mention that since Sarasvati was invoked.) And I certainly was not intending in 
any way to bring up moratoria that may have been declared on past topics or to 
suggest moratoria on new topics. (I mention that since somehow a 
previously-declared moratorium was raised in a reply to my original post.)

I was merely seeking an indication of sentiment on the list regarding certain 
topics. This arose from an off-list discussion with one list member who has on 
occasion posted on certain topics and who indicated interest in seeing an 
indication of sentiment from the list.

But it seem like my approach may be stirring up trouble and hence was not 
well-conceived.

Hence, I apologize to the list and to any individuals I may have offended by 
this.


Peter



RE: VS: [somewhat off topic] straw poll

2015-09-11 Thread Peter Constable
UTC can act on documents submitted to it, or to input submitted to it via the 
contact form (http://www.unicode.org/reporting.html), but will not act in 
response solely to topics discussed in this list.

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Doug Ewell
Sent: Friday, September 11, 2015 11:11 AM
To: Mark Davis ☕️ 
Cc: Unicode Mailing List 
Subject: RE: VS: [somewhat off topic] straw poll

Mark Davis 🍗  wrote:

> I suggest that you create a proposal for the UTC so that it can go on 
> record; I suspect it will get a favorable reception.

I assume this was not meant for me personally. I have no authority to speak for 
UTC. The closest I ever got to that was when I got UTN #14 published.

I'm serious about this (unlike the beer color modifiers). This statement needs 
to come officially and formally from UTC, as William suggested, not from 
randoms like me.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸





RE: [somewhat off topic] straw poll

2015-09-10 Thread Peter Constable
Asmus, this came out of a friendly conversation meant to understand what kinds 
of topics do or don’t seem interesting to people, and how people might react. 
There was real interest in getting some indication of list sentiment. I 
certainly don’t mean to cause offense, or get too off topic. But I won’t push 
this if it’s felt to be that — I am certainly willing to follow the sentiments 
of list members on this and on whether any other topics are appropriate.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag 
(t)
Sent: Thursday, September 10, 2015 11:23 AM
To: unicode@unicode.org
Subject: Re: [somewhat off topic] straw poll

On 9/10/2015 11:04 AM, Peter Constable wrote:
I was having an offline discussion with someone regarding certain topics that 
may show up on this list on occasion, and the question came up of what evidence 
we might have of sentiment on the list. So, I thought I’d conduct a simple 
straw poll — respond if you feel inclined.

This whole exercise strikes me as off topic.  :)

A./


The questions are framed around this hypothetical scenario: Suppose I were to 
post a message to the list describing some experiment I did, creating a Web 
page containing (say) some Latin characters — not obscure, 
just-added-in-Unicode-8 characters, but ones that have been in the standard for 
some time; that my process for creating the file was to use (say) Notepad and 
entering HTML numeric character references; and that my findings were that it 
worked.

Q1: Would you find that to be an interesting post that makes your 
participation in the list more useful, or would you find it a noisy distraction 
that reduces the value you get from participating in the list?

Q2: If I were to send messages along that line on a regular basis, would that 
add value to your participation in the list, or reduce it?

Q3: If 50 people (still a small portion of the list membership) were to send 
messages along that line on a regular basis, would that add value to your 
participation in the list, or reduce it?



Peter




[somewhat off topic] straw poll

2015-09-10 Thread Peter Constable
I was having an offline discussion with someone regarding certain topics that 
may show up on this list on occasion, and the question came up of what evidence 
we might have of sentiment on the list. So, I thought I'd conduct a simple 
straw poll - respond if you feel inclined.

The questions are framed around this hypothetical scenario: Suppose I were to 
post a message to the list describing some experiment I did, creating a Web 
page containing (say) some Latin characters - not obscure, 
just-added-in-Unicode-8 characters, but ones that have been in the standard for 
some time; that my process for creating the file was to use (say) Notepad and 
entering HTML numeric character references; and that my findings were that it 
worked.

Q1: Would you find that to be an interesting post that makes your 
participation in the list more useful, or would you find it a noisy distraction 
that reduces the value you get from participating in the list?

Q2: If I were to send messages along that line on a regular basis, would that 
add value to your participation in the list, or reduce it?

Q3: If 50 people (still a small portion of the list membership) were to send 
messages along that line on a regular basis, would that add value to your 
participation in the list, or reduce it?



Peter



RE: Implementing SMP on a UTF-16 OS

2015-08-12 Thread Peter Constable
I’m no expert on driver development, but Max’s comments got me curious.

“Windows Driver Kit (WDK) 10 is integrated with Microsoft Visual Studio 2015…”
https://msdn.microsoft.com/en-us/library/windows/hardware/ff557573(v=vs.85).aspx


“In Visual Studio 2015, the C++ compiler and standard library have been updated 
with enhanced support for C++11 and initial support for certain C++14 features. 
They also include preliminary support for certain features expected to be in 
the C++17 standard.”
https://msdn.microsoft.com/en-us/library/hh409293.aspx



From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Max Truxa
Sent: Monday, August 10, 2015 11:27 PM
To: Marcel Schneider 
Cc: Unicode Mailing List 
Subject: Re: Implementing SMP on a UTF-16 OS


On Aug 10, 2015 10:53 PM, "Marcel Schneider" 
mailto:charupd...@orange.fr>> wrote:
>
> This is clearly a Unicode implementation problem. C and C++ should be 
> standardized for handling of UTF-16. IMO we cannot consider that Windows 
> supports UTF-16 for internal use if it does not support surrogate pairs 
> except with workarounds using ligatures.

C and C++ *are* "standardized for handling of UTF-16"... and UTF-8... and 
UTF-32.
If you are interested in this topic, just search for "C++ Unicode string 
literals" and "C++ Unicode character literals", which have been standardized 
since C11/C++11 (with the exception of UTF-8 character literals, which will 
follow in C++17; I don't know about C, though).
The reason you won't be able to use these features easily is that the compiler 
shipping with the WDK still supports only C89/C90. And sadly for us driver 
developers, Microsoft will not change this.
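
To make that concrete, here is a minimal illustrative sketch, assuming any 
C++11-capable compiler (not the C89/C90 WDK compiler discussed above): it uses 
the u"" and U"" string literal forms and decodes a UTF-16 surrogate pair by 
hand, which is what any UTF-16-based API has to do internally for SMP 
characters.

    #include <cstdio>
    #include <string>

    int main() {
        // C++11 Unicode string literals: u"" is UTF-16, U"" is UTF-32.
        std::u16string utf16 = u"\U0001F600";   // one SMP character -> a surrogate pair
        std::u32string utf32 = U"\U0001F600";   // same character -> one UTF-32 code unit

        std::printf("UTF-16 code units: %zu, UTF-32 code units: %zu\n",
                    utf16.size(), utf32.size());   // prints 2 and 1

        // Decode the surrogate pair by hand:
        char32_t hi = utf16[0];                 // 0xD83D, lead (high) surrogate
        char32_t lo = utf16[1];                 // 0xDE00, trail (low) surrogate
        char32_t cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        std::printf("Decoded code point: U+%04X\n", static_cast<unsigned>(cp));
        return 0;
    }

For U+1F600 the UTF-16 string holds two code units (0xD83D 0xDE00) while the 
UTF-32 string holds one, which is the distinction at the heart of this thread.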


RE: bang mail

2015-08-10 Thread Peter Constable
Possible exception: you've sent mail with a URL that points to something you 
learned was malicious and want to advise people not to click on that link.

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable
Sent: Monday, August 10, 2015 9:39 AM
To: unicode@unicode.org
Subject: bang mail

I don't think it's helpful or even polite to send bang (high priority) mail to 
this list.


Cheers,
Peter


RE: Standardised Encoding of Text

2015-08-10 Thread Peter Constable
Richard, you can always submit a document to UTC with proposed text to add to 
the Tai Tham block description in a future version.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard 
Wordingham
Sent: Sunday, August 9, 2015 11:39 AM
To: Unicode Public 
Subject: Re: Standardised Encoding of Text

On Sun, 9 Aug 2015 17:10:01 +0200
Mark Davis  wrote:

> While it would be good to document more scripts, and more language 
> options per script, that is always subject to getting experts signed 
> up to develop them.
> 
> What I'd really like to see instead of documentation is a data-based 
> approach.
> 
> For example, perhaps the addition of real data to CLDR for a 
> "basic-validity-check" on a language-by-language basis.

One aspect this would not help with is with letter forms that do not resemble 
their forms in the code charts.  The code charts usually broadly answer the 
question "What does this code represent?".  They don't answer the question, 
"What code points represent this glyph?".

Problems I've seen in Tai Tham are the use of U+1A57 TAI THAM CONSONANT SIGN LA 
TANG LAI for the sequence  and of  for
.  The problem is that the subscript 
forms for U+1A43 and U+1A3F are only documented in the proposals.  The 
subscript consonant signs probably add to the confusion of anyone working from 
the code chart.  The people making the errors were far from ignorant of the 
script.

Richard.



bang mail

2015-08-10 Thread Peter Constable
I don't think it's helpful or even polite to send bang (high priority) mail to 
this list.


Cheers,
Peter


RE: Emoji characters for food allergens

2015-08-03 Thread Peter Constable
Once back when I was living in Thailand, I was riding in a taxi to the Bangkok 
airport on a recently-opened highway. There were road signs posted at intervals 
that had a two-digit number (“60” or something like that) enclosed in a circle. 
Having had enough experience with road signage in my home country and also 
other countries, I recognized this to be a speed limit.

But knowing common practices for how many Thais at the time would obtain their 
driver’s license, and the education level of many Thais coming from rural areas 
to work as taxi drivers in Bangkok,  I was curious enough to ask the driver 
what the sign meant. (He being monolingual, this was all in Thai.) He thought 
for a moment and then responded that it was the distance to the airport.

Anecdote aside, the assumption of these discussions is that symbols are iconic 
— which means that the symbol communicates a conventional semantic. And the 
point of this being _conventional_ is that the semantic is not self-evident 
from the appearance of the image, but rather is based on a shared agreement. 
For example, a photograph of a chair is not iconic since it is an ostensive 
rendition of an actual chair. But a symbol of an iron with a dot inside it 
intended to mean “can be ironed with low heat” is iconic because its meaning 
is conventional and, like any convention, must be learned.

Some conventions may be universally learned, but very few are. Most are limited 
to particular cultures, and even if used in many cultures, may be learned by 
only small portions of the given culture. Even something like a speed limit 
sign that a driver within a given culture sees every day and is expected to 
understand is not necessarily something that the driver has learned. Much less 
something like icons for handling of laundry, which have been used in several 
countries for a few decades now but that nobody has ever been required to 
learn, and that few people actually do learn to any great extent.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag 
(t)
Sent: Monday, August 3, 2015 12:01 PM
To: unicode@unicode.org
Subject: Re: Emoji characters for food allergens


I'm sorry, but I really disagree with this hardly understandable criticism of 
laundry symbols. The most commonly encountered care tags are self-explanatory, 
such as the washing and ironing temperature limits or the discouraging marks. 
The other symbols mainly concern dry cleaning and laundry professionals.

The laundry symbols are like traffic signs. The ones you see daily aren't 
difficult to remember, but there are always some rare ones that are a bit 
baffling. What you apparently do not realize is that in significant parts of 
the world, these symbols are not common (or occur only as adjunct to text). 
There's therefore no daily reinforcement at all.

Where you live, the situation is reversed; no wonder you are baffled.



All chefs understand English,

I would regard that statement as having a very high probability of being 
wrong, which would make any conclusions based on it invalid.




RE: ISO 15924

2015-07-15 Thread Peter Constable
I don't see an explanation of the pale yellow or pale green shading. 

Also, re this:

"All changes are displayed in color and italics..."

Every row is a change record, yet not every row (in fact no row) is entirely 
coloured and in italics. If what is meant is "All changed values are displayed 
in color and italics...", then that is still not the case: there are lots of 
coloured cells that do not have italic text. 

To me, it's all rather unclear.


Peter

-Original Message-
From: Unicore [mailto:unicore-boun...@unicode.org] On Behalf Of Michael Everson
Sent: Sunday, July 12, 2015 4:20 AM
To: unicode Unicode Discussion; UnicoRe Mailing List
Subject: Re: ISO 15924

Yes, and this usage is explained on the page (as it has been since 2006).

> On 12 Jul 2015, at 07:09, Peter Constable  wrote:
> 
> Is there a significance to the colours in the table?
> 
> Peter

Michael Everson * http://www.evertype.com/




RE: ISO 15924

2015-07-11 Thread Peter Constable
Is there a significance to the colours in the table?


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Michael Everson
Sent: Thursday, July 9, 2015 2:07 PM
To: unicode Unicode Discussion; UnicoRe Mailing List
Subject: ISO 15924

Please see http://www.unicode.org/iso15924/codechanges.html for today’s updates.

Michael Everson
Registrar, ISO 15924





RE: Adding RAINBOW FLAG to Unicode

2015-07-07 Thread Peter Constable
I never said anything about stability of geopolitical entities. I only 
mentioned stability of encoded character sequences.

Peter

From: Ken Whistler [mailto:kenwhist...@att.net]
Sent: Friday, July 3, 2015 11:24 AM
To: Peter Constable
Cc: unicode@unicode.org
Subject: Re: Adding RAINBOW FLAG to Unicode


On 7/2/2015 5:56 PM, Peter Constable wrote:
Erkki, in this case, I think Philippe is making valid points.


-  For the proposal to be workable requires some means of ensuring 
stability of encoded representations. The way this would be done would be for 
CLDR to provide data with all valid sequences --- effectively becoming a 
registry.

I think that is wrong on a couple of grounds.

First, detailed stability of reference to actual defined geopolitical entities
or particular detailed flag designs is
not *required* for proposal to represent *pictographs* of flags by some
sequence of Unicode characters to be "workable". Sure, more stability
of reference is desirable. But the current RIS pair mechanism for representing
flag pictographs for countries is already "workable" -- it works and is widely 
deployed
and widely used -- without having guarantees that some particular country may
not decide tomorrow to change its official flag and hence result in some
particular pictographic display being obsolete in some sense, for example.

Second, the horse is already out of the barn regarding the particular
data that CLDR would be referring to. This works by reference to
the ISO 3166-2 scheme of subdivisions:

https://en.wikipedia.org/wiki/ISO_3166-2

and *that* becomes the registry required for stability of representations,
plus whatever grandfathering stability-of-code mechanism BCP 47
adds on top of that. We don't require a further detailed level of
registration, I think, to make this workable. If the New Zealand
Hawke's Bay Regional Council (NZ-HKB) decided it needed a district
flag (or decided to change one it may already have), I'm not going to be
overly concerned about the details there. As long as
 has a stable definition as
a Unicode extended flag tag sequence, it is up to somebody else to
decide if they want to actually map a Hawke's Bay flag pictograph in a font to
that sequence -- or update the flag pictograph they may have been
using.

Yeah, this could be a giant headache for any vendor that felt they
had to support *every possible* region/subdivision sequence
and keep the exact representations of flag pictographs stable. But
I predict this will very, very quickly result in people making a
"let's cover the 99% case" set of decisions, and then issues like
"Should we display a flag pictograph for the Hawke's Bay Regional
Council?" will be dealt with by the normal methods of triage for
feature requests.




-  The concepts being denoted are inherently political, often unstable, 
and sometimes highly sensitive.



Sensitive issues aside, a better approach would be to have a URN tagging scheme 
--- which IMO begs the question why this is a Unicode topic as it clearly 
crosses outside the limits of plain text.

A URN tagging scheme might make sense if what we were trying to
do was delegating all identity concerns to external authority,
and if we didn't care about efficiency of representation, either.

I don't think that is what this is about, as I tried to make clear yesterday.
I don't think we are encoding *flags* -- we are creating a mechanism
for the reliable representation of a set of *pictographs (emoji) for flags*.
And those pictographs for flags need an efficient representation that
can coexist comfortably with the rest of plain text -- the way the RIS
pairs already do.



Sensitive issues considered, though, it begs the question as to whether Unicode 
should be considering any of this at all, no matter what the scheme for encoded 
representation may be. Someone helpfully reminded us of this:


>> [...] the UTC does not wish to entertain further proposals for

>> encoding of symbol characters for flags, whether national, state,

>> regional, international, or otherwise. References to UTC Minutes:

>> [134-C2], January 28, 2013.

I believe that that statement (and the referenced decision) refer
specifically to the unwillingness of the UTC to entertain proposals
for encoding an indefinite number of pictographs for flags (of
whatever variety) *as symbol characters* -- that is,
one-by-one encodings as a single, gc=So code point in the standard.
Heading that direction is clearly not an efficient way to deal with
the concern, and would waste everybody's time in one-by-one
proposals and ad hoc decisions for each individual flag pictograph
to be added.

The UTC has a long history of putting a stake in the ground when it
encounters a character encoding problem which requires a *general*
solution, rather than a dribbling in of one-off decisions an item
at a time. And I think the tag p
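
As a concrete sketch of the two mechanisms discussed in this thread, RIS pairs 
for country flags and a base character followed by tag characters for 
subdivision flags, the illustrative C++ fragment below builds both kinds of 
sequences. The helper names are invented for illustration; the subdivision 
example uses the form that was eventually adopted for Scotland (U+1F3F4, tag 
characters for "gbsct", then CANCEL TAG), and nothing here should be read as 
the exact model that was under discussion at the time.

    #include <cstdio>
    #include <string>

    // Country flag pictograph: a pair of Regional Indicator Symbols (RIS),
    // U+1F1E6..U+1F1FF, mapped from the ISO 3166-1 alpha-2 code.
    std::u32string countryFlag(char a, char b) {
        return { static_cast<char32_t>(0x1F1E6 + (a - 'A')),
                 static_cast<char32_t>(0x1F1E6 + (b - 'A')) };
    }

    // Subdivision flag as a tag sequence: a base flag character, then tag
    // characters (U+E0020..U+E007E) spelling the subdivision code, then
    // U+E007F CANCEL TAG.
    std::u32string subdivisionFlag(const std::string& code) {
        std::u32string s;
        s.push_back(0x1F3F4);                        // WAVING BLACK FLAG as the base
        for (char c : code)
            s.push_back(0xE0000 + static_cast<unsigned char>(c));
        s.push_back(0xE007F);                        // CANCEL TAG terminates the sequence
        return s;
    }

    int main() {
        for (char32_t cp : countryFlag('N', 'Z'))    // U+1F1F3 U+1F1FF
            std::printf("U+%04X ", static_cast<unsigned>(cp));
        std::printf("\n");
        for (char32_t cp : subdivisionFlag("gbsct")) // U+1F3F4 ... U+E007F
            std::printf("U+%04X ", static_cast<unsigned>(cp));
        std::printf("\n");
        return 0;
    }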

RE: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of Flags)

2015-07-02 Thread Peter Constable
Erkki, in this case, I think Philippe is making valid points.


-  For the proposal to be workable requires some means of ensuring 
stability of encoded representations. The way this would be done would be for 
CLDR to provide data with all valid sequences --- effectively becoming a 
registry.

-  The concepts being denoted are inherently political, often unstable, 
and sometimes highly sensitive.

Sensitive issues aside, a better approach would be to have a URN tagging scheme 
--- which IMO begs the question why this is a Unicode topic as it clearly 
crosses outside the limits of plain text.

Sensitive issues considered, though, it begs the question as to whether Unicode 
should be considering any of this at all, no matter what the scheme for encoded 
representation may be. Someone helpfully reminded us of this:


>> [...] the UTC does not wish to entertain further proposals for

>> encoding of symbol characters for flags, whether national, state,

>> regional, international, or otherwise. References to UTC Minutes:

>> [134-C2], January 28, 2013.



Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Erkki I 
Kolehmainen
Sent: Thursday, July 2, 2015 5:42 PM
To: verd...@wanadoo.fr; 'Mark Davis ☕️'
Cc: 'Doug Ewell'; 'Unicode Mailing List'
Subject: VS: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types 
of Flags)

I cannot but agree with Mark! Thus, please…

Sincerely, Erkki

Lähettäjä: Unicode [mailto:unicode-boun...@unicode.org] Puolesta Philippe Verdy
Lähetetty: 2. heinäkuuta 2015 12:02
Vastaanottaja: Mark Davis ☕️
Kopio: Doug Ewell; Unicode Mailing List
Aihe: Re: Adding RAINBOW FLAG to Unicode (Fwd: Representing Additional Types of 
Flags)

The political subject is immediately related to the designation of flags and 
their association to ISO 3166-1 and -2 encoded entities. Even if you don't like 
it, this is very political, and for a standard seeking stability, I wonder 
how any flag (directly bound to specific political entities at specific dates 
and within some boundaries which may be contested) can be related to ISO 3166 
and its instability (and the fact that ISO 3166 entities have in fact also no 
defined borders, so that ISO 3166-2 is just a political point of view from the 
current ruler of the current ISO 3166-1 entity).

All this topic is political. In fact the real flags are not even encoded with 
RIS, not even for current nations (and there's still a problem to know what is 
a recognized nation, even when just considering the UN definition. Political 
entities are defined but with fuzzy borders, they just represent in fact some 
local governments, not necessarily their lands, people, or cultures, and in 
some cases they are in exil or not even ruling: their seat in the UN is vacant 
and they exist only on the paper, but even UN members disagree about which 
treaty they recognize).

Consider the case of Western Sahara (which no longer exists except on paper 
as a dependency of Spain, which has abandoned it completely), with two 
governments competing to control the territory (Morocco controlling most of it, 
another part claimed by Mauritania and then abandoned, another part left without 
infrastructure, and many refugees left de facto in Mauritania or Algeria). 
Neither of the two authorities designates that territory as "Western Sahara". So it 
no longer exists (and will likely never exist again).

The frozen status of Antarctica has not created any new country or territory, 
even if there's a sort of joint administration: that administration does not 
suppress the existing claims (and new claims that have been made since its 
creation). So this area has no well-defined flag, and various flags are used 
informally, plus national flags for each claim and sometimes specific regional 
flags created ad hoc. The use of RIS for ISO 3166-1 and its limited extension 
for ISO 3166-2 (slightly modified) does not resolve the problem.

In reality there's still no standard way to encode flags unambiguously and in a 
stable way. We'd like to have FOTW (Flags of the World) contributors propose 
their own scheme. But it will not be compatible with the current RIS solution 
or the proposed extension. If such a standard ever emerges, it will require 
encoding a new set of characters.

An alternative would be to embed a URN (not re-encoded) between some pairs of 
controls (to embed an object by reference) and use that sequence after a white 
flag symbol with a joiner.

The URN scheme would be the best long-term solution (and preferable to URLs 
bound to specific servers), but we could in fact use a generic URI 
encapsulation (supporting URNs and URLs).

It could then be used for representing various kinds of entities, and then link 
them to specific forms: flags, banners, flying flag, flag over a person's face, 
mini location maps, "flag maps"... Programs not recognizing the encoded 
entities would have a very simple way to scan over the encapsulated URI 
representing

RE: WORD JOINER vs ZWNBSP

2015-06-27 Thread Peter Constable
Marcel: Can you please clarify in what way Windows 7 is not supporting U+2060?


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Petr Tomasek
Sent: Friday, June 26, 2015 4:48 PM
To: Marcel Schneider
Cc: Unicode Mailing List
Subject: Re: WORD JOINER vs ZWNBSP

On Fri, Jun 26, 2015 at 12:48:39PM +0200, Marcel Schneider wrote:
> 
> However, despite the word joiner having been encoded and recommended since 
> version 3.2 of the Standard, it is still not implemented on Windows 7. 
> Therefore I must use the traditional zero width no-break space U+FEFF 
> instead. 

Therefore you should complain to Microsoft, not here.

> Supposing that Microsoft choose not to implement U+2060 WJ

Then you should probably choose another operating system which does...

Petr Tomasek
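
For anyone checking this themselves, a minimal illustrative sketch (assuming a 
C++11 compiler): both U+2060 WORD JOINER and U+FEFF ZERO WIDTH NO-BREAK SPACE 
fit in a single UTF-16 code unit, and U+FEFF at the start of a stream is 
normally taken as a byte order mark rather than as text. Whether a given 
application actually honours the word joiner when rendering or line breaking is 
a separate question that this fragment does not test.

    #include <cstdio>

    int main() {
        const char16_t wj     = u'\u2060';  // WORD JOINER, recommended since Unicode 3.2
        const char16_t zwnbsp = u'\uFEFF';  // ZERO WIDTH NO-BREAK SPACE (legacy alternative)

        // A word with a WORD JOINER embedded between "no" and "break":
        const char16_t text[] = u"no\u2060break";

        std::printf("WJ = U+%04X, ZWNBSP = U+%04X, text = %zu code units\n",
                    static_cast<unsigned>(wj), static_cast<unsigned>(zwnbsp),
                    sizeof(text) / sizeof(text[0]) - 1);
        return 0;
    }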




moratorium on repeated discussion of rejected topics

2015-06-24 Thread Peter Constable
Dear Sarasvati:

There is a new thread on the topic of using characters to give abstract 
representation of semantic propositions that can be rendered as sentences in 
various languages - so called "localizable sentences". This idea has been 
brought up repeatedly over several years now and has gained no traction as 
having potential for a Unicode encoding proposal. To having this topic 
continually re-opened is tiresome; it's a form of spam on this list, degrading 
the experience for all who come to the list to discuss reasonable proposals or 
to get help with real usage scenarios. I wonder if you might want to consider 
putting a moratorium on further discussion of this topic.



Peter





RE: Accessing the WG2 document register

2015-06-16 Thread Peter Constable
There are changes in processes, but nothing that I would consider new 
_discrimination_. Also, Mr. Pandey’s positions have always been, and continue 
to be, very well represented in ISO/IEC JTC1/SC2/WG2.

Again, if you are not yourself engaging in ISO processes or working with your 
country’s national standards body in connection to ISO processes, then you are 
not in a good position to be critiquing ISO processes.


Peter

From: Marcel Schneider [mailto:charupd...@orange.fr]
Sent: Tuesday, June 16, 2015 10:05 AM
To: Peter Constable
Cc: Unicode Mailing List
Subject: RE: Accessing the WG2 document register
Importance: High


On Mon, Jun 15, 2015, Peter Constable 
mailto:peter...@microsoft.com>> wrote:

> I suggest that people on this list that have not personally engaged directly 
> in ISO process via their country’s
> designated standards bodies should stop opining and editorializing on that 
> body.
>
> ISO isn’t perfect by any means, but in the many years I have been directly 
> involved in ISO process
> I can’t say I’ve ever seen discrimination other than appropriate 
> discrimination of ideas on technical merits.

Please consider that Mr Pandey reported a *new* rule change and *new* 
discrimination you canʼt have experienced in the past.

If you have carefully read the emails in this thread, you learned that this new 
discrimination is anything but the “appropriate discrimination of ideas on 
technical merits” which you refer to. You will be all the more indignant, and 
all the more welcoming of everybody who does the same.

Having the honor of discussing here, I take the matters (that I know about) very 
seriously, and I have known for a long time that, unfortunately, persons who are 
bound to such bodies by contract tend not to point out malfunctioning, so other 
people must help to point it out and find ways to correct or improve. Even if 
scarcely expecting any thanks, I underscore that unfortunately I canʼt afford 
to do this every day because it takes time; normally I must think things over, 
let them mature, and consolidate.

It would be nice if you too, Mr Constable, thanks to your inside experience and 
relationships from your ISO activity, would help Mr Pandey to get heard at ISO 
Working Group 2 and to access the document register. As everybody knows, every 
person who comes up with proposals deserves full attention, respect and 
consideration, especially when the person has already done great and 
meritorious work. ISO managers who persistently prevent working groups from 
acting ethically deserve to be removed from the responsibilities they do not 
fulfill.

Everybody on the Unicode Mailing List is well placed to know that Unicode 
publicly reports about its activities and accepts public feedback. Quality 
assurance seems little reason for ISO not to accept input from outside national 
Standards Bodies. What do you know about the reasons ISO does not, and why it 
even recently narrowed its eligibility conditions?

Best wishes,
Marcel Schneider


RE: Accessing the WG2 document register

2015-06-15 Thread Peter Constable
I suggest that people on this list that have not personally engaged directly in 
ISO process via their country’s designated standards bodies should stop opining 
and editorializing on that body.

ISO isn’t perfect by any means, but in the many years I have been directly 
involved in ISO process I can’t say I’ve ever seen discrimination other than 
appropriate discrimination of ideas on technical merits.


Peter


From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Marcel Schneider
Sent: Monday, June 15, 2015 1:30 AM
To: wjgo_10...@btinternet.com; pan...@umich.edu; unicode@unicode.org; 
babelst...@gmail.com
Subject: Re: Accessing the WG2 document register


 On Mon, Jun 15, 2015, William_J_G Overington 
mailto:wjgo_10...@btinternet.com>> wrote:

 > I have been thinking about the current discussion in the Unicode mailing 
 > list about a particular ISO committee no longer being allowed to accept 
 > proposal documents from individuals, because of a rule change from a higher 
 > level within ISO.
>
> I am thinking of how the committee meetings might be different from how they 
> would be if the rules had not been changed and what might not get encoded 
> that might have been encoded had the rule change not happened.
>
> In the short term, the individual contributor is hurt, yet in the long term 
> the document encoding process is hurt and the whole world of information 
> technology may be hurt as potentially good content has been ignored due to 
> discrimination, and a standards document produced that is not as good as it 
> could have been had there not been the discrimination.

> ...
> I opine that it is important when deciding what will be considered for 
> encoding that there is no discrimination about considering encoding 
> proposals. Not only does ignoring contributions cause immediate problems but 
> also there can be second order effects and so on as potential later 
> contributions will not be made as they will not have the original 
> contribution to build upon, and many people may not even realize that the 
> second order effects have taken place.
>

I'm shocked that there is still any discrimination, even against individuals, 
in ISO, and worse, that such discrimination has been newly introduced.



This reminds me of the idea I got about ISO when I considered the ISO/IEC 
9995 standard. This standard specifies that on all keyboards, there should be a 
so-called common secondary group, and that this secondary group should contain 
all the characters that are on the keyboard but aren't for a so-called strictly 
national use.  This sounds to me as if it were fascistic or neofascistic. The 
way this secondary group is accessed seems rather complicated and to have been 
engineered in disconnect from actual OSs and keyboard drivers. The result was 
that when it went on to be implemented on Windows, the secondary group was not 
accessed like specified but as Kana levels, which is very consistent with a 
real keyboard. But in the meantime, this ISO/IEC 9995 standard wastes a whole 
shift state by simply excluding it from use, on the pretext that you need to 
press more than two keys: Shift + AltGr + another key. This restriction to a 
maximum of two simultaneously pressed keys was so fanciful that Microsoft didn't 
bother with it. Really, to enter a character from the second level of the 
secondary group, you need to press Shift + Kana + another key.  That's all OK, 
but the ISO/IEC 9995 standard is *not*.



I won't repeat what I already wrote on this List. Sincerely, I thought that the 
International Organization for Standardization is today a real international 
organization which cares for all nations on the earth, whether the proposals 
come from individuals or collectivities. I dimly recall that in the nineties, 
ISO was even likely to refuse demands made by its own national members. Reports 
and results showed that it did not even consult anybody from the nations whose 
characters it was encoding, except a few people who were not always reliable, 
as ISO 8859-1 showed.



To read such things today makes me furious again. I personally wish that you, 
Mr Pandey, Mr West and Mr Overington, be fully heard at ISO and that *all* 
proposals are treated equally, fully, and successfully.

What are we going to do? What are you going to do? I repeat, I'm shocked, and I 
hate ISO again.





Best regards,

Marcel Schneider


> Message du 15/06/15 09:53
> De : "William_J_G Overington" 
> mailto:wjgo_10...@btinternet.com>>
> A : pan...@umich.edu, 
> unicode@unicode.org, 
> babelst...@gmail.com
> Copie à :
> Objet : Re: Accessing the WG2 document register
>
> I have been thinking about the current discussion in the Unicode mailing list 
> about a particular ISO committee no longer being allowed to accept proposal 
> documents from individuals, because of a rule change from a higher level 
> within ISO.
>
> I am thinking of how the

RE: Another take on the English apostrophe in Unicode

2015-06-13 Thread Peter Constable
I should qualify my statement. The Zwicky and Pullum article was a nice piece 
of linguistic analysis regarding the morphological characteristics of “n’t”. 
Their remark about apostrophe, however, was not so much about orthography — 
which was not the focus of their article — but was rather a way of putting an 
exclamation on their findings.

When it comes to orthography, the notion of what comprises the words of a 
language is generally pure convention. That’s because there isn’t any single 
_linguistic_ definition of word that gives the same answer when phonological 
vs. morphological or syntactic criteria are applied. There are book-length 
works on just this topic, such as this:

Di Sciullo, Anna Maria, and Edwin Williams. 1987. On the definition of word. 
(Linguistic Inquiry monograph fourteen.) Cambridge, Massachusetts, USA: The MIT 
Press.


Peter

From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy
Sent: Saturday, June 13, 2015 12:03 AM
To: Peter Constable
Cc: Kalvesmaki, Joel; Unicode Mailing List
Subject: Re: Another take on the English apostrophe in Unicode

I disagree: U+02BC already qualifies as a letter (even if it is not specific to 
the Latin script and is not dual-cased). It is perfectly integrable in 
language-specific alphabets and we don't need another character to encode it 
once again as a letter.

So the only question is about choosing between:
- on one side, U+02BC (the existing apostrophe letter), and other possible 
candidate letters for alternate forms (including U+02C8 for the vertical form, 
and the common fallback letter U+00B4 present in many legacy fonts for systems 
built before the UCS was standardized and using legacy 8-bit charsets such as 
ISO 8859-1).
- and on the other side, U+2019 where it is encoded as a quotation punctuation 
mark (like also the legacy ASCII single quote)

Note that U+00B4 (from ISO 8859-1) has also been used in association with 
U+0060 (from ASCII) to replace the more ambiguous ASCII quote U+0027 by 
assigning an orientation: the exact shape of these two is variable, between a 
thin rectangle, a wedge, or a curly comma (shaped like the 6 and 9 digits), as 
well as the exact angle when it is a wedge or thin rectangle (these characters, 
however, have long been used in overstriking mode to add accents over Latin 
capital letters, so the curly comma shapes are very uncommon and they are more 
horizontal than vertical), and U+00B4 would be a very poor candidate for the 
apostrophe, which should have a narrow advance width.

So there remain in practice U+02BC and U+02C8 for this apostrophe letter 
(which one you'll use is a matter of preference, but U+02C8 will not be used if 
there are two distinct apostrophes in the language, e.g. in Polynesian 
languages where the distinction was made even clearer by using right or 
left rings U+02BE/U+02BF, or glottal letters U+02C0/U+02C1 if that letter has a 
very distinctive phonetic realisation as a plain consonant with two variants 
like in Arabic, or even U+02B0 when this is just a breath without a stop: the 
full range U+02B0-U+02C1 offers more than enough variation for this letter if 
you need slight phonetic distinctions).

2015-06-13 8:28 GMT+02:00 Peter Constable 
mailto:peter...@microsoft.com>>:
Nice article, as I recall. (Been a long time.)


Peter

-Original Message-
From: Unicode 
[mailto:unicode-boun...@unicode.org<mailto:unicode-boun...@unicode.org>] On 
Behalf Of Kalvesmaki, Joel
Sent: Friday, June 5, 2015 7:27 AM
To: Unicode Mailing List
Subject: Re: Another take on the English apostrophe in Unicode

I don't have a particular position staked out. But to this discussion should be 
added the very interesting work done by Zwicky and Pullum arguing that the 
apostrophe is the 27th letter of the Latin alphabet. Neither U+2019 nor U+02BC 
would satisfy that position. See:

Zwicky and Pullum 1983: Zwicky, Arnold M., and Geoffrey K. Pullum. 
"Cliticization vs. Inflection: English N'T." Language 59, no. 3 (1983): 502-513.

It's nicely summarized and discussed here:
http://chronicle.com/blogs/linguafranca/2013/03/22/being-an-apostrophe/

jk
--
Joel Kalvesmaki
Editor in Byzantine Studies
Dumbarton Oaks
202 339 6435




RE: Another take on the English apostrophe in Unicode

2015-06-12 Thread Peter Constable
Nice article, as I recall. (Been a long time.)


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Kalvesmaki, Joel
Sent: Friday, June 5, 2015 7:27 AM
To: Unicode Mailing List
Subject: Re: Another take on the English apostrophe in Unicode

I don't have a particular position staked out. But to this discussion should be 
added the very interesting work done by Zwicky and Pullum arguing that the 
apostrophe is the 27th letter of the Latin alphabet. Neither U+2019 nor U+02BC 
would satisfy that position. See:

Zwicky and Pullum 1983: Zwicky, Arnold M., and Geoffrey K. Pullum. 
"Cliticization vs. Inflection: English N'T." Language 59, no. 3 (1983): 502-513.

It's nicely summarized and discussed here:
http://chronicle.com/blogs/linguafranca/2013/03/22/being-an-apostrophe/

jk
--
Joel Kalvesmaki
Editor in Byzantine Studies
Dumbarton Oaks
202 339 6435




RE: ISO committees

2015-06-12 Thread Peter Constable
William (who, IIRC, lives in the UK) would need to start by engaging with BSI. 

People can't engage directly as individuals with TC 37 or any other ISO 
committee. ISO membership is not composed of individuals, but of countries, and 
representation is from each country's authorized standards organizations.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Doug Ewell
Sent: Friday, June 12, 2015 9:19 AM
To: Unicode Mailing List
Subject: Re: ISO committees

William_J_G Overington 
wrote:

> Regarding my idea that localizable sentence technology could be 
> implemented in Unicode by reference to detailed codes in an ISO 
> document (not yet written), which would be the best ISO committee to 
> become in charge of producing that document please?

Sounds like something TC 37 might enjoy:

http://www.iso.org/iso/iso_technical_committee.html%3Fcommid%3D48104

https://en.wikipedia.org/wiki/ISO/TC_37

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸





RE: Tag characters

2015-05-27 Thread Peter Constable
Well, the same reasoning could also argue for the contra-positive (a→b ⊨ 
¬b→¬a): that UTC should not consider endorsing such a tag scheme.

Peter

From: William_J_G Overington [mailto:wjgo_10...@btinternet.com]
Sent: Wednesday, May 27, 2015 12:54 AM
To: unicode@unicode.org; Peter Constable; eric.mul...@efele.net; 
asmus-...@ix.netcom.com
Subject: Re: Tag characters

Peter Constable wrote as follows:

> Would Unicode really want to get into the business of running a UFL service?

Well, Unicode is about precision, interoperability and long-term stability, 
and, given, in relation to one particular specified base character followed by 
some tag characters, that a particular sequence of Unicode characters is 
intended to lead to the display of an image representing a particular flag, it 
seems to me highly reasonable that the Unicode Technical Committee might 
seriously consider providing that facility.

William Overington

27 May 2015




RE: Tag characters

2015-05-21 Thread Peter Constable
Would Unicode really want to get into the business of running a UFL service?


P

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag 
(t)
Sent: Wednesday, May 20, 2015 10:15 PM
To: Eric Muller; unicode@unicode.org
Subject: Re: Tag characters

On 5/20/2015 9:57 PM, Eric Muller wrote:
On 5/20/2015 7:11 PM, Doug Ewell wrote:

In any event, URLs that point to images would be an awful basis for an encoding.

I would make an exception for the URL 
http://unicode.org/Public/8.0.0/ucd/StandardizedFlags.html.

Eric.


Currently that gives me
Not Found

The requested URL /Public/8.0.0/ucd/StandardizedFlags.html was not found on 
this server.

:)

However, I agree, all we need to do is create a UFL (Universal Flag Locator) 
and we can keep it as stable as we want.

A./


RE: Tag characters

2015-05-19 Thread Peter Constable
Evidently there were more than two types of people. There are those who feel 50 
years is long enough; there are others who feel that five years is long enough; 
there are likely others that feel 75 or 30 or some other values are long 
enough. Then there are also those who feel that any finite length is probably 
not long enough.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Doug Ewell
Sent: Tuesday, May 19, 2015 10:01 AM
To: Unicode Mailing List
Cc: William_J_G Overington
Subject: Re: Tag characters

William_J_G Overington 
wrote:

>> Hopefully the MA will adhere to the new 50-year limit.
>
> What is MA please?

Maintenance Agency:
http://www.iso.org/iso/home/standards/country_codes.htm

> A 50-year limit seems far too short a time.

There are two types of people: those who feel 50 years is too short, and those 
who feel it is too long.

Fifty years is much better than five, which was the previous limit.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸




RE: Future of Emoji? (was Re: Tag characters)

2015-05-15 Thread Peter Constable
Ah, yes. And Messenger “winks”. E.g.,

http://www.msn-tools.net/free-msn-winks-1.htm

I note that this has .swf files, and that’s what we saw one of the Japanese 
carriers saying they’d be moving to instead of PUA characters.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable
Sent: Friday, May 15, 2015 8:47 AM
To: Shervin Afshar
Cc: unicode@unicode.org
Subject: RE: Future of Emoji? (was Re: Tag characters)

MSN Messenger supported extensible stickers years ago. A couple of sites still 
offering add-ons:

http://www.getsmile.com/
http://www.smileys4msn.com/


Peter

From: Shervin Afshar [mailto:shervinafs...@gmail.com]
Sent: Thursday, May 14, 2015 10:40 PM
To: Peter Constable
Cc: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Re: Future of Emoji? (was Re: Tag characters)

Good point. I missed these while looking into compatibility symbols. Of course, 
as with Yahoo[1] and MSN[2] Messenger emoji sets, most of these are mappable to 
current or proposed sets of Unicode emoji (e.g. Lips Sealed ≈ U+1F910 
ZIPPER-MOUTH FACE). It would be interesting to see how the extended support for 
flags, most of smiley faces, objects, etc. on all platforms would affect this 
approach.

My idea of a sticker-based solution is something more like Facebook's[3] or 
Line's[4] implementations.

[1]: http://www.unicode.org/L2/L2015/15059-emoji-im-yahoo.pdf
[2]: http://www.unicode.org/L2/L2015/15058-emoji-im-msn.pdf
[3]: 
http://www.huffingtonpost.com/2014/10/14/facebook-stickers-comments_n_5982546.html
[4]: https://creator.line.me/en/guideline/


↪ Shervin

On Thu, May 14, 2015 at 9:37 PM, Peter Constable 
mailto:peter...@microsoft.com>> wrote:
Skype uses stickers, including animated stickers. Here’s the documented set:

https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons

And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”.



Peter


From: Shervin Afshar 
[mailto:shervinafs...@gmail.com<mailto:shervinafs...@gmail.com>]
Sent: Thursday, May 14, 2015 8:12 PM
To: Peter Constable
Cc: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Future of Emoji? (was Re: Tag characters)

Peter,

This very topic was discussed in last meeting of the subcommittee and my 
impression is that there are plans to promote the use of embedded graphics (aka 
stickers) either through expansions to section 8 of TR51 or through some other 
means. It should also be noted that none of current members of Unicode seem to 
have a sticker-based implementation (with the exception of an experimental 
limited trial by Twitter[1]).

[1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/


↪ Shervin

On Thu, May 14, 2015 at 7:44 PM, Peter Constable 
mailto:peter...@microsoft.com>> wrote:
And yet UTC devotes lots of effort (with an entire subcommittee) to encode more 
emoji as characters, but no effort toward any preferred longer term solution 
not based on characters.


Peter

From: Unicode 
[mailto:unicode-boun...@unicode.org<mailto:unicode-boun...@unicode.org>] On 
Behalf Of Shervin Afshar
Sent: Thursday, May 14, 2015 2:27 PM
To: wjgo_10...@btinternet.com<mailto:wjgo_10...@btinternet.com>
Cc: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Re: Tag characters

Thinking about this further, could the technique be used to solve the 
requirements of
section 8 Longer Term Solutions

IMO, the industry preferred longer term solution (which is also discussed in 
that section with few existing examples) for emoji, is not going to be based on 
characters.


↪ Shervin

On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington 
mailto:wjgo_10...@btinternet.com>> wrote:
> What else would be possible if the same sort of technique were applied to 
> another base character?


Thinking about this further, could the technique be used to solve the 
requirements of

section 8 Longer Term Solutions

of

http://www.unicode.org/reports/tr51/tr51-2.html

?


Both colour pixel map and colour OpenType vector font solutions would be 
possible.


Colour voxel map and colour vector 3d solids solutions are worth thinking about 
too as fun coding thought experiments that could possibly lead to useful 
practical results.



William Overington


14 May 2015





RE: Future of Emoji? (was Re: Tag characters)

2015-05-15 Thread Peter Constable
MSN Messenger supported extensible stickers years ago. A couple of sites still 
offering add-ons:

http://www.getsmile.com/
http://www.smileys4msn.com/


Peter

From: Shervin Afshar [mailto:shervinafs...@gmail.com]
Sent: Thursday, May 14, 2015 10:40 PM
To: Peter Constable
Cc: unicode@unicode.org
Subject: Re: Future of Emoji? (was Re: Tag characters)

Good point. I missed these while looking into compatibility symbols. Of course, 
as with Yahoo[1] and MSN[2] Messenger emoji sets, most of these are mappable to 
current or proposed sets of Unicode emoji (e.g. Lips Sealed ≈ U+1F910 
ZIPPER-MOUTH FACE). It would be interesting to see how the extended support for 
flags, most of smiley faces, objects, etc. on all platforms would affect this 
approach.

My idea of a sticker-based solution is something more like Facebook's[3] or 
Line's[4] implementations.

[1]: http://www.unicode.org/L2/L2015/15059-emoji-im-yahoo.pdf
[2]: http://www.unicode.org/L2/L2015/15058-emoji-im-msn.pdf
[3]: 
http://www.huffingtonpost.com/2014/10/14/facebook-stickers-comments_n_5982546.html
[4]: https://creator.line.me/en/guideline/


↪ Shervin

On Thu, May 14, 2015 at 9:37 PM, Peter Constable 
mailto:peter...@microsoft.com>> wrote:
Skype uses stickers, including animated stickers. Here’s the documented set:

https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons

And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”.



Peter


From: Shervin Afshar 
[mailto:shervinafs...@gmail.com<mailto:shervinafs...@gmail.com>]
Sent: Thursday, May 14, 2015 8:12 PM
To: Peter Constable
Cc: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Future of Emoji? (was Re: Tag characters)

Peter,

This very topic was discussed in last meeting of the subcommittee and my 
impression is that there are plans to promote the use of embedded graphics (aka 
stickers) either through expansions to section 8 of TR51 or through some other 
means. It should also be noted that none of current members of Unicode seem to 
have a sticker-based implementation (with the exception of an experimental 
limited trial by Twitter[1]).

[1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/


↪ Shervin

On Thu, May 14, 2015 at 7:44 PM, Peter Constable 
mailto:peter...@microsoft.com>> wrote:
And yet UTC devotes lots of effort (with an entire subcommittee) to encode more 
emoji as characters, but no effort toward any preferred longer term solution 
not based on characters.


Peter

From: Unicode 
[mailto:unicode-boun...@unicode.org<mailto:unicode-boun...@unicode.org>] On 
Behalf Of Shervin Afshar
Sent: Thursday, May 14, 2015 2:27 PM
To: wjgo_10...@btinternet.com<mailto:wjgo_10...@btinternet.com>
Cc: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Re: Tag characters

Thinking about this further, could the technique be used to solve the 
requirements of
section 8 Longer Term Solutions

IMO, the industry preferred longer term solution (which is also discussed in 
that section with few existing examples) for emoji, is not going to be based on 
characters.


↪ Shervin

On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington 
mailto:wjgo_10...@btinternet.com>> wrote:
> What else would be possible if the same sort of technique were applied to 
> another base character?


Thinking about this further, could the technique be used to solve the 
requirements of

section 8 Longer Term Solutions

of

http://www.unicode.org/reports/tr51/tr51-2.html

?


Both colour pixel map and colour OpenType vector font solutions would be 
possible.


Colour voxel map and colour vector 3d solids solutions are worth thinking about 
too as fun coding thought experiments that could possibly lead to useful 
practical results.



William Overington


14 May 2015





RE: Future of Emoji? (was Re: Tag characters)

2015-05-14 Thread Peter Constable
Skype uses stickers, including animated stickers. Here’s the documented set:

https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons

And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”.



Peter


From: Shervin Afshar [mailto:shervinafs...@gmail.com]
Sent: Thursday, May 14, 2015 8:12 PM
To: Peter Constable
Cc: unicode@unicode.org
Subject: Future of Emoji? (was Re: Tag characters)

Peter,

This very topic was discussed in last meeting of the subcommittee and my 
impression is that there are plans to promote the use of embedded graphics (aka 
stickers) either through expansions to section 8 of TR51 or through some other 
means. It should also be noted that none of current members of Unicode seem to 
have a sticker-based implementation (with the exception of an experimental 
limited trial by Twitter[1]).

[1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/


↪ Shervin

On Thu, May 14, 2015 at 7:44 PM, Peter Constable 
mailto:peter...@microsoft.com>> wrote:
And yet UTC devotes lots of effort (with an entire subcommittee) to encode more 
emoji as characters, but no effort toward any preferred longer term solution 
not based on characters.


Peter

From: Unicode 
[mailto:unicode-boun...@unicode.org<mailto:unicode-boun...@unicode.org>] On 
Behalf Of Shervin Afshar
Sent: Thursday, May 14, 2015 2:27 PM
To: wjgo_10...@btinternet.com<mailto:wjgo_10...@btinternet.com>
Cc: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Re: Tag characters

Thinking about this further, could the technique be used to solve the 
requirements of
section 8 Longer Term Solutions

IMO, the industry preferred longer term solution (which is also discussed in 
that section with few existing examples) for emoji, is not going to be based on 
characters.


↪ Shervin

On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington 
mailto:wjgo_10...@btinternet.com>> wrote:
> What else would be possible if the same sort of technique were applied to 
> another base character?


Thinking about this further, could the technique be used to solve the 
requirements of

section 8 Longer Term Solutions

of

http://www.unicode.org/reports/tr51/tr51-2.html

?


Both colour pixel map and colour OpenType vector font solutions would be 
possible.


Colour voxel map and colour vector 3d solids solutions are worth thinking about 
too as fun coding thought experiments that could possibly lead to useful 
practical results.



William Overington


14 May 2015




RE: Tag characters

2015-05-14 Thread Peter Constable
And yet UTC devotes lots of effort (with an entire subcommittee) to encode more 
emoji as characters, but no effort toward any preferred longer term solution 
not based on characters.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Shervin Afshar
Sent: Thursday, May 14, 2015 2:27 PM
To: wjgo_10...@btinternet.com
Cc: unicode@unicode.org
Subject: Re: Tag characters

Thinking about this further, could the technique be used to solve the 
requirements of
section 8 Longer Term Solutions

IMO, the industry preferred longer term solution (which is also discussed in 
that section with few existing examples) for emoji, is not going to be based on 
characters.


↪ Shervin

On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington 
mailto:wjgo_10...@btinternet.com>> wrote:
> What else would be possible if the same sort of technique were applied to 
> another base character?


Thinking about this further, could the technique be used to solve the 
requirements of

section 8 Longer Term Solutions

of

http://www.unicode.org/reports/tr51/tr51-2.html

?


Both colour pixel map and colour OpenType vector font solutions would be 
possible.


Colour voxel map and colour vector 3d solids solutions are worth thinking about 
too as fun coding thought experiments that could possibly lead to useful 
practical results.



William Overington


14 May 2015



RE: Script / font support in Windows 10

2015-05-11 Thread Peter Constable
When the update with Windows 10 info was posted, earlier sections for Windows 
2000 / XP / XP SP2 were inadvertently deleted. Those have been restored.

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable
Sent: Friday, May 8, 2015 7:16 AM
To: unicode@unicode.org
Subject: RE: Script / font support in Windows 10

I think this is the right public link:

https://msdn.microsoft.com/en-us/goglobal/bb688099.aspx


From: Peter Constable
Sent: Thursday, May 7, 2015 10:29 PM
To: Peter Constable; unicode@unicode.org<mailto:unicode@unicode.org>
Subject: RE: Script / font support in Windows 10

Oops... my bad: maybe it isn't on live servers yet. It will be soon. I'll 
update with the public link when it is.

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable
Sent: Thursday, May 7, 2015 10:15 PM
To: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Script / font support in Windows 10

This page on MSDN that provides an overview of Windows support for different 
scripts has now been updated for Windows 10:

https://msdnlive.redmond.corp.microsoft.com/en-us/bb688099



Peter


RE: Script / font support in Windows 10

2015-05-08 Thread Peter Constable
I think this is the right public link:

https://msdn.microsoft.com/en-us/goglobal/bb688099.aspx


From: Peter Constable
Sent: Thursday, May 7, 2015 10:29 PM
To: Peter Constable; unicode@unicode.org
Subject: RE: Script / font support in Windows 10

Oops... my bad: maybe it isn't on live servers yet. It will be soon. I'll 
update with the public link when it is.

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable
Sent: Thursday, May 7, 2015 10:15 PM
To: unicode@unicode.org<mailto:unicode@unicode.org>
Subject: Script / font support in Windows 10

This page on MSDN that provides an overview of Windows support for different 
scripts has now been updated for Windows 10:

https://msdnlive.redmond.corp.microsoft.com/en-us/bb688099



Peter


RE: Script / font support in Windows 10

2015-05-07 Thread Peter Constable
Oops... my bad: maybe it isn't on live servers yet. It will be soon. I'll 
update with the public link when it is.

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable
Sent: Thursday, May 7, 2015 10:15 PM
To: unicode@unicode.org
Subject: Script / font support in Windows 10

This page on MSDN that provides an overview of Windows support for different 
scripts has now been updated for Windows 10:

https://msdnlive.redmond.corp.microsoft.com/en-us/bb688099



Peter


Script / font support in Windows 10

2015-05-07 Thread Peter Constable
This page on MSDN that provides an overview of Windows support for different 
scripts has now been updated for Windows 10:

https://msdnlive.redmond.corp.microsoft.com/en-us/bb688099



Peter


RE: Are you CONFUSED about WHAT CHARACTER(S) you type?!?!

2015-03-25 Thread Peter Constable
It's the first time it was reported to us, AFAIK.

Sent from my IBM 3277/APL

From: Doug Ewell
Sent: ‎3/‎25/‎2015 3:44 PM
To: Unicode Mailing List
Subject: Re: Are you CONFUSED about WHAT CHARACTER(S) you type?!?!

Robert Wheelock  wrote:

> The default Microsoft Sans Serif font (within Microsoft Windows) has
> this ABOMINABLE habit of substituting this IDENTICAL TO SIGN (which
> should be at U+02261)—because Microsoft (regrettably) placed this math
> symbol where the HOLLOW HEART SUIT should be (at U+02661)!
> * ¡AGONISTES!*

It's a known font bug. It's been around since at least 2010. It's
probably not the end of the world.

(Calling U+2661 by the name HOLLOW HEART SUIT instead of its real name,
WHITE HEART SUIT, is also a bug.)

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸



RE: The rapid … erosion of definition ability

2014-11-17 Thread Peter Constable
That would be a bit like our forebears having said, “It’s ridiculous to call 
them ‘tomatoes’ outside Mexico. They’re just big berries, and that’s it.” That 
observation may have been true, but also beside the point if, in practice, the 
Europeans found it convenient and chose to call them ‘tomatoes’.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Andreas Stötzner
Sent: Monday, November 17, 2014 2:09 AM
To: Mark Davis ☕️
Cc: unicode@unicode.org
Subject: Re: The rapid … erosion of definition ability


Am 17.11.2014 um 08:35 schrieb Mark Davis ☕️:


IT’S EASY TO DISMISS EMOJI. They are, at first glance, ridiculous

The only ridiculous thing is to name them “Emoji” outside Japan.
They’re just signs and that’s it.


Regards,
  Andreas Stötzner.




___

Andreas Stötzner  Gestaltung Signographie Fontentwicklung

Haus des Buches
Gerichtsweg 28, Raum 434
04103 Leipzig
0176-86823396

http://stoetzner-gestaltung.prosite.com


RE: Terms for rotations

2014-11-10 Thread Peter Constable
It might also be useful to note that the primary purpose of the character names 
is to provide unique reference identifiers that should be reasonably reflective 
of the character identity. But they don't need to guarantee unambiguous 
understanding of the character identity absent any additional information. 
In particular, two things that can be assumed when interpreting a character 
name to understand the character identity are (1) access to the representative 
glyph for the character from the code charts, and (2) access to the name and 
representative glyph from the code charts for related characters. 

So, for example, the identity of 026F LATIN SMALL LETTER TURNED M and 1D1F 
LATIN SMALL LETTER SIDEWAYS TURNED M can only be clearly understood in 
reference to the representative glyphs for these characters and to 006D LATIN 
SMALL LETTER M.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Whistler, Ken
Sent: Monday, November 10, 2014 4:12 PM
To: Jean-François Colson
Cc: Whistler, Ken; unicode@unicode.org
Subject: RE: Terms for rotations

> Look at this picture:
> http://www.permisecole.com/code-route/priorites/faux-carrefour-a-sens-
> giratoire.jpg Imagine you sit in this car and you want to turn RIGHT. 
> What will you do? Will you turn the driving wheel clockwise or 
> counterclockwise?

And now imagine that you are motoring in a 1904 Cyklonette.
Which way would you move the tiller? ;-)

Seriously, I think that Ilya's point is well-taken. Although in English there 
is a strong association of the phrase "turn to the right" with clockwise motion 
for control devices which rotate, if you take the phrase out of that mechanical 
context and just talk about the orientation of pictures on paper, there can be 
some ambiguity based on the conceptual confusion with the concept of "turning 
to[wards] facing the right", which can mean something very different for 
symbols which seem to have built-in directions, like arrows.

--Ken


___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode



RE: fonts for U7.0 scripts

2014-10-24 Thread Peter Constable
Have you tried checking what the Unicode Terms of Use has to say about all this?

Let me help: here's the Terms of Use page:

http://www.unicode.org/copyright.html

Regarding online code charts, it says, "The online code charts carry specific 
restrictions."

If you load any of the code chart PDFs, there's a copyright notice that says 
this:


Terms of Use 
You may freely use these code charts for personal or internal business uses 
only. You may not incorporate them either wholly or in part into any product or 
publication, or otherwise distribute them without express written permission 
from the Unicode Consortium. However, you may provide links to these charts. 

The fonts and font data used in production of these code charts may NOT be 
extracted, or used in any other way in any product or publication, without 
permission or license granted by the typeface owner(s).  

The Unicode Consortium is not liable for errors or omissions in this file or 
the standard itself. Information on characters added to the Unicode Standard 
since the publication of the most recent version of the Unicode Standard, as 
well as on characters currently being considered for addition to the Unicode 
Standard can be found on the Unicode web site.


Anyone publishing a book and taking content from some other source is probably 
going to (or should) contact the owner of that content to get permission. The 
Unicode Consortium regularly receives requests for permission to use content.


Peter


-Original Message-
From: Tom Gewecke [mailto:t...@bluesky.org] 
Sent: Friday, October 24, 2014 11:27 AM
To: Peter Constable
Cc: Unicode Public
Subject: Re: fonts for U7.0 scripts


On Oct 24, 2014, at 8:18 AM, Peter Constable wrote:

> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Tom 
> Gewecke
> 
>> If someone wants to publish and sell a book in which they say 
>> something like "This is how Unicode suggests that character U+ is 
>> supposed to look:"
> 
> Well, since the intent of the codes is to give an indication of what the 
> character identity is and _not_ to say how the character _should_ look, it's 
> a good thing if Unicode isn't authorizing authors to make such statements.

I probably didn't express myself clearly before.  Even if the book simply says  
"The charts published by Uncode.org indicate that the following would be a 
representative glyph for the Character U+", it seems that you would need 
permission to copy the glyph.   I wonder if that is necessary.



___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: fonts for U7.0 scripts

2014-10-24 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Tom Gewecke

> If someone wants to publish and sell a book in which they say something like 
> "This is how Unicode suggests that character U+ is supposed to look:" 

Well, since the intent of the codes is to give an indication of what the 
character identity is and _not_ to say how the character _should_ look, it's a 
good thing if Unicode isn't authorizing authors to make such statements.



Peter

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: fonts for U7.0 scripts

2014-10-23 Thread Peter Constable
Sure: People find a font that isn’t a truly functional, Unicode-conformant font 
for script X and…


-  They try using it, find it doesn’t display text as expected, and 
conclude that Unicode doesn’t work for their script

-  The font has glyphs mapped from ASCII characters; they try typing 
and it seems to display their text as desired, so they start generating 
content. Now we have data interop problems.

-  The font kinda works, but not perfectly. They decide that they can 
fix it by just changing some of the glyphs to certain presentation forms and by 
adding certain other glyphs on some unused code positions. Then they start 
generating content. Now we have data interop problems.

The last scenario is really similar to the serious problems we have now for 
Myanmar. In other words, this isn't just hypothetical.



Peter

From: Shriramana Sharma [mailto:samj...@gmail.com]
Sent: Thursday, October 23, 2014 4:48 PM
To: Peter Constable
Cc: Andrew West; Andrew Glass (WINDOWS); Unicode Public
Subject: Re: fonts for U7.0 scripts

On Thursday, October 23, 2014, Peter Constable <peter...@microsoft.com> wrote:
. But publishing fonts created for the purpose of chart production may lead to 
all kinds of problems if they are not truly functional, Unicode-conformant 
fonts -

Dear Peter,

Can you clarify what "all kinds of problems" you foresee?


--
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: fonts for U7.0 scripts

2014-10-23 Thread Peter Constable
I think Debbie's position is entirely reasonable. Sure, having useful fonts in 
the public domain soon after standardization would be great. But publishing 
fonts created for the purpose of chart production may lead to all kinds of 
problems if they are not truly functional, Unicode-conformant fonts - which is 
not necessarily a product of SEI-funded proposal work.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Andrew West
Sent: Thursday, October 23, 2014 1:47 AM
To: Andrew Glass (WINDOWS)
Cc: Unicode Public
Subject: Re: fonts for U7.0 scripts

On 22 October 2014 21:47, Andrew Glass (WINDOWS)  
wrote:
>
> I think that distributing fonts that are known to be deficient in 
> shaping does not address needs other than reproducing code charts and 
> suppressing tofu. Moreover, such fonts can mislead users 
> into thinking that a script is supported when we know that more work 
> remains to be done. When work appears to be complete to someone who 
> can't read a script, then the motivation to address the remaining 
> issues to support that script is undermined. There can also be other 
> negative consequences. I think that making a set of character-only fonts 
> available would be against the interests of the SEI and Unicode.

Well, not all scripts have complex rendering behaviour, so for some scripts the 
code chart font mapped to the correct Unicode code points is all that is needed.

Even for fonts with deficient rendering behaviour or which are mapped to ASCII 
or PUA code points, if the font was released under the SIL Open Font license or 
an equivalent free license then people could use it as the basis for a fully 
functional Unicode font.

> In this respect, I think the effort of the Noto project to include 
> shaping support for complex scripts is commendable. I hope that the 
> current gaps in Noto will soon be filled by suitable fonts so that the need 
> to release 'chart-only' fonts is removed.

I'm a great fan of the Noto project, but as Mark's original question indicates 
Noto does not supply a solution for newly encoded scripts, and I very much 
dislike the idea of Google having a monopoly on supplying free fonts for minor 
and historic scripts.  A code chart font, released under a free license such as 
the SIL OFL (with any necessary limitations clearly stated) is far and away 
better than leaving people puzzling over little square boxes for years.

Andrew
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode



RE: Current support for N'Ko

2014-09-29 Thread Peter Constable
Don,

You mention testing IE 8. That's a 5.5-year-old version that shipped before 
N'Ko script was supported on any platform. It's interesting that anything 
worked. You also mentioned IE11 on Windows 7 but testing without the Deja Vu 
fonts. Windows has supported N'Ko since Windows 8. Did you try testing with 
that and using the Ebrima font?

Btw, the text appears to display correctly on my Windows Phone.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of 
d...@bisharat.net
Sent: Friday, September 26, 2014 8:11 PM
To: unicode@unicode.org
Cc: charles.ri...@yale.edu
Subject: Current support for N'Ko

Some observations concerning N'Ko support in browsers may be of
interest:

http://niamey.blogspot.com/2014/09/nko-on-web-review-of-experience-with.html

This is pursuant to reposting a translation in N'Ko of a World Health 
Organization FAQ on ebola. That translation was one of several facilitated by 
Athinkra LLC, and available at https://sites.google.com/site/athinkra/ebola-faqs

Don Osborn
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode



FYI: Ruble sign in Windows

2014-08-14 Thread Peter Constable
For those interested, there is an update for Windows available now to add font, 
keyboard and locale data support for the Ruble sign that was added in Unicode 
7.0. For details, see here:

http://support.microsoft.com/kb/2970228




Peter
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: Corrigendum #9

2014-06-12 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Karl Williamson
Sent: Wednesday, June 11, 2014 9:30 PM

> I have something like a library that was written a long time ago 
> (not by me) assuming that noncharacters were illegal in open interchange. 
> Programs that use the library were guaranteed that they would not receive 
> noncharacters in their input.

I haven't read every post in the thread, so forgive me if I'm making incorrect 
inferences. 

I get the impression that you think that Unicode conformance requirements have 
historically provided that guarantee, and that Corrigendum #9 broke that. If 
so, then that is a mistaken understanding of Unicode conformance.

Here is what has historically been said in the way of conformance requirements 
related to non-characters:

TUS 1.0: There were no conformance requirements stated. This recommendation was 
given:
"U+ and U+FFFE are reserved and should not be transmitted or stored."

This same recommendation was repeated in later versions. However, it must be 
recognized that "should" statements are never absolute requirements.

Conformance requirements first appeared in TUS 2.0:

TUS 2.0, TUS 3.0: 
"C5 A process shall not interpret either U+FFFE or U+ as an abstract 
character."


TUS 4.0:
"C5 A process shall not interpret a noncharacter code point as an abstract 
character."

"C10When a process purports not to modify the interpretation of a valid 
coded character representation, it shall make no change to that coded character 
representation other than the possible replacement of character sequences by 
their canonical-equivalent sequences or the deletion of noncharacter code 
points."

Btw, note that C10 makes the assumption that a valid coded character sequence 
can include non-character code points.


TUS 5.0 (trivially different from TUS4.0):
C2 = TUS4.0, C5

"C7 When a process purports not to modify the interpretation of a valid 
coded character sequence, it shall make no change to that coded character 
sequence other than the possible replacement of character sequences by their 
canonical-equivalent sequences or the deletion of noncharacter code points."


TUS 6.0:
C2 = TUS5.0, C2

"C7 When a process purports not to modify the interpretation of a valid 
coded character
sequence, it shall make no change to that coded character sequence other than 
the possible
replacement of character sequences by their canonical-equivalent sequences."

Interestingly, the change in C7 means that non-characters can no longer be 
replaced or removed at all while still claiming to have left the 
interpretation intact. 


So, there was a change in 6.0 that could impact conformance claims of existing 
implementations. But there have never been any guarantees made _by Unicode_ that 
non-character code points will never occur in open interchange. Interchange has 
always been discouraged, but never prohibited.
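
For reference, the set of noncharacter code points at issue is fixed by the 
standard: U+FDD0..U+FDEF plus the last two code points of every plane, 66 in 
all. As a minimal sketch (Python; the helper name is made up, not from any 
product or API), a check might look like this:

    def is_noncharacter(cp: int) -> bool:
        # True for the 66 noncharacter code points: U+FDD0..U+FDEF plus
        # U+nFFFE and U+nFFFF in each plane; assumes 0 <= cp <= 0x10FFFF.
        return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

A process honouring C5/C2 must not interpret such a code point as an abstract 
character, but, as noted above, it may still encounter one in interchanged 
data.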




Peter

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: UTF-16 Encoding Scheme and U+FFFE

2014-06-04 Thread Peter Constable
[...] with random access and easy resynchronization (in live streams), when 
the producer safely breaks data blocks at the boundary of combining sequences 
(allowing these blocks to be normalized separately and reunited later without 
creating problems).


2014-06-04 1:50 GMT+02:00 Richard Wordingham <richard.wording...@ntlworld.com>:
On Tue, 3 Jun 2014 21:28:05 +
Peter Constable <peter...@microsoft.com> wrote:

> There's never been anything preventing a file from containing and
> beginning with U+FFFE. It's just not a very useful thing to do, hence
> not very likely.
Well, while U+FFFE was apparently prohibited from public interchange,
one could be very confident of not finding it in an external file.  As
an internally generated file, it would then be much more likely to be
in the UTF-16BE or UTF-16LE encoding scheme.

Richard.

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: UTF-16 Encoding Scheme and U+FFFE

2014-06-03 Thread Peter Constable
There's never been anything preventing a file from containing and beginning 
with U+FFFE. It's just not a very useful thing to do, hence not very likely.
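
For concreteness, here is a minimal sketch (Python; a hypothetical helper, not 
any product's code) of the kind of BOM sniffing being discussed. It shows why 
a file in the UTF-16 encoding scheme that genuinely began with the 
noncharacter U+FFFE would instead be read as a little-endian BOM:

    def sniff_utf16(data: bytes) -> str:
        # Deduce byte order for the UTF-16 encoding scheme from a leading BOM.
        # A first code unit that really were U+FFFE is indistinguishable here
        # from a little-endian byte order mark.
        if data[:2] == b"\xfe\xff":
            return "UTF-16BE (initial FE FF read as a BOM)"
        if data[:2] == b"\xff\xfe":
            return "UTF-16LE (initial FF FE read as a BOM)"
        return "no BOM: big-endian by default in the UTF-16 encoding scheme"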


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard 
Wordingham
Sent: June 3, 2014 11:53 AM
To: unicode@unicode.org
Subject: UTF-16 Encoding Scheme and U+FFFE

How do I read definition D98 in TUS Version 6.3.0 Chapter 3 to prohibit a file 
in the UTF-16 encoding scheme from starting with U+FFFE?  Or is
U+FFFE actually allowed to start such a file?

Is an implementation that deduces the encoding scheme of a plain text file from 
a leading BOM to be characterised as reckless?

Richard.
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode



RE: Updated emoji working draft

2014-04-15 Thread Peter Constable
William, the UTC is not in the business of creating file formats for 
localization data.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of William_J_G 
Overington
Sent: April 14, 2014 12:27 AM
To: Unicode Public; Mark Davis ☕️
Cc: wjgo_10...@btinternet.com
Subject: Re: Updated emoji working draft

>> Is this format suitable to become standardized for use in producing 
>> localized text-to-speech from emoji to the chosen local language?

> no, not particularly

Thank you for replying.

Well, I feel that it would be good if a format, whatever it may be, were 
decided at the May 2014 UTC (Unicode Technical Committee) meeting. This would 
enable application implementers to use a standardized format with a 
standardized file name; and enable advocates for localization to a particular 
language to produce a localization file for that particular language confident 
that the file produced would be widely compatible with various applications, 
such as browsers and email clients and ebook readers, from various 
manufacturers.

The particular file format that I mentioned is a simplified variant of an 
earlier format that I produced for my research. The original format contains a 
facility for organizing a cascading menu system for use in generating messages 
as well.

Yet, in general, what features are needed for such a format and can such a 
format become specified in good time for discussion before the May 2014 UTC 
meeting?

William Overington

14 April 2014


___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode



RE: Editing Sinhala and Similar Scripts

2014-03-19 Thread Peter Constable
If you click into the existing text in this email and backspace, what keystroke 
will you expect to be "erased"? Your system has no way of knowing what 
keystroke might have been involved in creating the text.

What _can_ make sense to talk about is to say that a user expects execution 
of a particular key sequence, such as pressing a Backspace key, to have a 
particular editing effect on the content of text. "Erasing a keystroke" and 
"keystrokes resulting in edits" are different things. One makes sense, the 
other does not.

It may seem like I'm being pedantic, but I think the distinction is important. 
Our failure is in framing our thinking from years of experience (and perhaps 
some behaviours originally influenced by typewriter and teletype technologies) 
in which a keyboard has a bunch of keys that add characters, plus variations 
on that which even include a lot of logic so that input keying sequences can 
generate tens of thousands of different characters, but then one or two keys 
(delete, backspace) that can only operate in very dumb ways. (We've also 
always assumed that any logic in keying behaviours can be conditioned only by 
the input sequences, but not by any existing content, though that steps beyond 
my earlier point.) These constraints in how we think limit the possibilities.


Peter


-Original Message-
From: Doug Ewell [mailto:d...@ewellic.org] 
Sent: March 19, 2014 9:39 AM
To: Peter Constable; unicode@unicode.org
Subject: RE: Editing Sinhala and Similar Scripts

Peter Constable  wrote:

>> There are two types of people:
>>
>> 1. those who fully expect Backspace to erase a single keystroke
>
> It is nonsensical to talk about erasing a _keystroke_.

But that's what they expect.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell


___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: Editing Sinhala and Similar Scripts

2014-03-19 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Doug Ewell

>>> When you backspace it destroys multiple keystrokes.


> There are two types of people:
>
> 1. those who fully expect Backspace to erase a single keystroke

It is nonsensical to talk about erasing a _keystroke_. That would be comparable 
to erasing a mouse click, or erasing a tap on a touch-sensitive device. These 
are user actions that may result in any number of machine states. Unless you 
can manage to build a time machine, at the time when the erasing is happening, 
there is no longer any record of what process might have been operating that 
responded to the user action or of what machine state was the result. 

All that is available to act on at that point are characters.
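
To make that concrete, a minimal sketch (Python; a hypothetical editor 
callback, not any real API): the handler receives only the text buffer and a 
caret position. Whether it removes one code point, a combining sequence, or a 
larger unit is a policy decision made over characters, because there is no 
keystroke history left to "erase":

    def on_backspace(text: str, caret: int) -> tuple[str, int]:
        # Delete the single code point before the caret. A different policy
        # could delete a whole combining sequence or cluster instead, but
        # either way it can only inspect characters, not past keystrokes.
        if caret == 0:
            return text, caret
        return text[:caret - 1] + text[caret:], caret - 1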



Peter

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: [private] Re: Unicode : Greek Extended.

2014-03-17 Thread Peter Constable
Font tables to position diacritics are not "much harder to create" than 
anything else involved in font development, and certainly don't require being a 
programmer. Hinting is harder than positioning tables and does literally 
involve programming, though I don't hear font developers griping about that. 
Professional font developers are not quite the luddites the comment suggests.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Jean-François 
Colson
Sent: March 16, 2014 9:33 PM
To: unicode@unicode.org
Subject: Re: [private] Re: Unicode : Greek Extended.

>
> > Some fonts don't display this correctly; they show the macron 
> > partially or completely to the right of the base letter, instead of 
> > directly
> below
> > it. The solution is to use another font, and to ask font vendors to 
> > fix this combination so it looks decent.

"(2) Fonts are much harder to create. Instead of just needing a graphic 
designer to draw characters, you now need a programmer as well, who 
understands OpenType tables. [...] Again, HackAscii wins."

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode



RE: Websites in Hindi

2014-03-03 Thread Peter Constable
Looking at the thread that William pointed at, the person asking for help gave 
no indication as to what problems he might have been encountering. Without 
specifics, the two obvious recommendations would be (i) encode the content 
using conformant UTF-8, and (ii) use conforming OpenType fonts leveraging CSS 
web font mechanisms.

Beyond that, that thread seemed not especially interesting.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Neil Harris
Sent: March 3, 2014 12:22 PM
To: James Lin; Christopher Fynn; William_J_G Overington
Cc: unicode@unicode.org
Subject: Re: Websites in Hindi

On 03/03/14 18:14, James Lin wrote:
> another problem you may need to consider is the support of the 
> glyph/fonts on your system.  Not all fonts are supported/installed by 
> default when installing the OS.
>
> Warm Regards,
> -James
>
>

This is where webfonts should be extremely useful -- I believe recent versions 
of at least Firefox, and probably other modern browsers, should support both 
webfonts and text shaping for Indic scripts by default, whether or not the 
underlying platform has the correct fonts.

Neil

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode



RE: Behdad Esfahbod won an O'Reilly Open Source Award!

2013-07-29 Thread Peter Constable
Congratulations on a great contribution.

Peter

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Roozbeh Pournader
Sent: July 29, 2013 12:42 PM
To: unicode@unicode.org
Cc: Behdad Esfahbod
Subject: Behdad Esfahbod won an O'Reilly Open Source Award!

Some of you probably have heard the news already, but in case you haven't, 
Behdad won the prestigious O'Reilly Open Source Award, announced last Friday.

Here's the announcement:
http://www.oscon.com/oscon2013/public/schedule/detail/29956

Selected quotes:

"The O’Reilly Open Source Awards recognize individual contributors who have 
demonstrated exceptional leadership, creativity, and collaboration in the 
development of Open Source Software. [...]

Behdad Esfahbod (HarfBuzz): Through the HarfBuzz project Behdad is working 
relentlessly to get all languages supported in Free Software operating systems, 
word processors, devices and browsers, no matter how complex their scripts are."

I wish to congratulate Behdad for his achievements, which have really helped 
make open source way more accessible to billions of users around the world. I'm 
eagerly waiting for his amazing magic and superhacker skills to bear even more 
fruits over the years to come. I'm proud to have been able to call him a 
friend, colleague, and collaborator for more than fifteen years now.

Roozbeh


RE: Ways to show Unicode contents on Windows?

2013-07-19 Thread Peter Constable
I'm sorry that Microsoft's approach to product servicing does not meet your 
expectations. It is what it is, however. The costs involved in servicing the XP 
code base (which was forked from all subsequent Windows versions in 2001, so 
effectively does date back to then) are greater than I think you realize. Also, 
while you would evidently appreciate seeing an optional update for Uniscribe 
show up in Windows Update, the vast majority of users would only be confused by 
that. While I wish we could provide what you'd like, you represent a tiny 
fraction of all customers. 


Peter

-Original Message-
From: Eli Zaretskii [mailto:e...@gnu.org] 
Sent: Friday, July 19, 2013 11:29 AM
To: Peter Constable
Cc: nospam-ab...@ilyaz.org; unicode@unicode.org
Subject: Re: Ways to show Unicode contents on Windows?

> From: Peter Constable 
> CC: "nospam-ab...@ilyaz.org" , "unicode@unicode.org"
>   
> Date: Fri, 19 Jul 2013 15:15:51 +
> 
> The largest share of customers, by far, wouldn't want us to add new features 
> to XP since that would entail risks of new bugs, application compatibility 
> regressions, and frequent need to retrain users. If Ford or BMW were 
> continuously retrofitting design changes to vehicles in use, the mechanics, 
> parts dealers etc. would have headaches keeping up.

I'm sorry, but your analogy is broken.  OS updates are installed by myself, so 
no dealers, let alone mechanics, are involved, and no spare parts are anywhere 
in sight.  This is software; any analogy with hardware is almost always 
fundamentally wrong in any number of levels.

And I did mention that these upgrades could have been offered as "optional", so 
only those who really need them would install them (since "optional" updates 
are not automatically installed when the user chooses "express" installation, 
something 99.99% of users and the automatic update installation do).

> You shouldn't expect Unicode 6.2 support in an Android phone from 2008; even 
> less so, from a Windows XP system from 2001.

XP SP3 is not a 2001 system.  It was released in May 2008.  It upgraded many 
parts of Windows, including Internet Explorer.  I don't see why it couldn't 
upgrade Uniscribe.


RE: Ways to show Unicode contents on Windows?

2013-07-19 Thread Peter Constable
Every Unicode code point will have some default behaviour in any text process 
on Windows. If those default behaviours happen to fit the character in 
question, then you should get the behaviour you want. But we don't service 
Windows for each UCD update. Also, not every text process relies solely on UCD 
data.


Peter

-Original Message-
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Richard Wordingham
Sent: Friday, July 19, 2013 1:21 PM
To: unicode@unicode.org
Subject: Re: Ways to show Unicode contents on Windows?

Peter Constable  wrote:
> Behalf Of Ilya Zakharevich wrote:

> > Why would one NEED to upgrade the OS to use Old Italic?

> You can't expect an OS like Windows XP to support Old Italic 
> characters that weren't even defined in Unicode at the time it 
> shipped.

That actually came as a great surprise to me.  I once naively thought that all 
that had to be done was to update the version of the Unicode Character Database 
(UCD) that the system was using, and then only new
*properties* should be causing major trouble.  Now scripts needing reordering 
have their own problems, but that sort of problem is what SIL developed 
Graphite for.  (I fear the case for Microsoft Office to support Graphite is 
steadily reducing.)

The problem with changes to the UCD arises partly because enough developers 
prefer speed and compactness to flexibility.

> That said, it turns out that a given version of Windows does support 
> later-encoded characters such as Old Italic that have no special 
> requirements fairly well -- provided you have a font and format your 
> content with that font.

Are you sure this tolerance isn't by design?

> It is the case of simple rendering.  Given a font, and a keyboard 
> layout (both doable in user-land), it should “just work”.  Or am I 
> missing something?

The biggest thing you're missing is too much cleverness, and the second is 
centralisation.

Word switches keyboard at the very least as you step through text, which in 
simple cases is quite helpful.  Also, Office has (at least) three current fonts 
- one for simple scripts, one for complex scripts, and one for CJK scripts.  
This in itself can cause problems with new scripts - I have a fair bit of Tai 
Tham text in Open Document format that has the wrong size because LibreOffice 
hesitantly changed the script's classification from simple to complex.

The centralisation issue is that Indic rearrangement and the selection of 
Arabic and Syriac contextual forms seemed obvious things to abstract away from 
fonts and handle centrally. Consequently, text is split by script and each 
script run is handled separately.
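
As an illustration of that split (a toy sketch in Python; the function names 
are invented and the classifier covers only two blocks, whereas a real engine 
uses the full Unicode Script property):

    def toy_script(cp: int) -> str:
        # Grossly simplified script classifier, for illustration only.
        if 0x0E00 <= cp <= 0x0E7F:
            return "Thai"
        if 0x1A20 <= cp <= 0x1AAF:
            return "Tai Tham"
        return "Other"

    def itemize(text: str) -> list[tuple[str, str]]:
        # Split text into (script, run) pairs so that each run can be
        # handed to its own shaping path, as described above.
        runs: list[tuple[str, str]] = []
        for ch in text:
            s = toy_script(ord(ch))
            if runs and runs[-1][0] == s:
                runs[-1] = (s, runs[-1][1] + ch)
            else:
                runs.append((s, ch))
        return runs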

Combining the two, we can certainly have Word XP asking whether a font supports 
a script, and refusing to use it for the script if it doesn't declare it does. 
I had to fiddle the OS/2 table of a Tai Tham hack font (Lannaworld) to be able 
to use it.  The font maps Latin and Thai characters to Tai Tham glyphs, but 
when I downloaded the font it didn't declare support for the 'Basic Latin' 
character range or the 'Latin-1' encoding.  To get the font to work, I not only 
had to dodge the constraints on Thai character sequences, I also had to change 
the
OS/2 table to declare that the font supported the Latin range and encoding.
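
For anyone needing the same workaround, here is a sketch of that kind of OS/2 
edit using the fontTools library (the file names are hypothetical; the bits 
set are Basic Latin in ulUnicodeRange1 and the Latin 1 code page in 
ulCodePageRange1):

    from fontTools.ttLib import TTFont

    font = TTFont("Lannaworld.ttf")       # hypothetical input file name
    os2 = font["OS/2"]
    os2.ulUnicodeRange1 |= 1 << 0         # bit 0: Basic Latin (U+0000..U+007F)
    os2.ulCodePageRange1 |= 1 << 0        # bit 0: Latin 1 (code page 1252)
    font.save("Lannaworld-patched.ttf")   # hypothetical output file name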
 
I still don't think we've got to the bottom of Doug's PUA problem.  For all I 
know, he may have been violating the agreement he made with Microsoft for the 
use of the PUA.  I'm not aware of Microsoft publishing a consolidated statement 
of this agreement, but I've a feeling some characters are reserved for symbol 
fonts and yet others are reserved for Thai glyphs.  It's also conceivable that 
he trespassed on the PUA assignments decreed by China for Tibetan.

Richard.

RE: Ways to show Unicode contents on Windows?

2013-07-19 Thread Peter Constable
Not everything that is technically possible makes good sense. My comments 
clearly were not framed solely in terms of what is technically possible. 


Peter

-Original Message-
From: Eli Zaretskii [mailto:e...@gnu.org] 
Sent: Friday, July 19, 2013 1:36 PM
To: Peter Constable
Cc: nospam-ab...@ilyaz.org; unicode@unicode.org
Subject: Re: Ways to show Unicode contents on Windows?

> From: Peter Constable 
> CC: "nospam-ab...@ilyaz.org" ,
> "unicode@unicode.org"
>   
> Date: Fri, 19 Jul 2013 19:49:10 +
> 
> I'm sorry that Microsoft's approach to product servicing does not meet your 
> expectations. It is what it is, however.

That's not the issue here.  The issue here is that such updates _could_ be 
provided without requiring users to install a newer version of the OS.  IOW, 
the assertion that one cannot expect an OS shipped in
2001 to support scripts that didn't exist at that time is simply false.  
There's no technical problem here, only a managerial decision.

> Also, while you would evidently appreciate seeing an optional update for 
> Uniscribe show up in Windows Update, the vast majority of users would only be 
> confused by that.

How can a newer and better text shaping engine possibly confuse users?








