Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Christopher Fynn
I think the idea of encoding "regional identifiers" instead of actual
flags was to avoid a political minefield, and because flags change over
time. (Afghanistan has had something like 20 different flags.) I also
imagine PR China wouldn't be too happy if someone wanted to encode a
Tibetan flag. You're also right about things like sporting events,
where England, Scotland, Wales and Northern Ireland have separate
teams, etc. Still, the use of flags as identifiers is common on the
web, at international conferences to identify delegates, and so on.
With the arrival of colored fonts (supported in Windows 8.1) I suspect
people will inevitably try to make use of these characters, in spite
of all the current limitations you have pointed out.

I guess someone could come up with a private registry, similar to the
ConScript registry, listing ways of encoding all kinds of symbols (e.g.
FC logos) using these identifiers.


On 06/08/2013, Philippe Verdy  wrote:
> OK I see the point of the PRI. But using joiners in the middle of the same
> flag is worse than just using start/end (which can also be cleanly
> mapped to glyphs without complex rendering like ligatures):
> start+RIS...+RIS+end can be fully converted to individual glyphs producing
> a flag showing the region code in the middle (good for simple editors), and
> then ligatures can be applied if needed on sequences to generate actual
> flags (possibly colorful, as emoji icons).
>
> Your PRI does not solve the problem of versioning, notably in ISO 3166, which
> is not stable, e.g. for [CS], but also for changing flags of a country.
> You'll need dates or other specifiers in extensions of the code. The
> start/end solution also ensures stability of the default rendering without
> having to create and maintain any registry for the actual flags (this could
> be done by another project, e.g. by maintainers and participants of the
> Flags of the World on their existing collaborative site), just the same way
> that Unicode does not have to maintain a dictionary of all words of a
> language. The start+RIS+end solution would act like a "word" in its own
> language, using its own orthography, and would be freed from ISO 3166-1
> dependency.
>
> Font creators would immediately be able to provide a font with a reasonable
> default rendering, suitable for the default monochromatic
> rendering of these "words". It would then be up to other applications to
> decide which words they recognize and replace with colorful flag icons or
> emojis. The problem is solved once for Unicode and ISO/IEC 10646. The
> Unicode standard just has to say that these "words" can be freely replaced
> by icons showing a flag of the same encoded entity. It does not have to
> specify which ones, just like Unicode does not mandate any typographical
> ligatures (however, TUS may specify the internal syntax of these encoded
> flags, to ensure that it would be compatible with ISO 3166 or with some
> other flag libraries like the IOC flags and codes).
>
> For Unicode, however, the codes will all be treated as different: if [FR]
> is used to represent France and [-IOC-FRA] to represent the French
> Olympic team, both could display exactly the same flag (and [MQ] could
> display either the same flag or the cultural regional flag, because here
> there's no other qualifier to say which one to use, and both are valid;
> but if only the official national flag used in the UN must be used, then
> [-UN-MQ] will only display the tricolor flag, and if needed a versioning
> suffix could be used). The syntax could be similar to the syntax developed
> for language tags (or locale tags).
>
> 2013/8/5 Christopher Fynn 
>
>> On 05/08/2013, Philippe Verdy  wrote:
>>
>> > The way I perceive the regional indicators (in Unicode 6.0), they are
>> > absolutely not used and will never be used at all as long as there are
>> > no complements such as the minimum brackets I suggest to fix them. The 26
>> > letter-like characters are basically broken in their identity: you can't
>> > safely align multiple flags or delimit them with break iterators, the way
>> > you can break words, paragraphs, syllables (in some languages this is
>> > difficult as it is contextual too, but not impossible, and in many
>> > languages you can find syllable breaks without having to parse backward
>> > over an indefinite length) or lines.
>>
>> See:
>>
>> http://www.unicode.org/review/pri215/pri215-background.html
>>
>> http://www.unicode.org/L2/L2012/12284r3-reg-indicator-seg.pdf
>>
>



Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Philippe Verdy
OK I see the point of the PRI. But using joiners in the middle of the same
flag is worse than just using start/end (which can also be cleanly
mapped to glyphs without complex rendering like ligatures):
start+RIS...+RIS+end can be fully converted to individual glyphs producing
a flag showing the region code in the middle (good for simple editors), and
then ligatures can be applied if needed on sequences to generate actual
flags (possibly colorful, as emoji icons).
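A minimal Python sketch of the default, monochrome rendering described above. It assumes two hypothetical bracket characters; no such characters are encoded, so the private-use code points U+E000 and U+E001 merely stand in for them here. Each REGIONAL INDICATOR SYMBOL LETTER (U+1F1E6..U+1F1FF) is mapped back to its ASCII letter and the brackets to "[" and "]", so a simple editor can show the region code without any ligature support; a flag-aware renderer could instead substitute an icon for the whole bracketed sequence.

```python
# Hypothetical stand-ins for the proposed START/END brackets. These private-use
# code points are placeholders only; they are not encoded regional characters.
RI_START = "\uE000"
RI_END = "\uE001"

# REGIONAL INDICATOR SYMBOL LETTER A..Z occupy U+1F1E6..U+1F1FF.
RI_FIRST = 0x1F1E6
RI_LAST = 0x1F1FF


def encode_region(code: str) -> str:
    """Wrap an ASCII letter code in the stand-in brackets as regional indicators."""
    if not (code.isascii() and code.isalpha()):
        raise ValueError(f"letters A-Z only: {code!r}")
    letters = "".join(chr(RI_FIRST + ord(c) - ord("A")) for c in code.upper())
    return RI_START + letters + RI_END


def fallback_render(text: str) -> str:
    """Render bracketed sequences glyph by glyph as plain text, e.g. '[FR]'."""
    out = []
    for ch in text:
        cp = ord(ch)
        if RI_FIRST <= cp <= RI_LAST:
            out.append(chr(ord("A") + cp - RI_FIRST))  # RI letter -> ASCII letter
        elif ch == RI_START:
            out.append("[")
        elif ch == RI_END:
            out.append("]")
        else:
            out.append(ch)  # everything else passes through unchanged
    return "".join(out)


print(fallback_render(encode_region("FR") + encode_region("GF")))  # [FR][GF]
```

Because every bracketed sequence is self-delimiting, the fallback needs no lookbehind: each character is rendered on its own.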

Your PRI does not solve the problem of versioning, notably in ISO 3166, which
is not stable, e.g. for [CS], but also for changing flags of a country.
You'll need dates or other specifiers in extensions of the code. The
start/end solution also ensures stability of the default rendering without
having to create and maintain any registry for the actual flags (this could
be done by another project, e.g. by maintainers and participants of the
Flags of the World on their existing collaborative site), just the same way
that Unicode does not have to maintain a dictionary of all words of a
language. The start+RIS+end solution would act like a "word" in its own
language, using its own orthography, and would be freed from ISO 3166-1
dependency.

Font creators would immediately be able to provide a font with a reasonable
default rendering, suitable for the default monochromatic
rendering of these "words". It would then be up to other applications to
decide which words they recognize and replace with colorful flag icons or
emojis. The problem is solved once for Unicode and ISO/IEC 10646. The
Unicode standard just has to say that these "words" can be freely replaced
by icons showing a flag of the same encoded entity. It does not have to
specify which ones, just like Unicode does not mandate any typographical
ligatures (however, TUS may specify the internal syntax of these encoded
flags, to ensure that it would be compatible with ISO 3166 or with some
other flag libraries like the IOC flags and codes).

For Unicode, however, the codes will all be treated as different: if [FR]
is used to represent France and [-IOC-FRA] to represent the French
Olympic team, both could display exactly the same flag (and [MQ] could
display either the same flag or the cultural regional flag, because here
there's no other qualifier to say which one to use, and both are valid;
but if only the official national flag used in the UN must be used, then
[-UN-MQ] will only display the tricolor flag, and if needed a versioning
suffix could be used). The syntax could be similar to the syntax developed
for language tags (or locale tags).

2013/8/5 Christopher Fynn 

> On 05/08/2013, Philippe Verdy  wrote:
>
> > The way I perceive the regional indicators (in Unicode 6.0), they are
> > absolutely not used and will never be used at all as long as there are no
> > complements such as the minimum brackets I suggest to fix them. The 26
> > letter-like characters are basically broken in their identity: you can't
> > safely align multiple flags or delimit them with break iterators, the way
> > you can break words, paragraphs, syllables (in some languages this is
> > difficult as it is contextual too, but not impossible, and in many
> > languages you can find syllable breaks without having to parse backward
> > over an indefinite length) or lines.
>
> See:
>
> http://www.unicode.org/review/pri215/pri215-background.html
>
> http://www.unicode.org/L2/L2012/12284r3-reg-indicator-seg.pdf
>


Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Mark Davis ☕
> Classical Greek might qualify [for a CLDR entry]

It certainly qualifies, but we require that a submitter commit to
collecting a minimal amount of data before we add it. See
http://cldr.unicode.org/index/cldr-spec/minimaldata


Mark
"The best is the enemy of the good."


On Mon, Aug 5, 2013 at 3:58 PM, Stephan Stiller wrote:

> On 8/5/2013 11:26 AM, Whistler, Ken wrote:
>
>> Inclusion of the precomposed characters now seen in the U+1FXX block was part
>> of the price of the merger. What was included was precisely the repertoire
>> requested by Greece, and no attempt was made to further rationalize forms
>> including macrons for Ancient Greek.
>
> Thanks, Ken. It's good to know that there is no other reason. Partial
> credit goes to Tom Gewecke, who had pointed me off-list to
> http://www.tlg.uci.edu/~opoudjis/unicode/ken_adscripts.html
> and to the fact that the precomposed set from ISO 10646 can be traced back to
> ELOT (ΕΛΟΤ).
>
> On 8/5/2013 1:25 PM, Richard Wordingham wrote:
>
>> Classical Greek might qualify [for a CLDR entry]
>
> Yes or no, and I have in fact no(t yet an) opinion on the necessity
> thereof – it's a different question from the extent to which D matters
> for A *if* A had an entry, but I think we're on the same page at this
> point:
>
> On 8/5/2013 1:25 PM, Richard Wordingham wrote:
>
>> However, if vowels with macrons had made it into D, then one would expect
>> them in A.
>
> Yep, I agree. A loose analogy and one sensible view (which is in fact
> compatible with yours) is that it's imaginable for, say, a lexicographer for
> English to have some version of Cyrillic letters available for typesetting
> but defensible for him not to have/use stress marks, whereas any Cyrillic
> typesetting engine within a Cyrillic locale should be able to provide them.
> This made-up example is imperfect, but it might help someone understand the
> thread. That said, I have not yet formed an opinion on whether a font
> intended for a Modern Greek locale should be able to render ᾱ, ῑ, ῡ with
> additional diacritics. (One intended for Ancient Greek should, I think.)
>
> Stephan


Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Stephan Stiller

On 8/5/2013 11:26 AM, Whistler, Ken wrote:

> Inclusion of the precomposed characters now seen in the U+1FXX block was part
> of the price of the merger. What was included was precisely the repertoire
> requested by Greece, and no attempt was made to further rationalize forms
> including macrons for Ancient Greek.

Thanks, Ken. It's good to know that there is no other reason. Partial
credit goes to Tom Gewecke, who had pointed me off-list to
http://www.tlg.uci.edu/~opoudjis/unicode/ken_adscripts.html
and to the fact that the precomposed set from ISO 10646 can be traced back
to ELOT (ΕΛΟΤ).

On 8/5/2013 1:25 PM, Richard Wordingham wrote:

> Classical Greek might qualify [for a CLDR entry]

Yes or no, and I have in fact no(t yet an) opinion on the necessity
thereof – it's a different question from the extent to which D
matters for A /if/ A had an entry, but I think we're on the same page at
this point:

On 8/5/2013 1:25 PM, Richard Wordingham wrote:

> However, if vowels with macrons had made it into D, then one would expect them
> in A.

Yep, I agree. A loose analogy and one sensible view (which is in fact
compatible with yours) is that it's imaginable for, say, a lexicographer
for English to have some version of Cyrillic letters available for
typesetting but defensible for him not to have/use stress marks, whereas
any Cyrillic typesetting engine within a Cyrillic locale should be able
to provide them. This made-up example is imperfect, but it might help
someone understand the thread. That said, I have not yet formed an
opinion on whether a font intended for a Modern Greek locale should be
able to render ᾱ, ῑ, ῡ with additional diacritics. (One intended for
Ancient Greek should, I think.)

Stephan



RE: Just an observation

2013-08-05 Thread Whistler, Ken
Steffen Daode Nurpmeso observed:

> Hello, in UAX #44 I read
> 
>   Simple_Titlecase_Mapping ...
> Note: If this field is null, then the Simple_Titlecase_Mapping
> is the same as the Simple_Uppercase_Mapping for this character.
> 
> So a parser has to be aware of this, automatically falling back to
> the uppercase mapping (index 12) when there is no explicit
> titlecase mapping (index 14).
> 
> Given this, the following surprised me:
>
>   ?0[steffen@sherwood unicode]$  {if (length($15) && $15 = $13) print}' |wc -l
>   1051
>   ?0[steffen@sherwood unicode]$  {if (length($15) && $15 != $13) print}' |wc -l
>   12
> 
> (I.e., 1051 times the redundant mapping is defined.)

Prior to Unicode 5.2, the relevant documentation (in UCD.html) used
to say:

The simple titlecase may be omitted in the data file if the titlecase is the 
same as the uppercase.

Someone correctly pointed out that that statement was ambiguous.
It was corrected to the current note, which is both correct and states
the intention of the simple titlecase mapping: that it be equivalent
to the simple uppercase mapping unless it isn't, in which case a different
explicit value will be in the field (the 12 cases you noted).

The redundant titlecase mapping values were not *removed* from
the data file, as there was a significant chance that that would disrupt
parsers which had long been using conventions which expected
explicit values in the field.
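The fallback rule in the current UAX #44 note can be sketched in Python as follows. The helper names are hypothetical; the field indexes (12 for Simple_Uppercase_Mapping, 14 for Simple_Titlecase_Mapping) are the ones used in the quoted message. The first two sample records are taken from UnicodeData.txt; the third is a synthetic copy of the first with field 14 cleared, only to exercise the fallback.

```python
from collections import Counter

UPPER, TITLE = 12, 14  # Simple_Uppercase_Mapping, Simple_Titlecase_Mapping


def parse_record(line: str) -> list:
    """Split one UnicodeData.txt record into its 15 semicolon-separated fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return fields


def simple_titlecase(fields: list) -> str:
    """Field 14 if present, else fall back to field 12 (which may itself be empty)."""
    return fields[TITLE] or fields[UPPER]


def titlecase_kind(fields: list) -> str:
    """'absent' (field 14 empty), 'redundant' (equals field 12) or 'distinct'."""
    if not fields[TITLE]:
        return "absent"
    return "redundant" if fields[TITLE] == fields[UPPER] else "distinct"


def count_kinds(path: str) -> Counter:
    """Tally the three kinds over a local copy of UnicodeData.txt."""
    with open(path, encoding="utf-8") as f:
        return Counter(titlecase_kind(parse_record(line)) for line in f if line.strip())


small_a = parse_record("0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041")
small_dz = parse_record(
    "01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;<compat> 0064 017E;;;;N;"
    "LATIN SMALL LETTER D Z HACEK;;01C4;;01C5"
)
synthetic = small_a[:TITLE] + [""]  # hypothetical record: field 14 left empty

print(simple_titlecase(small_a), titlecase_kind(small_a))      # 0041 redundant
print(simple_titlecase(small_dz), titlecase_kind(small_dz))    # 01C5 distinct
print(simple_titlecase(synthetic), titlecase_kind(synthetic))  # 0041 absent
```

A parser written this way is unaffected by whether the redundant values stay in the file, which is the compatibility concern described above.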

--Ken





RE: Unicode code page and .net

2013-08-05 Thread Whistler, Ken

> > On 7/30/2013 3:27 PM, Asmus Freytag wrote:
> > > architectures that depended on swapping character sets  (code
> > > pages) in mid stream
> >
> > I thought systems were usually married to a particular code page. I'm
> > wondering where (historically) you'd actually change to a different
> > code page mid-stream.
> 
> ISO 2022 allows it.  For historical reasons, the emacs input method
> definitions are full of such code switches.  ...
> 
> Richard.

To add to Richard Wordingham's examples, many legacy database
architectures supported (and still do support) code pages.
In most such contexts, rather than "swapping character sets...
in mid stream", such architectures involve configuring a particular
database to use a particular code page (one among many that
in principle the software could support), and then dynamically
configuring each connection made to that database to match a client's
character set against the database character set, and doing conversions
as required as text passes up or down the connection.
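A minimal Python sketch of that per-connection model, assuming a hypothetical Connection class rather than any particular database's API: the database stores text in one code page, each connection declares the client's code page, and text is converted through Unicode as it passes in either direction. Characters the target code page cannot represent become "?".

```python
class Connection:
    """Hypothetical client connection to a database with a fixed code page."""

    def __init__(self, db_charset: str, client_charset: str) -> None:
        self.db_charset = db_charset
        self.client_charset = client_charset

    def to_client(self, stored: bytes) -> bytes:
        # Decode from the database code page, re-encode for the client;
        # anything the client code page lacks is replaced by b"?".
        text = stored.decode(self.db_charset)
        return text.encode(self.client_charset, errors="replace")

    def to_db(self, sent: bytes) -> bytes:
        # The reverse direction, with the same lossy replacement.
        text = sent.decode(self.client_charset)
        return text.encode(self.db_charset, errors="replace")


omega = "\u03a9\u03bc\u03ad\u03b3\u03b1"  # the Greek word "omega"
stored = omega.encode("cp1253")           # as kept by a Windows-1253 database

western_client = Connection("cp1253", "cp1252")
unicode_client = Connection("cp1253", "utf-8")

print(western_client.to_client(stored))                  # b'?????'
print(unicode_client.to_client(stored).decode("utf-8"))  # the Greek word, intact
```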

These kinds of systems are widely deployed, but the endgame we are
all working towards (and in large part have achieved) consists of
servers configured in Unicode and client connections configured in
Unicode. Conversions may still be going on, but more often of
the UTF-8 <--> UTF-16 type, which preserve all data, instead of
spitting out multiple instances of uninterpretable "?" characters
when client and data source don't match.
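That UTF-8 <--> UTF-16 conversions preserve all data can be checked directly. The Python sketch below (helper names are hypothetical) round-trips every Unicode scalar value, i.e. every code point except the surrogates U+D800..U+DFFF, from UTF-8 to UTF-16 and back, and contrasts this with a round trip through a single-byte code page.

```python
# Every Unicode scalar value: all code points except surrogates.
ALL_SCALARS = "".join(chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)


def utf8_to_utf16(data: bytes) -> bytes:
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    return data.decode("utf-16-le").encode("utf-8")


def via_code_page(text: str, code_page: str) -> str:
    # Round trip through a legacy code page, substituting "?" for unmappable text.
    return text.encode(code_page, errors="replace").decode(code_page)


utf8 = ALL_SCALARS.encode("utf-8")
print(utf16_to_utf8(utf8_to_utf16(utf8)) == utf8)                   # True
print(via_code_page("\u03a9\u03bc\u03ad\u03b3\u03b1", "cp1252"))    # ?????
```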

--Ken





Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Richard Wordingham
On Mon, 5 Aug 2013 03:09:37 +0200
Philippe Verdy  wrote:

> 2013/8/4 Richard Wordingham 

> > They are missing from the set of precomposed characters.  (It's been
> > argued that, in an ideal world, *all* precomposed characters would
> > be missing.)
 
> Of course you could argue that, but then the number of characters to
> encode would have been tremendous,...

The Latin script would have far fewer characters if precomposed
characters had been kept out.  If they had been kept out, they would
*all* be missing.
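To make "missing" precomposed characters concrete, the Python sketch below uses the standard unicodedata module. Alpha with macron has a precomposed form (U+1FB1), but alpha with macron and acute does not, so NFC can compose only the first mark and must leave the acute as a separate combining character.

```python
import unicodedata

long_alpha = "\u03b1\u0304"              # alpha + COMBINING MACRON
long_alpha_acute = "\u03b1\u0304\u0301"  # ... + COMBINING ACUTE ACCENT


def code_points(s: str) -> list:
    return [f"U+{ord(c):04X}" for c in s]


print(code_points(unicodedata.normalize("NFC", long_alpha)))        # ['U+1FB1']
print(code_points(unicodedata.normalize("NFC", long_alpha_acute)))  # ['U+1FB1', 'U+0301']
```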

> and we would have not been able
> to benefit from the normalization stability.

We might have benefited from an abandonment of NFC.

> You could also argue that
> normalization stability was not needed for this case, but then it
> would have been extremely difficult to define conformant processes
> (i.e. the assertion that applications treat all canonical
> equivalents the same way, except for binary sorting of those subsets of
> canonical equivalents).

The requirement is that conformant processes not think they are doing
the right thing by treating canonically equivalent strings
differently.  If there is latitude in a process, e.g. rendering, I
can't find a requirement to treat canonically equivalent strings
identically.  Can you?
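Canonical equivalence itself can be tested mechanically: two strings are canonically equivalent exactly when their NFD forms are identical. The Python sketch below (the helper name is hypothetical) shows that a precomposed and a decomposed long alpha with acute are equivalent, while putting the two marks in the other order is not, because both have combining class 230 and are therefore never reordered. Whether a renderer must also draw equivalent strings identically is the separate question raised above.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence is decided by comparing NFD forms.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u1fb1\u0301"       # U+1FB1 ALPHA WITH MACRON + COMBINING ACUTE
decomposed = "\u03b1\u0304\u0301"  # alpha + COMBINING MACRON + COMBINING ACUTE
swapped = "\u03b1\u0301\u0304"     # alpha + acute + macron: marks in the other order

print(canonically_equivalent(precomposed, decomposed))  # True
print(canonically_equivalent(decomposed, swapped))      # False
print(unicodedata.combining("\u0304"), unicodedata.combining("\u0301"))  # 230 230
```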

> > > They are not needed in fact, but they should just be documented
> > > somewhere for implementers of renderers and fonts, to support
> > > these types of clusters.
> >
> > Assuming that fonts containing COMBINING DOUBLE BREVE are not
> > required or morally obliged to support it properly.
> >
> 
> I have not said this was required. That's why I suggested NOT a
> normative addition in TUS, but an evolving, informative technical
> report instead.

> > > Maybe it will be enough to include them somewhere in CLDR data
> > > (notably if they are still not listed explicitly in the Greek
> > > collation table),

> > The CLDR does not yet support Ancient Greek!  It's by no means
> > certain that COMBINING DOUBLE BREVE would make it to the list of
> > auxiliary exemplar characters.  Vowels with plain COMBINING BREVE
> > and COMBINING MACRON don't make it to the list of auxiliary exemplar
> > characters for Modern Greek.

> I was not speaking about exemplar character subsets for any language,
> or even their auxiliary subset. Even though the latter is not
> standardized and keeps evolving, it is based on frequency of use and
> some agreement that these characters are desirable under common
> conventions, and that their use will be understood with minor effort.

I believe I have seen a claim that CLDR data should only concern itself
with exemplar characters.

> > On the contrary, a simple remark in TUS Section 7.9 (precise
> > location is an exercise for editors who like to make it difficult
> > to cite) that diacritics over two base characters are not limited
> > to the Latin script should suffice.  It's covered by 'pronunciation
> > systems' in TUS 6.2 - they're not limited to the Latin script.  I
> > did notice some cases of ties apparently being used in annotated
> > Greek to indicate that a sequence of consonants counted as a single
> > character for metrical purposes.
 
> Where did I write that it should be limited to Latin ?

You didn't.  You suggested a vast number of guides on rendering, when
diacritics acting on two base characters are already covered pretty
well by TUS.  The only problem is that the discussion in the TUS might
be taken to imply that they would be restricted to Latin characters.
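As a small illustration that nothing in the encoding ties double diacritics to Latin, the Python sketch below places COMBINING DOUBLE BREVE (U+035D) and COMBINING DOUBLE INVERTED BREVE (U+0361, commonly used as a tie) between two Greek letters; the mark follows the first of the two base characters it spans. The letter pairs are arbitrary examples, and how the result looks depends entirely on font support.

```python
import unicodedata

samples = [
    "\u03b1\u035d\u03b9",  # alpha + COMBINING DOUBLE BREVE + iota
    "\u03c0\u0361\u03c4",  # pi + COMBINING DOUBLE INVERTED BREVE (tie) + tau
]

for s in samples:
    for ch in s:
        print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch):<3} {unicodedata.name(ch)}")
    # Nothing composes here, so NFC leaves the sequence as it is.
    print("NFC unchanged:", unicodedata.normalize("NFC", s) == s)
```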

> You won't have enough flexibility with the existing CLDR exemplar
> subsets per language (and CLDR does not focus on non-linguistic uses
> such as technical and epigraphic notations, or phonetic/phonological
> notations, or specific uses in multilingual texts, including texts
> that represent several languages or optional readings simultaneously).

Which is why I think CLDR is the wrong context for this.

Richard.



Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Richard Wordingham
On Sun, 04 Aug 2013 19:21:34 -0700
Stephan Stiller  wrote:

> > Most of the polytonic precomposed vowels are in the auxiliary
> > exemplars for Modern Greek.
> I don't know – probably because of the Katharevousa legacy and the
> fact that Ancient Greek lives on in literary idioms, for which you
> ordinarily don't use a macron for reasons of orthographic convention.
> (And as for the breve, you shouldn't be needing it anyway.)

> It
> doesn't really matter what the precise reason is: the two are
> different languages, so "it's not in D, so it shouldn't be in A" is
> a /non sequitur/, esp. if you know that D is a typographically smaller
> language in a number of respects. Or maybe someone made mistakes.

There's no logical implication, and none was intended.  However, if
vowels with macrons had made it into D, then one would expect them in A.

> > A CLDR entry could get rather silly when deciding on the Attic,
> > Ionic and Doric Greek for Yoruba and !Xu - Cambodia's going to be
> > bad enough.  Do we look for the Ancient Greek representation of
> > Kambuja?
> successfully lost me here :-)

Sorry, I'd overestimated the data requirements for a language to have
a CLDR entry.  I'd got the impression it had to have a translation for
the name of every language and territory in the CLDR.  It seems that
Classical Greek might qualify, though Classical Latin wouldn't.  A
Latin script language has to have documentation on its decimal
separators, and Classical Latin doesn't have decimal numbers for them
to apply to!
 
Richard.




RE: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Whistler, Ken
Poring back over this voluminous thread to Stephan Stiller's original question:

> If one wants to indicate vowel length for the length-ambiguous vowels α,
> ι, υ in Ancient Greek, one writes ᾱ, ῑ, ῡ. Is there a reason why
> there are no diacritic-precomposed characters? I guess it's because
> macron usage is rare in orthographic practice, even though vowel length
> here is not clearly less important than the other phonetic aspects
> indicated by the various diacritics in use in polytonic orthography.
> Thus I am wondering when and how the relevant decisions were made.

The decision was made in 1992, in SC2/WG2, as part of the deal which
made the merger of the Unicode Standard and ISO/IEC 10646 possible.
Note that Unicode 1.0 did not contain any precomposed polytonic Greek.
That was rather included in the early drafts of ISO/IEC 10646 at the behest
of the Greek national body, which at the time was attempting to
standardize polytonic Greek. Inclusion of the precomposed characters
now seen in the U+1FXX block was part of the price of the merger.
What was included was precisely the repertoire requested by Greece,
and no attempt was made to further rationalize forms including
macrons for Ancient Greek.
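This account can be checked against the character data itself. The Python sketch below scans the Greek and Coptic block and the Greek Extended block for characters whose canonical decomposition contains COMBINING MACRON (U+0304). It finds only the six plain long vowels (ᾱ, ῑ, ῡ and their capitals), none of them combined with any further diacritic.

```python
import unicodedata

MACRON = "\u0304"
GREEK_BLOCKS = (range(0x0370, 0x0400), range(0x1F00, 0x2000))

with_macron = [
    cp
    for block in GREEK_BLOCKS
    for cp in block
    if MACRON in unicodedata.normalize("NFD", chr(cp))
]

for cp in with_macron:
    ch = chr(cp)
    nfd = unicodedata.normalize("NFD", ch)
    # Each hit decomposes to exactly a base letter plus the macron.
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)} -> {len(nfd)} code points")
```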

--Ken





Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Christopher Fynn
On 05/08/2013, Philippe Verdy  wrote:

> The way I perceive the regional indicators (in Unicode 6.0), they are
> absolutely not used and will never be used at all as long as there are no
> complements such as the minimum brackets I suggest to fix them. The 26
> letter-like characters are basically broken in their identity: you can't
> safely align multiple flags or delimit them with break iterators, the way you
> can break words, paragraphs, syllables (in some languages this is difficult
> as it is contextual too, but not impossible, and in many languages you can
> find syllable breaks without having to parse backward over an indefinite
> length) or lines.

See:

http://www.unicode.org/review/pri215/pri215-background.html

http://www.unicode.org/L2/L2012/12284r3-reg-indicator-seg.pdf



Re: FW: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Philippe Verdy
Anyway, the only sequences mapped to regional indicators are those for the
private SJIS extensions in Japan from KDDI and SoftBank, respectively:

1F1E8 1F1F3;;F3D2;FBB3 # [CN] People's Republic of China
1F1E9 1F1EA;;F3CF;FBAE # [DE] Germany
1F1EA 1F1F8;;F348;FBB1 # [ES] Spain
1F1EB 1F1F7;;F3CE;FBAD # [FR] France
1F1EC 1F1E7;;F3D1;FBB0 # [GB] United Kingdom
1F1EE 1F1F9;;F3D0;FBAF # [IT] Italy
1F1EF 1F1F5;;F6A5;FBAB # [JP] Japan
1F1F0 1F1F7;;F3D3;FBB4 # [KR] Republic of Korea
1F1F7 1F1FA;;F349;FBB2 # [RU] Russian Federation
1F1FA 1F1F8;;F790;FBAC # [US] United States of America
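A Python sketch of reading lines in the format above, assuming the field order of Unicode's EmojiSources.txt data file (Unicode sequence; DoCoMo; KDDI; SoftBank Shift-JIS codes), which is consistent with the empty DoCoMo column here. Each REGIONAL INDICATOR SYMBOL LETTER is decoded back to its ASCII letter by its offset from U+1F1E6; collecting the letters of all ten pairs also confirms that 11 of the 26 letters are never used.

```python
from typing import Optional, Tuple

RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def parse_mapping(line: str) -> Tuple[str, Optional[str], Optional[str], Optional[str]]:
    """Return (region code, DoCoMo, KDDI, SoftBank) for one mapping line."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    unicode_seq, docomo, kddi, softbank = data.split(";")
    cps = [int(h, 16) for h in unicode_seq.split()]
    if not all(RI_A <= cp <= RI_A + 25 for cp in cps):
        raise ValueError(f"not a regional indicator sequence: {unicode_seq}")
    region = "".join(chr(ord("A") + cp - RI_A) for cp in cps)
    return region, docomo or None, kddi or None, softbank or None


LINES = [
    "1F1E8 1F1F3;;F3D2;FBB3", "1F1E9 1F1EA;;F3CF;FBAE", "1F1EA 1F1F8;;F348;FBB1",
    "1F1EB 1F1F7;;F3CE;FBAD", "1F1EC 1F1E7;;F3D1;FBB0", "1F1EE 1F1F9;;F3D0;FBAF",
    "1F1EF 1F1F5;;F6A5;FBAB", "1F1F0 1F1F7;;F3D3;FBB4", "1F1F7 1F1FA;;F349;FBB2",
    "1F1FA 1F1F8;;F790;FBAC",
]

mappings = [parse_mapping(line) for line in LINES]
used = {letter for region, *_ in mappings for letter in region}
unused = "".join(sorted(set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") - used))

print([region for region, *_ in mappings])
print(unused)  # AHLMOQVWXYZ
```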

This limited set of "flags" strongly suggests that these flags
are not really used to convey regional information, but only to indicate a set
of well-known languages used in these countries and spoken/written
internationally; i.e. they frequently serve as visual indicators of
language, for use in web menus for language selection (even though this is a
common but bad practice):

- Why should the British flag or the US flag be used to designate the
English language?

- And why not the Indian or South African flags? And why is nothing
appropriate here for the Portuguese language, and which country's flag
should be used among Portugal, Brazil, Cape Verde, Guinea-Bissau, or even
the minority Portuguese-speaking community in Macau???

These mappings exist only for round-trip compatibility with private extensions
of SJIS, which are not even compatible with each other and are not supported
in the DoCoMo extension of SJIS; this means that these characters are
already deprecated from the start. And not all letters are used. It is
also very unlikely that they will be used in sequences longer than one
"country flag", due to the many missing countries (even if the UCS
encoding could allow mapping other country flags, there would be stability
problems originating in ISO 3166-1, although the above country codes
have long been stable in all ISO 3166-1 versions... except UK, for
which the *informative* UCS round-trip mapping chose to use the GB code
and not the legacy UK code).

For many other countries or regions we would need extensions, including
longer codes or version suffixes. The way the UCS encoding is made
also allows pairs of codes that are not associated with any
country and that will never be mapped, such as [QQ] or [ZZ]. Such pairs
will never be used in round-trip mappings for these private SJIS
extensions.

---

Note finally that even if the country codes above are stable, this is not
true of their flags, and some countries even have several flags for
different uses (civil, military with navy variants, ensign,
honorific), plus decorations. Their colors and proportions may also
change over time or depending on presentation (e.g. the tricolor flag
of France has been modified so that its first vertical band (blue),
near the hoist, is narrower than the third vertical band (red), so
that the flag seems to have bands of equal width when the flag is
waving; and the exact color matching was also changed).

I don't remember the details exactly, but this is also true in
Russia, as well as in Japan since WW2; Spain uses several variants of
its flag. If these flags hang vertically, they could also
exist with very different proportions or could keep the orientation of
the bands, and the flag may also no longer be rectangular,
with triangular shapes at the bottom, on the floating side.

For this reason I still approve of the fact that Unicode does not
standardize the colors and shapes. But I still think that it should
have modeled a scheme allowing more precision if needed, as well as
allowing the representation of all countries, including former ones,
and avoiding ambiguities like [CS] between the former Czechoslovakia
and the former Serbia and Montenegro. To distinguish the codes of former
countries, ISO 3166 defines 4-letter codes (and, if needed, it will use
digits); but it will be impossible to track the country to which the former
2-letter code was mapped if this is not specified with a stable extension in
the code, such as [CS:1945] instead of just [CS].

If these extensions were used, we could represent the IOC white flag
or the flags of the teams of national Olympic committees (using their
3-letter IOC code after a prefix, such as [-IOC:ENG] for the England rugby
team when it does not compete in the Olympic Games).

I am not convinced that defining these extensions would break the
existing private implementations in SJIS.



2013/8/5 Markus Scherer 

> Dear Pradeep,
>
> The information you got from WhatsApp is wrong. The Unicode Consortium
> does not "design and create Emoji", and support in WhatsApp for the Indian
> flag is entirely up to WhatsApp. Please read the last two questions at
> http://www.unicode.org/faq/emoji_dingbats.html#12 and work with WhatsApp
> on support for the Indian flag.
>
> Best regards,
> markus
>


Re: FW: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Markus Scherer
Dear Pradeep,

The information you got from WhatsApp is wrong. The Unicode Consortium does
not "design and create Emoji", and support in WhatsApp for the Indian flag
is entirely up to WhatsApp. Please read the last two questions at
http://www.unicode.org/faq/emoji_dingbats.html#12 and work with WhatsApp on
support for the Indian flag.

Best regards,
markus


Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Philippe Verdy
I don't know. I haven't followed the ongoing work on Japanese emoji much.
But I know that Japanese systems contain a few flags (I'm not sure whether
they were or will be encoded in the UCS, or whether regional indicator
characters will substitute for the proposed emoji if they are not encoded).

The way I perceive the regional indicators (in Unicode 6.0), they are
absolutely not used and will never be used at all as long as there are no
complements such as the minimum brackets I suggest to fix them. The 26
letter-like characters are basically broken in their identity: you can't
safely align multiple flags or delimit them with break iterators, the way you
can break words, paragraphs, syllables (in some languages this is difficult
as it is contextual too, but not impossible, and in many languages you can
find syllable breaks without having to parse backward over an indefinite
length) or lines.

It is evident that a single flag is normally an unbreakable cluster, and
nothing in the current encoding allows defining cluster boundaries when you
put multiple flags side by side (of course you could separate flags with
additionally encoded spaces or punctuation, but I don't see why we should
have to do it).
When not using graphic flags, do we really write and read (1)
"FRGFGPMQREYTPFWF" (sic!), or rather (2) "[FR][GF][GP][MQ][RE][YT][PF][WF]",
or at least "FR;GF;GP;MQ;RE;YT;PF;WF"? Of course the text only makes sense
and will wrap cleanly across lines if we use the simple second solution, which
uses explicit separators between sequences of letters that are all of the same
type. The first solution makes no sense, as we have to "guess" that these
letters compose in pairs, and we have to count them from the
beginning. If we have about 30 or more country codes aligned (for example the
flags of the countries of the European Union, or NATO, or the Council of
Europe, or participants in an international sports event), it will not work.
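The difference between options (1) and (2) can be sketched in Python with two hypothetical helpers. Without delimiters, pair boundaries can only be recovered by counting from the start of the run, so a process that starts one letter late misreads every following pair; with explicit brackets, each code is delimited locally and may be of any length. (Later versions of UAX #29 do handle regional indicators by this kind of counting, pairing them from the start of each run.)

```python
import re


def pairs_from_start(letters: str) -> list:
    """Split an undelimited run into 2-letter codes, counting from the run's start.
    A trailing unpaired letter is silently dropped."""
    return [letters[i:i + 2] for i in range(0, len(letters) - 1, 2)]


def bracketed_codes(text: str) -> list:
    """Extract the codes from an explicitly delimited form such as '[FR][GF]'."""
    return re.findall(r"\[([^\[\]]+)\]", text)


run = "FRGFGPMQREYTPFWF"
print(pairs_from_start(run))      # ['FR', 'GF', 'GP', 'MQ', 'RE', 'YT', 'PF', 'WF']
print(pairs_from_start(run[1:]))  # ['RG', 'FG', 'PM', 'QR', 'EY', 'TP', 'FW']
print(bracketed_codes("[FR][GF][GP][MQ][RE][YT][PF][WF]"))
print(bracketed_codes("[-IOC-FRA][FR]"))  # ['-IOC-FRA', 'FR']
```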

Note also that for sports events we need more than just country flags (how
would you differentiate England from Scotland, for example, in international
rugby competitions, or an international team using the Olympic white flag?).
You need more codes than just ISO 3166-1, and sequences will be longer than
2 letters.

My opinion is that it's not the job of Unicode to define which codes will
be used (ISO 3166-1 or anything else), just as it's not its job to define
orthographies; even the ISO 3166 standard may be amended later to include
codes longer than two letters (or to accept letters other than
basic Latin). We DO need delimiters encoded in the UCS, for use only with
regional indicators, to create complete clusters that can be correctly
substituted by icons without inserting any other separate clusters,
exactly the same way we can align graphic icons.

As long as this is not possible, documents will still use some
upper-layer rich-text format, with its own parsed syntax and embedding rules,
to reference external images by location (URL) or by name/identifier (URN or
code). Both require a decoder and lexical analyser (for
separating embedded elements), a syntactic parser to differentiate them
by type, and an external resolver to retrieve an associated graphic to
insert into the same output stream as the one used by the plain-text
renderer. The regional indicators were supposed to eliminate this extra
syntax and these components, but that does not work.

In fact it would have been quite enough to encode *only* the two REGIONAL
INDICATOR START/END brackets (used around existing ASCII letters, digits
and punctuation, except whitespace and paired punctuation) to allow
renderers to substitute each fully bracketed sequence with a
graphic icon or glyph. Immediately, we would have coded Scotland with
.

For regions not encoded in ISO 3166-1 it would have been enough to start the
embedded code with "-" followed by some prefix, just like CSS vendor-prefixed
extensions, or to insert a URL directly (starting with "http:" or
"https:", or another URL scheme for local attachments in envelope formats
like MIME) or a URN (starting with "urn:" or "uuid:"). A URN or
URL does not necessarily designate the location of the glyph or icon
data or its format: the remote location accessed by the resolver will
report the appropriate format according to the formats supported by the
client (in HTTP we have "Accept:" headers and MIME media types for that
purpose, independently of the URL used).
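A minimal Python sketch of how a renderer might classify the text found between the START/END brackets, following the categories listed above. The category names, the function name `classify_region_code`, and the choice of `cid:` (the MIME Content-ID URL scheme) as the local-attachment scheme are assumptions for illustration.

```python
from urllib.parse import urlsplit


def classify_region_code(payload: str) -> str:
    """Classify the text between the (hypothetical) START/END brackets.

    Illustrative rules only: a leading "-" marks a private code, "urn:" or
    "uuid:" marks a URN, http/https/cid marks a URL, and two uppercase
    ASCII letters are taken as an ISO 3166-1 code.
    """
    if payload.startswith("-"):
        return "private"
    if payload.lower().startswith(("urn:", "uuid:")):
        return "urn"
    # urlsplit() only reports a scheme when the payload contains "scheme:".
    if urlsplit(payload).scheme.lower() in ("http", "https", "cid"):
        return "url"
    if len(payload) == 2 and payload.isascii() and payload.isalpha() and payload.isupper():
        return "iso3166-1"
    return "unknown"


for p in ["FR", "-rugby-sct", "https://example.org/flag", "urn:example:flag", "cid:flag1@example"]:
    print(p, "->", classify_region_code(p))
```

The resolver would then fetch whatever format the client accepts for a URL or URN; the sketch stops at classification.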




2013/8/5 Christopher Fynn 

> Since the original Japanese emoji contained some country flags, are
> these now being represented by Unicode REGIONAL INDICATOR characters?
> Is there any working implementation where pairs of these characters
> are displayed as flags or some other country indicator?
>


Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Philippe Verdy
As they are encoded, these characters must be interpreted contextually as
the start or the end of a code. This completely breaks the character
model, and these characters will probably never be used as such.

It would have been easier to support them if there were TWO additional
bracketing characters:
- 1F1E4 REGIONAL INDICATOR START # similar to '[' punctuation
- 1F1E5 REGIONAL INDICATOR END # similar to ']' punctuation
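The bracketing idea can be sketched as a small Python tokenizer. U+1F1E4 and U+1F1E5 are only the positions proposed in this message, not encoded characters, and the function name `tokenize` is hypothetical. Regional indicator letters inside the brackets are decoded back to ASCII, and plain ASCII is passed through, so both bracketing variants discussed in this thread are handled.

```python
# Hypothetical code points proposed above; they are NOT encoded in Unicode.
START = "\U0001F1E4"  # REGIONAL INDICATOR START (proposed)
END = "\U0001F1E5"    # REGIONAL INDICATOR END (proposed)
RIS_A, RIS_Z = 0x1F1E6, 0x1F1FF  # existing REGIONAL INDICATOR SYMBOL LETTER A..Z


def tokenize(text: str) -> list:
    """Split text into ("text", s) and ("region", code) tokens.

    Everything between START and END becomes one region token, with
    regional indicator letters decoded to ASCII. START immediately followed
    by END yields an empty code (the empty/white flag). An unterminated
    START is kept as ordinary text.
    """
    tokens = []
    buf = []
    i = 0
    while i < len(text):
        if text[i] == START:
            j = text.find(END, i + 1)
            if j != -1:
                if buf:
                    tokens.append(("text", "".join(buf)))
                    buf = []
                code = "".join(
                    chr(ord(c) - RIS_A + ord("A")) if RIS_A <= ord(c) <= RIS_Z else c
                    for c in text[i + 1:j]
                )
                tokens.append(("region", code))
                i = j + 1
                continue
        buf.append(text[i])
        i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens
```

With explicit delimiters, adjacent codes such as FR and GB come out as separate clusters without counting symbols from the start of the run.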

Then all these characters would have a reasonable default glyph assignable
to them:
- The 26 letters would be drawn enclosed in a box of which only the top and
bottom borders are visible (in the chart, the left and right borders could
be dotted, with a small descending hoist on the left/start side to
suggest the meaning).
- The START/left character would show the vertical hoist in its horizontal
center (and in the chart the right side would add the rest of a
narrow flag with dotted lines).
- The END/right character would show the floating side of the flag in its
center (and in the chart it would add the rest of a narrow flag with
dotted lines).
- The two-character sequence START+END would be valid and could display an
empty/white flag; it could also mean the explicit absence of a region
indicator (i.e. any place on Earth).

To be complete, there should also be a hyphen-minus separator and the
10 decimal digits, to allow all ISO 3166 codes (not just the 2-letter
ISO 3166-1 codes, but also the longer codes assigned to historic
countries), with representative glyphs similar to those for the 26
Basic Latin letters. Here too, START/END are needed to delimit
codes correctly.
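A rough Python sketch of which code shapes the extended repertoire (A-Z, 0-9, HYPHEN-MINUS) could spell. The patterns are simplified assumptions that check form only, not whether a code is actually assigned; the function names are hypothetical.

```python
import re
from typing import Optional

# Simplified shapes of ISO 3166 codes; form only, not actual assignment.
PATTERNS = {
    "ISO 3166-1 alpha-2": re.compile(r"[A-Z]{2}"),        # e.g. FR, IN
    "ISO 3166-1 numeric": re.compile(r"[0-9]{3}"),        # e.g. 250
    "ISO 3166-3": re.compile(r"[A-Z]{4}"),                # former countries, e.g. CSHH
    "ISO 3166-2": re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}"),  # subdivisions, e.g. GB-SCT
}


def code_kind(code: str) -> Optional[str]:
    """Return which ISO 3166 part the code is shaped like, or None."""
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(code):
            return kind
    return None


def needs_extended_repertoire(code: str) -> bool:
    """True if the code uses characters beyond the 26 existing indicator letters."""
    return any(not ("A" <= c <= "Z") for c in code)


for code in ["FR", "250", "CSHH", "GB-SCT", "JP-13"]:
    print(code, code_kind(code), needs_extended_repertoire(code))
```

Only the two-letter and four-letter forms fit in the existing 26 letters; anything with a digit or hyphen needs the additional characters.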

This would mean a total of 13 additional code points. They would all fit in the
two columns 1F1Dx-1F1Ex: the first column used by the 10 digits and the
separator, the second one containing the two START/END delimiters
just before the existing 26 letters. This would still leave 9 unassigned
positions in these two columns (possibly for additional separators needed
for some distinctions, e.g. COLON for versioning with a date). Visual
variants of the flag could be selected on the START delimiter (to show or hide
the hoist, or to represent the flag hanging vertically) and on the END
delimiter (variants of the free floating side instead of the flat
rectangular look, e.g. curved top and bottom borders slightly falling
down, or a flag attached to hoists on both sides).
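The code-point arithmetic can be checked with a few lines of Python. Only the START/END positions (U+1F1E4, U+1F1E5) are stated in the message; placing the digits at U+1F1D0..U+1F1D9 and the hyphen-minus at U+1F1DA, and the character names used, are assumptions.

```python
# Hypothetical layout for the proposal above. START/END positions come from
# the message; digit and hyphen-minus positions and all names are assumed.
proposed = {cp: f"REGIONAL INDICATOR DIGIT {d}" for d, cp in enumerate(range(0x1F1D0, 0x1F1DA))}
proposed[0x1F1DA] = "REGIONAL INDICATOR HYPHEN-MINUS"
proposed[0x1F1E4] = "REGIONAL INDICATOR START"
proposed[0x1F1E5] = "REGIONAL INDICATOR END"

# Positions in the two columns before the existing letter A at U+1F1E6:
# 16 in 1F1Dx plus 6 in 1F1E0..1F1E5.
available = range(0x1F1D0, 0x1F1E6)
free = [cp for cp in available if cp not in proposed]

print(len(proposed))   # 13
print(len(available))  # 22
print(len(free))       # 9
```

So 22 positions minus 13 new characters leaves the 9 spare positions mentioned above.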

The standard ligature system could be used to create fully connected flags, and
sequences would no longer be recognized contextually (the START/END pair is
expected to be present). Implementations would then be free to map
sequences to actual emoji icons (possibly colorful) instead of the default ligatures for
the actual flags, without being limited to contextual ISO 3166-1 pairs.
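A minimal Python sketch of that substitution with fallback, assuming the renderer already has the delimited code. The icon table and file names are placeholders, and square brackets stand in for the default START/END glyphs.

```python
# Hypothetical icon table a font or emoji set might provide for whole
# START...END clusters; the file names are placeholders.
ICONS = {
    "FR": "<flag-fr.png>",
    "IN": "<flag-in.png>",
    "": "<white-flag.png>",  # START+END with nothing inside: the empty flag
}


def render_cluster(code: str, icons: dict) -> str:
    """Render one START...END cluster.

    If an icon exists for the whole code, substitute it (the "ligature").
    Otherwise fall back to the default glyphs: START, one glyph per
    character, END, shown here as square brackets around the code.
    """
    if code in icons:
        return icons[code]
    return "[" + code + "]"


for code in ["FR", "GB-SCT", ""]:
    print(repr(code), "->", render_cluster(code, ICONS))
```

Unknown codes such as GB-SCT still display legibly through the default glyphs instead of being mis-paired.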



For now, the representative glyphs for isolated letters (an enclosing
dotted square) are quite bad: they do not suggest any flag. In my opinion
they should also display the top and bottom horizontal borders with
continuous strokes, instead of the dotted strokes that are used (correctly) on
the left and right vertical sides.

In other words, the existing characters have been defined without even
considering actual usage, and they will probably never be supported in
any font, and therefore not in any renderer. I do think they were
encoded this way precisely to prevent their use (just like the language
tag characters in the special-purpose plane, defined only to be deprecated
immediately because they are unusable...)

Most applications will prefer displaying the 4 ASCII characters "[FR]"
rather than the 2 letter-like regional indicator characters "FR" from this
subblock, even if they can't be automatically converted to emoji (I
wonder why this could not happen, given that emoji applications are mostly
those for interpersonal communication via SMS/MMS or chat, which frequently
convert sequences of ASCII characters like :-) automatically into graphic
icons).
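A minimal Python sketch of the automatic conversion wondered about above, in the same spirit as turning :-) into an icon. It assumes a chat application rewrites only uppercase two-letter codes in square brackets; the function name is hypothetical, and a font with flag glyphs for the resulting pairs is still needed to show an actual flag.

```python
import re

RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def convert_bracketed_codes(text: str) -> str:
    """Replace ASCII "[XY]" with the regional indicator pair for XY.

    Only two uppercase ASCII letters are converted; anything else, such as
    "[GB-SCT]" or "[fr]", is left untouched.
    """
    def to_pair(match: re.Match) -> str:
        return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in match.group(1))

    return re.sub(r"\[([A-Z]{2})\]", to_pair, text)


print(convert_bracketed_codes("Happy Independence Day [IN]!"))
```

Codes that do not fit the two-letter pattern stay as readable ASCII, which is the fallback the message argues most applications would prefer anyway.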

2013/8/5 Christopher Fynn 

> 🇮🇳
>
> http://en.wikipedia.org/wiki/Regional_Indicator_Symbol
>
>


Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Christopher Fynn
🇮🇳

http://en.wikipedia.org/wiki/Regional_Indicator_Symbol


On 05/08/2013, Michael Everson  wrote:
> Pradeep,
>
> The Unicode Consortium and ISO/IEC JTC1/SC2 defined a set of "Regional
> Indicator symbols" (basically a special coded form of the letters A-Z) and
> when two of those come together like CN or GB, the Chinese or UK flag can be
> displayed, if the font vendor supports a single flag glyph for those
> sequences.
>
> In the case of the WhatsApp implementation, the font vendor is Apple.
>
> On 5 Aug 2013, at 05:44, Pradeep Aluru  wrote:
>
>> Hi,
>>
>> I'm not sure if this is the right contact for the request below; if not, I'm
>> sure it will be directed to the right group.
>>
>> From the below email it is understood that you are the creators of
>> emoticons in the WhatsApp application.
>>
>> So, I'm sure you are the right people to help us with this.
>>
>> Unfortunately there is no flag of India available in the emoticons in
>> WhatsApp. Since Indian Independence Day is close, I, on behalf of
>> millions of users from India, would like to request you all to please
>> design and add a Flag of India, which would be a huge help and GREATLY
>> appreciated by millions of users.
>>
>> Looking forward to seeing it happen soon.
>>
>> Thanks in advance,
>> Pradeep.
>
> Michael Everson * http://www.evertype.com/




Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Michael Everson
Pradeep,

The Unicode Consortium and ISO/IEC JTC1/SC2 defined a set of "Regional 
Indicator symbols" (basically a special coded form of the letters A-Z) and when 
two of those come together like CN or GB, the Chinese or UK flag can be 
displayed, if the font vendor supports a single flag glyph for those sequences. 

In the case of the WhatsApp implementation, the font vendor is Apple.

On 5 Aug 2013, at 05:44, Pradeep Aluru  wrote:

> Hi,
> 
> I'm not sure if this is the right contact for the request below; if not, I'm
> sure it will be directed to the right group.
> 
> From the below email it is understood that you are the creators of emoticons 
> in the WhatsApp application.
> 
> So, I'm sure you are the right people to help us with this.
> 
> Unfortunately there is no flag of India available in the emoticons in 
> WhatsApp. Since Indian Independence Day is close, I, on behalf of 
> millions of users from India, would like to request you all to please design 
> and add a Flag of India, which would be a huge help and GREATLY 
> appreciated by millions of users.
> 
> Looking forward to seeing it happen soon.
> 
> Thanks in advance,
> Pradeep.

Michael Everson * http://www.evertype.com/