Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Joan Montané via Unicode
2018-06-04 21:49 GMT+02:00 Manish Goregaokar via Unicode <
unicode@unicode.org>:

> Hi,
>
> The Rust community is considering
> <https://github.com/rust-lang/rfcs/pull/2457> adding non-ascii
> identifiers, which follow UAX #31 <http://www.unicode.org/reports/tr31/>
> (XID_Start XID_Continue*, with tweaks). The proposal also asks for
> identifiers to be treated as equivalent under NFKC.
>
> Are there any cases where this will lead to inconsistencies? I.e. can the
> NFKC of a valid UAX 31 ident be invalid UAX 31?
>

Yes, such cases exist, for instance in the Latin alphabet and the Catalan language.

* Ŀ, LATIN CAPITAL LETTER L WITH MIDDLE DOT (U+013F): NFKC decomposes it to
LATIN CAPITAL LETTER L (U+004C) + MIDDLE DOT (U+00B7): L·
* ŀ, LATIN SMALL LETTER L WITH MIDDLE DOT (U+0140): NFKC decomposes it to LATIN
SMALL LETTER L (U+006C) + MIDDLE DOT (U+00B7): l·

Ŀ and ŀ are (were) used in Catalan to encode the geminate L [1]
when it is (was) encoded using only 2 characters. The preferred (and commonly
used) encoding is currently the 3-character one: l·l. So some adjustments
are needed if you want to support Catalan-language identifiers [2].
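
The decomposition described above can be checked with a short sketch using
Python's standard unicodedata module. The helper name nfkc_report is made up
for illustration, and str.isidentifier() applies Python's own identifier rule,
used here only as a stand-in; it is not necessarily the UAX #31 profile that
Rust would adopt.

```python
import unicodedata


def nfkc_report(ch):
    """Return (NFKC form of ch, whether Python accepts that form as an identifier).

    nfkc_report is a hypothetical helper; isidentifier() reflects Python's
    rule, which is only an approximation of a UAX #31 profile.
    """
    normalized = unicodedata.normalize("NFKC", ch)
    return normalized, normalized.isidentifier()


for ch in ("\u013F", "\u0140"):
    normalized, accepted = nfkc_report(ch)
    # Show the code points that NFKC produces for each precomposed letter.
    points = " ".join(f"U+{ord(c):04X}" for c in normalized)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {points} "
          f"(isidentifier: {accepted})")
```

Whether the normalized form still counts as an identifier depends on how a
given implementation treats U+00B7, so the second value is worth checking
against the actual profile in use.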

Yours,
Joan Montané


[1] https://en.wikipedia.org/wiki/Interpunct#Catalan
[2] http://www.unicode.org/reports/tr31/#Specific_Character_Adjustments


Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Joan Montané
2017-03-28 7:57 GMT+02:00 Mark Davis ☕️ :

> To add to what Ken and Markus said: like many other identifiers, there are
> a number of different categories.
>
>    1. *Ill-formed:* "$1"
>    2. *Well-formed, but not valid:* "usx". Is *syntactic* according to
>    http://unicode.org/reports/tr51/proposed.html#def_emoji_tag_sequence,
>    but is not *valid* according to
>    http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences.
>    3. *Valid, but not recommended:* "usca". Corresponds to the valid
>    Unicode subdivision code for California according to
>    http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
>    and CLDR, but is not listed in http://unicode.org/Public/emoji/5.0/.
>    4. *Recommended:* "gbsct". Corresponds to the valid Unicode
>    subdivision code for Scotland, and *is* listed in
>    http://unicode.org/Public/emoji/5.0/.
>
>  As Ken says, the terminology is a little bit in flux for the term
> 'recommended'. TR51 is still open for comment, although we won't make any
> changes that would invalidate http://unicode.org/Public/emoji/5.0/.
>

Just two remarks

1st one: point 4 (Unicode subdivision codes listed on the Unicode emoji site)
raises something like a chicken-and-egg problem. Vendors don't easily add new
subdivision flags (because they aren't recommended), and Unicode doesn't
recommend new subdivision flags (because they aren't supported by vendors).

2nd one: What about "Adopt a Character" (AKA "Adopt an emoji")? Will
valid, but not recommended, Unicode subdivision codes be eligible? For
instance, could someone adopt the California, Texas, Pomerania, or
Catalonia flags?
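
For reference, a subdivision flag such as "gbsct" is written as an emoji tag
sequence: U+1F3F4 WAVING BLACK FLAG, then one tag character per letter or
digit of the code, then U+E007F CANCEL TAG. A minimal sketch follows; the
function name subdivision_flag is made up for illustration, and it only checks
that the code is syntactically usable (lowercase ASCII letters and digits),
not that it is valid or recommended in the sense of the list quoted above.

```python
BLACK_FLAG = "\U0001F3F4"   # WAVING BLACK FLAG, the base of the sequence
TAG_BASE = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"   # terminates the tag sequence


def subdivision_flag(code):
    """Build the emoji tag sequence for a subdivision code such as 'gbsct'.

    Hypothetical helper: it checks syntax only, not validity against CLDR
    or the list of recommended sequences.
    """
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"not usable as a tag spec: {code!r}")
    # Tag specs use lowercase letters and digits, so normalize the case.
    tags = "".join(chr(TAG_BASE + ord(c)) for c in code.lower())
    return BLACK_FLAG + tags + CANCEL_TAG


for code in ("gbsct", "usca"):
    sequence = subdivision_flag(code)
    print(code, " ".join(f"U+{ord(c):04X}" for c in sequence))
```

Both "gbsct" and "usca" produce a sequence of the same shape; whether a flag
is actually displayed depends entirely on vendor support.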


Regards,
Joan Montané


Re: About cultural/languages communities flags

2015-02-10 Thread Joan Montané
2015-02-10 17:16 GMT+01:00 Doug Ewell :

>
> In order to make a system like this work with an arbitrary number of
> symbols, a terminating symbol would have to be defined. Finding the
> longest match between a string of symbols and a TLD wouldn't work;
> someone might really want to encode "Brazil, United States, Sweden,
> Lesotho" consecutively, and would not want this converted to "Brussels."
>
> And as Ken pointed out, TLDs are TLDs; they are not a general-purpose
> geographic coding system. They don't include every sub-national region
> or separatist group, only the ones that Donuts and similar companies
> chose to register. There's no TLD for Abkhazia, for example, or for
> ISIS.
>
>
Well, my proposal to use GeoTLDs is an answer to the question "where do
you draw the line?"

I agree a terminating symbol would help in expanding the RIS system.
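
The ambiguity Doug describes can be reproduced with a small sketch. Regional
indicator symbols are U+1F1E6..U+1F1FF, one per letter A..Z; the helper name
to_ris is made up for illustration.

```python
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def to_ris(letters):
    """Map ASCII letters to regional indicator symbols (hypothetical helper)."""
    if not (letters.isascii() and letters.isalpha()):
        raise ValueError(f"expected ASCII letters only: {letters!r}")
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in letters.upper())


# Four country flags typed one after another...
as_flags = "".join(to_ris(code) for code in ("BR", "US", "SE", "LS"))
# ...and a hypothetical 8-letter TLD-style code spelled with the same symbols.
as_word = to_ris("BRUSSELS")

print(as_flags == as_word)  # prints True: the two strings are identical
```

Without a terminating symbol (or some other delimiter), nothing in the
character sequence itself distinguishes the two readings.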


> > IMHO staying tied to 2-alpha codes is a poor choice for users. Maybe
> > industry manufacturers could find a better approach.
>
> Let's hope that industry manufacturers adhere to the standard instead of
> going off on their own. I thought that was the idea when all these
> cell-phone symbols were added to Unicode in the first place.
>
>
I fully agree. Manufacturers must follow standards. I support the
standard, but IMHO the RIS design is very strict.

Unicode doesn't define flags.
Unicode doesn't define country flags.
Unicode defines a mechanism to define ISO country (and dependent
territory) flags.

But manufacturers don't follow ISO country codes 100%; for instance,
dependent-territory codes are usually mapped to the country flag [1]. This is
a choice made by industry manufacturers; it's not in ISO.

Another choice made by industry is using a private code, like XK for
Kosovo. That's good!

The issue with the flags of Scotland, Wales, Catalonia and similar is a
chicken-and-egg situation. If a manufacturer wants to add such flags, the
standard doesn't allow it! (The PUA can be used, of course.) And Unicode
doesn't expand RIS because manufacturers don't use these flags.

IMHO the RIS mechanism should be expanded to be more flexible, beyond
2-character RIS sequences. Unicode doesn't define flags, it defines a
mechanism. Manufacturers will choose which flags to support, just like they
are doing now!

So, the real question here is: where do you draw the line?

Currently it's drawn at ISO 3166-1 plus some customizations made by industry,
but it's always tied to 2-character RIS sequences. IMHO this is too limited to
cover real-world use and requests.

I suggested using the current ISO country codes plus cultural/language TLDs.
Maybe there is a better approach.

Best regards,
Joan Montané



[1] https://github.com/googlei18n/region-flags/blob/master/ALIASES
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


Re: About cultural/languages communities flags

2015-02-09 Thread Joan Montané
Thanks for your replies,


As far as I can see, my informal request to expand the current RIS design
hasn't had a good response. I understand that. Flags are a cause of disputes,
and it isn't Unicode's job to encode them.

IMHO staying tied to 2-alpha codes is a poor choice for users. Maybe
industry manufacturers could find a better approach.

Best regards,
Joan Montané


Re: About cultural/languages communities flags

2015-02-09 Thread Joan Montané
Hi all,

I am the one who made the request on the twemoji GitHub.


2015-02-09 20:16 GMT+01:00 Markus Scherer :

> On Mon, Feb 9, 2015 at 9:54 AM, Andrea Giammarchi <
> andrea.giammar...@gmail.com> wrote:
>
>> > if a cultural/language TLD is typed with Unicode RIS, then show the
>> flag for these culture/language:
>>
>
> This does not work. The "Unicode RIS" are defined to be used in pairs,
> with semantics according to corresponding ISO 3166 alpha2 codes. In your
> examples, each successive pair will encode a flag.
>
>
AFAIK, this is done on the font side. Emoji flags are just ligatures, so a font
can provide a ligature for 4 RIS characters. This is not an issue here.

I agree some strange behaviour can appear if a 3-RIS string, say CAT, is
shown on a system with only 2-RIS support (a Canadian flag will appear,
followed by a T).
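
The CAT case can be sketched as follows, assuming a renderer that simply
groups regional indicators into pairs from left to right, as Markus describes.
The function name pair_up is made up for illustration; real shaping engines
and fonts may behave differently.

```python
RIS_FIRST, RIS_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def pair_up(text):
    """Group regional indicators into successive pairs, left to right.

    Hypothetical model of a pair-only system: each full pair is read as an
    ISO 3166 alpha-2 code, and a leftover symbol stays on its own.
    """
    letters = [
        chr(ord(c) - RIS_FIRST + ord("A"))
        for c in text
        if RIS_FIRST <= ord(c) <= RIS_LAST
    ]
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]


# Regional indicators C, A, T typed as one 3-symbol string.
cat = "\U0001F1E8\U0001F1E6\U0001F1F9"
print(pair_up(cat))  # prints ['CA', 'T']
```

A font with a 3-symbol ligature could still show a single flag for the whole
string, which is why the fallback only matters on systems without that
ligature.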


> If you want to represent every flag of every locality, you first have to
> figure out how to catalog and label them. You are mentioning provinces, one
> level down from nation states; I guess there are thousands of them. In much
> of Europe, every little village <http://de.wikipedia.org/wiki/Butterstadt>
> has its own flag and coat of arms. Where do you want the text encoding and
> fonts to stop?
>
>
I'm not requesting support for every flag in the world. I requested flags
for culture/language communities *with* an approved TLD (Top Level Domain).

I know flags are an issue, and I know flags represent territories, not
languages, but I think some support should be provided for these active
communities. As I pointed out, some country flag collections include a few
non-independent countries. See [1], [2] and [3] (search for the Scottish or
Welsh flag). You can also check this petition [4] requesting a Catalan flag on
WhatsApp.

So, there is demand, and these flags are used in the real world. What would be
the way to encode them in the Unicode Standard?

Thanks,

Joan Montané

[1] http://www.famfamfam.com/lab/icons/flags/
[2] https://www.gosquared.com/resources/flag-icons/
[3] http://www.sherv.net/flag-emoticons.html
[4]
https://www.change.org/p/whatsapp-inc-incloure-la-senyera-de-catalunya-a-whatsapp


Re: About cultural/languages communities flags

2015-02-09 Thread Joan Montané
Sorry, my reply was sent as CC: to the Unicode ML,

My apologies,

Joan Montané
