And, Marcel, while you are at it, this is getting tiresome.
Please find some other place to vent about events you know very little
about; the internet is full of them.
Mark
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Tue, Jun 16, 2015 at 7:33 PM, Doug Ewell
On Mon, Jun 15, 2015 at 9:17 AM, Marcel Schneider charupd...@orange.fr
wrote:
When we take the topic down again from linguistics to the core mission of
Unicode, that is character encoding and text processing standardisation,
ellipsis and Swedish abbreviation colon differ from the single
On Sat, Jun 13, 2015 at 5:10 PM, Peter Constable peter...@microsoft.com
wrote:
When it comes to orthography, the notion of what comprise words of a
language is generally pure convention. That’s because there isn’t any
single *linguistic* definition of word that gives the same answer when
I think the whole thread got overheated, and Andrew was just responding to
other heated comments. So it might be time to let this thread cool off a
bit.
The collaboration over the years between the Unicode Consortium and ISO has
been, on the whole, a remarkable success. There have been
Whoops, sent too soon.
A surprise: http://✈.ws
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Fri, Jun 5, 2015 at 4:47 PM, Mark Davis ☕️ m...@macchiato.com wrote:
One of many on http://unicode.org/press/emoji.html
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Mon, Jun 1, 2015 at 8:23 PM, Karl Williamson pub...@khwilliamson.com
wrote:
Hmmm. How accurate can it be? They forgot Austria, and got Switzerland
wrong by almost a power of 10.
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Wed, May 27, 2015 at 10:18 AM, Denis Jacquerye moy...@gmail.com wrote:
The South China Morning Post published a
and gives a population of 727 000 Standard
German L1 speakers in Switzerland (the difference is counted as Swiss
German L1 speakers).
On Wed, 27 May 2015 at 11:22 Mark Davis ☕️ m...@macchiato.com
wrote:
Hmmm. How accurate can it be? They forgot Austria, and got Switzerland
wrong by almost
A few notes.
A more concrete proposal will be in a PRI to be issued soon, and people
will have a chance to comment more then. (I'm not trying to discourage
discussion, just pointing out that there will be something more concrete
relatively soon to comment on—people are pretty busy getting 8.0
The consortium is in no position to enhance protocols *itself* for
exchanging images. That's firmly in other groups' hands. We can try to
noodge them a bit, but what *will* make a difference is when the *vendors*
of sticker solutions put pressure on the different groups responsible for
the
http://www.washingtonpost.com/blogs/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts/
Thanks!
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Fri, May 8, 2015 at 7:15 AM, Peter Constable peter...@microsoft.com
wrote:
I think this is the right public link:
https://msdn.microsoft.com/en-us/goglobal/bb688099.aspx
*From:* Peter Constable
The simplest approach would be to use ICU in a little program that scans
the file. For example, you could write a little Java program that would
scan the file, and turn any sequence of (\u)+ into a String, then
test that string with:
static final UnicodeSet OK = new
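A minimal sketch of the scanning step only, using the standard library rather than ICU. It assumes the escapes have the form `\uXXXX` with exactly four hex digits; the class and method names are made up for illustration. The UnicodeSet test from the message is cut off above, so it is left out here: the caller would check each decoded string against whatever set is intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ICU.
final class EscapeScanner {
    // One or more consecutive backslash-u escapes (assumed: four hex digits each).
    private static final Pattern ESCAPES =
            Pattern.compile("(?:\\\\u[0-9A-Fa-f]{4})+");

    /** Returns each run of consecutive escapes in the text, decoded to a String. */
    static List<String> decodeEscapeRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = ESCAPES.matcher(text);
        while (m.find()) {
            String run = m.group();
            StringBuilder sb = new StringBuilder();
            // Each escape is 6 chars long: backslash, 'u', four hex digits.
            for (int i = 0; i < run.length(); i += 6) {
                sb.append((char) Integer.parseInt(run.substring(i + 2, i + 6), 16));
            }
            runs.add(sb.toString());
        }
        return runs;
    }
}
```

Each escape is decoded as one UTF-16 code unit, so a surrogate pair written as two escapes ends up as one supplementary character in the resulting String.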
I happened to run across a good example of productive use of combining
marks, the Duden site (a great online dictionary for German). They use
U+0323 ( ̣) COMBINING DOT BELOW to indicate the stress. Here is an
example:
ụnterbuttern
http://www.duden.de/rechtschreibung/unterbuttern
They aren't,
borrowed from French) also have either a
single line
under the whole digraph or (this happens rarely) a single dot in the
middle of the
digraph.
--Jörg Knappen
*Sent:* Thursday, 16 April 2015 at 10:01
*From:* Mark Davis ☕️ m...@macchiato.com
*To:* Unicode Public unicode
It only provides a stand-in glyph if you don't otherwise have a font for
that character on your system. That stand-in just indicates the type of
character (eg script).
No single font with current technology can handle all of Unicode. The most
complete open font set I know of is the Noto family:
Congrats!
{phone}
On Mar 14, 2015 03:09, Roozbeh Pournader rooz...@unicode.org wrote:
Android 5.1
http://officialandroid.blogspot.com/2015/03/android-51-unwrapping-new-lollipop.html,
released earlier this week, has added support for 25 minority scripts. The
wide coverage can be reproduced by
We are being pretty conservative about what we add. There are approximately
1,200 emoji characters now (see tr51), and we're anticipating adding
perhaps 50 per release. And we are encouraging a sticker approach for the
longer term.
On the other hand, I wouldn't be surprised if the 41 emoji
In what character encoding standard, or extension, does ROBOT FACE appear?
Unicode has never been limited to what is in other character encoding
standard or extensions, official or de facto.
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Mon, Feb 9, 2015 at 9:16
On Tue, Feb 10, 2015 at 12:11 AM, Ken Whistler kenwhist...@att.net wrote:
for the full context, and for the current 26x26 letter matrix which is
the basis for the flag glyph implementations of regional indicator
code pairs on smartphones.
SC, SO, ST are already taken, but might I suggest
I apologize in advance that I'm running low on time, and didn't go through
all the messages on this thread carefully. So I may not be fully
appreciating people's positions. I'm just making some quick points about 2
items that caught my eye.
1. There are certainly times where two rules in sequence
On Thu, Dec 18, 2014 at 11:31 AM, Andrea Giammarchi
andrea.giammar...@gmail.com wrote:
standard variant sensitive
It is not clear what you mean by standard variant sensitive. Can you
elaborate?
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
, Mark Davis ☕️
m...@macchiato.com wrote:
On Thu, Dec 18, 2014 at 11:31 AM, Andrea Giammarchi
andrea.giammar...@gmail.com wrote:
standard variant sensitive
It is not clear what you mean by standard variant sensitive. Can you
elaborate?
Mark https://google.com/+MarkDavis
We just had a new blog posting; we've moved the media list out of tr51, and
the list already had that item on it. See:
http://www.unicode.org/press/emoji.html#media
Separately, I keep a list of how the media refers to the Unicode
consortium: my favorite is "shadowy emoji overlords".
Bonus points
On Wed, Dec 17, 2014 at 9:03 PM, Murray Sargent
murr...@exchange.microsoft.com wrote:
http://www.theguardian.com/commentisfree/2014/nov/28/the-problem-with-emojis
Bingo, Murray wins the prize!
[image: Inline image 1]
Not to open until Christmas...
On Mon Nov 17 2014 at 12:15:08 PM Andreas Stötzner a...@signographie.de
wrote:
On 17.11.2014 at 11:46, Leonardo Boiko wrote:
Sign is too general
in its generality it is just perfect. The sets of signs in question are
most general, covering much more matters, objects and topics than the
nothing is.
[1] http://en.wiktionary.org/wiki/sign
2014-11-17 8:09 GMT-02:00 Andreas Stötzner a...@signographie.de:
On 17.11.2014 at 08:35, Mark Davis ☕️ wrote:
IT’S EASY TO DISMISS EMOJI. They are, at first glance, ridiculous
The only ridiculous thing is to name them “Emoji” outside
http://nymag.com/daily/intelligencer/2014/11/emojis-rapid-evolution.html
A more extended article from NY Magazine about the growing usage of emoji,
and the ways in which that usage is developing. Has a quote from Peter
Constable and (indirect) reference to +Steven R. Loomis.
“IT’S EASY TO
As far as I can tell it is garnering interest all over. Several German
publications, including Spiegel, to French and Italian regional papers, to
Indonesian, Vietnamese
http://www.spiegel.de/netzwelt/web/unicode-consortium-emojis-demnaechst-fuer-alle-hautfarben-a-1001125.html
?
Thanks
On Fri, Nov 7, 2014 at 12:18 AM, Mark Davis ☕️ m...@macchiato.com wrote:
Very nice.
I'd have one suggestion. People appear to be converging on similar file
names for the emoji.
- Lowercase hex numbers,
- at least 4 digits,
- otherwise no leading zeros,
- multiple code
As an experiment, we recorded the keynote at the Unicode Conference. I
posted them at
http://macchiati.blogspot.com/2014/11/unicode-emoji.html
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
___
Unicode mailing list
Very nice.
I'd have one suggestion. People appear to be converging on similar file
names for the emoji.
- Lowercase hex numbers,
- at least 4 digits,
- otherwise no leading zeros,
- multiple code points separated by _,
- with optional prefix/suffix.
Like dcm_0030_20e3.png. I'd
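The naming convention in the list above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the prefix and suffix are passed in by the caller.

```java
// Hypothetical helper illustrating the file-name convention.
final class EmojiFileNames {
    /** Builds a name such as dcm_0030_20e3.png from the code points of an emoji. */
    static String fileName(String prefix, String emoji, String suffix) {
        StringBuilder sb = new StringBuilder(prefix);
        boolean first = true;
        for (int i = 0; i < emoji.length(); ) {
            int cp = emoji.codePointAt(i);
            if (!first) {
                sb.append('_'); // multiple code points are separated by _
            }
            // %04x: lowercase hex, padded to at least 4 digits, never truncated.
            sb.append(String.format("%04x", cp));
            first = false;
            i += Character.charCount(cp);
        }
        return sb.append(suffix).toString();
    }
}
```

With this sketch, `fileName("dcm_", "0\u20E3", ".png")` gives `dcm_0030_20e3.png`, and a supplementary character such as U+1F600 comes out as `1f600` with no extra leading zeros.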
On Thu, Oct 23, 2014 at 6:54 PM, Aaron Cannon
cann...@fireantproductions.com wrote:
0061 05AE 0305 0300 0315 0062
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cu0061+%5Cu05AE+%5Cu0305+%5Cu0300+%5Cu0315+%5Cu0062g=ccc
0305 and 0300 have the same ccc, so the first one blocks the
I'm looking for freely downloadable TTF fonts for any of the following.
I'd appreciate links to sites for any of these:
1. Bassa_Vah
2. Duployan
3. Grantha
4. Khojki
5. Khudawadi
6. Mahajani
7. Mende_Kikakui
8. Modi
9. Mro
10. Nabataean
11. Old_Permic
12.
I agree that we should minute at least some reason for declining. It need
only be a sentence or two.
(BTW I wasn't at that discussion.)
{phone}
On Sep 20, 2014 3:17 AM, Asmus Freytag asm...@ix.netcom.com wrote:
On 9/19/2014 5:38 PM, Whistler, Ken wrote:
Michael,
Declines to take action”
Cool, congratulations!
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Thu, Aug 14, 2014 at 3:52 PM, Peter Constable peter...@microsoft.com
wrote:
For those interested, there is an update for Windows available now to
add font, keyboard and locale data support
These variation selector characters only apply to specific characters,
those listed in
http://unicode.org/Public/UNIDATA/StandardizedVariants.html
There is a machine-readable version at
http://unicode.org/Public/UNIDATA/StandardizedVariants.txt
Mark https://google.com/+MarkDavis
*— Il meglio
I haven't done any analysis, but on first glance it looks like it is based
on
http://www.unicode.org/reports/tr31/#Alternative_Identifier_Syntax
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Thu, Jun 5, 2014 at 5:46 PM, Jeff Senn s...@maya.com wrote:
Has
Apparently you can use emoji in the identifiers.
(
http://www.globalnerdy.com/2014/06/03/swift-fun-fact-1-you-can-use-emoji-characters-in-variable-constant-function-and-class-names/
)
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Wed, Jun 4, 2014 at 11:28 AM,
On Mon, Jun 2, 2014 at 10:32 PM, David Starner prosfil...@gmail.com wrote:
Why? It seems you're changing the rules
...
This isn't "are changing", it is "has changed". The Corrigendum was issued
at the start of 2013, about 16 months ago; applicable to all relevant
earlier versions. It was the
On Tue, Jun 3, 2014 at 9:41 AM, David Starner prosfil...@gmail.com wrote:
Thinking that a utility would never mangle them if encountered in
input text was a pipe-dream.
I didn't say "not mangle", I said "break", as in "crash".
I don't think this thread is going anywhere productive, so I'm
\uD808\uDF45 specifies a sequence of two codepoints.
That is simply incorrect.
In Java (and similar environments), \u means a char (a UTF-16 code
unit), not a code point. Here is the difference. If you are not used to
Java, string.replaceAll(x,y) uses Java's regex to replace the pattern x
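A small sketch of the char versus code point distinction described here, using only standard `String` methods. It is not the example from the original message, which is cut off above; the class name is made up.

```java
// Hypothetical demo class: one supplementary code point, two UTF-16 code units.
final class CharVsCodePoint {
    // Written as two escapes, i.e. two chars that form a surrogate pair.
    static final String S = "\uD808\uDF45";

    /** Number of chars (UTF-16 code units). */
    static int codeUnits() {
        return S.length();
    }

    /** Number of code points; a well-formed surrogate pair counts once. */
    static int codePoints() {
        return S.codePointCount(0, S.length());
    }

    /** The single code point that the pair encodes. */
    static int firstCodePoint() {
        return S.codePointAt(0);
    }
}
```

Here `codeUnits()` is 2, `codePoints()` is 1, and `firstCodePoint()` is 0x12345.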
The problem is where to draw the line. In today's world, what's an app? You
may have a cooperating system of apps, where it is perfectly reasonable
to interchange sentinel values (for example).
I agree with Markus; I think the FAQ is pretty clear. (And if not, that's
where we should make it
On Mon, Jun 2, 2014 at 6:21 PM, Shawn Steele shawn.ste...@microsoft.com
wrote:
The “problem” is now that previously these characters were illegal
The problem was that we were inconsistent in standard and related material
about just what the status was for these things.
Mark
. Any app where input of noncharacters causes
security problems or crashes is and was not a very good app.
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Mon, Jun 2, 2014 at 6:37 PM, Asmus Freytag asm...@ix.netcom.com wrote:
On 6/2/2014 9:27 AM, Mark Davis ☕️ wrote
I think you have a point here. We should probably change to:
To meet this requirement, an implementation shall supply a mechanism for
specifying any Unicode scalar value (from U+0000 to U+D7FF and U+E000 to
U+10FFFF), using the hexadecimal code point representation.
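As a rough illustration, a Unicode scalar value is any code point except a surrogate, that is U+0000 to U+D7FF or U+E000 to U+10FFFF. The check below is a sketch with a made-up class name, not text from the specification.

```java
// Hypothetical helper: range check for Unicode scalar values.
final class ScalarValues {
    /** True for U+0000..U+D7FF and U+E000..U+10FFFF (everything but surrogates). */
    static boolean isScalarValue(int cp) {
        return (cp >= 0x0000 && cp <= 0xD7FF) || (cp >= 0xE000 && cp <= 0x10FFFF);
    }
}
```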
and then in the notes say
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Fri, May 30, 2014 at 12:39 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
I am a little confused by the call for a review of UTS #39, Unicode
Security Mechanisms (PRI #273). Are we being requested to
A few quick items. (I admit to only skimming your response, Philippe; there
is only so much time in the day.)
Any discussion of changing non-characters is really pointless. See
http://www.unicode.org/policies/property_value_stability_table.html
As to breaking up the block, that is not forbidden:
They are defined in http://unicode.org/reports/tr35/tr35.html#Unicode_Sets.
We should add a pointer to that; could you please file a feedback report
for #18 to that effect?
Also, if you find any problems in the description in #35, you can file a
ticket at http://unicode.org/cldr/trac/newticket to
On 25 April 2014 20:53, Karl Williamson pub...@khwilliamson.com wrote:
And in fact in some Unicode releases, they contained errors.
I think you know this, but for others.
A derived property value in the UCD is defined by the value in the derived
data file, NOT by the derivation. Of course,
We try not to do that. There are some known holes, like RBNF. If you know
of others please file a ticket.
{phone}
On Apr 21, 2014 9:18 PM, Doug Ewell d...@ewellic.org wrote:
From: Asmus Freytag asmusf at ix dot netcom dot com wrote:
In general, I heartily dislike specifications that just
On 15 April 2014 13:14, William_J_G Overington wjgo_10...@btinternet.com wrote:
If the UTC (Unicode Technical Committee) accepts the introduction of
read-out labels, each read-out label both linked to a pictograph character
and also linked to a language-localization text string, then that will
This is really off topic. If you want to start up a thread about this,
please use a different subject.
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On 14 April 2014 16:01, William_J_G Overington wjgo_10...@btinternet.com wrote:
Here are two examples each of a
On 12 April 2014 11:46, William_J_G Overington wjgo_10...@btinternet.com wrote:
...
In March 2014 I published the attached document, depositing a copy with the
British Library.
The_format_of_the_translit.dat_file_suggested_for_possible_use_for_transliteration.pdf
Is this format suitable to
On 12 April 2014 16:54, William_J_G Overington wjgo_10...@btinternet.com wrote:
Would it be good, for an emoji that is not encoded in regular Unicode, to
include mention of the possibility of transmission by markup bubble,
rendered upon reception as an unmapped glyph by an OpenType colour font?
I tend to agree with Roozbeh and Behdad. I would expect to find the visible
appearance of the hyphen replacing the letters that were broken off from
the last word. That is, if the word was beekeeper, I'd expect to see:
bee- .
That would be no matter where the word occurred, and no
More emoji from Chrome:
http://chrome.blogspot.ch/2014/04/a-faster-mobiler-web-with-emoji.html
with video: https://www.youtube.com/watch?v=G3NXNnoGr3Y
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
Yup!
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On 1 April 2014 09:13, Philippe Verdy verd...@wanadoo.fr wrote:
April 1st joke...
2014-04-01 9:01 GMT+02:00 Mark Davis ☕️ m...@macchiato.com:
More emoji from Chrome:
http://chrome.blogspot.ch/2014/04
They do have aliases in NameAliases.txt
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;abbreviation
...
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Wed, Mar 12, 2014 at 1:32
Not sure about your exact case, but ICU's normalization does handle those
characters.
http://unicode.org/cldr/utility/transform.jsp?a=nfc%3Bhexb=%5Cu30B9%5Cu3099
(That tool uses ICU for NFC).
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Tue, Mar 11, 2014 at
Unicode is not anti-Serbian or Macedonian.
The exact level of Unicode support will depend on your operating system and
font choice. For example, on the Mac there are reasonable results with
arbitrary
accents. Here are examples with q,U+0308 and Q,U+0308
q̈
Q̈
Here is an image, in case your
Boy, I'd forgotten about those. There is an open-source collection of IDSs
that I used to create those files. Unfortunately, I found that *that* data
would take a lot of cleanup.
I do agree that it would be very useful to have an open-source repository
of IDSs for Unicode characters, but I don't
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0077056
with a popular article at
http://www.washingtonpost.com/blogs/worldviews/wp/2013/12/04/how-the-internet-is-killing-the-worlds-languages/
The source article was interesting, although I'd take issue with some of
their
These are two well-known serious flaws in EAI and URLs; there is no useful
syntactic limit on what is in the query part of a URL or on the local part
of an email address that would allow their boundaries to be detected in
plaintext.
No use complaining about them, because people are concerned with
Mark Davis ☕ m...@macchiato.com
These are two well-known serious flaws in EAI and URLs; there is no
useful syntactic limit on what is in the query part of a URL or on the
local part of an email address that would allow their boundaries to be
detected in plaintext.
No use complaining about them
è l’inimico del bene —*
On Tue, Oct 15, 2013 at 8:53 PM, Mark Davis ☕ m...@macchiato.com wrote:
but as Michel mentioned the data
does not seem consistent in that case.
You might add that to your report...
Mark https://plus.google.com/114199149796022210033
*— Il meglio è
Normally the term ASCII just refers to the 7-bit form. What is sometimes
called 8-bit ASCII is the same as ISO Latin 1. If you want to be
completely clear, you can say 7-bit ASCII.
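For illustration, the two ranges can be told apart with a simple check on code unit values. This is a sketch with hypothetical names; it looks only at numeric ranges and does no encoding conversion.

```java
// Hypothetical helper: range checks on the chars of a String.
final class AsciiCheck {
    /** True if every char is in the 7-bit ASCII range U+0000..U+007F. */
    static boolean isSevenBitAscii(String s) {
        return s.chars().allMatch(c -> c <= 0x7F);
    }

    /** True if every char is in the ISO Latin 1 range U+0000..U+00FF. */
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }
}
```

A string such as "café" fails the first check but passes the second.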
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Tue, Oct 29,
/2013 12:40 AM, Mark Davis ☕ wrote:
For the confusables, the presumption is that implementations have
already either normalized the input to NFKC or have rejected input that
is not NFKC.
Thanks for the explanation Mark. It makes sense for implementations
which want to detect confusability
For the confusables, the presumption is that implementations have already
either normalized the input to NFKC or have rejected input that is not
NFKC.
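A minimal sketch of that precondition, using `java.text.Normalizer` from the standard library. The class and method names are made up, and whether to normalize or to reject is left to the implementation.

```java
import java.text.Normalizer;

// Hypothetical gate applied before any confusable check.
final class NfkcGate {
    /** True if the input is already in NFKC. */
    static boolean isNfkc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFKC);
    }

    /** Normalizes the input to NFKC, for implementations that prefer not to reject. */
    static String toNfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }
}
```

For example, a string that starts with the ligature U+FB01 is not in NFKC; normalizing it replaces the ligature with the two letters "fi".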
More broadly, in gathering data the main emphasis is on characters that fit
the profile in
http://www.unicode.org/faq/char_combmark.html#9 and following.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Sat, Sep 21, 2013 at 7:38 PM, Robert Wheelock rwhlk...@gmail.com wrote:
Hello again, y’all!
I’ve got quite a few characters
Nicely stated.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Thu, Sep 19, 2013 at 11:21 PM, Whistler, Ken ken.whist...@sap.com wrote:
Stephan Stiller seems unconvinced by the various attempts to explain the
situation. Perhaps an
Thanks for the feedback; the typo is fixed.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Fri, Sep 13, 2013 at 1:19 AM, Philippe Verdy verd...@wanadoo.fr wrote:
Typo in section 2.3 Number Symbols, for the new item
superscriptingExponent
Classical Greek might qualify [for a CLDR entry]
It certainly qualifies, but we require that a submitter commit to
collecting a minimal amount of data before we add it. See
http://cldr.unicode.org/index/cldr-spec/minimaldata
Mark https://plus.google.com/114199149796022210033
*— Il meglio è
Great news, and well deserved!
Congratulations, Behdad!
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Mon, Jul 29, 2013 at 9:41 PM, Roozbeh Pournader rooz...@google.com wrote:
Some of you probably have heard the news already, but in case
Popping up a level.
ICU (and some other libraries) have heuristic encoding detection, that will
take a sequence of bytes and come up with a likely encoding id.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Fri, Jul 19, 2013 at 8:40 PM,
Saw that, thanks!
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Wed, May 8, 2013 at 8:26 PM, Tim Greenwood timo...@greenwood.name wrote:
http://xkcd.com/1209/
LOL...
{phone}
On Apr 20, 2013 8:44 PM, Erkki I Kolehmainen e...@iki.fi wrote:
Mr. Overington,
I'm sorry to have to admit that I cannot follow at all your train of
thought on what would be the practical value of localizable sentences in
any of the forms that you are contemplating. In my
Should the Unicode Consortium decide to recommend an existing (or new)
character as a raised decimal for numbers, we would add that to CLDR, and
recommend that implementations accept either one as equivalent when parsing.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è
I think just the main data is converted. If you want to request the other
data you can file a cldr ticket.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Sat, Mar 2, 2013 at 8:35 PM, Edwin Hoogerbeets ehoogerbe...@gmail.com wrote:
Hi all, I
But still non-conformant.
That's incorrect.
The point I was making above is that in order to say that something is
non-conformant, you have to be very clear what it is non-conformant *TO*
.
Also, we commonly read code points from 16-bit Unicode strings, and
unpaired surrogates are returned
That is not the typical way that Unicode text is processed.
Typically whatever OS you are using will supply mechanisms for iterating
through any Unicode string, returning each of the code points. It may also
offer APIs for returning information about each character (called 'property
values', or
That's not the point (see successive messages).
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Mon, Jan 7, 2013 at 4:59 PM, Martin J. Dürst due...@it.aoyama.ac.jp wrote:
On 2013/01/08 3:27, Markus Scherer wrote:
Also, we commonly read code
In practice and by design, treating isolated surrogates the same as
reserved code points in processing, and then cleaning up on conversion to
UTFs works just fine. It is a tradeoff that is up to the implementation.
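One way to picture this tradeoff in Java is sketched below. The names are hypothetical, and replacing unpaired surrogates with U+FFFD is just one possible cleanup policy, assumed here for illustration.

```java
// Hypothetical sketch: tolerate isolated surrogates while processing,
// clean them up before converting to a UTF.
final class SurrogateHandling {
    /** Code points of the string; an unpaired surrogate is returned as its own value. */
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    /** Cleanup step: replaces each unpaired surrogate with U+FFFD (assumed policy). */
    static String replaceUnpaired(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
                sb.appendCodePoint(cp >= 0xD800 && cp <= 0xDFFF ? 0xFFFD : cp));
        return sb.toString();
    }
}
```

Well-formed surrogate pairs are combined by `codePoints()` into supplementary code points, so only isolated surrogates fall in the D800..DFFF range and get replaced.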
It has nothing to do with a legacy of C pointer arithmetic. It does
represent a
Some of this is simply historical: had Unicode been designed from the start
with 8 and 16 bit forms in mind, some of this could be avoided. But that is
water long under the bridge. Here is a simple example of why we have both
UTFs and Unicode Strings.
Java uses Unicode 16-bit Strings. The
There are many cases of such digraphs.
Example from Slovak:
c d h
but
cd h ch
Cf http://www.unicode.org/reports/tr10/, searching for Slovak.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Sun, Jan 6, 2013 at 1:56 PM, Costello, Roger L.
http://www.unicode.org/alloc/CurrentAllocaiton.html
=
http://www.unicode.org/alloc/CurrentAllocation.html
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Fri, Jan 4, 2013 at 10:24 AM, Whistler, Ken ken.whist...@sap.com wrote:
Stephan Stiller
To assess whether a string is invalid, it all depends on what the string is
supposed to be.
1. As Ken says, if a string is supposed to be in a given encoding form
(UTF), but it consists of an ill-formed sequence of code units for that
encoding form, it would be invalid. So an isolated surrogate
.
-Shawn
-Original Message-
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
Behalf Of James Cloos
Sent: Tuesday, January 1, 2013 5:43 PM
To: Mark Davis ☕
Cc: Whistler, Ken; unicode@unicode.org
Subject: Re: locale-aware string comparisons
MD == Mark Davis ☕ m
3. Regarding LDML and CLDR, somebody with specific expertise on CLDR
James,
Even without locale differences, the situation is a bit tricky. Assuming
that str_tolower() and str_toupper() were straightforwardly defined in
terms of the (full) Unicode case mappings, there is still the issue that
the
There are different use cases, and I think they are getting confused.
1. Present a name for each character, some sort of formal name.
I think this is probably the least useful for average users.
2. Allow searching for characters, eg in a character picker.
Sample use case: search for dash (or the
I have a new google blog post about the new ECMAScript (JavaScript)
internationalization spec.
“Until now, it has been very difficult for web application designers to do
something as simple as sort names correctly according to the user's
language. And it matters: English readers wouldn’t expect
0300 *is* blocked, because there is a preceding character (0305) that has
the same combining class (230).
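The effect of blocking can be observed with `java.text.Normalizer`. In this sketch (the class name is made up), the sequence 0061 05AE 0305 0300 0315 0062 is left unchanged by NFC because 0305 blocks 0300 from combining with the base letter. If 0300 comes before 0305 instead, it does compose with the letter a.

```java
import java.text.Normalizer;

// Hypothetical demo of blocked versus unblocked canonical composition.
final class BlockedComposition {
    /** Applies NFC using the standard library normalizer. */
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }
}
```

So `nfc("a\u05AE\u0305\u0300\u0315b")` returns its input unchanged, while `nfc("a\u05AE\u0300\u0305")` returns U+00E0 followed by U+05AE and U+0305.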
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Mon, Dec 10, 2012 at 11:55 AM, Edwin Hoogerbeets
ehoogerbe...@gmail.com wrote:
Looking at
Their inference, it appears, is that had I not read Tolkien when I was 13
I would not be who I am today and the content of the Universal Character
Set might be a lot different than it is.
I doubt it.
Many people are far more responsible for the structure, model, properties,
and characters of
I agree with that analysis.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Mon, Nov 26, 2012 at 1:53 PM, Whistler, Ken ken.whist...@sap.com wrote:
Actually, I think the omission here is the word canonical. In other
words, Section 16.4
This case remains very infrequent: it is extremely rare to start typing
text in
With arrow keys or mouse clicking it is more frequent to end up on a
directional boundary.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Mon, Nov 12, 2012 at
I tend to agree. What would be useful is to have one column for the city in
the local language (or more columns for multilingual cities), but it is
extremely useful to have an ASCII version as well.
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
Eg, in http://www.unece.org/fileadmin/DAM/cefact/locode/gr.htm
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Tue, Oct 2, 2012 at 1:49 PM, Mark Davis ☕ m...@macchiato.com wrote:
I tend to agree. What would be useful is to have one column
://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Tue, Oct 2, 2012 at 2:52 PM, Mark Davis ☕ m...@macchiato.com wrote:
Eg, in http://www.unece.org/fileadmin/DAM/cefact/locode/gr.htm
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico
BTW, if you want to share the announcement:
- Google+:
https://plus.sandbox.google.com/u/0/109412260435993059737/posts (I also
reposted with my personal account,
https://plus.google.com/114199149796022210033.)
- Facebook: