Re: Tag characters and in-line graphics (from Tag characters)

2015-06-01 Thread Philippe Verdy
2015-06-01 1:33 GMT+02:00 Chris :

>
> Of course, anyone can invent a character set. The difficult bit is having
> a standard way of combining custom character sets. That’s why a standard
> would be useful.
>
> And while stuff like this can, to some extent, be recognised by magic
> numbers, and unique strings in headers, such things are unreliable. Just
> because example.net/mycharset/ appears near the start of a document,
> doesn’t necessarily mean it was meant to define a character set. Maybe it
> was a document discussing character sets.
>

That's not what I described. I spoke about using a MIME-compatible private
charset identifier, and how such a private identifier can be made
reasonably unique by binding it to a domain name or URI.

If you had read more carefully, you would have seen that I also said it is
absolutely not necessary to dereference that URL: many XML schemas bind their
namespaces to a URI which is itself neither a webpage nor a downloadable
DTD, XML schema or XML stylesheet. Google and Microsoft do this a
lot in their schemas (which are not described or documented at that URL,
if they are documented at all).

The URI by itself is just an identifier; it becomes a webpage only when you
use it in a web page with an href attribute to create a hyperlink, or to
perform some query to a service returning some data. An identifier for a
private charset does not need any request to be performed to be usable:
the identifier is sufficient by itself. The URI
can also be just a base URI for a collection of resources (whose URLs start
with this base URI, with conventional extensions appended to get the
character properties, or a font). But the best way is to embed this data in
your document, in some header or footer, if your document using the private
charset is not part of a collection of docs using the same private charset.

In that case, you don't need a new UTF: UTF-8 remains usable and you can
map your private charset to standard PUAs (and/or to "hacked" characters)
according to the private charset's needs. The charset indicated in your
document (by some meta header) should be sufficient to avoid collisions
with other private conventions: it defines the scope of your private
charset as the document itself, which is then interchangeable. It is
possibly also mixable with other documents, with some renumbering if there are
collisions of assignments between two distinct private charsets: in the
document header, add to the charset identifier the range of PUAs which is
used; then, with two documents colliding on this range, you can reencode one
automatically by creating a compound charset with subranges of PUAs
remapped to other ranges.
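
The renumbering described above can be sketched as follows. This is only an
illustration under stated assumptions: the PrivateCharset record, the helper
names and the second identifier http://example.org/othercharset/ are
hypothetical and do not come from any existing standard or library. Only the
BMP Private Use Area range U+E000..U+F8FF is taken from Unicode.

```python
from dataclasses import dataclass

# Basic Multilingual Plane Private Use Area: U+E000..U+F8FF.
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF


@dataclass(frozen=True)
class PrivateCharset:
    """Hypothetical header entry: an identifier plus the PUA range it uses."""
    identifier: str  # used only as a name, never dereferenced
    first: int       # first PUA code point used (inclusive)
    last: int        # last PUA code point used (inclusive)


def collide(a, b):
    """True if the declared PUA ranges of the two charsets overlap."""
    return a.first <= b.last and b.first <= a.last


def rebase(text, charset, new_first):
    """Reencode `text` so that the charset's range starts at `new_first`."""
    offset = new_first - charset.first
    new_last = charset.last + offset
    if not (PUA_FIRST <= new_first and new_last <= PUA_LAST):
        raise ValueError("remapped range does not fit in the BMP PUA")
    table = {cp: cp + offset for cp in range(charset.first, charset.last + 1)}
    return text.translate(table), PrivateCharset(charset.identifier, new_first, new_last)


def merge(doc_a, cs_a, doc_b, cs_b):
    """Concatenate two documents; the result carries a compound charset."""
    if collide(cs_a, cs_b):
        # Move the second document's assignments just past the first range.
        doc_b, cs_b = rebase(doc_b, cs_b, cs_a.last + 1)
    return doc_a + doc_b, [cs_a, cs_b]


cs_a = PrivateCharset("http://example.net/mycharset/", 0xE000, 0xE00F)
cs_b = PrivateCharset("http://example.org/othercharset/", 0xE008, 0xE017)
merged, compound = merge("A\uE001", cs_a, "B\uE008\uE017", cs_b)
for cs in compound:
    print(f"{cs.identifier} U+{cs.first:04X}..U+{cs.last:04X}")
```

The identifiers are never fetched here; they only label which subrange belongs
to which private charset, which is all the compound header needs.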


Re: The Oral History Of The Poop Emoji

2015-06-01 Thread Philippe Verdy
Article de "merde" ? (not an insult, this is a true French word,
appropriate to the subject). Bon appétit ! (if you think about orality...)

2015-06-02 0:42 GMT+02:00 Doug Ewell :

> I agree with one of the commenters that certain words just should not be
> used together in headlines.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
>
>


Re: The Oral History Of The Poop Emoji

2015-06-01 Thread Doug Ewell
I agree with one of the commenters that certain words just should not be
used together in headlines.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸



Re: The Oral History Of The Poop Emoji

2015-06-01 Thread Mark Davis ☕️
One of many on http://unicode.org/press/emoji.html


Mark 

*— Il meglio è l’inimico del bene —*

On Mon, Jun 1, 2015 at 8:23 PM, Karl Williamson 
wrote:

>
> https://www.fastcompany.com/3037803/the-oral-history-of-the-poop-emoji-or-how-google-brought-poop-to-america
>


The Oral History Of The Poop Emoji

2015-06-01 Thread Karl Williamson

https://www.fastcompany.com/3037803/the-oral-history-of-the-poop-emoji-or-how-google-brought-poop-to-america


RE: Sencoten and Unicode policy (was: the usage of LATIN SMALL LETTER A WITH STROKE)

2015-06-01 Thread Erkki I Kolehmainen
Please note that overlaid diacritics are not used in the decomposition of 
characters in the Unicode Standard, unless they are used to indicate the 
negation of mathematical symbols (see TUS 7.0, sections 7.9 Combining Marks and 
2.12 Equivalent Sequences).
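
This can be checked against the Unicode Character Database. A minimal sketch
using Python's standard unicodedata module (the helper name
canonical_decomposition is hypothetical, and results depend on the Unicode
version bundled with the interpreter): U+2260 NOT EQUAL TO decomposes into
U+003D plus U+0338 COMBINING LONG SOLIDUS OVERLAY, while the stroked letters
report no decomposition.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the canonical decomposition of `ch` as a list of code points."""
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0].startswith("<"):
        return []  # compatibility mapping, not a canonical decomposition
    return [int(cp, 16) for cp in fields]


# A negated mathematical symbol and two letters with an overlaid stroke.
for ch in ("\u2260", "\u023C", "\u2C65"):
    parts = " ".join(f"U+{cp:04X}" for cp in canonical_decomposition(ch))
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {parts or 'no decomposition'}")
```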

Sincerely 
Erkki I. Kolehmainen

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Janusz S. 
"Bien"
Sent: 1 June 2015 14:50
To: David Starner
Cc: unicode@unicode.org
Subject: Sencoten and Unicode policy (was: the usage of LATIN SMALL LETTER A WITH 
STROKE)

On Mon, Jun 01 2015 at  3:29 CEST, prosfil...@gmail.com writes:
> On Sun, May 31, 2015 at 11:09 AM Janusz S. Bien 
> wrote:
>
> The proposal makes me curious about past and present Unicode
> policy,
> e.g. would it be accepted if submitted now. 
> 
>
> Why wouldn't it? Unicode has, if anything, seemed to become more 
> flexible about adding characters that are seeing any sort of use.
>

On Sun, May 31 2015 at 18:20 CEST, frederic.grossh...@gmail.com writes:

[...]

> The upper case was introduced for Sencoten, and the proposal is here 
> http://www.unicode.org/L2/L2004/04170-sencoten.pdf

The document's author states:

Although they could be made up of Letter + overlay diacritic, it is
my understanding that the Unicode Consortium would prefer to create
unique code points for these types of letters (e.g. recent
acceptance of LATIN LETTER SMALL C WITH STROKE).

Is this true?

On the other hand, according to Wikipedia

   http://en.wikipedia.org/wiki/Saanich_dialect

in 2014 there were "about 5" native speakers of the language.

Best regards

Janusz

-- 
   ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki 
Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/




Re: Sencoten and Unicode policy (was: the usage of LATIN SMALL LETTER A WITH STROKE)

2015-06-01 Thread David Starner
On Mon, Jun 1, 2015 at 4:49 AM Janusz S. Bień  wrote:

> The document's author states:
>
> Although they could be made up of Letter + overlay diacritic, it is
> my understanding that the Unicode Consortium would prefer to create
> unique code points for these types of letters (e.g. recent
> acceptance of LATIN LETTER SMALL C WITH STROKE).
>
> Is this true?
>

As far as I know it's still true. Overlay diacritics don't work well, so
they're pretty much ignored in encoding new characters.


> On the other hand, according to Wikipedia
>
>    http://en.wikipedia.org/wiki/Saanich_dialect
>
> in 2014 there were "about 5" native speakers of the language.
>

It's what you get when you stock the committee that chooses which
characters to encode with linguists. In the most general case, there is
text in that language, and someone will want to digitize it.


Sencoten and Unicode policy (was: the usage of LATIN SMALL LETTER A WITH STROKE)

2015-06-01 Thread Janusz S. Bień
On Mon, Jun 01 2015 at  3:29 CEST, prosfil...@gmail.com writes:
> On Sun, May 31, 2015 at 11:09 AM Janusz S. Bien 
> wrote:
>
> The proposal makes me curious about past and present Unicode
> policy,
> e.g. would it be accepted if submitted now. 
> 
>
> Why wouldn't it? Unicode has, if anything, seemed to become more
> flexible about adding characters that are seeing any sort of use.
>

On Sun, May 31 2015 at 18:20 CEST, frederic.grossh...@gmail.com writes:

[...]

> The upper case was introduced for Sencoten, and the proposal is here
> http://www.unicode.org/L2/L2004/04170-sencoten.pdf

The document's author states:

Although they could be made up of Letter + overlay diacritic, it is
my understanding that the Unicode Consortium would prefer to create
unique code points for these types of letters (e.g. recent
acceptance of LATIN LETTER SMALL C WITH STROKE).

Is this true?

On the other hand, according to Wikipedia

   http://en.wikipedia.org/wiki/Saanich_dialect

in 2014 there were "about 5" native speakers of the language.

Best regards

Janusz

-- 
   ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki 
Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


Re: Some questions about Unicode's CJK Unified Ideograph

2015-06-01 Thread Andre Schappo

On 30 May 2015, at 01:20, gfb hjjhjh wrote:

> 2. Are combining characters like U+20DD intended to work with all different
> types of characters, or is this a problem related to implementation? When I
> write ゆ⃝ (Japanese Hiragana Letter Yu + Combining Enclosing Circle), the two
> appear separate in most fonts I use, but if I change the Hiragana Yu into a
> conventional = sign or some Latin character, most fonts are at least somehow
> able to put them together. Or is there any better/alternative representation
> in Unicode that can show Japanese hiragana yu in a circle?

Japanese Hiragana Letter Yu + Combining Enclosing Circle works fine for me 
using TextEdit on OS X.
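
For reference, the sequence in question can be built and inspected as in the
sketch below (Python's standard unicodedata module; the variable names are
illustrative only). It shows that the encoding is a plain base letter followed
by an enclosing mark with no precomposed form, so whether the circle actually
surrounds the letter is left to the font and the rendering engine.

```python
import unicodedata

yu_in_circle = "\u3086\u20DD"  # HIRAGANA LETTER YU + COMBINING ENCLOSING CIRCLE

# Show each code point with its name, general category and combining class.
for ch in yu_in_circle:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"category={unicodedata.category(ch)} ccc={unicodedata.combining(ch)}")

# NFC leaves the sequence unchanged: there is no precomposed "circled yu".
is_unchanged = unicodedata.normalize("NFC", yu_in_circle) == yu_in_circle
print("unchanged by NFC:", is_unchanged)
```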

André Schappo