Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Phake Nick via Unicode
Those characters could also be placed in a separate block for the same
script, much as dubious CJK characters are placed in "CJK Compatibility
Ideographs" for round-trip compatibility with the source encodings.

On Fri, 14 Feb 2020 at 03:35, Richard Wordingham via Unicode wrote:

> On Thu, 13 Feb 2020 10:18:40 +0100
> Hans Åberg via Unicode  wrote:
>
> > > On 13 Feb 2020, at 00:26, Shawn Steele 
> > > wrote:
> > >> From the point of view of Unicode, it is simpler: If the character
> > >> is in use or has had use, it should be included somehow.
> > >
> > > That bar, to me, seems too low.  Many things are only used briefly
> > > or in a private context that doesn't really require encoding.
> >
> > That is a private use area for more special use.
>
> Writing the plural ('Egyptologists') by writing the plural strokes below
> the glyph could be difficult if the renderer won't include them in the
> same script run.
>
> Richard.
>
>


Re: Encoding italic

2019-01-28 Thread Phake Nick via Unicode
2019-1-25 13:46, Garth Wallace via Unicode  wrote:

>
> On Wed, Jan 23, 2019 at 1:27 AM James Kass via Unicode <
> unicode@unicode.org> wrote:
>
>>
>> Nobody has really addressed Andrew West's suggestion about using the tag
>> characters.
>>
>> It seems conformant, unobtrusive, requiring no official sanction, and
>> could be supported by third-partiers in the absence of corporate
>> interest if deemed desirable.
>>
>> One argument against it might be:  Whoa, that's just HTML.  Why not just
>> use HTML?  SMH
>>
>> One argument for it might be:  Whoa, that's just HTML!  Most everybody
>> already knows about HTML, so a simple subset of HTML would be
>> recognizable.
>>
>> After revisiting the concept, it does seem elegant and workable. It
>> would provide support for elements of writing in plain-text for anyone
>> desiring it, enabling essential (or frivolous) preservation of
>> editorial/authorial intentions in plain-text.
>>
>> Am I missing something?  (Please be kind if replying.)
>>
>
> There is also RFC 1896 "enriched text", which is an attempt at a
> lightweight HTML substitute for styling in email. But these, and the ANSI
> escape code suggestion, seem like they're trying to solve the wrong problem
> here.
>
> Here's how I understand the situation:
> * Some people using forms of text or mostly-text communication that do not
> provide styling features want to use styling, for emphasis or personal flair
> * Some of these people caught on to the existence of the "styled"
> mathematical alphanumerics and, not caring that this is "wrong", started
> using them as a workaround
> * The use of these symbols, which are not technically equivalent to basic
> Latin, make posts inaccessible to screen readers, among other problems
>
> These are suggestions for Unicode to provide a different, more
> "acceptable" workaround for a lack of functionality in these social media
> systems (this mostly seems to be an issue with Twitter; IME this shows up
> much less on Facebook). But the root problem isn't the kludge, it's the
> lack of functionality in these systems: if Twitter etc. simply implemented
> some styling on their own, the whole thing would be a moot point.
> Essentially, this is trying to add features to Twitter without waiting for
> their development team.
>
> Interoperability is not an issue, since in modern computers copying and
> pasting styled text between apps works just fine.
>

What about outside social media systems? For example, Chinese Braille has
symbols that indicate the start and end positions of the proper name mark
and the book title mark. When converted to plain text, however, these marks
cannot be represented in Unicode text, because of the mindset that rendering
this punctuation should be the task of styling software, simply because in
ordinary Chinese text the two marks are a straight underline and a wavy
underline beneath the text.

>
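
The tag-character suggestion quoted above can be illustrated with a short
sketch. It assumes a purely hypothetical convention in which an HTML-like
"<i>" is spelled with the tag characters U+E0020..U+E007E, which mirror
printable ASCII. No such convention is standardized, and the function names
are invented for this example.

```python
TAG_BASE = 0xE0000  # tag characters mirror ASCII at U+E0020..U+E007E


def to_tag_chars(markup: str) -> str:
    """Map printable ASCII markup to the corresponding tag characters."""
    out = []
    for ch in markup:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
        out.append(chr(TAG_BASE + cp))
    return "".join(out)


def strip_tag_chars(text: str) -> str:
    """Drop every code point in the Tags block (U+E0000..U+E007F)."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


def italic(text: str) -> str:
    """Wrap text in a hypothetical tag-character spelling of <i>...</i>."""
    return to_tag_chars("<i>") + text + to_tag_chars("</i>")


sample = "I " + italic("really") + " mean it."
print(strip_tag_chars(sample))
```

A reader that does not know the convention would most likely just ignore
the default-ignorable tag characters, which is what makes the idea
unobtrusive; whether any software would act on them is a separate question.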


Re: Encoding italic

2019-01-28 Thread Phake Nick via Unicode
Gmail can do *Märchen*, although I am not sure how it transmits such
formatting or how interoperable it is.

On Tue, 22 Jan 2019 at 14:43, Adam Borowski via Unicode wrote:

> On Mon, Jan 21, 2019 at 12:29:42AM -0800, David Starner via Unicode wrote:
> > On Sun, Jan 20, 2019 at 11:53 PM James Kass via Unicode
> >  wrote:
> > >  Even though /we/ know how to do
> > > it and have software installed to help us do it.
> >
> > You're emailing from Gmail, which has support for italics in email.
>
> ... and how exactly can they send italics in an e-mail?  All they can do is
> to bundle a web page as an attachment, which some clients display instead
> of the main text.
>
> The e-mail's body text supports anything Unicode does, including
> 𝑖𝑡𝑎𝑙𝑖𝑐 and even 𐌏𐌋𐌃 𐌉𐌕𐌀𐌋𐌉𐌂, but, remarkably, not italic umlauted
> characters, thai nor han.
>
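
The point about italic umlauted characters can be checked with a small
sketch. It maps ASCII letters to the Mathematical Italic letters (with the
one hole in that range, small h, filled by U+210E PLANCK CONSTANT) and
leaves everything else untouched. The function name is made up for this
example.

```python
import unicodedata


def to_math_italic(text: str) -> str:
    """Replace ASCII letters with Mathematical Italic counterparts."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif ch == "h":
            # U+1D455 is unassigned; italic h was already encoded as U+210E.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            # No italic counterpart exists (e.g. "ä"), so keep it as is.
            out.append(ch)
    return "".join(out)


styled = to_math_italic("Märchen")
print(styled)
# Compatibility normalization folds the styled letters back to plain ones.
print(unicodedata.normalize("NFKC", styled))
```

The "ä" stays upright because there is no precomposed italic form of it,
which is the limitation described in the quoted message.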


Re: wws dot org

2019-01-16 Thread Phake Nick via Unicode
Feedback after briefly reading the East Asia section of the website:
1. I am pretty sure the "Kaida" script is no longer living, according to its
Wikipedia description.
2. Hentaigana refers to all the alternative forms of kana that were used
before modern standardization; I don't think they are still in active use.
3. The meaning of "Old Hanzi" is not clear. If it follows the definition
given in this blog post:
http://babelstone.blogspot.com/2007/07/old-hanzi.html , then it does not
refer to a single script but to all historical ways of writing Hanzi,
including Oracle Bone script, Bronze script, (Small) Seal script and so on.
The list already includes oracle bone script, bronze script and seal script
as separate entries, which apparently makes the "Old Hanzi" entry
redundant.



On Wed, 16 Jan 2019 at 02:25, Johannes Bergerhausen via Unicode wrote:

> Dear list,
>
> I am happy to report that www.worldswritingsystems.org is now online.
>
> The web site is a joint venture by
>
> — Institut Designlabor Gutenberg (IDG), Mainz, Germany,
> — Atelier National de Recherche Typographique (ANRT), Nancy, France and
> — Script Encoding Initiative (SEI), Berkeley, USA.
>
> For every known script, we researched and designed a reference glyph.
>
> You can sort these 292 scripts by Time, Region, Name, Unicode version and
> Status.
> Exactly half of them (146) are already encoded in Unicode.
>
> Here you can find more about the project:
> www.youtube.com/watch?v=CHh2Ww_bdyQ
>
> And here is a link to see the poster:
> https://shop.designinmainz.de/produkt/the-worlds-writing-systems-poster/
>
> All the best,
> Johannes
>
>
>
>
> ↪ Prof. Bergerhausen
>
> Hochschule Mainz, School of Design, Germany
>
> www.designinmainz.de
>
> www.decodeunicode.org
>


Re: Japan may not announce new era name until April 11

2018-12-14 Thread Phake Nick via Unicode
"Until April 11 or later". That is, after a certain commemoration ceremony
that will take place on April 10.
According to reports, the shorter notification period is meant as a
concession to conservative legislators within the ruling party, who do not
want any such prior announcement at all.

2018-12-15 06:51, Craig, David O via Unicode  wrote:

> Note the recent article in the japan times:
>
> https://www.japantimes.co.jp/news/2018/12/06/national/politics-diplomacy/japan-mulls-announcing-new-era-name-april-11-sources/#.XBQPC6qWxD8
>
> April 11 leaves less than three weeks before the May 1 ascension.
>
> David Craig
>


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread Phake Nick via Unicode
Ah, right, that's it.

On 5 Mar 2018 at 19:25, "James Kass" wrote:

Phake Nick wrote,


> In latin script, as an example, I can simply name myself
> "Phake", but in Chinese with current Unicode-based environment,
> it would not be possible for me to randomly name myself using
> a character  ⿰牜爲

Isn't that U+246E8? "𤛨"
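
As a quick illustration of what being encoded at U+246E8 means in practice,
here is a minimal sketch that inspects the code point: its plane and its
UTF-16 and UTF-8 encoded forms. The variable names are only for
illustration.

```python
ch = chr(0x246E8)  # the code point cited in the message above

plane = ord(ch) >> 16             # planes are 0x10000 code points each
utf16 = ch.encode("utf-16-be")    # a surrogate pair: two 16-bit code units
utf8 = ch.encode("utf-8")         # four bytes for supplementary planes

print(f"U+{ord(ch):04X} is in plane {plane}")
print("UTF-16:", utf16.hex())
print("UTF-8 :", utf8.hex())
```

Because the character lies outside the Basic Multilingual Plane, software
that only handles 16-bit code units or lacks a supplementary-plane font may
still fail to display it even though it is encoded.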


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread Phake Nick via Unicode
On Mon, 5 Mar 2018 at 13:25, Martin J. Dürst via Unicode wrote:

> Hello John,
>
> On 2018/03/01 12:31, via Unicode wrote:
>
> > Pen, or brush and paper is much more flexible. With thousands of names
> > of people and places still not encoded I am not sure if I would describe
> > hans (simplified Chinese characters) as well supported. nor with current
> > policy which limits China with over one billion people to submitting
> > less than 500 Chinese characters a year on average, and names not being
> > all to be added, it is hard to say which decade hans will be well
> > supported.
>
> I think this contains several misunderstandings. First, of course
> pen/brush and paper are more flexible than character encoding, but
> that's true for the Latin script, too.
>

In Latin script, for example, I can simply name myself "Phake", but in
Chinese, in the current Unicode-based environment, I cannot freely name
myself with a character such as ⿰牜爲, as I would like to.


> Second, while I have heard that people create new characters for naming
> a baby in a traditional Han context, I haven't heard about this in a
> simplified Han context. And it's not frequent at all, the same way
> naming a baby John in the US is way more frequent than let's say Qvtwzx.
> I'd also assume that China has regulations on what characters can be
> used to name a baby, and that the parents in this age of smartphone
> communication will think at least twice before giving their baby a name
> that they cannot send to their relatives via some chat app.
>

Traditional versus simplified characters in this context is just like
Fraktur versus Antiqua. The way some components are written has changed,
and there are also orthographic changes so that some characters no longer
consist of the same components, but they are still Chinese characters and
their usage is unchanged. I believe there are regulations on naming, but
those regulations would have been made to adapt to the limitations of
current computer systems. Plus, every so often I still hear news of people
having difficulty using e.g. train booking systems or banking systems
because of the characters they are using. (Although in many cases those are
encoded characters that the system does not support.)


> Third, I cannot confirm or deny the "500 characters a year" limit, but
> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
> to encode more characters, everybody would find a way to handle these.


> Due to the nature of your claims, it's difficult to falsify many of
> them. It would be easier to prove them (assuming they were true), so if
> you have any supporting evidence, please provide it.
>
> Regards,   Martin.
>
> > John Knightley
>
>


Re: IDC's versus Egyptian format controls

2018-02-20 Thread Phake Nick via Unicode
Actually, given that the IDS characters are confusing, in that some users
might expect them to show the composition while in other situations users
might expect the components to be composed together, would it be a good idea
to encode a copy of the IDS characters explicitly for composing characters,
while the original IDS characters are left to describe compositions?
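
To make the "describe a composition" role concrete, here is a minimal
sketch of a well-formedness check for an Ideographic Description Sequence.
It assumes only the twelve original description characters
U+2FF0..U+2FFB, treats U+2FF2 and U+2FF3 as taking three operands and the
rest two, and accepts any other character as a component. The function
names are invented for this example.

```python
# Operand counts for the twelve original ideographic description characters.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left, middle, right
IDC_ARITY["\u2FF3"] = 3  # above, middle, below


def parse_ids(s: str, pos: int = 0) -> int:
    """Return the index just past one complete description starting at pos."""
    if pos >= len(s):
        raise ValueError("description ends too early")
    ch = s[pos]
    pos += 1
    # A description character is followed by that many sub-descriptions;
    # any other character stands for itself and has no operands.
    for _ in range(IDC_ARITY.get(ch, 0)):
        pos = parse_ids(s, pos)
    return pos


def is_valid_ids(s: str) -> bool:
    """True if s is exactly one complete description, with nothing left over."""
    try:
        return parse_ids(s) == len(s)
    except ValueError:
        return False


print(is_valid_ids("\u2FF0\u725C\u7232"))  # left-right: two components
print(is_valid_ids("\u2FF0\u725C"))        # one operand is missing
```

A check like this only says the description is well formed. It does not
ask a renderer to draw a composed glyph, which is the distinction the
message above is raising.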


Re: Why so much emoji nonsense?

2018-02-16 Thread Phake Nick via Unicode
2018-02-16 FRI 15:55, James Kass via Unicode  wrote:

> Pierpaolo Bernardi wrote:
>
> > But it's always a good time to argue against the addition of more
> > nonsense to what we already have got.
>
> It's an open-ended set and precedent for encoding them exists.
> Generally, input regarding the addition of characters to a repertoire
> is solicited from the user community, of which I am not a member.
>
> My personal feeling is that all of the time, effort, and money spent
> by the various corporations in promoting the emoji into Unicode would
> have been better directed towards something more worthwhile, such as
> the unencoded scripts listed at:
>
>  http://www.linguistics.berkeley.edu/sei/scripts-not-encoded.html
>
> ... but nobody asked me.
>

1. UTS #51 mentions that embedded graphics are the way to go as a
longer-term solution for emoji, in addition to emoji characters. But that
would require substantial infrastructure changes, and even then they would
most probably not be supported in pure text environments.

2. Actually, the problem is not limited to emoji. Many ideographic
characters (Chinese, Japanese, etc.) are added to Unicode each year. While
at the current rate there is still plenty of room in the Unicode standard to
hold them, the set is still more open-ended than would be desired for a
multilingual encoding system, and it also makes it hard to expect newly
encoded ideographic characters to just "work" on different systems with
sufficient font support. The fact that a character has to be encoded in
Unicode before it can be exchanged digitally has also limited users'
ability to create new characters in an ad hoc manner, which probably
happened more often in the pre-digital era. Different parties have proposed
solutions to construct and use such characters dynamically as desired
instead of relying on an encoding mechanism, but they all seem so radically
different from modern computer infrastructure that they have not been
adopted.

>


Re: Why so much emoji nonsense?

2018-02-15 Thread Phake Nick via Unicode
2018-02-16 10:46, "James Kass"  wrote

Phake Nick wrote,

> By the standard of "if one can't string words together that speak for
> themselves, one can use other media", we can scrap Unicode and simply use
> voice recording for all purposes. →_→

Not for me, I can still type faster than I can talk.  Besides, voice
recordings are all about communicating by stringing words together.


There are thousands of situations where one would want to express something
in text rather than by voice, for reasons other than speed. Voice
communication isn't just about communicating a "string of words". Emotion
and other things are also conveyed. That's also why carriers are supporting
HQ Voice transmission over the telephony system, for better clarity in this
respect.


>> These are rhetorical questions.
>
> Tonal emoticons for telephone or voice transmission? There are tones in
> voice-based transmission systems.
> And yes, there are limits in these technologies, which is why
> teleconferencing is still not all that popular and people still have to fly
> across the world just to attend all sorts of meetings.

At least, that's what they tell their accountants and tax people, right?

Then why do those people who pay for their own trip still do so?

> […]

2018-02-16 11:27, "James Kass via Unicode"  wrote:

If someone were to be smiling and shrugging while giving you the
finger, would you be smiling too?

Heck, I'd probably be laughing out loud while running for my life!
So, poor example.  OK.  A smiling creep is still a creep.

This is an example of extravocal communication. If a person were saying
thank you with a smiling face while giving you the middle finger, it would
be a totally different context from a regular thank you given by other
people.


Suppose for a moment that you and I are pals in the same room having a
face-to-face conversation.  I advise you that, due to unforeseen
events, I'm a bit financially strapped and could use a spot of cash to
sort of tide me over until my ship comes into orbit.  You smile and
nod your head while saying "no".  Which response applies?

Words suffice.  We go by what people actually say rather than whatever
they might have meant.  When we read text, we go by what's written.

Then what would the listener feel if he only heard you say no but didn't
know about your facial and bodily reaction? He might not be able to grasp
the level of "no" you are giving, and you would need some rather lengthy
description to explain to him why you want to turn him down. Why do that
when a simple non-verbal expression is enough?

An inability to communicate any essential feelings and overtones using
words is not a gross failure of either language or writing.  It's more
about the skill levels of the speaker, listener, author, and reader.

https://en.wikipedia.org/wiki/Nonverbal_communication


As for the thread title question, perhaps the exchanges within the
thread offer insight.  Emoji exist and are interchanged.  Unicode
enables them to be interchanged in a standard fashion.  Even if
they're just for fun, frivolous, silly, and ephemeral.  Even if some
people consider them beyond the scope of The Unicode Standard.  The
best time to argue against the addition of emoji to Unicode would be
2007 or 2008, but you'd be wasting your time travel.  Trust me.

I would like to add that, if Unicode had not included emoji at the time, I
suspect many more systems would have continued to use Shift-JIS instead.
Individual mobile phone carriers would have continued to use their own
private code points, and app/platform developers would either have had to
find a way to convert between the code points of the different emoji sets
(remember that the implementations by each carrier don't strictly
correspond to each other), or invent yet another private-use font covering
all those emoji within their platform.


Re: Why so much emoji nonsense?

2018-02-15 Thread Phake Nick via Unicode
2018-02-16 04:55, "James Kass via Unicode"  wrote:

Ken Whistler replied to Erik Pedersen,

> Emoticons were invented, in large part, to fill another
> major hole in written communication -- the need to convey
> emotional state and affective attitudes towards the text.

There is no such need.  If one can't string words together which
'speak for themselves', there are other media.  I suspect that
emoticons were invented for much the same reason that "typewriter art"
was invented:  because it's there, it's cute, it's clever, and it's
novel.

By the standard of "if one can't string words together that speak for
themselves, one can use other media", we can scrap Unicode and simply use
voice recording for all purposes. →_→


> This is the kind of information that face-to-face
> communication has a huge and evolutionarily deep
> bandwidth for, but which written communication
> typically fails miserably at.

Does Braille include emoji?  Are there tonal emoticons available for
telephone or voice transmission?  Does the telephone "fail miserably"
at oral communication because there's no video to transmit facial tics
and hand gestures?  Did Pontius Pilate have a cousin named Otto?
These are rhetorical questions.

Tonal emoticons for telephone or voice transmission? There are tones in
voice-based transmission systems.
And yes, there are limits in these technologies, which is why
teleconferencing is still not all that popular and people still have to fly
across the world just to attend all sorts of meetings.


For me, the emoji are a symptom of our moving into a post-literate
age.  We already have people in positions of power who pride
themselves on their marginal literacy and boast about the fact that
they don't read much.  Sad!

Emoji are part of literacy. Remember that the Japanese writing system uses
ideographic characters plus kana; it wouldn't be odd to add yet another set
of pictographic characters in line to express what you don't want to spell
out.


Re: 0027, 02BC, 2019, or a new character?

2018-01-23 Thread Phake Nick via Unicode
>I found the Windows 'US International' keyboard layout highly intuitive
>for accented Latin-1 characters.
How common is the US International keyboard in real life..?

Users would still need to add it manually in Windows, and on other computing
platforms vendors would need to add support for "US International" before
it can be used.

> Regular American users simply don't type umlauts, period.  Eccentric

Which is exactly why they aren't using umlauts.

> American users needing umlauts, such as foreign language students or
> heavy metal enthusiasts, generally find an easy way.  Practically
> everybody knows how to search the web.

How about, for example, a random tourist looking for information about some
city in Kazakhstan? Will they know how to type an umlaut in the city's name?
Most likely they will simply type it without any umlaut and lose the
distinction.


Re: 0027, 02BC, 2019, or a new character?

2018-01-21 Thread Phake Nick via Unicode
It is probably still too difficult for the general public to input a
character with an umlaut in 2018. For example, the official Chinese
romanization system uses the character "ü", but because it is so hard to
input or process, many people on many occasions just use "v" instead, and
more recently "yu" has been standardised as a replacement for the
character. There are language-dependent keyboards for French or German with
special keys or dead keys that help input these umlauts, but they are
language-dependent, and it is not possible for e.g. a regular American user
using Windows to simply type them out, at least not without prior knowledge
about these umlauts.
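
The "v" and "yu" substitutions mentioned above amount to a simple fallback
mapping, sketched below. The function name and its parameter are invented
for this example, and the sketch only handles a bare "ü"; tone-marked forms
such as "ǚ" are left alone.

```python
import unicodedata


def pinyin_u_fallback(text: str, replacement: str = "v") -> str:
    """Replace u-with-diaeresis by an ASCII stand-in such as "v" or "yu"."""
    # Normalize first so that "u" + U+0308 is treated like precomposed U+00FC.
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\u00FC", replacement)
    return text.replace("\u00DC", replacement.upper())


print(pinyin_u_fallback("l\u00FC"))             # the informal "v" habit
print(pinyin_u_fallback("L\u00DC", "yu"))       # the newer "yu" spelling
print(pinyin_u_fallback("nu\u0308", "yu"))      # decomposed input also works
```

Either substitution is lossy in one direction only as long as the stand-in
never occurs in ordinary pinyin, which is why "v" became a popular choice.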

2018-01-22 2:49 GMT+08:00 Richard Wordingham via Unicode <
unicode@unicode.org>:

> On Sun, 21 Jan 2018 13:49:46 +0100
> Philippe Verdy via Unicode  wrote:
>
> > But there's NO standard keyboard in Kazakhstan with the Latin
> > alphabet. Those you'll find are cyrillic keyboards with a way to type
> > basic Latin. Or keyboards made for other countries.
>
> I believe we're talking about physical keyboards here.  From the
> Wikipedia web page
> https://kk.wikipedia.org/wiki/%D0%9F%D0%B5%D1%80%D0%BD%D0%B5%D1%82%D0%B0%D2%9B%D1%82%D0%B0
> and the only credible pictures I can find -
> https://sabaqtar.kz/informatika/8876-pernetata-pernetatamen-tanysu.html
> (tolerable) and
> https://kaz.tengrinews.kz/gadgets/kazaksha-klaviatura-100-mektepte-syinaktan-ott-255562/
> (poor)
> - I beg to differ.  It seems that the available keyboards are labelled
> in Kazakh Cyrillic and US QWERTY.
>
> There is a different layout tagged as 'Kazakh national layout' at
> http://aitaber.kz/blog/komputer/3991.html - and again the keys are
> labelled for both writing systems.
>
> On-screen keyboards should not be an issue at all.
>
> So, what devices are you talking about?
>
> Richard.
>


Re: Traditional and Simplified Han in UTS 39

2017-12-27 Thread Phake Nick via Unicode
On 28 Dec 2017 at 5:34 AM, "Karl Williamson via Unicode" wrote:
>
> In UTS 39, it says, that optionally,
>
> "Mark Chinese strings as “mixed script” if they contain both simplified
> (S) and traditional (T) Chinese characters, using the Unihan data in the
> Unicode Character Database [UCD].
>
> "The criterion can only be applied if the language of the string is known
> to be Chinese."
>
> What does it mean for the language to "be known to be Chinese"?
As in, the string is written in the Chinese language: not Japanese, not old
Korean/Vietnamese text that uses Chinese characters, nor any other language
that uses Chinese characters.
To my knowledge, some Chinese dialects/variants also use Simplified and
Traditional characters together, with different etymologies, and that
probably shouldn't be considered mixed script either, although they aren't
really common and are not mentioned in the UTS.

> Is this something algorithmically determinable, or does it come from
> information about the input text that comes from outside the UCD?
>
> The example given shows some Hiragana in the text.  That clearly
> indicates the language isn't Chinese.  So in this example we can
> algorithmically rule out that it's Chinese.

Usually, when there are Japanese kana in the mix, the text is Japanese
rather than Chinese. However, the reverse is not necessarily true,
especially for a single word or short phrase, older-style text and the
like, where a string with only Chinese characters can still be Japanese
text.
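
The one-directional nature of the kana test can be shown with a rough
sketch. It approximates script membership with block ranges (Hiragana and
Katakana blocks for kana, three common CJK ideograph ranges for Han) rather
than the real Script property, and the function names are invented for
this example.

```python
def contains_kana(text: str) -> bool:
    """True if any character falls in the Hiragana or Katakana blocks."""
    return any(0x3040 <= ord(ch) <= 0x30FF for ch in text)


def contains_han(text: str) -> bool:
    """True if any character falls in a few main CJK ideograph ranges."""
    return any(
        0x4E00 <= ord(ch) <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= ord(ch) <= 0x4DBF     # Extension A
        or 0x20000 <= ord(ch) <= 0x2FFFF   # Plane 2 extensions
        for ch in text
    )


def may_be_chinese(text: str) -> bool:
    """Kana rules Chinese out; the absence of kana proves nothing."""
    return contains_han(text) and not contains_kana(text)


print(may_be_chinese("\u65E5\u672C\u8A9E\u306E\u30C6\u30AD\u30B9\u30C8"))
print(may_be_chinese("\u6771\u4EAC"))
```

The second string (the two-character name of Tokyo) passes the test even
though it is just as likely to be Japanese, so the language still has to
come from information outside the text itself.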

>
> And what does Chinese really mean here?
>
The written form of the (Mandarin) Chinese language?


Re: Plane-2-only string

2017-11-13 Thread Phake Nick via Unicode
Perhaps http://en.wikipedia.org/wiki/Martian_language should be considered
as a way to construct an example Chinese sentence from characters that are
only within Plane 2? It could probably be understood by more people than
something in Cantonese, too.
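
Whether a candidate sentence really stays within Plane 2 is easy to check
mechanically. The following is a minimal sketch; the function name is
invented for this example, and the sample strings are arbitrary code
points rather than a meaningful sentence.

```python
def is_plane2_only(text: str) -> bool:
    """True if text is non-empty and every code point is in Plane 2."""
    return bool(text) and all(0x20000 <= ord(ch) <= 0x2FFFF for ch in text)


print(is_plane2_only(chr(0x20000) + chr(0x246E8)))  # both in Plane 2
print(is_plane2_only("\u4E2D" + chr(0x246E8)))      # U+4E2D is in the BMP
```

Note that punctuation and spaces are in the Basic Multilingual Plane, so a
strict Plane-2-only string cannot contain either.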


Re: ASCII v Unicode

2017-11-03 Thread Phake Nick via Unicode
The entire Unicode can also be printed on a single page if you use a very
large sheet of paper coupled with a smaller font size! I think a
football-field-sized sheet could possibly do the job?

2017-11-03 19:29 GMT+08:00 Andre Schappo via Unicode :

>
> On 3 Nov 2017, at 09:36, Asmus Freytag via Unicode 
> wrote:
>
> On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote:
>
>
> You may find https://twitter.com/andreschappo/status/926163719331176450 
> amusing
> 😀
>
> André Schappo
>
> You're wildly off in your page count.
>
> The "book" part of Unicode (Core Specification) alone is 1,500 pages. I
> haven't looked at the single file code charts in a while, but I believe you
> get at least that number again. Then add the dozen or so "Annexes" for a
> few hundred additional pages and be happy that nobody prints the Unicode
> Character Database (or the Unihan Database for that matter).
>
> A./
>
>
> Yes, I agree, my page count is much lower than it should be for Unicode,
> if I was being literal. I was being figurative rather than literal. I was
> just making a point to the ASCII developers/programmers and ASCII Academics
> 😀
>
> Prior to tweeting I did consider other numbers. My considerations included
> 1000, 5000 and 1. But in my mind "Unicode is a 500 page book" seemed to
> flow better. I don't know why.
>
> Actually, it is probably for the best that I wrote "500 page" because
> otherwise ASCII developers/programmers and ASCII Academics would not even
> start reading the Unicode book if they thought it was (say) 5000 pages long.
>
> Let's now look at it literally and here is a template "Unicode is a X page
> book".
>
> My guess would be "Unicode is a 1+ page book"
>
> Anyone care to estimate X?
>
> André Schappo
>
>
>


Encoding of character for new Japanese era name after Heisei

2017-06-02 Thread Phake Nick via Unicode
Unicode currently encodes four characters, from U+337E to U+337B, for the
four most recent Japanese era names, and people use them quite a lot. In
recent months, the Japanese emperor's intention to abdicate has been
announced, and Japan is expected to get a new era name together with the
new emperor. It can be expected that people will want to type a single
character for the new era name, just as they type the old era names now.
However, with the new era name coming into effect on Jan 1 2019 and the
name expected to be announced only half a year ahead of its use, how will
Unicode handle the new era name?
According to the Unicode release schedule of recent years, the announcement
will come only a few weeks before the official release of Unicode 11.0, and
well past the beta period. Is it possible for the character to be included
in Unicode 11.0, or in an 11.0.1 released some time later? We won't know
the shape of the glyph until the era name is announced, and since in past
examples the era name itself is part of the Unicode character name, it is
also not possible to come up with a name for the expected new character
before the era name is actually announced. This means that under the usual
process, an application cannot really start until the era name has been
announced. Is there some way to apply for the inclusion of a character in
Unicode without actually knowing what the character will be?
Or, if it is really too difficult to encode the character in the short time
before the era starts, would it be possible to first reserve some code
points for upcoming Japanese eras, so that people know which code point
they will be using instead of resorting to the PUA?
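
The dependency on the era name is visible in the existing characters: both
the character name and the compatibility decomposition spell the era out.
The sketch below lists the four code points named above using Python's
`unicodedata` module; the helper name is invented for this example.

```python
import unicodedata


def era_squares():
    """Return (code point, character name, NFKC form) for U+337B..U+337E."""
    rows = []
    for cp in range(0x337B, 0x337F):
        ch = chr(cp)
        rows.append(
            (cp, unicodedata.name(ch), unicodedata.normalize("NFKC", ch))
        )
    return rows


for cp, name, expanded in era_squares():
    print(f"U+{cp:04X}  {name}  {expanded}")
```

Each square character decomposes to the two ideographs of its era name, so
neither the name nor the decomposition of a new character could be fixed
before the announcement.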


Re: Petition to ban Google from designing emoji

2017-05-18 Thread Phake Nick via Unicode
Is it possible to introduce variation selectors for emoji whose designs vary
widely among vendors, so that when users send an emoji with a selector, the
variation among vendors can be minimized by asking vendors to support both
versions?
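
For comparison, this is how the variation selectors that already exist for
emoji are used. The sketch below appends U+FE0E (text presentation) or
U+FE0F (emoji presentation) to a base character. These two selectors only
choose between text and emoji style; they do not select among vendor
designs, which is what the question above would require. The function name
is invented for this example.

```python
VS15 = "\uFE0E"  # requests text presentation
VS16 = "\uFE0F"  # requests emoji presentation


def with_presentation(base: str, emoji: bool = True) -> str:
    """Return base followed by exactly one presentation selector."""
    # Remove any selector already attached before adding the requested one.
    bare = base.rstrip(VS15 + VS16)
    return bare + (VS16 if emoji else VS15)


heart = "\u2764"  # HEAVY BLACK HEART
print([f"U+{ord(c):04X}" for c in with_presentation(heart)])
print([f"U+{ord(c):04X}" for c in with_presentation(heart, emoji=False)])
```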