Re: Preliminary proposal to encode Unifon in the UCS.

2012-05-31 Thread Jean-François Colson

Hello

I wrote: “1st possibility: a separate script. There’ll be no problem.”
You wrote: “There would, because the bulk of the script would look just 
like Latin, and the encoding committees consider this to be a security 
issue for internet spoofing for instance.”

I don’t understand.
Internet spoofing would be possible for example by mixing Latin and 
Cyrillic letters in internationalized domain names. For example, instead 
of paypal.com, you could take advantage of the fact that the first five 
letters all have look-alike Cyrillic letters and register one of the 
31 (2⁵-1) DIFFERENT domain names paypаl.com, payрal.com, payраl.com, 
paуpal.com, paуpаl.com, paурal.com, paураl.com, pаypal.com, pаypаl.com, 
pаyрal.com, pаyраl.com, pауpal.com, pауpаl.com, pаурal.com, pаураl.com, 
рaypal.com, рaypаl.com, рayрal.com, рayраl.com, рaуpal.com, рaуpаl.com, 
рaурal.com, рaураl.com, раypal.com, раypаl.com, раyрal.com, раyраl.com, 
рауpal.com, рауpаl.com, раурal.com or раураl.com to ask their paypal 
e-mail address and password from your “customers”. That could only work 
if the customer is very inattentive or has previously typed 
“about:config” in the address bar and set network.IDN_show_punycode to 
false. (That works in Firefox. The procedure may differ in other 
browsers.)
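
The count of 31 can be checked mechanically. What follows is only a minimal Python sketch under stated assumptions: the table LOOKALIKES and the helper spoof_variants are names made up for this illustration, and only p, a and y are mapped to Cyrillic look-alikes. It also runs each variant through Python's built-in "idna" codec (IDNA 2003) to show the Punycode form a browser may display instead of the Unicode label.

```python
from itertools import product

# Hypothetical look-alike table: Latin letter -> Cyrillic homoglyph.
LOOKALIKES = {"p": "\u0440", "a": "\u0430", "y": "\u0443"}

def spoof_variants(label, prefix_len=5):
    """Return every variant of `label` in which at least one of the first
    `prefix_len` letters is replaced by its Cyrillic look-alike."""
    head, tail = label[:prefix_len], label[prefix_len:]
    # Each position offers the original letter and, if known, its look-alike.
    choices = [(ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,) for ch in head]
    variants = ["".join(combo) + tail for combo in product(*choices)]
    # Drop the all-Latin original, which is the genuine name.
    return [v for v in variants if v != label]

variants = spoof_variants("paypal")
print(len(variants))  # 2**5 - 1 = 31

# The Punycode (ASCII) form of the first few spoofed names.
for v in variants[:3]:
    print((v + ".com").encode("idna"))
```

Every variant is a distinct string at the code point level, which is why each one is a different registrable name even though they render alike.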
But, as far as I know, domain names are commonly written in 
lowercase. When I type a nonexistent domain name in capitals, 
such as CUYOPUIESVRDKRSIXTVESVRDSHKSE.com, it is automatically converted 
to lowercase (http://www.cuyopuiesvrdkrsixtvesvrdshkse.com/) before the 
“not found” message is displayed.
In Unifon, only the capital letters would look alike. The lowercase 
letters would be different. There could be a problem with the letter o, 
but that would be a drop in the ocean, not more problematic than the 
letter ᴏ (small capital o), ο (Greek omicron), о (Cyrillic o), ⲟ (Coptic 
o), 𐐬 (Deseret o), ჿ (Georgian labial sign), ੦ (Gurmukhi zero), all the 
zeros, most of which look like circles, etc.
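
To make the list of circle-shaped characters concrete, here is a small Python sketch that prints the code point and character name of each one using the standard unicodedata module. The list CIRCLES is my own selection for illustration, and I am assuming that the Deseret letter meant is U+1042C DESERET SMALL LETTER LONG O.

```python
import unicodedata

# Look-alikes of Latin "o", written as escapes so they cannot be confused.
CIRCLES = [
    "\u006f",      # Latin small o
    "\u1d0f",      # Latin small capital o
    "\u03bf",      # Greek omicron
    "\u043e",      # Cyrillic o
    "\u2c9f",      # Coptic o
    "\U0001042c",  # Deseret long o (assumed to be the letter meant)
    "\u10ff",      # Georgian labial sign
    "\u0a66",      # Gurmukhi digit zero
    "\u0030",      # ASCII digit zero
]

for ch in CIRCLES:
    # Fall back to a placeholder if the local Unicode database lacks a name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>')}")
```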
What exactly is the real security issue with Unifon as a separate 
script? Someone who wants to spoof will find a way to do it without that.






NOW, a few comments about the Unifon proposal.

You didn’t correct “for several the Hupa, Yurok, Tolowa, and Karok 
languages”.

There’s also the word “Karok”. Below, you write “Karuk”.

In the Unifon letters unified with existing characters, you forgot the 
letter I.


You propose a Latin capital letter small capital i to be paired with ɪ 
(Latin letter small capital i). Would ɪ have wider serifs when displayed 
in small caps?


For the Latin capital beta, you wrote: “The unique Latin capital form 
meets one of the major criteria for disunification.”
Could I use the same formula for Unifon? The unique Unifon small forms 
meet one of the major criteria for disunification…


In the previous proposal, you also included a letter which looked a 
little like a ƆC ligature or a rounded X. You called it zhay in n4195. 
Have you forgotten it deliberately? That’s the last letter in figure 1, 
although you wrote X in the caption.


You also used an X in Figure 7’s caption: it would be strange to have an 
X pronounced /ʒ/ (zh) in a phonemic alphabet for English.


In the first three columns of the table at page 12, the two parts of 
Latin letter oy are detached. In all samples of Unifon I’ve seen which 
use that letter, the vertical line of the turned Ⱶ is tangent to the 
right of the O.


In the same table, the Latin letter dhe should have a round shape. 
That’s one of the two features which make it possible to distinguish it 
from the Latin letter the.
In all Unifon fonts I know except one, the left part of the letter dhe 
is not really a T but something midway between a T and a Γ.


I think Latin letter the should have a small top bar.

In this table of the Tolowa Unifon alphabet, 
http://unifon.org/images/TOLOWA.jpg , some letters have a different 
value when followed by a small stroke which looks like an apostrophe. 
Should it be an ASCII apostrophe, a ’ (U+2019), a ʼ (U+02BC), a Ꞌ 
(saltillo) or something else?
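
The candidates differ in more than shape: some are punctuation and some are letters, which affects word breaking and searching. The Python sketch below prints the general category of each candidate with the standard unicodedata module. The dictionary CANDIDATES is only my own labelling of the four characters mentioned above, with the saltillo taken to be the capital U+A78B.

```python
import unicodedata

# The four candidates mentioned above, keyed by an informal label.
CANDIDATES = {
    "ASCII apostrophe": "\u0027",
    "right single quotation mark": "\u2019",
    "modifier letter apostrophe": "\u02bc",
    "capital saltillo": "\ua78b",
}

for label, ch in CANDIDATES.items():
    # Categories starting with P are punctuation, those starting with L are letters.
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {label}")
```

Only the last two are letters, so only they would keep a word such as a Tolowa syllable in one piece for software that splits text at punctuation.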


On page 3, the capital ʃ looks like an enlarged form of the lowercase 
letter, different from the Greek capital sigma-like Ʃ. Would the unique 
Latin capital form meet one of the major criteria for disunification? 
What about the capital U with a tail?


I wonder whether the 8th letter of the 42-letter “Indian Unifon 
Single-Sound Alphabet” is a turned or a reversed C.


For the turned e-r, I think a new lower case is needed.

For the Latin letter reversed-e e, could the double ϵ, used for the same 
sound in the Initial Teaching Alphabet, be used as a lower case letter?


Would a separate proposal be required for the Initial Teaching Alphabet 
(http://en.wikipedia.org/wiki/Initial_Teaching_Alphabet)?

28 or 29 letters of this 44-letter alphabet are already supported:
b, c, d, f, ɡ, h, j, k, l, m, n are already supported.
ng ligature is different from ŋ.
p, r, s are already

RE: Preliminary proposal to encode Unifon in the UCS.

2012-05-30 Thread Doug Ewell
Michael Everson everson at evertype dot com wrote:

 “10a. Can any of the proposed character(s) be considered to be
 similar (in appearance or function) to an existing character?”
 “No.”
 I’m a little surprised. If the 2nd possibility was envisioned, isn’t
 it because many Unifon letters are similar in appearance and often in
 function with some capital Latin letters?

 I didn't bother with that in an exploratory proposal.

N4262 says the same, and so do practically all proposal forms in
response to that question, no matter how similar any of the characters
are to others in appearance or function. I think authors know it's a big
red flag if they say Yes.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell 






Re: Preliminary proposal to encode Unifon in the UCS.

2012-05-30 Thread Michael Everson
On 30 May 2012, at 20:46, Doug Ewell wrote:

 N4262 says the same, and so do practically all proposal forms in response to 
 that question, no matter how similar any of the characters are to others in 
 appearance or function. I think authors know it's a big red flag if they say 
 Yes.

That, or we don't really care about any but the lines in the form which are 
actually looked at when a script is discussed in WG2, namely the block name and 
character count. 

Michael Everson * http://www.evertype.com/





Re: Preliminary proposal to encode Unifon in the UCS.

2012-05-30 Thread Benjamin M Scarborough
I do have a few comments and questions I'd like to make about N4262.

αʹ) I think LATIN LETTER TURNED-E R should be disunified from U+025A LATIN 
SMALL LETTER SCHWA WITH HOOK. I don't think the identity of the new capital character 
matches the established identity of U+025A. Of the five glyphs provided for 
LATIN SMALL LETTER TURNED-E R, I think the first one is the best choice. The 
second glyph resembles ɚ too closely (confusable!), and the other three use a 
small capital r which doesn't seem fitting.

βʹ) Should the glyph for LATIN SMALL LETTER CHE extend below the baseline, like 
in the Metelko alphabet? Obviously this doesn't matter for Unifon, where the 
character will appear as a small capital anyway. However, this could make it 
look too similar to U+0265 LATIN SMALL LETTER TURNED H.

γʹ) On page 7, there are two characters that derive from earlier versions of 
Unifon. The letter on the right is clearly U+023D LATIN CAPITAL LETTER L WITH 
BAR, but the character on the left is discussed nowhere else in the document. 
What is it? I honestly can't tell.

δʹ) In the Lepsius text example on page 5, on the sixth line I see a 
delta-looking symbol. I assume this is U+1E9F LATIN SMALL LETTER DELTA. Since 
this is normally-cased text, is there any evidence of a LATIN CAPITAL LETTER 
DELTA, or is this particular letter just an anomaly?

εʹ) LATIN LETTER OVERTURNED WINEGLASS stands out to me as an odd character 
name. I know that a few other characters, such as U+0264 LATIN SMALL LETTER 
RAMS HORN, have such illustrative names, but this still seems like an odd name 
choice to me. However, I cannot think of a more fitting name.

ϛʹ) The only Unifon alphabets that use LATIN LETTER TLE put it at the very 
beginning of the alphabet. Will the finished proposal sort TLE before A? Could 
this have a negative impact on collation? (I notice that N4262 does not address 
the issue of collation for any character.)

That's all I can think of for now.

—Ben Scarborough




Re: Preliminary proposal to encode Unifon in the UCS.

2012-05-30 Thread Benjamin M Scarborough
Actually, I just noticed that Hupa and Yurok have TLE sorted after Y, so point 
ϛʹ is moot.
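
For what it is worth, the difference between the two orderings is easy to try out. The Python sketch below is purely illustrative: TLE is a private-use placeholder because the letter is not encoded, the alphabet in BASE is a shortened hypothetical one rather than the real Hupa or Yurok order, and make_sort_key is a made-up helper, not a real collation API.

```python
# Private-use placeholder, since LATIN LETTER TLE has no code point yet.
TLE = "\ue000"

def make_sort_key(alphabet):
    """Build a sort key that ranks letters by their position in `alphabet`."""
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(word):
        # Characters missing from the alphabet sort after every listed letter.
        return [rank.get(ch, len(rank)) for ch in word]

    return key

# Shortened, hypothetical alphabet used only for this demonstration.
BASE = list("ABDEFGHIJKLMNOPRSTUVWY")
tle_first = make_sort_key([TLE] + BASE)
tle_last = make_sort_key(BASE + [TLE])

words = ["YA", TLE + "A", "AB"]
print(sorted(words, key=tle_first))  # TLE word comes first
print(sorted(words, key=tle_last))   # TLE word comes last
```

A tailored order like this is a per-language matter, so either placement could be handled outside the character proposal itself.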

—Ben Scarborough




Re: Unifon

2012-05-29 Thread Jean-François Colson

On 29/05/12 06:57, Benjamin M Scarborough wrote:

On May 28, 2012, at 01:52, Michael Everson wrote:
There are many blorts. I've discovered some working with Unifon. I 
haven't exactly had much support from the UTC with what I've 
discovered. I've found the usual posturing about possible 
unifications with other scripts.


I went in saying, well, we could do this like Lisu, which none of you 
will like. And that was true enough. So I did it the unification way 
as was agreed one UTC, but then I get push-back about the encoding 
model and isn't the script dead and more of that.
Dead script? Wasn't it still seeing use in the 1980s? And why would 
being a dead script be a problem? The UCS is full of characters with 
little to no contemporary use (at least not for authoring new 
documents). Sure, if this was still the era when we were limited to 
65,536 code points, it would be a big concern, but this is the 
1,114,112-code-point era. There is plenty of space.


Maybe you should propose the characters for the SMP. It worked for 
Deseret, right? And last I saw Deseret's useful lifespan ended before 
1900. I bet even the English Phonotypic Alphabet would get accepted if 
it were proposed for the SMP instead of the BMP. You could call the 
block Latin Extended-F, since there are plenty of letters left in 
that series.


And I think unifying Unifon with Latin is a good idea. In Unifon I see 
ABȻDEFGHIJKLMNOPRSTUVWYƵ all being used in familiar ways that don't 
seem at all unusual for a Latin-based script.


But that's just me.

—Ben Scarborough


Unification is a good idea as long as you use only capital Unifon. But it 
seems cased Unifon has lowercase letters which look like small capitals 
and therefore, in my opinion, unification with Latin would only 
provide a partial solution: every Unifon text that contains 
lowercase letters would have to be marked as small caps, or special fonts 
would have to be used.
I think the best way to encode Unifon would be as a new script, in the SMP. 
After all, in the 1,114,112-code-point era, is it so important to save 
50 code points with a weird unification?


Another possibility, if the unification is chosen, would be to add a 
variation selector to each Unifon letter to express that the lowercase 
letters are different. Would that be possible?
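
To show what that would mean at the code point level, here is a small Python sketch. It is hypothetical throughout: no variation sequences for Latin capitals plus VARIATION SELECTOR-1 are defined for this purpose, and the variable names are mine. It only demonstrates that a variation selector survives ordinary case conversion, so a font could in principle key on it.

```python
import unicodedata

VS1 = "\ufe00"  # VARIATION SELECTOR-1

# Hypothetical marking of "AB" as Unifon letters; not a registered sequence.
marked = "A" + VS1 + "B" + VS1
lowered = marked.lower()

print(unicodedata.name(VS1))
# The selectors are still attached to the lowercased base letters.
print([f"U+{ord(c):04X}" for c in lowered])
```

The text doubles in length and the distinction is lost as soon as any process strips the selectors, which is the price of this approach.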


JF



Unifon (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

2012-05-28 Thread Benjamin M Scarborough
On May 28, 2012, at 01:52, Michael Everson wrote:
 There are many blorts. I've discovered some working with Unifon. I haven't 
 exactly had much support from the UTC with what I've discovered. I've found 
 the usual posturing about possible unifications with other scripts.
 
 I went in saying, well, we could do this like Lisu, which none of you will 
 like. And that was true enough. So I did it the unification way as was 
 agreed one UTC, but then I get push-back about the encoding model and isn't 
 the script dead and more of that. 

Dead script? Wasn't it still seeing use in the 1980s? And why would being a 
dead script be a problem? The UCS is full of characters with little to no 
contemporary use (at least not for authoring new documents). Sure, if this was 
still the era when we were limited to 65,536 code points, it would be a big 
concern, but this is the 1,114,112-code-point era. There is plenty of space.
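
The two figures are easy to verify: 65,536 is one plane and 1,114,112 is seventeen of them. A minimal Python check, with constant names chosen here only for illustration:

```python
import sys

PLANES = 17          # the BMP plus 16 supplementary planes
PER_PLANE = 0x10000  # 65,536 code points per plane

total = PLANES * PER_PLANE
print(total)  # 1114112

# Python's highest code point is U+10FFFF, i.e. total - 1.
print(sys.maxunicode == total - 1)
```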

Maybe you should propose the characters for the SMP. It worked for Deseret, 
right? And last I saw Deseret's useful lifespan ended before 1900. I bet even 
the English Phonotypic Alphabet would get accepted if it were proposed for the 
SMP instead of the BMP. You could call the block Latin Extended-F, since 
there are plenty of letters left in that series.

And I think unifying Unifon with Latin is a good idea. In Unifon I see 
ABȻDEFGHIJKLMNOPRSTUVWYƵ all being used in familiar ways that don't seem at all 
unusual for a Latin-based script.
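
As a quick check of what unification would imply for casing, the Python sketch below lowercases each of those 24 letters with the existing Unicode case mappings. The variable LETTERS is just the list above written with escapes for the two non-ASCII letters. The point is that every one of them already has an ordinary Latin lowercase form, which is not the small-capital shape that cased Unifon is said to use.

```python
import unicodedata

# The 24 letters listed above; U+023B is C with stroke, U+01B5 is Z with stroke.
LETTERS = "AB\u023bDEFGHIJKLMNOPRSTUVWY\u01b5"

for ch in LETTERS:
    lower = ch.lower()
    print(f"{ch} U+{ord(ch):04X} -> {lower} U+{ord(lower):04X} {unicodedata.name(lower)}")

print(len(LETTERS))  # 24
```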

But that's just me.

—Ben Scarborough




Re: Unifon

2011-07-04 Thread Doug Ewell

Karl Pentzlin wrote:

CP In conclusion, most of this should probably be handled at the 
(smart) font level.


Today, many not yet encoded characters (Latin-like and others)
can be approximately represented by smart font technology.
...
However, doing so hides the identity of characters


I think Christoph was saying these ARE the same characters as the 
already-encoded ones, with the same identity but a slightly different 
look.  This is not at all the same as using ASCII code points for Greek 
letters.


--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell





Re: Unifon

2011-07-04 Thread Michael Everson
On 4 Jul 2011, at 14:54, Doug Ewell wrote:

 I think Christoph was saying these ARE the same characters as the 
 already-encoded ones, with the same identity but a slightly different look.  
 This is not at all the same as using ASCII code points for Greek letters.

There's also such a thing as over-unification, though.

Michael Everson * http://www.evertype.com/





Re: Unifon

2011-07-04 Thread Doug Ewell

Michael Everson wrote:


There's also such a thing as over-unification, though.


Right, and I'm not arguing for or against unifying Unifon with Latin, or 
indeed for or against encoding it at all.  I just don't think the glyph 
variations Christoph was describing were tantamount to hiding totally 
different characters behind a font hack.


Perhaps the use of the term smart font was unfortunate, as it might 
evoke the type of Latin/Greek hack Karl mentioned.


I do worry about encoding even more letters that are intended to look 
identical to Basic Latin letters, because of the spoofing issue.


--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell





Re: Unifon

2011-07-03 Thread Christoph Päper
Karl Pentzlin:
 Attached is a Unifon chart as used for Hupa, according to
 http://eric.ed.gov/PDFS/ED286691.pdf , p. 12.

That’s it? Looks like diacritics to me, combined with some typographic 
preferences and a changed collation sequence perhaps.

a/A with preferred typographic uppercase rendering akin Delta Δ
b/B
c/C
ɔ/Ɔ 0254/0186
d/D
e/E
i/I
j/J
g/G
h/H
i̵/I̵   +0335, with typographic preference for vertical serifs on the bar
i̯/I̯   +032F, or mandatory ai/AI digraph ligature, e.g. aͥ/Aͥ (+0365)
k/K
l/L
m/M
n/N
o/O
o̲/O̲   +0332
o⃒/O⃒   +20D2, or ø/Ø with typographic preference for vertical line
o⃓/O⃓   +20D3, or q/Q or mandatory ao/AO(?) digraph ligature
ƣ/Ƣ 01A3/01A2, or mandatory oi/OI digraph ligature, e.g. oͥ/Oͥ (+0365)
ŋ/Ŋ 014B/014A, or new letter Latin Capital Letter Reversed N
s/S
t/T
u/U
ū/Ū 016B/016A, or ū/Ū (+0304)
w/W
y/Y
/   not sure whether H-based, O-based or neither
x/X
z/Z or ƶ/Ƶ (01B6/01B5)
x̄/X̄   +0304

In conclusion, most of this should probably be handled at the (smart) font 
level.
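
One practical difference between the rows above is whether a base letter plus combining mark has a precomposed equivalent. The Python sketch below normalizes four of the sequences to NFC with the standard unicodedata module. The dictionary SEQUENCES is my own selection from the list, not part of the original chart.

```python
import unicodedata

# A few base + combining mark sequences taken from the list above.
SEQUENCES = {
    "u + macron (U+0304)": "u\u0304",
    "x + macron (U+0304)": "x\u0304",
    "i + short stroke overlay (U+0335)": "i\u0335",
    "o + low line (U+0332)": "o\u0332",
}

for label, seq in SEQUENCES.items():
    nfc = unicodedata.normalize("NFC", seq)
    # One code point means a precomposed letter exists; two means it does not.
    print(label, [f"U+{ord(c):04X}" for c in nfc])
```

Only the u with macron composes to a single code point (U+016B). The others stay as two code points, so their appearance depends entirely on how well the font positions the mark.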



Re: Unifon

2011-07-03 Thread Karl Pentzlin
On Sunday, 3 July 2011 at 18:13, Christoph Päper wrote:

CP In conclusion, most of this should probably be handled at the (smart) font 
level.

Today, many not yet encoded characters (Latin-like and others)
can be approximately represented by smart font technology.
(See e.g. http://www.dkuug.dk/JTC1/SC2/WG2/docs/n4047.pdf
 which contains many ideas on how to mimic metrical symbols with
 diacritical marks).
However, doing so hides the identity of characters and makes
the correct reading of texts dependent on the use of specific fonts.
This is a throwback to the 1980s, when e.g. Greek fonts were developed
which used the ASCII code points.
Also, this makes a correct reading possible only for human readers,
not for data processing tasks such as searching, or storing text in databases
from which it can be retrieved in environments that prefer other fonts.

We are talking about character encoding here. That means that, first and
foremost, we have to decide whether a written thing has an identity
qualifying it as a character, before we consider smart tricks to represent
its graphic appearance by a modified use of existing characters.

Smart font technology, as it has developed now, in fact is a
mighty tool.
But this does not mean that everybody who can use such a hammer
should regard every problem as a nail.

- Karl




Unifon

2011-06-28 Thread Jean-François Colson
I’m interested in Unifon (http://www.unifon.org). That’s a phonemic
alphabet for English which is used to teach reading.
Although it has been encoded in the ConScript Unicode Registry as a new
script in a three-column block, it has in fact been designed as an
extension of the Latin alphabet.
Therefore, considering that three fifths of its letters are already
available, I wonder whether a proposal shouldn’t be limited to the 16
missing letters.
What’s your opinion?



Re: Unifon

2011-06-28 Thread Karl Pentzlin
On Tuesday, 28 June 2011 at 09:43, Jean-François Colson wrote:

JFC I’m interested in Unifon (http://www.unifon.org).

The first issue with Unifon is whether it is to be encoded at all.
Given that it has been a stable system since its design in the 1950s, and
that references to it are found quite often, the answer probably is
yes. But the case has to be made, providing evidence.

Then, it seems appropriate to consider it a script separate from Latin,
like Lisu http://www.unicode.org/charts/PDF/UA4D0.pdf .
Otherwise, we end up with a number of uppercase Latin letters
with no lowercase counterpart. This would be a problem due to
Unicode stability policies, which do not allow a lowercase
counterpart to be encoded later for an already encoded uppercase letter.

- Karl





Re: Unifon

2011-06-28 Thread Andreas Stötzner

On 28.06.2011 at 09:43, Jean-François Colson wrote:

 I’m interested in Unifon (http://www.unifon.org). That’s a phonemic alphabet 
 for English which is used to teach reading.
 Although it has been encoded in the ConScript Unicode Registry as a new 
 script in a three-column block, it has in fact been designed as an extension 
 of the Latin alphabet.
 Therefore, considering that three fifths of its letters are already 
 available, I wonder whether a proposal shouldn’t be limited to the 16 missing 
 letters.
 What’s your opinion?
 

Is there a real need for regular encoding? 
If proposed as a kind of extension to Latin, there is at least one issue to 
be considered carefully: Unifon does not fit the Latin writing system, since it 
is unicameral, not bicameral (as far as I can see).
By which I certainly do not intend to encourage any of the enthusiasts to 
think they ought now to go to their desks and try to invent new lowercase glyphs.
 

Kind regards,

Andreas Stötzner.








Andreas Stötzner   
Gestaltung Signographie Fontentwicklung

Wilhelm-Plesse-Straße 32, 04157 Leipzig
0152-08336058



Re: Unifon

2011-06-28 Thread Asmus Freytag

On 6/28/2011 1:40 AM, Andreas Stötzner wrote:


On 28.06.2011 at 09:43, Jean-François Colson wrote:

I’m interested in Unifon (http://www.unifon.org). That’s a phonemic 
alphabet for English which is used to teach reading.
Although it has been encoded in the ConScript Unicode Registry as a 
new script in a three-column block, it has in fact been designed as 
an extension of the Latin alphabet.
Therefore, considering that three fifths of its letters are already 
available, I wonder whether a proposal shouldn’t be limited to the 16 
missing letters.

What’s your opinion?



Is there a real need for regular encoding?
If proposed as a kind of extension to Latin, there is at least one 
issue to be considered carefully: Unifon does not fit the Latin 
writing system, since it is unicameral, not bicameral (as far as I can 
see).


Same restriction applies to IPA and phonetic notations, all of which 
have been unified with Latin as far as common letters are concerned.
By which I certainly do not intend to encourage any of the 
enthusiasts to think they ought now to go to their desks and try to 
invent new lowercase glyphs.





More relevant would be who uses this system, where and how widely.

The answer to those questions decides, among others, whether any 
standardization effort is warranted.


A./


Re: Unifon

2011-06-28 Thread Doug Ewell
Karl Pentzlin karl dash pentzlin at acssoft dot de wrote:

 Then, it seems appropriate to consider it a script separate from
 Latin, like Lisu http://www.unicode.org/charts/PDF/UA4D0.pdf .
 Otherwise, we end up with a number of uppercase Latin letters
 with no lowercase counterpart. This would be a problem due to
 Unicode stability policies, which do not allow a lowercase
 counterpart to be encoded later for an already encoded uppercase letter.

Assuming that there is a use case to encode Unifon at all, I take this
to mean that encoding the missing (uppercase) Unifon letters as Latin
might trigger a defensive reaction to encode unattested or newly
invented lowercase equivalents.  I hope this is not the effect that the
stability policy is having.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell






Re: Unifon

2011-06-28 Thread Bill Poser
Unifon was used at one point to write several languages in northern
California, so it has seen practical application. I'm not sure how much
material was published in this form. I don't think that any of these tribes
is still using Unifon.


Re: Unifon

2011-06-28 Thread Jean-François Colson

On 28/06/11 19:22, Bill Poser wrote:
Unifon was used at one point to write several languages in northern 
California, so it has seen practical application. I'm not sure how 
much material was published in this form. I don't think that any of 
these tribes is still using Unifon.
You’re right. Unifon was used by the Yurok, Karuk, Tolowa and Hupa 
in the ’70s and ’80s, IIRC. They have since switched to writing 
systems based on the Latin alphabet. I’ve been told that several books 
were printed in their languages using Unifon. However, a few 
letters have changed since then.





Re: Unifon

2011-06-28 Thread Bill Poser
Unifon was used for Hupa only, I think, for some materials prepared by Ruth
Bennett. Most if not all of these can be found in the ERIC database:
http://eric.ed.gov/ERICWebPortal/search/simpleSearch.jsp?newSearch=true&eric_sortField=&searchtype=basic&pageSize=10&ERICExtSearch_SearchValue_0=Hupa&eric_displayStartCount=1&_pageLabel=ERICSearchResult&ERICExtSearch_SearchType_0=kw
None of the more recent material in Hupa is in Unifon.

On Tue, Jun 28, 2011 at 11:05 AM, Jean-François Colson j...@colson.eu wrote:

 On 28/06/11 19:22, Bill Poser wrote:

 Unifon was used at one point to write several languages in northern
 California, so it has seen practical application. I'm not sure how much
 material was published in this form. I don't think that any of these tribes
 is still using Unifon.

 You’re right. Unifon has been used by the Yurok, Karuk, Tolowa and Hupa in
 the 70’s and the 80’s IIRC. Now, they have switched to writing systems based
 on the Latin alphabet. I’ve been told that several books have been printed
 in their languages using Unifon. However, a few letters have changed since
 then.





Re: Unifon

2011-06-28 Thread Bill Poser
Here is a document by Bennett that describes the use of Unifon for Hupa,
Tolowa, Yurok and Karok:
http://eric.ed.gov/ERICWebPortal/contentdelivery/servlet/ERICServlet?accno=ED310889

On Tue, Jun 28, 2011 at 11:05 AM, Jean-François Colson j...@colson.eu wrote:

 On 28/06/11 19:22, Bill Poser wrote:

 Unifon was used at one point to write several languages in northern
 California, so it has seen practical application. I'm not sure how much
 material was published in this form. I don't think that any of these tribes
 is still using Unifon.

 You’re right. Unifon has been used by the Yurok, Karuk, Tolowa and Hupa in
 the 70’s and the 80’s IIRC. Now, they have switched to writing systems based
 on the Latin alphabet. I’ve been told that several books have been printed
 in their languages using Unifon. However, a few letters have changed since
 then.