Re: Preliminary proposal to encode Unifon in the UCS.
Hello, I wrote: “1st possibility: a separate script. There’ll be no problem.” You wrote: “There would, because the bulk of the script would look just like Latin, and the encoding committees consider this to be a security issue for internet spoofing for instance.” I don’t understand. Internet spoofing would be possible, for example, by mixing Latin and Cyrillic letters in internationalized domain names. Instead of paypal.com, you could take advantage of the fact that each of the first five letters has a look-alike Cyrillic counterpart and register one of the 31 (2⁵ − 1) DIFFERENT domain names paypаl.com, payрal.com, payраl.com, paуpal.com, paуpаl.com, paурal.com, paураl.com, pаypal.com, pаypаl.com, pаyрal.com, pаyраl.com, pауpal.com, pауpаl.com, pаурal.com, pаураl.com, рaypal.com, рaypаl.com, рayрal.com, рayраl.com, рaуpal.com, рaуpаl.com, рaурal.com, рaураl.com, раypal.com, раypаl.com, раyрal.com, раyраl.com, рауpal.com, рауpаl.com, раурal.com or раураl.com, and use it to ask your “customers” for their PayPal e-mail address and password. That could only work if the customer is very inattentive, or has previously typed “about:config” in the address bar and set network.IDN_show_punycode to false. (That works with Firefox; the procedure may differ in other browsers.) But, as far as I know, domain names are commonly written in lowercase. When I type in capitals a domain name which doesn’t exist, such as CUYOPUIESVRDKRSIXTVESVRDSHKSE.com, it is automatically converted to lowercase (http://www.cuyopuiesvrdkrsixtvesvrdshkse.com/) before the “not found” message is displayed. In Unifon, only the capital letters would look like Latin ones; the lowercase letters would be different. There could be a problem with the letter o, but that would be a drop in the ocean, no more problematic than ᴏ (small capital o), ο (Greek omicron), о (Cyrillic o), ⲟ (Coptic o), Ь (Deseret o), ჿ (Georgian labial sign), ੦ (Gurmukhi zero), all the zeros, most of which look like circles, etc.
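The arithmetic behind the 31 look-alike names, and the lowercasing argument, can be checked with a small Python sketch. It assumes just three Cyrillic substitutes (р U+0440 for p, а U+0430 for a, у U+0443 for y); these are illustrative, not a complete confusables table. Python's built-in "idna" codec implements the older IDNA 2003 rules, whose nameprep step case-folds any label containing non-ASCII characters; current browsers apply UTS #46 processing, which may differ in details.

```python
from itertools import product

# Illustrative Cyrillic look-alikes for the Latin letters in "paypal".
# This table is an assumption for the sketch, not a full confusables list.
CONFUSABLES = {
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}


def spoof_variants(label: str) -> list[str]:
    """Every spelling of `label` in which a non-empty subset of the
    confusable letters has been swapped for its Cyrillic look-alike."""
    # For each position, offer the original letter first, then any look-alike.
    choices = [
        (ch, CONFUSABLES[ch]) if ch in CONFUSABLES else (ch,) for ch in label
    ]
    spellings = ("".join(combo) for combo in product(*choices))
    # Drop the genuine spelling: 2**5 combinations minus the original one.
    return [s for s in spellings if s != label]


variants = spoof_variants("paypal")
print(len(variants))  # 31, i.e. 2**5 - 1

# Each look-alike label is a distinct "xn--" (Punycode) label on the wire.
for v in variants[:3]:
    print(v + ".com", "->", (v + ".com").encode("idna").decode("ascii"))

# IDNA 2003 nameprep case-folds labels containing non-ASCII characters,
# so an all-capital spoof maps onto the same name as its lowercase form.
upper = "\u0420\u0410\u0423\u0420\u0410L"  # Cyrillic capitals + Latin "L"
lower = "\u0440\u0430\u0443\u0440\u0430l"
print(upper.encode("idna") == lower.encode("idna"))
```

The same folding is what would separate cased Unifon from Latin in a domain name: only the capitals would be confusable, and capitals never survive into the registered label.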
What exactly is the real security issue with Unifon as a separate script? Someone who wants to spoof will find a way to do it without that. Now, a few comments about the Unifon proposal. You didn’t correct “for several the Hupa, Yurok, Tolowa, and Karok languages”, where a word is still missing. That phrase also spells the name “Karok”, whereas further on you write “Karuk”. Among the Unifon letters unified with existing characters, you forgot the letter I. You propose a Latin capital letter small capital i to be paired with ɪ (Latin letter small capital i). Would ɪ have wider serifs when displayed in small caps? For the Latin capital beta, you wrote: “The unique Latin capital form meets one of the major criteria for disunification.” Could I use the same formula for Unifon? “The unique Unifon small forms meet one of the major criteria for disunification…” In the previous proposal, you also included a letter which looked a little like a ƆC ligature or a rounded X. You called it zhay in N4195. Did you leave it out deliberately? It’s the last letter in Figure 1, although you wrote X in the caption. You also used an X in Figure 7’s caption: it would be strange to have an X pronounced /ʒ/ (zh) in a phonemic alphabet for English. In the first three columns of the table on page 12, the two parts of Latin letter oy are detached. In all samples of Unifon I’ve seen which use that letter, the vertical line of the turned Ⱶ is tangent to the right side of the O. In the same table, the Latin letter dhe should have a round shape; that’s one of the two features which distinguish it from the Latin letter the. In all Unifon fonts I know except one, the left part of the letter dhe is not really a T but something midway between a T and a Γ. I think Latin letter the should have a small top bar. In this table of the Tolowa Unifon alphabet, http://unifon.org/images/TOLOWA.jpg , some letters have a different value when followed by a small stroke which looks like an apostrophe.
Should it be an ASCII apostrophe, a ’ (U+2019), a ʼ (U+02BC), a Ꞌ (saltillo) or something else? On page 3, the capital ʃ looks like an enlarged form of the lowercase letter, different from the Greek capital sigma-like Ʃ. Would the argument that “the unique Latin capital form meets one of the major criteria for disunification” apply here too? What about the capital U with a tail? I wonder whether the 8th letter of the 42-letter “Indian Unifon Single-Sound Alphabet” is a turned or a reversed C. For the turned e-r, I think a new lowercase letter is needed. For the Latin letter reversed-e e, could the double ϵ, used for the same sound in the Initial Teaching Alphabet, be used as the lowercase letter? Would a separate proposal be required for the Initial Teaching Alphabet (http://en.wikipedia.org/wiki/Initial_Teaching_Alphabet)? 28 or 29 letters of this 44-letter alphabet are already supported: b, c, d, f, ɡ, h, j, k, l, m, n; the ng ligature is different from ŋ; p, r, s are already
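The apostrophe question matters for processing as much as for appearance: the candidates belong to different Unicode General Categories, so letter and word-boundary tests treat them differently. A minimal check with Python's standard `unicodedata` module; the candidate list comes from the message, and the syllable "t?a" is a hypothetical example, not a real Tolowa word.

```python
import unicodedata

# Candidates named in the message for the Tolowa modifier stroke.
CANDIDATES = ["\u0027", "\u2019", "\u02BC", "\uA78B"]

for ch in CANDIDATES:
    # Po/Pf are punctuation categories; Lm/Lu are letter categories.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# Whether the stroke counts as part of a word follows from that category:
# str.isalpha() is True only when every character is a letter.
for ch in CANDIDATES:
    word = "t" + ch + "a"  # hypothetical syllable for illustration
    print(repr(word), word.isalpha())
```

With the letter-category choices (ʼ or saltillo) the stroke stays inside the word; with the punctuation choices a naive tokenizer may split the word at the stroke.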
RE: Preliminary proposal to encode Unifon in the UCS.
Michael Everson everson at evertype dot com wrote: “10a. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?” “No.” I’m a little surprised. If the second possibility was considered, isn’t that because many Unifon letters are similar in appearance, and often in function, to some capital Latin letters? I didn't bother with that in an exploratory proposal. N4262 says the same, and so do practically all proposal forms in response to that question, no matter how similar any of the characters are to others in appearance or function. I think authors know it's a big red flag if they say Yes. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
Re: Preliminary proposal to encode Unifon in the UCS.
On 30 May 2012, at 20:46, Doug Ewell wrote: N4262 says the same, and so do practically all proposal forms in response to that question, no matter how similar any of the characters are to others in appearance or function. I think authors know it's a big red flag if they say Yes. That, or we don't really care about any but the lines in the form which are actually looked at when a script is discussed in WG2, namely the block name and character count. Michael Everson * http://www.evertype.com/
Re: Preliminary proposal to encode Unifon in the UCS.
I do have a few comments and questions I'd like to make about N4262.
αʹ) I think LATIN LETTER TURNED-E R should be disunified from U+025A LATIN SMALL LETTER SCHWA WITH HOOK. I don't think the identity of the new capital character matches the established identity of U+025A. Of the five glyphs provided for LATIN SMALL LETTER TURNED-E R, I think the first one is the best choice. The second glyph resembles ɚ too closely (confusable!), and the other three use a small capital r, which doesn't seem fitting.
βʹ) Should the glyph for LATIN SMALL LETTER CHE extend below the baseline, as in the Metelko alphabet? Obviously this doesn't matter for Unifon, where the character will appear as a small capital anyway. However, this could make it look too similar to U+0265 LATIN SMALL LETTER TURNED H.
γʹ) On page 7, there are two characters that derive from earlier versions of Unifon. The letter on the right is clearly U+023D LATIN CAPITAL LETTER L WITH BAR, but the character on the left is discussed nowhere else in the document. What is it? I honestly can't tell.
δʹ) In the Lepsius text example on page 5, on the sixth line I see a delta-looking symbol. I assume this is U+1E9F LATIN SMALL LETTER DELTA. Since this is normally cased text, is there any evidence of a LATIN CAPITAL LETTER DELTA, or is this particular letter just an anomaly?
εʹ) LATIN LETTER OVERTURNED WINEGLASS stands out to me as an odd character name. I know that a few other characters, such as U+0264 LATIN SMALL LETTER RAMS HORN, have such illustrative names, but this still seems like an odd choice to me. However, I cannot think of a more fitting name.
ϛʹ) The only Unifon alphabets that use LATIN LETTER TLE put it at the very beginning of the alphabet. Will the finished proposal sort TLE before A? Could this have a negative impact on collation? (I notice that N4262 does not address collation for any character.)
That's all I can think of for now. —Ben Scarborough
Re: Preliminary proposal to encode Unifon in the UCS.
Actually, I just noticed that Hupa and Yurok have TLE sorted after Y, so point ϛʹ is moot. —Ben Scarborough
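The collation question in point ϛʹ reduces to where one extra letter sits in a tailored alphabet order. A hypothetical Python sketch: TLE has no code point here, so the placeholder token "<TLE>" stands in for it, and the four-letter alphabets are toy examples, not any real Unifon ordering. A production system would more likely express this as a tailoring of the Unicode Collation Algorithm (for example with ICU collation rules).

```python
def make_sort_key(alphabet: list[str]):
    """Build a sort key for a tailored alphabet whose letters may be
    multi-character tokens (e.g. the hypothetical placeholder "<TLE>")."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    # Try longer tokens first so "<TLE>" is not split into single characters.
    tokens = sorted(alphabet, key=len, reverse=True)

    def key(word: str) -> list[int]:
        ranks, i = [], 0
        while i < len(word):
            for tok in tokens:
                if word.startswith(tok, i):
                    ranks.append(rank[tok])
                    i += len(tok)
                    break
            else:
                raise ValueError(f"{word[i]!r} is not in the alphabet")
        return ranks

    return key


words = ["YA", "<TLE>A", "AB"]
tle_first = sorted(words, key=make_sort_key(["<TLE>", "A", "B", "Y"]))
tle_after_y = sorted(words, key=make_sort_key(["A", "B", "Y", "<TLE>"]))
print(tle_first)    # TLE ordered before A
print(tle_after_y)  # TLE ordered after Y, as in the Hupa and Yurok charts
```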
Re: Unifon
On 29/05/12 06:57, Benjamin M Scarborough wrote: On May 28, 2012, at 01:52, Michael Everson wrote: There are many blorts. I've discovered some working with Unifon. I haven't exactly had much support from the UTC with what I've discovered. I've found the usual posturing about possible unifications with other scripts. I went in saying, well, we could do this like Lisu, which none of you will like. And that was true enough. So I did it the unification way as was agreed at UTC, but then I get push-back about the encoding model and isn't the script dead and more of that. Dead script? Wasn't it still seeing use in the 1980s? And why would being a dead script be a problem? The UCS is full of characters with little to no contemporary use (at least not for authoring new documents). Sure, if this was still the era when we were limited to 65,536 code points, it would be a big concern, but this is the 1,114,112-code-point era. There is plenty of space. Maybe you should propose the characters for the SMP. It worked for Deseret, right? And last I saw Deseret's useful lifespan ended before 1900. I bet even the English Phonotypic Alphabet would get accepted if it were proposed for the SMP instead of the BMP. You could call the block Latin Extended-F, since there are plenty of letters left in that series. And I think unifying Unifon with Latin is a good idea. In Unifon I see ABȻDEFGHIJKLMNOPRSTUVWYƵ all being used in familiar ways that don't seem at all unusual for a Latin-based script. But that's just me. —Ben Scarborough Unification is a good idea as long as you use only capital Unifon. But cased Unifon seems to have lowercase letters which look like small capitals; therefore, in my opinion, unification with Latin would only provide a partial solution: every Unifon text containing lowercase letters would have to be marked up as small caps, or special fonts would have to be used. I think the best way to encode Unifon would be as a new script, in the SMP.
After all, in the 1,114,112-code-point era, is it so important to save 50 code points with a weird unification? Another possibility, if unification is chosen, would be to add a variation selector to each Unifon letter to express that the lowercase letters are different. Would that be possible? JF
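The variation-selector idea can be tried mechanically. This Python sketch uses U+FE00 VARIATION SELECTOR-1 purely for illustration: no Unifon variation sequences are standardized, and a selector outside a standardized sequence has no defined effect on rendering. The helper names are hypothetical. It also shows the costs: the text doubles in length, and a naive substring search no longer finds the plain letters.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1, chosen arbitrarily for this sketch


def mark_variants(text: str, selector: str = VS1) -> str:
    """Append the selector after every letter (hypothetical Unifon marking)."""
    return "".join(ch + selector if ch.isalpha() else ch for ch in text)


def strip_selectors(text: str) -> str:
    """Remove VS1..VS16 (U+FE00..U+FE0F), e.g. before searching."""
    return "".join(ch for ch in text if not 0xFE00 <= ord(ch) <= 0xFE0F)


marked = mark_variants("unifon")
print(len(marked))                          # twice as many code points
print(unicodedata.category(VS1))            # a nonspacing mark (Mn)
print("unifon" in marked)                   # naive search misses it
print(strip_selectors(marked) == "unifon")  # works once selectors are ignored
# Case mapping changes the base letters and leaves the selectors in place.
print(marked.upper() == mark_variants("UNIFON"))
```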
Unifon (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)
On May 28, 2012, at 01:52, Michael Everson wrote: There are many blorts. I've discovered some working with Unifon. I haven't exactly had much support from the UTC with what I've discovered. I've found the usual posturing about possible unifications with other scripts. I went in saying, well, we could do this like Lisu, which none of you will like. And that was true enough. So I did it the unification way as was agreed at UTC, but then I get push-back about the encoding model and isn't the script dead and more of that. Dead script? Wasn't it still seeing use in the 1980s? And why would being a dead script be a problem? The UCS is full of characters with little to no contemporary use (at least not for authoring new documents). Sure, if this was still the era when we were limited to 65,536 code points, it would be a big concern, but this is the 1,114,112-code-point era. There is plenty of space. Maybe you should propose the characters for the SMP. It worked for Deseret, right? And last I saw Deseret's useful lifespan ended before 1900. I bet even the English Phonotypic Alphabet would get accepted if it were proposed for the SMP instead of the BMP. You could call the block Latin Extended-F, since there are plenty of letters left in that series. And I think unifying Unifon with Latin is a good idea. In Unifon I see ABȻDEFGHIJKLMNOPRSTUVWYƵ all being used in familiar ways that don't seem at all unusual for a Latin-based script. But that's just me. —Ben Scarborough
Re: Unifon
Karl Pentzlin wrote: CP In conclusion, most of this should probably be handled at the (smart) font level. Today, many not yet encoded characters (Latin-like and others) can be approximately represented by smart font technology. ... However, doing such is hiding the identity of characters I think Christoph was saying these ARE the same characters as the already-encoded ones, with the same identity but a slightly different look. This is not at all the same as using ASCII code points for Greek letters. -- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
Re: Unifon
On 4 Jul 2011, at 14:54, Doug Ewell wrote: I think Christoph was saying these ARE the same characters as the already-encoded ones, with the same identity but a slightly different look. This is not at all the same as using ASCII code points for Greek letters. There's also such a thing as over-unification, though. Michael Everson * http://www.evertype.com/
Re: Unifon
Michael Everson wrote: There's also such a thing as over-unification, though. Right, and I'm not arguing for or against unifying Unifon with Latin, or indeed for or against encoding it at all. I just don't think the glyph variations Christoph was describing were tantamount to hiding totally different characters behind a font hack. Perhaps the use of the term smart font was unfortunate, as it might evoke the type of Latin/Greek hack Karl mentioned. I do worry about encoding even more letters that are intended to look identical to Basic Latin letters, because of the spoofing issue. -- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
Re: Unifon
Karl Pentzlin: Attached is a Unifon chart as used for Hupa, according to http://eric.ed.gov/PDFS/ED286691.pdf , p. 12. That’s it? Looks like diacritics to me, combined with some typographic preferences and a changed collation sequence perhaps.
a/A with preferred typographic uppercase rendering akin to Delta Δ
b/B
c/C
ɔ/Ɔ 0254/0186
d/D
e/E
i/I
j/J
g/G
h/H
i̵/I̵ +0335, with typographic preference for vertical serifs on the bar
i̯/I̯ +032F, or mandatory ai/AI digraph ligature, e.g. aͥ/Aͥ (+0365)
k/K
l/L
m/M
n/N
o/O
o̲/O̲ +0332
o⃒/O⃒ +20D2, or ø/Ø with typographic preference for vertical line
o⃓/O⃓ +20D3, or q/Q or mandatory ao/AO(?) digraph ligature
ƣ/Ƣ 01A3/01A2, or mandatory oi/OI digraph ligature, e.g. oͥ/Oͥ (+0365)
ŋ/Ŋ 014B/014A, or new letter Latin Capital Letter Reversed N
s/S
t/T
u/U
ū/Ū 016B/016A, or ū/Ū (+0304)
w/W
y/Y
/ not sure whether H-based, O-based or neither
x/X
z/Z or ƶ/Ƶ (01B6/01B5)
x̄/X̄ +0304
In conclusion, most of this should probably be handled at the (smart) font level.
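Christoph's mapping leans on combining sequences. A short check with Python's standard `unicodedata` module shows which of them have a precomposed equivalent under NFC and which stay as base-plus-mark sequences, a difference that matters for searching and collation. The sample sequences are taken from the list above; nothing here is a recommendation for how Unifon should be encoded.

```python
import unicodedata

# Sequences from the proposed Latin mapping, written as base + combining mark.
SEQUENCES = {
    "u + U+0304 (macron)": "u\u0304",
    "i + U+0335 (short stroke overlay)": "i\u0335",
    "o + U+0332 (low line)": "o\u0332",
    "o + U+20D2 (long vertical line overlay)": "o\u20d2",
}

for label, seq in SEQUENCES.items():
    nfc = unicodedata.normalize("NFC", seq)
    kind = "precomposed" if len(nfc) == 1 else "stays a sequence"
    codes = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"{label}: {kind} -> {codes}")

# A stroked i built from a combining mark is NOT the atomic letter U+0268 (ɨ):
# U+0268 has no decomposition, so the two never match after normalization.
print(unicodedata.normalize("NFC", "i\u0335") == "\u0268")
```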
Re: Unifon
On Sunday, 3 July 2011 at 18:13, Christoph Päper wrote: CP In conclusion, most of this should probably be handled at the (smart) font level. Today, many not yet encoded characters (Latin-like and others) can be approximately represented by smart font technology. (See e.g. http://www.dkuug.dk/JTC1/SC2/WG2/docs/n4047.pdf which contains many ideas on how to mimic metrical symbols with diacritical marks.) However, doing so hides the identity of characters and makes the correct reading of texts dependent on the use of specific fonts. This is a relapse into the 1980s, when e.g. Greek fonts were developed which used the ASCII code points. Also, this makes correct reading possible only for human readers, not for data processing systems such as search, or storage in databases from which text can be retrieved in environments preferring other fonts. We are talking about character encoding here. That means that, first of all, we have to decide whether a written thing has an identity qualifying it as a character, before we consider smart tricks to represent its graphic appearance by a modified use of existing characters. Smart font technology, as it has developed now, is in fact a mighty tool. But this does not mean that everybody who can use such a hammer should regard every problem as a nail. - Karl
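Karl's point about data processing can be made concrete with a toy font hack. The Python sketch below assumes a hypothetical legacy font that draws Greek glyphs in ASCII slots, in the spirit of the 1980s Greek fonts he mentions (the mapping and helper name are invented for illustration). The screen shows Greek, but the stored text is still Latin, so a search for the Greek word fails.

```python
# Hypothetical legacy font: ASCII code points drawn as Greek glyphs.
FONT_HACK = {"a": "\u03b1", "b": "\u03b2", "g": "\u03b3", "d": "\u03b4"}


def rendered(stored: str) -> str:
    """What a reader sees when the stored text is displayed in the hacked font."""
    return "".join(FONT_HACK.get(ch, ch) for ch in stored)


stored = "abgd"                   # what the file or database actually contains
print(rendered(stored))           # looks Greek on screen
print("\u03b1\u03b2" in stored)   # searching for Greek "αβ" fails
print("ab" in stored)             # only a search for the Latin letters succeeds
```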
Unifon
I’m interested in Unifon (http://www.unifon.org). That’s a phonemic alphabet for English which is used to teach reading. Although it has been encoded in the ConScript Unicode Registry as a new script in a three-column block, it has in fact been designed as an extension of the Latin alphabet. Therefore, considering that three fifths of its letters are already available, I wonder whether a proposal shouldn’t be limited to the 16 missing letters. What’s your opinion?
Re: Unifon
On Tuesday, 28 June 2011 at 09:43, Jean-François Colson wrote: JFC I’m interested in Unifon (http://www.unifon.org). The first issue with Unifon is whether it is to be encoded at all. Given that it has been a stable system since its design in the 1950s, and that references to it are found quite often, the answer is probably yes. But the case has to be made, with evidence. Then, it seems appropriate to consider it a script separate from Latin, like Lisu http://www.unicode.org/charts/PDF/UA4D0.pdf . Otherwise, we end up with a number of uppercase Latin letters with no lowercase counterpart. This would be a problem due to Unicode stability policies, which do not allow a lowercase counterpart to be encoded later for an already encoded uppercase letter. - Karl
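Karl's concern turns on whether an encoded letter already has a case partner. A rough Python check follows; it assumes that the one-character results of `str.lower()` and `str.upper()` are a good enough proxy for Unicode case pairs (Python applies full case mappings, which differ for a few characters), and `case_partner` is a hypothetical helper. ℂ (U+2102) stands in as an encoded uppercase letter with no lowercase counterpart.

```python
import unicodedata
from typing import Optional


def case_partner(ch: str) -> Optional[str]:
    """Return the other-case counterpart of a single letter, or None."""
    for mapped in (ch.lower(), ch.upper()):
        if mapped != ch and len(mapped) == 1:
            return mapped
    return None


for ch in ["\u0186", "\u014A", "\u2102"]:  # Ɔ, Ŋ, ℂ
    partner = case_partner(ch)
    target = unicodedata.name(partner) if partner else "no case partner"
    print(unicodedata.name(ch), "->", target)
```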
Re: Unifon
On 28.06.2011 at 09:43, Jean-François Colson wrote: I’m interested in Unifon (http://www.unifon.org). That’s a phonemic alphabet for English which is used to teach reading. Although it has been encoded in the ConScript Unicode Registry as a new script in a three-column block, it has in fact been designed as an extension of the Latin alphabet. Therefore, considering that three fifths of its letters are already available, I wonder whether a proposal shouldn’t be limited to the 16 missing letters. What’s your opinion? Is there a real need for regular encoding? If it is proposed as a kind of extension to Latin, there will be at least one issue to be considered carefully: Unifon does not fit the Latin writing system since it is unicameral, not bicameral (as far as I can see). By which I certainly do not intend to encourage any enthusiasts to think they ought now to go to their desks and try to invent new lowercase glyphs. Best regards, Andreas Stötzner.
Re: Unifon
On 6/28/2011 1:40 AM, Andreas Stötzner wrote: On 28.06.2011 at 09:43, Jean-François Colson wrote: I’m interested in Unifon (http://www.unifon.org). That’s a phonemic alphabet for English which is used to teach reading. Although it has been encoded in the ConScript Unicode Registry as a new script in a three-column block, it has in fact been designed as an extension of the Latin alphabet. Therefore, considering that three fifths of its letters are already available, I wonder whether a proposal shouldn’t be limited to the 16 missing letters. What’s your opinion? Is there a real need for regular encoding? If it is proposed as a kind of extension to Latin, there will be at least one issue to be considered carefully: Unifon does not fit the Latin writing system since it is unicameral, not bicameral (as far as I can see). The same restriction applies to IPA and phonetic notations, all of which have been unified with Latin as far as common letters are concerned. By which I certainly do not intend to encourage any enthusiasts to think they ought now to go to their desks and try to invent new lowercase glyphs. More relevant would be who uses this system, where, and how widely. The answers to those questions decide, among other things, whether any standardization effort is warranted. A./
Re: Unifon
Karl Pentzlin karl dash pentzlin at acssoft dot de wrote: Then, it seems appropriate to consider it a script separate from Latin, like Lisu http://www.unicode.org/charts/PDF/UA4D0.pdf . Otherwise, we end up with a number of uppercase Latin letters with no lowercase counterpart. This would be a problem due to Unicode stability policies, which do not allow a lowercase counterpart to be encoded later for an already encoded uppercase letter. Assuming that there is a use case to encode Unifon at all, I take this to mean that encoding the missing (uppercase) Unifon letters as Latin might trigger a defensive reaction to encode unattested or newly invented lowercase equivalents. I hope this is not the effect that the stability policy is having. -- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
Re: Unifon
Unifon was used at one point to write several languages in northern California, so it has seen practical application. I'm not sure how much material was published in this form. I don't think that any of these tribes is still using Unifon.
Re: Unifon
On 28/06/11 19:22, Bill Poser wrote: Unifon was used at one point to write several languages in northern California, so it has seen practical application. I'm not sure how much material was published in this form. I don't think that any of these tribes is still using Unifon. You’re right. Unifon was used by the Yurok, Karuk, Tolowa and Hupa in the 1970s and 1980s, IIRC. They have since switched to writing systems based on the Latin alphabet. I’ve been told that several books have been printed in their languages using Unifon. However, a few letters have changed since then.
Re: Unifon
Unifon was used for Hupa only, I think, for some materials prepared by Ruth Bennett. Most if not all of these can be found in the ERIC database: http://eric.ed.gov/ERICWebPortal/search/simpleSearch.jsp?newSearch=true&eric_sortField=&searchtype=basic&pageSize=10&ERICExtSearch_SearchValue_0=Hupa&eric_displayStartCount=1&_pageLabel=ERICSearchResult&ERICExtSearch_SearchType_0=kw None of the more recent material in Hupa is in Unifon.
Re: Unifon
Here is a document by Bennett that describes the use of Unifon for Hupa, Tolowa, Yurok and Karok: http://eric.ed.gov/ERICWebPortal/contentdelivery/servlet/ERICServlet?accno=ED310889