New Public Review Issue posted
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page:

    http://www.unicode.org/review/

Review period for the new item closes on November 11, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

46  Proposal for Encoded Representations of Meteg

In some Biblical Hebrew usage, it is considered necessary to distinguish how the meteg mark is positioned relative to a vowel point: to the left of the vowel, or to the right; or, in the case of a hataf vowel, between the two components of the hataf vowel. A solution for this has been proposed using control characters, including the zero width joiner and non-joiner characters. This public-review issue is soliciting feedback on this proposed solution.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

    http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

    http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.
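For anyone experimenting with the sequences under discussion, here is a small Python sketch that merely constructs some candidate meteg sequences and prints their composition. Which control character encodes which meteg position is specified by the proposal document itself and is not asserted here; the sequences below are illustrative only.

    # Illustrative only: builds combining sequences from the characters named
    # in PRI #46. The meaning assigned to each ordering is defined by the
    # proposal, not by this sketch.
    import unicodedata

    QAMATS = "\u05B8"   # HEBREW POINT QAMATS (an ordinary vowel point)
    METEG  = "\u05BD"   # HEBREW POINT METEG
    ZWJ    = "\u200D"   # ZERO WIDTH JOINER
    ZWNJ   = "\u200C"   # ZERO WIDTH NON-JOINER

    for seq in (QAMATS + METEG,          # default ordering
                QAMATS + ZWJ + METEG,    # candidate alternate positioning
                QAMATS + ZWNJ + METEG):  # candidate alternate positioning
        print(" + ".join(unicodedata.name(c) for c in seq))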
Re: markup on combining characters
Philippe Verdy wrote, re Public Review Issue #41:

> I don't know if a formal proposal has been sent to ISO/IEC WG too.

Yes. In fact, the PRI document itself says WG2 N2822. It *has* gone to WG2 as well as to UTC.

    Rick
Re: RE: Public Review Issue: UAX #24 Proposed Update
Jony wrote,

> FB1D, HEBREW LETTER YOD WITH HIRIQ, should be assigned to the
> unknown group. It is not a Hebrew character, notwithstanding the
> misleading name.

Hmmm... Are you claiming that HEBREW LETTER YOD (the base character of the code point U+FB1D) is not a letter of the Hebrew script, and you can substantiate that claim? If so, please write a document to that effect with appropriate citations and send it to me for posting to UTC.

    Rick
Two new Public Review Issues posted
The Unicode Technical Committee has posted two new issues for public review and comment. Details are on the following web page:

    http://www.unicode.org/review/

Review periods for the new items close on November 8, 2004. Briefly, the new issues are:

44  Bidi Category of Fullwidth Solidus

Unicode 4.0.1 changes the Bidi Category of U+002F SOLIDUS from "ES" to "CS" but leaves U+FF0F FULLWIDTH SOLIDUS as category "ES". U+FF0F FULLWIDTH SOLIDUS should probably have the same bidi class as its regular sibling. The UTC proposes to make this change for Unicode 4.1.

45  Linebreaking Category of Narrow No-Break Space

Should the linebreaking category of Narrow No-Break Space (NNBSP, U+202F) be changed from WS to CS, in analogy to No-Break Space U+00A0? The reason for the change is that in all scripts but Mongolian it acts like an ordinary NBSP, except for its width. In Mongolian it may be recognized in shaping.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

    http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

    http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.
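As a side note for readers checking current data: in any Python whose bundled Unicode data reflects Unicode 4.1 or later, the outcome of issue 44 can be observed directly (an illustration, not part of the announcement):

    import unicodedata

    # Bidi classes as reported by the runtime's bundled Unicode data.
    # In data reflecting Unicode 4.1 or later, both should print "CS".
    for cp, label in ((0x002F, "SOLIDUS"), (0xFF0F, "FULLWIDTH SOLIDUS")):
        print(f"U+{cp:04X} {label}: bidi class {unicodedata.bidirectional(chr(cp))}")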
Public Review Issue: UAX #24 Proposed Update
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page:

    http://www.unicode.org/review/

Review period for the new item closes on November 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

Proposed Update Unicode Standard Annex #24: Script Names
    http://www.unicode.org/reports/tr24/tr24-6.html

This is a proposed update to a previously approved Unicode Standard Annex. It provides an assignment of script names to all Unicode code points. This information is useful in mechanisms such as regular expressions and other text processing tasks. The proposed update makes several substantial changes to the previously approved annex.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

    http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

    http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.
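To illustrate the "regular expressions" use the annex mentions, here is a sketch using the third-party Python regex package (the standard re module does not support script properties):

    # Requires the third-party "regex" package (pip install regex).
    import regex

    text = "abc \u05D0\u05D1\u05D2 123"
    print(regex.findall(r"\p{Script=Hebrew}+", text))  # the Hebrew run
    print(regex.findall(r"\p{Script=Latin}+", text))   # ['abc']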
UTR #17 now available
The Unicode Technical Committee is pleased to announce the availability of a new fully-approved version of Unicode Technical Report #17: Character Encoding Model. It may be obtained at the following URL:

    http://www.unicode.org/reports/tr17/

This report describes a model for the structure of character encodings. The Unicode Character Encoding Model places the Unicode Standard in the context of other character encodings of all types, as well as existing models such as the character architecture promoted by the Internet Architecture Board (IAB) for use on the Internet.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

    http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

    http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.
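As a concrete illustration of the model's layers (code point, encoding form, encoding scheme), not taken from the report itself, a single character can be traced through each level in Python:

    import struct

    ch = "\u20AC"                             # an abstract character: EURO SIGN
    print(f"code point: U+{ord(ch):04X}")     # coded character set layer
    (unit,) = struct.unpack(">H", ch.encode("utf-16-be"))
    print(f"UTF-16 code unit: 0x{unit:04X}")  # character encoding form layer
    # Character encoding scheme layer: the same code unit(s) as byte streams.
    print("UTF-16BE:", ch.encode("utf-16-be").hex())
    print("UTF-16LE:", ch.encode("utf-16-le").hex())
    print("UTF-8:   ", ch.encode("utf-8").hex())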
Public Review Issue: UAX #34 Proposed Draft
I'm sorry to report that the subject line of my previous note today was incorrect. The correct subject line should say UAX #34, and it is UAX #34 which has been released for public review.

    http://www.unicode.org/review/

Regards,
Rick McGowan
Unicode, Inc.

--
Subject: Public Review Issue: UTR #17 Proposed Draft

The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page:

    http://www.unicode.org/review/

Review period for the new item closes on November 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

Proposed Draft Unicode Standard Annex #34: Unicode Named Character Sequences
    http://www.unicode.org/reports/tr34/tr34-1.html

This annex specifies sequences of characters that may be treated as single units, either in particular types of processing, in reference by standards, in listing of repertoires (such as for fonts or keyboards), or in communicating with users.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

    http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

    http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.
Public Review Issue: UTR #17 Proposed Draft
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page:

    http://www.unicode.org/review/

Review period for the new item closes on November 8, 2004. Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

Proposed Draft Unicode Standard Annex #34: Unicode Named Character Sequences
    http://www.unicode.org/reports/tr34/tr34-1.html

This annex specifies sequences of characters that may be treated as single units, either in particular types of processing, in reference by standards, in listing of repertoires (such as for fonts or keyboards), or in communicating with users.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

    http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

    http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.
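As an aside: Python's unicodedata.lookup() (3.3 and later) resolves named sequences from NamedSequences.txt as well as character names. A sketch; treat the specific sequence name below as illustrative, since the file's contents vary by Unicode version:

    import unicodedata

    # lookup() resolves named sequences (Python 3.3+). The name below is
    # believed to be in NamedSequences.txt; if your Python's Unicode data
    # lacks it, pick any entry from that file.
    try:
        seq = unicodedata.lookup("KHMER CONSONANT SIGN COENG KA")
        print([f"U+{ord(c):04X}" for c in seq])  # expected: U+17D2, U+1780
    except KeyError:
        print("named sequence not in this Python's Unicode data")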
Re: Common Locale Data Repository Project
> From: "Peter Constable" <[EMAIL PROTECTED]> > > due to the strong perception of OpenI18N.org as > > opensource/Linux advocates, even though CLDR project is not > > specifically bound to Linux. > It is hard to look at OpenI18N.org's spec and not get the impression > that all of that group's projects are not bound to some flavour of Unix. We understand what you mean. Sometime perception is very important, and that's why we thought it was a good idea to transfer CLDR. As we started as Linux Internationalization Initiative(li18nux.org) and later changed name and charter as OpenI18N.org to accommodate wider platforms and platform neutral I18N technology developments, any projects at OpenI18N.org are not limited to Linux/Unix. > CLDR doesn't have to be tied to any particular platform -- after all, > it's just a collection of data. Yup! So hopefully this move would help more parties to join the projects. That would definitely help global interoperability for all platforms and help everybody. > But I don't think you can honestly say that OpenI18N isn't tied to a > particular family of platforms Most of our current projects are mainly for some flavour of Unix, since most of the participants' expertise and interests are for those platforms but we are not limited nor have to be bound to them. The only requirement for the projects in OpenI18N.org is to be open to everyone, to be developed in open process and to be opensourced. For example, one of the projects I run, the platform neutral multilingual distributed Unicode input method framework, IIIMF, runs on Windows as well, and I honestly hope Microsoft to adapt to IIIMF in the future release of Windows, so that we can unite unicode input method framework regardless of platform. Best Regards, -- [EMAIL PROTECTED],OpenI18N.org,li18nux.org,unicode.org,sun.com} Chair, OpenI18N.org/The Free Standards Group http://www.OpenI18N.org Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA eFAX: 509-693-8356
Re: Common Locale Data Repository Project
> From: "Philippe Verdy" <[EMAIL PROTECTED]> > Is that a contribution of the Unicode Consortium to the OpenI18n.org > project (former li18nux.org, maintained with most help from the > FSF), or a decision to make the OpenI18n.org project be more open by > pushing it to a more visible standard? More on the latter, but slightly different. We believe it would be good for both opensource community and commercial IT industry that we transfer (at least a part of) the project to Unicode Consortium, after hearing the concerns on difficulty of some commercial companies to join the project due to the strong perception of OpenI18N.org as opensource/Linux advocates, even though CLDR project is not specifically bound to Linux. We hope this transfer would gain further participations from wider audiences. Regarding confusions, I have to say it is anticipated, since the project is still in transition(for example, OpenI18N.org side has not been finished necessary procedure to finalize this, so OpenI18N.org does not have a press release statement ready yet - this announcement is a little too early), I guess it will all be sorted out as time goes by. -- [EMAIL PROTECTED],OpenI18N.org,li18nux.org,unicode.org,sun.com} Chair, OpenI18N.org/The Free Standards Group http://www.OpenI18N.org Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA eFAX: 509-693-8356
RE: OT? Languages with letters that always take diacriticals
A number of North American Native languages use a character+diacritic where the corresponding plain character (without the diacritic) does not exist in the orthography.

- Romanised Cree has <ē> but no plain <e>.
- Some west-coast Salishan languages have LATIN LAMBDA WITH STROKE + COMBINING COMMA ABOVE, but no plain LATIN LAMBDA WITH STROKE.
- A number of languages (e.g. Meskwaki) use <č> but not <c>.

However, most if not all North American Native languages have multiple orthographies historically if not synchronically. So some Cree speakers who are using Roman orthography may very well write <e> instead of <ē> for reasons of graphical economy.

Chris Harvey
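An encoding-level aside (my illustration, not part of Chris's point): <ē> has a precomposed code point, while the Salishan barred-lambda combination can only be a combining sequence. Normalization makes the difference visible in Python:

    import unicodedata

    # <e> + COMBINING MACRON folds to one precomposed code point under NFC...
    e_macron = unicodedata.normalize("NFC", "e\u0304")
    print(len(e_macron), hex(ord(e_macron)))            # 1 0x113

    # ...but LAMBDA WITH STROKE (U+019B) + COMBINING COMMA ABOVE (U+0313)
    # has no precomposed form, so NFC leaves it as two code points.
    lam = unicodedata.normalize("NFC", "\u019B\u0313")
    print(len(lam), [hex(ord(c)) for c in lam])         # 2 ['0x19b', '0x313']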
Re: Canadian Unified Syllabics
Hi.

> Make your recommendation either for encoding 3 separate things, or for
> use of variation selectors; or put the issue out for the committee to
> decide.

I would recommend that three "invisible" (I don't know the technical term) characters be added: base-line final, mid-line final, and top-line final. I can't see why the "top-line final" would be necessary, except see *** below. The benefit of this concept is that for fonts which do not have these OpenType substitutions, there would be no visual effect on the screen, just that all finals would be top-line, instead of in their proper place.

> > Carrier is missing 2 characters (only one of which appears in the
> > text) and Blackfoot is missing 1 character (which doesn't appear in the
> > text).
>
> Those are the only 3 you found, apart from the different height finals
> for Dene languages, is that right? Or are there still others?

Those are the only missing characters that were used in examples on my website. Other missing characters are:

- the Ojibway i-finals (vital),
- Ojibway combining r and l finals (I believe necessary; I have a contact who could verify whether the communities are still using these),
- historical Chipewyan l and g finals (obsolete really),
- Woods Cree dh-final (vital),
- Ojibway-Cree small ring final (vital),
- West Cree w-vowel-y final (if this is not what U+141D is supposed to be, a colon-like character would be the usual form of this),
- West Cree y-dot final (forgot to put an example of this on the webpage, will fix) (vital),
- a syllabics hyphen? (just an idea),
- I have found evidence that Beaver used a superscript roman l in native words (I will put an example up). I figure the other superscript roman characters (used only in borrowings) could be coded something like "F" + "top-line final". ***

Along with the Blackfoot and Carrier:

- Blackfoot w-equals sign (vital)
- Carrier sans-serif s final (vital)
- Carrier f/v final (loan words only, but would be useful)

(I also will try to get in contact with an expert in Carrier.)

thanks...
chris
Re: Canadian Unified Syllabics
Hello

Here are some comments about UCAS suggestions.

* Encode the pages as compliantly as possible. (Also about the overuse of PUA on my website languagegeek.com.)

I spent all night making Unicode-only versions of the syllabics pages where appropriate. In the end, for Carrier and Blackfoot, it doesn't look too bad. Carrier is missing 2 characters (only one of which appears in the text) and Blackfoot is missing 1 character (which doesn't appear in the text). Apart from that, the remaining issues are stylistic, and don't really concern us here. I have also included some .pdf files of what a more neatly typeset version of the text would look like, using glyph variants and missing characters.

For the Dene languages, the only major problem is with the baseline-midline-topline final situation. For now, I have used superscript numbers to mark where the final ought to appear. Doesn't look too good, but it's better than nothing. Question: Are there any Unicode characters that one could use to mark final height? Something like variation selectors? For an OpenType font, I need some invisible character to tell the font where the final should go. If nothing is appropriate, then I would suggest that three height selectors be formally submitted for Unicode approval.

* Offer the one font to fit all the pages while awaiting either language-specific fonts or OpenType technology availability.

* Note on the pages that the one font aims to cover all syllabics, but that language-specific variants exist which can't yet be covered in a single font due to technological limitations.

Done and done (or almost done anyway).

* Use any combining dots and so forth from the COMBINING DIACRITIC range. (A font like Code2000 won't display these combiners well due to technology limitations, but, so what? In *your* font, you can place the combining glyphs so that their default position is acceptable and won't overstrike the base glyphs.)

I am going to do a few things.

i) I am going to leave Aboriginal Serif as is, because people already have the font and may have documents typed already. I have many warnings all over the site about the drawbacks of using PUA. I also have a big notice that if at any time an old font from my site becomes obsolete, I will provide software to make documents compliant with a new font. Any new font will be Unicode and OpenType (hopefully any mutually agreed-upon missing characters can be added to the standard by then).

ii) I am making a syllabics-only OpenType font (staying away from a mega-Unicode font here), which will position diacritics, finals, etc. properly (hence the need for final-height-position characters). I will include the glyph variants as "historical" or "alternate" on a case-by-case basis. I have discovered that the new Adobe InDesign CS seems to process the syllabics OpenType features nicely (I think; I just downloaded it). This won't work yet in browsers, but if the website is done according to Unicode (as best I can), then the OpenType font will make it all look good in the future.

iii) I am also going to make language-specific Unicode fonts. One could look at syllabics as actually 4 scripts: Cree-Ojibway-Inuktitut, Dene, Carrier, and Blackfoot. The differences between these 4 could be likened to Roman, Cyrillic, and Greek: i.e. the alphabetical concept is pretty much the same, and several glyph shapes are shared between them. For this reason, it's tricky to get a Unified Range to look really nice.
For a really nit-picky example, Western Cree finals tend to be quite short and small, while Eastern Cree finals have to be taller due to their more complex shape. Yet both Eastern Cree and Western Cree share the h-final U+1426. What happens is that either a tall h-final occurs alongside short western finals, or a short h-final occurs alongside tall eastern finals.

Also, in a few weeks, I would like to present to this forum a list of all the suggestions, comments, criticisms, etc. that have been posted. And we can see where to go from there.

Thanks for everyone's comments so far. I hope we can get more opinions.

Chris
Re: Canadian Unified Syllabics
Hello

I would like to make a few comments about the Aboriginal Serif font.

First, the reason for putting so many characters in the PUA is as follows. For Blackfoot, Dene, Cree (some dialects), and Ojibway (some dialects), some important characters needed to write these languages properly are missing. For example, as far as I can figure, within UCAS one cannot differentiate between top-line, mid-line, and baseline finals. So because these finals had to be lumped in the PUA (along with some other characters), I put glyph variants there also.

Needless to say, one cannot write Dene or Ojibway (i-final) using Code2000. So I don't know what else to say. I want the examples on my site to be legible (non-spacing dot accents rendered in the middle of syllabics instead of above them aren't really acceptable), and I want the characters to look like what speakers are familiar with; otherwise they may very well choose not to use the font, keyboards, etc. My aim is that people can type their own language on the computer they have now.

Once OpenType is available on my machine and others, I will release fonts which have OpenType tables, calling the same glyphs that are now in the PUA. This way, I am trying to make some humble attempt at backward compatibility. But for now, if people cannot use the OpenType substitutions, what else should I do?

I am building specific fonts for specific languages, but I wanted one font that would display the lot. That way, if someone wanted to use languagegeek.com, they would only have to download one font, instead of one per language. Please notice that months ago, I changed the name of the font from "Aboriginal Serif Unicode" to "Aboriginal Serif" in response to comments made earlier on this list; I also note on every page that one would have to download my font to view the pages properly.

Thank-you..
Chris
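The "software to make documents compliant" mentioned above could be as small as a translation table. A hypothetical Python sketch; the PUA assignments and their targets below are invented for illustration, since a real table would come from the font's own PUA layout:

    # Hypothetical PUA-to-standard migration. The specific code points are
    # made up; a real table must mirror the font's actual PUA assignments.
    PUA_TO_STANDARD = {
        0xE000: "\u1420",        # e.g. a PUA final -> CS FINAL GRAVE
        0xE001: "\u1426\u0307",  # e.g. a PUA glyph -> base final + combining dot
    }

    def migrate(text: str) -> str:
        """Replace private-use code points via the table; others pass through."""
        return text.translate(PUA_TO_STANDARD)

    print(migrate("\ue000 and \ue001"))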
Unified Canadian Syllabics
Hello

I think I posted this to the list last week, but I haven't seen it come up.

I would like to present to the Unicode community some suggestions for missing and mis-named characters related to the UCAS range. To properly describe the kinds of characters missing etc., many graphics are required. For this reason, I would invite people to see the document at:

    http://www.languagegeek.com/issues/ucas_unicode.html

A "words only" description follows here:

** **

I would like to suggest to the Unicode community the following observations relating to the Unified Canadian Aboriginal Syllabics range. My goal (see www.languagegeek.com) is to enable all of the North American languages to be properly and accurately written on the Internet, and on computers in general. Here I will focus specifically on the languages which are currently using or historically used (and may still be in some communities) syllabics.

Some conventions used below. All Unicode character names are in majuscule, and Canadian Syllabics has been abbreviated to CS. Hexadecimal Unicode indices are in parentheses and prefixed with U+. All sources cited are linked to the languagegeek.com bibliography. A final is the syllabics term for a character which represents a consonant only, not a consonant + vowel; so CS FINAL GRAVE (U+1420), CS CARRIER H (U+144B) and CS NASKAPI SKW (U+150A) would all be examples of finals. I use the term syllabic to refer to a consonant + vowel character. A series is a row of characters on a syllabics chart; so in Misnamed Characters Note 1, tta, tte, tti, tto would be the tt-series.

Misnamed Characters

The asterisk ᕯ character (U+156F) appears on the code-page chart as **, and is named CS TTH. This is a misreading of the syllabarium chart used by the French missionaries for Chipewyan, probably from the 1904 publication Prières Catéchisme et Cantiques en langue Montagnaise ou Chipeweyan. The chart in this book has been reprinted in most if not all "scripts of the world" type books. Unlike most other syllabics charts, this one does not have a column of finals to the right of the consonant-vowel syllabics. Instead, it simply has a list of all the finals, which do not correspond with the syllabics series on the same row. Thus, the CS WEST-CREE P (U+144A) (looks like a prime ') final which appears to the right of the tta row is not the sound tt, but is instead h. The blue circled asterisk is not tth, but is in fact a symbol which indicates a proper name, in this case /*adą/ (Adam). A second glitch on the Unicode code-page chart is that this character is written with two asterisks **, when in fact on the chart above, the first asterisk is the character itself, and the second is part of the example. I believe this should definitely be fixed.

In the syllabics chart mentioned above, the final row in the chart is labelled tca, tce, ... (U+1570-73), which corresponds to the modern Roman orthography sound /t/ (an aspirated stop). Interpreting tca as tya is a misunderstanding of the French description of what the c represents. The Chipewyan Syllabarium page has more info on this. Whether this syllabics series is renamed is probably not a high priority.

In Naskapi, each a-type syllabic character can either be preceded by a colon-like character, or have an umlaut-like diacritic. Unicode has labelled these as having a long vowel: e.g. (U+1482) CS NASKAPI KWAA. In fact, the colon or umlaut does not mark vowel length (Naskapi orthography ignores length). Instead, the colon or umlaut simply indicates wa.
So (U+1482) would be better named CS NASKAPI KWA. This is also probably not a high priority.

Missing Characters

Naskapi

According to the Naskapi Lexicon, there is no symbol NASKAPI WOO (U+1416), but there is a wi. This character looks similar to U+140E CS WI, but is different: the dot is higher up on the left side. wi may need to be added. woo may be on a different Naskapi chart I have not seen.

Blackfoot

In Blackfoot, a raised equals sign is used much as the CS FINAL MIDDLE DOT (U+1427) is in Cree: to indicate a /w/ between the consonant and vowel of the syllabic. A raised = with CS BLACKFOOT KA (U+15BD) before it gives the sound /kwa/. This character is vital to writing Blackfoot, and should be added.

Carrier Dene

A few finals are missing from Unicode which are used in Carrier. Information for Carrier is from Poser 2000. There is an important graphical distinction between the finals used for /s/ and /s(+macron-below)/ (in the Roman orthography version). The former is a small serif s written mid-line, while the latter is a small sans-serif s written mid-line. Unicode lists only one version: (U+1506) CS ATHAPASCAN S. A second character, an upside-down mid-line small h, is used for loan words with /f/ or /v/ sounds. These two finals should be added.

In examples of Ca
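For anyone cross-checking the character names cited in this thread against their own system's Unicode data, a quick Python loop; names are printed from the runtime's data rather than hard-coded here:

    import unicodedata

    # Current character names for code points discussed above; output
    # depends on the Unicode version bundled with your Python.
    for cp in (0x156F, 0x1570, 0x1482, 0x1416, 0x140E, 0x1506):
        print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<unassigned>"))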
FYI: CLAW 2003
FYI: Controlled Language Applications Workshop http://www.eamt.org/eamt-claw03/ Bev -- www.enso-company.com * [EMAIL PROTECTED] --- Bev Corwin, President Enso Company Ltd. The Westin Building 2001 Sixth Avenue Penthouse Suite 3403 Seattle WA 98121 USA Telephone: 206.728.2232 Facsimile: 206.728.2262
Devanagari on MacOS 9.2 and IE 5.1
I spoke too fast. Upon taking a closer look at the file, the font was not set properly. MacOS 9.2, Indian Language Kit, Mac IE 5.1 and Devanagari MT as font face seem to display UTF-8 encoded Hindi just fine.

Etienne

> Date: Mon, 21 Jan 2002 10:24:16 -0800
> From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: RE: Devanagari
>
> On this subject, Win2K and IE5+ seem to do a nice job displaying UTF8-encoded Hindi.
> On the Mac, the Indian Language Kit provides for OS support and fonts (with MacOS 9.2
> and above), but I have not been able to display Hindi (UTF8 encoded) with Mac's IE
> 5.1. Am I correct in assuming that the Mac version of IE does not support Hindi
> without a hack?
>
> Etienne
>
>> Reply-To: <[EMAIL PROTECTED]>
>> From: "Christopher J Fynn" <[EMAIL PROTECTED]>
>> To: <[EMAIL PROTECTED]>
>> Cc: "Aman Chawla" <[EMAIL PROTECTED]>
>> Subject: RE: Devanagari
>> Date: Mon, 21 Jan 2002 23:59:38 +0600
>>
>> Aman
>>
>> Here in Bhutan the Internet connection is still much worse than in most
>> places I've visited in India & Nepal (and the cost per minute is several
>> times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not
>> display noticeably slower than ASCII, ISCII or 8-bit font encoded pages -
>> and I don't need to download any special plug-ins or fonts.
>>
>> - Chris
>>
>> --
>> Christopher J Fynn
>> Thimphu, Bhutan
>>
>> <[EMAIL PROTECTED]>
>> <[EMAIL PROTECTED]>
>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
>>> Behalf Of Aman Chawla
>>> Sent: 21 January 2002 10:57
>>> To: James Kass; Unicode
>>> Subject: Re: Devanagari
>>>
>>> ----- Original Message -----
>>> From: "James Kass" <[EMAIL PROTECTED]>
>>> To: "Aman Chawla" <[EMAIL PROTECTED]>; "Unicode" <[EMAIL PROTECTED]>
>>> Sent: Monday, January 21, 2002 12:46 AM
>>> Subject: Re: Devanagari
>>>
>>> > 25% may not be 300%, but it isn't insignificant. As you note, if the
>>> > mark-up were removed from both of those files, the percentage of
>>> > increase would be slightly higher. But, as connection speeds continue
>>> > to improve, these differences are becoming almost minuscule.
>>>
>>> With regards to South Asia, where the most widely used modems are approx. 14
>>> kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
>>> unheard of, efficiency in data transmission is of paramount importance...
>>> how can we convince the south asian user to create websites in an encoding
>>> that would make his client's 14 kbps modem as effective (rather,
>>> ineffective) as a 4.6 kbps modem?
RE: Devanagari
On this subject, Win2K and IE5+ seem to do a nice job displaying UTF8-encoded Hindi. On the Mac, the Indian Language Kit provides for OS support and fonts (with MacOS 9.2 and above), but I have not been able to display Hindi (UTF8 encoded) with Mac's IE 5.1. Am I correct in assuming that the Mac version of IE does not support Hindi without a hack?

Etienne

> Reply-To: <[EMAIL PROTECTED]>
> From: "Christopher J Fynn" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Cc: "Aman Chawla" <[EMAIL PROTECTED]>
> Subject: RE: Devanagari
> Date: Mon, 21 Jan 2002 23:59:38 +0600
>
> Aman
>
> Here in Bhutan the Internet connection is still much worse than in most
> places I've visited in India & Nepal (and the cost per minute is several
> times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not
> display noticeably slower than ASCII, ISCII or 8-bit font encoded pages -
> and I don't need to download any special plug-ins or fonts.
>
> - Chris
>
> --
> Christopher J Fynn
> Thimphu, Bhutan
>
> <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]>
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
>> Behalf Of Aman Chawla
>> Sent: 21 January 2002 10:57
>> To: James Kass; Unicode
>> Subject: Re: Devanagari
>>
>> ----- Original Message -----
>> From: "James Kass" <[EMAIL PROTECTED]>
>> To: "Aman Chawla" <[EMAIL PROTECTED]>; "Unicode" <[EMAIL PROTECTED]>
>> Sent: Monday, January 21, 2002 12:46 AM
>> Subject: Re: Devanagari
>>
>> > 25% may not be 300%, but it isn't insignificant. As you note, if the
>> > mark-up were removed from both of those files, the percentage of
>> > increase would be slightly higher. But, as connection speeds continue
>> > to improve, these differences are becoming almost minuscule.
>>
>> With regards to South Asia, where the most widely used modems are approx. 14
>> kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
>> unheard of, efficiency in data transmission is of paramount importance...
>> how can we convince the south asian user to create websites in an encoding
>> that would make his client's 14 kbps modem as effective (rather,
>> ineffective) as a 4.6 kbps modem?
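The file-size figures being debated in the quoted thread are easy to reproduce; a quick Python check (mine, not from the thread). Pure Devanagari text costs 3 bytes per code point in UTF-8 versus 1 byte per character in an 8-bit or ISCII encoding, while ASCII markup costs the same in both, which is why marked-up pages grow far less than 3x:

    # UTF-8 size of Devanagari text, with and without ASCII markup.
    hindi = "\u0928\u092E\u0938\u094D\u0924\u0947"   # "namaste"
    print(len(hindi), "code points ->", len(hindi.encode("utf-8")), "UTF-8 bytes")

    page = "<html><body><p>" + hindi + "</p></body></html>"
    print(len(page), "code points ->", len(page.encode("utf-8")), "UTF-8 bytes")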
ISCII-Unicode Conversion
Hi,

Would anybody be able to point me to possible ISCII-Unicode conversion utilities/APIs? How reliable is the conversion? How well is Hindi supported by the UTF8-Internet Explorer combination?

Your expertise is GREATLY appreciated.

Best,
Etienne
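For what it's worth, the core of an ISCII-to-Unicode converter is a byte-to-code-point table, since the Unicode Devanagari block was modeled on ISCII. A minimal Python sketch follows, with a few entries I believe are correct for ISCII-91 Devanagari (verify against the standard before relying on them); a real converter also needs ATR/EXT handling and nukta composition, and ICU's converter tables are, as far as I know, a more complete option:

    # Partial ISCII-91 (Devanagari) to Unicode table: a few illustrative
    # entries only; a full converter needs the complete table plus the
    # ATR/EXT attribute codes and nukta composition rules.
    ISCII_TO_UNICODE = {
        0xA1: "\u0901",  # candrabindu
        0xA2: "\u0902",  # anusvara
        0xA3: "\u0903",  # visarga
        0xA4: "\u0905",  # letter A
        0xA5: "\u0906",  # letter AA
    }

    def iscii_to_unicode(data: bytes) -> str:
        # Bytes below 0x80 are plain ASCII; others go through the table,
        # with U+FFFD for anything this partial table doesn't cover.
        return "".join(chr(b) if b < 0x80 else ISCII_TO_UNICODE.get(b, "\ufffd")
                       for b in data)

    print(iscii_to_unicode(b"\xa4\xa5"))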
RE: Terms "constructed script", "invented script" (was: FW: Re: Shavian)
> Odd. I've always considered Japanese "double consonants" to be
> glottal stops. Could anyone please explain the difference?

They are glottal stops. But Japanese writing doesn't have a (standard) means of expressing a glottally stopped vowel pair. It only can express consonants. One supposes that a small "tsu" would suffice, e.g. ハヴァイッイ => hawai'i... And probably has already been used somewhere to that effect. As Ed Cherlin pointed out, "tsu" has been adapted for word-final consonants... in that sense, "tsu" is effectively used as a virama already.

I still don't know if there's any Japanese phonetic scholarship that distinguishes "L" and "R"...

    Rick
Re: Terms "constructed script", "invented script" (was: FW: Re: Shavian)
> Hiragana (and katakana) assume certain things about the syllabic structure,
> specifically that syllables are of the form [C] V [C], where the trailing
> consonant (if any) must be "n".

Yes, but, kana _has_ been used even natively in comics and so forth, to end words with other consonants (i.e., eliding the last vowel), for example: インスタントッ・スープッ

The biggest problem with using kana for a wide variety of languages, aside from having a severely limited number of consonants & vowels even with extension, is that it doesn't express adjacent non-identical consonants at all.

Kana should be quite adequate for some other languages... Hawaiian? Oh, hmmm, well, except for that darned L/R distinction which kana doesn't have... Uh... Never mind...

    Rick
Re: Shavian
David Starner - [EMAIL PROTECTED] wrote...

> A lot of the arguments against Klingon weren't specifically against
> Klingon;

That was in WG2, I guess... The most recent discussion material that UTC saw is a document I wrote, which is solely about Klingon and reasons for rejecting it.

Fictional or invented scripts aren't in and of themselves bad candidates for encoding; they should just be, in general, of low priority because, pretty much without exception, they are "toys". Shavian and Deseret are examples of scripts that needn't have been encoded now, and aren't very widely used, and aren't _NEEDED_ by anyone at all, but were encoded because a while back someone just happened to have done the work, and the proposals had just been sitting around gathering dust. Might as well get them in, because nothing more needed to be done to the proposals.

What's "bad" is that work seems to get done on fictional scripts while there are still millions of real people (some of whom even have access to computers) who can't express texts of their natively-used languages with Unicode because we don't have their scripts encoded. There are various reasons for that, the most common being that we can't get enough information about them. The most common reason for not having enough information is that we can't shlep enough experts to us, nor shlep enough of us to the experts, to complete any encoding proposals... a matter of time and funds.

    Rick
Re: GBK, HZ and EUC-TW
> Lars Garshol wrote:
>
> * Tom Emerson
> |
> | As far as mapping tables go, the best one you'll find is the
> | Microsoft or ICU mapping tables. I personally have not seen an
> | official mapping table from GB 13000. As others have noted,
> | Microsoft has extended the "pure" GBK with Euro, and perhaps other
> | code points.
>
> Hmmm. Does this mean that it is best to support the Microsoft
> extensions, or that it is best not to do so? I guess we will be
> forced to support them sooner or later, and that we might as well do
> it now to save everyone some bother.

As others have already indirectly noted, the problem then is that the Euro is "double-defined" within GBK, at code points GB 0x80 and GB 0xA2E3. Consequently, round-trip conversions between GBK and the Unicode U+20AC Euro are not possible without some form of code value transformation on the return trip for one of these two GBK values. One alternative is to distinguish between the two forms of GBK, supporting two forms of conversions: one to cp936 and the other to "pure" GBK.

---

Out of curiosity, what does GB 18030 define for the Euro? Does it define both a single-width and a double-width form? If so, does it include any reference to how interoperability should be handled in conversions with Unicode (or, for that matter, any character set which defines a single code value for this character)?

(Lastly, throwing a lighted match onto gasoline...) If two forms are specified in GB 18030, should Unicode consider adding another code point in the fullwidth variant region to accommodate this?

- Sue
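A modern footnote illustrating the round-trip problem: Python ships both a "gbk" codec (effectively cp936, with the Euro at single-byte 0x80) and a "gb18030" codec (Euro at double-byte 0xA2E3), so the asymmetry is directly observable:

    # The Euro lives at 0x80 in cp936/GBK and at 0xA2E3 in GB 18030, so a
    # byte-level round trip through U+20AC cannot preserve both encodings'
    # byte values at once.
    euro = "\u20ac"
    print(euro.encode("gbk").hex())       # 80
    print(euro.encode("gb18030").hex())   # a2e3
    print(bytes.fromhex("80").decode("gbk"))        # the Euro sign
    print(bytes.fromhex("a2e3").decode("gb18030"))  # the Euro sign
    # Decoding 0xA2E3 as plain GBK/cp936 does not give the Euro back:
    print(bytes.fromhex("a2e3").decode("gbk", errors="replace"))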