Re: SC UniPad 0.99 released.
Jungshik Shin wrote:

> On several occasions, I heard about it on this mailing list and
> finally my curiosity drove me to try it. Unfortunately, I was mightily
> disappointed. At first, I was intrigued by their claim that it
> supports Hangul Jamos. I've seen some false claims that Hangul
> Jamos are supported and wanted to see if it really supports them. Well,
> it does not do any better than most other fonts/software that made
> that claim. It just treats them as 'spacing characters' instead of
> combining characters. Basically, it's useless except for making
> Unicode code charts (so is Arial Unicode MS).

This is one of those cases where the verb "support" is so flexible that it loses meaning. UniPad does include glyphs for individual jamos as well as precomposed Hangul syllables, which is more than most non-Korean-specific TrueType fonts can offer. But it does not provide any mechanism for combining jamos into syllables, which of course is required for proper handling of Korean. Again, I don't know of any other mainstream Windows tools or fonts that can do this either (although I'm sure there are Korean-specific tools that can).

> Then, I found its claim that it supports 300 languages (scripts). Wow!
> Does it properly support various South and Southeast Asian scripts?
> Again, it does not. It treats combining characters as spacing
> characters. I don't think users of those scripts would regard SC
> UniPad as supporting their scripts/languages.

UniPad never claims to support 300 scripts. I'm not even sure there are 300 scripts. Probably half of the 300 "supported languages" are written with the Latin script. But again, Jungshik has a good point that true "support" for Devanagari, Khmer, etc. really does imply shaping and combining behavior, similar to what UniPad already provides for Arabic.

> You may want to check out Yudit (http://www.yudit.org).
> Although its author is not so fond of MS Windows,

That's putting it mildly -- he refers to Win32 as a "joke-api" [sic] and brags several times that he "will never touch Windows again."

> it works in MS Windows as well as in Unix/X11.

I haven't downloaded it yet, so I haven't seen whether this is true. I have my doubts, however, based on release notes like the following: "CreateProcess works in an unexpected way so the viewer won't find the file. As a workaround execute yudit from the desktop shortcut." No real Windows application gives a hoot whether you run it from a desktop shortcut, the Start menu, a taskbar button, the Start | Run dialog box, or a command-prompt window.

> It supports South and Southeast Asian scripts, Arabic,
> Hebrew with BIDI, Hangul Jamos (at the same level as Korean MS Office
> XP in terms of the number of syllables made out of Jamos) and many
> other (easier-to-deal-with) writing systems with various input
> methods/keyboards (including Unicode codepoint in hex input). It can
> also represent unrenderable characters with hex code in a box. If it
> lacks support for your script/language and you can code, you may be
> able to add it yourself, either for yourself or with the author's help,
> as I did for Hangul Jamos.

"If you can code" is a big stumbling block for anyone who is not a programmer. But certainly Yudit, like other similar open-source projects, appears to be highly extensible.

-Doug Ewell
Fullerton, California
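For the modern-jamo subset, the combining step UniPad omits is plain arithmetic, defined by the Unicode Standard's Hangul syllable composition; a minimal sketch:

```python
# Composing a leading consonant (L), vowel (V), and optional trailing
# consonant (T) jamo into a precomposed Hangul syllable, using the
# arithmetic from the Unicode Standard's conjoining-jamo behavior.
# This is the step a renderer must perform instead of treating jamos
# as independent spacing characters.

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_jamo(l, v, t=None):
    """Map an L, V(, T) jamo triple to its precomposed syllable."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = (ord(t) - T_BASE) if t else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

print(compose_jamo('\u1100', '\u1161'))            # U+AC00
print(compose_jamo('\u1112', '\u1161', '\u11ab'))  # U+D55C
```

This only covers L+V(+T) sequences of modern jamo; historic jamo clusters (the kind Jungshik added to Yudit) have no precomposed forms and genuinely require font-level stacking.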
Re: The Unicode Technical Committee meeting in Redmond, Washington State, USA.
Kenneth Whistler wrote: >> Is there an official press spokesperson for the meeting please? > > Well, I guess I just nominated myself. ;-) A fine choice. The ability to answer a reporter's questions BEFORE they are asked is a rare gift in the field of press relations, and the mark of a true professional. -Doug Ewell Fullerton, California > > --Ken Whistler > > 16 August 2002 > >> >> William Overington >> >> 21 August 2002
Re: Revised proposal for "Missing character" glyph
At 09:49 PM 8/26/2002 -0400, John Cowan wrote: >Nowadays, experts can detect mismatched character sets from the >nature of the byte barf that appears on their screen. And super-experts can read languages in "byte barf" as it is not random! Barry Caplan http://www.i18n.com
RE: Revised proposal for "Missing character" glyph
Ken,

The little square boxes do not help much if you want to know exactly what the missing characters are. I do, however, feel that any solution to the problem should be Unicode based. If it is left to the vendors, they may display the code page characters and you are guessing again. The tool idea is great, but I do not see how it could be embedded in the OS without changing the application. It will also require user training.

I think that as we move away from code page text, we will find that the next big problem will be characters that are missing from the font or set of fonts. The trick will be to change the set of fonts. This might require trial and error if we do not have good diagnostic tools. Implementing this change will probably be easier than using the special symbols for the script, which will also require special handling and may not catch all errors. This approach will also allow critical text that cannot be redisplayed to be deciphered. This has been a pet peeve of mine, having used the Fujitsu Shift JIS solution and seen it work in a real live situation.

Carl

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Kenneth Whistler
> Sent: Monday, August 26, 2002 2:01 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Revised proposal for "Missing character" glyph
>
> [Resend of a response which got eaten by the Unicode email
> during the system maintenance last week. Carl already responded
> to me on this, but others may not have seen what he was
> responding to. --Ken]
>
> > Proposed unknown and missing character representation. This would be an
> > alternate to the method currently described in 5.3.
> >
> > The missing or unknown character would be represented as a series of
> > vertical hex digit pairs for each byte of the character.
>
> The problem I have with this is that it seems to be an overengineered
> approach that conflates two issues:
>
> a.
What does a font do when requested to display a character > (or sequence) for which it has no glyph. > > b. What does a user do to diagnose text content that may be > causing a rendering failure. > > For the first problem, we already have a widespread approach that > seems adequate. And other correspondents on this topic have pointed > out that the particular approach of displaying up hex numbers for > characters may pose technical difficulties for at least some font > technologies. > > [snip] > > > > > This representation would be recognized by untrained people as > unrenderable > > data or garbage. So it would serve the same function as a missing glyph > > character except that it would be different from normal glyphs > so that they > > would know that something was wrong and the text did not just > happen to have > > funny characters. > > I don't see any particular problem in training people to recognize when > they are seeing their fonts' notdef glyphs. The whole concept of "seeing > little boxes where the characters should be" is not hard to explain to > people -- even to people who otherwise have difficulty with a lot of > computer abstractions. > > Things will be better-behaved when applications finally get past the > related but worse problem of screwing up the character encodings -- > which results in the more typical misdisplay: lots of recognizable > glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that > must be another piece of Korean spam mail in my mail tray.) > > > > > It would aid people in finding the problem and for people with > Unicode books > > the text would be decipherable. If the information was truly > critical they > > could have the text deciphered. > > Rather than trying to engineer a questionable solution into the fonts, > I'd like to step back and ask what would better serve the user > in such circumstances. 
> And an approach which strikes me as a much more useful and extensible
> way to deal with this would be the concept of a "What's This?"
> text accessory. Essentially a small tool that a user could select
> a piece of text with (think of it like a little magnifying glass,
> if you will), which will then pop up the contents selected, deconstructed
> into its character sequence explicitly. Limited versions of such things
> exist already -- such as the tooltip-like popup windows for Asmus'
> Unibook program, which give attribute information for characters
> in the code chart. But I'm thinking of something a little more generic,
> associated with textedit/richedit type text editing areas (or associated
> with general word processing programs).
>
> The reason why such an approach is more extensible is that it is not
> merely focussed on the nondisplayable character glyph issue, but rather
> represents a general ability to "query" text, whether normally
> displayable or not. I could query a black box notdef glyph to find
> out what in the text caused its display; but I could just as well
> query a properly displayed Telugu glyph, for example, to find out
> what it was, as well.
Re: SC UniPad 0.99 released.
On Mon, 26 Aug 2002, William Overington wrote:

> This latest version is SC UniPad 0.99 and is available for free download
> from the following address on the web.
>
> http://www.unipad.org

On several occasions, I heard about it on this mailing list and finally my curiosity drove me to try it. Unfortunately, I was mightily disappointed. At first, I was intrigued by their claim that it supports Hangul Jamos. I've seen some false claims that Hangul Jamos are supported and wanted to see if it really supports them. Well, it does not do any better than most other fonts/software that made that claim. It just treats them as 'spacing characters' instead of combining characters. Basically, it's useless except for making Unicode code charts (so is Arial Unicode MS).

Then, I found its claim that it supports 300 languages (scripts). Wow! Does it properly support various South and Southeast Asian scripts? Again, it does not. It treats combining characters as spacing characters. I don't think users of those scripts would regard SC UniPad as supporting their scripts/languages. Its FAQ 4.2 has the following:

SC> We have to differentiate between the simple inclusion of
SC> the glyphs into the UniPad font and the implementation of special
SC> text processing algorithms. It's definitely our goal to finally support
SC> all CJK (Chinese, Japanese, Korean) characters and all Indic scripts
SC> (Devanagari, Gurmukhi, etc.).

Judging from the above, I think they are well aware that simply including the nominal glyphs for scripts taken from the Unicode code charts in the UniPad font is different from supporting those scripts. In addition, its list of general features makes it clear that it does not support 'combined rendering of non-spacing marks'. I can't help wondering, then, why they list Hindi, Thai, Tibetan, Lao, Bengali and many other South and Southeast Asian languages in the list of supported languages.
> A particularly interesting new feature is that one may hold down the Control
> key and press the Q key and a small dialogue box appears within which one
> may enter the hexadecimal code for any Unicode character. Upon pressing the
> Enter key, that character is entered into the document.

> I first learned of the existence of the UniPad program in a response to a
> question which I asked in this forum, so I am posting this note so that any

You may want to check out Yudit (http://www.yudit.org). Although its author is not so fond of MS Windows, it works in MS Windows as well as in Unix/X11. It supports South and Southeast Asian scripts, Arabic, Hebrew with BIDI, Hangul Jamos (at the same level as Korean MS Office XP in terms of the number of syllables made out of Jamos) and many other (easier-to-deal-with) writing systems with various input methods/keyboards (including Unicode codepoint in hex input). It can also represent unrenderable characters with hex code in a box. If it lacks support for your script/language and you can code, you may be able to add it yourself, either for yourself or with the author's help, as I did for Hangul Jamos.

Jungshik
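The spacing-versus-combining distinction Jungshik draws can be checked directly against the Unicode Character Database; a small sketch (note that conjoining Hangul jamo are category Lo, with their combining behavior defined separately by the standard's conjoining-jamo rules rather than by a mark category):

```python
import unicodedata

# General category tells a renderer how a character should advance the
# pen: Mn (nonspacing mark) and Me (enclosing mark) take no advance
# width of their own. A font/editor that gives such characters their
# own cell is "treating combining characters as spacing characters".
for cp in (0x0E48, 0x093F, 0x1161):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```

Running this shows U+0E48 (Thai tone mark) as Mn, U+093F (Devanagari vowel sign i) as Mc (a *spacing* combining mark, which still reorders visually), and U+1161 (Hangul jungseong a) as Lo.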
Re: Revised proposal for "Missing character" glyph
Kenneth Whistler scripsit: > Things will be better-behaved when applications finally get past the > related but worse problem of screwing up the character encodings -- > which results in the more typical misdisplay: lots of recognizable > glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that > must be another piece of Korean spam mail in my mail tray.) In the old days, experts could detect mismatched serial-line connections based on the nature of the baud barf that the remote system emitted. Nowadays, experts can detect mismatched character sets from the nature of the byte barf that appears on their screen. -- John Cowan [EMAIL PROTECTED] "You need a change: try Canada" "You need a change: try China" --fortune cookies opened by a couple that I know
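The "byte barf" is readable precisely because it is deterministic, not random; a minimal demonstration of the mismatch (Korean UTF-8 bytes misread as Latin-1):

```python
# The same bytes, decoded under the wrong charset, always barf the
# same way -- which is why an expert (or a program) can recognize and
# even reverse the mismatch.
text = "한글"
barf = text.encode("utf-8").decode("latin-1")
print(barf)  # six accented-Latin/control characters, one per UTF-8 byte

# Because Latin-1 maps every byte to a character, the misreading is
# lossless and can be undone:
assert barf.encode("latin-1").decode("utf-8") == text
```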
Re: Romanized Cyrillic bibliographic data--viable fonts?
J. M. Craig wrote,

> ... If anyone has access to
> the Arial Unicode MS font and can check to see if U+FE20 and U+FE21
> combine properly, I'd be grateful--I don't want to spend the money to
> get it if it won't solve the display problem!

Unless a font is fixed width, Latin combining marks currently can't be expected to combine consistently without "smart font technology" support enabled on the system. So, don't blame the Arial Unicode MS font if these glyphs don't always merge well. While awaiting Latin OpenType support, it might be a good idea to take a look at a well-populated fixed-width pan-Unicode font like Everson Mono.

Best regards,

James Kass.
Re: Romanized Cyrillic bibliographic data--viable fonts?
James Kass wrote, > ...would become: > > Unicode 0078 0360 0077 > > U+0360 is the double wide combining tilde. U+0361 is the double wide combining inverted breve. Oops. Best regards, James Kass.
Re: Romanized Cyrillic bibliographic data--viable fonts?
Thanks for the suggestion of U+0361 (I don't think U+0360 is going to do what I want terribly well). I'm assuming that U+0361 IS in your font (I hadn't checked yet). One of the problems with that approach is that I don't have enough control over the conversion algorithm to make that work--or maybe I could make the right ligature half a non-translated character--hmm. I'll have to think about that. At any rate, what I'm working with is an algorithm that is much happier with round-trippable conversions (which the double breve wouldn't give me). So, no, I don't think that'll work. Shoot.

I appreciate your pointing out the copyright issues--I try to take copyrights appropriately seriously. I am in contact with the developer of the font in question (from Agfa/Monotype) and I'm REALLY hoping they'll agree to add the characters in question. If anyone has access to the Arial Unicode MS font and can check to see if U+FE20 and U+FE21 combine properly, I'd be grateful--I don't want to spend the money to get it if it won't solve the display problem!

James Kass wrote:

>J. M. Craig wrote,
>
>>... The ultimate problem is, I can't find an available font
>>that properly supports the combining half marks FE20 and FE21.
>
>Why not use U+0360 and U+0361 instead?
>
>>/ts/
>>Unicode 0078 FE20 0077 FE21
>
>...would become:
>
>Unicode 0078 0360 0077
>
>... or, three characters vs. four characters to write the same thing.
>
>James Kass,
>who is now adding U+FE20 .. U+FE23 to the font here.

Great!

John
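The round-trip property John needs falls out of the half-mark encoding naturally, since each Cyrillic letter maps to a self-delimiting, bracketed Latin cluster; an illustrative sketch (the two-entry mapping table is invented for illustration, not the full LC romanization):

```python
# With FE20/FE21, every romanized cluster is explicitly bracketed, so
# the inverse mapping is unambiguous -- unlike single letters, which
# could be confused with untied digraphs.
TO_LATIN = {
    "\u0446": "t\ufe20s\ufe21",  # ц -> t+s under ligature half marks
    "\u044e": "i\ufe20u\ufe21",  # ю -> i+u under ligature half marks
}
FROM_LATIN = {v: k for k, v in TO_LATIN.items()}

def romanize(text):
    return "".join(TO_LATIN.get(c, c) for c in text)

assert FROM_LATIN[romanize("\u0446")] == "\u0446"  # lossless both ways
```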
Re: Romanized Cyrillic bibliographic data--viable fonts?
J. M. Craig wrote,

> ... The ultimate problem is, I can't find an available font
> that properly supports the combining half marks FE20 and FE21.

Why not use U+0360 and U+0361 instead?

> /ts/
> Unicode 0078 FE20 0077 FE21

...would become:

Unicode 0078 0360 0077

... or, three characters vs. four characters to write the same thing.

> Any suggestions welcomed! Is there a tool out there that will allow you
> to edit a font to add a couple of missing characters?

William Overington has mentioned the Softy editor. Please keep in mind that fonts are copyrighted material, and users are mostly forbidden to modify them, even for internal use purposes. The best way to get characters added to a font is to ask the font's developer.

Best regards,

James Kass,
who is now adding U+FE20 .. U+FE23 to the font here.
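The three-versus-four count above is easy to verify:

```python
# The same ligated pair written two ways: half marks (four code
# points) vs. a single double-width combining mark (three).
half_marks = "\u0078\ufe20\u0077\ufe21"  # x + U+FE20 + w + U+FE21
double_mark = "\u0078\u0361\u0077"       # x + U+0361 + w
print(len(half_marks), len(double_mark))
```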
Re: The Unicode Technical Committee meeting in Redmond, Washington State, USA.
William Overington inquired: > As many readers may know, the Unicode Technical Committee was due to start a > four day meeting yesterday at the Redmond, Washington State, USA campus of > Microsoft, that is, on 20 August 2002. > > Here in England I am interested to know of what is happening and to learn of > news from the meeting. As Sarasvati has indicated, minutes will be publicly posted in a few weeks. See: http://www.unicode.org/unicode/consortium/utc-minutes.html [BTW, the minutes from the February and April/May meetings have actually been approved, although their status has not been updated to "Approved" yet on the website page.] > It is the early hours of the morning in Washington State at present. It is > hoped that when delegates get up for breakfast that they might look in their > emails and make early morning responses, or perhaps arrange for an official > briefing to be posted later in the day. > > If I were conducting a live interview with the committee chairman or with an > official spokesperson I would ask the following questions. Unfortunately, the UTC has not yet arranged its television contract with ESPN, since character encoding has not generally been considered a mass-appeal spectator sport. However, since I did attend the UTC meeting last week, I may be able to provide up-to-date commentary regarding some of the questions which are not better answered by waiting for the official minutes. > * What was discussed yesterday (Tuesday) please, and what formal decisions, > if any, were taken please? Wait for the minutes. > > * How many people attended please? 16 on Tuesday. 18 on Wednesday. Back down to 15(?) on Thursday and Friday. > > * Is it only companies which are full members of the Unicode Consortium who > send delegates to the meeting, or are there also representatives of > organizations who do not vote in decisions present as well? The latter. 
> * Will there be a press statement at the close of the meeting please, and if > so, will it also be posted in the Unicode mailing list please? No, there will not be a press statement. Encoding of a VERTICAL LINE EXTENSION character was not considered of such earth-shattering consequence that it would lead to headlines in the technology press. > * Has there been, or is there on the agenda, any discussion of the wording > in the Unicode specification about the use of the Private Use Area and, if > so, are any changes to that wording being implemented? Not discussed by the UTC last week. This is in the purview of the editorial committee. > > * Has there been, or is there on the agenda, any discussion concerning the > status of the code points U+FFF9 through to U+FFFC please? There has been > some discussion recently in the Unicode mailing list about these code > points, as regards issues of U+FFF9 through to U+FFFB as an issue, the issue > of using U+FFFC as a single issue, and the issue of using U+FFF9 through to > U+FFFC all together. Is the committee discussing these issues at all and, > if so, are they discussing the matter of whether U+FFFC can be used in > sending documents from a sender to a receiver please? Is there any > discussion of a possible rewording, or changing of meaning, of the wording > about the U+FFF9 through to U+FFFC code points in the Unicode specification > please? Not discussed by the UTC last week. This is in the purview of the editorial committee. > > * Are any matters concerning how the Unicode specification interacts with > the way that fonts are implemented being discussed please? Yes. In a general way, this ends up being discussed at every meeting. 
> If so, is due > care being taken that as font format is not, at present, an international > standards matter that therefore the committee must take great care to ensure > that Unicode does not become dependent upon a usage, express or implied, of > the intellectual property rights or format of any particular font format > specification? The UTC always attempts to exercise "due care" in what it considers, but it is unclear just what clarification you are asking for here. The UTC does not standardize font formats. > * Is there any discussion of the possibility of adding further noncharacters > please, considering either or both adding some more noncharacters in plane 0 > and a large block of noncharacters in one of the planes 1 through to 14? No. > * Is the committee discussing the issue of interpretation, namely as to how, > if various people read the published specification so as to have different > meanings, how people may receive a ruling as to the formally correct meaning > of the wording of the specification. This recently arose in relation to the > U+FFFC character and has previously arisen in relation to what is correct > usage of the Private Use Area, so there are at least two areas where the > issue of interpretation has arisen. No. The UTC is a standardization committee, not a court of law. If a problem of interpreta
RE: Revised proposal for "Missing character" glyph
William,

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of William Overington
> Sent: Friday, August 23, 2002 12:55 AM
> To: James Kass; Carl W. Brown; Unicode List
> Cc: [EMAIL PROTECTED]
> Subject: Re: Revised proposal for "Missing character" glyph
>
> James Kass wrote as follows.
>
> quote
>
> For non-BMP, how about a double tall glyph at the left as the
> plane signifier?

A double-high number or letter will look like a standard letter that is just narrower, unless you are displaying text in a narrow font, in which case it will look like a separate character... This will be very confusing. Besides, I don't like mixing bases any more than I like using octal to represent 8-bit bytes. It would be confusing to use base 4, base 8, base 8, base 4, base 8, base 8, etc. How will you display the rest of the data? Will you use 65536 glyphs? That is a monster font. Better would be to use the top 4 bits of the low order 2 bytes, then the bottom 4 bits of the same bytes. In any case, you are going to a lot of trouble to avoid vertical hex, which is the simple solution. Remember: "keep it simple, stupid".

Carl
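For concreteness, Carl's "vertical hex digit pairs" idea can be mocked up as plain text in a few lines (UTF-16BE is an assumption here; the proposal speaks of "each byte of the character" without fixing an encoding form):

```python
# Each byte of the character's encoding becomes one two-row column of
# hex digits -- the rendering a notdef glyph would show, sketched as
# ordinary text rather than as font outlines.
def vertical_hex(ch, encoding="utf-16-be"):
    data = ch.encode(encoding)
    top = " ".join(f"{b:02X}"[0] for b in data)
    bottom = " ".join(f"{b:02X}"[1] for b in data)
    return top + "\n" + bottom

print(vertical_hex("\uac00"))
# A 0
# C 0
```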
Re: Revised proposal for "Missing character" glyph
[Resend of a response which got eaten by the Unicode email during the system maintenance last week. Carl already responded to me on this, but others may not have seen what he was responding to. --Ken]

> Proposed unknown and missing character representation. This would be an
> alternate to the method currently described in 5.3.
>
> The missing or unknown character would be represented as a series of
> vertical hex digit pairs for each byte of the character.

The problem I have with this is that it seems to be an overengineered approach that conflates two issues:

a. What does a font do when requested to display a character (or sequence) for which it has no glyph?

b. What does a user do to diagnose text content that may be causing a rendering failure?

For the first problem, we already have a widespread approach that seems adequate. And other correspondents on this topic have pointed out that the particular approach of displaying hex numbers for characters may pose technical difficulties for at least some font technologies.

[snip]

> This representation would be recognized by untrained people as unrenderable
> data or garbage. So it would serve the same function as a missing glyph
> character except that it would be different from normal glyphs so that they
> would know that something was wrong and the text did not just happen to have
> funny characters.

I don't see any particular problem in training people to recognize when they are seeing their fonts' notdef glyphs. The whole concept of "seeing little boxes where the characters should be" is not hard to explain to people -- even to people who otherwise have difficulty with a lot of computer abstractions.

Things will be better-behaved when applications finally get past the related but worse problem of screwing up the character encodings -- which results in the more typical misdisplay: lots of recognizable glyphs, but randomly arranged into nonsensical junk.
(Ah, yes, that must be another piece of Korean spam mail in my mail tray.) > > It would aid people in finding the problem and for people with Unicode books > the text would be decipherable. If the information was truly critical they > could have the text deciphered. Rather than trying to engineer a questionable solution into the fonts, I'd like to step back and ask what would better serve the user in such circumstances. And an approach which strikes me as a much more useful and extensible way to deal with this would be the concept of a "What's This?" text accessory. Essentially a small tool that a user could select a piece of text with (think of it like a little magnifying glass, if you will), which will then pop up the contents selected, deconstructed into its character sequence explicitly. Limited versions of such things exist already -- such as the tooltip-like popup windows for Asmus' Unibook program, which give attribute information for characters in the code chart. But I'm thinking of something a little more generic, associated with textedit/richedit type text editing areas (or associated with general word processing programs). The reason why such an approach is more extensible is that it is not merely focussed on the nondisplayable character glyph issue, but rather represents a general ability to "query" text, whether normally displayable or not. I could query a black box notdef glyph to find out what in the text caused its display; but I could just as well query a properly displayed Telugu glyph, for example, to find out what it was, as well. This is comparable (although more point-oriented) to the concept of giving people a source display for HTML, so they can figure out what in the markup is causing rendering problems for their rich text content. [snip] > This proposal would provide a standardized approach that vendors could adopt > to clarify missing character rendering and reduce support costs. 
By > including this in the standard we could provide a cross vendor approach. > This would provide a consistent solution. In my opinion, the standard already provides a description of a cross-vendor approach to the notdef glyph problem, with the advantage that it is the de facto, widely adopted approach as well. As long as font vendors stay away from making {p}'s and {q}'s their notdef glyphs, as I think we can safely presume they will, and instead use variants on the themes of hollowed or filled boxes, then the problem of *recognition* of the notdef glyphs for what they are is a pretty marginal problem. And as for how to provide users better diagnostics for figuring out the content of undisplayable text, I suppose the standard could suggest some implementation guidelines there, but this might be a better area to just leave up to competing implementation practice until certain user interface models catch on and get widespread acceptance. --Ken
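Stripped of any UI, Ken's "What's This?" accessory amounts to deconstructing a selection into an explicit, named character sequence; a minimal sketch (the function name and output format are invented for illustration):

```python
import unicodedata

# Given a selected span of text, report each code point with its UCD
# name -- whether the text rendered properly or as notdef boxes.
def whats_this(selection):
    return [
        f"U+{ord(c):04X} {unicodedata.name(c, '<no name: control or unassigned>')}"
        for c in selection
    ]

for line in whats_this("x\u0361w"):
    print(line)
```

Run on the "x + U+0361 + w" sequence from the romanization thread, this lists the base letters and the COMBINING DOUBLE INVERTED BREVE between them, which is exactly the diagnostic a user staring at a black box would need.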
Re: Romanized Cyrillic bibliographic data--viable fonts?
J M Craig wrote as follows. [snipped] >Any suggestions welcomed! Is there a tool out there that will allow you >to edit a font to add a couple of missing characters? You might like to have a look at Softy, which is a shareware font editor for TrueType fonts. Softy can be used to produce new TrueType fonts and to edit existing TrueType fonts. http://users.iclway.co.uk/l.emmett/ There is some more information about Softy, including the correct email address for registrations, at the following page. http://cgm.cs.mcgill.ca/~luc/editors.html Having a look for Softy and Softy font at http://www.yahoo.com might be helpful. I am trying to obtain a copy of the tutorial by "Grumpy", so far without success. I have found the other tutorial and it is very useful. I have had lots of fun with the Softy program and although I have not tried to implement the U+FE20 and U+FE21 which you mention, I have tried various experiments using Softy and have found it a very satisfactory package to use. Softy is shareware, so perhaps you might think it worth a try to find out if it will help you do what you want to achieve. Also, you might like to have a look at the SC UniPad program which I mentioned earlier today in another thread. When I was studying your posting I used SC UniPad to have a look at the various Cyrillic characters which you mentioned. As far as I can tell at present SC UniPad does not position the U+FE20 and U+FE21 characters as you might want them to appear, yet SC UniPad would seem like a good way to key in the text, ready to copy and paste it into another program which would be used to display the thus keyed text using a font of your choice. William Overington 26 August 2002
Re: Romanized Cyrillic bibliographic data--viable fonts?
At 07:27 -0600 2002-08-26, J M Craig wrote: >Any suggestions welcomed! Is there a tool out there that will allow >you to edit a font to add a couple of missing characters? The choices are, in general, buying font programs or hiring someone to modify your font for you. Having said that, it would be nice if the major OSes had better support for Latin than they do. :-) -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: SC UniPad 0.99 released.
William Overington wrote: > A particularly interesting new feature is that one may hold down the > Control key and press the Q key and a small dialogue box appears > within which one may enter the hexadecimal code for any Unicode > character. Upon pressing the Enter key, that character is entered > into the document. SC UniPad contains its own font. In a thread two weeks ago about Alt+NumPad sequences, I did mention that SC UniPad 0.99 would include this Ctrl+Q feature. It's a very handy device; my biggest obstacle so far, in fact, is simply *remembering that it's there* and using it, instead of opening Character Map and clicking on the character, which is what I had to do before (and which is still useful if I needed to browse CM to find the character in the first place). > Please note in particular the buttons in a column down the left hand > side of the display. These alter the way in which some code points > are indicated in the display. For example, if one clicks on the > button labelled FMT (which controls Character Rendering: Formatting > Characters)and selects Picture Glyph, then entry of U+200D into the > text document shows a box with the letters ZWJ in it. And best of all, you can set these rendering options independently for space characters, ASCII controls, other formatting characters (a broad category), characters unsupported in the UniPad font (a dying breed; only Plane 2 is not supported), unassigned code points, unpaired surrogates, and private-use characters. Note that unpaired surrogates are supported for testing purposes, but aren't really a good thing to have lying around. Also note that your choices for private-use characters are a generic picture glyph or a rectangle containing the USV in hex -- sorry, you can't install your own PUA font. 
ALSO, note that the hex-value display option for unassigned code points provides a neat solution to Martin Kochanski's earlier question about .notdef glyphs (and the ensuing discussion where Carl Brown and others suggested 2×2, 2×3, or 3×2 blocks of hex digits). BTW, the View toolbar doesn't have to run down the left side. It's there by default, but you can dock it elsewhere or let it float as a separate window. I have the Convert toolbar on the left side and View on the right because I use Convert more often. > I first learned of the existence of the UniPad program in a response > to a question which I asked in this forum, so I am posting this note > so that any end users of the Unicode system who are at present unaware > of the existence of the UniPad program might know of the opportunity > to have a look at it if they so choose. > > The web site has a facility to request email notification of > developments to SC UniPad. It was by a such requested email > notification that I became aware of the availability of SC UniPad > 0.99. I have asked the main developer of UniPad to post regular update notices on this list, and he says he will do so shortly, when he can put together a more thorough list of the new features in 0.99. Trust me, there are a LOT. ☺ -Doug Ewell Fullerton, California
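Reduced to its essence, a Ctrl+Q-style "enter the code in hex" feature does no more than the following sketch (UniPad's actual validation rules are not documented here; the surrogate/range check is the minimum any implementation needs):

```python
# Parse a hexadecimal Unicode scalar value and emit the character,
# rejecting surrogate code points and out-of-range values.
def from_usv(hex_str):
    cp = int(hex_str, 16)
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    return chr(cp)

print(from_usv("AC00"))        # the syllable at U+AC00
print(repr(from_usv("200D")))  # the (invisible) ZERO WIDTH JOINER
```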
Re: Romanized Cyrillic bibliographic data--viable fonts?
> Gory details:
> ...
> The specified Romanization for each of these Cyrillic characters
> includes a ligature over the top of the two Latin code points in
> question (to indicate that the Latin characters represent a single
> Cyrillic character, presumably).

If you can use horizontal bars over the characters rather than the half-ligature marks, this seems to be supported by most fonts:

http://www.columbia.edu/kermit/st-erkenwald.html

- Frank
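For concreteness, the two encodings under discussion can be written out as code point sequences (a hypothetical lowercase /ts/ example of my own; U+0304 COMBINING MACRON is the "horizontal bar" alternative, which far more fonts can render than the half-ligature marks):

```python
# Half-ligature marks: one tie spanning both letters.
half_ligature = "t\uFE20s\uFE21"   # t + LEFT HALF LIG + s + RIGHT HALF LIG

# Macron fallback: a separate bar over each letter.
macrons = "t\u0304s\u0304"         # t + COMBINING MACRON + s + COMBINING MACRON

for label, s in [("half-ligature", half_ligature), ("macron", macrons)]:
    print(label + ":", " ".join(f"U+{ord(c):04X}" for c in s))
```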
Re: GX Technology
On Sunday, August 25, 2002, at 10:12 PM, K S Rohilla wrote:

> Hi Everybody
>
> I am Working On Open type Font Technology. Pl. tell me any one GX
> Technology.

Well, outside of the fact that what you want to ask about is called Apple Advanced Typography (AAT) now, what is it you need to know? Have you checked Apple's typography site, ?

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/
Romanized Cyrillic bibliographic data--viable fonts?
Anyone at all familiar with bibliographic data (the MARC standards) knows that it can be a real pain to deal with. In this case, the difficulty isn't with the MARC data itself, but with the Library of Congress's Romanization standards and the lack of support for combining half marks in available fonts.

I'm trying to help a client properly display Romanized Cyrillic from MARC data in a Unicode-enabled application. The ultimate problem is that I can't find an available font that properly supports the combining half marks U+FE20 and U+FE21. Alan Wood lists these two on his page of fonts by Unicode range (a truly impressive collection of info, BTW, Mr. Wood):

Arial Unicode MS
Apparently you can only get this with MS Office or Publisher these days -- not a good solution for my client, since their budget is very limited and they'd need it on a bunch of workstations. The most important technical issue is that the marks may not combine properly, and I don't have a copy of the font to test it myself. Does anyone know whether these marks will combine properly with T, t, S, s, I, i, A, a, and U, u when using the MS font?

Naqsh
A cursive font (not practical), and the marks don't appear to combine properly in any case.

Any suggestions welcomed! Is there a tool out there that will allow you to edit a font to add a couple of missing characters?

(A more extensive explanation of the problem follows for those who want the gory details.)
John Craig
Alpha-G Consulting, LLC

Gory details:

The bibliographic data in question follows the Library of Congress Romanization rules (see this link):

http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf

An effective conversion to Unicode for the specified Romanizations of these Cyrillic characters is proving elusive:

/ts/  Unicode 0426 (capital) & 0446 (lower case)
/yu/  Unicode 042E & 044E
/ya/  Unicode 042F & 044F

The specified Romanization for each of these Cyrillic characters includes a ligature over the top of the two Latin code points in question (to indicate, presumably, that the Latin characters represent a single Cyrillic character).

Now, the proper Unicode sequence for what the Library of Congress wants (based on their own documentation of the correspondences between the MARC ANSEL character set and Unicode) requires the use of the combining half marks left-half ligature U+FE20 and right-half ligature U+FE21:

/ts/  Unicode 0074 FE20 0073 FE21
/yu/  Unicode 0069 FE20 0075 FE21
/ya/  Unicode 0069 FE20 0061 FE21

All very well, but the application can't paint it, because the available fonts lack the combining half marks.
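A small illustrative sketch (my own, not from the LC documentation) that builds the half-mark sequences for the three lowercase Cyrillic letters; note that /ts/ pairs t (U+0074) with s (U+0073):

```python
# Lowercase Cyrillic letter -> the two Latin letters of its LC
# Romanization, to be joined by the combining half marks
# U+FE20 (left half) and U+FE21 (right half).
PAIRS = {
    "\u0446": ("t", "s"),  # CYRILLIC SMALL LETTER TSE
    "\u044E": ("i", "u"),  # CYRILLIC SMALL LETTER YU
    "\u044F": ("i", "a"),  # CYRILLIC SMALL LETTER YA
}

def romanize(ch: str) -> str:
    """Return the Unicode sequence with combining half marks."""
    first, second = PAIRS[ch]
    return f"{first}\uFE20{second}\uFE21"

for cyr in PAIRS:
    rom = romanize(cyr)
    print(f"U+{ord(cyr):04X} -> " + " ".join(f"U+{ord(c):04X}" for c in rom))
```

Whether the result displays as a single tie over the two letters is, of course, exactly the font-support problem described above.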
Re: Recent changes to i18n standards
On Fri, 23 Aug 2002 [EMAIL PROTECTED] wrote:

> On 08/23/2002 04:54:58 AM "Doug Ewell" wrote:
>
> > For those who like to keep up on such things, there have been recent
> > changes to the code lists of two important standards related to
> > internationalization -- ISO 639 (language codes) and ISO 3166-2 (codes
> > for country subdivisions).
>
> In addition to the two new code elements in ISO 639-2, there's another
> development of interest in relation to language coding: ISO/TC 37 has
> begun working toward development of a new part to this standard, to be
> designated ISO 639-3, that will provide 3-letter identifiers for all
> known languages. The relationship to part 2 will be that the
> individual-language code elements in part 2 will be a subset of part 3
> (part 2 will continue to have collective-language identifiers but part
> 3 will not). The reason for the subsetting relationship of part 2 to
> part 3 (rather than just adding a bunch of things to part 2) is that
> some user communities (e.g. bibliographers) have indicated a need to
> restrict individual-language identifiers to only developed languages
> with significant bodies of literature. I'm anticipating a time frame
> of about one year for this to be completed (assuming the process goes
> smoothly).
>
> - Peter
>
> ---
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <[EMAIL PROTECTED]>

Monday, August 26, 2002

Peter, I congratulate you and others who reached this reasonable solution.

Regards,
Jim Agenbroad ( [EMAIL PROTECTED] )

"It is not true that people stop pursuing their dreams because they grow old, they grow old because they stop pursuing their dreams." Adapted from a letter by Gabriel Garcia Marquez.

The above are purely personal opinions, not necessarily the official views of any government or any agency of any.

Addresses:
Office: Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.
SC UniPad 0.99 released.
As an end user of Unicode, I was interested to learn recently that the latest version of SC UniPad, a Unicode plain text editor for various PCs, has been released. This latest version is SC UniPad 0.99 and is available for free download from the following address on the web:

http://www.unipad.org

A particularly interesting new feature is that one may hold down the Control key and press the Q key, and a small dialogue box appears within which one may enter the hexadecimal code for any Unicode character. Upon pressing the Enter key, that character is entered into the document. SC UniPad contains its own font.

Please note in particular the buttons in a column down the left hand side of the display. These alter the way in which some code points are indicated in the display. For example, if one clicks on the button labelled FMT (which controls Character Rendering: Formatting Characters) and selects Picture Glyph, then entry of U+200D into the text document shows a box with the letters ZWJ in it.

I first learned of the existence of the UniPad program in a response to a question which I asked in this forum, so I am posting this note so that any end users of the Unicode system who are at present unaware of the existence of the UniPad program might know of the opportunity to have a look at it if they so choose.

The web site has a facility to request email notification of developments to SC UniPad. It was by such a requested email notification that I became aware of the availability of SC UniPad 0.99.

William Overington
26 August 2002