Re: unicode on Linux
On Thu, 2003-10-23 at 04:54, Stephane Bortzmeyer wrote:
> On Tue, Oct 21, 2003 at 09:56:16AM -0700, Peter Kirk [EMAIL PROTECTED] wrote a message of 22 lines which said:
> > In this page, Markus Kuhn is damaging his credibility by continuing to refer in several places to Unicode 3.0, although the page was updated some time after the release of Unicode 4.0. Is the rest of this material similarly out of date?
> Exactly my point. At the present time, trying to switch your working environment from Latin-1 to Unicode means digging through a lot of documentation, often out of date or inaccurate, compiling a lot of programs (see Benjamin Peterson's posting for just one program, grep) and debugging the whole stuff. Switching to Unicode requires dedication for the ordinary Unix user (who is not a Unicode Consortium member, just an ordinary computer engineer).

Well, UTF-8 is the default encoding on many Linux distributions these days (Red Hat, of course, is what I'm familiar with), so the amount of work involved in switching is pretty minimal.

Regards, Owen
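Whether a given environment actually selects UTF-8 comes down to the locale that governs LC_CTYPE. A minimal sketch, assuming glibc-style locale names of the form language[_territory][.codeset][@modifier] (e.g. en_US.UTF-8) and the POSIX precedence LC_ALL, then LC_CTYPE, then LANG. The helpers `ctype_locale_name` and `locale_is_utf8` are hypothetical and only inspect variable values; on a live system, calling `locale.setlocale(locale.LC_CTYPE, "")` and then `locale.nl_langinfo(locale.CODESET)` is the authoritative check.

```python
import os
from typing import Mapping, Optional


def ctype_locale_name(env: Mapping[str, str]) -> Optional[str]:
    """Return the locale name governing LC_CTYPE (the character encoding).

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values are treated as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return None


def locale_is_utf8(env: Mapping[str, str] = os.environ) -> bool:
    """Heuristically decide whether env selects a UTF-8 locale."""
    name = ctype_locale_name(env)
    if name is None:
        return False  # nothing set: the "C"/"POSIX" locale is used
    name = name.split("@", 1)[0]  # drop any @modifier, e.g. "@latin"
    if "." not in name:
        return False  # no explicit codeset ("C", "en_US"): treated as not UTF-8
    codeset = name.split(".", 1)[1]
    # Accept both spellings seen in practice: "UTF-8" and "utf8".
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    print("UTF-8 locale" if locale_is_utf8() else "not a UTF-8 locale")
```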
Re: Last Resort Font
On Tue, 2003-08-19 at 15:45, Michael Everson wrote:
> At 15:04 -0400 2003-08-19, James H. Cloos Jr. wrote:
> > John == John Jenkins [EMAIL PROTECTED] writes:
> > John> (Apple's LastResort font [contains every Unicode character],
> > John> of course, but by virtue of rampant reuse of glyphs.)
> > Does this generate glyphs like the following ASCII/UTF-8 art?
> No. It generates much, much better glyphs than that. See http://developer.apple.com/fonts/LastResortFont/

Of course, "better" here really depends on what you want. Prettier? Yes. More useful for Joe User who gets Sinhala spam? Yes. More useful if you are trying to debug why, in a span of Arabic text, some characters aren't being located in a font? Not really.

> > I find it interesting, if so, that Apple uses a font to achieve that rather than a bit of code in the rendering libs.
> What Mac OS X does is this: when it encounters a Unicode character, it checks whether the character is in the current font. If it's not, it starts looking through all the other fonts until it finds a suitable one. The Last Resort Font has glyphs for all the characters, so it's the last one looked at.

If you have a Last Resort-style font, Pango should pick it up as well (*). The hex boxes are only drawn when *no* font on the system contains the character.

Regards, Owen

(*) With some caveats about fontconfig configuration that I'm not going to get into here.
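The fallback order described above (current font first, then the other fonts, with a Last Resort-style font last, and a hex box only when nothing matches) can be sketched as follows. This is a simplified model, not Apple's or Pango's actual code: the `Font` type, `choose_font`, and the `[XXXX]` string standing in for a drawn hex box are hypothetical, and each font's coverage is modeled as a container of code points.

```python
from dataclasses import dataclass
from typing import Container, Sequence


@dataclass(frozen=True)
class Font:
    name: str
    coverage: Container[int]  # code points this font has glyphs for


def choose_font(ch: str, current: Font, others: Sequence[Font]) -> str:
    """Return the name of the font that will draw ch, or a hex-box stand-in."""
    cp = ord(ch)
    # 1. Use the font the text is already set in, if it covers the character.
    if cp in current.coverage:
        return current.name
    # 2. Otherwise search the remaining fonts in order. A Last Resort-style
    #    font covering every code point is listed last, so it only wins when
    #    no real font has the character.
    for font in others:
        if cp in font.coverage:
            return font.name
    # 3. Nothing covers it: fall back to a box showing the hex code point.
    return f"[{cp:04X}]"
```

Using `range` objects for coverage keeps the "covers every code point" font cheap to model, since `in` on a `range` is a constant-time check.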
Re: Last Resort Font
On Tue, 2003-08-19 at 17:08, Michael Everson wrote:
> At 16:24 -0400 2003-08-19, Owen Taylor wrote:
> > If you have a Last Resort style font, Pango should pick it up as well.
> I don't know what Pango is but I guess it isn't relevant to me...

It was mentioned in the mail you replied to (because of its hex-box drawing), so I didn't feel a need to gloss it. Pango is a text layout library roughly along the lines of Uniscribe/ATSUI/etc., developed largely by me, with lots of help from the open-source community, including various people on this list. See http://www.pango.org for really outdated content. (Not much time to update the web page these days.)

If you don't use Linux or Unix, it's likely not relevant to you. It's used pretty widely these days in that arena.

Regards, Owen
Re: Questions on ZWNBS
On Sat, 2003-08-02 at 06:32, Theodore H. Smith wrote:
> Hi list, I have some questions on the ZWNBS. While I don't actually need this myself, someone I know needs this.
> Where? Specifically, where does it say FEFF shouldn't be in a string? Certainly, FEFF shouldn't be considered a BOM anywhere but at the start of a string, but does it say you just can't use that value? And if so, how are you supposed to use a ZWNBSP?!
> I'm thinking that 0xFEFF shouldn't be in a UTF16BE string, except at the start, right? For other kinds of UTF, I'm not sure if it is allowed or not. I know it is allowed in UTF16LE, although discouraged.
> Instead of "can't use ZWNBS", I think that char is discouraged.
> Where is the rule that discourages it?

As far as I know, the only rules here are:

 - The character U+FEFF *should* occur at the start of a UTF-16 text (of either endianness) to act as the BOM.
 - The noncharacter U+FFFE should not occur in any encoding of Unicode; this means that the *byte sequence* 0xFE 0xFF should not occur as a 16-bit code unit in a UTF-16LE string.

ZWNBS can be a useful character (to suppress a line break), and there is no reason not to use it.

Regards, Owen
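The BOM rule above (U+FEFF is a BOM only at the start; anywhere else it is ordinary ZWNBSP text) can be illustrated with a small decoder. This is a sketch, not a complete UTF-16 reader: `decode_utf16` and `has_noncharacter_fffe` are hypothetical helpers, and reading unmarked input as big-endian follows the Unicode default for the UTF-16 encoding scheme when no higher-level protocol says otherwise.

```python
import codecs

ZWNBSP = "\ufeff"  # a BOM at the very start, ZERO WIDTH NO-BREAK SPACE elsewhere


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, treating U+FEFF as a BOM only at the very start.

    Any later U+FEFF is ordinary text (a ZWNBSP) and is preserved.
    Unmarked data is read as big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")


def has_noncharacter_fffe(text: str) -> bool:
    """U+FFFE should never appear in text; finding it suggests a byte-order mistake."""
    return "\ufffe" in text
```

The noncharacter check shows why U+FFFE is kept out of text: if little-endian data without a BOM is read as big-endian, every ZWNBSP turns into U+FFFE, which is an easy signal that the byte order was guessed wrong.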
Re: Biblical Hebrew (U+034F Combining Grapheme Joiner works)
On Wed, 2003-07-02 at 01:03, [EMAIL PROTECTED] wrote:
> Philippe Verdy wrote on 06/28/2003 02:48:01 AM:
> > If the user strikes the two keys patah and hiriq, the input method for Traditional Hebrew will generate patah, CGJ, hiriq
> That *requires* an input method that is aware of the input context (or of what has already been input -- but awareness of context is far more reliable). How many systems do you know that are capable of that? It requires input drivers, such as keyboard DLLs, that support context-sensitive operations; it requires application interfaces that allow the input driver to find out from the app what the input context is; and it requires applications that support that interface. Can you name for me any system on which the existing keyboard driver format supports context-sensitive rules? Can you name an application interface that allows input methods (other than full-blown input method editors -- i.e. something with a composition window) to communicate the input context to the input method, and can you name one or more apps that support this interface? This is all stuff I'd like to see become commonplace for a variety of reasons, but I doubt we'll see that happen for the sake of Biblical Hebrew.

On the other hand, context information is quite useful for almost any syllable-based writing system. For instance, smart input methods for Thai that can reorder characters entered in the wrong order are desirable.

Input methods don't have to have an obvious user interface; a sensible setup is to have *all* text input go through the input method interfaces. This is how the GTK+ toolkit works and, in fact, how the traditional XIM system works in X. Both systems support context retrieval and modification, and the basic GTK+ editing widgets support it, so it is available in hundreds of applications.

Regards, Owen
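Verdy's proposal needs the input method to see the character just before the cursor. A minimal sketch of that context-sensitive rule, assuming the text before the cursor is available as a plain string; it stands in for whatever surrounding-text interface the toolkit provides and uses no real GTK+ or XIM API. The function name `cgj_insertion` is hypothetical, and generalizing from patah/hiriq to "any pair of marks that normalization would swap" is this sketch's own assumption. Canonical reordering swaps adjacent combining marks when ccc(first) > ccc(second) > 0; patah (U+05B7, class 17) followed by hiriq (U+05B4, class 14) is such a pair, so a CGJ (U+034F, class 0) is inserted to block the swap.

```python
import unicodedata

CGJ = "\u034f"    # COMBINING GRAPHEME JOINER, combining class 0
PATAH = "\u05b7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ, combining class 14


def cgj_insertion(before_cursor: str, mark: str) -> str:
    """Return what to insert when the user types `mark` after `before_cursor`.

    If normalization would swap the previous mark and the new one
    (ccc(previous) > ccc(new) > 0), a CGJ is inserted between them so
    the typed order survives NFC/NFD.
    """
    new_ccc = unicodedata.combining(mark)
    if before_cursor and new_ccc > 0:
        prev_ccc = unicodedata.combining(before_cursor[-1])
        if prev_ccc > new_ccc:
            return CGJ + mark
    return mark


if __name__ == "__main__":
    lamed = "\u05dc"
    text = lamed + PATAH
    text += cgj_insertion(text, HIRIQ)  # the user types hiriq after patah
    print([f"U+{ord(c):04X}" for c in text])  # lamed, patah, CGJ, hiriq
```

Without the CGJ, NFD would reorder lamed, patah, hiriq into lamed, hiriq, patah, losing the order the user typed.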