Re: unicode on Linux

2003-10-23 Thread Owen Taylor
On Thu, 2003-10-23 at 04:54, Stephane Bortzmeyer wrote:
 On Tue, Oct 21, 2003 at 09:56:16AM -0700,
  Peter Kirk [EMAIL PROTECTED] wrote 
  a message of 22 lines which said:
 
  In this page, Markus Kuhn is damaging his credibility by continuing to 
  refer in several places to Unicode 3.0, although the page was updated 
  some time after the release of Unicode 4.0. Is the rest of this material 
  similarly out of date?
 
 Exactly my point. At the present time, trying to switch your working
 environment from Latin-1 to Unicode means digging through a lot of
 documentation, often out of date or inaccurate, compiling a lot of
 programs (see Benjamin Peterson's posting for just one program, grep)
 and debugging the whole thing.
 
 Switching to Unicode requires dedication from the ordinary Unix user
 (who is not a Unicode Consortium member, just an ordinary computer
 engineer).

Well, UTF-8 is the default encoding on many Linux distributions
these days (Red Hat, of course, is what I'm familiar with), so
that makes the amount of work involved in switching pretty
minimal.

Regards,
Owen





Re: Last Resort Font

2003-08-19 Thread Owen Taylor
On Tue, 2003-08-19 at 15:45, Michael Everson wrote:
 At 15:04 -0400 2003-08-19, James H. Cloos Jr. wrote:
John == John Jenkins [EMAIL PROTECTED] writes:
 
 John (Apple's LastResort font [contains every Unicode character],
 John of course, but by virtue of rampant reuse of glyphs.)
 
 Does this generate glyphs like the following ascii- utf8-art?
 
 No. It generates much much better glyphs than that. See 
 http://developer.apple.com/fonts/LastResortFont/

Of course, "better" here really depends on what you want.
Prettier? Yes. More useful for Joe User who gets Sinhala
spam? Yes. More useful if you are trying to debug why, in
a span of Arabic text, some characters aren't being located
in a font? Not really. 

 I find it interesting, if so, that Apple uses a font to achieve that 
 rather than a bit of code in the rendering libs.
 
 When Mac OS X encounters a Unicode character, it checks whether the 
 character is in the current font. If it's not, it starts looking through 
 all the other fonts until it finds one that is suitable. The Last 
 Resort Font has glyphs for all the characters, so it's the last one 
 looked at.

If you have a Last Resort style font, Pango should pick it up
as well (*). The hex boxes are only drawn when *no* font
on the system contains the character.
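
A toy sketch of that fallback order, in Python. This is not Pango's or
fontconfig's actual algorithm; the font names, the coverage ranges, and
the functions pick_font() and describe() are all made up for
illustration. It only shows the rule stated above: try each font in
turn, and draw a hex box only when nothing covers the character.

```python
# Hypothetical coverage tables: (font name, code point range it covers).
FONTS = [
    ("Latin Sans", range(0x0020, 0x0250)),
    ("Arabic Naskh", range(0x0600, 0x0700)),
]
# A Last Resort style font covers everything, so it goes last in the list.
LAST_RESORT = ("Last Resort", range(0x0000, 0x110000))


def pick_font(char, fonts):
    """Return the name of the first font in `fonts` that covers `char`, else None."""
    for name, coverage in fonts:
        if ord(char) in coverage:
            return name
    return None


def describe(char, fonts):
    """Say which font would draw `char`, or fall back to a hex box."""
    name = pick_font(char, fonts)
    if name is None:
        return "hex box [%04X]" % ord(char)
    return name


sinhala_a = "\u0d85"  # SINHALA LETTER AYANNA
print(describe("a", FONTS))                        # Latin Sans
print(describe(sinhala_a, FONTS))                  # hex box [0D85]
print(describe(sinhala_a, FONTS + [LAST_RESORT]))  # Last Resort
```

With the catch-all font installed, the hex box never appears, which is
exactly why it is less useful for debugging missing coverage.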

Regards,
Owen

(*) With some caveats about fontconfig configuration that I'm
not going to get into here.





Re: Last Resort Font

2003-08-19 Thread Owen Taylor
On Tue, 2003-08-19 at 17:08, Michael Everson wrote:

 At 16:24 -0400 2003-08-19, Owen Taylor wrote:

 If you have a Last Resort style font, Pango should pick it up as well.
 
 I don't know what Pango is but I guess it isn't relevant to me...

It was mentioned in the mail that you replied to (because of
its hex-box drawing) so I didn't feel a need to gloss.

Pango is a text layout library roughly along the lines of
Uniscribe/ATSUI/etc., developed largely by myself, with
lots of help from the open-source community, including
various people on this list.

See http://www.pango.org for really outdated content. (Not
much time to update the web page these days.)

If you don't use Linux or Unix, it's likely not relevant to 
you. It's used pretty widely these days in that arena.

Regards,
Owen





Re: Questions on ZWNBS

2003-08-02 Thread Owen Taylor
On Sat, 2003-08-02 at 06:32, Theodore H. Smith wrote:
 Hi list,
 
 I have some questions on the ZWNBSP. While I don't actually need this 
 myself, someone I know needs this.
 
  Where? Specifically, where does it say FEFF shouldn't be in a string?
  Certainly, FEFF shouldn't be considered a BOM anywhere but at the start
  of a string, but does it say you just can't use that value? And if so,
  how are you supposed to use a ZWNBSP?!
 
 I'm thinking that 0xFEFF shouldn't be in a UTF-16BE string, except at 
 the start, right?
 
 For other kinds of UTF, I'm not sure if it is allowed or not. I know it 
 is allowed in UTF-16LE, although discouraged.
 
 Instead of "can't use ZWNBSP", I think that char is discouraged. Where 
 is the rule that discourages it?

As far as I know, the only rules here are:

 The character U+FEFF *should* occur at the start of a UTF-16 (either
 endianness) text to act as the BOM.

 The noncharacter U+FFFE should not occur in any encoding of Unicode;
 this means that the *byte sequence* 0xFE 0xFF should not occur as a
 code unit in a UTF-16LE string.

ZWNBSP can be a useful character (to suppress a line break), and there 
is no reason not to use it.
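
The byte-level side of this is easy to check. A minimal Python sketch,
using only the standard codecs (the variable names are mine):

```python
# U+FEFF is both the byte order mark and ZERO WIDTH NO-BREAK SPACE.
ZWNBSP = "\ufeff"

be_bytes = ZWNBSP.encode("utf-16-be")  # big-endian: high byte first
le_bytes = ZWNBSP.encode("utf-16-le")  # little-endian: low byte first
print(be_bytes.hex(), le_bytes.hex())  # feff fffe

# Reading the big-endian bytes FE FF as a little-endian code unit yields
# the noncharacter U+FFFE, which is how a byte-order mixup shows up.
misread = be_bytes.decode("utf-16-le")
print("U+%04X" % ord(misread))  # U+FFFE

# U+FEFF in the middle of a string survives a round trip unchanged.
text = "foo" + ZWNBSP + "bar"
round_trip = text.encode("utf-16-le").decode("utf-16-le")
print(round_trip == text)  # True
```

The explicit "utf-16-le" and "utf-16-be" codecs never add or strip a
BOM, so a U+FEFF inside the text is treated as an ordinary character.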

Regards,
Owen







Re: Biblical Hebrew (U+034F Combining Grapheme Joiner works)

2003-07-02 Thread Owen Taylor
On Wed, 2003-07-02 at 01:03, [EMAIL PROTECTED] wrote:
 Philippe Verdy wrote on 06/28/2003 02:48:01 AM:
 
  If the user strikes the two keys patah and hiriq, the input method
  for Traditional Hebrew will generate patah,CGJ,hiriq
 
 That *requires* an input method that is aware of the input context (or of 
 what has already been input -- but awareness of context is far more 
 reliable). How many systems do you know that are capable of that? It 
 requires input drivers, such as keyboard DLLs, that support 
 context-sensitive operations; it requires application interfaces that 
 allow the input driver to find out from the app what the input context is; 
 and it requires applications that support that interface. Can you name for 
 me any system on which the existing keyboard driver format supports 
 context-sensitive rules? Can you name an application interface that allows 
 input methods (other than full-blown input method editors -- i.e. 
 something with a composition window) to communicate the input context to 
 the input method, and can you name one or more apps that support this 
 interface?
 
 This is all stuff I'd like to see become commonplace for a variety of 
 reasons, but I doubt we'll see that happen for the sake of Biblical 
 Hebrew.

On the other hand, context information is quite useful for almost
any syllable-based writing system. For instance, smart input methods
for Thai that can reorder characters entered in the wrong order
are desirable.

Input methods don't have to have an obvious user interface;
a sensible setup is to have *all* text input go through the input method
interfaces. This is how the GTK+ toolkit works and, in fact, how 
the traditional XIM system works in X. Both systems support context
retrieval and modification. And it's supported by the basic GTK+
editing widgets, so by hundreds of applications.
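
To make the idea concrete, here is a toy Python sketch of an input
method that consults the text before the cursor. It is not the GTK+ or
XIM API; commit_for_key() and type_keys() are hypothetical names, and
the only rule implemented is the patah/hiriq one quoted above.

```python
PATAH = "\u05b7"  # HEBREW POINT PATAH
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER
LAMED = "\u05dc"  # HEBREW LETTER LAMED


def commit_for_key(text_before_cursor, key):
    """Return the string a context-aware input method would commit for `key`."""
    # The context is what makes this possible: a hiriq typed directly
    # after a patah gets a CGJ inserted in front of it.
    if key == HIRIQ and text_before_cursor.endswith(PATAH):
        return CGJ + key
    return key


def type_keys(keys):
    """Simulate an editing widget that routes every key through the input method."""
    buffer = ""
    for key in keys:
        buffer += commit_for_key(buffer, key)
    return buffer


typed = type_keys([LAMED, PATAH, HIRIQ])
print(["U+%04X" % ord(c) for c in typed])
# ['U+05DC', 'U+05B7', 'U+034F', 'U+05B4']
```

There is no composition window here; the user just types, and the
widget hands the surrounding text to the input method on each key.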

Regards,
Owen