Re: What's in a wchar_t string on unix?

2004-03-02 Thread Noah Levitt
As specified in C99 (and maybe earlier), if the macro
__STDC_ISO_10646__ is defined, then wchar_t values are UCS-4
code points. Otherwise, wchar_t is an opaque type and you
can't be sure what it is.

Noah

On Mon, Mar 01, 2004 at 11:13:58 -0800, Rick Cameron wrote:
> Hi, all
> 
> This may be an FAQ, but I couldn't find the answer on unicode.org.
> 
> It seems that most flavours of unix define wchar_t to be 4 bytes. If the
> locale is set to be Unicode, what's in a wchar_t string? Is it UTF-32, or
> UTF-16 with the code units zero-extended to 4 bytes?
> 
> Cheers
> 
> - rick cameron



Re: Questions on ZWNBS - for line initial holam plus alef

2003-09-18 Thread Noah Levitt
On Mon, Aug 11, 2003 at 12:57:11 -0700, Kenneth Whistler wrote:
> Kent asked:
> 
> > How should a freestanding double diacritic be encoded (for purposes of
> > meta-discussions, and the like):  or  > diacritic, SPACE>? 
> 
> It *could* be represented as , of course,
> or for that matter , or other possibilities.
> The combining character sequence, in either case, is the
>  sequence.

How should a text rendering library deal with ? Should the character after the diacritic be
drawn under the right half of the diacritic, or beyond its
rightmost ink?

Noah



Re: Vi problem

2003-08-18 Thread Noah Levitt
Try running vim in a UTF-8 locale. 

  $ LANG=en_US.UTF-8 vim

Also, see ":help termencoding".

Noah

On Mon, Aug 18, 2003 at 17:50:42 +0200, Stefan Persson wrote:
> Hi!
> 
> I am using Vi (version Vi IMproved 6.1) on Linux using UTF-8 (xterm 
> -u8).  If a UTF-8 character, when misinterpreted as Latin-1, 
> contains a control character, that character is displayed as something 
> different.  For example, the Swedish capital "Ä" is displayed as a 
> square box followed by '~D'.  Is there a way to get rid of this problem?
> 
> Stefan
> 



Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

2003-08-14 Thread Noah Levitt
According to the docs at
http://www.microsoft.com/typography/otfntdev/indicot/other.htm,
Uniscribe renders combining marks in isolation when they are
applied to SPACE + ZWJ. (Without the ZWJ, it uses a dotted
circle.) Perhaps this is an acceptable solution to the
people calling for a new character.

  Combining marks and signs that appear in text not in
  conjunction with a valid consonant base are considered
  invalid. Uniscribe displays these marks using the fallback
  rendering mechanism defined in the Unicode Standard
  (section 5.12, 'Rendering Non-Spacing Marks' of the
  Unicode Standard 3.1), i.e. positioned on a dotted circle. 

  Please note that to render a sign standalone (in apparent
  isolation from any base) one should apply it on a space
  (see section 2.5 'Combining Marks' of the Unicode
  Standard). Uniscribe requires a ZWJ to be placed between
  the space and a mark for them to combine into a standalone
  sign.

Noah



"savvy" images

2003-07-15 Thread Noah Levitt
I'm getting 404 Not Found for the "Unicode Savvy" images.
http://www.unicode.org/consortium/unisavvy.html

Noah



Re: Letterforms based on p

2003-06-10 Thread Noah Levitt
On Tue, Jun 10, 2003 at 10:30:30 -0400, Jim Allan wrote:
> 
> One can quickly search for these symbols in the OED web edition.

How can one? 

Noah



Re: International Font to be Used

2003-06-09 Thread Noah Levitt
If you are on a system that uses fontconfig, such as most
recent Linux distros, you can find out which fonts support a
particular language using fc-list; for example:

  $ fc-list :lang=ko

Noah

On Mon, Jun 09, 2003 at 18:04:58 +0100, Raymond Mercier wrote:
> >   http://pfaedit.sourceforge.net/
> 
> http://ourworld.compuserve.com/homepages/RaymondM/unisearch.htm



Re: UNESCO standard keyboards? (Re: Tamazight/berber language : ....)

2003-06-06 Thread Noah Levitt
fontconfig also has lists of characters needed for each
language, which it uses to help decide which font to
provide:
http://keithp.com/cgi-bin/cvsweb/fontconfig/fc-lang/

Noah

On Fri, Jun 06, 2003 at  8:14:48 -0700, Doug Ewell wrote:
> > It would be interesting to add some informative appendixes to Unicode
> > and later make them normative, to clearly state the subset of
> > characters that MUST be supported for each written language, and a
> > list of legacy equivalents that should be interpreted the same as
> > their recommended encoding in the context of that language.
> 



Re: Fw: Unicode filename problems

2003-06-03 Thread Noah Levitt
There's also a highly portable open source implementation:
http://www.info-zip.org/pub/infozip/Zip.html

Noah

On Mon, Jun 02, 2003 at 15:05:39 +0100, Raymond Mercier wrote:
> >
> >http://www.pkware.com/products/enterprise/white_papers/appnote.html



Re: New Unicode Savvy Logo

2003-05-29 Thread Noah Levitt
I had it up at http://gucharmap.sourceforge.net/ by 
Tue May 27 17:24:00 EDT 2003.

I agree with everybody that it's pretty ugly. But frankly,
it's not nearly as ugly as most of www.unicode.org,
especially the technical reports. Incidentally, the site is
hard to navigate, too. But I'm not one to complain. ;-) 

Noah

On Wed, May 28, 2003 at  0:30:04 -0700, Doug Ewell wrote:
> Announcement: New Unicode Savvy Logo
> 
> Snazzy or not, is there some kind of
> award for the first Webmaster to actually use it?
> 
> -Doug Ewell
>  Fullerton, California
>  http://users.adelphia.net/~dewell/
>  2003-05-28 07:29 UTC
> 



Re: Detecting UTF-8 Locale Question

2003-03-25 Thread Noah Levitt
On Tue, Mar 25, 2003 at 12:01:11 -0500, Edward H Trager wrote:
> 
> (3) Aside from xterm or mlterm running under Cygwin, are there other UTF-8
> competent terminals available on Win32? Which ones are "the best"?

PuTTY supports UTF-8, and is open source.

Noah



Re: Surrogate supported in Mozilla 1.3

2003-03-15 Thread Noah Levitt
On Sun, Mar 16, 2003 at  9:48:46 +0330, Roozbeh Pournader wrote:
> 
> > I will update the Unicode example intro page to reflect that Mozilla 1.3
> > supports this.
> 
> Just for Windows, I guess. I have a Red Hat Linux 8.0 box, just installed 
> code2001, but only see question marks on the page. The number of question 
> marks looks to be right, but that's all.

It works for me with mozilla-snapshot (1.4a) on Debian Sid.

Noah



Re: FAQ entry

2003-03-07 Thread Noah Levitt
On Fri, Mar 07, 2003 at 17:27:08 +0100, David Oftedal wrote:
> We're not necessarily talking about Latin here. In Norwegian and Danish, 
> æ is not a ligature, but a separate sound almost unpronounceable by 
> English speakers. 

I believe æ is also a character in the IPA.

Noah