Re: ConScript

2004-01-16 Thread James H. Cloos Jr.
 Mark == Mark E Shoulson [EMAIL PROTECTED] writes:

Mark presumably Linux put it there following the Corporate
Mark Space rule, placing Klingon as their corporate characters.

I remember when the Klingon character support was added to the console
drivers, and that is exactly why they put it where they did.

-JimC




Re: Clones (was RE: Hexadecimal)

2003-08-19 Thread James H. Cloos Jr.
 John == John Jenkins [EMAIL PROTECTED] writes:

John (Apple's LastResort font [contains every Unicode character],
John of course, but by virtue of rampant reuse of glyphs.)

Does this generate glyphs like the following ascii- and utf8-art?

+--+  ┌──┐
|AB|  │AB│
|CD|  │CD│
+--+  └──┘

(Both included for the benefit of the utf8-impaired.)

I find it interesting, if so, that Apple uses a font to achieve that
rather than a bit of code in the rendering libs.  I believe that
pango does it in the lib.
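
For anyone wondering what "a bit of code" might look like, here is a
minimal sketch that draws the same kind of hex box as ASCII art.  It is
not pango's code; the function name hex_box and the two-row layout are
my own assumptions for illustration.

```python
def hex_box(codepoint: int) -> str:
    """Return ASCII art for a fallback glyph showing the code point in hex.

    Hypothetical helper: the hex digits are split into two rows inside a
    box, as in the ASCII art above.
    """
    digits = f"{codepoint:04X}"
    if len(digits) % 2:
        # Pad to an even number of digits so both rows have equal width.
        digits = "0" + digits
    half = len(digits) // 2
    top, bottom = digits[:half], digits[half:]
    border = "+" + "-" * half + "+"
    return "\n".join([border, f"|{top}|", f"|{bottom}|", border])


print(hex_box(0xABCD))
```

A real renderer would draw the digits into a bitmap or outline rather
than text, but the splitting logic would presumably be about the same.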

-JimC




Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread James H. Cloos Jr.
 Curtis == Curtis Clark [EMAIL PROTECTED] writes:

Marco TOILETS --- 50 yds (45.72 m)

Curtis To be precise, it should have said 50.00 yards (or perhaps 46 m).

Actually, 50 only has one significant digit, so that would
in fact round to 50 m after all.
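
The arithmetic can be checked with a small sketch.  The helper name
round_sig is made up for this note; the yard is 0.9144 m by definition.

```python
from math import floor, log10

YARD_IN_METRES = 0.9144  # exact by international definition


def round_sig(value: float, digits: int) -> float:
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    exponent = floor(log10(abs(value)))
    # A negative ndigits rounds to tens, hundreds, and so on.
    return round(value, digits - 1 - exponent)


metres = 50 * YARD_IN_METRES          # about 45.72 m
print(round_sig(metres, 1))           # one significant digit
print(round_sig(metres, 2))           # two significant digits
```

With one significant digit the result is 50 m; with two it is the 46 m
that Curtis suggested.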

-JimC




Re: Handwritten EURO sign (off topic?)

2003-08-14 Thread James H. Cloos Jr.
 Kenneth == Kenneth Whistler [EMAIL PROTECTED] writes:

 terra is not far behind (especially if disk sizes continue to grow).

Kenneth Does that refer to physical disk sizes growing to global
Kenneth scale, or disk contents sufficiently capacious to encompass
Kenneth the entire store of terran information?

Touché.  ʭ  (That is U+02AD for the utf8-impaired.)

Does anyone have a good limerick lambasting typos?

Or a haiku?  ( , yes? )

-JimC






Re: Handwritten EURO sign (off topic?)

2003-08-14 Thread James H. Cloos Jr.
 Anto'nio == Anto'nio Martins-Tuva'lkin [EMAIL PROTECTED] writes:

Anto'nio (Let alone the validity of things
Anto'nio like k€, c€ etc.)

I'm sure things like m€, k€, M€ and even G€ will come into use,
though I expect more will use them in front of the digits.

I certainly use m$, k$ et al, and regularly see others use them.

-JimC




Re: Handwritten EURO sign (off topic?)

2003-08-10 Thread James H. Cloos Jr.
 Stefan == Stefan Persson [EMAIL PROTECTED] writes:

Stefan m€ and m$ would be millieuros and millidollars.  How could
Stefan anyone need anything like that?

On this side of the pond, fuel prices per gallon are quoted in m$;
I presume they quote m$ per Litre in CA, though it has been long
enough that I cannot be sure what I remember about ON stations.

Presorted bulk mail in the states is priced such that the per-item
rates are not integral cents; you can even buy stamps at rates like
14.xxx .  I could see people discussing those using m$ or even $.

However, the specific places *I* used m$ were in micropayment
discussions.

Stefan And why use c$ and c€, wouldn't ¢ be just as good?

You'll note I didn't use centi.

-JimC




Re: [OT?] LCD/LED Keyboard

2003-07-25 Thread James H. Cloos Jr.
As a comment on this, I don't think LCDs are the way to go.

eInk may be more interesting, though it would require a sufficiently
robust and transparent coating.  (Perhaps Al₂O₃¹?)

-JimC

¹ Aluminum oxide, if your MUA is not utf-8 friendly :)





Re: [OT?] LCD/LED Keyboard

2003-07-25 Thread James H. Cloos Jr.
 Pim == Pim Blokland [EMAIL PROTECTED] writes:

Pim I'm not sure if the cost would be too high, because this would
Pim mean manufacturers only need to build one model of keyboard
Pim instead of all those different ones for different countries.

More precisely, it means only one model of keyCAP for each shape.

No screening, sorting, et al.

-JimC




Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread James H. Cloos Jr.
 Werner == Werner LEMBERG [EMAIL PROTECTED] writes:

 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

Werner No.  There must be a kind of `dot' for the i and a kind of
Werner `breve' sign above the u.

Might it have been taught differently in different regions?  My Prof
was from Berlin.  (She and her parents escaped when she was about 5 or
so, back in the 30s.)  The script she learned from her parents left
out the dots and umlauts.  IIRC each word was a single unbroken curve.

-JimC




Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread James H. Cloos Jr.
 Patrick == Patrick Andries [EMAIL PROTECTED] writes:

 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

Patrick Sütterlin ?

Patrick http://terraaqua.de/schrift.htm

Yes, I'd guess that is it, but my Prof's vertical strokes weren't.
(But then, as I hinted in my reply to Werner, she probably learned it
from her parents after they fled to the states, rather than in school.)

-JimC




Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread James H. Cloos Jr.
 John == John Cowan [EMAIL PROTECTED] writes:

John Not just mediaeval text; &c. for etc. (= et cetera) was
John common right through the 19th century if not later.

And picked up steam again online in the 1980s; groups.google.com
should have lots of examples of &c.

-JimC






Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread James H. Cloos Jr.
 Peter == Peter Kirk [EMAIL PROTECTED] writes:

Peter Maybe, but it is hardly realistic to expect all existing
Peter Turkish and Azeri text to be recoded to insert a character in
Peter the middle of each f - i sequence.

But a lot of it already does do that.  In TeX, Turkish uses f{}i to
block the (font's) ligature.  roff does something similar.  I'm
sure all of the other text-source publishing systems do as well.

Even the WYSI(NR)WYG systems must be doing something to accomplish that.
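
To show how mechanical the recoding is, here is a rough sketch that
inserts a ligature blocker after every f that precedes an i.  The
function names are hypothetical, and using U+200C ZERO WIDTH NON-JOINER
for plain text is my assumption about what a non-TeX system might
insert; the f{}i form is the TeX idiom mentioned above.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Match an "f" only when it is immediately followed by "i".
F_BEFORE_I = re.compile(r"f(?=i)")


def block_fi_for_tex(text: str) -> str:
    """Insert an empty TeX group between f and i."""
    return F_BEFORE_I.sub("f{}", text)


def block_fi_for_plain_text(text: str) -> str:
    """Insert ZWNJ between f and i (assumed plain-text equivalent)."""
    return F_BEFORE_I.sub("f" + ZWNJ, text)


print(block_fi_for_tex("fiyat"))
print(repr(block_fi_for_plain_text("fiyat")))
```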

-JimC

NR = Not Really




Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread James H. Cloos Jr.
 Philippe == Philippe Verdy [EMAIL PROTECTED] writes:

Philippe But if one wants to restore the previous visual behavior,
Philippe even if it's incorrect for languages using this digraph as a
Philippe letter, what would be the behavior of using the following
Philippe sequence: ij, combining dot above, combining acute
Philippe (i.e. should this display 1 or 2 dots?)

Seems clear to me that if ij has soft dots (and I agree it should)
then to get a pair of dots via a combining accent one should use a
two dot combining accent:  U+0308 COMBINING DIAERESIS.

So if you want two dots and an acute use ij, U+0308, U+0301.
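
For reference, a short sketch that spells out that sequence and lists
what each code point is.  I am assuming U+0133 LATIN SMALL LIGATURE IJ
for the base here; the same loop works with plain i followed by j.

```python
import unicodedata

# Assumption: U+0133 stands in for "ij"; then the two combining marks.
SEQUENCE = "\u0133\u0308\u0301"


def describe(text: str):
    """Return (code point, name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


for row in describe(SEQUENCE):
    print(*row)
```

Both marks have the same combining class, so their order in the string
is significant: the diaeresis comes first and the acute stacks on it.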

Of course a given font's diaeresis will often not line up with the
stems of its ij, and a custom one should be used instead.  Or
features and/or ligs as appropriate to the font technology could
just use the ij glyph w/ an extra acute.  Either way it is a glyph
issue rather than a character issue.

But it really seems to be just an academic issue, yes?

-JimC




Re: Classification of Alphabetic characters (was: Hiragana/Katakana sound marks)

2003-06-06 Thread James H. Cloos Jr.
 Philippe == Philippe Verdy [EMAIL PROTECTED] writes:

Philippe Another interesting case is the usage of the apostrophe in
Philippe (modern) Breton, where the official alphabet considers the
Philippe sequence c'h as a single letter, although it is written with
Philippe 3 Unicode characters, one of which is not a letter...

If one wanted to ensure that all of the characters in c'h were letters,
it would probably make sense to not use U+0027 APOSTROPHE ('), but rather
U+02BC MODIFIER LETTER APOSTROPHE (ʼ).  It is Lm like U+30FC (ー) (which
started this thread, for those starting in the middle).
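
The general categories are easy to check.  A small sketch using
Python's unicodedata module (the helper name all_letters is made up):

```python
import unicodedata

for ch in ("\u0027", "\u02bc", "\u30fc"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


def all_letters(text: str) -> bool:
    """True if every character has a letter category (L*)."""
    return all(unicodedata.category(ch).startswith("L") for ch in text)


print(all_letters("c'h"))        # spelled with U+0027
print(all_letters("c\u02bch"))   # spelled with U+02BC
```

Only the U+02BC spelling passes the all-letters test, which is the
point of the suggestion above.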

Perhaps a Unicode-savvy input method would automate that.  Although
automatically differentiating between ʼ and ’ is a bit of a challenge.

-JimC




Re: Detecting UTF-8 Locale Question

2003-03-25 Thread James H. Cloos Jr.
 Edward == Edward H Trager [EMAIL PROTECTED] writes:

Edward (1) Is examination of the LC_CTYPE environment variable on
Edward UNIX-like environments a sufficient way of detecting locale?

If you are going to look at the env yourself, rather than using the
relevant libraries, you must look at more than just LC_CTYPE.

The progression is:

LC_ALL overrides LC_CTYPE overrides LANG

So, first check LC_ALL; if that is not set then check LC_CTYPE; if
that is also not set check LANG; if that is not set the locale is C.

That should match the behavior of POSIX compliant libraries.
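
If you do want to do it by hand, the lookup might be sketched like
this.  The function names are hypothetical, and I am assuming that a
variable set to the empty string counts as not set, and that a codeset
of UTF-8 or utf8 after the dot is what you are looking for.

```python
import os


def effective_ctype_locale(env=None) -> str:
    """Return the locale name that governs LC_CTYPE.

    Precedence: LC_ALL, then LC_CTYPE, then LANG, else "C".
    """
    if env is None:
        env = os.environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # assumption: empty string is treated as unset
            return value
    return "C"


def looks_like_utf8(locale_name: str) -> bool:
    """Check the codeset part of language_TERRITORY.codeset@modifier."""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


print(looks_like_utf8(effective_ctype_locale()))
```

Using the libraries is still the safer route; for example,
nl_langinfo(CODESET) after setlocale() reports the codeset directly.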

-JimC




Re: Public Review Issues update

2003-02-04 Thread James H. Cloos Jr.
 Asmus == Asmus Freytag [EMAIL PROTECTED] writes:

Asmus The new character code charts are available for review at
Asmus http://www.unicode.org/charts/u40-beta.html

The pdfs referenced there are all 404.

(They all match the regex http://www.unicode.org/charts/U40-[0-9A-F]+\.pdf )
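
For anyone who wants to pull those links out of the page, a minimal
sketch of that regex in use.  The sample URLs are made up for the
test, and I have escaped the dots in the host name, which the pattern
as written above leaves loose.

```python
import re

CHART_PDF = re.compile(r"http://www\.unicode\.org/charts/U40-[0-9A-F]+\.pdf")


def find_chart_pdfs(html: str):
    """Return every chart PDF URL found in the given text."""
    return CHART_PDF.findall(html)


# Hypothetical sample input, not copied from the real page.
sample = (
    '<a href="http://www.unicode.org/charts/U40-0000.pdf">Basic Latin</a> '
    '<a href="http://www.unicode.org/charts/u40-beta.html">index</a>'
)
print(find_chart_pdfs(sample))
```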

-JimC





Re: browsers and unicode surrogates

2002-04-22 Thread James H. Cloos Jr.

 Tex == Tex Texin [EMAIL PROTECTED] writes:

Tex I am surprised by the "must only be used".  It seems I am not
Tex conforming by including a meta statement in the utf-16 HTML
Tex page. I should either remove the statement or encode the HTML up
Tex to and including that statement as ascii. I'll check on this.

Since you are using apache, it is quite easy to get the extra headers
sent at the protocol level rather than having to use meta tags.

You can use a Header directive in an .htaccess file a la:

<Files foobar.html>
  Header set Content-Language "en-US"
  Header set Content-Type "text/html; charset=UTF-8"
</Files>

Or, you can use mod_cern_meta to put the extra headers in a
foo.html.meta file.  (The actual filename suffix can be set in the
.htaccess file or the main server conf files.)

There are other ways as well.  Apache will already (if you use the
default configs) add the Content-Language header if you use a filename
like foo.en.html.  You could have it also add the charset via a
similar mechanism.  Something like:

AddCharset UTF-8 utf8

will make foobar.en.utf8.html send the headers:

Content-Language: en
Content-Type: text/html; charset=UTF-8

given the default configs for language and type extensions.
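
As a rough model of what is going on (not Apache's code, just a toy
sketch), each extension in the filename is looked up independently and
contributes its own piece of metadata.  The three tables below are
hypothetical stand-ins for the AddLanguage, AddCharset and AddType
lines in a config.

```python
# Hypothetical tables standing in for AddLanguage / AddCharset / AddType.
LANGUAGES = {"en": "en"}
CHARSETS = {"utf8": "UTF-8"}
TYPES = {"html": "text/html"}


def headers_for(filename: str) -> dict:
    """Derive response headers from the extensions of a filename."""
    language = charset = media_type = None
    # Everything after the first dot is treated as a list of extensions.
    for ext in filename.split(".")[1:]:
        language = LANGUAGES.get(ext, language)
        charset = CHARSETS.get(ext, charset)
        media_type = TYPES.get(ext, media_type)
    headers = {}
    if language:
        headers["Content-Language"] = language
    if media_type:
        suffix = f"; charset={charset}" if charset else ""
        headers["Content-Type"] = media_type + suffix
    return headers


print(headers_for("foobar.en.utf8.html"))
```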

Hmmm.  Looking at a recent install of SuSE, using their apache rpm,
.utf8 is already configured as an extension to set charset=UTF-8, so
you could try just renaming the file to e.g.:

http://www.i18nguy.com/unicode-plane1.utf8.html

to set the charset.  You'd have to add your own AddCharset directives
for UTF-16 and UTF-32.

-JimC





PS names for glyphs corresponding to non-BMP chars

2002-04-19 Thread James H. Cloos Jr.

The latest docs I've seen indicate four hex chars in the uni names
for glyphs corresponding to BMP chars.  What should be done for glyphs
corresponding to characters in the supplementary planes?

Will using five or six hex chars break any software¹ out in the wild?

Also, would setting up lig pairs such as:

C xxx ; WX 0 ; N uniD835 ; B 0 0 0 0 ; L uniDC00 uni1D400 ; L uniDC01 uni1D401 ...

(to use afm syntax) be reasonable to try to kludge surrogate support
into software that doesn't grok them natively?²

-JimC

¹ Any /useful/ software, anyway. :)

² could the whole set of U+D835 U+Dxxx pairs even work in an afm?
  I don't see anything in technote 5004 about line-length limitations
  or support for a continuation escape such as (in C) \\\n.

P.S.  For the purpose of this note please presume I got the
  utf-16 surrogate pairs correct for those plane 1 characters
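
For checking such pairs, here is a minimal sketch of the UTF-16
surrogate arithmetic, plus a helper that builds an afm-style line of
the kind shown above.  The function names, the "C -1" (unencoded)
field and the uniXXXXX naming for the plane 1 glyphs are assumptions
for illustration only.

```python
def surrogate_pair(codepoint: int):
    """Return the (high, low) UTF-16 surrogates for a supplementary char."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def afm_lig_line(high: int, codepoints) -> str:
    """Build one hypothetical afm line keyed on a high surrogate."""
    ligs = " ; ".join(
        f"L uni{surrogate_pair(cp)[1]:04X} uni{cp:04X}" for cp in codepoints
    )
    return f"C -1 ; WX 0 ; N uni{high:04X} ; B 0 0 0 0 ; {ligs} ;"


high, low = surrogate_pair(0x1D400)
print(f"{high:04X} {low:04X}")
print(afm_lig_line(high, [0x1D400, 0x1D401]))
```

Each high surrogate can be followed by up to 1024 different low
surrogates, so a fully populated line would carry 1024 L entries.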





Re: Concerning proposals

2002-04-11 Thread James H. Cloos Jr.

 Stefan == Stefan Persson [EMAIL PROTECTED] writes:

Stefan Is there some free font program out there that can be used for
Stefan this purpose?

There is pfaedit at:

http://pfaedit.sf.net/

and for bdf bitmap fonts xmbdfed at:

http://crl.nmsu.edu/~mleisher/xmbdfed.html

Pfaedit's ttf/otf instructing is not yet up to par, but it works well
for type1 fonts.  

Of course, DEK's metafont is excellent as well.  It should come with
any TeX distribution.

Finally, there are any number of such programs for doze or mac.

-JimC