Re: Are these characters encoded?

2001-12-02 Thread John Hudson

At 21:33 12/1/2001, Asmus Freytag wrote:

>If the character can be shown to have as much justification for existence
>as coded character as similar characters in the standard, i.e. if it's
>ever used in printed handwriting, etc., etc., then we will have a tough
>time coming up with a unification that's not (far) worse than just adding
>it by itself.

Indeed. If it is not suitable to treat the och sign as a variant form of 
the ampersand, it would be better to give it its own codepoint rather than 
try to unify it with some other character(s) that would require more 
convoluted rendering.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: Are these characters encoded?

2001-12-02 Thread Asmus Freytag

At 05:29 PM 12/1/01 -0600, David Starner wrote:
> > It is certainly not a glyph variant of an ampersand. An ampersand is
> > a ligature of e and t. This is certainly an abbreviation of och. That
> > both mean "and" is NOT a reason for unifying different signs.
>
>But the fact that they never appear in the same text in the same font,
>and that one appears in handwritten text in the same places as the
>ampersand appears in machine written text means that it is a glyph
>variant. In any case, if it never appears in machine-written text, (if
>there's no font, as you point out for proposed ConScript additions),
>then there's no need to encode it.

Signs for faithful renderings of manuscript are - at least at the moment -
somewhat outside the scope of Unicode. Having said that, an exception for
current practice can certainly be considered, as instances of type-set
"handwriting" are not generally uncommon, even if we can't lay our hands on
them on demand. So, on this aspect of the character alone I would not like
to make a ruling one way or another, but getting a printed 'och' would
certainly make the counterargument moot.

I wish that Unicode encoding principles were as easy as "If entity A only
occurs in one context and entity B occurs only in another, they can be
unified". Well, taking this argument to extreme, we could unify a lot of
unrelated things. Unicode might have fit in 64K after all. ;-)

Michael's argument that "and" (Sw. 'och') and "et" are different words and
need to be distinguished on that score alone is interesting, because
semantics and usage are so close. For letters we have long held that if it
is the same letter, we don't disunify it across languages. Why this
necessarily breaks down for abbreviation of a near universal word as 'and',
is not necessarily clear.

However, the Swedish case is really that the handwriting uses o-underbar
*NOT* in place of the ampersand, but in places where the typeset text
presumably would have the word 'och' spelled out. In fact, I would guess
that a handwritten text referring to a company name, for example
Rabén&Sjögren might use the & and not the o-underbar in Swedish. I don't
know this for sure, but I strongly suspect that such differentiation of
usage exists that would make it awkward to convert printed handwriting
into printed text by a pure font change.

Overloading the existing 00BA º is tempting, but would likely result in
incorrect output unless special purpose (read private use) fonts are used,
or unless it became common to have Swedish glyph overrides in fonts and
rendering engines that applied them. Since the usage and typographic
convention for 'och' and the raised o for numbering are not related, this
unification smells more of shoehorning than encoding.

(BTW it's not B0 as someone noted; U+00B0 is the degree sign.)

The strongest surviving candidate is the composed sequence U+006F U+0332,
but 0332 is an underscore, and not something that sits on-line. Again,
it would take special-purpose or specifically Swedish aware fonts and/
or rendering engines that support them to get the right result. That would
argue against this particular unification - even though it would be quite
acceptable for rough plain text usage.
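For readers who want to see how software treats the composed sequence, here is a minimal sketch using only Python's standard unicodedata module. The helper name och_sequence is made up for illustration and is not part of any proposal. The sketch only shows that the sequence has no precomposed form; it says nothing about how a given font will draw it.

```python
import unicodedata

# Hypothetical helper: the 'och' abbreviation spelled as
# LATIN SMALL LETTER O followed by COMBINING LOW LINE.
def och_sequence() -> str:
    return "\u006F\u0332"

seq = och_sequence()
names = [unicodedata.name(ch) for ch in seq]

# There is no precomposed "o with low line", so NFC leaves the
# two code points exactly as they are.
nfc = unicodedata.normalize("NFC", seq)

print(names)
print([f"U+{ord(ch):04X}" for ch in nfc])
```

Whether the low line sits where a Swedish reader expects is still a matter for the font and the rendering engine, as noted above.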

If the character can be shown to have as much justification for existence
as coded character as similar characters in the standard, i.e. if it's
ever used in printed handwriting, etc., etc., then we will have a tough
time coming up with a unification that's not (far) worse than just adding
it by itself.

A./






Re: Are these characters encoded?

2001-12-02 Thread juuichiketajin

Perhaps they should be. I wonder: When transcribing a foreign name (like a business 
name) that includes the ampersand, would a Swede use the "och" sign?
I can't answer that.

In other words, does there exist a case where the ampersand and the "och" sign are not 
interchangeable?


-Original Message-
From: John Hudson <[EMAIL PROTECTED]>
Date: Sun, 02 Dec 2001 16:33:04 -0800
To: [EMAIL PROTECTED]
Subject: Re: Are these characters encoded?


> At 15:16 12/2/2001, [EMAIL PROTECTED] wrote:
> 
> >Then why not unify DIGIT THREE with HAN DIGIT THREE?
> 
> I don't know enough about the Han encoding to answer that. Because they are 
> distinguished in existing character sets? Because someone has a need to 
> distinguish them in plain text?
> 
> I'm not saying that the Swedish och sign should automatically be unified 
> with the ampersand. I'm simply pointing out that, as described to date on 
> this list, it is not clear that this sign needs to be separately encoded. 
> We know that it can be treated as a language-specific glyph variant because 
> Swedish readers apparently accept both forms to mean exactly the same 
> thing. Whether such treatment is sufficient depends on whether there is 
> also need to distinguish the two forms, and to do so in plain text. I think 
> Michael Everson made a strong case for separate encoding of the Tironian et 
> sign, and I think a similarly strong case would need to be made for 
> separately encoding the Swedish och sign.
> 
> I'm perfectly happy to include the och sign in my fonts, whether it is 
> encoded or not, and to provide mechanisms to access the glyph. At the 
> moment, though, I don't think it is clear whether it is best for this sign 
> to be encoded or not. What might be the impact on Swedish keyboard drivers? 
> Is the intention that a new och sign character should replace the ampersand 
> character in Swedish text processing, or should both be used? What is the 
> impact on existing documents?
> 




Re: Are these characters encoded?

2001-12-02 Thread John Hudson

At 15:16 12/2/2001, [EMAIL PROTECTED] wrote:

>Then why not unify DIGIT THREE with HAN DIGIT THREE?

I don't know enough about the Han encoding to answer that. Because they are 
distinguished in existing character sets? Because someone has a need to 
distinguish them in plain text?

I'm not saying that the Swedish och sign should automatically be unified 
with the ampersand. I'm simply pointing out that, as described to date on 
this list, it is not clear that this sign needs to be separately encoded. 
We know that it can be treated as a language-specific glyph variant because 
Swedish readers apparently accept both forms to mean exactly the same 
thing. Whether such treatment is sufficient depends on whether there is 
also need to distinguish the two forms, and to do so in plain text. I think 
Michael Everson made a strong case for separate encoding of the Tironian et 
sign, and I think a similarly strong case would need to be made for 
separately encoding the Swedish och sign.

I'm perfectly happy to include the och sign in my fonts, whether it is 
encoded or not, and to provide mechanisms to access the glyph. At the 
moment, though, I don't think it is clear whether it is best for this sign 
to be encoded or not. What might be the impact on Swedish keyboard drivers? 
Is the intention that a new och sign character should replace the ampersand 
character in Swedish text processing, or should both be used? What is the 
impact on existing documents?

John Hudson






Re: Are these characters encoded?

2001-12-02 Thread DougEwell2

In a message dated 2001-12-02 11:00:32 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

> "o." and "o-with-underscore" are NOT glyph variants of a ligature of 
> e and t (at a character level), no matter what they mean.

I suggested that Stefan's o-underscore "and" might OR might not be a 
variation of the ampersand, in all its many existing glyph variants.

The "glyph variant" side is bolstered by the argument that it's a symbol, 
just like &, used to mean "and" without any translation necessarily taking 
place; that it's only used in Swedish; and that users consider it equivalent 
to & and use different forms depending on whether the text is handwritten or 
typed.

The "separate character" side can point to the fact that its derivation is 
completely different from that of &; that it looks nothing like any of the 
existing forms of & (like TIRONIAN SIGN ET); and that it's only used in 
Swedish (cf. GREEK QUESTION MARK).

I don't think there is one obvious answer to this.  I will say this, however: 
The majority of posts stating that some character or other is "not in 
Unicode" turn out to be bogus; the proposed character is really a glyph 
variant or presentation form.  Stefan's original post had the following three 
points:

1.  Swedish "o-underscore" -- maybe, maybe not
2.  Fraction slash -- already encoded
3.  Roman numerals -- overextension of compatibility forms; rendering issue

When two of three proposals can be quickly blown off, it is human nature that 
sometimes it is difficult to see the potential virtue in the third.

I also want to say that, although Michael is of course correct that & was 
originally a ligature of e and t, many, many of the & glyphs seen today do 
not even remotely resemble such a ligature.  Consider the top three glyphs in 
the attached GIF (only 290 bytes).  The first is obviously still an e-t 
ligature, the second is one with centuries of typographical evolution applied 
to it (and today more closely resembles a treble clef), the third is not at 
all.  If traceability to the original Latin "et" were what made these 
characters the same or different, then that might have spoken against the 
separate encoding of TIRONIAN SIGN ET.

I never think of & as meaning "et," even the glyph variants that do look like 
an e-t ligature.  I assume that practically all users of this symbol treat it 
as a logograph meaning "and" in the language of the surrounding text.  (I 
have, rarely, seen & used in Spanish text, which strikes me as funny since 
the Spanish words for "and" ("y" and "e") would not seem to need 
abbreviating.)

So the question might be posed, do Swedish users think of o-underscore as a 
logograph meaning "och" or as an abbreviation for the spelled-out word "och"?

In a message dated 2001-12-02 9:23:51 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

>>> Having said that, it seems to me that U+00B0 would represent Stefan's
>>> character easily enough.
>>
>> No. It's not a degree sign.  Nor is 00BA appropriate: the underlined o is
>> not superscripted/raised (much, if at all).
>
> Sorry, I did mean U+00BA, and subscription or superscription of the 
> glyph in that character is a matter of glyph choice.

I think, though, that use of U+00BA MASCULINE ORDINAL INDICATOR would be a 
classic example of hijacking a character for an unintended and inappropriate 
purpose simply because its glyph looks "close enough."  This would be like 
using U+003B at the end of a Greek question.  I stick to my original 
suggestion of U+006F U+0332, crossing my fingers that rendering engines will 
handle this correctly.
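The semicolon comparison can be made concrete with normalization. The sketch below uses only Python's standard unicodedata module; the constant and function names are invented for the example. It shows that U+037E GREEK QUESTION MARK is canonically equivalent to U+003B, while U+00BA has only a compatibility mapping to a plain "o".

```python
import unicodedata

ORDINAL = "\u00BA"         # MASCULINE ORDINAL INDICATOR
GREEK_QUESTION = "\u037E"  # GREEK QUESTION MARK

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)

# U+037E is canonically equivalent to U+003B SEMICOLON, so even NFC
# replaces it with the semicolon.
print(nfc(GREEK_QUESTION) == ";")

# U+00BA survives NFC but folds to a plain "o" under NFKC, which is
# one more way text that overloads it could lose its meaning.
print(nfc(ORDINAL) == ORDINAL, nfkc(ORDINAL) == "o")
```

This is only an observation about the existing character properties, not an argument for or against any of the encodings discussed here.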

-Doug Ewell
 Fullerton, California




Re: Are these characters encoded?

2001-12-02 Thread juuichiketajin

Then why not unify DIGIT THREE with HAN DIGIT THREE?


-Original Message-
From: John Hudson <[EMAIL PROTECTED]>
Date: Sun, 02 Dec 2001 10:05:36 -0800
To: Michael Everson <[EMAIL PROTECTED]>
Subject: Re: Are these characters encoded?


> At 14:14 12/1/2001, Michael Everson wrote:
> 
> >It is certainly not a glyph variant of an ampersand. An ampersand is a 
> >ligature of e and t. This is certainly an abbreviation of och. That both 
> >mean "and" is NOT a reason for unifying different signs.
> 
> The fact that & is accepted by Swedish readers as a substitute for the 
> 'och' sign, and that the latter seems to be limited to manuscript, suggests 
> a glyph variant. I do not consider the fact that both mean 'and' to be a 
> reason for unifying different signs. I ponder whether two different signs 
> that are apparently used *interchangeably* might be unified?
> 




Re: Are these characters encoded?

2001-12-02 Thread Michael Everson

At 10:05 -0800 2001-12-02, John Hudson wrote:
>At 14:14 12/1/2001, Michael Everson wrote:
>
>>It is certainly not a glyph variant of an ampersand. An ampersand 
>>is a ligature of e and t. This is certainly an abbreviation of och. 
>>That both mean "and" is NOT a reason for unifying different signs.
>
>The fact that & is accepted by Swedish readers as a substitute for 
>the 'och' sign, and that the latter seems to be limited to 
>manuscript, suggests a glyph variant. I do not consider the fact 
>that both mean 'and' to be a reason for unifying different signs. I 
>ponder whether two different signs that are apparently used 
>*interchangeably* might be unified?

Um, I accept "etc." and "&c." and "7c." (the last with a Tironian et, 
admittedly peculiar to most readers of English) as "meaning" the same 
thing but that doesn't mean that & and 7 are the same character. They 
have different origins which are well known. You don't unify that 
kind of thing.

In Irish many people accept "srl" and "&rl" and "7rl" as meaning the 
same thing as well. The form with the actual & is considered peculiar.

"o." and "o-with-underscore" are NOT glyph variants of a ligature of 
e and t (at a character level), no matter what they mean.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Writing/finding a UTF8, UTF16, UTF32 converter

2001-12-02 Thread Rick McGowan

There is code for doing UTF8/16/32 conversions:

ftp://www.unicode.org/Public/PROGRAMS/CVTUTF

Rick





Re: Are these characters encoded?

2001-12-02 Thread John Hudson

At 14:14 12/1/2001, Michael Everson wrote:

>It is certainly not a glyph variant of an ampersand. An ampersand is a 
>ligature of e and t. This is certainly an abbreviation of och. That both 
>mean "and" is NOT a reason for unifying different signs.

The fact that & is accepted by Swedish readers as a substitute for the 
'och' sign, and that the latter seems to be limited to manuscript, suggests 
a glyph variant. I do not consider the fact that both mean 'and' to be a 
reason for unifying different signs. I ponder whether two different signs 
that are apparently used *interchangeably* might be unified?

John Hudson






Re: C with bar for "with"

2001-12-02 Thread Wm Seán Glen



The lower case 'c' with either an overscore or an underscore is used in
medical terminology. It means "with" and comes from the Latin "cum".
The English version is lower case 'w' with a solidus: "w/".
Seán


Re: Are these characters encoded?

2001-12-02 Thread John Hudson

At 06:17 12/2/2001, Stefan Persson wrote:

>Well, this character is *only* used in Swedish, while & is used in most
>(all?) languages using Roman letters, so it has a partially different usage!
>Using this character in, for example, an English text would be *wrong*!

Which is why I went on to suggest that the Swedish manuscript ampersand 
form (the 'och' abbreviation) might be substituted 'in Swedish text'. The 
OpenType glyph substitution model, for example, associates lookups with 
particular script and language system combinations, so it is possible to 
have something like this:

    Latin
        Swedish
            Stylistic Alternates
                ampersand -> ampersand.swe

This substitution would only be applied in Swedish text. Now, this 
particular aspect of OpenType is not well supported yet, but it is a viable 
mechanism for the kind of substitution that the 'och' glyph requires.
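As a rough illustration of the idea, here is a toy model in Python of a substitution keyed by script, language system and feature. It is not how a font or rendering engine is actually implemented: real lookups live in a font's GSUB table and are applied by the layout engine. The glyph name ampersand.swe comes from the example above; the tags 'latn', 'SVE ' and 'salt' are assumed to be the relevant OpenType tags, and the function and table names are hypothetical.

```python
# Toy model only: maps (script, language system, feature) to a
# glyph-to-glyph substitution table.
SUBSTITUTIONS = {
    ("latn", "SVE ", "salt"): {"ampersand": "ampersand.swe"},
}

def substitute(glyphs, script, language, feature):
    """Apply the lookup registered for this combination, if any."""
    table = SUBSTITUTIONS.get((script, language, feature), {})
    return [table.get(g, g) for g in glyphs]

run = ["R", "ampersand", "S"]
swedish = substitute(run, "latn", "SVE ", "salt")
english = substitute(run, "latn", "ENG ", "salt")

print(swedish)  # ampersand replaced only for the Swedish language system
print(english)  # unchanged
```

The point of the sketch is that the underlying character stays U+0026 in both cases; only the glyph chosen for display differs.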

Please note that I am not saying that the 'och' should not be encoded, only 
that there may well be good reasons to consider this form as a glyph 
variant and existing technologies for dealing with it as such. In order to 
make a case for encoding the 'och' ampersand, I think you will need to 
demonstrate a need to distinguish it from the regular ampersand in plain 
text documents.

John Hudson






RE: Are these characters encoded?

2001-12-02 Thread Michael Everson

At 17:12 +0100 2001-12-02, Kent Karlsson wrote:

>Similarly, COMBINING OVERLINE and COMBINING LOW LINE
>should be used, together with ordinary I, V etc. (when possible)
>to get "lined" roman numerals.

What? Surely this is a font matter, and using combining characters is a 
hack here. In Quark one might just draw a line and align it with the 
font.

>  > It is certainly not a glyph variant of an ampersand. An ampersand is
>  > a ligature of e and t.
>
>True (both). ("ampersand" is somewhat of a misnomer.)

It derives from "and per se and", apparently.

>  > This is certainly an abbreviation of och. That
>  > both mean "and" is NOT a reason for unifying different signs.
>  >
>  > Having said that, it seems to me that U+00B0 would represent Stefan's
>  > character easily enough.
>
>No. It's not a degree sign.  Nor is 00BA appropriate: the underlined o is
>not superscripted/raised (much, if at all).

Sorry, I did mean U+00BA, and subscription or superscription of the 
glyph in that character is a matter of glyph choice.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




RE: Are these characters encoded?

2001-12-02 Thread Kent Karlsson


> >>  1.) Swedish ampersand (see "&.bmp"). It's an "o" (for "och", i.e. "and")
> >>  with a line below. In handwritten text it is almost always used instead of
> >>  &, in machine-written text I don't think I've ever seen it.
> >
> >This might be a character in its own right, as different from the ampersand
> >as U+204A TIRONIAN SIGN ET.  Or it might be simply a glyph variant of the
> >ampersand.

No.

> >If you have never seen o-underbar in machine-written text, I
> >doubt that this will help your cause much.  You might try U+006F U+0332,
Yes. (But some write "o.", esp. in the rare event this is typed.)

Similarly, COMBINING OVERLINE and COMBINING LOW LINE
should be used, together with ordinary I, V etc. (when possible)
to get "lined" roman numerals.
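A minimal sketch of what that would look like in plain text, assuming Python and its standard unicodedata module; the function name lined is invented for the example. Each base letter is followed by COMBINING LOW LINE and then COMBINING OVERLINE.

```python
import unicodedata

OVERLINE = "\u0305"  # COMBINING OVERLINE
LOW_LINE = "\u0332"  # COMBINING LOW LINE

def lined(numeral: str) -> str:
    """Follow each base letter with a low line, then an overline."""
    return "".join(ch + LOW_LINE + OVERLINE for ch in numeral)

text = lined("XIV")
print([f"U+{ord(ch):04X}" for ch in text])

# The marks are emitted below-then-above, which is already canonical
# order, so normalization does not reorder them.
print(unicodedata.normalize("NFC", text) == text)
```

Whether the lines of adjacent letters join up into continuous bars depends entirely on the font and renderer.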

> >though this will probably not give you the vertical spacing you expect.
> 
> It is certainly not a glyph variant of an ampersand. An ampersand is 
> a ligature of e and t. 

True (both). ("ampersand" is somewhat of a misnomer.)

> This is certainly an abbreviation of och. That 
> both mean "and" is NOT a reason for unifying different signs.
> 
> Having said that, it seems to me that U+00B0 would represent Stefan's 
> character easily enough.

No. It's not a degree sign.  Nor is 00BA appropriate: the underlined o is
not superscripted/raised (much, if at all).

Kind regards
/kent k





Re: Are these characters encoded?

2001-12-02 Thread Michael Everson

Stefan, can you do up a web page or PDF file with samples of the 
"och" abbreviation in different manuscripts and in print? Or is it 
never found in print?
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Are these characters encoded?

2001-12-02 Thread Stefan Persson

- Original Message -
From: "John Hudson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: den 1 december 2001 21:01
Subject: Re: Are these characters encoded?


> >1.) Swedish ampersand (see "&.bmp"). It's an "o" (for "och", i.e. "and")
> >with a line below. In handwritten text it is almost always used instead of
> >&, in machine-written text I don't think I've ever seen it.
>
> This is, as your analysis suggests, a glyph variant, not a distinct
> character.

Well, this character is *only* used in Swedish, while & is used in most
(all?) languages using Roman letters, so it has a partially different usage!
Using this character in, for example, an English text would be *wrong*! Or
is "α" a glyph variant of "a" and "あ"? Or even better, what about "A" and
"Α"?

Stefan







Sorry...

2001-12-02 Thread Stefan Persson

It seems that I did something wrong when sending my previous mail, so that
it was sent in multiple copies. Sorry for the inconvenience.

Stefan








RE: C with bar for "with"

2001-12-02 Thread Yves Arrouye

It may even be a glyph variant of the w with forward slash...
YA

> -Original Message-
> From: Stefan Persson [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, December 02, 2001 3:19 AM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: C with bar for "with"
> 
> - Original Message -
> From: <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: den 2 december 2001 02:16
> Subject: C with bar for "with"
> 
> 
> > Someone said that in English, c-with-underbar means "with". My mom writes
> > this as c-with-overline.
> 
> Well, then I suppose this is a glyph variant of the c with underbar...
> 
> Stefan
> 
> 





Writing/finding a UTF8, UTF16, UTF32 converter

2001-12-02 Thread Theo

Hi UniCode list,

I am dealing with unicode for XML. I'm sorry if this bothers a few
people, but reading the technical information is not very easy. The
crossings out and underlinings don't help, the information seems a bit
scattered, and the usually interesting information is not linked to in
easy to find places.

I think I have finally found what I wanted, the table:

"Table 3.1. UTF-8 Bit Distribution"

on 

Basically, I want to write some code that can convert UTF8, UTF16, and
UTF32 to any of the other two formats. I suppose I could use UTF32 as a
go-between to reduce the conversion possibilities.
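One way to read the go-between idea is to decode everything to a list of code points (which is what UTF-32 amounts to) and encode from there. The sketch below is in Python for brevity and all function names are hypothetical. It follows the UTF-8 bit distribution and the UTF-16 surrogate arithmetic, but does no validation of lone surrogates or out-of-range values, so it is a starting point rather than a conformant converter.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point following the UTF-8 bit distribution."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf16_encode(cp: int) -> list:
    """Encode one code point as one or two 16-bit code units."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

def utf16_decode(units: list) -> list:
    """Decode 16-bit code units to code points (the UTF-32 pivot)."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        is_pair = (0xD800 <= u < 0xDC00 and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] < 0xE000)
        if is_pair:
            out.append(0x10000 + ((u - 0xD800) << 10)
                       + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # lone surrogates pass through unchecked here
            i += 1
    return out

def utf16_to_utf8(units: list) -> bytes:
    """UTF-16 -> code points -> UTF-8."""
    return b"".join(utf8_encode(cp) for cp in utf16_decode(units))

print(utf16_to_utf8([0x006F, 0x0332]).hex())
```

With the pivot in place, each format needs only one encoder and one decoder instead of a separate routine for every pair of formats.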

Anyhow, does anyone know of any existing source code that does this
transformation?

I don't feel like using Apple's UniCode converter because it seems so
complex it will probably take MORE work for me to access it, than just
write the conversion code myself. And even then I hear it doesn't do
UTF32, so there is no use. And even then I have to compile my code for
Win32 also, so it's even more no use.

If anyone knows of some existing code that does the transformation,
that would help. I might end up re-writing it myself and just use the
code as a working example.

All that bitshifting and bitmasking and such should slow down my UTF8/UTF16
processing, is there any accepted good way to speed this up? Some form
of table perhaps?
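One table-based approach is a 256-entry array that gives the sequence length for each possible lead byte, so the decoder does a single lookup instead of a chain of comparisons. The sketch below is in Python and the names are invented for the example. Whether this is actually faster than shifting and masking alone depends on the language, compiler and data, and has not been measured here. It also does not reject every ill-formed sequence (for instance some overlong three-byte forms), so treat it as an outline.

```python
def _length_for(byte: int) -> int:
    """Sequence length implied by a lead byte; 0 if it cannot lead."""
    if byte < 0x80:
        return 1
    if 0xC2 <= byte <= 0xDF:
        return 2
    if 0xE0 <= byte <= 0xEF:
        return 3
    if 0xF0 <= byte <= 0xF4:
        return 4
    return 0

# Built once; indexed by lead byte during decoding.
UTF8_LENGTH = [_length_for(b) for b in range(256)]
# Mask selecting the payload bits of the lead byte, indexed by length.
LEAD_MASK = [0x00, 0x7F, 0x1F, 0x0F, 0x07]

def utf8_decode(data: bytes) -> list:
    """Decode UTF-8 bytes to a list of code points."""
    out, i = [], 0
    while i < len(data):
        n = UTF8_LENGTH[data[i]]
        if n == 0 or i + n > len(data):
            raise ValueError(f"bad lead byte or truncated input at {i}")
        cp = data[i] & LEAD_MASK[n]
        for b in data[i + 1:i + n]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte near {i}")
            cp = (cp << 6) | (b & 0x3F)
        out.append(cp)
        i += n
    return out

print([f"U+{cp:04X}" for cp in utf8_decode("o\u0332".encode("utf-8"))])
```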

--
This email was probably cleaned with Email Cleaner, by:
Theodore H. Smith - Macintosh Consultant / Contractor.
My website: 





Re: C with bar for "with"

2001-12-02 Thread Stefan Persson

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: den 2 december 2001 02:16
Subject: C with bar for "with"


> Someone said that in English, c-with-underbar means "with". My mom writes
> this as c-with-overline.

Well, then I suppose this is a glyph variant of the c with underbar…

Stefan







Suggestions for next print edition

2001-12-02 Thread juuichiketajin

1. Unicode points are NUMBERS. Numbers can be written in ANY base. Knowing decimal 
values of codepoints is sometimes useful, so please print them in the next edition of 
the Unicode book.
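Until then, the conversion is easy to do by hand or in a couple of lines of code. A small sketch in Python, with a made-up function name, using U+4E71 as the example:

```python
def describe(cp: int) -> str:
    """Show one code point in hexadecimal, decimal and octal."""
    return f"U+{cp:04X} = {cp} (decimal) = {cp:o} (octal)"

# Parse the usual hexadecimal notation, then print it in other bases.
ran = int("4E71", 16)
print(describe(ran))
print(describe(0x00BA))
```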

2. There was a Shift-JIS index for kanji. I don't know much about kanji, but it seems 
to me that they are arranged in a-i-u-e-o order of on'yomi. Why not print little 
hiragana letters at the top to aid people searching for a kanji?

Remember how I could not find the "ran" of "randamu" before? Let's see this time... 
Aha! There it is!
I know it was somewhere between "mo(kuyoubi)" and "(fu)ro". Better than stroke / 
radical, I wonder?
* Disclaimer: From what I hear, the Japanese do NOT write "randamu" as U+4E71 U+3060 
U+3080. They use U+30E9 U+30F3 U+30C0 U+30E0. But the first is cuter. ^_^