Re: US-ASCII (was: Re: Invalid UTF-8 sequences)

2004-12-10 Thread Michael (michka) Kaplan
From: "Kenneth Whistler" <[EMAIL PROTECTED]>

> On the other hand, for many English speakers, "RSVP" is simply
> learned as an unanalyzed verb, pronounced "aressveepee", meaning
> "send a response to this message". And to castigate such speakers
> for politely prepending a "please" to that verb is a little
> too much, don't you think?

We actually know of a person here who used to refer to a bigwig guest as a
"Very VIP person". I don't think she was corrected for some time -- it's
entertaining to let them find out on their own, sometimes.

MichKa [MS]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure, Fonts, and Tools
Microsoft Windows International Division



Re: [even more increasingly OT-- into Sunday morning] Re: Unicode HTML, download

2004-11-21 Thread Michael (michka) Kaplan
These are JScript tags in HTML -- client-side script.

I do not know whether other browsers have solutions for this problem.

Michael

- Original Message - 
From: "Christopher Fynn" <[EMAIL PROTECTED]>
Cc: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>; "Unicode List"
<[EMAIL PROTECTED]>
Sent: Sunday, November 21, 2004 7:49 AM
Subject: Re: [even more increasingly OT-- into Sunday morning] Re: Unicode
HTML, download


> Thanks Michael
>
> This is useful information. Unfortunately I usually need to use static
> HTML - so I can't use the ASP parts. It would be nice to see something
> like this working on UTF-8 encoded web pages where lang is defined. In
> most cases knowing the text is a specific language and knowing the page
> is Unicode would let you know which script is being used.
>
> I'd also like to figure out a way to trigger this kind of behavior in
> other browsers as well as in IE (using JavaScript or Java rather than
> VB) as not quite everyone uses IE - (but I guess you are not going to
> give me any more clues on how to do that :-) )
>
> regards
>
> - Chris
>
>
>
> Michael (michka) Kaplan wrote:
>
> > From: "Stefan Persson" <[EMAIL PROTECTED]>
>
> >>I haven't used M$ IE for many years, though, and my
> >>memory might be wrong.
>
> >
> > Blinded by the misspelling of the product name, maybe? :-)
>
> > See http://msdn.microsoft.com/msdnmag/issues/0700/localize/ and the section
> > entitled "Choosing Character Sets" for info on what is going on here,
> > particularly figures 3 and 4 for info on how to script the behavior for the
> > UTF-8 case.
>
> > MichKa [MS]
> > NLS Collation/Locale/Keyboard Technical Lead
> > Globalization Infrastructure, Fonts, and Tools
> > Windows International Division
>
>
>
>




Re: [even more increasingly OT-- into Sunday morning] Re: Unicode HTML, download

2004-11-21 Thread Michael (michka) Kaplan
From: "Stefan Persson" <[EMAIL PROTECTED]>

> I haven't used M$ IE for many years, though, and my
> memory might be wrong.

Blinded by the misspelling of the product name, maybe? :-)

See http://msdn.microsoft.com/msdnmag/issues/0700/localize/ and the section
entitled "Choosing Character Sets" for info on what is going on here,
particularly figures 3 and 4 for info on how to script the behavior for the
UTF-8 case.

MichKa [MS]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure, Fonts, and Tools
Windows International Division




Re: basic-hebrew RtL-space ?

2004-11-04 Thread Michael (michka) Kaplan
From: "kefas" <[EMAIL PROTECTED]>

> That is easily done by assigning the U-codes to some
> keys on the keyboard, but I don't know how to combine
> this with the pressing and releasing of CAPS.
> MSKBLC.exe , keyboard-layout-creator, does not allow
> for that.

Hmmm actually, if one puts the alternate form (i.e. with the format
character) in the SHIFT state and then marks the "CAPS == SHIFT" checkbox
for each of those keys, then it will work just fine.

Now mind you I am not recommending that anyone do this because I think the
wholesale addition of formatting characters is not a great idea. But I
wanted to correct the claim that MSKLC does not allow it.

One could even use SGCAPS support to do this, although one would have to put
the formatting character versions in the base state and the ones without
formatting characters in the SGCAPS states. But I definitely do not
recommend that approach! :-)
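To make the CAPS == SHIFT suggestion concrete, here is a toy Python model of which state a key emits. It is a sketch under stated assumptions, not MSKLC's implementation or file format: `KeyDef`, `key_output`, and the use of U+200F RIGHT-TO-LEFT MARK as "the format character" are all hypothetical, and the rule that SHIFT cancels CapsLock mirrors ordinary Windows letter-key behavior.

```python
from dataclasses import dataclass

# Hypothetical formatting character; U+200F RIGHT-TO-LEFT MARK is only an
# illustrative choice, not something prescribed in the thread.
FORMAT_CHAR = "\u200f"


@dataclass(frozen=True)
class KeyDef:
    """Toy model of one key in a layout (not MSKLC's real data format)."""
    base: str            # output with no modifiers
    shift: str           # output in the SHIFT state
    caps_is_shift: bool  # models the "CAPS == SHIFT" checkbox for this key


def key_output(key: KeyDef, shift: bool, caps_lock: bool) -> str:
    """Return what the key types, assuming CapsLock acts like SHIFT when flagged."""
    # When the flag is set, CapsLock toggles the shift state; holding SHIFT
    # while CapsLock is on flips it back, as with ordinary letter keys.
    caps_active = caps_lock and key.caps_is_shift
    use_shift = shift != caps_active
    return key.shift if use_shift else key.base


# Hebrew alef: plain form in the base state, format-character form in SHIFT.
alef = KeyDef(base="\u05d0", shift="\u05d0" + FORMAT_CHAR, caps_is_shift=True)

if __name__ == "__main__":
    for shift in (False, True):
        for caps in (False, True):
            out = key_output(alef, shift, caps)
            print(f"shift={shift!s:5} caps={caps!s:5} -> {[hex(ord(c)) for c in out]}")
```

With CapsLock on, the key emits its SHIFT-state string, so toggling CapsLock switches between the two forms without holding SHIFT.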

> Is what I describe as desirable the visual input
> method? After reading about it I was not clearer.

Most people seem okay with the little bit of flipping about of the space.
Really good touch typists can just type and everything works out, and
hunt-'n-peck/two-fingered typists are mostly looking at the keyboard, not
the screen, anyway. Though admittedly people may just be resigned to
things working this way, I think it's fair to say that they are accepting it
at this point, even if reluctantly.

> Why don't we leave it to the text2html software to
> enclose a group of R-letters into RLE,PDF as you
> suggest.
> Changing languages during typing should be my own
> concern.
> > ~fantasai
>
>




Re: bit notation in ISO-8859-x is wrong

2004-10-10 Thread Michael (michka) Kaplan
Not really a Unicode issue.

And not really a bug. Whether one calls the first item #0 or #1 is a
regional or technical matter that honestly does not matter. International
standards (like 8859), football fans ("we're #1!"), and elevators (the
ground floor is floor 1 in e.g. the US, floor 0 in e.g. Sweden) all number
things in arbitrary ways.

Now, since the 8859 standard is an 8-bit standard that is documented as
calling the bits b1 through b8, it is obvious what its authors decided. The
only "bug" here is one of user expectations, if anyone does not accept that
decision :)
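Both conventions describe the same bits; only the labels differ. A minimal Python sketch (the function names are illustrative) of the 1-based labelling used in ISO/IEC 8859: bit bN has weight 2^(N-1), so b1 = 1 exactly as the standard's table says, and it is the same bit that 0-based notation calls b0.

```python
def bit_weight(n: int) -> int:
    """Weight of bit bN in the 1-based b1..b8 labelling used by ISO/IEC 8859."""
    if not 1 <= n <= 8:
        raise ValueError("ISO/IEC 8859 labels its bits b1 through b8")
    return 1 << (n - 1)  # b1 -> 2**0, b8 -> 2**7


def bits_8859(byte: int) -> dict[str, int]:
    """Split a byte into its b8..b1 bits, most significant first."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return {f"b{n}": (byte >> (n - 1)) & 1 for n in range(8, 0, -1)}


if __name__ == "__main__":
    print(bit_weight(1), bit_weight(8))  # 1 128
    print(bits_8859(0xA4))  # 0xA4 is the euro sign in ISO/IEC 8859-15
```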

MichKa

- Original Message - 
From: "Cristian Secară" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, October 10, 2004 7:01 PM
Subject: bit notation in ISO-8859-x is wrong


> The bit notation in ISO-8859-x series of standards are noted b8 ... b1.
> What notation is that ? Normally it should be b7 ... b0.
>
> For example, in ISO-8859-15 there is a table that says that b1 equals
> 1. This is wrong: 2^1=2, not 1. It should be b0 instead, where 2^0=1.
> The same mistake is in ISO-8859-14, the same in ISO-8859-14 and most
> likely in all 8859 series.
>
> ???
>
> Cristi
>
>
>
>




Re: Much better Latin-1 keyboard for Windows

2004-07-18 Thread Michael (michka) Kaplan
From: "John Cowan" <[EMAIL PROTECTED]>

> http://www.livejournal.com/users/gwalla/39856.html is a page about
> (and a link to) a truly excellent Windows keyboard driver that
> provides full access to the Latin-1 range but is completely compatible
> with the US-ASCII keyboard except for AltGr (the right Alt key).
> All non-ASCII characters and dead keys are available there: for
> example, to get à, one types AltGr-` followed by a.
>
> I can't recommend this too much; I immediately dropped both the US-ASCII
> and US-International keyboards, which I have been using in alternation.
> The only (very minor) problem with it is that for some reason it messes
> up Ctrl-Shift and Ctrl-nonletter key combinations.

Ah, that is the first time I ever saw a use for control chars in the CTRL
shift state!

But if you load the keyboard in MSKLC, add U+001D to the VK_OEM_6 key
(which is where that bracket is), rebuild it, and install it, then the
Telnet escape sequence will work properly.

Telnet seems to depend on some of those control characters being on
the keyboard, and in particular positions (which means lots of the built-in
keyboards would fail since not all of them have these definitions). I'll put
this in as a bug to fix in the next version of MSKLC.
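Why U+001D in particular: on ASCII terminals, Ctrl combined with a character in the range 0x40-0x5F conventionally yields that code with its top bits cleared (code & 0x1F), and telnet's usual escape character is Ctrl+], i.e. 0x5D & 0x1F = 0x1D (GROUP SEPARATOR). The Python sketch below only shows that arithmetic; `ctrl_char` is a hypothetical helper, and it is not how Windows itself produces Ctrl output, which comes from the layout's CTRL shift state as described above.

```python
def ctrl_char(key: str) -> str:
    """Control character conventionally produced by Ctrl+key on an ASCII terminal."""
    if len(key) != 1 or not key.isascii():
        raise ValueError("expected a single ASCII character")
    code = ord(key.upper())  # Ctrl+a and Ctrl+A give the same code
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no classic Ctrl mapping for {key!r}")
    return chr(code & 0x1F)  # keep only the low five bits


if __name__ == "__main__":
    for key in "]a[":
        print(f"Ctrl+{key} -> U+{ord(ctrl_char(key)):04X}")
```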

MichKa




Re: Looking for transcription or transliteration standards latin- >arabic

2004-07-09 Thread Michael (michka) Kaplan
From: "Peter Kirk" <[EMAIL PROTECTED]>

> But Kaplan is referring to something quite different, optionally
> ignoring diacritics in search operations. This is indeed desirable, so
> that a single search can match both Dvorak and Dvořák for example, and
> so that the one doing the search does not need to remember exactly which
> diacritics are used in the name. And it is already covered by the
> Unicode collation algorithm and default table, in which diacritics are
> distinguished only at the second level and so folded by a top level only
> collation.

(a) If this were true and it were the only need, then case folding would
also just be "a UCA issue", yet case folding is in the document.

(b) Not everyone who uses Unicode uses the UCA (most of the corporate
member companies in Unicode -- including IBM -- had alternate collation
methods that existed prior to the UCA and which to this day support more
languages, in their databases and operating systems).

(c) Since the operation (diacritic folding) is a valid one that
implementations may want to do and the UCA is a UTS and thus not required
for Unicode conformance, it is a sensible folding operation to define.

Does diacritic folding destroy information provided by the distinctions that
diacritics provide? Of course it does. But then again, the same can be said
of all foldings. This does not diminish their potential usefulness in
specific tasks/operations.
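A common way to approximate diacritic folding in Python is canonical decomposition followed by dropping combining marks. This is a minimal sketch, not the folding operation from the document under discussion nor any Windows implementation; note that letters with no canonical decomposition, such as ø or đ, pass through unchanged, which a real folding table would need to handle.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Lossy folding: decompose (NFD), drop combining marks, recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    for word in ("Dvořák", "Škoda", "Ørsted"):
        print(word, "->", fold_diacritics(word))
```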


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies
Windows International Division




Re: Looking for transcription or transliteration standards latin- >arabic

2004-07-08 Thread Michael (michka) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>

> I think it's stupid (in general) to argue for stripping a letter of
> diacritics. If a reader is ignorant of their meaning, that can be
> cured. But if they are meaningful, stripping them is just misspelling
> the words they belong to. Why would anyone want to do that?

I think it's inadvisable (in general) to call things stupid merely because
one does not see the need. On the whole, that is a better time to ask the
question than to make the judgment.

There is actually a great deal of data, both European and American, from
programs like Microsoft Exchange and Outlook (as well as from web search)
showing that folding away diacritics as a part of giving full lists of
possible matches is indeed preferred by users. Now, they would also prefer
the exact matches to have priority, but having additional matches without
the diacritics is a common request, and one that has been built into many
scenarios.
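The preference described here (exact matches first, diacritic-insensitive matches after) can be sketched as a two-tier search. This is a hypothetical illustration only, not how Exchange, Outlook or any search engine actually ranks results; the `fold` helper uses the decompose-and-strip approximation and also ignores case.

```python
import unicodedata


def fold(text: str) -> str:
    """Case- and diacritic-insensitive key (approximation via NFD + mark removal)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(names: list[str], query: str) -> list[str]:
    """Exact matches first, then names equal to the query once folded."""
    exact = [name for name in names if name == query]
    key = fold(query)
    loose = [name for name in names if name != query and fold(name) == key]
    return exact + loose


if __name__ == "__main__":
    people = ["Dvorak", "Dvořák", "Dvořáková", "DVORAK"]
    print(search(people, "Dvořák"))
```

Keeping the two tiers separate preserves the priority users ask for while still surfacing the folded matches.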

Formalizing that operation in Unicode is only a bad thing (or a stupid
thing, to use your words) if creating a standard that meets real world needs
(as opposed to ideal typographic or linguistic preferences) is considered a
bad (or stupid) thing.

As far as I know, most of the members of the Unicode Consortium have those
real world use cases as their first priority.

MichKa [MS]




Re: Proposal to encode dominoes and other game symbols

2004-05-26 Thread Michael (michka) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 26, 2004 5:35 AM
Subject: Re: Proposal to encode dominoes and other game symbols


> At 13:09 +0100 2004-05-26, Michael Everson wrote:
>
> >Just because someone hasn't put them on a web page (in a clumsy
> >graphic) yet doesn't  mean that it isn't reasonable to wait for them
> >to do so.
>
> RECTE Just because someone hasn't put them on a web page (in a clumsy
> graphic) yet doesn't mean that it isn't *un*reasonable to wait for
> them to do so.

The first version was better, in my opinion. Perhaps we would leave room for
the future, but Unicode and WG2 both have enough work to do without assuming
needs not yet proven and not yet requested.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies





Re: ISO 15924

2004-05-20 Thread Michael (michka) Kaplan
<<>>

MichKa

P.S. Hint: the subject is smarter than the body.

- Original Message - 
From: "Mahesh T. Pai" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, May 20, 2004 7:49 PM
Subject: Re: ISO 15924


> Michael Everson said on Fri, May 21, 2004 at 12:56:44AM +0100,:
> 
>  > http://www.unicode.org/iso15294. The Registrar thanks everyone who 
> 
> I am getting a 404 here? 
>  
> 
> -- 
> 
>"Those willing to give up a little liberty for a little security
>   deserve neither security nor liberty"
> 
> 



Re: Response to Everson Phoenician and why June 7?

2004-05-19 Thread Michael (michka) Kaplan
I would respectfully suggest that Dr. Stephen A. Kaufman will need to come up
with a more convincing or (and probably and) professional argument than this
one if he wants it to be taken seriously by people who have a very good
understanding of both Unicode and glyphs, and who further have a serious set
of requirements that suggest that Dr. Kaufman's needs may be the same as the
needs of others who would like the script to be encoded.

I doubt neither Dr. Kaufman's expertise nor reputation, but it is clear that
the actual stated requirements have not been discussed, nor has any specific
problem inherent in the encoding been stated by him. He should consider that
if convincing arguments sit on one side and his brief posting on the other,
his words are unlikely to sway the committee.


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies


- Original Message - 
From: "E. Keown" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Deborah W. Anderson" <[EMAIL PROTECTED]>
Cc: "John Cowan" <[EMAIL PROTECTED]>
Sent: Wednesday, May 19, 2004 1:54 PM
Subject: Response to Everson Phoenician and why June 7?


>Elaine Keown
>Tucson
>
> Hi,
>
> I include below the response of
> Prof. Stephen A. Kaufman, one of the world's most
> famous Aramaists, to the Everson Phoenician proposal:
>
> Dr. Stephen A. Kaufman wrote (on the ANE list
> recently):
>
> > Anyone who thinks there has to be a separate
> > encoding for Phoenician either does not understand
> > Unicode or (and probably "and") does not understand
> > what a glyph is.  There are already encodings
> > suitable for all varieties of Northwest Semitic
> > scripts.  One can legitimately argue, as some have,
> > that there are still some problems with the Hebrew
> > and Syriac encodings, but not that we need anything
> > more for the other NW Semitic languages other than
> >some nice FONTS!
> >
> >Steve Kaufman
>
> Why did Debbie suggest June 7 as the latest date for
> responses?
>
> Elaine
>
>
>
>
>
>




Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)

2004-05-07 Thread Michael (michka) Kaplan
We should rename this thread "how off-topic can we be?" :-)

No need to look at the repro, though. The warnings MSKLC gives you are
completely correct and appropriate, and the help file explains both why they
are there and how you can disable them in the View|Options dialog if you
are not interested in seeing the codepage warnings.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

- Original Message - 
From: "African Oracle" <[EMAIL PROTECTED]>
To: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>; "Unicode List"
<[EMAIL PROTECTED]>
Sent: Friday, May 07, 2004 8:24 AM
Subject: Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)


> I recently developed a keyboard named WAZOBIA using the Keyboard Layout
> Creator tool, though the validator therein reported some error regarding
> 1252 incompatibility. Is there a way to address this? Would Michael or a
> member of his team be interested in looking at this and advising me on how
> to further enhance it?
>
> Dele Olawole
> www.dnetcom.com
>
> - Original Message - 
> From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
> To: "Unicode List" <[EMAIL PROTECTED]>
> Sent: Friday, May 07, 2004 4:14 PM
> Subject: Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)
>
>
> > From: "Philippe Verdy" <[EMAIL PROTECTED]>
> >
> > > And my comment here was not about Microsoft should manage its business
> >
> > 
> >
> > It's still off-topic. Please take it to "alt.microsoft.sucks" or whatever
> > other forum you feel might be appropriate. :-)
> >
> > Once a thread goes "bad" it is hard to convert it back into something good,
> > which is why people from Microsoft might be hesitant to respond to Raymond's
> > attempt to get it back on topic. Now if Philippe ignores the above and
> > spews a long MS-bashing response back at me about evil business practices,
> > I will wish that I had been as wise as my colleagues :(
> >
> > I am not on the IE team and thus cannot speak for them, but as far as I can
> > see, if you use the conformant W3C method of specifying the font to use, you
> > will see both Extension A and Extension B characters. This whole sloppy
> > complaint relates to what a browser does by default if you ask to display
> > the text using "Arial" (or whatever) and the font does not have the glyphs.
> > While I could wish such a feature would work for all scripts, I do not lose
> > too much sleep over the lack since most fonts do not support Extension A,
> > and even if you have fonts that do, you almost certainly want the one
> > best suited to the appropriate language. If you have a bunch of conformant
> > browsers then all you have to do is list your font preferences across
> > various platforms and you will be certain to get what you are looking for.
> > The "hack" registry keys are an interim solution for people who want that
> > "automatic font fallback" support and are hardly the best way to do this
> > even if they worked well.
> >
> > I would strongly recommend that anyone who suspects there is a problem try
> > the above prior to posting again -- there is way too much agreement about
> > the terrible nature of a problem that for all practical purposes is not
> > really a problem. And for myself I would rather concentrate on actual
> > issues since every false MS-bashing complaint weakens the ability to find
> > and respond to real complaints.
> >
> > As for GB18030 -- if the government of China feels the support is
> > adequate, that seems good enough to me. Can we let it lie now? Anyone who
> > has problems should likely take it up with China and not Microsoft.
> >
> > Also, a note about statements of misfact:
> >
> > -- Extension A is on the BMP.
> >
> >
> >
> > MichKa [MS]
> > NLS Collation/Locale/Keyboard Development
> > Globalization Infrastructure and Font Technologies
> >
> >
> >
>
>
>
>




Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)

2004-05-07 Thread Michael (michka) Kaplan
From: "Philippe Verdy" <[EMAIL PROTECTED]>

> And my comment here was not about Microsoft should manage its business



It's still off-topic. Please take it to "alt.microsoft.sucks" or whatever
other forum you feel might be appropriate. :-)

Once a thread goes "bad" it is hard to convert it back into something good,
which is why people from Microsoft might be hesitant to respond to Raymond's
attempt to get it back on topic. Now if Philippe ignores the above and
spews a long MS-bashing response back at me about evil business practices, I
will wish that I had been as wise as my colleagues :(

I am not on the IE team and thus cannot speak for them, but as far as I can
see, if you use the conformant W3C method of specifying the font to use, you
will see both Extension A and Extension B characters. This whole sloppy
complaint relates to what a browser does by default if you ask to display
the text using "Arial" (or whatever) and the font does not have the glyphs.
While I could wish such a feature would work for all scripts, I do not lose
too much sleep over the lack since most fonts do not support Extension A,
and even if you have fonts that do, you almost certainly want the one best
suited to the appropriate language. If you have a bunch of conformant
browsers then all you have to do is list your font preferences across
various platforms and you will be certain to get what you are looking for.
The "hack" registry keys are an interim solution for people who want that
"automatic font fallback" support and are hardly the best way to do this
even if they worked well.

I would strongly recommend that anyone who suspects there is a problem try
the above prior to posting again -- there is way too much agreement about
the terrible nature of a problem that for all practical purposes is not
really a problem. And for myself I would rather concentrate on actual issues
since every false MS-bashing complaint weakens the ability to find and
respond to real complaints.

As for GB18030 -- if the government of China feels the support is adequate,
that seems good enough to me. Can we let it lie now? Anyone who has problems
should likely take it up with China and not Microsoft.

Also, a note about statements of misfact:

-- Extension A is on the BMP.
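To make the correction concrete: the plane of a code point is its value shifted right by 16 bits. The CJK Unified Ideographs Extension A block (U+3400..U+4DBF) falls in plane 0, the BMP, so each character is a single UTF-16 code unit, while the Extension B block (U+20000..U+2A6DF) is in plane 2 and needs a surrogate pair. A small Python sketch (helper names are illustrative):

```python
def plane(code_point: int) -> int:
    """Unicode plane number: 0 is the BMP, 2 holds CJK Extension B."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16


def utf16_units(code_point: int) -> int:
    """Number of UTF-16 code units needed for one code point."""
    return len(chr(code_point).encode("utf-16-le")) // 2


# Block boundaries (the blocks, not every assigned character in them).
EXTENSION_A = (0x3400, 0x4DBF)
EXTENSION_B = (0x20000, 0x2A6DF)

if __name__ == "__main__":
    for name, (first, last) in (("Extension A", EXTENSION_A), ("Extension B", EXTENSION_B)):
        print(name, "plane", plane(first), "-", utf16_units(first), "UTF-16 unit(s)")
```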



MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: Standardize TimeZone ID

2004-04-25 Thread Michael (michka) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>
> At 08:49 -0700 2004-04-25, Mark Davis wrote:
> >There is a different committee mailing list than the one for the UTC. However,
> >for the public [EMAIL PROTECTED] list it didn't seem worth having a separate
> >public list yet. (After all, much of the material on [EMAIL PROTECTED] is
> >general globalization discussion -- and often even pretty far off
> >that topic :-)
>
> I do not want to have to argue the merits of Breton or Irish month
> names on the Unicode list. And that's what locale-building is about.
> The Unicode list should be for discussion of encoding characters and
> processing them. There should be a locale list for discussion of all
> the language tags, country tags, and other locale baggage.
>
> Please, Mark.

I find myself in the [rare?] position of agreeing with Michael Everson
wholeheartedly. It seems like those who want to combine them in a huge mishmash
can simply belong to both lists, right?

MichKa




Re: Standardize TimeZone ID

2004-04-24 Thread Michael (michka) Kaplan
If the officers believe that a new committee, rather than the UTC, is
needed to "govern" the repository, then it stands to reason that the Unicode
List is the wrong place for locales, and a separate list for discussions
related to that standard is the most sensible approach.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

- Original Message - 
From: "Asmus Freytag" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>; "Frank Yung-Fong Tang"
<[EMAIL PROTECTED]>
Cc: "Unicode List" <[EMAIL PROTECTED]>
Sent: Saturday, April 24, 2004 6:07 PM
Subject: Re: Standardize TimeZone ID


> At 05:04 PM 4/24/2004, Mark Davis quoted a message by Frank:
> >I know this is a little bit off-topic for Unicode, just like the one
> >about locale. Maybe I should move this to w3c i18n mailing list
>
> Now that the common locale data repository is hosted by The Unicode
> Consortium, it may no longer be as off-topic as you think
>
> A./
>
>
>
>




Re: OT? Languages with letters that always take diacriticals

2004-03-16 Thread Michael (michka) Kaplan
From: "Marion Gunn" <[EMAIL PROTECTED]>

> Irish in Roman script is written i with dot above, Irish in traditional
> script is written i without dot above. The current flooding of our local
> advertising and publishing markets by various non-native uncial fonts to
> write our language goes against tradition in imposing on us that unwanted
> dot. Is there any way at all that using Unicode can help support our
> tradition?

It seems that Unicode has supported the dotless i for over a decade -- and
software products that use Unicode have supported it for the bulk of that time. Your
tradition is supported by anyone who believes it important enough to take
the trouble to support it (where trouble in this context is defined as
installing just about any remotely recent OS that supports Unicode!).
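The character in question is U+0131 LATIN SMALL LETTER DOTLESS I. A short Python check of its properties, with one practical caveat: under the default, locale-independent case mappings its uppercase is plain I, whose lowercase is the dotted i, so an uppercase round trip does not preserve it. This sketch only inspects the character; `survives_upper_round_trip` is a hypothetical helper.

```python
import unicodedata

DOTLESS_I = "\u0131"


def survives_upper_round_trip(text: str) -> bool:
    """True if upper() followed by lower() gives back the original text."""
    return text.upper().lower() == text


if __name__ == "__main__":
    print(unicodedata.name(DOTLESS_I))           # LATIN SMALL LETTER DOTLESS I
    print(DOTLESS_I.upper())                     # I
    print(survives_upper_round_trip(DOTLESS_I))  # False
    print(survives_upper_round_trip("i"))        # True
```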

You can tell them that the ball is now in their court. We can lead the horse
to water, but...

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Michael (michka) Kaplan
From: "Antoine Leca" <[EMAIL PROTECTED]>

> Well, I do not believe this is the most adequate place to discuss this,

For what it is worth, I agree with you on that point. :-)

> Now, the file in XP is still exactly 262144 bytes in size. To me, this is
> evidence that only the BMP characters did receive weights in this file.

Actually, it is evidence only that the original *default* table architecture
cannot contain "tailorings" for anything off the BMP (note that they can
conceivably have a default weighting via the surrogate code units that are
on the BMP).
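For readers less familiar with the surrogate mechanism mentioned here: every supplementary code point is represented in UTF-16 as two code units from the D800-DFFF range, which itself lies on the BMP. A minimal Python sketch of the standard UTF-16 computation (how Windows actually assigns weights to those units is not described in this thread and is not modelled here):

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000     # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogates(0x20000)  # first code point of CJK Extension B
    print(f"U+20000 -> {high:04X} {low:04X}")  # D840 DC00
    # Cross-check against Python's own UTF-16 encoder.
    assert "\U00020000".encode("utf-16-be") == bytes([0xD8, 0x40, 0xDC, 0x00])
```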

> Since SORTTBLS.NLS is still a ridiculous 20 k in size, it does not hold the
> weights.

Well, this assumes (1) that there is no other mechanism added and (2) there
are only large scripts in the world beyond the BMP. The former may well be
false and the latter is definitely false. :-)

In any case, a few of your assumptions are already mistaken, and most
importantly you seem to be focusing on collation, which, though cool as a
topic, is not related to Unicode conformance at all. So the "black box"
investigation you have done is very far afield of the original question.

> Now, what I do not know is:
>
>  - if the Win32 NLS API has been fully upgraded to Unicode 3.0 for XP. I was
> thinking that when I did research it earlier today, since the sizes of the
> .NLS files did accordingly increase, but since I did not find the relevant
> KB article I was not sure. Michael's approximate answer (I beg your pardon
> if this was not the intent) that may lead to think it is an almost-full,
> almost-empty pot, is not a very good news

Again, this is very far afield -- and it is interesting that just about
every person who has read this thread takes the question to mean something
different. You have specifically chosen to take an NLS API that does not
relate to Unicode conformance -- so the question "does it support Unicode
version _?" is hard to cast appropriately.

See Peter Constable's excellent email for more on the actual way the
question should be asked to be able to get a good answer.

>  - what is the status with NT 5.2 a.k.a. Server 2003, since I do not have
> access right now to this version. A quick look to the size of SORTKEY.NLS
> would give some hints: 256 Ki would say it is still at 3.0 level, 768 Ki
> (Plane 0, 1 and 2, perhaps with some adjust to cover the delightful plane
> 14) would be an indication it supports meaningful surrogates without heavy
> changes to the scheme, 4352 Ki (4.25 Mi, 17 * 256Ki) would say the
> programmer did extend the table without even thinking about how to optimize
> it (I do not think it happens, but who knows), and some much smaller size
> would mean the algorithm was revised!

You would likely be disappointed, as the size of sortkey.nls has not
changed. But let me mention that I am the dev. owner of those files and the
APIs associated with them, and I can assure you that this is not an accurate
measure of Unicode support on Windows.
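The size arithmetic in the question is easy to reproduce, with the caveat that it rests on an assumption rejected above: that the file is a flat table with one fixed-size entry per code point. 262,144 bytes is exactly 4 bytes for each of the 65,536 BMP code points, and scaling that guess to 3 or 17 planes gives the 768 Ki and 4352 Ki figures quoted. A Python sketch of that hypothetical model (`flat_table_kib` is illustrative only):

```python
KIB = 1024
CODE_POINTS_PER_PLANE = 0x10000


def flat_table_kib(planes: int, bytes_per_entry: int = 4) -> int:
    """Size in KiB of a hypothetical flat table with one entry per code point."""
    return planes * CODE_POINTS_PER_PLANE * bytes_per_entry // KIB


if __name__ == "__main__":
    print(262144 // CODE_POINTS_PER_PLANE)  # 4 bytes per BMP code point
    for planes in (1, 3, 17):
        print(planes, "plane(s):", flat_table_kib(planes), "KiB")
```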

>  - by the way, the same question can be asked with the beta releases of
> Longhorn. However, there is not much point trying to nail down the level of
> Unicode support of a beta.

Microsoft does not generally comment on unreleased products.


-- 
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies





Re: Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Michael (michka) Kaplan
This is one of those really hard-to-define questions, given how many pieces
actually define what gets bundled under the term "Windows." The KB article
is not being entirely fair by taking the lowest level that exists and
assuming that the entire platform must be designed that way. Given that core
Windows fonts and collation tables include characters allocated after 2.0,
there are clearly pieces of Windows that support later versions.

What is fair to say is that Windows 2000 supports between Unicode 2.0 and
3.1, depending on what area of Windows you refer to.

For sortkey.nls -- that file does not ever change in size, as it is not a
file that one adds characters to. I suppose if you wanted to look to file
sizes to indicate additions you could look to sorttbls.nls, locale.nls,
unicode.nls, ctype.nls, and the various font files. But file size without
understanding what the file does still seems like a crude way to determine
complexity.

-- 
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies



- Original Message - 
From: "Antoine Leca" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, March 05, 2004 5:35 AM
Subject: Version(s) of Unicode supported by various versions of Microsoft Windows


> Hi folks,
>
> I discovered, much to my surprise (though on reflection it makes a lot of
> sense, taking into account when it was developed), that Windows
> 2000 only supports The Unicode Standard, version 2.0
> http://support.microsoft.com/default.aspx?scid=kb;EN-US;227483
>
> The question: I was unable to find similar information referring to Windows
> NT version 5.1 and 5.2.
>
> Certainly people here may direct me to the correct place to find it. Thanks
> in advance.
>
>
> (Please, do not tell me "it supports 4.0 since you can view 4.0 provided you
> use the correct browser and the correct fonts"; that is NOT what I want to
> know. I am interested, for example, in sorting strings with surrogates; seeing
> that in a typical WinXP distribution %SYSTEM32%/SORTKEYS.NLS is still 256k,
> as it was with NT3.x, shows me that this one would not support Unicode
> 3.1, for instance.)
>
> A similar query has been directed to Dr. International
> http://www.microsoft.com/globaldev/drintl/askdrintl.aspx
>
>
> Antoine
>
>
>




Re: [OT?] Modifying (Unicode) sorting of languages using diacritics in MS Word and MS SQL Server

2004-02-27 Thread Michael (michka) Kaplan
From: "Patrick Andries" <[EMAIL PROTECTED]>

> [PA] Yes, the GUI tool is very nice. So easy to use in theory that I
> don't understand why it is only available in English (i.e. one does not
> need to be a techie and "thus" know English to be able or want to use
> this tool).

Well, the tool is localizable, but none of the subsidiaries has yet
expressed the desire. One never knows what the future may hold, though.

> [PA] Let me be reasonable as you kindly suggest: how about proper French
> Canadian (CAN/CSA Z243.4.1 standard (which you most probably know) and
> ISO/IEC 14651 with the delta corresponding to the latter) or Khmer sorting?

I am unaware of any specific non-conformant pieces in Windows in regard to
the former standard.

Proper linguistic sorting for Khmer is a *lot* more complicated. It is not
there now, definitely.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies





Re: [OT?] Modifying (Unicode) sorting of languages using diacritics in MS Word and MS SQL Server

2004-02-23 Thread Michael (michka) Kaplan
From: "Patrick Andries" <[EMAIL PROTECTED]>

> I have the same question for MS SQL Server 2000...

Similar answer to the one Chris gave for Word, though with a slightly older
version of the Windows sort tables.

> Finally, I would like to know if it is possible for a user  to add
> an additional language to the ones appearing in the Windows regional and
> language options, so as to assign to it, for instance, some keyboard
> layouts.

This is not currently possible. But the user can certainly create a new
keyboard (now with an easy GUI tool) and the system will handle all that is
typed with it.

> P.-S. : Do Word, SQL Server 2000 and the Regional and Language options
> window support all Unicode 4.0 associated languages as far as proper
> sorting and addition of keyboards are concerned ?

It is hard to know what you mean here -- are you asking for when every
single character in Unicode 4.0 will be in some keyboard and some
linguistically appropriate sort, all built into Windows? Or did you have a
more practical (and reasonable) target in mind?

> If not, when will these products do so ?

Well, see above.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: Install regional language setting options of system through program

2004-02-13 Thread Michael \(michka\) Kaplan
Not really a Unicode question, but

For just adding keyboards, you will likely be able to more easily use the
LoadKeyboardLayout API. To change individual settings within a chosen user
locale, you can use the SetLocaleInfo API.

For other settings, see
http://support.microsoft.com/default.aspx?scid=kb;en-us;289125 which
describes the format and syntax to use in order to call intl.cpl (Regional
Options) in an unattended mode for adding language support and keyboards.
Note that there are many things you cannot do, such as remove keyboards or
affect individual user locale settings.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies


- Original Message - 
From: "sonia" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, February 12, 2004 8:54 PM
Subject: Install regional language setting options of system through program


Is there any way to install language settings and add various input locale
languages by programming in Visual Basic .NET?
I want to design a program so that users have no need to go into the Regional
Settings option to install various scripts and add languages to the input locale.
Is it possible?




Re: UTF8 locale & shell encoding

2004-01-16 Thread Michael \(michka\) Kaplan
From: "Philippe Verdy" <[EMAIL PROTECTED]>
> From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>

> > This is incorrect. UTF-8 support did not exist until Win98 for the Win9x
> > family, and did not exist until NT4 for the WinNT family.
>
> Even after system updates?

Even after system updates. The support does not exist in either Win95 Gold
or Win95 OSR2.

This has nothing to do with ACPs, OEMCPs, FAT, FAT32, networking, security,
instability, or any of the myriad of issues you raised here -- I just looked
at the source code and it does not exist in the Win9x tree until Windows 98.

I can keep correcting mis-statements you make about this fact if it's
important to you, but we should probably take it off the Unicode List at
that point.


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies





Re: UTF8 locale & shell encoding

2004-01-16 Thread Michael \(michka\) Kaplan
From: "Philippe Verdy" <[EMAIL PROTECTED]>

> Exactly. However the conversion to UTF-8 from UTF-16 (the Windows "WideChar"
> encoding used in the Win32 Unicode API) is supported natively in
> MultiByteToWideChar() as if it was a SBCS/DBCS character set, even on
> Windows 95.

This is incorrect. UTF-8 support did not exist until Win98 for the Win9x
family, and did not exist until NT4 for the WinNT family.

There are several other errors related to Win9x in this message but this is
the most glaring.
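For readers who want to see what that conversion involves, here is a minimal, portable Python sketch of the UTF-8 to UTF-16 step that MultiByteToWideChar performs with CP_UTF8 on the Windows versions that support it. It is an illustration under stated assumptions, not Windows code: the function name `utf8_to_utf16_units` is hypothetical, and it simply rejects malformed input rather than reproducing any particular Windows error behavior.

```python
def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes into a list of 16-bit UTF-16 code units."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 1 byte: ASCII
            cp, length = lead, 1
        elif 0xC2 <= lead <= 0xDF:       # 2 bytes (C0/C1 would be overlong)
            cp, length = lead & 0x1F, 2
        elif 0xE0 <= lead <= 0xEF:       # 3 bytes
            cp, length = lead & 0x0F, 3
        elif 0xF0 <= lead <= 0xF4:       # 4 bytes: supplementary planes
            cp, length = lead & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, encoded surrogates and values past U+10FFFF.
        shortest = (0, 0, 0x80, 0x800, 0x10000)[length]
        if cp < shortest or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point at offset {i}")
        if cp >= 0x10000:                # needs a surrogate pair in UTF-16
            cp -= 0x10000
            units += [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]
        else:
            units.append(cp)
        i += length
    return units
```

For example, the four UTF-8 bytes of U+20000 come out as the surrogate pair 0xD840, 0xDC00.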

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies





Re: Pre-1923 characters?

2004-01-03 Thread Michael \(michka\) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>

> At 09:03 -0800 2004-01-03, Peter Kirk wrote:

> >But in the light of naming errors like this one implementers should
> >be advised not to use character names, because they are not reliably
> >helpful.
>
> I wouldn't say that. It would be better to advise them, as we do, that
> they cannot rely on the names being perfect. That's different from
> not using them at all.

It makes me wish we had a CouldaWouldaShoulda_CharacterName property that
contains what the name ought to be, and we document this as one that *will*
change any time there is a mistake made in the original character name. We
would just make a nice informative property, go through all of our known
mistakes, and the maintenance after the initial pass should be minimal.
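The standard later gained something close to this: formal name aliases (NameAliases.txt), whose "correction" entries record the intended spelling while the original name stays frozen. A small Python illustration, assuming Python 3.3 or later (where `unicodedata.lookup` accepts aliases), using the well-known misspelling in the name of U+FE18:

```python
import unicodedata

LENTICULAR = "\uFE18"
# The normative character name is frozen, misspelling ("BRAKCET") included.
frozen_name = unicodedata.name(LENTICULAR)
# The "correction" alias resolves to the same character.
via_alias = unicodedata.lookup(
    "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET")

print(frozen_name)
print(via_alias == LENTICULAR)
```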

MichKa




Re: Windows and MacOS keyboard layouts in human-readable format?

2003-12-30 Thread Michael \(michka\) Kaplan
From: "Adam Twardoch" <[EMAIL PROTECTED]>

> I agree with Michael that the simplistic approach I have envisioned would be
> rather incomplete -- I'm willing to accept that limitation. I am aware of
> many issues involving IMEs, "chaining" dead keys etc. I would be willing to
> leave them out of the scope, i.e. I'd be willing to accept as much as it
> gets within a somewhat "flat" table.

Fair enough. Note that even normal dead keys will not be very useful in a
flat table -- you will have to have a way to map two keystrokes to one
UTF-16 code point.
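To make the point concrete, here is a minimal Python sketch of what a dead-key entry has to encode: two keystrokes in, one code point out. The table and the fallback rule (an unmatched dead key emits the accent followed by the next character) are hypothetical and not taken from any shipping Windows layout.

```python
# Hypothetical dead-key table: dead key -> {base character: composed character}.
DEAD_KEYS = {
    "\u00b4": {"a": "\u00e1", "e": "\u00e9", "A": "\u00c1"},  # acute accent
    "^": {"a": "\u00e2", "e": "\u00ea", "o": "\u00f4"},       # circumflex
}


def type_keys(keys):
    """Turn the characters produced by successive keystrokes into text."""
    out = []
    pending = None                    # dead key waiting for the next keystroke
    for ch in keys:
        if pending is None:
            if ch in DEAD_KEYS:
                pending = ch          # a dead key emits nothing by itself
            else:
                out.append(ch)
            continue
        # Two keystrokes -> one code point, if the pair is in the table.
        composed = DEAD_KEYS[pending].get(ch)
        out.append(composed if composed else pending + ch)  # assumed fallback
        pending = None
    if pending is not None:
        out.append(pending)           # trailing dead key: emit the accent itself
    return "".join(out)
```

A flat "character to keystroke" row has no place to put the pending state that this loop carries between keystrokes.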

> I have since then read some descriptions of Windows API functions. For
> example, I found out that the Windows user32.dll function VkKeyScanEx
> translates a character to the corresponding virtual-key code and shift
> state. This makes me conclude that Windows developers already had this kind
> of "back-lookups" in mind, and were willing to accept the limitations.

Well, I never said it was impossible to get the information (in fact MSKLC
uses the keyboard APIs in user32 extensively to do its work to "load from an
existing layout"). Just that it is nowhere near a flat table, by any means.

> (Michael,)
>
> from my understanding, when I have a Unicode character, I should be able to
> translate it to a virtual key code via VkKeyScanEx, then the virtual key
> code to a scan code using MapVirtualKey, and finally translate the result to
> a human-readable name using GetKeyNameText. Is that a correct approach? Does
> anybody happen to have a ready-to-use function (C++, VB, Python)? :D

This will not work well for several reasons:

** GetKeyNameText only works with the current layout and there is no
GetKeyNameTextEx. If you ask me, GetKeyNameText is a little silly API, which
relies on the strings added in each individual DLL (some localized, some
not). The "localization" is not very consistent between keyboards, either. You
would essentially be reading information from specific keyboards and then
trying to describe them as one particular one (whichever one you happened to
be using as your keyboard in the current thread).

** Another problem is in using VkKeyScanEx here. Do you plan to throw all of
Unicode at every single keyboard?

** Yet another problem with this approach is that none of the dead keys will
show up with it; it will only find single characters that came from single
keystrokes.

** And yet another problem is that no ligatures will work (same problem as
above).
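For reference, the Win32 documentation describes the VkKeyScanEx return value as packing the virtual-key code in the low-order byte and the shift state in the high-order byte (1 = Shift, 2 = Ctrl, 4 = Alt, 8 = Hankaku), with both bytes set to -1 when no key produces the character. The portable Python sketch below only decodes such a value (obtained, for instance, by calling VkKeyScanExW through ctypes on Windows); the helper name `describe_vkkeyscan` and the sample values are hypothetical, and it does nothing about the dead-key and ligature gaps listed above.

```python
SHIFT_BITS = {1: "Shift", 2: "Ctrl", 4: "Alt", 8: "Hankaku"}


def describe_vkkeyscan(result):
    """Describe a VkKeyScanEx-style 16-bit return value, e.g. 'AltGr+A'."""
    result &= 0xFFFF                  # a negative SHORT becomes its 16-bit pattern
    vk, state = result & 0xFF, result >> 8
    if vk == 0xFF and state == 0xFF:
        return None                   # no key in the layout produces the character
    mods = [name for bit, name in SHIFT_BITS.items() if state & bit]
    if (state & 6) == 6:              # Ctrl+Alt together is how AltGr shows up
        mods = [m for m in mods if m not in ("Ctrl", "Alt")] + ["AltGr"]
    if 0x30 <= vk <= 0x39 or 0x41 <= vk <= 0x5A:
        key = chr(vk)                 # VK codes for 0-9 and A-Z match ASCII
    else:
        key = f"VK 0x{vk:02X}"        # e.g. OEM keys, whose labels vary by layout
    return "+".join(mods + [key])
```

A value of 0x0641 would be described as "AltGr+A", which is the shape of the rows in Adam's proposed table.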

Could we take a step back for a moment? If you could clearly explain what
you planned to do with the files it might be easier to suggest a course of
action. All I can do with the current plans is poke holes in the problems
based on my own experience with "keyboard interrogation".

You may also want to mark the thread "OT" since it's not really much to do
with Unicode.


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: Windows and MacOS keyboard layouts in human-readable format?

2003-12-30 Thread Michael \(michka\) Kaplan
From: "Doug Ewell" <[EMAIL PROTECTED]>

> Michael is right, of course, that the differences between keyboards
> amount to more than just the basic layout and Shift and AltGr keys.
> There are custom shifting keys, dead keys, and unusual uses of Caps
> Lock, not to mention Far Eastern IMEs.  But I don't think the whole idea
> needs to be scuttled because of these problems.  Let's see if we can
> expand Adam's chart to cover some of these extra needs.

Adam was looking for this work, already done. And he gave a sample of what
he was hoping would exist, out in the wild. So I was merely pointing out a
bunch of reasons why this was not a practical thing to want, so he should
not be too disappointed when he did not find it (since by its very nature it
would be incomplete).

If there is interest in trying to develop a format and subsequently fill it,
then that is great. But is it really a good idea to discuss such a project
in *this* forum? And does it really meet the criteria Adam laid out?

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: Windows and MacOS keyboard layouts in human-readable format?

2003-12-29 Thread Michael \(michka\) Kaplan
From: "Philippe Verdy" <[EMAIL PROTECTED]>

> If the intent is to display in a user interface which keystroke the user
> must press to create a character sequence it can be useful to know the
> character generated in the default state without modifiers (or the character
> generated in CAPSLOCK mode).

Since the majority of keystrokes in ALL keyboards are created in different
shift states, I am not sure why this relates to the question that Adam was
asking at all (he clearly referenced AltGr out of the starting gate).

> This is an issue for example when creating "accelerators" or keyboard
> shortcuts in an application and one wants to display this shortcut in menu
> items or in button tooltips. (For example, how do you display the "mnemonic"
> in Java menu items, or the "&"-selected character in Windows resources, for
> menu items that display non Latin strings such as Chinese or Arabic? You
> need to display it at end of the menu item, and the underlined convention
> for mnemonics is not usable. But which string will you display? You need
> this information support from the keyboard driver itself or an OS API).

This is an interesting but mostly unrelated topic to either Unicode in
general or Adam's question in particular, but if you understand IMEs and all
of the existing CJK input methods then you know that it is impossible to have
CJK accelerators or keyboard shortcuts.

> Designing keyboard shortcuts that will work in internationalized
> applications is a difficult problem for the localization of applications,
> and there's no way to do it without knowing the exact keyboard layout
> actually used, but that may not be known precisely at run-time, or may be
> customized by the user. Not supporting shortcuts is also a problem for
> accessibility.

Very true -- but also off of the topic -- it's not polite to take over a
thread and push it off in unrelated directions -- let's stick to Adam's
query, here. K?

> Unfortunately, Windows does not help the application to select appropriate
> shortcuts to use to match some prefered "accessible characters present in
> menu items. There's no automatic generation of these shortcuts from menu
> item strings, and no automatic display.

Um, I guess you mean application tools, right? Again off-topic, but this is a
problem that is not very easy to solve and which will often fail even when
it is solved since it assumes that the language to which an application is
localized is also the input language (which is often false).

Let's drop this one, since it is REALLY not what was being asked. Hijacking
of threads is poor netiquette.

> > 2) The fact that many keystrokes produce more than one code point (called
> > ligatures there, although the technology can apply to code point
> > combinations that do not, in fact, make up ligatures)
>
> I suppose you speak about complex keyboards that generate variable-size
> sequences of characters from multiple keystrokes, using complex input
> methods, such as Pinyin input method editors for Chinese.

No, I do not. I am speaking of the fact that any keystroke can be mapped to
between one and four UTF-16 code points. No CJK implied or expressed, as
IMEs have their own unrelated rules.
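A small Python sketch of the constraint described here: a keystroke's output is measured in UTF-16 code units and must fit between one and four of them. The helper names are hypothetical, and the lam + alef pair is just an illustrative two-code-point output.

```python
def utf16_length(text):
    """Number of UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2


def check_ligature(output, limit=4):
    """Accept one keystroke's output only if it fits in 1..limit UTF-16 code units."""
    n = utf16_length(output)
    if not 1 <= n <= limit:
        raise ValueError(f"{output!r} needs {n} UTF-16 code units (allowed: 1 to {limit})")
    return n


print(check_ligature("\u0644\u0627"))   # lam followed by alef: two code units
```

Note that a single supplementary character already uses two of the four units.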

> > 7) No way to explain SGCAPS use of the CAPS lock, used by Hebrew, Czech and
> > others.
>
> Also the Swiss German keyboard...

Actually, ironically, no. The "feature" was added for the sake of that
keyboard (thus the "SG" in the name) but it was not, eventually, used there.
The name simply stuck.

> > 8) No way to describe "custom" shift keys like seen in the [unfortunate]
> > Canadian Multilingual Standard keyboard
>
> Do you mean here a sort of a second AltGr modifier mapped onto an OEM key?

Yes

> Or about the many dead keys it supports to enter accents used in various
> languages?

No

> > I could go on, but you get the idea. There is no simple list because there
> > is no simple format that can describe them.
>
> Notably the complex keyboards that support multiple scripts or large
> scripts: would you show Kangxi radicals on a Chinese keyboard used with an
> ideographic radical composition mode, or the ASCII keycap in the Romaji mode
> of a Japanese keyboard?

Again unrelated to keyboards -- this is an IME issue, not a keyboard issue.
It is also nothing to do with Adam's question.

Let's get back on topic, please.


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies





Re: Windows and MacOS keyboard layouts in human-readable format?

2003-12-29 Thread Michael \(michka\) Kaplan
Such a format for Windows would be quite inadequate since it is missing many
things, such as:

1) The version of Windows in which it first shipped (there were minor
differences in what was in 9x vs. NT, and on NT some characters were added
to keyboards in later versions).

2) The fact that many keystrokes produce more than one code point (called
ligatures there, although the technology can apply to code point combinations
that do not, in fact, make up ligatures)

3) The fact that in many cases combinations of keystrokes produce a single
code point (called dead keys) -- best described by the many combining
characters that go with individual base characters

4) No completely consistent description of OEM keys that is based on letters
that would work for all keyboard hardware

5) Naming inconsistencies between different versions (rare but present on
occasion)

6) No good way to explain when CAPS lock is being used for casing, etc.

7) No way to explain SGCAPS use of the CAPS lock, used by Hebrew, Czech and
others.

8) No way to describe "custom" shift keys like seen in the [unfortunate]
Canadian Multilingual Standard keyboard

I could go on, but you get the idea. There is no simple list because there
is no simple format that can describe them.
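To illustrate item 7, here is a minimal Python model in which CAPS LOCK either just applies case (ordinary behavior) or, with SGCAPS, selects a separate pair of characters. The `KeyEntry` record and its sample values are hypothetical, loosely inspired by a Czech-style number row rather than copied from any shipping layout; the point is that a single "Keystroke" column cannot express the difference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyEntry:
    normal: str                          # no modifiers
    shift: str                           # Shift held
    caps_is_shift: bool = True           # ordinary CAPS LOCK acts like Shift
    sgcaps_normal: Optional[str] = None  # SGCAPS: CAPS LOCK on
    sgcaps_shift: Optional[str] = None   # SGCAPS: CAPS LOCK on + Shift


def resolve(key, shift, caps):
    """Return the character a key produces for a given Shift/CAPS LOCK state."""
    if caps and key.sgcaps_normal is not None:
        # SGCAPS: CAPS LOCK selects a different pair of characters entirely.
        return key.sgcaps_shift if shift else key.sgcaps_normal
    if caps and key.caps_is_shift:
        shift = not shift                # ordinary CAPS LOCK inverts Shift
    return key.shift if shift else key.normal


LETTER = KeyEntry("a", "A")
DIGIT = KeyEntry("1", "!", caps_is_shift=False)
SGCAPS_KEY = KeyEntry("\u011b", "2", caps_is_shift=False,
                      sgcaps_normal="\u011a", sgcaps_shift="2")
```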


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies


- Original Message - 
From: "Adam Twardoch" <[EMAIL PROTECTED]>
To: "Mailing List Unicode" <[EMAIL PROTECTED]>
Sent: Monday, December 29, 2003 2:03 PM
Subject: Windows and MacOS keyboard layouts in human-readable format?


> Do you know if there are human-readable versions of Windows and/or MacOS
> keyboard layouts available somewhere?
>
> I'm looking for a way to compile a table that could look a bit like the
> following:
>
> Platform  Language  Layout                Unicode  Keystroke
> Windows   Polish    Polish (Programmers)  0105     AltGr+A
> Windows   Polish    Polish (Programmers)  0041     A
> ...
>
> where I could, for example, look up exactly which keyboard layouts let the
> user input, say, a with acute, and how he can do that.
>
> Thank you in advance,
> Adam Twardoch
>
>
>
>




Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-11 Thread Michael \(michka\) Kaplan
From: "Peter Kirk" <[EMAIL PROTECTED]>

> Here I disagree. As an application programmer writing for example some
> kind of linguistic application, it is totally irrelevant to me how much
> actual storage a string takes. Such things should be hidden away from me
> by several levels of system software and compilers. An application
> programmer doesn't even need to know what this concept means! Seriously!
> Beginners, even young children, can be taught simple programming and
> string handling without knowing anything about bits and bytes, certainly
> without having to know whether the e acute they just typed is stored as
> one byte or two.

I think you are mostly mistaken here. All of the programmers I know (i.e.
script kiddies need not apply?) call APIs. The bulk of those APIs have no
notion of any of this. They take LPWSTR or WCHAR* parameters, and a developer
who does not know what those are, or who incorrectly assumes that they are
grapheme clusters, will not be able to function very effectively.
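A short Python illustration of why that matters: an API that takes a WCHAR buffer counts UTF-16 code units, which need not match what a user perceives as one character. The helper `utf16_units` is hypothetical; the normalization check uses the standard `unicodedata` module.

```python
import unicodedata


def utf16_units(s):
    """Length of s as a WCHAR (UTF-16) buffer would count it."""
    return len(s.encode("utf-16-le")) // 2


precomposed = "\u00e9"        # e acute as one code point
decomposed = "e\u0301"        # e + COMBINING ACUTE ACCENT
ext_b = "\U00020000"          # one CJK Extension B ideograph

print(utf16_units(precomposed), utf16_units(decomposed), utf16_units(ext_b))  # 1 2 2
print(unicodedata.normalize("NFC", decomposed) == precomposed)                # True
```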

> Just as people can and do learn to drive cars without
> knowing anything about the nuts and bolts or how the engine works.

I think this is more like knowing how to fill the car with gas than knowing
its innards. Most programmers (even ones who DO deal with grapheme clusters)
need to be working below the level to which you are referring here.

MichKa [MS]




Re: Unicode Conferences - Which one?

2003-12-10 Thread Michael \(michka\) Kaplan
I would have to say (as someone who is presenting at both of them!) that it
really depends on the platform upon which you are developing and to some
extent how much you really feel you need to know about particular topics.

The GDDC will cover more info about Microsoft technologies than the IUC
could possibly do, but the IUC casts a wider net and covers platforms beyond
Windows, etc. Because of that, it's not really a good idea to consider it an
either/or proposition (unless there are financial reasons to do so), as each
has a unique focus that allows them to complement each other.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies


- Original Message - 
From: "Tom Tran" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, December 10, 2003 9:03 AM
Subject: Unicode Conferences - Which one?


I need to get up to speed quickly on globalization and localization.  I've
heard about the Unicode Conference and the Microsoft Global Development and
Deployment Conference.  Does anyone have any advice on which one would be
more useful to attend?

Tom





Re: Fonts on Web Pages

2003-12-02 Thread Michael \(michka\) Kaplan
From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I use Microsoft WEFT to embed fonts.  I have had complaints that it does not
> run on non-Windows platforms but then Bitstream does not either.  The
> problem with Bitstream is that it requires an active-x control to be
> installed and many people will not do that.  WEFT is also free.
>
> Try http://www.microsoft.com/typography/default.mspx

The version of WEFT that is up there was having problems with Extension B. I
was working with the Borware folks (reporting a bunch of bugs) and they gave
me a later build that I used to generate the "Surrogate IME" code charts at:

http://i18nwithvb.com/surrogate_ime/code_charts/

Note there are also non-EOT options. EOT files (usually quite small) get
much larger for pages of this size. There were some bugs in Mozilla
originally, which were reportedly fixed, but I have not installed Mozilla
recently to verify.


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Michael \(michka\) Kaplan
I know I'll end up regretting this

From: "Philippe Verdy" <[EMAIL PROTECTED]>

> That far? So why isn't there correct support of UTF-16 on Windows
> 95OSR2, 98, 98SE and ME (notably for their FAT32 filesystem)? I can
> understand it for Windows 95 and 95OSR1 as they were designed before,
> and may be also for the 95OSR2 version despite it was published in
> late 1997, one year after UTF-16 was published.

If you have programmed on Win9x for any length of time then you know the
answer to this and it is simply a straw man to ask why such support was not
done. It's the fundamental nature of what that system does, and it would
require a rewrite to support Unicode -- in fact, it did require a rewrite
(called NT).

There is MSLU, but it is designed for compatibility with Unicode programs on
Win9x, and has the same limitations of non-Unicode support on NT.

> Still I cannot conceive that effectively Windows 2000 disables its
> support for UTF-16 and keeps on using UCS-2 only. This is still
> the case in Windows XP, and Media Center, and I wonder if this
> is still missing in the new Windows 2003 Server or in the next
> coming Windows 2003 workstations.

No, it is not still the case.

And the only thing that is disabled is the rendering. The ability to convert
between GB18030 and UTF-16 is built into XP and Server 2003, and it is
available via an installable package for Windows 2000.
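As a rough illustration of what such a conversion does with a supplementary character, here is a Python sketch using the standard library's gb18030 codec as a stand-in; it is not the Windows implementation. An Extension B ideograph takes four bytes in GB18030 and a surrogate pair (also four bytes) in UTF-16.

```python
text = "\u4e2d\U00020000"          # a BMP ideograph, then an Extension B ideograph
gb = text.encode("gb18030")        # 2 bytes + 4 bytes
utf16 = text.encode("utf-16-le")   # 1 code unit + a surrogate pair (3 code units)

assert gb.decode("gb18030") == text    # the round trip is lossless
print(gb.hex(), utf16.hex())
```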

> How can Microsoft claim they support GB18030 in China in Windows
> XP, Media Center, or Windows 2003 then? Is this support restricted
> to only some APIs directly related to text presentation and handling
> for the GUI (like in GDI+ and UniScribe) but not enforced in more
> system APIs like the NTFS filesystem?

See above. MS not only "claims" support but the support was certified by the
appropriate PRC agency. Is it really necessary to go on about it here, in
relation to unrelated questions about "support of Unicode 4.0" ?


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Michael \(michka\) Kaplan
You are correct, Mark. I could probably intrigue people with tales of
attempts at file systems that change their rules based on locale settings,
but mostly it would just cause nightmares for anyone who understood what a
bad idea that would be. Suffice it to say that Windows will not boot if "I" !=
"i" in a case-insensitive comparison. :-)
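A Python sketch of the distinction Mark raises below: a comparison that only uses one-to-one case mappings keeps "assa" and "aßa" distinct, while full case folding treats them as equal. `simple_upper` approximates a one-to-one mapping by refusing any uppercase form that changes length; it is an assumption for illustration, not the actual table any Windows file system uses.

```python
def simple_upper(ch):
    """One-to-one uppercase: keep ch if its full uppercase form has a different length."""
    up = ch.upper()
    return up if len(up) == 1 else ch   # e.g. 'ß'.upper() is 'SS', so 'ß' is kept


def names_collide(a, b):
    """Case-insensitive equality using only one-to-one (per-character) mappings."""
    return len(a) == len(b) and all(
        simple_upper(x) == simple_upper(y) for x, y in zip(a, b))


print(names_collide("I", "i"))                      # True
print(names_collide("assa", "a\u00dfa"))            # False: both names can coexist
print("assa".casefold() == "a\u00dfa".casefold())   # True under full case folding
```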

To answer the original question, support of Unicode in *any* version of
Windows (or indeed any operating system) is between 1.1 and 4.0, depending
on what feature you are looking at. To answer such a question, the specific
feature about which the questioning party is thinking must be given as a
part of said question.

I would not expect Windows (whose most recent shipping version shipped
before Unicode 4.0 was released) to support 4.0 properties and such. But at
the same time, if you have fonts and build a keyboard you can support any
number of 4.0-only scripts.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

- Original Message - 
From: "Mark E. Shoulson" <[EMAIL PROTECTED]>
To: "Arcane Jill" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, December 01, 2003 5:57 AM
Subject: Re: MS Windows and Unicode 4.0 ?


> Shouldn't it permit "assa" and "aßa" to co-exist?  It isn't like ß is
> canonically equivalent to ss (if I read the file aright, it isn't even
> compatibility equivalent).  It's a language-dependent choice to regard
> them as equivalent.  I'd guess that should be the responsibility of the
> de_DE localization package or something.
>
> ~mark
>
> On 12/01/03 05:26, Arcane Jill wrote:
>
> > The current Windows OS still stores filenames as strings of
> > sixteen-bit wide words (not code points; not characters). It allows
> > filenames "assa" and "aßa" to coexist in the same folder, despite its
> > claim to being case-insensitive, and I have even managed to create
> > filenames containing unmatched surrogate codepoints and noncharacter
> > codepoints.
> >
> > Jill
>
>
>
>




Re: Unicode for Windows CE

2003-11-29 Thread Michael \(michka\) Kaplan
I also sincerely doubt that MSKLC will create keyboards that will work on a
CE device, to tell you the truth. Maybe they do, but they have never been
tested there and I would be surprised if they had no problems (never forget
the First Tester's Axiom!).

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

- Original Message - 
From: "Christopher John Fynn" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, November 29, 2003 10:35 AM
Subject: Re: Unicode for Windows CE


>
> > Thanks for the link. It is good to know that MSKLC can be used for creating
> > Keyboard Driver for WinCE. But is it true only truetype fonts can be used? No
> > OTF?
> > Thanks and regards
> > Mustafa Jabbar
>
> I doubt that PostScript flavour OpenType fonts can be used since that would
> require some form of Adobe Type Manager in Windows CE. Simple TrueType flavour
> OpenType fonts that don't require Uniscribe probably work, but for complex
> script layout for scripts such as Bangla / Bengali there would have to be the
> equivalent of USP10.DLL running in Windows CE - and I've never heard of
> anything like that.
>
> You'd have to try asking on the MS Volt list or someone in Microsoft
> Typography.
>
> Most of what I see listed on the MS web-site is about support for east asian
> (CJK) scripts in Win CE - nothing so far about any complex Indic or Arabic
> scripts.
>
> Personally I wouldn't expect support for complex scripts like Bengali to appear
> in Windows CE until some time after all the main complex scripts are fully
> supported in Windows XP.  Uniscribe (USP10.DLL) is constantly being updated
> with support for new scripts and it would seem to make sense to make a version
> for Win CE only once Uniscribe already has support for more or less all the
> scripts they plan to support. That is unless there is a huge commercial demand
> for complex script support in Win CE and it is both practical and commercially
> worth while for them to implement it.
>
> OpenType fonts for complex scripts on Windows CE would need very good hinting
> and ClearType to be useable since text is rendered at a small size. There is
> probably also the issue of getting handwriting recognition for scripts like
> Bengali to work well since that is the main input method for many CE devices.
>
>  - Chris
>
>
>




Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Michael \(michka\) Kaplan
From: "Frank Yung-Fong Tang" <[EMAIL PROTECTED]>

> so.. in summary, how is your conclusion about the quality of "GB18030"
> support on IE6/Win2K ? If you run the same test on Mozilla / Netscape
> 7.0, what is your conclusion about that quality of support?

In Summary?

Well, in summary, I fail to see how testing for NCRs has anything to do with
support of *any* encoding in a browser. It seems like an inadequate test of
functionality of "gb18030" support.

If you want to test gb18030 support, then please encode a web page in
gb18030 and test *that* in the browser of your choice.
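A minimal Python sketch of the difference: a page that writes a character as an NCR is plain ASCII whatever charset it declares, so it never exercises a GB18030 decoder, while the same text actually encoded in GB18030 does. Python's gb18030 codec stands in here for whatever tool produces the test page.

```python
import html

ch = "\U00020000"                              # a CJK Extension B ideograph
ncr_page = f"<p>&#x{ord(ch):X};</p>"           # character written as an NCR
ncr_bytes = ncr_page.encode("gb18030")         # still pure ASCII
real_page = f"<p>{ch}</p>".encode("gb18030")   # the character itself, GB18030-encoded

print(ncr_bytes.isascii())                     # True: no GB18030 decoding needed
print(real_page.isascii())                     # False: exercises the decoder
print(html.unescape(ncr_page) == f"<p>{ch}</p>")  # True: same text either way
```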

Now if you want to discuss NCR support then that may also be interesting.
But it would be nice to have tests that actually cover what they claim to
cover.

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Michael \(michka\) Kaplan
From: "Philippe Verdy" <[EMAIL PROTECTED]>

> As far as I know, an application has little control on the subset of
> character it can accept from an IME, or keyboard driver, and if some
> characters in the generated combining sequence are ignored, and some other
> are accepted, it creates a new sequence which may not be appropriate in the
> target subset.

Once again, you are not really speaking accurately to what the system is
doing. Though at least you are tempering your words a bit more than you used
to (e.g. "as far as I know...") so that people do not think you are speaking
definitive facts.

In this case, a Unicode application can of course accept anything. A
non-Unicode application is pretty much limited to characters within the
default system code page.

HOWEVER: Note that before one can switch to a particular keyboard layout on
Windows, a WM_INPUTLANGCHANGEREQUEST message is received, asking for
permission to change to a given language. The application can simply not
accept the change, in which case it will not happen. I would humbly suggest
that this gives the application a LOT of control over the subset of
characters it will receive.

HOWEVER REDUX: There is really no case where the behavior you suggest ("if
some characters in the generated combining sequence are ignored, and some
other are accepted, it creates a new sequence which may not be appropriate
in the target subset") happens in CJK, and almost no case where it happens
outside of CJK. What actually happens is that the Unicode character is
converted to the default system code page, which will mostly produce
question marks for all of the characters off of that code page. The one
exception is cases of "best fit" mappings, but this is really not a CJK
phenomenon.
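The question-mark behavior can be approximated in Python as below. This is a sketch, not the Windows conversion: Python's codecs replace each unmappable code point with "?" but do not implement Windows "best fit" mappings, and the code page names are Python's (cp936 is its GBK codec).

```python
def to_ansi(text, code_page="cp1252"):
    """Convert Unicode text to a legacy code page, using '?' for unmappable characters."""
    return text.encode(code_page, errors="replace")


print(to_ansi("caf\u00e9 \u4e2d\u6587"))   # the accented letter survives, the ideographs become '??'
print(to_ansi("\u4e2d\u6587", "cp936"))    # on a GBK code page the ideographs survive
```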

> So clearly this is not a Unicode issue, but an issue with the usability of
> keyboards and IMEs with all applications that are assuming the complete
> support of the keyboard subset in the text they accept (if you don't know
> what I mean, just look at the remaining number of games that are just
> interpreting keycodes or that are assuming a US keyboard layout, and that
> are hard or impossible to use with their default keyboard control
> assignments, as they are mapping for example Alt+digit keystrokes, when some
> keyboards will not be able to generate these keystrokes without an
> additional shift key modifier, and you'll get a good example about the
> usability of an enhanced keyboard in a context where it was assumed that all
> keyboard sequences were possible).

Such behavior is an application limitation that is even beyond the
limitations of non-Unicode apps. But this really has nothing to do with the
original question -- which was actually for a UNICODE application. This
strange CJK tangent is a thread best abandoned.


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Michael \(michka\) Kaplan
From: "Patrick Andries" <[EMAIL PROTECTED]>

> I would like to input arbitrary hexadecimal Unicode values in an
> application (XMetal) which does not seem to use the RichEdit control.

See my other post for the answer to this question.

> Unfortunately, I don't seem to be able to key in a large decimal value
> (outside of win 1252) using the ALT+0xxx convention in XMetal (I'm on a
> US Windows XP). Is this normal?

No idea about this one, though.

> Is it possible - I suspect not - to use the Keyboard Layout Creator to
> specify a similar behaviour to the RichEdit control or the standard ALT+[...]?
> Something like ALT+X+[...] would correspond to the Unicode character
> associated with that hex value. Would be useful, I think.

It's not possible with MSKLC, sorry. Though one could create a keyboard
layout through the Windows DDK that does all kinds of intelligent processing
that you provide the code for. I've never known anyone to do it in a
production system (and I have done it myself only once just to prove it
would work), but it would in theory be possible.
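For an application (rather than a keyboard layout), the text-processing part of such a feature is simple. The following Python sketch of a hypothetical `alt_x` helper, modeled loosely on the RichEdit behavior Patrick mentions, replaces the hex digits before the caret with the character they name; real implementations have additional rules.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def alt_x(text, caret, max_digits=6):
    """Replace the hex digits just before `caret` with the character they name.

    Hypothetical helper: returns (new_text, new_caret), or the input unchanged
    when there are no digits or they do not name a Unicode scalar value.
    """
    start = caret
    while start > 0 and caret - start < max_digits and text[start - 1] in HEX_DIGITS:
        start -= 1
    digits = text[start:caret]
    if not digits:
        return text, caret
    cp = int(digits, 16)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return text, caret            # surrogate or out of range: leave alone
    return text[:start] + chr(cp) + text[caret:], start + 1


print(alt_x("smile 263A", 10))
```

Because the scan is greedy, letters a-f typed immediately before the digits are swallowed too: in "Caf00e9" the digits become "af00e9", which is out of range, so nothing changes. That ambiguity is one reason real implementations need more rules.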

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Michael \(michka\) Kaplan
From: "Peter Kirk" <[EMAIL PROTECTED]>

> On 15/11/2003 09:08, Philippe Verdy wrote:
>
> >... (also in Hebrew and Arabic
> >for vowel points and marks which should better be entered with a logical
> >order after the base letter, even if they produce combining sequences to the
> >application through the WM_CHAR event).
> >
> >
> >
> I'm not sure of your point here. In both Hebrew and Arabic vowel points
> come logically after their base characters and so in the same order as
> in Unicode - with a few insignificant exceptions. The canonical order of
> points after any one base character is not the logical typing order, but
> as canonically equivalent sequences are supposed to be processed the
> same there is no need for a keyboard to ensure that it generates the
> canonical order.

He was thinking about dead keys under Windows, and assuming that a Hebrew
keyboard would use the vowels and points as dead keys. Now there is no
keyboard I know of that does this, but it is what he was thinking.

MichKa




Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Michael \(michka\) Kaplan
From: "Philippe Verdy" <[EMAIL PROTECTED]>

> May be that's something to suggest to Microsoft for inclusion in a future
> version of its wonderful MSKLB tool,

You mean MSKLC? I hope the "B" in your exposition stands for "Bodacious" or
something.

> which works great to create keyboard
> drivers for alphabet/abjad scripts that use dead keys and AltrGr (or
> Alt+Ctrl on US keyboards),

Have you tried the right "Alt" key on such a keyboard? It acts as AltGr
based on the layout, whether it says AltGr or not. Check out the help
file.

> but cannot be used to create complex input
> methods, where it is impractical to use dead keys (also in Hebrew and Arabic
> for vowel points and marks which should better be entered with a logical
> order after the base letter, even if they produce combining sequences to the
> application through the WM_CHAR event).

What are you talking about? Both the built-in Hebrew keyboard and any
keyboard you create can handle all of this just fine, and DO. Dead keys
would be inappropriate, so it's great that they do not use dead keys here --
you type the base letter, then if you need a vowel or a point you type that.

> The  keystroke sequence could be added to
> the existing support of  and  keystrokes... and featured in an update for all supported keyboard
> drivers for Windows.

Nice when people who have not looked at the source can talk about what could
be added. :-)

One has to be careful about how much system support is added for things like
this, as there are those who use keystrokes for their own purposes that can
interfere. Others have noticed that such features seem to fade in and out
from time to time as they are attempted.

> Of course, other complex input methods like the ones for Chinese, Japanese
> or Korean

You mean input method editors (IMEs)?

> (and possibly also for the input method for Vietnamese, unless the
> MSKLB

MSKLC!

> tool can handle less restricted stateful tables with more than one
> dead key) will still need a specific IME development with the additional
> support of a GUI input window allowing searches and selection of ideographs.

Yes, IMEs can be developed. Also note that for Vietnamese there is a
built-in keyboard that works -- have you tried it to type Vietnamese text?

MichKa




Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Michael \(michka\) Kaplan
From: <[EMAIL PROTECTED]>


> > input of this interesting IME that is expecting UTF-16 code units ...
>
> Oh, you did mention that, didn't you?  I wonder if surrogate
> pairs will work here...

They will. It's fairly cool, if you ask me.

Ignore Philippe's ramblings here; they are off-topic and have nothing to do
with Patrick's original question or my answer.

UNICODE apps will work fine with it, and legacy apps will work okay if the
code points you enter are in the default system code page of the system
(which can never be gb18030).

MichKa




Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Michael \(michka\) Kaplan
From: "Patrick Andries" <[EMAIL PROTECTED]>

> I was enquiring about ways of using Unicode on
> a popular platform (Windows).
> I'm terribly sorry to enquire about a practical issue.

No need to apologize. How about a practical answer? :-)

If you install the "Chinese (Traditional) - Unicode" IME as an input method,
then any program that is prepared to accept Unicode input will handle the
input of this interesting IME that is expecting UTF-16 code units. Although
obviously intended for CJK, it can be used for any Unicode character.
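Here is a rough illustration of what "expecting UTF-16 code units" means in practice. A code point above U+FFFF has to be entered as its two surrogate code units, not as the code point itself. The Python sketch below does only that arithmetic. The function name `utf16_code_units` is hypothetical, and the sketch says nothing about how the IME itself is implemented.

```python
from typing import List


def utf16_code_units(hex_code: str) -> List[int]:
    """Return the UTF-16 code unit(s) for a code point given in hex."""
    cp = int(hex_code, 16)
    # Lone surrogates and values past U+10FFFF are not valid scalar values.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]  # BMP: one code unit with the same value
    offset = cp - 0x10000           # 20 bits to split across the pair
    high = 0xD800 + (offset >> 10)  # top 10 bits
    low = 0xDC00 + (offset & 0x3FF) # bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    for code in ("0041", "010D", "1D11E"):
        units = utf16_code_units(code)
        print(code, " ".join(f"{u:04X}" for u in units))
```

For U+1D11E MUSICAL SYMBOL G CLEF this gives D834 DD1E, which is the pair of code units one would key in.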

MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Michael \(michka\) Kaplan
From: "Toyin Ryan" <[EMAIL PROTECTED]>

> Thank you Philippe. The characters you detailed below only seem to work in Word.
> They don't work in DBArtisan or netscape messenger or outlook. Do you have
> anymore ideas?

As I mentioned before, it looks like DBArtisan is not a Unicode application
and your default system locale is not set in such a way that a non-Unicode
application can support these characters.

MichKa





Re: Hacek - Typing from a keyboard... Help!!!!

2003-10-29 Thread Michael \(michka\) Kaplan
From: "Toyin Ryan" <[EMAIL PROTECTED]>

> I am trying to type the 'hacek' diacritic mark above 'c' and 'e' and
> also a straight line (not a tilda) above characters too.
>
> The hacek is a diacritic mark used in the Czech and Lithuanian
> languages. It looks like an upside-down circumflex or a pointed
> breve-essentially a small "v" over the letter.
>
> Please can you tell me what keys on the key pad I should use.

You should look at the keyboard layouts that were designed for the languages
in question (see http://www.microsoft.com/globaldev/reference/keyboards.aspx
for the various layouts).

> However, I want to type these characters in a tool called 'Embarcadero
> DBArtisan, version 7' in order to put the characters in a Sybase
> Database.
> When I currently type the characters in Word and then try to paste them
> in DBArtisan the hacek disappears.

The most likely cause of THAT is the clipboard operation being a CF_TEXT
rather than a CF_UNICODETEXT transfer. This means it is using the default
system code page, and if the characters with the haceks are not on that code
page then they will be munged into the hacek-less versions.

Assuming this is true:

1) Trying to use one of the keyboards may or may not work -- it is possible
for an application to support Unicode but still handle the clipboard
incorrectly (I do not know what DBArtisan supports here specifically).

2) You can change the "Language for non-Unicode Programs" (Regional and
Language Options, on the advanced tab) so that the default system code page
will support the characters in question.
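Option 2 comes down to whether every character you need exists in the system code page. A quick way to check that, outside of Windows, is to encode the text strictly with Python's codecs. In this sketch `cp1252` stands in for a US/Western European system and `cp1250` for a Central European (Czech) one. The helper name is made up for illustration. Python's codecs also do not reproduce the best-fit mappings that Windows may apply when it turns "č" into "c".

```python
def fits_code_page(text: str, codec: str) -> bool:
    """True if every character of text can be encoded in the given code page."""
    try:
        text.encode(codec)  # strict mode: raises on any unmappable character
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "čeština"
    for cp in ("cp1252", "cp1250"):
        print(cp, fits_code_page(sample, cp))
```

With this sample, cp1252 fails on "č" (it does have "š"), while cp1250 covers the whole word.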


MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies




Re: [indic] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-17 Thread Michael \(michka\) Kaplan
These collation tables have one and only one of the following two problems:

A) If these are intended to be language-specific tailorings, then a strong
warning about the linguistic inapplicability needs to be added, since the
data is actually incorrect for the use of most of these scripts.

B) If these tables are just intended to be another view of the default
table, outlining the data by script, then this information should be more
prominently explained.

I suspect that the issue is the one outlined in (B) meaning it is just
better explaining what the data is meant to be, but I have had many
customers claim to me that "Unicode does not understand their
language/script" because they assumed that the issue was as outlined in (A).

Assuming that it is (B), this problem is unfortunately exacerbated by the
many times that the language name == the script name (e.g. in Tamil,
Bengali, and many others). Having the items on the left called out at the
top as SCRIPTS rather than LANGUAGES would probably help with that issue
(though some people do not distinguish even when the difference is explained
clearly).

Of course there is the issue that the UTS does not really seem to reference
this page, but maybe there is a reference somewhere else that is a bit more
of a challenge to find. :-)

MichKa

- Original Message - 
From: "Mark Davis" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Sunday, August 17, 2003 5:37 PM
Subject: [indic] Re: Unicode Collation Algorithm: 4.0 Update (beta)


> There are also beta collation charts in:
>
> http://www.unicode.org/charts/collation/beta/
>
> Mark
> __
> http://www.macchiato.com
> ►  “Eppur si muove” ◄
>
> - Original Message - 
> From: "Rick McGowan" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, August 15, 2003 19:27
> Subject: Unicode Collation Algorithm: 4.0 Update (beta)
>
>
> > The Unicode Technical Committee would like to announce availability of the
> > beta Default Unicode Collation Element Table for UCA 4.0. Feedback is
> > invited.
> >
> > The primary goal of this release is to synchronize the repertoire of
> > strings for collation (sorting) with the repertoire of Unicode 4.0. For
> > future versions of the Unicode Standard that add characters, there will
> > also be versions of the UCA tables with synchronized repertoire.
> >
> > A small number of additional changes have been made for consistency in
> > treatment of new and old characters; however, other changes await working
> > with SC22/WG2 so that future versions of ISO 14651 and UCA can be
> > synchronized.
> >
> > The relevant data file is found here:
> >
> > http://www.unicode.org/reports/tr10/allkeys-4.0.0d1.txt
> >
> > Please also look at the corresponding proposed update version of Unicode
> > Technical Standard #10, The Unicode Collation Algorithm:
> >
> > http://www.unicode.org/reports/tr10/tr10-10.html
> >
> > Due to production difficulties, the beta period for this is quite short;
> > comments for this version must be submitted by end of day, August 26, 2003.
> > However, comments directed to the next version can be submitted after this
> > date. Please submit feedback with the reporting form at:
> >
> > http://www.unicode.org/reporting.html
> >
> > Regards,
> > Rick McGowan
> >
> >
> >
>
>
>




Re: Hexadecimal

2003-08-16 Thread Michael \(michka\) Kaplan
From: "John Cowan" <[EMAIL PROTECTED]>

> If you have ever wondered if you are in hell,
> it has been said, then you are on a well-traveled
> road of spiritual inquiry.  If you are absolutely
> sure you are in hell, however, then you must be
> on the Cross Bronx Expressway.  --Alan Feur, NYTimes, 2002-09-20
>
> John Cowan
> http://www.ccil.org/~cowan
> http://www.reutershealth.com
> [EMAIL PROTECTED]

I think the writer was actually Alan Feuer, native of Shaker Heights, Ohio
(I'll ask him when I see him at my sister's wedding in October).

:-)

MichKa




Re: Vurtual Keyboard!

2003-07-24 Thread Michael \(michka\) Kaplan
From: "Peter Kirk" <[EMAIL PROTECTED]>

> On 24/07/2003 04:33, [EMAIL PROTECTED] wrote:

> > How to map the code values of the key boards with the code values for 
> > the alphabet given with different Code pages of any scripts such as 
> > devanagari, gujarati etc.

> If  you are using Windows, you might like to look at Keyman 6, from 
> www.tavultesoft.com. This is a free download for personal use. There are 
> also freely downloadable keyboard drivers for some Indian languages, or 
> you can write your own using Keyman Developer (not free).

You can also look at the MS Keyboard Layout Creator (MSKLC) from Microsoft:

http://microsoft.com/globaldev/tools/msklc.mspx


MichKa [MS]



Re: [OT] French Government Bans the Term 'E-Mail'

2003-07-21 Thread Michael \(michka\) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>

> I don't know what the "i" in
> the iLifestyle suite (iChat, iPhoto, iBook,
> iThis, iThat) means.

For developers, a capital "I" usually means interface -- in code certainly
but then often applied in life as only geeks can do. I have fond memories of
not too many years ago, wandering around the Red Light district in Amsterdam
with fellow developers Stephen Forte and Richard Campbell, discussing the
important new "IToilet" interface that would revolutionize our interaction
with commodes. Probably just the space cakes talking. Ah, to be young[er]
again :-)

I always assumed the lowercase "i" was meant to be something similar
for devs but to mean something like "information" to normal (i.e.,
non-developer) types. Then, like any concept, it has to be [over]used
everywhere. Maybe someone from Apple who has talked to their marketing folks
lately could comment.

MichKa




Re: About the European MES-2 subset

2003-07-20 Thread Michael \(michka\) Kaplan
Well, I thought Arial Unicode MS is a little pricey for just putting it
anywhere? I may be wrong here (and I have no idea how much it costs,
really), but its huge size compared to megafonts like Code2000, and the fact
that it is based in part on the "rich" Arial typeface heritage, also make it
a font of some value and a legitimate "value add" where it is...

Of course, all of this is IMHO, as I have no real knowledge of what Office
or even nearby Typography think about any of these things.

MichKa [MS]

- Original Message - 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, July 20, 2003 6:20 AM
Subject: Re: About the European MES-2 subset


> > On Windows, the "cannot find a font for it" situation is the NULL glyph.
> The
> > Last Resort font is cool but a Code2000 stab at the actual glyph is
> (IMHO)
> > cooler than both.:-)
>
> Then wouldn't it make sense for Arial Unicode MS to be included with
> Windows rather than just with Office?
>
>
>
> - Peter
>
>
> --
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
>
>
>




Re: About the European MES-2 subset

2003-07-18 Thread Michael \(michka\) Kaplan
I am pretty sure you have to be wrong here, Michael. Attend me:

1) API converts from Unicode to the wrong code page
2) API does some sort of work with the string
3) API tries to display the string

How on earth could it get a glyph from the Last Resort font, unless it is a
generic glyph that contains no script info (which would be no better than a
question mark or a NULL glyph)?

In any case, Code2000 giving some glyph for more cases is still a better
solution.

MichKa

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, July 18, 2003 4:16 PM
Subject: Re: About the European MES-2 subset


> At 15:45 -0700 2003-07-18, Michael \(michka\) Kaplan wrote:
> >A question mark is a sign of a bad conversion from Unicode (to a code page
> >that did not contain the character). This would likely happen on the Mac too
> >rather than the Last Resort font, wouldn't it?
>
> No, it wouldn't. A "not a character" glyph is displayed in the Last
> Resort font.
>
> >On Windows, the "cannot find a font for it" situation is the NULL glyph.
>
> Not much better than "?"
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>




Re: About the European MES-2 subset

2003-07-18 Thread Michael \(michka\) Kaplan
A question mark is a sign of a bad conversion from Unicode (to a code page
that did not contain the character). This would likely happen on the Mac too
rather than the Last Resort font, wouldn't it?
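The "?" case is easy to reproduce. You encode to a code page that lacks the character and let the converter substitute a default. The Python sketch below uses `errors="replace"`, which substitutes "?". Windows conversions can behave differently (for example, with best-fit mappings), so treat this as an analogy and not as a description of any particular API. The helper name is hypothetical.

```python
def to_code_page(text: str, codec: str) -> bytes:
    # Unmappable characters become b"?" instead of raising an error.
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    lossy = to_code_page("Ω = 2π", "cp1252")
    print(lossy)                   # the Greek letters are gone
    print(lossy.decode("cp1252"))  # decoding cannot bring them back
```

Once the bytes contain "?", no font fallback on the display side can recover the original character. That is why the question mark points to a conversion problem and not a font problem.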

On Windows, the "cannot find a font for it" situation is the NULL glyph. The
Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO)
cooler than both.:-)

MichKa

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, July 18, 2003 1:42 PM
Subject: Re: About the European MES-2 subset


> At 11:28 -0400 2003-07-18, John Cowan wrote:
>
> >However, a font like Last Resort (the world's smallest giant font, as it were)
> >does that just about as well.
>
> While I hate seeing the Last Resort font show up, I love seeing it
> when it does. :-) So much better than "?".
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>




Re: [OT] Re: ISO 639 "duplicate" codes

2003-07-12 Thread Michael \(michka\) Kaplan
From: "Doug Ewell" <[EMAIL PROTECTED]>

> Of course, if this is your belief, you are not alone.  The ISO 3166
> Maintenance Agency has now spent five months debating and voting on the
> question of what new codes for "Serbia and Montenegro" should replace
> "YU" and "YUG" used for "Yugoslavia," while some people wonder why the
> codes have to be changed at all, if the country itself has not changed
> but merely its name.

And then, amazingly enough, we come full circle to the "evil" use within
Microsoft of LCIDs (Locale IDs) -- numbers that, in the case of a name
change, can simply have the constant representing the number renamed. The old
constant would be kept (in winnt.h) for compatibility so the old apps would
compile, and the new one (which has an identical number) would be the one
documented. Something similar to what was done with HANGEUL_CHARSET and
HANGUL_CHARSET in wingdi.h?

Not as evil as everyone originally thought? :-)

Of course, LCIDs have limitations too (no need to enumerate them here at
length). Just pointing out that solving one problem will almost invariably
run you into others.

MichKa




Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Michael \(michka\) Kaplan
Thank you for [indirectly] making my point for me. I am saying that if
someone has an issue that *does* make a difference then they should bring it
up.

Otherwise, I say that a difference that makes no difference makes no
difference. And we can move on to actual problems. :-)

MichKa

- Original Message - 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, June 25, 2003 1:08 PM
Subject: Re: Major Defect in Combining Classes of Tibetan Vowels


> Michael Kaplan wrote on 06/25/2003 10:55:47 AM:
>
> > Let me add that this was the case recently for Hebrew (to mention one
> > example). So it is certainly not impossible.
>
> The Hebrew issue is different: that involves things that *are* visually
> distinct, and that distinction cannot be represented in a reliable manner.
>
>
> - Peter
>
>
> --
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
>
>
>




Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Michael \(michka\) Kaplan
Let me add that this was the case recently for Hebrew (to mention one
example). So it is certainly not impossible.

But we have enough real work to do that we should do our best to veer from
the theoretical. :-)

MichKa

- Original Message - 
From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Andrew C. West"
<[EMAIL PROTECTED]>
Sent: Wednesday, June 25, 2003 8:11 AM
Subject: Re: Major Defect in Combining Classes of Tibetan Vowels


> From: "Andrew C. West" <[EMAIL PROTECTED]>
>
> > What I'm suggesting is that although "cui" <0F45, 0F74, 0F72> and "ciu"
> > <0F45, 0F72, 0F74> should be rendered identically, the logical ordering of
> > the codepoints representing the vowels may represent lexical differences
> > that would be lost during the process of normalisation.
>
> Do you (or does anyone) have an actual example where this is the case? It
> may well be true but until someone has a proof there is not really an
> indication of a specific problem for the UTC to address.
>
> The current discussion is like arguing about a color that none of the
> participants have ever seen.
>
> MichKa
>
>
>




Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Michael \(michka\) Kaplan
From: "Andrew C. West" <[EMAIL PROTECTED]>

> What I'm suggesting is that although "cui" <0F45, 0F74, 0F72> and "ciu"
> <0F45, 0F72, 0F74> should be rendered identically, the logical ordering of
> the codepoints representing the vowels may represent lexical differences
> that would be lost during the process of normalisation.

Do you (or does anyone) have an actual example where this is the case? It
may well be true but until someone has a proof there is not really an
indication of a specific problem for the UTC to address.

The current discussion is like arguing about a color that none of the
participants have ever seen.
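The normalization side of Andrew's point is easy to confirm. U+0F72 TIBETAN VOWEL SIGN I has canonical combining class 130, and U+0F74 TIBETAN VOWEL SIGN U has class 132. Canonical reordering therefore puts 0F72 first no matter which order was typed. Combining classes fall under Unicode's stability policy, so this should hold in any Unicode version a current Python ships. The check below uses only the standard `unicodedata` module. `nfd_hex` is just a local helper.

```python
import unicodedata


def nfd_hex(s: str) -> list:
    """Code points of the NFD form of s, as hex strings."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]


cui = "\u0F45\u0F74\u0F72"  # CA, vowel sign U, vowel sign I
ciu = "\u0F45\u0F72\u0F74"  # CA, vowel sign I, vowel sign U

if __name__ == "__main__":
    print([unicodedata.combining(c) for c in "\u0F72\u0F74"])  # [130, 132]
    print(nfd_hex(cui))  # ['0F45', '0F72', '0F74']
    print(nfd_hex(ciu))  # ['0F45', '0F72', '0F74']
```

The check only confirms that normalization merges the two spellings. Whether that merger ever erases a real lexical distinction still needs an attested example.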

MichKa




Re: Revised N2586R

2003-06-23 Thread Michael \(michka\) Kaplan
(reminded of a South Park Episode... the spelling bee in "Hooked on Monkey
Phonics")

excerpt:
-
MAYOR: Here we go - "kroxldyphivc".
KYLE: What?!?
MAYOR: "kroxldyphivc".
KYLE: Definition?
MAYOR: Something which has a kroxldyph-like quality.
KYLE: Uh, could you use it in a sentence?
MAYOR: Certainly -- " 'Kroxldyphivc' is a hard word to spell."
-

Of course, with a definition and usage example like that, it's no wonder Kyle
messed up the word and lost the spelling bee. :-)

MichKa

- Original Message - 
From: <[EMAIL PROTECTED]>
To: "Michael Everson" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Sunday, June 22, 2003 11:07 PM
Subject: Re: Revised N2586R


> It seems to me the proposal would present a stronger case if samples were
> available that were something *other* than an explanation of the symbol in
> a dictionary, encyclopaedia, or other reference. It would be similar to
> these kinds of samples if I were to create a proposal using as a sample
> the Phonetic Symbol Guide, but that might not clearly show if a character
> was something that was merely proposed by someone at one time but never
> actually used -- in such a case, taking a sample from Phonetic Symbol
> Guide does not really demonstrate the need to encode as a character for
> text representation. Likewise, the sample for (e.g.) the fleur-de-lis
> doesn't really provide a case that this should be a character to
> facilitate representation in text. It wouldn't be hard to provide a
> comparable descriptive paragraph that began with an image of the Stars and
> Stripes, but I don't think we'd want to encode the US flag as a character.
>
> I'm not saying that I oppose the proposed characters; just that samples of
> a different nature would make for a stronger case.
>
>
> - Peter
>
>
> --
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
>
>
>




Re: [OT] No more IE for Mac

2003-06-14 Thread Michael \(michka\) Kaplan

- Original Message - 
From: "Philippe Verdy" <[EMAIL PROTECTED]>



This is an equal opportunity forum intended for discussion of issues
relative to Unicode, an industrial consortium that includes (among many
others) the companies you are talking about. Excessive anti-ANYONE talk is
really not productive.

I am very close to just filtering out your mails completely as spam
(something many others now do), but I'd like to avoid that type of extreme
action if possible.

Please?

MichKa
(stranded 589 miles from Seattle)




Re: Exciting new software release!

2003-04-03 Thread Michael \(michka\) Kaplan
From: "William Overington" <[EMAIL PROTECTED]>

> It certainly is exciting!

Whoosh!

MichKa



Re: New contribution

2003-04-01 Thread Michael \(michka\) Kaplan
Well, at least I can remember the date with these types of reminders.
:-)

MichKa

- Original Message - 
From: "Thomas Milo" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "Michael Everson"
<[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, April 01, 2003 8:39 AM
Subject: Re: New contribution


> Very up-to-date! Warmly recommended.
>
> t
>
> - Original Message - 
> From: "Michael Everson" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Tuesday, April 01, 2003 2:15 PM
> Subject: New contribution
>
>
> > A new document:
> >
> > N258A Proposal to encode two COMBINING HEART characters in the UCS
> > by Michael Everson, Roozbeh Pournader, and John Cowan
> > http://www.evertype.com/standards/iso10646/pdf/n258a-heartdot.pdf
> > -- 
> > Michael Everson * * Everson Typography *  * http://www.evertype.com
> >
> >
>
>




Re: Characters for Cakchiquel

2003-03-28 Thread Michael \(michka\) Kaplan
From: "Kenneth Whistler" <[EMAIL PROTECTED]>

> and any such sequences as "quatrillo con coma" or "tz" which
> need to be handled as units simply get contractions defined
> for them in the collation element weighting tables.

Or other, analogous methods to obtain the same results (for products
that do not use the UCA).

:-)

MichKa [MS]




Re: Several BOMs in the same file

2003-03-23 Thread Michael \(michka\) Kaplan
You can remove a per-file prefix, certainly. This would make sense.

But if you do not, what is the harm of a character that you cannot see
and which does not even have width or cause line breaking behavior?
Realistically, what would the problem be?
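If you do want to drop the per-file prefix when joining UTF-8 files, one approach works simply. Decode each piece with Python's `utf-8-sig` codec, which removes one leading U+FEFF if present. Then re-encode the result with a single BOM. This is a sketch for UTF-8 input only, and the function name is illustrative. A U+FEFF that is left in the middle is a format character (general category Cf), which is why it is invisible.

```python
import unicodedata

BOM = "\ufeff"


def concat_utf8(*parts: bytes) -> bytes:
    """Join UTF-8 byte strings, keeping a single leading BOM."""
    # utf-8-sig strips one BOM at the very start of each part, if present.
    text = "".join(p.decode("utf-8-sig") for p in parts)
    return (BOM + text).encode("utf-8")


if __name__ == "__main__":
    file1 = BOM.encode("utf-8") + b"one\n"
    file2 = BOM.encode("utf-8") + b"two\n"
    naive = (file1 + file2).decode("utf-8-sig")  # what `cat` produces
    print(repr(naive))                # 'one\n\ufefftwo\n'
    print(unicodedata.category(BOM))  # Cf
    print(concat_utf8(file1, file2))  # b'\xef\xbb\xbfone\ntwo\n'
```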

MichKa

- Original Message - 
From: "Stefan Persson" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sunday, March 23, 2003 4:14 AM
Subject: Several BOMs in the same file


> Hi!
>
> Let's say that I have two files, namely file1 & file2, in any Unicode
> encoding, both starting with a BOM, and I compile them into one by using
>
> cat file1 file2 > file3
>
> in Unix or
>
> copy file1 + file2 file3
>
> in MS-DOS, file3 will have the following contents:
>
> BOM
> contents from file1
> BOM
> contents from file2
>
> Is this in accordance with the Unicode standard, or do I have to remove
> the second BOM?
>
> Stefan
>
>
>




Re: sorting order between win98/xp

2003-03-14 Thread Michael \(michka\) Kaplan
I have tried it on Win98, WinME, Win2000, WinXP, and Windows Server
2003. LCMapStringA/CompareStringA on all five and
LCMapStringW/CompareStringW on the NT-ish platforms.

I do not and cannot repro the reported problem of your colleague.

Please have that colleague send me some email with their repro
scenario, if they have one (I don't think they do, as the code on the
NT platforms does not have this functionality, even as an option).

MichKa

- Original Message - 
From: "Yung-Fong Tang" <[EMAIL PROTECTED]>
To: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, March 13, 2003 5:20 PM
Subject: Re: sorting order between win98/xp


>
> do you use
> LCMapStringW on WinXP and LCMapStringA on Win98 WITH  LCMAP_SORTKEY to
> generate the SORT KEY ?
> genearate the SORT KEY ?
>
> Have you try on both platforms ? (Win98 and WinXP)?
>
>
> Michael (michka) Kaplan wrote:
>
> >LCMapString does not do the reported behavior either. CompareString
> >and LCMapString are based on the same data and return the same
> >results.
> >
> >Your colleague is mistaken.
> >
> >MichKa
> >
> >- Original Message - 
> >From: "Yung-Fong Tang" <[EMAIL PROTECTED]>
> >To: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
> >Cc: <[EMAIL PROTECTED]>
> >Sent: Thursday, March 13, 2003 4:31 PM
> >Subject: Re: sorting order between win98/xp
> >
> >
> >>We cannot use that. The function you mention is to compare two Unicode
> >>strings.
> >>We need the function to "generate sort key" from unicode strings instead
> >>of compare two string.
> >>
> >>Michael (michka) Kaplan wrote:
> >>
> >>>From: "Yung-Fong Tang" <[EMAIL PROTECTED]>
> >>>
> >>>>One of my colleague ask me this question.
> >>>
> >>>In the interests of completeness
> >>>
> >>>The function that does the type of sorting your colleague noted is
> >>>StrCmpLogicalW in shlwapi.dll, version 5.5 and later. See the
> >>>following link for more information (all on one line in the browser):
> >>>
> >>>http://msdn.microsoft.com/library/en-us/shellcc/platform/shell/reference/shlwapi/string/strcmplogicalw.asp
> >>>
> >>>MichKa
>
>




Re: sorting order between win98/xp

2003-03-13 Thread Michael \(michka\) Kaplan
LCMapString does not do the reported behavior either. CompareString
and LCMapString are based on the same data and return the same
results.

Your colleague is mistaken.

MichKa

- Original Message - 
From: "Yung-Fong Tang" <[EMAIL PROTECTED]>
To: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, March 13, 2003 4:31 PM
Subject: Re: sorting order between win98/xp


> We cannot use that. The function you mention is to compare two Unicode
> strings.
> We need the function to "generate sort key" from unicode strings instead
> of compare two string.
>
> Michael (michka) Kaplan wrote:
>
> >From: "Yung-Fong Tang" <[EMAIL PROTECTED]>
> >
> >
> >
> >>One of my colleague ask me this question.
> >>
> >>
> >
> >In the interests of completeness
> >
> >The function that does the type of sorting your colleague noted is
> >StrCmpLogicalW in shlwapi.dll, version 5.5 and later. See the
> >following link for more information (all on one line in the browser):
> >
> >http://msdn.microsoft.com/library/en-us/shellcc/platform/shell/reference/shlwapi/string/strcmplogicalw.asp
> >
> >MichKa
> >
> >
> >
>
>




Re: sorting order between win98/xp

2003-03-13 Thread Michael \(michka\) Kaplan
From: "Yung-Fong Tang" <[EMAIL PROTECTED]>

> One of my colleague ask me this question.

In the interests of completeness

The function that does the type of sorting your colleague noted is
StrCmpLogicalW in shlwapi.dll, version 5.5 and later. See the
following link for more information (all on one line in the browser):

http://msdn.microsoft.com/library/en-us/shellcc/platform/shell/reference/shlwapi/string/strcmplogicalw.asp
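Some readers may want Explorer-style ordering in their own code and need a sort key, not a comparison function. The general "natural sort" idea can be approximated by splitting out digit runs and comparing them as integers. This sketch is a hypothetical helper. It is not a reimplementation of StrCmpLogicalW, whose exact rules are not described here.

```python
import re
from typing import List, Union

_DIGIT_RUN = re.compile(r"(\d+)")


def logical_sort_key(s: str) -> List[Union[str, int]]:
    """Key that orders digit runs numerically ("#11" before "#100")."""
    parts = _DIGIT_RUN.split(s)
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd indexes are always digit runs and element types line up.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


if __name__ == "__main__":
    names = ["TESTING #1", "TESTING #10", "TESTING #100", "TESTING #11"]
    print(sorted(names))                        # plain code point order
    print(sorted(names, key=logical_sort_key))  # #1, #10, #11, #100
```

The plain sort reproduces the "Win98" list from the original question. The keyed sort gives the numeric order that Explorer shows.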

MichKa




Re: sorting order between win98/xp

2003-03-12 Thread Michael \(michka\) Kaplan
From: "Doug Ewell" <[EMAIL PROTECTED]>

> Note that I'm speaking in terms of programmable sorting.  I really don't
> care how filenames in Windows Explorer are sorted.

Sigh... I wonder if my mail made it to the list?

Doug,

Frank was wrong (or rather his colleague was wrong). CompareString
does not do this thing that was reported, in either its "A" or "W"
flavor.

MichKa




Re: sorting order between win98/xp

2003-03-12 Thread Michael \(michka\) Kaplan
From: "Dominikus Scherkl" <[EMAIL PROTECTED]>

> Yeah! One of the best features of XP - finally I don't need to
> insert leading zeroes to filenames to get them in the proper order
> (even 9a is sorted before 10).
>
> > Anyone know is there a way to make them sort in the same
> > order?
> Why should anybody want that?
>
> > Anyone know why the sort order is different under that two systems?
> As I mentioned: a new feature, keeping numbers ordered numerical.

ADDENDUM:

I cannot repro the reported behavior in CompareString anyway. The
Shell is doing some cool things as Dominikus reports. But those are
changes in the Windows Explorer, not in CompareString.

MichKa




Re: sorting order between win98/xp

2003-03-11 Thread Michael \(michka\) Kaplan
From: "Yung-Fong Tang" <[EMAIL PROTECTED]>

> One of my colleague ask me this question.

Not much to do with Unicode, though, is it?

> We use LCMapStringW on WinXP and LCMapStringA
> on Win98 (by using LCMAP_SORTKEY ). And we got
> different sorting order for the following
>
> Example of message list ordering  in Win98:
> TESTING #1
> TESTING #10
> TESTING #100
> TESTING #11
>
> While, the message list ordering in WinXP:
> TESTING #1
> TESTING #10
> TESTING #11
> TESTING #100
>
> Anyone know is there a way to make them sort in the same order?

Use the same OS? Or at least stay on NT-based systems, rather than
mixing 9x and NT.

> Anyone know why the sort order is different under that two systems?
> They are running under the same locale.

The compat. issues are taken much more seriously now than they were
then (esp. between 9x and NT). Apps running on old platforms are kind
of stuck with the behavior from the time before such a thing existed.
When apps like Office and Jet and SQL Server have wanted to keep
compatible orderings, they actually shipped their own tables for those
platforms.

MichKa




Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Michael \(michka\) Kaplan
From: "Mark Davis" <[EMAIL PROTECTED]>

> I agree with Kent  that it is somewhat less robust to simply remove
> ill-formed sequences, since it removes any indication that the data was
> corrupted.
> corrupted.

Nice that the API gives one the option to choose, huh? ;-)

The point of continuing (even if one is limping along, removing
invalid sequences) is to help the backward-compatibility story, where
there were no errors previously, without adding security holes due
to non-shortest form strings.
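As an illustration of the security point, here is a small Python sketch. `naive_decode_pair` is a hypothetical, deliberately unsafe decoder that skips the shortest-form check, so the overlong bytes C0 AF come out as "/" even though a byte-level filter looking for "/" would not see one; Python's built-in UTF-8 codec refuses to interpret the same bytes.

```python
def naive_decode_pair(lead: int, trail: int) -> str:
    """Hypothetical UNSAFE decoder for one 2-byte UTF-8 sequence: no shortest-form check."""
    # 110xxxxx 10yyyyyy -> xxxxxyyyyyy, without verifying that the result is >= 0x80
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))

overlong_slash = b"\xc0\xaf"   # non-shortest (overlong) form of U+002F SOLIDUS

print(b"/" in overlong_slash)               # False: a byte-level filter sees no slash
print(naive_decode_pair(*overlong_slash))   # /    : but the naive decoder produces one

try:
    overlong_slash.decode("utf-8")          # a conformant decoder refuses to interpret it
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```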

> But the final decision should be made by the user of the API, since the
> desired behavior may vary depending on the environment.

Also agreed.

MichKa




Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-02-28 Thread Michael \(michka\) Kaplan
From: "Yung-Fong Tang" <[EMAIL PROTECTED]>

> When you deal with encoding which need states (ISO-2022, ISO-2022-JP,
> etc) or variable length encoding (Shift_JIS, Big5, UTF-8), then the
> situation is different.

Unicode cannot of course speak for those other encodings, but it can
speak for UTF-8. There is a clear definition and it is up to the
application what it wants to do with sequences deemed irregular or
illegal. The decision is application dependent.

EXAMPLE: In the latest versions of Windows, one can convert from UTF-8
using MultiByteToWideChar. If one passes MB_ERR_INVALID_CHARS then
such an errant string will cause the conversion to fail with an
ERROR_NO_UNICODE_TRANSLATION error. If one does not pass the flag,
then the conversion will simply strip the errant characters. Note that
either behavior satisfies the requirement not to interpret the errant
sequences.

What Netscape wants to do here in Mozilla or elsewhere can also be
based on a decision within Netscape for the most appropriate behavior,
given the definition.
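The choice described above can be mirrored in a tiny wrapper. This Python sketch is only an analogy: `convert_utf8` and its `fail_on_invalid` flag are hypothetical stand-ins for calling `MultiByteToWideChar` with and without `MB_ERR_INVALID_CHARS`, not a binding to the Windows API.

```python
from typing import Optional

def convert_utf8(data: bytes, fail_on_invalid: bool) -> Optional[str]:
    """Hypothetical analogue of the two MultiByteToWideChar behaviors described above."""
    if fail_on_invalid:
        try:
            return data.decode("utf-8")            # strict: any ill-formed byte fails the call
        except UnicodeDecodeError:
            return None                            # caller sees a failure, like ERROR_NO_UNICODE_TRANSLATION
    return data.decode("utf-8", errors="ignore")   # lenient: ill-formed bytes are stripped

sample = b"caf\xc3\xa9 \xff ok"
print(convert_utf8(sample, fail_on_invalid=True))    # None
print(convert_utf8(sample, fail_on_invalid=False))   # café  ok
```

In neither branch is the stray 0xFF byte ever turned into a character, which is the property that matters.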

MichKa [MS]




Re: DBCS and Unicode 3.1

2003-02-17 Thread Michael \(michka\) Kaplan
Well, DBCS means "double byte character set" and thus it is always two
bytes. But it's a theoretical definition, since there are no actual DBCS
code pages -- all of the ones that exist are MBCS (multibyte character
set) since they support both one-byte and two-byte characters.

There are standards like the Chinese GB18030, which supports characters
of 1, 2, or 4 bytes -- definitely MBCS again.
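To see the 1-, 2-, and 4-byte forms side by side, Python's built-in `gb18030` codec can serve as a quick check (a sketch; the byte values come from that codec, not from this thread):

```python
# Byte lengths of a few characters in Python's built-in gb18030 codec.
samples = [
    ("A", "ASCII letter"),
    ("\u4e2d", "CJK ideograph U+4E2D"),
    ("\U00020000", "CJK Extension B ideograph U+20000"),
]

for ch, label in samples:
    encoded = ch.encode("gb18030")
    print(f"{label}: {len(encoded)} byte(s): {encoded.hex(' ')}")
```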

But these code pages are generally owned by outside
governments/agencies, so there is no rule that they need to update
when Unicode does. With the exception of GB18030, they are really
*all* subsets of Unicode.

MichKa

- Original Message - 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, February 17, 2003 2:51 PM
Subject: DBCS and Unicode 3.1


> Hello all,
>
> In the past, DBCS could support characters no larger than 2 bytes.
> Correct?
>
> Now that Unicode 3.1 has broken the two-byte barrier, is there a
> corresponding update for DBCS?
>
> I've been getting most of my DBCS info from these url's:
> http://oss.software.ibm.com/icu/userguide/conversion-data.html
> http://www-919.ibm.com/developer/dbcs/guide3.html#DBCS
>
> Thanks,
>
> Erik Ostermueller
>
>





Re: BOM's at Beginning of Web Pages?

2003-02-16 Thread Michael \(michka\) Kaplan
Given all of the below statements, which are true, I see no reason to
suggest that this be made actively illegal unless one is hoping to
break a lot of clients.

Luckily, even if the HTML standard ever agreed with Roozbeh and leaned
this way, actual browsers would not want to break their customers, so
on the whole they would ignore the directive. So I suppose we can just
drop the whole thing as a really bad idea, resolved "by [real world]
design".

MichKa

- Original Message - 
From: <[EMAIL PROTECTED]>
To: "Roozbeh Pournader" <[EMAIL PROTECTED]>
Cc: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sunday, February 16, 2003 12:35 AM
Subject: Re: BOM's at Beginning of Web Pages?


> .
> Roozbeh Pournader wrote,
>
> > According to the specs, it's illegal, and it doesn't hurt to fix it. So
> > why shouldn't one?
>
> The lack of the BOM in the 'white space' section of the specs may
> just be an oversight.
>
> Since plain text files can have any kind of file extension, and the
> *.TXT extension historically covers many different code pages, some
> people do find the BOM helpful.  It enables some of the editors to
> correctly load a file the first time without having to manually
> reset the encoding format and reload.
>
> You're right about the BOM being irrelevant to the browser, since
> the HTML encoding is supposed to be declared as mark-up in the
> HTML header.  But, at least on Win platforms, when the user (or
> author) views the source, the default editor (usually Notepad)
> seems to require that the BOM be present.  NotePad also (AFAICT)
> automatically inserts the BOM when "file-saving as" UTF-8.
> The non-technical user may not even be aware of this.
>
> I've found the BOM handy, but could probably live without it on
> any of my web pages.  Especially if it's going to display as a
> Euro symbol on some systems...
>
> Best regards,
>
> James Kass
> .
>
>





Re: BOM's at Beginning of Web Pages?

2003-02-16 Thread Michael \(michka\) Kaplan
Well, since the whole web could be full of such pages, fixing the
browser would be a better long-term strategy. In the short term, the
best tool for quick fixes to HTML pages *is* Notepad, which is what is
being blamed for causing the problem. :-)

Has anyone verified that this is actually the cause of the errant
euro, with two simple UTF-8 encoded pages (one with and one without the
BOM)? I still have a hard time seeing how a BOM can cause a euro in
any way other than consulting fees.
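One quick sanity check on the report: if the three UTF-8 BOM bytes were simply misread as windows-1252, they would show up as "ï»¿", with no euro sign among them (the euro is 0x80 in windows-1252). This Python sketch only rules out that one simple misreading; it says nothing about other possible bugs.

```python
bom = "\ufeff".encode("utf-8")

print(bom.hex(" "))                      # ef bb bf
misread = bom.decode("cp1252")           # treat the BOM bytes as windows-1252 text
print(misread)                           # ï»¿
print("\u20ac".encode("cp1252").hex())   # 80 (the euro sign in windows-1252)
print("\u20ac" in misread)               # False
```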

MichKa

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, February 16, 2003 11:20 AM
Subject: Re: BOM's at Beginning of Web Pages?


> At 19:10 -0800 2003-02-15, Michael \(michka\) Kaplan wrote:
>
> >Of course if I had a penny for every byte that has been used
> >discussing these three bytes sometimes found at the beginning of a
> >UTF-8 document, I would not be working this weekend; I'd be
> >somewhere really warm and sunny.
>
> My point was that its being used on the Unicode home page mucks up
> the home page display and so it needs to be deleted from that page.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>





Re: BOM's at Beginning of Web Pages?

2003-02-15 Thread Michael \(michka\) Kaplan
From: "Roozbeh Pournader" <[EMAIL PROTECTED]>

> I agree, but the Unicode web page is the buggy thing here, not the specific
> browser that was reported earlier to have a problem with it. That's all my
> point. One should fix the Unicode web page instead of that browser.

If the problem was indeed due to a BOM, then the answer *is* to fix the
browser. Windows 2000 and XP have shipped onto a gazillion machines,
and a lot of people make quick spot changes to HTML pages in Notepad.
The BOM is here, and any browser that cannot avoid displaying either
a BOM or a ZWNBSP can be classed as a dumb one.

But I am not convinced by the report, to tell you the truth. What kind
of a bug could possibly make a BOM become a Euro? Except for the
bandwidth that gets sucked up in these conversations that some ISPs
make money off of, that is.

> I also personally believe that any browser should fix the small mistakes
> made by the author (or the authoring software) in some way or other, but
> isn't it better for the author not to make the mistake, or fix it when one
> finds about it?

Not really, not in this case, because if you do not want it to be a
BOM, then it's a ZWNBSP. And it's out there on the internet. A browser
cannot bury its head in the sand, and making it "illegal" will not
change anything at all except make more browsers non-conformant.

MichKa





Re: BOM's at Beginning of Web Pages?

2003-02-15 Thread Michael \(michka\) Kaplan
From: "Roozbeh Pournader" <[EMAIL PROTECTED]>

> PS: UTF-16 is an exception to that, since the BOM is not part of the
> document and should be removed for processing.

And to whatever extent UTF-8 has a BOM, it would fall under the same
category. Certainly that is how processors that understand the UTF-8
BOM deal with it.
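Python's codecs happen to illustrate both treatments: `utf-8-sig` strips a leading BOM as a signature, while plain `utf-8` keeps it as U+FEFF, which has no visible representation anyway. A minimal sketch:

```python
page = b"\xef\xbb\xbf<html><body>Hello</body></html>"

as_signature = page.decode("utf-8-sig")  # leading BOM consumed as a signature
as_character = page.decode("utf-8")      # leading U+FEFF kept as a character

print(repr(as_signature[:6]))   # '<html>'
print(repr(as_character[:6]))   # '\ufeff<html'

# utf-8-sig also decodes BOM-less UTF-8 unchanged, so a consumer can use it unconditionally.
print(b"<html>".decode("utf-8-sig"))   # <html>
```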

Rather than treating HTML like the SQL standard (lofty goals that no
one company completely supports because it would be insane to do it!)
they can bend to the actual usage out there and just move on, right?

Even if you ignore the BOM as a BOM, the notion that a zero width
space is legal while a zero width no-break space is not smacks of
silliness. At the beginning of an HTML page you are either going
to not show it because you stripped it as a BOM, or not show it because
there is no visible representation for it.

How many browsers plan to refuse to show pages that do not follow HTML
4.0 rules? :-)

Of course if I had a penny for every byte that has been used
discussing these three bytes sometimes found at the beginning of a
UTF-8 document, I would not be working this weekend; I'd be somewhere
really warm and sunny.

MichKa





Re: CJK test data

2003-02-06 Thread Michael \(michka\) Kaplan
From: <[EMAIL PROTECTED]>

>   1) Sorting Test
> a) include a list of un-ordered strings.
> b) follow that with the same list, ordered properly.

GB18030 does not define a specific standard for sorting (as far as I
know, neither does GB13000). It is an encoding standard.

Since GB18030 covers all of Unicode, this is a good thing.

MichKa





Re: unicode in Mac

2003-01-26 Thread Michael \(michka\) Kaplan
Well, that opening FEFF is either a BOM or a ZWNBSP. Either way, it is
something you would not really see if either the app or the rendering
engine understands Unicode.

MichKa

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 26, 2003 9:36 AM
Subject: Re: unicode in Mac


> At 17:13 + 2003-01-26, Raymond Mercier wrote:
> >Given a plain text unicode file, with the opening byte FEFF, and
> >which displays correctly in Notepad on a PC.
>
> Open it in TextEdit? I wouldn't know about the opening byte stuff.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>





Re: A case for Tamil-X (k sh)

2003-01-08 Thread Michael \(michka\) Kaplan
Hello all,

This was sent out while the requirement was still being discussed
elsewhere. In the not too distant future, one of two things will happen:

A) a character proposal will be submitted, with the right form filled
out and with samples of usage and justification, or

B) nothing will be done, if it is determined that this character is not
required.

MichKa

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, January 08, 2003 10:50 AM
Subject: Re: A case for Tamil-X (k sh)


> At 08:40 -0800 2003-01-08, Doug Ewell wrote:
> >Sinnathurai Srivas  wrote:
> >There may be merit in adding this new "x" character (or perhaps the
> >problem could be solved with ZWNJ or ZWJ), but Michael is correct:
> >although it's a good idea to discuss it on the list first, nothing will
> >be considered for addition unless a proper proposal is written and
> >submitted.
>
> It can't be discussed without seeing the glyphs. So far it is just English!
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>





Re: Coptic II?

2002-12-26 Thread Michael \(michka\) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>

> The Georgian has to do with a false unification (whatever the reasons
> for it, it was a mistake).

In the opinion of some people.

Others, such as the head of the automation dept. of the National
Parliamentary Library of Georgia, do not agree with this assessment. As the
opinion from the library is based on a careful look at their nearly 4
million Georgian books they need to catalog and handle, I cannot help
feeling that there may be some merit to their point of view?

MichKa [MS]





Re: Precomposed Tibetan

2002-12-17 Thread Michael \(michka\) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>

> At 13:53 -0800 2002-12-17, Kenneth Whistler wrote:
>
> >The question for Unicoders is whether introduction of significant
> >normalization problems into Tibetan (for everyone) is a worthwhile tradeoff
> >for this claimed legacy ease of transition for one system, when it is
> >clear that all existing legacy data using these precomposed stacks is
> >going to have to either be reencoded anyway (or surrounded by migration
> >filters for new systems).
>
> Is it a question? To do so would be a disaster for the encoding of
> Tibetan.

Michael,

Everyone here KNOWS this. What Ken was pointing out is that not only will it
create such problems, but it will not solve the problem that they claim it
will. It was an additional reason to say no, and one they might be forced to
acknowledge since it refutes their claims.
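For a concrete sense of the normalization cost, Tibetan already has a few precomposed letters, such as U+0F43 TIBETAN LETTER GHA, which is canonically equivalent to U+0F42 TIBETAN LETTER GA + U+0FB7 TIBETAN SUBJOINED LETTER HA and is excluded from recomposition. A quick check with Python's standard `unicodedata` module (illustrative only; it says nothing about the specific stacks being proposed):

```python
import unicodedata

precomposed = "\u0f43"      # TIBETAN LETTER GHA
sequence = "\u0f42\u0fb7"   # TIBETAN LETTER GA + TIBETAN SUBJOINED LETTER HA

print(precomposed == sequence)                                 # False: different code points
print(unicodedata.normalize("NFD", precomposed) == sequence)   # True: canonically equivalent
# GHA is a composition exclusion, so even NFC produces the two-code-point form:
print(unicodedata.normalize("NFC", precomposed) == sequence)   # True
```

Every such precomposed form means two spellings that software must treat as equal, which is the kind of burden Ken describes.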

MichKa [MS]





Re: Order in which unicode charactoers displayed

2002-12-07 Thread Michael \(michka\) Kaplan
This is expected behavior. It has to do with the "endianness" of the
processor on which you are running. From the glossary
(http://www.unicode.org/glossary/):

Little-endian. A computer architecture that stores multiple-byte numerical
values with the least significant byte (LSB) values first.

Big-endian. A computer architecture that stores multiple-byte numerical
values with the most significant byte (MSB) values first.

You can also find several thousand web pages describing the issue by
searching in google for these two terms.
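The 'abalone' example can be reproduced in Python, assuming the text on disk is UTF-16: the code charts list U+9C8D U+9C7C, and little-endian UTF-16 stores each 16-bit value low byte first, which matches what the hex editor showed. Text stored in UTF-8 or a legacy code page would need yet other byte patterns. `find_utf16` is a hypothetical helper that searches a buffer for both byte orders.

```python
text = "\u9c8d\u9c7c"   # the two ideographs for 'abalone' from the question

print(text.encode("utf-16-be").hex(" "))   # 9c 8d 9c 7c  (code point order, as in the charts)
print(text.encode("utf-16-le").hex(" "))   # 8d 9c 7c 9c  (what the hex editor showed)

def find_utf16(haystack: bytes, needle: str) -> dict:
    """Hypothetical helper: offset of needle in each UTF-16 byte order (-1 if absent)."""
    return {order: haystack.find(needle.encode(f"utf-16-{order}")) for order in ("le", "be")}

disk_image = b"\x00\x00" + text.encode("utf-16-le")   # toy stand-in for raw disk data
print(find_utf16(disk_image, text))                    # {'le': 2, 'be': -1}
```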

MichKa


- Original Message -
From: "Smith, Mike" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, December 07, 2002 1:16 PM
Subject: Order in which unicode charactoers displayed


> Hello
>
> I am attempting to utilise the unicode values from the CJK Unified
> Ideographs to undertake searches for the occurrence of the corresponding
> characters on a hard disk drive.
>
> When I look at a chinese character with a hex editor I get a certain
> order for the hex or unicode value for the character.  For example, the
> english word 'abalone' in chinese has a code of '8D9C 7C9C' when viewed
> in a hex editor, but when I referred to the CJK unicode table, the value
> came out as '9C8D 9C7C'.
>
> Can you explain the different ordering of the code?
>
> When I conducted a test search of the contents of a hard drive known to
> contain the chinese characters for 'abalone' I only found hits on the
> hex values, not on the CJK unicode values.
>
> Any assistance would be appreciated.
>
> Regards
>
> Mike Smith
>
>
>





Re: CJK fonts

2002-12-02 Thread Michael \(michka\) Kaplan
Hello Raymond,

If you have Chinese Office XP and the font in question (Simsun Founder
Extended), then you already have all of the contents of
http://i18nwithvb.com/surrogate_ime/ , in Chinese -- the translation on the
web site is just a handy adjunct for people who do not understand Chinese.
:-)

As for contacting Borware, I do not know how you tried to do this...
although the web site appears to be down -- Michael Jansson of Borware has
posted to this list in the past. You can find more information about WEFT
from http://microsoft.com/typography/web/embedding/weft3/default.htm and a
simple search in google for "WEFT BORWARE" found many hits.

MichKa

- Original Message -
From: "Raymond Mercier" <[EMAIL PROTECTED]>
To: "Tom Gewecke" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, December 02, 2002 4:26 AM
Subject: Re: CJK fonts


> Tom,
> I have downloaded what I can from these sites, but can't get a reply from
> Borware.com. Have you found this site?
> I am troubled also because the font Ming(for ISO10646) won't work in my
> IE6. I have written a windows utility to exploit unihan.txt, by
> listing  all the characters for a given Pinyin, and hoped that some at
> least of the Ext A characters would appear, but none do. I will study the
> msdn page that was referred to, where they suggest a modification of the
> registry, and try also specifying the font in my display. The program works
> by embedding IE to display characters.
>
> Raymond
>
>
>
> >Some info is at:
> >
> >http://www.i18nwithvb.com/surrogate_ime/code_charts/
>
>
>





Re: Emergency help required!

2002-11-14 Thread Michael \(michka\) Kaplan
This person has installed an application that is attempting to use the
Unicode version of ATL.DLL on Win9x, which will fail (this is expected).

This error would not itself cause the "freeze" (it comes from code that makes
the load fail right after clicking "OK" on the message box); the freeze might
instead be related to the application (whatever it may be) being
unable to load properly.

MichKa


- Original Message -
From: "Magda Danish (Unicode)" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, November 14, 2002 12:40 PM
Subject: Emergency help required!


Hi Unicoders,

Does anyone have any idea what is causing this person's computer to
freeze over a "Unicode" issue. Any help will be greatly appreciated.

Magda.

-Original Message-
From: barbara kolender [mailto:barbarakolender@;hotmail.com]
Sent: Thursday, November 14, 2002 12:04 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: error message


As per our conversation,

The error message:
"CAN NOT RUN UNICODE VERSION OF ATL.DLL ON WINDOWS 95.
PLEASE INSTALL CURRENT VERSION"
appears everytime I open anything on my computer. We are running
Windows 98. This message is causing the computer to freeze constantly
and I cannot run any new programs.

PLEASE HELP.

Thanks,
Barbara & Merrie-Paralegals
Law Office of Marc Simon









Re: IBM AIX 5 and GB18030

2002-11-14 Thread Michael \(michka\) Kaplan
From: "Carl W. Brown" <[EMAIL PROTECTED]>

> Other companies
> like Microsoft took a very big gamble and implemented the code for surrogate
> support into Windows 2000 based on early drafts of the Unicode standard. If
> they had not done it this way or had guessed wrong they might not even have
> support in Windows XP.

Not to quibble but this is really not exactly right. The mechanism for
surrogate pairs to support what would later be called supplementary
characters has existed since Unicode 2.1. There was no "draft version of a
standard" that was behind Windows 2000's support of supplementary
characters.

Big companies do not stop on a dime on decisions like this, but they also do
not make turns down roads that have not yet been opened. :-)

MichKa





Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael \(michka\) Kaplan
From: "John Hudson" <[EMAIL PROTECTED]>

> At 13:50 11/11/2002, Michael Everson wrote:
>
> >By the way MichKa if you make the boxes a bit wider the whole string of
> >numbers would display.
>
> I noticed the same problem in Opera. It's okay in IE.

Ah, if I called *that* by design, someone might accuse me of global
conspiracy. :-)

Never mind, it wasn't that funny. I went ahead and updated the page; it
should work well in "Opera Compatibility" mode.

Michael, in answer to your request for a UTF-8 converter, that will have to
be another day (it's a bit more complicated, and I spend most of my time in
UTF-16 and UTF-32, so I can't really pretend it's work related). If you wanted
to provide the code in VBScript or JScript I will add it to the page (and
give you credit, of course).
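For anyone who wants to port the byte-level step to VBScript or JScript, here is the bit arithmetic as a Python sketch (not the code behind the page discussed here; `utf8_bytes` is a hypothetical helper):

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (bit-level sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([         # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])

print(utf8_bytes(0x10312).hex(" ").upper())   # F0 90 8C 92 (OLD ITALIC LETTER KU)
```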

MichKa





Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael \(michka\) Kaplan
From: "Michael Everson" <[EMAIL PROTECTED]>
> At 12:10 -0700 2002-11-11, John Hudson wrote:

> >Many thanks to the various people who recommended Michael Kaplan's
> >calculator at http://trigeminal.com/16to32AndBack.asp
> >
> >This is excellent and solves my problem.

Glad you like it, John. I am sure James Kass remembers when I put it up;
it was actually because of a complaint that there wasn't such a thing and
there ought to be.

> Perhaps it is just me, but terms like scalar value just don't mean
> anything to me. It rather reminds me of reptilian skin shedding.

Since I do not use that term on my site, I assume you are referring to
someone else's resource? :-)

> I visited MichKa's page and tried typing in 10312 (OLD ITALIC LETTER
> KU) and it did convert to a surrogate pair. I wonder what would
> happen if I pasted it into an HTML document. Hmm but I couldn't do
> that until I converted them to UTF-8

Well, since the page advertises itself as a UTF-16/UTF-32 sort of converter,
I would hope that the lack of UTF-8 byte conversion would be expected.
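The UTF-16/UTF-32 conversion such a calculator performs comes down to a little arithmetic. A Python sketch (not the code behind the page; `to_surrogates` and `from_surrogates` are hypothetical names), using John's example of U+10312 OLD ITALIC LETTER KU:

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                             # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def from_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into a code point (the UTF-32 value)."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x10312)
print(f"{high:04X} {low:04X}")              # D800 DF12
print(f"{from_surrogates(high, low):X}")    # 10312
```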

> By the way MichKa if you make the boxes a bit wider the whole string
> of numbers would display.

What numbers did not display for you? They all fit for me.

MichKa





Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]>

> No, the notation to say "BOM required (report any files without BOM)", "BOM
> not allowed (report any files with BOM)", or "BOM optional (only report
> files if they are not valid UTF-8 at all)", for a given file type.

Well, yes, if you wanted to avoid making it SIMPLE. A low level conversion
application does not need to care about such things, and most would refuse
to bother with a feature like that.

And once again, adding a name would not solve the problem.

MichKa





Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]>

> Yes, it's trivial to check. What's missing is the notation to tell the
> checker what to check for.

Sorry, but that is incorrect. If they know it's UTF-8, then it's either a BOM
or it's not. It is three specific bytes.

> Yes, this is a good description of the sad state of existing software.
> Noting that failure to standardize is irritating and unnecessary doesn't
> make existing software go away.

None of which is "fixed" by naming it.

Your suggestion does not solve the problem, to the extent that it is a
problem?

MichKa





Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]>

Joseph,

> Software currently under development could use the identifiers for choosing
> whether to require or emit BOM, like the file requirements checker I have to
> write, and ICU/uconv.

Let's separate that into the two issues it represents:

EMITTING: They could simply choose globally whether to emit the BOM or not.
If they wanted to get "fancy" they could have a command line option which
said whether to emit the bytes or not. But that is optional.

INCOMING TEXT: Trivial to simply check. I say (once again): it's THREE BYTES.
If they are there, then there is a BOM. Simple.
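To make the "three bytes" point concrete, here is a minimal Python sketch of both halves: emitting UTF-8 with or without a BOM under one flag, and a checker that takes the BOM requirement as a separate policy argument rather than as a distinct charset name. The function names and the `required`/`forbidden`/`optional` policy values are hypothetical, loosely modeled on the checker Joseph describes.

```python
UTF8_BOM = b"\xef\xbb\xbf"
POLICIES = ("required", "forbidden", "optional")

def emit_utf8(text: str, with_bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the three BOM bytes."""
    body = text.encode("utf-8")
    return UTF8_BOM + body if with_bom else body

def check_utf8(data: bytes, policy: str) -> list:
    """Return a list of problems; an empty list means the file passes."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}")
    problems = []
    try:
        data.decode("utf-8")                 # a leading BOM is itself valid UTF-8
    except UnicodeDecodeError as err:
        problems.append(f"invalid UTF-8 at byte {err.start}")
    has_bom = data.startswith(UTF8_BOM)      # the entire BOM test: three bytes
    if policy == "required" and not has_bom:
        problems.append("missing BOM")
    elif policy == "forbidden" and has_bom:
        problems.append("unexpected BOM")
    return problems
```

The charset name stays plain UTF-8 in every case; only the policy argument changes.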

> The inability to update to one standard all possible consuming software one
> might encounter (or for that matter human customers' opinions) is precisely
> why producing and checking software has to handle both possibilities.

But the "both possibilities" are trivial, and it's by no means difficult to do.
Having a good program that refuses to do a little work to handle three bytes
is like someone who runs a 100 mile marathon and then refuses to cross the
finish line because the line is yellow instead of white.

> What would you mean by "the right thing" as far as emitting BOM? Should file
> conversion programs only allow output of non-BOM? (or with-BOM?) Or should
> they take the specification in an argument separate from the charset name?
> As said before this unnecessarily requires extra logic.

Already answered: they can make a global decision, like Notepad or other
programs do. If the programmer finds the idea of making it a setting a
huge hardship, they can skip that work and simply choose whether they want
it or not.

I plead with you -- keep it SIMPLE. :-)

MichKa





Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

2002-11-04 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]>

> Thanks for the dozens of responses discussing consumers' behavior on UTF-8
> BOM. This is actually not what I'm concerned with, as I have to take it as a
> given that there is both software that wants UTF-8 BOM and software that
> doesn't want it.
>
> Could we evaluate the need for separate identifiers for producing or
> describing UTF-8 with and without BOM, or viable alternatives to use in
> control input to a file encoding converter program or encoding checker
> program.

Joseph,

How on earth could a separate identifier be USED unless software were updated
to use it? And if they are updating to do this, why couldn't they just fix
it anyway to do the right thing?

There is no need here for separate identifiers, as they would not solve the
problem, to the extent that a problem actually exists (I have yet to see
proof that there is such a problem?).

MichKa





Re: Names for UTF-8 with and without BOM

2002-11-03 Thread Michael \(michka\) Kaplan
From: "Mark Davis" <[EMAIL PROTECTED]>

Ironic that for the purpose of dealing with THREE bytes that so many bytes
are being wasted. :-)

> Little probability that right double quote would appear at the start of a
> document either. Doesn't mean that you are free to delete it (*and* say that
> you are not modifying the contents).

Interesting straw man there, Mark, but there is a huge difference.
Even if we leave in the notion of it as a character and just deprecate
its usage, and people ignore that, then we are talking about a ZERO WIDTH
NO-BREAK SPACE. This character has the job of:

1) being invisible
2) preventing a line break at its position

So even if it were in there, who cares? I mean, can anyone explain why it
would make a difference?

The one thing that no one has ever come up with is a reasonable case where
it would be at the beginning of the document *yet* it was not a BOM.

So we have a clear semantic for it at the beginning of a file: it's a BOM.
Period.

If there is a higher level protocol as well and the protocol and the BOM
both match, then that is great! Considering how much redundancy there is in
the Unicode standard about some definitions, a redundant marker for a file
seems a very trivial issue.

If there is a higher level protocol as well and they do not match, then we
are in fantasy land bizarro world, inventing edge cases because we have
nothing better to do. :-)  But for the sake of argument, let's pretend it's a
real scenario, in which case we treat it the same way as if your higher
level protocol claims it's ISO-8859-1 and the BOM says it's UTF-32. It's an
error.

Problem solved!
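The rule argued for here can be sketched in Python: treat leading BOM bytes as a signature, and treat a conflict between the signature and the higher-level label as an error. `resolve_encoding` and its BOM table are illustrative assumptions, not any product's actual logic. Note that it follows this message's "always a BOM" position, so a UTF-16BE label with FE FF is accepted as a matching signature rather than read as a ZWNBSP.

```python
# (leading bytes, encoding family, byte order). The longer UTF-32 signatures are
# checked first, because the UTF-32LE BOM begins with the UTF-16LE BOM.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32", "be"),
    (b"\xff\xfe\x00\x00", "utf-32", "le"),
    (b"\xef\xbb\xbf", "utf-8", None),
    (b"\xfe\xff", "utf-16", "be"),
    (b"\xff\xfe", "utf-16", "le"),
]

def resolve_encoding(declared: str, data: bytes) -> str:
    """Use the BOM if present; a BOM that contradicts the declared label is an error."""
    label = declared.lower().replace("_", "-")
    for bom, family, order in BOMS:
        if data.startswith(bom):
            exact = f"{family}-{order}" if order else family
            if label not in (family, exact):
                raise ValueError(f"label says {declared!r} but BOM says {exact!r}")
            return exact
    return label   # no BOM: rely on the higher-level protocol alone
```

For example, `resolve_encoding("UTF-16", b"\xff\xfea\x00")` returns `"utf-16-le"`, while `resolve_encoding("ISO-8859-1", ...)` on data starting with FF FE 00 00 raises `ValueError`, which is the "it's an error" case above.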

> I agree that when the UTC decides that a BOM is *only* to be used as a
> signature, and that it would be ok to delete it anywhere in a document (like
> a non-character), then we are in much better shape. This was, as a matter of
> fact proposed for 3.2, but not approved. If we did that for 4.0, then there
> would be much less reason to distinguish UTF-8 'withBOM' from UTF-8
> 'withoutBOM'.

There is no reason to worry about this case and no need to delete anything.
This is a ZERO WIDTH NO BREAK SPACE we are talking about. The burden is on
the people who think this is a scenario to bring proof that anyone is doing
anything as unrealistic as this.

There is an easy, clear, and unambiguous plan that can be used here which
will always work. For once, let's not opt to complicate it without reason.

MichKa





Re: Names for UTF-8 with and without BOM

2002-11-03 Thread Michael \(michka\) Kaplan
From: <[EMAIL PROTECTED]>

> In particular, I'm thinking of a situation about a year and a half ago
> (IIRC) in which Michael (and I and others) were strongly opposed to a
> suggestion that the Unicode Consortium should document a certain variation
> (perversion, some would say) of one of the Unicode encoding forms that a
> certain vendor had implemented in their software. On that occasion,
> Michael (and I and others) were arguing that, just because they had done
> something in their software, that shouldn't mean that the rest of the
> world should be forced to support their encoding form.
>
> I find it interesting, then, to see Michael saying that, since Notepad
> sticks a BOM-cum-signature at the start of its UTF-8, the rest of the
> world should support it.

I do not see the conflict, or the irony? Remember that what Notepad and
others do is present mainly because it *is* in the XML standard. What was
being done by those others with UTF-8 was not a part of the UTF-8 "standard"
and was in fact specifically disallowed. In the end, note that UTF-8 was not
compromised; they got their own [non-preferred] encoding scheme for their
backcompat requirement, and they now have the "job" of making their products
use it in name.

If someone has a bug or problem in their software, then it is of course
their responsibility to fix it. On the other hand, if one pays attention to
a possible (optional) recommendation in a standard, it is the standard's
responsibility to not make people regret that they paid attention?

(Which is not to say that they got the "idea" from XML; I am not sure where
the idea came from. I figure that there was a strong interest in making sure
that when someone saved a file as UTF-8 that when reloaded it would still be
considered UTF-8, rather than ASCII or ANSI [sic]. This is a good reason for
such a decision in plain text -- and the fact that XML is after all "just
text" is lost on no one...)

Given the strong lack of interest that XML has had in the notion of breaking
old parsers or valid XML 1.0 streams, it seems unlikely (to me) that they
would make such a breaking change in a future version of XML.

MichKa





Re: Names for UTF-8 with and without BOM

2002-11-02 Thread Michael \(michka\) Kaplan
You are mistaken about this -- XML claimed originally that it was valid but
was not required.

The notion that XML parsers would update to handle a new encoding form to
strip off three bytes but would not conditionally strip those three bytes if
they were the first three bytes of the file is an unrealistic one.

MichKa

- Original Message -
From: "Tex Texin" <[EMAIL PROTECTED]>
To: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
Cc: "Mark Davis" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, November 02, 2002 11:08 AM
Subject: Re: Names for UTF-8 with and without BOM


> "Michael (michka) Kaplan" wrote:
> > > .xml UTF-8N Some XML processors may not cope with BOM
> >
> > Maybe they need to upgrade? Since people often edit the files in notepad,
> > many files are going to have it. A parser that cannot accept this reality is
> > not going to make it very long.
>
> I didn't think the XML standard allowed for utf-8 files to have a BOM.
> The standard is quite clear about requiring 0xFEFF for utf-16.
> I would have thought a proper parser would reject a non-utf-16 file
> beginning with something other than "<".
>
> (The fact that notepad puts it there should be irrelevant.)
>
> Am I wrong about XML and the utf-8 signature?
>
> tex
>
>
> --
> -
> Tex Texin   cell: +1 781 789 1898   mailto:Tex@;XenCraft.com
> Xen Master  http://www.i18nGuy.com
>
> XenCraft http://www.XenCraft.com
> Making e-Business Work Around the World
> -
>
>





Re: Names for UTF-8 with and without BOM

2002-11-02 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]>

> These are listed as examples to demonstrate the idea of a configuration file
> listing encoding constraints. The fact that each constraint is arguable is a
> good reason to make the constraints configurable, and therefore to have
> names to distinguish BOM and non-BOM UTF-8.

Yes, but the fact that every one of them can have it or not, and only
inadequate parsers will ever really have a problem with them, is a good
indication that separate charset names are not really required, even for
the users who care about this.

MichKa





Re: Names for UTF-8 with and without BOM

2002-11-02 Thread Michael \(michka\) Kaplan
From: "Joseph Boyle" <[EMAIL PROTECTED]>

> Type Encoding Comment
> .txt UTF-8BOM We want plain text files to have BOM to distinguish
> from legacy codepage files

Not really required, but optional -- the performance hit of making sure it's
valid UTF-8 is pretty minor. But people do open some *huge* text files in
things like Notepad.
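The "making sure it's valid UTF-8" check is itself short. In this Python sketch, `looks_like_utf8` is a hypothetical helper; the optional `limit` trades accuracy for speed on huge files by checking only a prefix, and an incremental decoder keeps a multi-byte sequence cut off at that limit from counting as an error. Pure ASCII also passes, which is fine since ASCII is valid UTF-8.

```python
import codecs
from typing import Optional

def looks_like_utf8(data: bytes, limit: Optional[int] = None) -> bool:
    """True if data (or just its first `limit` bytes) is well-formed UTF-8."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    chunk = data if limit is None else data[:limit]
    # When only a prefix is checked, final=False lets a multi-byte sequence
    # cut off at the limit wait for more input instead of failing.
    final = len(chunk) == len(data)
    try:
        decoder.decode(chunk, final=final)
        return True
    except UnicodeDecodeError:
        return False
```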

> .xml UTF-8N Some XML processors may not cope with BOM

Maybe they need to upgrade? Since people often edit the files in notepad,
many files are going to have it. A parser that cannot accept this reality is
not going to make it very long.
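A minimal sketch of the kind of tolerance suggested here, assuming Python's standard xml.etree.ElementTree; the wrapper name parse_xml_tolerant is hypothetical. It simply drops a leading UTF-8 signature before parsing, so files saved by an editor that adds one still load.

```python
import codecs
import xml.etree.ElementTree as ET


def parse_xml_tolerant(data: bytes) -> ET.Element:
    """Hypothetical wrapper: parse XML bytes whether or not they start with a UTF-8 BOM."""
    # Strip the EF BB BF signature if present; the rest is ordinary UTF-8 XML.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return ET.fromstring(data)


root = parse_xml_tolerant(b'\xef\xbb\xbf<?xml version="1.0" encoding="UTF-8"?><doc>hi</doc>')
print(root.tag, root.text)
```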

> .htm UTF-8 We want HTML to be UTF-8 but will not insist on BOM

Same as text, with the bonus of the possibility of a higher level protocol.
It can still go either way.

> .rc Codepage Unfortunately compiler insists on these being
> codepage.

They can be UTF-16, too (at least on Win32!).

> .swt ASCII Nonlocalizable internal format, must be ASCII.

Haven't run across these -- but note that if it's not UTF-8 then the BOM
question does not apply.





Re: Names for UTF-8 with and without BOM

2002-11-02 Thread Michael \(michka\) Kaplan
From: "Mark Davis" <[EMAIL PROTECTED]>

> That is not sufficient. The first three bytes could represent a real content
> character, ZWNBSP or they could be a BOM. The label doesn't tell you.

There are several problems with this supposition -- most notably the fact
that the standard specifically says this use of U+FEFF as ZWNBSP is not
recommended and that U+2060 (WORD JOINER) is preferred.

> This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE 0xFF
> represents a BOM, and is not part of the content. In the second case, it
> does *not* represent a BOM -- it represents a ZWNBSP, and must not be
> stripped. The difference here is that the encoding name tells you exactly
> what the situation is.

I do not see this as a realistic scenario.  I would argue that if the BOM
matches the encoding scheme, perhaps this was an intentional effort to make
sure that applications which may not understand the higher level protocol
can also see what the encoding scheme is.

But suppose someone has gone to the trouble of labeling something UTF-16BE
and it has 0xFE 0xFF at the beginning of the file: what kind of content *is*
such a code point that this is even worth calling out as a special case?

If the goal is clear and unambiguous text, then the best way would be to
simplify ALL of this. It was previously decided to always call it a BOM, so
why not stick with that?
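The distinction Mark Davis describes can be seen in Python's codecs (these are Python's own codec names, not the charset labels under debate): a codec that leaves byte order to the data consumes a leading FE FF as a BOM, while an explicitly big-endian codec keeps it as U+FEFF (ZWNBSP) content. Python's utf-8-sig / utf-8 pair splits the same way for UTF-8.

```python
bom_then_a = b"\xfe\xff\x00A"   # FE FF, then U+0041 in big-endian order

# Byte order left to the data: FE FF is read as a BOM and consumed.
as_utf16 = bom_then_a.decode("utf-16")
# Byte order fixed by the codec name: FE FF is content (U+FEFF) and is kept.
as_utf16be = bom_then_a.decode("utf-16-be")

# The same split for UTF-8 with and without a signature.
utf8_bytes = b"\xef\xbb\xbfA"
as_utf8_sig = utf8_bytes.decode("utf-8-sig")   # signature stripped
as_utf8 = utf8_bytes.decode("utf-8")           # signature kept as U+FEFF

print(repr(as_utf16), repr(as_utf16be))   # 'A' '\ufeffA'
print(repr(as_utf8_sig), repr(as_utf8))   # 'A' '\ufeffA'
```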

MichKa





Re: how to display Unicode

2002-10-29 Thread Michael \(michka\) Kaplan



No links to give, just a note to warn you that VB itself converts text that
it puts into the RichEdit control from Unicode to the default system code
page when it assigns the text.

Technically the control supports Unicode, since its interfaces are Unicode,
but this conversion does limit the text that can be supported.
 
This makes it impossible through supported 
means to support any characters not on the default system code page when you are 
using VB6 (a limitation that has been addressed in VB.Net).
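A rough simulation of that conversion, assuming Windows-1252 as the default system code page; the helper name is hypothetical, and Python's errors="replace" stands in for the '?' default character Windows typically substitutes (Windows may also apply "best fit" mappings, which this sketch does not model).

```python
def through_code_page(text: str, code_page: str = "cp1252") -> str:
    """Hypothetical helper: convert Unicode text to an ANSI code page and back."""
    # Unmappable characters become b'?' on the way out, so they stay '?' on the way back.
    ansi_bytes = text.encode(code_page, errors="replace")
    return ansi_bytes.decode(code_page)


print(through_code_page("café"))            # café (é exists in Windows-1252)
print(through_code_page("Unicode हिन्दी"))  # Unicode ?????? (one '?' per code point)
```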
 
MichKa
 

  - Original Message -
  From: nandu patil
  To: [EMAIL PROTECTED]
  Sent: Tuesday, October 29, 2002 1:21 AM
  Subject: how to display Unicode

  Hi Friends,
      I have to display Unicode on RIchEdit control of VB6.
  Can you help me how to display the unicode in VB application. If possible
  give me the references and links.
  Thanks
  Nandlal
   
  
  


Re: Character identities

2002-10-28 Thread Michael \(michka\) Kaplan
All this talk about the letter "A" reminded me of something from Hofstadter:

"The problem of intelligence, as I see it is to understand the fluid nature
of mental categories, to understand the invariant cores of percepts such as
your mother’s face, to understand the strangely flexible yet strong
boundaries of concepts such as “chair” or the letter “a” … The central
problem of (artificial intelligence) is the question: What is the letter ‘a’
and ‘i’? ...By making these claims, I am suggesting that, for any program to
handle letterforms with the flexibility that human beings do, it would have
to possess full-scale general intelligence."

-- Douglas R. Hofstadter, from one of his Metamagical Themas articles

The notion that we could ever capture the essence of "A-ness" has already
been discussed at length and dismissed as impossible without an AI
breakthrough. :-)

MichKa





Re: Nedd Help

2002-10-19 Thread Michael \(michka\) Kaplan
From: nandu patil

> I wanted to do all encodings in C++ so i can write a library (DLL) which
> will handle all this stuff

Do you need to communicate with other applications that do not support
Unicode? If not, then you have no need to support "encodings" at all (since
you are using Unicode and you have no need to convert data to and from it).

If you do need to communicate with legacy (non-Unicode) applications and you
are on Windows, then how easy or hard it is depends on what you are
converting from. If you are using VB.Net or C# (see below), then a DLL
becomes much less necessary, since the .Net framework has all of the support
you would need without requiring an external DLL anyway!

> but I want to disply this hindi in VB application.

If you mean VB.Net, this can work quite well... it will support the input
methods, fonts, rendering, collation, and locale information provided in
Windows 2000/XP/.Net Server. It also will handle a lot of things you were
thinking about doing in a separate DLL.

But if you mean VB5 or VB6, you will find this to be quite a challenge, since
the forms and intrinsic controls support only non-Unicode interfaces (and
convert everything to the default system code page, which never supports
Hindi).

> Please help me to do this Multilangual application,I dont know how to start
> development using Unicode. my guide suggested me to use unicode,

Well, to start, deciding on your programming tools would be important -- it
is crucial to have a foundation before you start building.

MichKa





Re: Hindi keyboard with the Microsoft Hindi font Mangal

2002-10-15 Thread Michael \(michka\) Kaplan

From: "Doug Ewell" <[EMAIL PROTECTED]>

>  It's too bad you can't see that combination on the
> Javascript keyboards at Globaldev.

The use of either  or  shift states in Microsoft-supplied
keyboards is very rare. The reason it is rare is that it interferes with
programs that use those shift states to perform control actions (such as
Microsoft Word).

It is also difficult (though not impossible) to query the actual information
on these shift states due to the fact that USER will automatically map such
keystrokes to control characters (if there is no assigned keystroke in the
keyboard layout itself).
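To make the control-character point concrete, assuming the shift states in question include Ctrl: under the traditional ASCII convention, Ctrl plus a key from '@' through '_' yields the code with only its low five bits kept. The helper below is hypothetical and only approximates what Windows produces when the layout has no explicit assignment; it is not USER's actual code.

```python
def ctrl_char(key: str) -> str:
    """Hypothetical helper: the control character for Ctrl+key (ASCII convention)."""
    code = ord(key.upper())
    # Only '@' through '_' (0x40-0x5F) have a control-character counterpart.
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no control character for Ctrl+{key!r}")
    return chr(code & 0x1F)  # keep the low five bits


print([hex(ord(ctrl_char(k))) for k in "AZ["])  # ['0x1', '0x1a', '0x1b']
```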

MichKa





Re: Ligature and VB

2002-10-07 Thread Michael \(michka\) Kaplan

From: Gopal Krishan

> In one of my VB project, I'm sending PostMessage API function
> to send character of my desired value to control. e.g. If I
> need to send character J to the keyboard, I'm using

> dim s as Long
> s = PostMessage(Text1.hwnd, WM_CHAR, 99, 0)

> where 99 is the ASCII value of character J in my font.

> Now, in certain languages e.g. Japanese, Hindi and other
> Asian languages, characters are made by combining 2 or
> more different characters. e.g. character N of devnagiri in
> some font will be made by combining ascii values 120
> followed by 201.

You are actually confusing two different things here.

CJK languages such as Japanese can be represented either in Unicode or as
double byte values in a particular code page. The first (lead) byte of such
a value is specifically not an "ASCII" value, since a DBCS lead byte is never
in the ASCII range (0x00-0x7F).
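To see the lead-byte point in practice, this Python sketch encodes each character with cp932 (Python's codec for the Windows Japanese code page) and collects its first byte; the helper name is hypothetical. Note that DBCS trail bytes, unlike lead bytes, can fall in the ASCII range, which is one more reason byte-by-byte "ASCII" handling breaks such text.

```python
def lead_bytes(text: str, code_page: str = "cp932") -> list[int]:
    """Hypothetical helper: the first byte of each character in a DBCS code page."""
    return [ch.encode(code_page)[0] for ch in text]


for ch, lead in zip("日本語", lead_bytes("日本語")):
    # Each character takes two bytes, and each lead byte is >= 0x80.
    print(ch, hex(lead), len(ch.encode("cp932")))
```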

But Devanagari can only be represented in Unicode, so you must send Unicode
code points with PostMessageW (one code point per message), and there is no
non-Unicode method you can use.

In any case, if you are using VB 6.0 or earlier, you cannot have it represent
a "Unicode only" script such as Devanagari; you cannot even represent Japanese
if you are not on a system that has a Japanese default code page. Although
you can temporarily "fool" VB by sending messages, the data can be corrupted
very quickly, since VB will convert data to the default system code page,
which will turn both of these into '?'.
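Which text survives therefore depends on the default system code page. This small Python sketch (hypothetical helper; cp932 stands in for a Japanese system and cp1252 for a Western one) checks whether each sample can be encoded without loss: Japanese survives only on the Japanese code page, and Devanagari survives on neither.

```python
SAMPLES = {"Japanese": "日本語", "Hindi": "हिन्दी"}


def survives(text: str, code_page: str) -> bool:
    """Hypothetical helper: can text pass through code_page without loss?"""
    try:
        text.encode(code_page)  # strict: raises on any unmappable character
    except UnicodeEncodeError:
        return False
    return True


for code_page in ("cp1252", "cp932"):
    for name, text in SAMPLES.items():
        print(code_page, name, survives(text, code_page))
```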

> 1. Is there any way to send 2 characters in PostMessage function in 1
> statement e.g. using Bitwise And
>  120 to binary is 
>  201 to binary is 110010011

No, there is not -- but this is not really something you need to do here.
Just use Unicode code points and PostMessageW, and then you do not need to
send bytes.
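A sketch of what "just use Unicode code points" means for a conjunct: the sender posts one WM_CHAR per UTF-16 code unit, and it is the rendering engine and font, not the sender, that form the ligature. The Python helper below is a hypothetical name; it only computes the values and does not call PostMessageW.

```python
def wm_char_units(text: str) -> list[int]:
    """Hypothetical helper: the UTF-16 code units a Unicode window would receive."""
    data = text.encode("utf-16-le")
    # Each 2-byte little-endian unit would be one WM_CHAR wParam.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# Devanagari conjunct KSSA: KA + VIRAMA + SSA
print([hex(u) for u in wm_char_units("क्ष")])  # ['0x915', '0x94d', '0x937']
```

On Windows, each value would go out in its own PostMessageW(hwnd, WM_CHAR, unit, 0) call; characters outside the BMP would arrive as two surrogate code units.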

> 2. Any freeware activex control / dll to make ligatures of TTF file?

I am not sure what you are looking for a control or DLL to do here.


MichKa





Re: unsuscribe

2002-10-04 Thread Michael \(michka\) Kaplan

Allow us to help you once more:

http://unicode.org/unicode/consortium/distlist.html#3

It contains the info on how to unsubscribe, and if you scroll down a bit it
gives information on what to do if you have problems unsubscribing.


MichKa

- Original Message -
From: "Amruthavakkula, Malini (CORP,Consultant)"
<[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, October 04, 2002 5:58 AM
Subject: unsuscribe


>
> Hi, Thanks for all the help
>
>
> Unsubscribe me from the list.
>
> Thanks
>
>
> Tata consultancy services
> supportcentral
> __
>
> Malini.A
> 7th floor
> 900 chapel street
> NewHaven,CT- 06511
> 203-787-7030  Outside Line
>  *233-7030Dialcomm
>




