UCD 3.2.0

2002-04-04 Thread Theo Veenker

Hi all,

I'd like to make a few remarks about the UCD files.

The following things I ran into when checking out the 3.2.0 release:

 o  In PropertyValueAliases-3.2.0.txt line 79:
ccc; 202; ATBL ; Attached_Below_Left
whereas in UnicodeData-3.2.0.html I read:
200: Below left attached
202: Below attached
What is the correct value for "attached below left", 200 or 202?

 o  In SpecialCasing-3.2.0.txt lines 234 and 235 are missing the closing
semicolon. This problem also appeared in 3.1.1.

 o  Typo in UnicodeCharacterDatabase-3.2.0.html:
"DerivedNormalizationProperties", should be "DerivedNormalizationProps".

Minor points that I find a bit annoying:

 o  Many of the UCD files have a comment header with lines longer than 80
characters. Viewing these files using the page utility on an 80-column
terminal window gives ugly output due to the forced line wrapping.

 o  All UCD files except CaseFolding-3.2.0.txt and SpecialCasing-3.2.0.txt
*separate* columns by semicolons. For the two exceptions the semicolon 
*terminates* a column. Why not keep it the same for all UCD files?

 o  UnicodeData-3.2.0.txt still uses this notation:
1234;;Lo;0;L;N;
5678;;Lo;0;L;N;
instead of 
1234..5678;..;Lo;0;L;N;
Since all other UCD files use the latter notation why not change this
one too? IMHO backward compatibility with existing UCD file parsers 
shouldn't be an issue in this particular case.
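
To illustrate the last two points, here is a minimal Python sketch of what a
parser has to do today. The names split_fields and parse_code_points are
hypothetical, the sample lines in the usage note are made up, and the caller
has to say which semicolon convention a file uses, because a trailing
semicolon is ambiguous: it can be a terminator or it can precede an empty
last field.

```python
def split_fields(line, terminated=False):
    """Split one UCD data line into whitespace-stripped fields.

    terminated=False: semicolons separate fields (most UCD files).
    terminated=True:  semicolons terminate fields, so the empty string
                      after the final semicolon is not a real field.
    """
    data = line.split("#", 1)[0].strip()  # drop a trailing comment, if any
    if not data:
        return []
    fields = [field.strip() for field in data.split(";")]
    if terminated and fields[-1] == "":
        fields.pop()
    return fields


def parse_code_points(field):
    """Return (first, last) for a field of the form 'XXXX' or 'XXXX..YYYY'."""
    first, _, last = field.partition("..")
    return int(first, 16), int(last or first, 16)
```

With one convention everywhere the terminated flag would not be needed. For
example, split_fields("0041; 0061; 0041; # sample", terminated=True) yields
three fields, while the same call without the flag yields a fourth, empty one.
parse_code_points("1234..5678") gives the pair (0x1234, 0x5678), and a single
code point gives the same value twice.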

Regards,
Theo




Re: how can I write an arabic square root- I think I've understood a little.

2002-04-04 Thread Markus Scherer

[EMAIL PROTECTED] wrote:

> This raises a question in my mind: how is an app to know whether the layout
> engine+font are smart enough? ...
> In other words, it seems to me that it must be agreed that an
> app should assume it is handled by Uniscribe/OT, or should assume that it
> is not.


Yes, I think this is it.
If you are on Windows or MacOS (or X/KDE/gnome), or you are using a web UI (browser), 
then you assume that the layout engine is "smart". If it's not smart enough, complain 
:-)

If you have an application on a mainframe that writes to a 3270 or 5250, then you 
assume that the layout is "dumb".
"Dumb" is probably also a good assumption for command-line tools on Unixes, although 
in theory their output could be piped into a file and displayed with a browser.

markus





conversion performance: UTF-8 BOCU-1 SCSU

2002-04-04 Thread Markus Scherer

I have numbers for text size and conversion performance of BOCU-1 and SCSU relative to 
UTF-8.

Quick summary:

For Latin text, UTF-8 is best.
For CJK, BOCU-1 and SCSU provide smaller size, with some speed trade-off.
For other scripts, BOCU-1 and SCSU are much better than UTF-8 in both speed and size.

Note that BOCU-1 encoded text (since it preserves control characters and spaces) could 
be directly used in emails, for CVS, etc.
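
As a rough illustration of the size side only, the following Python sketch 
counts UTF-8 bytes per character for a few short sample strings. The samples 
and the helper name utf8_bytes_per_char are my own choices for illustration; 
they are not taken from the measurements on the page linked below. Python's 
standard library has no BOCU-1 or SCSU codec, so only UTF-8 is shown.

```python
# Short sample strings chosen for illustration only.
samples = {
    "Latin": "Unicode",
    "Greek": "αβγδ",
    "Cyrillic": "абвг",
    "CJK": "中文",
}


def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes used per code point in text."""
    return len(text.encode("utf-8")) / len(text)


for script, text in samples.items():
    print(f"{script}: {utf8_bytes_per_char(text):.1f} bytes per character")
```

UTF-8 needs one byte per character for ASCII letters, two for Greek and 
Cyrillic letters, and three for these CJK ideographs. That growth is where a 
compression-oriented encoding has room to do better.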

Please see http://oss.software.ibm.com/icu/dropbox/bocuperf.html

Best regards,
markus





Re: how can I write an arabic square root- I think I've understood a little.

2002-04-04 Thread Peter_Constable


On 04/01/2002 07:24:40 PM Markus Scherer wrote:

>> I believe that the current mirrored and mirrored glyph properties are
>> useful only when no help can be obtained from the font; otherwise, the
>> resolved directionality should be provided to the font, which should
>> then select the appropriate shape for each and every character,
>> regardless of the mirrored and mirrored glyph properties.
>
>
>Correct, and this is how the BidiMirroring.txt file is intended: Only if the
>layout engine+font are not smart enough, as a fallback mechanism.

This raises a question in my mind: how is an app to know whether the layout
engine+font are smart enough? What if the app makes the wrong assumption
about the rendering components, does a character transformation, and then
the rendering components do a glyph transformation on the transformed
characters? Consider an OpenType + Uniscribe implementation: Either
Uniscribe can do a character transformation before doing cmap lookups and
calling the OpenType engine, or Uniscribe can set features (as Eric
suggested), assuming the font supports those features (a reasonable
assumption: OT fonts have to be created to a spec with regard to features
that need to be used to support a given script). But (I gather) Uniscribe
currently isn't doing either. So, an app might currently be doing
something, but it doesn't have any way (that I know of) to ask a future
version of Uniscribe whether it does handle mirroring through one mechanism
or the other. In other words, it seems to me that it must be agreed that an
app should assume it is handled by Uniscribe/OT, or should assume that it
is not.
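
To make the double-transformation worry concrete, here is a minimal Python
sketch. It is not how Uniscribe or any real layout engine works. MIRROR_PAIRS
is a small hand-written subset of mirrored pairs, and mirror_rtl_run and its
engine_mirrors flag are hypothetical names standing in for "the app's
assumption about the rendering components".

```python
import unicodedata

# Hand-written subset of mirrored pairs, for illustration only.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}


def mirror_rtl_run(run, engine_mirrors):
    """Character-level mirroring fallback for a run resolved as right-to-left.

    If the rendering components are assumed to mirror glyphs themselves,
    the text is passed through unchanged.
    """
    if engine_mirrors:
        return run
    return "".join(MIRROR_PAIRS.get(ch, ch) for ch in run)


# Every key in the table carries the Bidi_Mirrored property.
assert all(unicodedata.mirrored(ch) == 1 for ch in MIRROR_PAIRS)

once = mirror_rtl_run("(a)", engine_mirrors=False)
twice = mirror_rtl_run(once, engine_mirrors=False)
print(once, twice)
```

Applying the swap twice gives back the original string. So if the app swaps
characters and the rendering components then swap glyphs as well, the two
steps cancel and the text displays unmirrored.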



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>






Windows98 with mongolian cyrillic fonts

2002-04-04 Thread Magda Danish (Unicode)


-Original Message-
Date/Time:Thu Apr  4 06:56:57 EST 2002
Contact:  [EMAIL PROTECTED]
Report Type:  General question
Text of the report is appended below:

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Dear Mr/Mrs

I'm using Windows 98 with Mongolian Cyrillic fonts.
How can I find "Mongolian Cyrillic keyboard support for UNICODE"
and a converter from ANSI to Unicode?

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
(End of Report)