Re: urban legends just won't go away!

2003-01-31 Thread Eric Muller


Barry Caplan wrote:


> Who knew in this day and age flipping bits to change case is still publishable (this is from today!)
 

What I find a lot more objectionable is that what this code claims to 
do is not defined (in particular, the domain over which it applies). 
Without such a qualification, we cannot say whether the code is correct 
or not, no matter how fishy it looks. In fact, this example is a perfectly 
valid implementation if the system claims to handle only an appropriate 
subset of the Unicode character set.
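
To illustrate the point about domain (the code under discussion is not quoted in this thread, so this is only a sketch of the general technique, with function names made up for the example): flipping bit 0x20 is a correct case toggle if the domain is explicitly restricted to the ASCII letters, and wrong without that restriction.

```python
def flip_case_ascii(ch):
    """Toggle case by flipping bit 0x20, restricted to ASCII letters.

    `ch` is assumed to be a single character. Anything outside A-Z / a-z
    is returned unchanged, so the function is correct over its stated domain.
    """
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return chr(ord(ch) ^ 0x20)
    return ch


def flip_case_unchecked(ch):
    """The same bit flip with no domain check (hypothetical counter-example)."""
    return chr(ord(ch) ^ 0x20)


if __name__ == "__main__":
    print(flip_case_ascii("a"), flip_case_ascii("Z"), flip_case_ascii("1"))
    # U+00FF (y with diaeresis) becomes U+00DF (sharp s): not a case pair.
    print(hex(ord(flip_case_unchecked("\u00ff"))))
```

Without the range check, the flip turns U+00FF into U+00DF and the digit "1" into a control character, which is why the stated domain matters more than the bit trick itself.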

For more information, see .

Eric.





compatibility between unicode 2.0 and 3.0

2003-01-31 Thread Erik.Ostermueller
We have a large amount of C++ that currently has Unicode 2.0 support.

Could you all help me figure out what types of operations will fail
if we attempt to pass Unicode 3.0 thru this code?

I can start the list off with 

-sorting 
-searching for text 
-text comparison
-other character classification (isSpace, isDigit, etc...).

I understand that these operations probably won't work in ALL cases.
But how about basic plumbing code -- creating and copying strings?

As I mentioned in my last post, I've enjoyed
listening in on this forum -- I've learned a whole lot.

Thanks,

--Erik Ostermueller




Re: How is glyph shaping done?

2003-01-31 Thread John Hudson
At 09:28 AM 1/31/2003, Mete Kural wrote:


> So does this mean that every character rendered on the
> screen in a Unicode-enabled program such as Internet
> Explorer or some editor, have a corresponding
> presentation form Unicode associated to it?


No. Most complex script shaping is now handled by a combination of shaping 
engine and font lookups. The shaping engine analyses the text strings, 
performs any character level pre-processing (e.g. re-ordering for Indic 
scripts), and then implements specific lookups in the font for glyph 
substitution and positioning. This means that there is no need for the 
various contextual forms of Arabic letters to be encoded in a font's 
character-to-glyph mapping data at all.

On Windows, the shaping engines for complex scripts are part of Uniscribe 
(usp10.dll) and make use of OpenType font technology. An Arabic OpenType 
font will contain layout features for Initial 'init', Medial 'medi' and 
Final 'fina' substitutions (and possibly Isolated 'isol', e.g. to handle 
contextual variation of the letter heh). Uniscribe analyses strings of 
Arabic text, keeps track of the position of letters and their neighbours, 
and implements the appropriate layout feature for each letter.
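
The position tracking described above can be sketched roughly as follows. This is not Uniscribe's actual code; it is a simplified Python illustration that assumes a tiny hand-written joining-type table (a real engine would take joining types from the Unicode ArabicShaping.txt data and would also handle transparent marks and join-causing characters, which this sketch ignores). The function and table names are made up for the example.

```python
DUAL = "D"   # letter joins to both the preceding and the following letter
RIGHT = "R"  # letter joins only to the preceding letter

# Assumed subset of joining types, for illustration only.
JOINING_TYPE = {
    "\u0628": DUAL,   # BEH
    "\u062A": DUAL,   # TEH
    "\u0643": DUAL,   # KAF
    "\u0644": DUAL,   # LAM
    "\u0645": DUAL,   # MEEM
    "\u0647": DUAL,   # HEH
    "\u0627": RIGHT,  # ALEF
    "\u062F": RIGHT,  # DAL
    "\u0631": RIGHT,  # REH
    "\u0648": RIGHT,  # WAW
}


def joining_features(text):
    """Return one feature tag per character: 'isol', 'init', 'medi' or 'fina'.

    Characters missing from the table get None and break the joining chain.
    """
    types = [JOINING_TYPE.get(ch) for ch in text]
    tags = []
    for i, jt in enumerate(types):
        if jt is None:
            tags.append(None)
            continue
        prev_jt = types[i - 1] if i > 0 else None
        next_jt = types[i + 1] if i + 1 < len(types) else None
        # A letter connects backwards only if the previous letter is dual-joining.
        joins_prev = prev_jt == DUAL
        # Only dual-joining letters can connect forwards.
        joins_next = jt == DUAL and next_jt in (DUAL, RIGHT)
        if joins_prev and joins_next:
            tags.append("medi")
        elif joins_prev:
            tags.append("fina")
        elif joins_next:
            tags.append("init")
        else:
            tags.append("isol")
    return tags


if __name__ == "__main__":
    # KAF TEH ALEF BEH (logical order)
    print(joining_features("\u0643\u062A\u0627\u0628"))
```

For KAF TEH ALEF BEH this yields init, medi, fina, isol: the final BEH stands isolated because the ALEF before it does not join forwards. The shaping engine would then apply the corresponding font lookup to each glyph.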

For more information, see 
http://www.microsoft.com/typography/developers/opentype/default.htm, and 
the MS Arabic font specification at 
http://www.microsoft.com/typography/specs/default.htm

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

A book is a visitor whose visits may be rare,
or frequent, or so continual that it haunts you
like your shadow and becomes a part of you.
   - al-Jahiz, The Book of Animals




Re: How is glyph shaping done?

2003-01-31 Thread Rick McGowan
Mete Kural asked:

> when a Unicode rendering program is doing
> glyph shaping for Arabic (or any other language with
> similar properties), would the program first convert
> all Unicode Arabic characters in the 06XX domain into
> Arabic presentation forms in the FXXX domain, and then
> render each one of these presentation forms one by one
> and join them together?

Probably not. These days, Arabic shaping is typically done by the  
low-level rendering system built into your OS, in conjunction with data  
tables available in smart fonts. If you haven't already looked into some of  
the literature about how Arabic is encoded and how it is supported on  
various platforms, you should probably do so. Please see the Unicode  
standard online,
http://www.unicode.org/uni2book/u2.html chapter 8, and the technical  
report on Bidi, http://www.unicode.org/reports/tr9/ . There are probably  
also some relevant questions in the FAQ, http://www.unicode.org/faq . Please  
also see one or more of the presentations by Thomas Milo on Arabic here:  
http://www.tradigital.de/specials/casestudies.htm

That would save you a lot of work. Most platforms these days already  
support Arabic rendering, so you don't need to worry about this level of  
detail, unless you are planning to implement a new system from scratch. I  
would expect the Microsoft developer web site to also have some info on  
their Arabic implementation...

> So does this mean that every character rendered on the
> screen in a Unicode-enabled program such as Internet
> Explorer or some editor, have a corresponding
> presentation form Unicode associated to it?

No. It means that the fonts have appropriate tables and the rendering  
engine, Uniscribe or whatever, knows how to handle the font to do correct  
shaping when the text is rendered. You should not be using any presentation  
form characters in your text, just nominal forms from the 0600 block.
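
For anyone who wants to check existing text for presentation forms, here is a small Python sketch (the helper names are made up; the ranges are the two Arabic Presentation Forms blocks, which also contain a few characters that are not letter forms and are simply left alone by the normalization). It relies on the compatibility decompositions in Python's unicodedata module to map presentation forms back to nominal characters.

```python
import unicodedata


def is_presentation_form(ch):
    """True if `ch` lies in Arabic Presentation Forms-A or -B."""
    cp = ord(ch)
    return 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFC


def find_presentation_forms(text):
    """List (index, character) pairs for presentation-form characters."""
    return [(i, ch) for i, ch in enumerate(text) if is_presentation_form(ch)]


def to_nominal(text):
    """Replace presentation forms by their compatibility (NFKC) equivalents.

    Only characters inside the presentation-form blocks are touched, so the
    rest of the text is not affected by compatibility normalization.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_presentation_form(ch) else ch
        for ch in text
    )


if __name__ == "__main__":
    shaped = "\uFE91\uFE8E"  # BEH INITIAL FORM, ALEF FINAL FORM
    print([hex(ord(c)) for c in to_nominal(shaped)])
```

Text cleaned this way contains only nominal characters, and the rendering engine and font then take care of the shaping.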

Rick




How is glyph shaping done?

2003-01-31 Thread Mete Kural
Hello,

After one of the replies that I received for my
previous question, I thought of a more general
question about how glyph shaping is done. I'm just
wondering, when a Unicode rendering program is doing
glyph shaping for Arabic (or any other language with
similar properties), would the program first convert
all Unicode Arabic characters in the 06XX domain into
Arabic presentation forms in the FXXX domain, and then
render each one of these presentation forms one by one
and join them together? Or are there other possible
ways to do glyph shaping in Unicode?

So does this mean that every character rendered on the
screen in a Unicode-enabled program such as Internet
Explorer or some editor, have a corresponding
presentation form Unicode associated to it?

Thanks,
Mete




RE: Suggestions in Unicode Indic FAQ

2003-01-31 Thread Kent Karlsson

Keyur Shroff wrote:
...
> 
> No fallback rendering comes into the picture with your explanation. 

Yes, there is.  A character sequence  (say)
is very unlikely to have a ligature, specially adapted (and fitting)
adjustment points, or similar.  The rendering would in that sense
need to use a fallback mechanism that renders an "approximation"
for this rare combination.

...
> Here is the para you are talking about.
> 
> [Quote]
[...]
> should be rendered as if they had a space as a base character."
> [/Quote]
> 
> In the text there is no mention of explicitly inputting space character
> before any combining mark that is defective combining character.

The text says "as if". Which I also emphasised before.

> Also, the wording "should be rendered" implies that it is a recommendation. 

Yes.  A rather good one.  

> > By removing that particular fallback mechanism from implementations
[inserting dotted circle glyphs for allegedly "invalid" combinations]
> > as well as the TUS text!  (I'm serious!) This particular fallback
> > mechanism is NOT recommended as it stands.  
> 
> Note that the text has been written in the section "Implementation
> Guidelines". Can't it be considered as recommendation?

That particular one, no.  Just an example [that isn't very good,
outside of a general "show invisibles" mode].

> > But since its mention is erroneously taken as a recommendation, I'd 
> > suggest removing also its mention.
> 
> This is disastrous! What will happen to the systems which already
> implemented this recommendation!?

It's not a recommendation.

> Will they be considered invalid
> implementations afterwards? What about stability?

They are ugly implementations as they are.  And will stay ugly
implementations.  Stability is good ;-).

/Kent K





Re: Arabic Presentation Forms

2003-01-31 Thread Shlomi Tal
> Do you have any suggestions on how I could convert a piece
> of Unicode text in this manner? Are there any programs
> that could do this?


Roman Czyborra's arabjoin (a Perl script):

http://czyborra.com/arabjoin/

It does the conversion to Arabic Presentation Forms. It also converts 
logically ordered Arabic to visual order, which may not be what you need; 
that is meant for display on systems that support neither BiDi nor Arabic 
shaping.

ST





Re: Arabic Presentation Forms

2003-01-31 Thread Bob_Hallissy

On 31/01/2003 05:56:55 Mete Kural wrote:

>I need to figure out a method to convert Arabic
>Unicode text encoded in its normal form to Arabic
>Unicode text encoded in Arabic presentation forms.

Are you aware that the presentation forms are incomplete? That is, there
are Arabic letters in the U+06xx block for which there isn't a complete set
of presentation forms defined in the Arabic presentation forms areas.
You'll need to decide what to do with such...
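
One way to see the gaps is to build the table of available presentation forms from the character database itself. The following Python sketch (helper names are hypothetical, and the result depends on the Unicode version shipped with your Python) collects, for each nominal letter, which of the four positional forms actually exist in the presentation forms blocks.

```python
import unicodedata

PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFC)]
FORM_TAGS = {
    "<isolated>": "isolated",
    "<final>": "final",
    "<initial>": "initial",
    "<medial>": "medial",
}


def build_form_table():
    """Map nominal letter -> {form name: presentation-form character}.

    Only single-letter decompositions are kept; ligatures (which decompose
    to several characters) are skipped in this sketch.
    """
    table = {}
    for lo, hi in PRESENTATION_RANGES:
        for cp in range(lo, hi + 1):
            parts = unicodedata.decomposition(chr(cp)).split()
            if len(parts) == 2 and parts[0] in FORM_TAGS:
                base = chr(int(parts[1], 16))
                table.setdefault(base, {})[FORM_TAGS[parts[0]]] = chr(cp)
    return table


FORM_TABLE = build_form_table()


def forms_for(letter):
    """Return the set of positional forms encoded for a nominal letter."""
    return set(FORM_TABLE.get(letter, {}))


if __name__ == "__main__":
    for letter in ("\u0628", "\u0682"):
        print(f"U+{ord(letter):04X}", sorted(forms_for(letter)))
```

U+0628 BEH has all four forms, while a letter such as U+0682 has none at all, so any conversion to presentation forms needs a policy for such letters (leave them nominal, substitute, or report an error).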

Bob






RE: Suggestions in Unicode Indic FAQ

2003-01-31 Thread Keyur Shroff

--- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> 
> > Clearly, since in this case the sign is not
> > preceded by any consonant base, it has to be rendered using one of the
> > mechanisms specified in fallback rendering of non-spacing marks.
> 
> If it is preceded by a SPACE (or is first in a string/paragraph/similar)
> it should be rendered as a "freestanding" glyph (no dotted circle).  If it
> is preceded, in the source string, by, say, FULL STOP, a typographically
> acceptable rendering would be to have the vowel sign E glyph float on
> top of the glyph for the FULL STOP (no dotted circle).

No fallback rendering comes into the picture with your explanation. 

> > I add that this is a good way of displaying a combining mark that has no
> > base character, i.e. one occurring at the beginning of a line or paragraph.
> No, those should be displayed *as if* preceded by a SPACE (TUS 3.0 page 
> 121).

Now here you are really talking about fallback rendering :-). 

Here is the para you are talking about.

[Quote]
"In a degenerate case, a nonspacing mark occurs as the first character in
the text or is separated from its base character by a line separator,
paragraph separator, or other formatting character that causes a positional
separation. This result is called a defective combining character sequence
(see chapter 3.5, Combinations). Defective combining character sequences
should be rendered as if they had a space as a base character."
[/Quote]
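
To make the "as if" concrete, here is a rough Python sketch of how a renderer might treat a defective combining character sequence. It is only an illustration with made-up names: the stored text is never changed, a display-only copy gets a space base, and the choice of separators and of "any category M character" as the test for a combining mark are assumptions of the sketch, not something the quoted paragraph specifies.

```python
import unicodedata

# Characters assumed to cause a positional separation (illustrative subset).
SEPARATORS = {"\n", "\r", "\u2028", "\u2029"}


def is_combining_mark(ch):
    """True for general categories Mn, Mc and Me."""
    return unicodedata.category(ch).startswith("M")


def display_form(text, base=" "):
    """Return a display-only copy of `text` in which every defective
    combining sequence is given `base` as its base character."""
    out = []
    prev = None
    for ch in text:
        defective = is_combining_mark(ch) and (prev is None or prev in SEPARATORS)
        if defective:
            out.append(base)
        out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    # DEVANAGARI VOWEL SIGN E with no preceding consonant
    print(repr(display_form("\u0947")))
    # The same sign after KA is a normal sequence and is left untouched
    print(repr(display_form("\u0915\u0947")))
```

The user never types the space; it exists only in what is handed to the glyph layout step.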

In the text there is no mention of explicitly inputting space character
before any combining mark that is a defective combining character. Also, the
wording "should be rendered" implies that it is a recommendation. 

> > 
> > Then how can we take care of the fallback mechanism?
> 
> By removing that particular fallback mechanism from implementations
> as well as the TUS text!  (I'm serious!) This particular fallback
> mechanism is NOT recommended as it stands.  

Note that the text has been written in the section "Implementation
Guidelines". Can't it be considered as recommendation? (although not
necessary for implementation)

> But since its mention is erroneously taken as a recommendation, I'd 
> suggest removing also its mention.

This is disastrous! What will happen to the systems which already
implemented this recommendation!? Will they be considered invalid
implementations afterwards? What about stability?

- Keyur






Re: Arabic Presentation Forms

2003-01-31 Thread John Hudson
At 09:56 PM 1/30/2003, Mete Kural wrote:


> I need to figure out a method to convert Arabic
> Unicode text encoded in its normal form to Arabic
> Unicode text encoded in Arabic presentation forms.


May I ask why you want to do this?

John Hudson
