Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread Pavel Adamek
 # In charts and illustrations in this standard,
 # the combining nature of these marks
 # is illustrated by applying them to a dotted circle,

How should such a chart be coded?
The character 25CC DOTTED CIRCLE was mentioned
as a possible base character,
but the on-line reference says:
"note that the reference glyph for this
character is intentionally larger than the
dotted circle glyph used to indicate
combining characters in this standard."
IMO the correct base glyph
(at least for Latin diacritics)
should look like a dotted letter o.
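
For illustration, the candidate encodings for one chart cell can be written out as a minimal Python sketch. The helper name chart_cell is hypothetical, and the sketch only builds the code point sequences; what a given font or rendering engine draws for each of them is exactly the open question in this thread.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"  # base used in the code charts
SPACE = "\u0020"          # SP + CM convention
NBSP = "\u00A0"           # NBSP + CM convention


def chart_cell(mark: str, base: str = DOTTED_CIRCLE) -> str:
    """Return base + mark (hypothetical helper); reject non-marks."""
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return base + mark


acute = "\u0301"  # COMBINING ACUTE ACCENT
for base in (DOTTED_CIRCLE, SPACE, NBSP):
    cell = chart_cell(acute, base)
    # List the code points so the three encodings can be compared.
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in cell])
```

All three strings are valid combining character sequences; they differ only in the base character, so any difference in appearance comes from the renderer and the font.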

  P.A.






Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread Antoine Leca
On Monday, March 29, 2004 8:11 PM
John Cowan va escriure:

 Well, it depends on what the equivoque "combining marks" in the title
 of Section 7.7 means.

Ah! This is the place I did not look into! (It was not obvious to me
that text about the dependent vowel marks had to be looked for in the
European alphabetical scripts section! But as Ken pointed out elsewhere, I
should have known better: Obviously, one must know the whole standard text,
and the history of it, before making any assumption about the meaning of
any given section: after all, this is not an ISO standard.)

Many thanks John for pointing this out.


 This is where (p. 187) the remarks about SP and NBSP appear:

 # Marks as Spacing Characters.  By convention, combining marks may be

OK, this one says it should apply to all combining marks, and does not make any
distinction between spacing and nonspacing. So the issue now appears clear
(and we implementers of rendering tools now have work to do, haven't we?)

Now I will file erratum reports for all the discrepancies I have found.


Antoine




Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread Peter Kirk
On 29/03/2004 16:28, Kenneth Whistler wrote:

...

Using NBSP rather than SPACE has several advantages, and has long been 
specified in Unicode, although not widely implemented. It is less likely 
to occur accidentally. But it has disadvantages, especially that it will 
always be a spacing character, whereas for display of isolated Indic 
vowels no extra spacing is required.
   

NBSP is not a fixed-width space.

 

Yes it is, in Unicode 4.0.0. Ernest quoted from UAX #14: "All other space 
characters have fixed width." This may be in the standard by mistake, 
but it is in the standard. Asmus says that this will be changed in 
4.0.1, but that has not yet been released. If a statement is written in 
a standard, even in the introduction to a different section, that is 
normative.

I would like to repeat my earlier proposal for a new character ISOLATED 
COMBINING MARK BASE. This character would have no glyph, and the general 
properties of a letter. Its spacing would be just as much as required 
for proper display of the combining mark - which would be zero for 
combining marks which have their own width.
   

And after 15 years presence in the standard (or its earlier drafts)
of the SP + CM recommendation, what makes you think that introduction
of a *new* convention using a *new*, special purpose format control
character sorta like a space only different, would lead to any
better situation in actual practice? Use of such a character would
*NOT* resolve the differences regarding how to display such a
combination, by the way.
 

I would be happy for NBSP to be used in this way, now that it has been 
clarified that this should not be considered fixed width when followed 
by a combining mark. I would like to see a clear recommendation (not a 
conformance requirement, I agree) that the sequence NBSP, non-spacing 
combining mark should be rendered as a spacing version of the mark with 
just enough space for the mark and no added glyph. My reason for 
preferring NBSP to SPACE is that it is unambiguously non-breaking and (I 
think) not a word boundary.

But this doesn't solve the Tamil etc problem as what is needed there is 
a non-spacing non-breaking base character which can allow the vowel to 
display without the dotted circle. Perhaps ZWJ would be suitable.
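
The distinction drawn above between non-spacing marks and spacing marks (such as many Tamil vowel signs) can be sketched in Python. This is only an illustration under stated assumptions: the function name isolated_form is hypothetical, and the rule "the base keeps its width only when the mark has none of its own" is the behaviour proposed in this message, not something the standard specifies.

```python
import unicodedata

NBSP = "\u00A0"


def isolated_form(mark: str) -> tuple[str, bool]:
    """Return (NBSP + mark, base_needs_width) for one combining mark.

    base_needs_width is True for non-spacing (Mn) and enclosing (Me)
    marks, and False for spacing marks (Mc), which bring their own
    advance width. This mirrors the proposal in the message above.
    """
    category = unicodedata.category(mark)
    if category not in ("Mn", "Mc", "Me"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return NBSP + mark, category != "Mc"


for ch in ("\u0301", "\u0BBE"):  # COMBINING ACUTE ACCENT, TAMIL VOWEL SIGN AA
    text, needs_width = isolated_form(ch)
    print(unicodedata.name(ch), unicodedata.category(ch), needs_width)
```

A renderer following this idea would shrink the NBSP to zero width for the Tamil vowel sign and give it just enough width to carry the acute accent.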

...

Well, as I understand it NBSP is often expected to be a fixed-width 
space, and it is in many implementations. In fact I think it ought to 
be, whether or not this is actually specified. But there ought to be a 
character which is explicitly NOT fixed width to carry NSMs.
   

There are *two* such characters: SPACE and NBSP.
 

You mean, there will be in 4.0.1. The problem with SPACE is a different one.

...

The intent of the UTC and the editors has always seemed clear to
me on this particular point -- and the fact that the text in
question has survived 3 major reeditings of the entire standard
without significant change indicates to me that this has not been
a problematical part of the standard for the UTC.
 

Well, a text needs to be clear to its readers, not just to its authors. 
Obviously this text is not clear to readers, even ones as experienced as 
John Cowan, and so needs clarification.

So assuming that "combining mark" means "combining character" rather than
"non-spacing mark" (the term does not appear in the Glossary), it seems that
combining vowels should work fine with SP or NBSP. 
   

This, however, is a textual problem which should be addressed.
As it stands, Section 7.7, Combining Marks deals with various
types of combining characters, including non-spacing combining
marks and enclosing combining characters. It does not say
anything explicit about Indic dependent vowels, in part because
of its textual history.
 

In that case something clear and sensible needs to be added about Indic 
dependent vowels.

Peter Kirk continued:

 

But it is a source of great confusion to 
everyone when a widely used application does something clearly different 
from what the standard intends, and yet claims conformance even if 
technically this is correct.
   

What the standard intends is that the textual representation (encoding)
of an isolated combining mark be done via the sequence SP, CM.
It does not *require* or *not require* that the visual rendering
of such a sequence be done with or without a dotted circle indicating
the absence of an expected normal base letter. In fact, the standard
itself makes widespread and explicit use of the convention to display
such combinations *with* a dotted circle.
 

Well, the standard clearly intends that the character for "a" is 
rendered with the glyph "a" and not the glyph "b". It may not formally 
require this, but a system which breaks this rule, while possibly 
formally conformant, can hardly claim to support Unicode properly.

One convention for display of isolated combining marks is to use a 
dotted circle. But this convention is far from universal across all 
writing systems. It is wrong to impose it on all systems - 

Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread Peter Kirk
On 30/03/2004 04:31, John Cowan wrote:

Peter Kirk scripsit:

 

Yes it is, in Unicode 4.0.0. Ernest quoted from UAX #14 All other space 
characters have fixed width. This may be in the standard by mistake, 
but it is in the standard. Asmus says that this will be changed in 
4.0.1, but that has not yet been released. If a statement is written in 
a standard, even in the introduction to a different section, that is 
normative.
   

This is just false.  All standards known to me have both normative and
informative parts; there can be no presumption that a certain text is
normative merely because it is in the standard.
It's true that the Unicode Standard in particular does not always
clearly distinguish between normative and informative text; but
in general it would surprise me if anything said in an introduction
was to be taken as normative.
 

I accept that some standards do have sections which are described as 
informative, and as such they are an exception to what I wrote. But as 
the purpose of a standard is to be normative, it is reasonable to 
assume, as I have, that its text is normative unless otherwise indicated.

From the introductory material (not the oddly named section 3 
Introduction) to UAX 14, http://www.unicode.org/reports/tr14/:

/This document has been reviewed by Unicode members and other 
interested parties, and has been approved by the Unicode Technical 
Committee as a *Unicode Standard Annex*. This is a stable document and 
may be used as reference material or cited as a normative reference by 
other specifications./


The implication is that this whole document, not just parts of it, is 
normative.

But this doesn't solve the Tamil etc problem as what is needed there is 
a non-spacing non-breaking base character which can allow the vowel to 
display without the dotted circle. Perhaps ZWJ would be suitable.
   

The use of SP or NBSP works fine for vowels as well as other combining
characters.
 

No. At least it does not work for spacing combining marks unless the 
space of NBSP is compressed to zero width, which you said earlier was 
not permitted. Alphabets etc are commonly listed in columns, and those 
columns need to be straight. If one item in the column is preceded by a 
space of non-zero width, the column will not line up. I accept that 
formatting details like this are outside the scope of Unicode, but I do 
think that Unicode should not make it impossible to display spacing 
combining marks as part of an aligned column.

 

In that case something clear and sensible needs to be added about Indic 
dependent vowels.
   

+1

 

I would say that if specific products do not support dictionaries, 
indexes or literacy primers in Tamil, they cannot claim to support Tamil.
   

This is extremist.  Not only products, but whole standards, have rightly
claimed to support English without being able to support the specialized
requirements of dictionaries -- for IPA or another phonetic spelling
system, for syllabication dots, for condensed typography, for the
ability to set text in multiple tight columns.  Indeed, it may be
fairly said that even now Unicode does not provide full support for
all the characters used in English lexicography.
 

IPA and other phonetic spelling systems are not part of the English 
writing system, and so do not need to be supported as part of it. Tamil 
vowels are part of the Tamil writing system, even in isolation, and so 
do need to be supported by it.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread Michael Everson
At 07:31 -0500 2004-03-30, John Cowan wrote:
Peter Kirk scripsit:

  Yes it is, in Unicode 4.0.0. Ernest quoted from UAX #14 All other space
  characters have fixed width. This may be in the standard by mistake,
 but it is in the standard. Asmus says that this will be changed in
 4.0.1, but that has not yet been released. If a statement is written in
 a standard, even in the introduction to a different section, that is
 normative.
This is just false.  All standards known to me have both normative and
informative parts; there can be no presumption that a certain text is
normative merely because it is in the standard.
John is correct here, but it is also true that "All other space 
characters have fixed width" is a fairly strong declaration.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread John Cowan
Peter Kirk scripsit:

 I accept that some standards do have sections which are described as 
 informative, and as such they are an exception to what I wrote. But as 
 the purpose of a standard is to be normative, it is reasonable to 
 assume, as I have, that its text is normative unless otherwise indicated.

History, the facts of other standards, and the explicit statements of
participants in the Unicode Consortium argue otherwise.  It is all very
well for A.P. Herbert's justice to say that if Parliament does not mean
what it says, it must say so, but the Unicode Standard is not a code
of laws.

 /This document has been reviewed by Unicode members and other 
 interested parties, and has been approved by the Unicode Technical 
 Committee as a *Unicode Standard Annex*. This is a stable document and 
 may be used as reference material or cited as a normative reference by 
 other specifications./
 
 The implication is that this whole document, not just parts of it, is 
 normative.

By no means.  One may make a normative reference to a standard that
contains informative material.  The meaning of "Standard A makes a
normative reference to standard B" is merely that it is as if the text
of standard B were incorporated within standard A.  For example, the
XML Recommendation makes normative reference to the Unicode Standard;
it is as if the former included the latter in its entirety, normative
and informative parts both.  An informative reference, OTOH, is one which
the compiler of the referencing standard thinks will be useful in aiding
interpretation; it is not implicitly incorporated in any way.

 No. At least it does not work for spacing combining marks unless the 
 space of NBSP is compressed to zero width, which you said earlier was 
 not permitted. 

Fair enough.  Normally, SP and NBSP cannot disappear, but this is a
context in which they plausibly could and should.

 IPA and other phonetic spelling systems are not part of the English 
 writing system, and so do not need to be supported as part of it. Tamil 
 vowels are part of the Tamil writing system, even in isolation, and so 
 do need to be supported by it.

But they form no part of texts written in Tamil, save those texts
that make reference to Tamil orthography.  If I am writing a book that
teaches how to hand-write English, I will need to be able to represent
components of graphemes, but that does not require a general mechanism
for representing such components in isolation.

-- 
John Cowan  [EMAIL PROTECTED]  www.reutershealth.com  www.ccil.org/~cowan
Consider the matter of Analytic Philosophy.  Dennett and Bennett are well-known.
Dennett rarely or never cites Bennett, so Bennett rarely or never cites Dennett.
There is also one Dummett.  By their works shall ye know them.  However, just as
no trinities have fourth persons (Zeppo Marx notwithstanding), Bummett is hardly
known by his works.  Indeed, Bummett does not exist.  It is part of the function
of this and other e-mail messages, therefore, to do what they can to create him.



Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread Asmus Freytag
At 04:28 PM 3/29/2004, Kenneth Whistler wrote:
 I will say again as I have said before - but the above (and what I
 snipped) is extra evidence for it - that what is broke ... is
 the rule that the isolated (generally spacing) form of a combining mark
 should be formed by SPACE or NBSP followed by the combining mark.
This has been the *intent* of the standard since its inception in
1989.
 There
 are many good reasons for not using SPACE for this, including default
 behaviour like inserting line breaks immediately after SPACE.
Nope. UAX #14 specifies the following regarding SPACE followed by
combining marks:
If U+0020 SPACE is used as a base character, it is treated as AL
instead of SP.
This is an unfortunate typo in UAX#14. The correct statement is:

If U+0020 SPACE is used as a base character, it is treated as ID
instead of SP.
see the description of these issues in the rules section of the UAX
which are quite explicit:
LB 7a  In all of the following rules, if a space is the base character for 
a combining mark, the space is changed to type ID. In other words, break 
before SP CM* in the same cases as one would break before an ID.

Treat SP CM* as if it were ID

As stated in [Unicode], Section 7.7 Combining Marks, combining characters 
are shown in isolation by applying them to either U+0020 SPACE (SP) or 
U+00A0 NO-BREAK SPACE (NBSP). The visual appearance is the same, but the 
line breaking result is different. Correspondingly, if there is no base, 
or if the base character is SP, CM* or SP CM* behave like ID.

This means that a combining character sequence of this type is treated
as a unit for the purposes of line breaking, and this overrides the
behavior otherwise of SPACE to be treated as a line break
opportunity.
There's never a line break opportunity between a SPACE and a combining 
mark, but since SP is treated like an ID (ideographic line breaking class),
there are break opportunities *before* the SP that will not be there if an
NBSP is used.
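
As a rough illustration of the rule quoted above, the following Python sketch groups a string into units and assigns the class "ID" to SP followed by combining marks. It is a simplified assumption-laden sketch, not the UAX #14 algorithm: combining marks are approximated by general categories Mn/Mc/Me, every class other than SP, GL and ID is lumped into the placeholder "other", and the function name resolve_space_bases is hypothetical.

```python
import unicodedata

SP, NBSP = "\u0020", "\u00A0"


def is_cm(ch: str) -> bool:
    # Approximation: treat general categories Mn, Mc, Me as class CM.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def resolve_space_bases(text: str) -> list[tuple[str, str]]:
    """Group text into (unit, line_break_class) pairs.

    SP followed by one or more marks becomes a single unit of class ID,
    as does a run of marks with no base. NBSP keeps class GL whether or
    not marks follow. Everything else is labelled "other".
    """
    units = []
    i = 0
    while i < len(text):
        ch = text[i]
        j = i + 1
        while j < len(text) and is_cm(text[j]):
            j += 1  # attach following marks to this unit
        if is_cm(ch):
            cls = "ID"  # marks with no base behave like ID
        elif ch == SP:
            cls = "ID" if j > i + 1 else "SP"
        elif ch == NBSP:
            cls = "GL"
        else:
            cls = "other"
        units.append((text[i:j], cls))
        i = j
    return units


print(resolve_space_bases("a \u0301b"))
print(resolve_space_bases("a\u00A0\u0301b"))
```

With SP as the base, a break is possible before the isolated mark, as before any ID; with NBSP as the base, the glue class prevents it, which is the difference described above.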

Of course UAX #14 only spells out default behavior,
but then default behaviour is what was claimed just above.
 Using NBSP rather than SPACE has several advantages, and has long been
 specified in Unicode, although not widely implemented. It is less likely
 to occur accidentally. But it has disadvantages, especially that it will
 always be a spacing character, whereas for display of isolated Indic
 vowels no extra spacing is required.
NBSP is not a fixed-width space.
Correct. Somewhere in the standard, we should point out that using a 
space/NBSP as base character does not require these spaces to be at the 
same widths as elsewhere in the text, but that they can (and should) be 
adjusted to best serve this 'base character' function.

A./






Fixed Width Spaces (was: Printing and Displaying Dependent Vowels)

2004-03-30 Thread Ernest Cline



 [Original Message]
 From: Asmus Freytag [EMAIL PROTECTED]

 At 12:19 PM 3/29/2004, Ernest Cline wrote:

 
 UAX #14 makes a rather definitive statement on this issue, albeit
 in an obscure place, in Section 3: Introduction.

 4.0.1 will amend that section to correct the wrong impression that NBSP is
 fixed width and to clarify that this statement is not intended to cover any
 specialized cases, but just ordinary typographical conventions:

 I'm sorry if the fact that the placement and context of text was not enough
 to guide the reader. Note that the 'obscure place' was in the
 introduction (!) of the UAX, where it was a mere note on a subject not
 actually covered by the UAX (i.e. line layout) that nevertheless forms
 the context in which linebreaking happens.

True, but it was the only guidance on the subject that is present in
Unicode 4.0.0, and there do exist widely used applications that do
treat NBSP as a fixed width space.

Still, there is a need for a fixed width space with a width equal to the
unjustified width of a normal space.  With NBSP being ruled out
for that job, that leaves FIGURE SPACE, MMSP, and FOUR-PER-EM
SPACE as the closest alternatives, but none of them is guaranteed
to be exactly that width, even if they are available.  I suppose
suspending justification for just one space via a higher-level
protocol could work, except I'm not aware of any such protocol that
works at a fine-enough grain to do that.  Also, by that argument one
could argue that many of the current fixed width spaces could be
handled by a higher-level protocol as well.

Perhaps a possible U+2064 NONJUSTIFYING SPACE would make
sense with line breaking class BA like most of the other fixed width
spaces. (I would have preferred proposing U+205E to place it
adjoining MMSP, but that code point is already in the pipeline.)





Re: Fixed Width Spaces (was: Printing and Displaying Dependent Vowels)

2004-03-30 Thread John Cowan
Peter Kirk scripsit:

 In each of these cases FIGURE SPACE may be appropriate. Are any of these 
 alternative spaces non-breaking? That is also a requirement in my last 
 two applications.

You can make anything non-breaking by putting ZWNBSP on both sides of it.
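
A minimal Python sketch of that suggestion, for illustration only: the helper name make_non_breaking is hypothetical. Note as an aside that since Unicode 3.2 the word-joining function of U+FEFF has been assigned to U+2060 WORD JOINER, so the sketch lets the caller choose which of the two to use.

```python
ZWNBSP = "\uFEFF"       # ZERO WIDTH NO-BREAK SPACE, as suggested above
WORD_JOINER = "\u2060"  # WORD JOINER, the later preferred equivalent


def make_non_breaking(text: str, joiner: str = ZWNBSP) -> str:
    """Surround text with a zero-width joiner on both sides."""
    return joiner + text + joiner


four_per_em = "\u2005"  # FOUR-PER-EM SPACE
wrapped = make_non_breaking(four_per_em)
print([f"U+{ord(c):04X}" for c in wrapped])
```

Whether a given line breaker honours the joiners is up to the implementation; the sketch only constructs the sequence.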

-- 
John Cowan  www.ccil.org/~cowan  www.reutershealth.com  [EMAIL PROTECTED]
All isms should be wasms.   --Abbie



Re: Fixed Width Spaces (was: Printing and Displaying Dependent Vowels)

2004-03-30 Thread fantasai
Asmus Freytag wrote:
and I don't know whether FOUR-PER-EM is the width of a typical space.
FOUR-PER-EM is 1/4 of an em, always. A typical space, however, varies
in width depending on the font.
~fantasai




Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Jacobi
Hi James, All,

 If this is treated as a Unicode issue rather than a display issue, then
 one solution would be for someone to propose a new character, (back on
 topic a little bit) COMBINING DOTTED CIRCLE FOR COMBINING MARKS.
 Then, rather than inserting DOTTED CIRCLE into the display, a rendering
 engine could be changed to insert this new character.  Then, these updated
 rendering engines could be distributed and font developers could add the
 new characters to fonts and distribute updated fonts.  This might just
 take a while, but it wouldn't be too hard to find examples of the
 character in actual text use to accompany the proposal...
 
 If it ain't broke, don't fix it.  So, is it 'broke'?

Your argument about not spotting errors when SPACE+COMBINING SOMETHING gets
rendered without the dotted circle looks convincing, but lacks consistency:
the SPACE character can be used to transform the combining marks from the
U+0300..U+036F range into spacing characters.

But that aside, it would be better not to use SPACE for this purpose, for
the reasons you mentioned.
So any Unicode codepoint sequence which turns combining marks into
spacing glyphs would be a solution (only the first answers to Srivas' question
indicated that SPACE is to be used according to Unicode). One may be able to
conjure a new Unicode codepoint ISOLATED COMBINING MARK for this purpose, but
among all the spaces and dubious characters at U+20?? something existing
should be found adequate.

Regards,
Peter Jacobi





Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Antoine Leca
On Sunday, March 28, 2004 12:03 AM, James Kass wrote:
 So, if the question is how to make an OpenType font *not* display the
 dotted circle on Windows with Uniscribe, one idea would be to add a
 spacing glyph to U+25CC (DOTTED CIRCLE) in the font.

If you do so, you will end up defeating the normal behaviour, which is to
draw a circle when someone makes an error while typing. Depending on the
intent of the font, it may or may not be a good idea.

Since Avarangal now seems to be under a non-disclosure agreement with
Microsoft, we do not know for sure what his intent is.
We also do not know if there are variations between releases (I hear there
are, but do not feel it is my job to investigate it), or generally what
the real specifications in this area are (the official one being that the
sequence SP+ZWJ+some_mark renders without displaying the circle, but we know
it is not always enforced).

In the general case of a font intended for general use, and if the rendering
without the circle is intended in special cases like drawing a keyboard
layout for reference, I still believe it is better to have the circle and
resort to special manipulations, like SP+ZWJ+vowel or drawing directly with
ExtTextOut(ETO_GLYPH_INDEX), in order to draw the keyboard layout. At the
very least, complicating a font to work around a defect in one version of
one (the) rendering engine does not seem to me a sound engineering solution.
(I have since read your other post, which rather seems to agree with me.)


 Another approach is to simply use a non-OpenType Unicode TrueType
 font for Tamil.  The dotted circles don't seem to ever appear unless the
 font-in-use has OpenType tables covering the script-in-use.

Right. (The only remaining problem will then be the overhang and centering).


Antoine




Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread John Cowan
Antoine Leca scripsit:

 In the general case of a font intended for general use, and if the rendering
 without the circle is intended in special cases like drawing a keyboard
 layout for reference, I still believe it is better to have the circle and
 resort to special manipulations, like SP+ZWJ+vowel or drawing directly with
 ExtTextOut(ETO_GLYPH_INDEX), in order to draw the keyboard layout. 

The bottom line is that SP+vowel and NBSP+vowel are prescribed by the
Unicode Standard, and if they don't work (at least the former; for the
latter, one can weasel out by claiming conformity with earlier versions
of the Standard) the system is broken.

-- 
John Cowan  [EMAIL PROTECTED]  http://www.ccil.org/~cowan  http://www.reutershealth.com
A rabbi whose congregation doesn't want to drive him out of town isn't
a rabbi, and a rabbi who lets them do it isn't a man.  --Jewish saying



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Kirk
On 27/03/2004 17:17, John Hudson wrote:

[EMAIL PROTECTED] wrote:

So, if the question is how to make an OpenType font *not* display the 
dotted circle on Windows with Uniscribe, one idea would be to add a spacing 
glyph to U+25CC (DOTTED CIRCLE) in the font.  This spacing glyph should be a 
no-contour glyph, perhaps with the same advance width as U+0020.  I've not 
tried this, but it might just work.


It should work: Uniscribe inserts the U+25CC glyph that is in the 
font, so this could be something other than an actual dotted circle. 
Another option would be to map the dotted circle to a non-contour 
spacing glyph in one of the discretionary OpenType Layout features 
such as salt, which would allow users of apps supporting that 
feature (currently only InDesign ME, so far as I know) to choose 
whether or not to display the circle.

John Hudson

I don't like the look of this one. It might work as a kludge, but surely 
we should not encourage kludges in which the glyph (or non-glyph in this 
case) for one character, SPACE, is used with the code point of another 
character, U+25CC. This would cause considerable confusion for those 
deliberately trying to insert U+25CC, perhaps because they want to 
display the combining mark with a dotted circle.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Kirk
On 28/03/2004 18:35, [EMAIL PROTECTED] wrote:

...

People generating texts for educational purposes will always have special needs.
So, they'll always need to make special effort to get special effects.  Workarounds
concerning the original question have already been suggested.
If this is treated as a Unicode issue rather than a display issue, then one solution
would be for someone to propose a new character, (back on topic a little bit)
COMBINING DOTTED CIRCLE FOR COMBINING MARKS.
Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine
could be changed to insert this new character.  Then, these updated rendering
engines could be distributed and font developers could add the new characters
to fonts and distribute updated fonts.  This might just take a while, but it
wouldn't be too hard to find examples of the character in actual text use to
accompany the proposal...
If it ain't broke, don't fix it.  So, is it 'broke'?

Best regards,

James Kass

 

I will say again as I have said before - but the above (and what I 
snipped) is extra evidence for it - that what is broke (in the old or 
dialect sense "broken" rather than the modern sense "without money") is 
the rule that the isolated (generally spacing) form of a combining mark 
should be formed by SPACE or NBSP followed by the combining mark. There 
are many good reasons for not using SPACE for this, including default 
behaviour like inserting line breaks immediately after SPACE. The good 
additional reason James has given is that SPACE followed by the 
combining mark is often a mistake (and so it is sensible to add the 
dotted circle), but there is a need in certain kinds of texts to display 
isolated combining marks.

Using NBSP rather than SPACE has several advantages, and has long been 
specified in Unicode, although not widely implemented. It is less likely 
to occur accidentally. But it has disadvantages, especially that it will 
always be a spacing character, whereas for display of isolated Indic 
vowels no extra spacing is required.

I would like to repeat my earlier proposal for a new character ISOLATED 
COMBINING MARK BASE. This character would have no glyph, and the general 
properties of a letter. Its spacing would be just as much as required 
for proper display of the combining mark - which would be zero for 
combining marks which have their own width.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Kirk
On 29/03/2004 04:14, John Cowan wrote:

Antoine Leca scripsit:

 

In the general case of a font intended for general use, and if the rendering
without the circle is intended in special cases like drawing a keyboard
layout for reference, I still believe it is better to have the circle and
resort to special manipulations, like SP+ZWJ+vowel or drawing directly with
ExtTextOut(ETO_GLYPH_INDEX), in order to draw the keyboard layout. 
   

The bottom line is that SP+vowel and NBSP+vowel are prescribed by the
Unicode Standard, and if they don't work (at least the former; for the
latter, one can weasel out by claiming conformity with earlier versions
of the Standard) the system is broken.
 

I agree that this implies that the system is not conformant with the 
standard. But that could be because the standard is broken. So perhaps 
it is the standard that should be fixed, by specifying a new preferred 
sequence for isolated combining marks.

I realise that for backward compatibility reasons the old encoding 
cannot be made illegal. But it can be deprecated, and a note can be 
added that this sequence may not always be displayed as preferred.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Antoine Leca
On Monday, March 29, 2004 2:14 PM, John Cowan va escriure:

 The bottom line is that SP+vowel and NBSP+vowel are prescribed by the
 Unicode Standard,

I am sorry John, I must have missed a post of yours. I asked you where it is
written, and did not find any answer to this; unless someone considers that
all marks, including spacing combining vowels, are (European) diacritics.

I did find some things in UAX29 about grapheme clusters (as indicated by
Philippe), but also found that Mc characters do not seem to be concerned
(Mn, on the other hand, seems to be). I now understand that any base
followed by Grapheme_Extend characters is to be seen as a cluster. I found
Grapheme_Extend defined as Other_Grapheme_Extend + Me + Mn in the
UCD. (But I was not able to find this in the standard itself. Never mind,
I must have missed something obvious.)
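
The definition mentioned above can be approximated in a few lines of Python, for illustration. This is a sketch under a stated assumption: the standard library exposes the general category but not Other_Grapheme_Extend, so the hypothetical function approx_grapheme_extend checks Me + Mn only and will miss the few characters that have the property solely through Other_Grapheme_Extend.

```python
import unicodedata


def approx_grapheme_extend(ch: str) -> bool:
    """Approximate Grapheme_Extend as general category Me or Mn.

    Assumption: Other_Grapheme_Extend is ignored, because the unicodedata
    module does not expose it.
    """
    return unicodedata.category(ch) in ("Me", "Mn")


samples = [
    "\u0301",  # COMBINING ACUTE ACCENT (Mn)
    "\u20DD",  # COMBINING ENCLOSING CIRCLE (Me)
    "\u0BCD",  # TAMIL SIGN VIRAMA (Mn)
    "\u0BC6",  # TAMIL VOWEL SIGN E (Mc)
]
for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), approx_grapheme_extend(ch))
```

The spacing Tamil vowel sign falls outside the approximation, which matches the observation above that Mc characters do not seem to be covered.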

I am sorry to insist on these issues. I have really big problems
understanding where the specifications are, when chapter 2.10 inside the
Unicode book says one thing while dealing directly with the issue, while
another document that is supposed to be just as standard says otherwise,
or rather is to be interpreted otherwise, and still neither of them matches
exactly what people are expecting in this forum.

(And furthermore when asked about issues of conformance, the earlier answer
was, it does not matter, or it should not matter, or depending on what
you are doing, etc., in a word ways to avoid answering the original
question.)


 if they don't work [...] the system is broken.

As James eloquently showed earlier today, I am not so sure we want things
this way.

The text in The Unicode Standard explicitly refers to the case of the
European diacritics. There (well, here!), because of typing habits (use of
so-called dead keys), users expect the combination of a diacritic and a
space to be rendered as a spacing clone of the diacritic. I read the 2.10
snippet as guarding this convention.
(Of course, this is my interpretation; I can very easily be wrong.)

On the other hand, typing habits in other parts of the world are not that
entrenched. After all, dead keys have been with us for more than a century,
while keyboards for combining characters that may reorder before the preceding
characters are only twenty years old. Furthermore, the custom is to provide
disambiguating signals, such as a bell (Thai) or a dotted circle, when a vowel
is mistyped. Evidently, Microsoft followed this when they designed
Uniscribe/Indic OpenType. What you are saying is that when a mistyped vowel
follows a space character, it should appear hanging from nothing, while the
situation will be different if it is typed after a virama, or another vowel,
or some other mark.

As I said, I am not sure this is what we really want.


Antoine




RE: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Constable

 The bottom line is that SP+vowel and NBSP+vowel are prescribed by the
 Unicode Standard, and if they don't work (at least the former; for the
 latter, one can weasel out by claiming conformity with earlier versions
 of the Standard) the system is broken.

Or the system is conformant but doesn't support everything in the
standard.



Peter Constable



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread John Cowan
Peter Kirk scripsit:

 Using NBSP rather than SPACE has several advantages, and has long been 
 specified in Unicode, although not widely implemented. It is less likely 
 to occur accidentally. But it has disadvantages, especially that it will 
 always be a spacing character, whereas for display of isolated Indic 
 vowels no extra spacing is required.

You don't actually say so, but you give me the impression that you think
NBSP is a fixed-width space.  It isn't; it can assume any width greater
than zero, just as SPACE can; in particular, when used before a NSM, I
would expect it to have the same width as the NSM.

 I would like to repeat my earlier proposal for a new character ISOLATED 
 COMBINING MARK BASE. This character would have no glyph, and the general 
 properties of a letter. Its spacing would be just as much as required 
 for proper display of the combining mark - which would be zero for 
 combining marks which have their own width.

Except for not being letters, SP and NBSP have, or ought to have,
exactly this behavior.

-- 
Well, I'm back.  --Sam        John Cowan [EMAIL PROTECTED]



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Kirk
On 29/03/2004 06:35, Peter Constable wrote:

The bottom line is that SP+vowel and NBSP+vowel are prescribed by the
Unicode Standard, and if they don't work (at least the former; for the
latter, one can weasel out by claiming conformity with earlier versions
of the Standard) the system is broken.

Or the system is conformant but doesn't support everything in the
standard.

Peter Constable

You can't get away with it that easily. If the standard specifies that 
space, combining mark should be displayed as an isolated combining 
mark, then it would be conformant for a partial implementation to 
display this sequence as nothing or as an illegal sequence. But if the 
system attempts to display the sequence in a meaningful manner, it must 
do so according to the standard, i.e. not as dotted circle plus 
combining mark.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Kirk
On 29/03/2004 06:56, John Cowan wrote:

Peter Kirk scripsit:

Using NBSP rather than SPACE has several advantages, and has long been 
specified in Unicode, although not widely implemented. It is less likely 
to occur accidentally. But it has disadvantages, especially that it will 
always be a spacing character, whereas for display of isolated Indic 
vowels no extra spacing is required.

You don't actually say so, but you give me the impression that you think
NBSP is a fixed-width space.  It isn't; it can assume any width greater
than zero, just as SPACE can; in particular, when used before a NSM, I
would expect it to have the same width as the NSM.

Well, as I understand it NBSP is often expected to be a fixed-width 
space, and it is in many implementations. In fact I think it ought to 
be, whether or not this is actually specified. But there ought to be a 
character which is explicitly NOT fixed width to carry NSMs. Also you do 
say that NBSP must have a width greater than zero, but for some 
combining marks (those which are not non-spacing, and arguably even some 
which are) this base character should have zero width.

I would like to repeat my earlier proposal for a new character ISOLATED 
COMBINING MARK BASE. This character would have no glyph, and the general 
properties of a letter. Its spacing would be just as much as required 
for proper display of the combining mark - which would be zero for 
combining marks which have their own width.

Except for not being letters, SP and NBSP have, or ought to have,
exactly this behavior.

Well, there are several differences. An obvious one is that a line break 
is permitted after SP (but before the combining mark?) And they are 
different for a number of algorithms including those for text boundaries 
and bidi.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



RE: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Constable
 You can't get away with it that easily. If the standard specifies that
 space, combining mark should be displayed as an isolated combining
 mark, then it would be conformant for a partial implementation to
 display this sequence as nothing or as an illegal sequence. But if the
 system attempts to display the sequence in a meaningful manner, it must
 do so according to the standard, i.e. not as dotted circle plus
 combining mark.

Are you saying that you'd like to see apps display text according to the
correct behaviour for a given script, or not at all?

I don't think that would be particularly helpful. And I think it's a
good thing the conformance requirements don't attempt to define what
not supporting such-and-such characters means at this level of detail.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread jcowan
Antoine Leca scripsit:

 I am sorry John, I must have missed a post of yours. I asked you where it is
 written, and did not find any answer to this; unless someone considers that
 all marks, including spacing combining vowels, are (European) diacritics.

Well, it depends on what the equivoque combining marks in the title of
Section 7.7 means.  This is where (p. 187) the remarks about SP and NBSP
appear:

# Marks as Spacing Characters.  By convention, combining marks may be exhibited
# in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK
# SPACE.  This approach might be taken, for example, when referring to the
# diacritical mark itself as a mark, rather than using it in its normal way
# in text.  The use of U+0020 SPACE versus U+00A0 NO-BREAK SPACE affects line
# breaking behavior.
#
# In charts and illustrations in this standard, the combining nature of these
# marks is illustrated by applying them to a dotted circle, as shown in the
# examples throughout this standard.
#
# The Unicode Standard separately encodes clones of many common European
# diacritical marks as spacing characters.  These related characters are
# cross-referenced in the character names list.

So assuming that "combining mark" means "combining character" rather than
"non-spacing mark" (the term does not appear in the Glossary), it seems that
combining vowels should work fine with SP or NBSP.  The reference to European
diacriticals plainly applies only to the various spacing diacriticals, some
of which are grandfathered in by ASCII or Latin-1.

-- 
John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
In computer science, we stand on each other's feet.
--Brian K. Reid



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Kirk
On 29/03/2004 08:42, Peter Constable wrote:

You can't get away with it that easily. If the standard specifies that
space, combining mark should be displayed as an isolated combining
mark, then it would be conformant for a partial implementation to
display this sequence as nothing or as an illegal sequence. But if the
system attempts to display the sequence in a meaningful manner, it must
do so according to the standard, i.e. not as dotted circle plus
combining mark.

Are you saying that you'd like to see apps display text according to the
correct behaviour for a given script, or not at all?

I would prefer to see the text displayed according to the standard. In 
this particular case, I would prefer to see the standard fixed rather 
than the rendering system. But it is a source of great confusion to 
everyone when a widely used application does something clearly different 
from what the standard intends, and yet claims conformance even if 
technically this is correct.

There is clearly a widespread need to display a variety of combining 
marks in isolation, and with no dotted circle. Unicode defines an 
encoding for this. Uniscribe apparently does not support this encoding. 
There is something wrong here.

It seems, from what Srivas (Avarangal) wrote, to be part of the 
requirement for correct display of Tamil, and perhaps other Indic 
languages, to be able to display isolated forms of such characters as 
U+0BC6. If Uniscribe does not support this, even if it is technically 
Unicode conformant, Microsoft cannot claim to support Tamil and other 
languages.

I don't think that would be particularly helpful. And I think it's a
good thing the conformance requirements don't attempt to define what
not supporting such-and-such characters means at this level of detail.
 

I agree, I think. But a claim to support particular scripts or languages 
surely implies that all characters in that script (or at least in its 
modern form) are supported. That is not perhaps a Unicode requirement, 
but at least in the UK a failure here might be a breach of laws on 
truthful advertising and description of products.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread jameskass

John Cowan quoted,

 Well, it depends on what the equivoque combining marks in the title of
 Section 7.7 means.  This is where (p. 187) the remarks about SP and NBSP
 appear:
 
 # Marks as Spacing Characters.  By convention, combining marks may be exhibited
 # in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK
 # SPACE.  This approach might be taken, for example, when referring to the
 # diacritical mark itself as a mark, rather than using it in its normal way
 # in text. 

Note the use of "may" and "might" in the quoted text rather than "must".

The above could be interpreted in part as '... combining marks may be exhibited 
in (apparent) isolation by applying them to U+0020 SPACE, or they may not.'
Such an interpretation might lead people to decide that the approach is
up to the renderer.

Semantics aside, if the default display appearance of a combining mark in isolation
on a certain system is the mark on a dotted circle, then that system should be 
considered conformant when it displays space+mark as dotted_circle+mark.

An observation, FWIW:  on the system here, combiners in Indic scripts get
the dotted circle, but combining diacritics from the (mostly) Western
combining diacritics range don't.  Space + U+0327 displays a stand-alone
cedilla here; no dotted circle.
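For anyone wanting to repeat this kind of observation on another system, a small script can generate the candidate sequences to paste into the application under test. This is only a sketch; the function names isolation_candidates and code_points are made up for this note, and the script says nothing about how any renderer will actually display the strings.

```python
import unicodedata

SPACE, NBSP, DOTTED_CIRCLE = "\u0020", "\u00a0", "\u25cc"

def isolation_candidates(mark: str) -> dict:
    """Return the three base+mark sequences discussed in this thread."""
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError("not a combining mark: U+%04X" % ord(mark))
    return {
        "SP+mark": SPACE + mark,
        "NBSP+mark": NBSP + mark,
        "circle+mark": DOTTED_CIRCLE + mark,
    }

def code_points(s: str) -> str:
    """Spell out a string as U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# COMBINING CEDILLA and TAMIL VOWEL SIGN E, as mentioned in the thread.
for mark in ("\u0327", "\u0bc6"):
    for label, seq in isolation_candidates(mark).items():
        print(f"{label:12} {code_points(seq):16} {seq}")
```

Comparing how the three lines for each mark are displayed shows whether a given system substitutes a dotted circle for SP or NBSP.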

Best regards,

James Kass




Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Peter Kirk
On 29/03/2004 10:11, [EMAIL PROTECTED] wrote:

Antoine Leca scripsit:

I am sorry John, I must have missed a post of yours. I asked you where it is
written, and did not find any answer to this; unless someone considers that
all marks, including spacing combining vowels, are (European) diacritics.

Well, it depends on what the equivoque "combining marks" in the title of
Section 7.7 means.  This is where (p. 187) the remarks about SP and NBSP
appear:
# Marks as Spacing Characters.  By convention, combining marks may be exhibited
# in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK
# SPACE.  This approach might be taken, for example, when referring to the
# diacritical mark itself as a mark, rather than using it in its normal way
# in text.  The use of U+0020 SPACE versus U+00A0 NO-BREAK SPACE affects line
# breaking behavior.
 

These words are equivocal in more ways than one. What does "By 
convention... may be exhibited" mean? Does this mean that the sequence 
SPACE, mark should be rendered as an isolated mark, or does it mean 
that optionally it may be? Is the convention one which is optional for 
those encoding texts, or optional for implementers? Are these words 
intended to be in any way prescriptive, or are they intended merely to 
be descriptive of what some people have chosen to do? If "This approach 
might be taken, for example, when referring to the diacritical mark 
itself as a mark", what other approach might be taken as an alternative? 
The language is altogether far too loose for a standard. The result is 
the current confusion, in which people are trying to encode 
texts according to what they think Unicode expects them to do, and 
finding that the rendering engines they use do not provide either this 
or any other way to display what they want to display, and yet claim to 
conform to Unicode.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Ernest Cline

 [Original Message]
 From: Peter Kirk [EMAIL PROTECTED]

 On 29/03/2004 06:56, John Cowan wrote:

 Peter Kirk scripsit:
 
   
 
 Using NBSP rather than SPACE has several advantages, and has long
 been specified in Unicode, although not widely implemented. It is less
 likely to occur accidentally. But it has disadvantages, especially that
 it will always be a spacing character, whereas for display of isolated
 Indic vowels no extra spacing is required.
 
 You don't actually say so, but you give me the impression that you think
 NBSP is a fixed-width space.  It isn't; it can assume any width greater
 than zero, just as SPACE can; in particular, when used before a NSM, I
 would expect it to have the same width as the NSM.
 
 Well, as I understand it NBSP is often expected to be a fixed-width 
 space, and it is in many implementations. In fact I think it ought to 
 be, whether or not this is actually specified. But there ought to be a 
 character which is explicitly NOT fixed width to carry NSMs. Also
 you do say that NBSP must have a width greater than zero, but for
 some combining marks (those which are not non-spacing, and
 arguably even some which are) this base character should have
 zero width.

UAX #14 makes a rather definitive statement on this issue, albeit
in an obscure place, in Section 3: Introduction.

When expanding or compressing inter-word space, only the space
marked by U+0020  SPACE and U+3000  IDEOGRAPHIC SPACE
are normally subject to compression, and only spaces marked by
U+0020 SPACE, and occasionally spaces marked by U+2009
THIN SPACE are subject to expansion. All other space characters
have fixed width.

While one can argue as to whether this has anything to do with the
effect on the width of NBSP with a combining character following
it or not, it is clear that one should not assume that NBSP
is treated exactly the same as SPACE except for not breaking a line.
Indeed, I would prefer to see NBSP treated as a fixed-width character
that would only be affected by letter spacing in all contexts, including
when it has an attached combining character.

The idea of an explicit character to be used as a combining
character base has merit in my opinion, but only if an acceptable
standardization of the behavior of combining characters with some
other character such as SPACE cannot be achieved so that it would
always be expected to produce an isolated combining character.
(except when in an intentional "show the codes" mode)





Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Kenneth Whistler
Peter Kirk said:

 I will say again as I have said before - but the above (and what I 
 snipped) is extra evidence for it - that what is broke ... is 
 the rule that the isolated (generally spacing) form of a combining mark 
 should be formed by SPACE or NBSP followed by the combining mark. 

This has been the *intent* of the standard since its inception in
1989. 

 There 
 are many good reasons for not using SPACE for this, including default 
 behaviour like inserting line breaks immediately after SPACE. 

Nope. UAX #14 specifies the following regarding SPACE followed by
combining marks:

If U+0020 SPACE is used as a base character, it is treated as AL
instead of SP.

This means that a combining character sequence of this type is treated
as a unit for the purposes of line breaking, and this overrides the
behavior otherwise of SPACE to be treated as a line break
opportunity. Of course UAX #14 only spells out default behavior,
but then default behaviour is what was claimed just above.
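The effect of the rule quoted above can be sketched in a few lines. This is a deliberately tiny illustration, not the UAX #14 algorithm: it only looks at U+0020 SPACE, it treats any character of general category M* as a combining mark, and the function name break_after_space_positions is made up for this note. Later revisions of UAX #14 may word or handle this case differently.

```python
import unicodedata

def is_combining(ch: str) -> bool:
    """True for any mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")

def break_after_space_positions(text: str) -> list:
    """Offsets at which a break after a SPACE would be allowed.

    A SPACE that serves as the base of a combining mark is skipped,
    mirroring the quoted rule that it is treated as AL instead of SP.
    """
    positions = []
    for i, ch in enumerate(text):
        if ch != " ":
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt and is_combining(nxt):
            continue  # SPACE is a base here, so no break opportunity
        positions.append(i + 1)
    return positions

print(break_after_space_positions("a b"))            # ordinary space
print(break_after_space_positions("a \u0327b"))      # space carrying a cedilla
```

In the second string the space and the cedilla stay together as one unit, so the sketch reports no break opportunity after that space.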

 Using NBSP rather than SPACE has several advantages, and has long been 
 specified in Unicode, although not widely implemented. It is less likely 
 to occur accidentally. But it has disadvantages, especially that it will 
 always be a spacing character, whereas for display of isolated Indic 
 vowels no extra spacing is required.

NBSP is not a fixed-width space.

 I would like to repeat my earlier proposal for a new character ISOLATED 
 COMBINING MARK BASE. This character would have no glyph, and the general 
 properties of a letter. Its spacing would be just as much as required 
 for proper display of the combining mark - which would be zero for 
 combining marks which have their own width.

And after 15 years presence in the standard (or its earlier drafts)
of the SP + CM recommendation, what makes you think that introduction
of a *new* convention using a *new*, special purpose format control
character sorta like a space only different, would lead to any
better situation in actual practice? Use of such a character would
*NOT* resolve the differences regarding how to display such a
combination, by the way.

 I realise that for backward compatibility reasons the old encoding 
 cannot be made illegal. But it can be deprecated, and a note can be 
 added that this sequence may not always be displayed as preferred.

This is a recipe for prolonging the confusion and inconsistency in
implementations of this feature.

 You can't get away with it that easily. If the standard specifies that 
 space, combining mark should be displayed as an isolated combining 
 mark, then it would be conformant for a partial implementation to 
 display this sequence as nothing or as an illegal sequence. But if the 
 system attempts to display the sequence in a meaningful manner, it must 
 do so according to the standard, i.e. not as dotted circle plus 
 combining mark.

The standard does not *require* this rendering or anything else. For
the most part, the Unicode Standard is *NOT* a text rendering
standard -- it is a character encoding standard. All kinds of
recommendations are put in regarding how to handle one kind or another
of rendering problem, precisely so that every implementer doesn't
start from scratch to reinvent the wheel here, and so as to provide
some basis for people to represent the same text content with the
same spellings for complex scripts.

There are reasons why such recommendations are found in Chapters 7
(and 5 and 2) of the standard, and are not nailed down with
conformance clauses in Chapter 3. The UTC has, over the years, not
found it appropriate to try to make normative requirements on the
details of text display, except insofar (as in the Bidirectional
Algorithm) as they have a direct bearing on the interpretation of
the logical content of the text itself.

 Well, as I understand it NBSP is often expected to be a fixed-width 
 space, and it is in many implementations. In fact I think it ought to 
 be, whether or not this is actually specified. But there ought to be a 
 character which is explicitly NOT fixed width to carry NSMs.

There are *two* such characters: SPACE and NBSP.

John Cowan noted:

 Well, it depends on what the equivoque combining marks in the title of
 Section 7.7 means.

and then quoted the relevant text from p. 187. By the way, the first
part of that text has survived almost verbatim from Unicode 1.0, where
it was printed on p. 40 in what was then Chapter 3, Character Blocks.
It was written there as part of the section Generic Diacritical
Marks U+0300 -- U+036F, as that was the most obviously a propos
point in the text at the time. The text of the standard has since
been morphed, restructured, and extensively added to, but some of
its quirks result from the fact that the text has a *history*, and
it isn't completely rewritten every time a new book is published.

The intent of the UTC and the editors has always seemed clear to
me on this particular point -- and the fact that the text in
question has 

Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Asmus Freytag
At 12:19 PM 3/29/2004, Ernest Cline wrote:

 [Original Message]
 From: Peter Kirk [EMAIL PROTECTED]

 On 29/03/2004 06:56, John Cowan wrote:

 Peter Kirk scripsit:
 
 Using NBSP rather than SPACE has several advantages, and has long
 been specified in Unicode, although not widely implemented. It is less
 likely to occur accidentally. But it has disadvantages, especially that
 it will always be a spacing character, whereas for display of isolated
 Indic vowels no extra spacing is required.
 
 You don't actually say so, but you give me the impression that you think
 NBSP is a fixed-width space.  It isn't; it can assume any width greater
 than zero, just as SPACE can; in particular, when used before a NSM, I
 would expect it to have the same width as the NSM.

 Well, as I understand it NBSP is often expected to be a fixed-width
 space, and it is in many implementations. In fact I think it ought to
 be, whether or not this is actually specified. But there ought to be a
 character which is explicitly NOT fixed width to carry NSMs. Also
 you do say that NBSP must have a width greater than zero, but for
 some combining marks (those which are not non-spacing, and
 arguably even some which are) this base character should have
 zero width.
UAX #14 makes a rather definitive statement on this issue, albeit
in an obscure place, in Section 3: Introduction.

4.0.1 will amend that section to correct the wrong impression that NBSP is
fixed width and to clarify that this statement is not intended to cover any
specialized cases, but just ordinary typographical conventions:

When expanding or compressing inter-word space according to common
typographical practice, only the spaces marked by U+0020 SPACE,
U+00A0 NO-BREAK SPACE, and U+3000 IDEOGRAPHIC SPACE are subject
to compression, and only spaces marked by U+0020 SPACE,
U+00A0 NO-BREAK SPACE, and occasionally spaces marked by
U+2009 THIN SPACE are subject to expansion. All other space
characters normally have fixed width. When expanding or
compressing inter-character space the presence of
U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER are always ignored.
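Read as a lookup table, the amended wording comes out roughly as follows. This is just a transcription of the quoted paragraph into Python sets, offered as a sketch; the role labels and the function name justification_roles are invented for this note and are not defined anywhere in UAX #14.

```python
SPACE, NBSP = "\u0020", "\u00a0"
THIN_SPACE, IDEOGRAPHIC_SPACE = "\u2009", "\u3000"
ZWSP, WORD_JOINER = "\u200b", "\u2060"

# Sets transcribed from the amended paragraph quoted above.
COMPRESSIBLE = {SPACE, NBSP, IDEOGRAPHIC_SPACE}
EXPANDABLE = {SPACE, NBSP}
OCCASIONALLY_EXPANDABLE = {THIN_SPACE}
IGNORED_FOR_LETTER_SPACING = {ZWSP, WORD_JOINER}

def justification_roles(ch: str) -> set:
    """Roles of a space character when inter-word space is adjusted.

    Intended for space characters only; anything not listed in the
    quoted text is reported as "fixed".
    """
    roles = set()
    if ch in COMPRESSIBLE:
        roles.add("compress")
    if ch in EXPANDABLE:
        roles.add("expand")
    if ch in OCCASIONALLY_EXPANDABLE:
        roles.add("expand-occasionally")
    if ch in IGNORED_FOR_LETTER_SPACING:
        roles.add("ignored")
    return roles or {"fixed"}

for name, ch in (("SPACE", SPACE), ("NBSP", NBSP), ("EM SPACE", "\u2003")):
    print(name, sorted(justification_roles(ch)))
```

Under this reading SPACE and NBSP land in the same sets, which is the correction the amended text is making.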

I'm sorry if the placement and context of the text were not enough
to guide the reader. Note that the 'obscure place' was in the
introduction (!) of the UAX, where it was a mere note on a subject not
actually covered by the UAX (i.e. line layout) that nevertheless forms
the context in which linebreaking happens.
Next, people will extract normative statements from the book cover. ;-0

Now that this is settled, all can go on discussing the main point:

While one can argue as to whether this has anything to do with the
effect on the width of NBSP with a combining character following
it or not, it is clear that clear that one should not assume that NBSP
is treated exactly the same as SPACE except for not breaking a line.
Indeed, I would prefer to see NBSP treated as a fixed-width character
that would only be affected by letter spacing in all contexts, including
when it has an attached combining character.
The idea of an explicit character to be used as a combining
character base has merit in my opinion, but only if an acceptable
standardization of the behavior of combining characters with some
other character such as SPACE cannot be achieved so that it would
always be expected to produce an isolated combining character.
(except when in an intentional show the codes mode)





Re: Printing and Displaying Dependent Vowels

2004-03-28 Thread Sinnathurai Srivas
Unicode, rightly or wrongly, decided to implement partial grammar at the
encoding level. Hence the right way to go may be for the possible solutions
to this problem to be defined by the UC, rather than left for others to get
tangled in.

1/ Linear dependent with dotted circle - as stand-alone
2/ Linear dependent without dotted circle - as stand-alone
3/ Repositioned dependent with dotted circle - as stand-alone
4/ Repositioned dependent without dotted circle - as stand-alone

I think the above four need to be defined by the UC.

Probably no. 1 above (or is it no. 3 above?) is already defined, and we
can build on this.

Srivas

- Original Message - 
From: Peter Jacobi [EMAIL PROTECTED]
To: Avarangal [EMAIL PROTECTED]; Peter Constable
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Saturday, March 27, 2004 9:24 PM
Subject: RE: Printing and Displaying Dependent Vowels


Hi Srivas, Peter Kirk, Peter Constable, List Members

Peter Constable wrote:
 Peter Kirk wrote:
  Are these dependent on the font, as some have
  suggested, or are they prescribed by Uniscribe? Do different versions of
  Uniscribe differ in this respect, as I rather think?

 At present, I don't know the answer. I know this is something we have
 intended to support, but I don't get that behaviour on the particular
 system I'm using at the moment. I will keep it in mind as an issue to
 review in the next version of our Indic shaping engine.

With the help of members of the [EMAIL PROTECTED] mailing list,
I can offer some empirical evidence on this whodunnit:

Using the Linux version of AbiWord, which uses the Pango renderer,
both the Code2000 and the MS Latha fonts display the vowel signs without the
unwanted dotted circle. NBSP and normal SPACE give identical results.
For Code2000 only, the dotted circle or a similar ersatz glyph (the
screenshot is not that clear) is drawn between the two parts of the
two-part vowel signs U+0BCA, U+0BCB and U+0BCC.

Best Regards,
Peter Jacobi









RE: Printing and Displaying Dependent Vowels

2004-03-28 Thread Peter Jacobi
Hi James, List members,

James Kass wrote:
 U+0B82 TAMIL SIGN ANUSVARA is substituted and re-positioned in the compound
 glyphs of Code2000 for the normal dotted circle in the default glyphs for
 U+0BCA, U+0BCB, and U+0BCC.

 This is only expected to appear with a rendering system which does not support
 OpenType.  This is because the default glyphs for these surroundrant vowel signs
 would never be drawn on the screen.  [...]

I see. Thinking once more about it: even in the special contexts where
there is a desire to get a rendering of vowel signs without the dotted circle,
U+0BCA, U+0BCB, and U+0BCC wouldn't be called for, but rather their components
U+0BC6, U+0BC7, U+0BBE and U+0BD7.
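The components named here match the canonical decompositions of the three two-part vowel signs, which can be checked with Python's standard unicodedata module. A minimal sketch follows; the helper name components is made up for this note.

```python
import unicodedata

def components(ch: str) -> list:
    """Canonical (NFD) decomposition of a character, as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# The three two-part Tamil vowel signs mentioned above.
for cp in (0x0BCA, 0x0BCB, 0x0BCC):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {' + '.join(components(ch))}")
```

So a text that needs an isolated left or right half can address U+0BC6, U+0BC7, U+0BBE or U+0BD7 directly rather than the precomposed two-part signs.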

 So, if the question is how to make an OpenType font *not* display the dotted
 circle on Windows with Uniscribe, one idea would be to add a spacing glyph to
 U+25CC (DOTTED CIRCLE) in the font.  This spacing glyph should be a no-contour
 glyph, perhaps with the same advance width as U+0020.  I've not tried this,
 but it might just work.

The hard part (I assume) is not only to avoid the dotted circle, but to make
the glyphs behave like normal spacing characters, so that e.g. when one of them
is surrounded by parentheses, no extra or missing spacing is seen.

So U+0BC0, U+0BC1 and U+0BC2 should acquire the width of SPACE,
whereas the other vowel signs should use their glyph's width.

Regards,
Peter Jacobi





Re: Printing and Displaying Dependent Vowels

2004-03-28 Thread C J Fynn

John Hudson [EMAIL PROTECTED] wrote:


 [EMAIL PROTECTED] wrote:

  So, if the question is how to make an OpenType font *not* display the dotted
  circle on Windows with Uniscribe, one idea would be to add a spacing glyph to
  U+25CC (DOTTED CIRCLE) in the font.  This spacing glyph should be a no-contour
  glyph, perhaps with the same advance width as U+0020.  I've not tried this,
  but it might just work.

 It should work: Uniscribe inserts the U+25CC glyph that is in the font, so this
 could be something other than an actual dotted circle. Another option would be
 to map the dotted circle to a non-contour spacing glyph in one of the
 discretionary OpenType Layout features such as salt, which would allow users of
 apps supporting that feature (currently only InDesign ME, so far as I know) to
 choose whether or not to display the circle.

 John Hudson

If someone wants this, isn't it possible to put a specific lookup in the font
so that any dependent vowel following a space character renders as a spacing
(stand-alone) dependent vowel? Surely a specific lookup should override it being
displayed on a dotted circle by default.

- Chris




Re: Printing and Displaying Dependent Vowels

2004-03-28 Thread John Hudson
C J Fynn wrote:

If someone wants this, isn't it possible to put a specific lookup in the font
so that any dependent vowel following a space character renders as a spacing
(stand-alone) dependent vowel? Surely a specific lookup should override it being
displayed on a dotted circle by default.

Not necessarily. Applications or layout engines may insert the dotted circle character on 
the fly during rendering in what they consider invalid sequences. Clearly space+mark is 
not an invalid sequence according to Unicode, but there may still be some apps that handle 
this incorrectly. Also, space characters have layout behaviours that do not always make 
them an ideal base for combining marks, e.g. being swallowed at the end of lines.

John Hudson

--

Tiro Typeworks        www.tiro.com
Vancouver, BC        [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Printing and Displaying Dependent Vowels

2004-03-28 Thread jameskass

C J Fynn responded to John Hudson,

 If someone wants this, isn't it possible to put a specific lookup in the font
 so that any dependent vowel following a space character renders as a spacing
 (stand-alone) dependent vowel? Surely a specific lookup should override it being
 displayed on a dotted circle by default.

Has anyone tried this?  Would the space glyph U+0020 be expected to trigger
a look-up in the Tamil GSUB table as if it were a Tamil base character?

The reason that I haven't tried this is because, in the OpenType look-ups here
for the re-ordrant vowel signs of Tamil, the vowel sign is INPUT1 and the
base letter is INPUT2.  This is because the rendering engine has already
re-ordered the character string before this look-up is performed.  It doesn't
seem likely that a rendering engine would re-order a vowel sign before a space.
It could be tested both ways, I suppose...

This seems to be OT for this list, but, here it is, and it will probably keep
popping up from time to time unless clarified.

I can only make inferences and suppositions based on observation of the
behavior and reasoning behind the behavior of the rendering engine used
here, Microsoft's Uniscribe.  People who know all about this do follow
this list, so they're free to offer corrections.

inference and supposition

Uniscribe inserts the dotted circle into the display for complex scripts in
order to give a visual indication of an encoding or spelling error.  This seems
quite useful whether text is being entered or merely displayed.

Allowing dependent vowels to follow the space character breaks this utility.
In other words, somebody could write a Tamil word in a web page starting
with the E-vowel-sign (U+0BC6), and there'd be no indication that this is 
improper, either to the author or the visitor.

Someone searching for that word on that page wouldn't find it, and so on.

Maybe some kind of spell-checker should be used by the original author, but,
there seems to be no way to assure that spell-checking was performed by the
author of any web page one visits.

It is the very appearance of that dotted circle unexpectedly in our texts which
alerts us to the fact that we have made a mistake.  That dotted circle jumps out
of the page into our vision exclaiming, Hey, I'm wrong!  I'm so wrong, don't
even bother running your spell-checker on me!  This is the basis upon which
Uniscribe renders text which includes dependent vowel signs, not just for Tamil,
but for the other so-called complex scripts, too.  The dotted circle plus the
matra is the default rendering for combining marks *in isolation*.  Uniscribe
seems to rightly treat a vowel sign following a space as being in isolation, and,
how could it do otherwise?  What goes for the space character also seems to
go for any other character which is not a valid character *within the Unicode 
range*.  Again, how could it be otherwise?  If the first character in a string
isn't a Tamil character, there's no reason for the renderer to consult the Tamil
OpenType tables in a font.  If it did, my gosh, imagine all the pointless look-ups
just to display a page which was, for example, mostly Chinese with a few Tamil
phrases.
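
The behaviour supposed above can be modelled with a toy function. This is only a sketch of the idea, not Uniscribe's actual algorithm: the names show_isolated_marks and INDIC_SCRIPTS, and the trick of guessing a script from the first word of the Unicode character name, are all assumptions made for illustration.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"
# Assumption: only these scripts are handled by this toy model.
INDIC_SCRIPTS = {"TAMIL", "DEVANAGARI"}


def script_guess(ch):
    """Crude script guess: the first word of the character name."""
    return unicodedata.name(ch, "").split(" ")[0]


def show_isolated_marks(text):
    """Return text with U+25CC inserted before Indic marks lacking a base."""
    out = []
    base_script = None  # script of the base that opened the current cluster
    for ch in text:
        script = script_guess(ch)
        category = unicodedata.category(ch)
        if category.startswith("M") and script in INDIC_SCRIPTS:
            if base_script != script:
                # No base letter of the same script precedes this mark.
                out.append(DOTTED_CIRCLE)
                base_script = script
        elif category.startswith("L"):
            base_script = script
        else:
            # Anything else (spaces, punctuation, other marks) is not a base here.
            base_script = None
        out.append(ch)
    return "".join(out)


# TAMIL LETTER KA + VOWEL SIGN E keeps its base; SPACE + VOWEL SIGN E does not.
for sample in ("\u0b95\u0bc6", " \u0bc6"):
    shown = show_isolated_marks(sample)
    print(" ".join(f"U+{ord(c):04X}" for c in shown))
```

In this model a space resets the cluster, so a vowel sign after a space is treated as being in isolation, which is the behaviour the paragraph above attributes to the rendering engine.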

end of supposition and inference

The good folks engineering Uniscribe have been most responsive to all kinds
of special requests and pointers related to complex script shaping.

I think asking them to break the existing mechanism in order to support
vowel signs on spaces asks too much, though.

People generating texts for educational purposes will always have special needs.
So, they'll always need to make special effort to get special effects.  Workarounds
concerning the original question have already been suggested.

If this is treated as a Unicode issue rather than a display issue, then one solution
would be for someone to propose a new character, (back on topic a little bit)
COMBINING DOTTED CIRCLE FOR COMBINING MARKS.
Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine
could be changed to insert this new character.  Then, these updated rendering
engines could be distributed and font developers could add the new characters
to fonts and distribute updated fonts.  This might just take a while, but it
wouldn't be too hard to find examples of the character in actual text use to
accompany the proposal...

If it ain't broke, don't fix it.  So, is it 'broke'?

Best regards,

James Kass





RE: Printing and Displaying Dependent Vowels

2004-03-27 Thread Peter Jacobi
Hi Srivas, Peter Kirk, Peter Constable, List Members

Peter Constable wrote:
 Peter Kirk wrote:
  Are these dependent on the font, as some have
  suggested, or are they prescribed by Uniscribe? Do different versions of
  Uniscribe differ in this respect, as I rather think?
 
 At present, I don't know the answer. I know this is something we have
 intended to support, but I don't get that behaviour on the particular
 system I'm using at the moment. I will keep it in mind as an issue to
 review in the next version of our Indic shaping engine.

With the help of members of the [EMAIL PROTECTED]  mailing list,
I can offer some empirical evidence on this whodunnit:

Using the Linux version of Abiword, which uses the Pango renderer,
both the Code 2000 and the MS Latha fonts display the vowel signs without the
unwanted dotted circle. NBSP and normal SPACE give identical results.
For Code 2000 only, the dotted circle or a similar ersatz glyph (the screenshot
is not that clear) is drawn between the two parts of the two-part vowel signs
U+0BCA, U+0BCB and U+0BCC.

Best Regards,
Peter Jacobi






RE: Printing and Displaying Dependent Vowels

2004-03-27 Thread jameskass

Peter Jacobi wrote,


 Using the Linux version of Abiword, which uses the Pango renderer,
 both the Code 2000 and the MS Latha fonts display the vowel signs without the
 unwanted dotted circle. NBSP and normal SPACE give identical results.
 For Code 2000 only, the dotted circle or a similar ersatz glyph (the screenshot
 is not that clear) is drawn between the two parts of the two-part vowel signs
 U+0BCA, U+0BCB and U+0BCC.

U+0B82 TAMIL SIGN ANUSVARA is substituted and re-positioned in the compound 
glyphs of Code2000 for the normal dotted circle in the default glyphs for 
U+0BCA, U+0BCB, and U+0BCC.  

This is only expected to appear with a rendering system which does not support 
OpenType.  This is because the default glyphs for these surroundrant vowel signs 
would never be drawn on the screen.  Rather, the expected approach from the 
rendering engine is to use the component glyphs for these three vowel signs, such 
as U+0BC7 for the left part of U+0BCA, and U+0BBE for the right-side portion.

If the presence of these default glyphs in Code2000 is making problems, they can
be adjusted.  (Just because I expect a rendering engine to take a certain approach,
doesn't mean that a rendering engine will take that approach!)

On Windows, as others have noted, the rendering engine (Uniscribe) inserts the
dotted circle glyph (if the font has a dotted circle glyph) into the display.  The
dotted circle character is not inserted into the text, of course.

So, if the question is how to make an OpenType font *not* display the dotted
circle on Windows with Uniscribe, one idea would be to add a spacing glyph to
U+25CC (DOTTED CIRCLE) in the font.  This spacing glyph should be a no-contour
glyph, perhaps with the same advance width as U+0020.  I've not tried this,
but it might just work.
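
The idea can be pictured with a toy model in which a font's character map is a plain dictionary. Everything here is hypothetical: the glyph names, the cmap contents and the helpers placeholder_glyph and remap_placeholder are invented for illustration, and no real font file or Uniscribe API is involved.

```python
DOTTED_CIRCLE = 0x25CC

# Hypothetical character map: code point -> glyph name.
cmap = {
    0x0020: "space",
    0x0BC6: "tamil_vowel_sign_e",
    DOTTED_CIRCLE: "dottedcircle",
}


def placeholder_glyph(cmap):
    """Glyph a renderer would show for U+25CC, or .notdef if unmapped."""
    return cmap.get(DOTTED_CIRCLE, ".notdef")


def remap_placeholder(cmap, blank_glyph="blank_spacing"):
    """Return a copy of cmap whose U+25CC entry points at a blank glyph."""
    patched = dict(cmap)
    patched[DOTTED_CIRCLE] = blank_glyph
    return patched


print(placeholder_glyph(cmap))
print(placeholder_glyph(remap_placeholder(cmap)))
```

The point of the sketch is only that the renderer asks the font for whatever is mapped to U+25CC, so the font decides what the inserted placeholder looks like.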

Another approach is to simply use a non-OpenType Unicode TrueType font for
Tamil.  The dotted circles don't seem to ever appear unless the font-in-use has
OpenType tables covering the script-in-use.

Best regards,

James Kass




Re: Printing and Displaying Dependent Vowels

2004-03-27 Thread John Hudson
[EMAIL PROTECTED] wrote:

So, if the question is how to make an OpenType font *not* display the dotted
circle on Windows with Uniscribe, one idea would be to add a spacing glyph to
U+25CC (DOTTED CIRCLE) in the font.  This spacing glyph should be a no-contour
glyph, perhaps with the same advance width as U+0020.  I've not tried this,
but it might just work.

It should work: Uniscribe inserts the U+25CC glyph that is in the font, so this could be 
something other than an actual dotted circle. Another option would be to map the dotted 
circle to a non-contour spacing glyph in one of the discretionary OpenType Layout features 
such as salt, which would allow users of apps supporting that feature (currently only 
InDesign ME, so far as I know) to choose whether or not to display the circle.

John Hudson



Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Michael Everson
At 01:55 +0100 2004-03-26, Chris Jacobs wrote:

  Avarangal scripsit:
   Can anyone provide information on the sequences used for displaying
   and printing dependent vowels as standalones.
  The standards-conforming way to do so is to precede the dependent vowel
  with a space character (U+0020).
Yes.
 If this sequence is not displayed correctly, complain to your software or
  font vendor, but it should be.

Here I disagree. A font does not have to support each and every combining
sequence. If he needs fonts which support combining sequences starting with
a space char he surely should look for those, but that is no reason to
complain about those fonts that don't.

Someone making an Indic font should consider this particular concern.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Peter Kirk
On 25/03/2004 13:33, [EMAIL PROTECTED] wrote:

Avarangal scripsit:

 

Can anyone provide information on the sequences used for displaying
and printing dependent vowels as standalones.
   

The standards-conforming way to do so is to precede the dependent vowel with a
space character (U+0020).  If this sequence is not displayed correctly, complain
to your software or font vendor, but it should be.
 

There are two standards-conforming ways of doing this. One is to precede 
the dependent vowel with a space character; the other is to precede it 
with a non-breaking space. The latter method is preferable, especially 
if the standalone dependent vowel is likely to occur as part of a word 
rather than in isolation, to avoid unwanted line breaks.
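
For illustration, the two sequences can be built in a few lines of Python. This is a minimal sketch; the helper name standalone and its no_break parameter are hypothetical, and how the result is drawn still depends entirely on the font and renderer.

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00a0"


def standalone(vowel_sign, no_break=True):
    """Return NBSP (or SPACE) followed by a single combining mark."""
    if len(vowel_sign) != 1 or not unicodedata.category(vowel_sign).startswith("M"):
        raise ValueError("expected a single combining mark")
    return (NBSP if no_break else SPACE) + vowel_sign


# TAMIL VOWEL SIGN E after NBSP, then after SPACE.
for seq in (standalone("\u0bc6"), standalone("\u0bc6", no_break=False)):
    print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Defaulting to NBSP reflects the preference stated above: it avoids a line break opportunity between the base and the vowel sign.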

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Michael Everson
At 02:39 -0800 2004-03-26, Peter Kirk wrote:

There are two standards-conforming ways of doing this. One is to 
precede the dependent vowel with a space character; the other is to 
precede it with a non-breaking space. The latter method is 
preferable, especially if the standalone dependent vowel is likely 
to occur as part of a word rather than in isolation, to avoid 
unwanted line breaks.
Of course, one could always display it with a dotted circle as well.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Peter Kirk
On 26/03/2004 03:04, Michael Everson wrote:

At 02:39 -0800 2004-03-26, Peter Kirk wrote:

There are two standards-conforming ways of doing this. One is to 
precede the dependent vowel with a space character; the other is to 
precede it with a non-breaking space. The latter method is 
preferable, especially if the standalone dependent vowel is likely to 
occur as part of a word rather than in isolation, to avoid unwanted 
line breaks.


Of course, one could always display it with a dotted circle as well.


Except that is precisely what Srivas (Avarangal) asked NOT to do.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Avarangal asked about 
 the requirements by educational establishments is the ability
 to print and display dependent vowels without dotted circles.

John Cowan answered:
 Avarangal scripsit:
 
 Can anyone provide information on the sequences used for displaying
 and printing dependent vowels as standalones.
 
 The standards-conforming way to do so is to precede the dependent
 vowel with a space character (U+0020).

Does it fulfil the need (i.e., displaying _without_ dotted circles)?
If so, where is it written?


Antoine




Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Philippe Verdy
From: Antoine Leca [EMAIL PROTECTED]
 Avarangal asked about
  the requirements by educational establishments is the ability
  to print and display dependent vowels without dotted circles.

 John Cowan answered:
  Avarangal scripsit:
 
  Can anyone provide information on the sequences used for displaying
  and printing dependent vowels as standalones.
 
  The standards-conforming way to do so is to precede the dependent
  vowel with a space character (U+0020).

 Does it fulfil the need (i.e., displaying _without_ dotted circles)?
 If so, where is it written?

Space is a base character, then it combines with the next diacritic with which
it creates a default grapheme cluster which should be interpreted as if it was
a single character identity. It is NOT defective. Note that NBSP can be used
instead of SPACE if you need SPACE to keep its role as a keyword
separator.

To display the dotted circle, you can use a defective combining sequence
starting with the diacritic or vowel sign: for example, use a control followed by
the isolated diacritic or vowel sign, or code the diacritic or vowel sign at the
beginning of a parsed plain-text element (in XML, HTML, XHTML or SGML, this is
normally delimited after character entities have been parsed and resolved). You
may also explicitly code the dotted circle symbol followed by the diacritic or
vowel sign to create a non-defective combining sequence starting with that base
symbol.
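
A rough way to spot such defective sequences in a string is sketched below. The function name defective_positions is hypothetical, and treating every character whose general category starts with "C" (controls, format characters and so on) as a non-base is a simplification of the standard's definition.

```python
import unicodedata


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def defective_positions(text):
    """Indexes of marks that open a defective combining sequence."""
    positions = []
    for i, ch in enumerate(text):
        if not is_mark(ch):
            continue
        prev = text[i - 1] if i > 0 else None
        # Defective: nothing before the mark, or only a control-like character.
        if prev is None or unicodedata.category(prev).startswith("C"):
            positions.append(i)
    return positions


print(defective_positions("\u0bc6"))        # vowel sign at start of text
print(defective_positions(" \u0bc6"))       # SPACE + vowel sign
print(defective_positions("\u25cc\u0bc6"))  # DOTTED CIRCLE + vowel sign
```

Under this sketch SPACE and DOTTED CIRCLE both count as bases, which matches the distinction drawn above between defective and non-defective sequences.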

Now how would you interpret differently SPACE+diacritic or SPACE+vowel sign? If
you display a dotted circle there, then you'll display two separate glyphs for a
single grapheme cluster, and this is not intended by the normal Unicode
character model. It may be useful for debugging purposes or as a help tool to
compose text, but not to render actual text outside an input context, and this
should require special code in the renderer to disable that feature in fonts or
renderers.
Note that some fonts may incorrectly display SPACE+diacritic or SPACE+vowel sign
with a dotted circle after a space. This is not an issue with Unicode but with
the font or with the renderer.




Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Sorry to answer my own post.

 Avarangal asked about
 the requirements by educational establishments is the ability
 to print and display dependent vowels without dotted circles.

 John Cowan answered:
 Avarangal scripsit:

 Can anyone provide information on the sequences used for displaying
 and printing dependent vowels as standalones.

 The standards-conforming way to do so is to precede the dependent
 vowel with a space character (U+0020).

 Does it fulfil the need (i.e., displaying _without_ dotted circles)?
 If so, where is it written?

It seems many are thinking about the section in 2.10, titled Spacing Clones
of European Diacritical Marks. I read it as applying to diacritical marks
(and perhaps only European ones, but the distinction looks blurry to
me). The beginning of 2.10 makes it quite clear that diacritics are only one
class (the most important, though) of combining characters. Indic dependent
vowels are another.

Also, something which is probably very relevant to Avarangal, fact is the
implementation from a major vendor in the field, Microsoft Uniscribe, does
retain the dotted circle (if present in the font; if not, you would probably
get the .missing glyph instead).


Antoine




Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Philippe Verdy
From: Antoine Leca [EMAIL PROTECTED]
 It seems many are thinking about the section in 2.10, titled Spacing Clones
 of European Diacritical Marks. I read it as applying to diacritical marks
 (and perhaps only European ones, but the distinction looks blurry to
 me). The beginning of 2.10 makes it quite clear that diacritics are only one
 class (the most important, though) of combining characters. Indic dependent
 vowels are another.

I answered you by saying diacritics or vowel signs, but yes, it also
includes dependent vowels when they are used to create what is more generally
called default grapheme clusters, which is a larger set than the set of
combining sequences (made of a base character followed by combining
characters).

Indic scripts are a bit unique in that they have a syllabic structure
decomposed into separate letters with a base consonant and a combining (this
is not the proper term for Unicode) vowel modifier after it. This differs from
European alphabets (Latin, Greek, Cyrillic) or even from some Asian or African
syllabaries (notably Hiragana/Katakana) where these grapheme clusters are
(almost always) combining sequences coded with a base character and
diacritics.

But if one wants to show the isolated form of an Indic vowel, there's an
orthographic convention to use a sort of vowel order, i.e. a default
consonant, in a way which also happens in the Arabic and Hebrew scripts for the
default base vowel coded with a base letter.

Indic scripts offer several variations here because there are also half-forms
for these vowels, which are not meant to be used in isolation but to complement a
preceding syllable in the same grapheme cluster. It's hard to say which one of
these forms an author would like to present for these isolated dependent vowels
because, as their name suggests, they are normally dependent on another preceding
consonant.

So the best way to represent these isolated dependent vowels would be to encode
an empty/null base consonant to force the presentation of the dependent vowel.
An Indic text would more probably use one base consonant and present all
dependent vowels with that consonant. Trying to represent the isolated vowel
creates a theoretical grapheme cluster, which is normally not part of the normal
orthography of Indic-written words where these vowels are used.

Another solution would be to code these Indic dependent vowels after the Indic
letter A (for example after U+0905 DEVANAGARI LETTER A), because this letter
also represents the default vowel implied by all other consonants.

A sample with Devanagari could be: अा (U+0905 LETTER A, U+093E VOWEL SIGN AA)
which should normally be presented like the precomposed: आ (U+0906 LETTER AA),
but which incorrectly displays the dotted circle with the Mangal font.
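
One point that can be checked directly, at least against the character data shipped with Python's unicodedata module: the two-character sequence and the single letter are distinct strings as far as normalization is concerned. The variable names below are only illustrative, and the check says nothing about how any font or renderer draws either string.

```python
import unicodedata

sequence = "\u0905\u093e"  # DEVANAGARI LETTER A + DEVANAGARI VOWEL SIGN AA
single = "\u0906"          # DEVANAGARI LETTER AA

# NFC does not compose the sequence into the single letter ...
print(unicodedata.normalize("NFC", sequence) == sequence)
# ... and the single letter has no canonical decomposition.
print(unicodedata.decomposition(single) == "")
```

So any visual identity between the two would have to come from the font or the renderer rather than from canonical equivalence.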

So an author has to make some notational compromises here. But still, I do think
that using NBSP as this empty/null base consonant before the dependent vowel
will create the intended Unicode default grapheme cluster. Then it's up to the
font or renderer to show the NBSP+vowel cluster properly, without the dotted
circle, but it's not a problem of Unicode itself.

With NBSP, you get this result:   (U+00A0 NBSP, U+093E VOWEL SIGN AA)
which often shows a square, probably because many fonts don't have a glyph for
the isolated form of the vowel sign.

It is true that this looks like a problem because the dotted circle should not
appear here after showing the NBSP character (because it creates a single
grapheme cluster that should be recognized as such, even if this cluster
contains two combining sequences as it contains two base characters), but the
problem is in the Mangal font itself (or in the UniScribe engine in Windows),
not in Unicode.

In fact you could as well wonder how to represent an isolated form of other
Indic combining characters like an anusvara or candrabindu, but here also
Unicode specifies that they should be coded after a space or preferably a NBSP:
  (NBSP),   (NBSP, ANUSVARA),   (NBSP, CANDRABINDU),   (NBSP,
VISARGA)

If dotted circles appear before the symbol, or if the symbol is shown with a
square box for a missing glyph, it's not the fault of Unicode. So the best way
would be to use a normal Indic base character, such as in:
अ (LETTER A), अं (LETTER A, ANUSVARA), अँ (LETTER A, CANDRABINDU), अः
(LETTER A, VISARGA)
where the sequences look more familiar with the normal Devanagari orthographic
and calligraphic rendering rules implemented in usual fonts.

 Also, something which is probably very relevant to Avarangal, fact is the
 implementation from a major vendor in the field, Microsoft Uniscribe, does
 retain the dotted circle (if present in the font; if not, you would probably
 get the .missing glyph instead).

I'm not sure that UniScribe is the cause of this problem. There just appears to
exist no GSUB rule in some fonts like Mangal to handle the case of NBSP followed
by a Indic vowel sign or combining character, to map them to a single glyph
without 

Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Philippe Verdy
At end of my response to Antoine Leca, I suggested something which may merit
some comments:

 What is clear is that there's no way to enable these features explicitly in
plain-text files, if there's no standard format control in Unicode to enable
these OpenType font features. May be these could become new characters to
allocate in plane 14?

What I mean here is that there's currently no defined way to convey in plain
text files the intended rendering features that are now common in OpenType
fonts and engines.

What we currently have is the script identification and the language
identification with language tags in plane 14, but language tags have proved
largely useless, unlike the feature tags that we currently cannot encode.

Is there some pending proposal to encode a new set of FEATURE TAGs, in the same
spirit as LANGUAGE TAGs in plane 14? Or to use a new leading character in the
LANGUAGE TAGs block to mark the beginning of a feature tag instead of a language
tag (this would require only 1 codepoint allocation, for example E0002)? It
would find an immediate application within OpenType renderers, which could be
instructed to set or unset some rendering features found today in common fonts,
and that could be transported in plain text files, rather than only in rich-text
file formats like XML-based documents or Word documents or CSS stylesheets (if
such possibility gets added and standardized into CSS).




Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Avarangal wrote:
 display dependent vowels without dotted circles.

 Can any one provide information on the sequences used for
 displaying and printing dependent vowels as standalones.

Microsoft's Uniscribe allows you to display a dependent vowel with the
following sequence (to be followed precisely): U+0020 U+200D U+0Bxx. U+00A0
does not work. Neither does U+200D.
Also, these should be the first characters in the string passed to the
Windows API: if there are some characters before, they will not trigger the
special behaviour, and you will end up with the circle.

Please note that trying to display something a bit more complex, like U+0020
U+200D U+0BC6 U+0BD7 or U+0020 U+200D U+0BBF U+0B82, will fail.
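
As a concrete illustration, the sequence described above can be assembled like this. The helper names uniscribe_standalone and code_points are hypothetical, and the snippet only builds the string; whether a given Uniscribe version honours it is exactly what the message reports from observation.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def uniscribe_standalone(vowel_sign):
    """SPACE + ZWJ + vowel sign, the sequence reported in the message."""
    return "\u0020" + ZWJ + vowel_sign


def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


# TAMIL VOWEL SIGN E as a standalone, placed at the start of the string.
print(code_points(uniscribe_standalone("\u0bc6")))
```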

[ I am sorry for the misleading words I had in earlier answers to others. It
cost me some time to figure out exactly what this tool does. ]

Hope this helps,

Antoine




RE: Printing and Displaying Dependent Vowels

2004-03-26 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
 Of Philippe Verdy


 At end of my response to Antoine Leca, I suggested something which may merit
 some comments:

Does that imply that it might also *not* merit comments?

 
  What is clear is that there's no way to enable these features explicitly in
 plain-text files, if there's no standard format control in Unicode to enable
 these OpenType font features. May be these could become new characters to
 allocate in plane 14?

This sounds suspiciously like courtyard codes. (Wonders to self: Are
Philippe Verdy and William Overington aliases for the same person?
:-)


 What I mean here is that there's currently no defined way to convey in plain
 text files the intended rendering features that are now common in OpenType
 fonts and engines.

Nor should there be, any more than there should be ways in plain text to
indicate typeface, point size, style, etc. There is a class of
representations for such information called rich text, and such
representation has been and will very likely continue to be beyond the
scope of plain text.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division



Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
On Friday, March 26, 2004 7:12 PM, Philippe Verdy va escriure:

 Indic scripts are a bit unique in that they have a syllabic
 structure decomposed into separate letters with a base consonant and
 a combining (this is not the proper term for Unicode) vowel
 modifier after it. This differs from European alphabets (Latin,
 Greek, Cyrillic) or even from some Asian or African syllabaries
 (notably Hiragana/Katakana) where these grapheme clusters are (almost
 always) combining sequences coded with a base character and
 diacritics.

Where exactly is the difference with say IPA?
And with Vocalized Perso-Arabic?

(And it is not all Indic scripts: Thai and Lao behave differently)


 Indic scripts offer several variations here because there are also
 half-forms for these vowels,

Please, define half form for vowel. This is new to me.


 A sample with Devanagari could be:   (U+0905 LETTER A, U+093E
 VOWEL SIGN AA) which should normally be presented like the
 precomposed:  (U+0906 LETTER AA), but which incorrectly displays
 the dotted circle with the Mangal font.

Mangal has nothing to do with this. What you are seeing and criticizing is
Uniscribe's implementation, the fruit of a compromise between performance and
dealing with special/unusual cases. This case is not clearly specified by
the Devanagari OpenType specifications, but it appears that the default
behaviour (considering U+093E as a dependent vowel shown in isolation, and
rendering it with the added circle) has been chosen here by the
implementation. In my own implementation of the same specifications, I
consider this a perfectly correct and useful sequence (used in India to
teach the syllabary), so I do not insert the circle and as a result (with
Mangal) it is shown as you expect.

 So an author has to make some notational compromises here. But still,
 I do think that using NBSP as this empty/null base consonant before
 the dependent vowel will create the intended Unicode default grapheme
 cluster.

About NBSP: I hope Paul will read my other post (directed to Avarangal) and
will enhance Uniscribe in this respect, allowing NBSP to behave the same as
SP. I am not sure here (one should look at Unicode 2.0),
but I seem to recall that the behaviour with NBSP was added around 3.0, and
since Uniscribe was designed against 2.0...


 Then it's up to the font or renderer to show the NBSP+vowel
 cluster properly, without the dotted circle, but it's not a problem
 of Unicode itself.

OFF-TOPIC
I have been reading the Unicode list for quite some time (and sorry Philippe, but I
speak about the time before you came in). I do not know why, but every
now and then, there are comments from regulars that say This is not a
defect of Unicode itself, even when nobody is even thinking such a thing.
From a psychological point of view, this is quite interesting. ;-)
/OFF-TOPIC

 If dotted circles appear before the symbol, or if the symbol is shown
 with a square box for a missing glyph, it's not the fault of Unicode.

Again! ;-)


 Also, something which is probably very relevant to Avarangal, fact
 is the implementation from a major vendor in the field, Microsoft
 Uniscribe, does retain the dotted circle (if present in the font; if
 not, you would probably get the .missing glyph instead).

 I'm not sure that UniScribe is the cause of this problem.

I am pretty sure it is! Because if he were using Freetype, he would not have
any problem to display the standalone glyph. :-D

Something more complex would be to have some way to display *various*
representations of the dependent vowels; in Tamil, U+0BC1 and U+0BC2, which
come to mind, show too much variation, so there is not likely to be that one
glyph in the font. But for the well-known Burmese AA U+102C or in
Traditional Malayalam U+0D41 and U+0D42 this might be an open question.
Here again, using Freetype this is perhaps doable, but with some
higher-level engine it would be much more complex. If the need for it
arises, probably the option would be to define a user-accessible OpenType
feature (of alternative kind).

 There just
 appears to exist no GSUB rule in some fonts like Mangal to handle the
 case of NBSP followed by a Indic vowel sign or combining character,

Well, we are quite away from the original subject, but anyway...
You are missing something important about the Indic OpenType specifications.
Besides (in fact, before) the substitutions and, after them, the positioning,
which are encoded as the TTO tables GSUB and GPOS, there are two stages called
analysing and then reordering. Analysing deals mainly with splitting the
stream into clusters. Reordering then does a number of operations, and it
is this step that will insert the dotted circle. Or will not, depending on how
it is programmed.
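
The two stages can be sketched in miniature as follows. This is a toy model under stated assumptions, not the OpenType specification or any engine's code: the function names analyse and reorder are hypothetical, clusters are simply a character plus the marks that follow it, and PREBASE is a small hand-picked set of left-side vowel signs.

```python
import unicodedata

# Assumption: a hand-picked set of vowel signs drawn to the left of the base
# (Tamil E, EE, AI and Devanagari I).
PREBASE = {"\u0bc6", "\u0bc7", "\u0bc8", "\u093f"}


def analyse(text):
    """Split text into clusters: a character plus the marks that follow it."""
    clusters = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def reorder(cluster):
    """Move pre-base vowel signs in front of the cluster's first character."""
    base, marks = cluster[0], cluster[1:]
    pre = "".join(m for m in marks if m in PREBASE)
    post = "".join(m for m in marks if m not in PREBASE)
    return pre + base + post


# TAMIL LETTER KA + VOWEL SIGN E, then TAMIL LETTER MA.
for cluster in analyse("\u0b95\u0bc6\u0bae"):
    print(" ".join(f"U+{ord(c):04X}" for c in reorder(cluster)))
```

A real engine would decide in the reordering step whether a cluster lacks a valid base and needs a dotted circle; the sketch leaves that decision out and only shows where it would sit, ahead of any GSUB or GPOS lookup.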

 I'm not an expert of UniScribe programming, but there may exist some
 Indic features in Indic fonts, which can be enabled in UniScribe to
 change the rendering behavior by including some additional (optional)
 GSUB/GPOS 

Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Philippe Verdy va escriure:

 Space is a base character, then it combines with the next diacritic
 with which it creates a default grapheme cluster which should be
 interpreted as if it was a single character identity.

Agreed so far for diacritics. Agreed also for non-spacing dependent vowels
like U+0BC0. Agreed for the special exceptions like U+0BBE. I disagree for
U+093F or U+0BBF (Mc not included in Other_Grapheme_Extend, there is an
allowed break before it), unless there is something I missed here.
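
The general categories behind this distinction can be looked up with Python's unicodedata module. The helper mark_kind is hypothetical, and note the limit of the sketch: the standard library does not expose the Other_Grapheme_Extend property, so only the Mn/Mc split is shown.

```python
import unicodedata

KINDS = {"Mn": "nonspacing", "Mc": "spacing", "Me": "enclosing"}


def mark_kind(ch):
    """Classify a character by its general category; None if not a mark."""
    return KINDS.get(unicodedata.category(ch))


for cp in (0x0BC0, 0x0BBE, 0x093F, 0x0BBF):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {mark_kind(ch)}")
```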

 It is NOT defective.

I do not understand. I did not say anything implying that, did I? I just
remarked that I was not able to find in the text of the standard any wording
that would give a solid basis for requiring vendors and implementers (like me)
to modify their engines and provide special exceptions to deal with the
combination U+0020/U+00A0 then U+093F.

And no, this is not the same as displaying a diacritic, because it should be
re-ordered, rather than being a spacing representation of diacritics.


 Now how would you interpret differently SPACE+diacritic or
 SPACE+vowel sign?

See above.

 If you display a dotted circle there, then you'll
 display two separate glyphs for a single grapheme cluster, and this
 is not intended by the normal Unicode character model.

?

How do you believe anybody will show, say, U+0063 U+0300? Which font has this
as a single glyph?
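
At the character level, at least, the point is easy to check: there is no precomposed character for c with grave, whereas e with grave has one. The sketch below uses normalization only; it says nothing about glyphs, since a font is free to provide or omit a combined glyph for either sequence.

```python
import unicodedata

c_grave = "\u0063\u0300"  # LATIN SMALL LETTER C + COMBINING GRAVE ACCENT
e_grave = "\u0065\u0300"  # LATIN SMALL LETTER E + COMBINING GRAVE ACCENT

# Number of code points left after canonical composition (NFC).
print(len(unicodedata.normalize("NFC", c_grave)))
print(len(unicodedata.normalize("NFC", e_grave)))
```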

Furthermore, a single character like U+0916 (Devanagari KHA) is very often
rendered with two glyphs (namely, Half-Kha then the glyph also used for the
AA-matra, U+093E). Unicode does not concern itself with how this stuff
is handled.


Antoine




Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Philippe Verdy
From: Peter Constable [EMAIL PROTECTED]
   What is clear is that there's no way to enable these features explicitly in
  plain-text files, if there's no standard format control in Unicode to enable
  these OpenType font features. May be these could become new characters to
  allocate in plane 14?

 This sounds suspiciously like courtyard codes. (Wonders to self: Are
 Philippe Verdy and William Overington aliases for the same person?
 :-)

I can assure you that this is not the same person (look at the country of
origin detected in the IP address if you are still not convinced).

  What I mean here is that there's currently no defined way to convey in
  plain text files the intended rendering features that are now common in
  OpenType fonts and engines.

 Nor should there be, any more than there should be ways in plain text to
 indicate typeface, point size, style, etc. There is a class of
 representations for such information called rich text, and such
 representation has been and will very likely continue to be beyond the
 scope of plain text.

Note that I was not speaking strictly about style, but about a way to mark the
text to allow or disallow some script features. This remains something optional
for the renderer, and it can be ignored as well without breaking the encoded
text. What I mean here is a set of format controls which help renderers
interpret the text.

Yes of course we could define all these at the rich text format level (for
example in CSS if it has such functions to select alternate rendering options).
But when I look at what OpenType features perform (I don't mean the content of
the associated extra GSUB/GPOS tables which is not what I mean here) it looks
like they are designed to be used for particular languages or scripts, in a way
that can be used across multiple font designs.

So a font may implement a feature and another may not. This looks very similar
to a sort of meta-tagging within the middle of the text to add semantics to it,
which can then be used by various renderers and fonts to adapt its style on the
fly.

This was already the case when language tags were added to Unicode. And OpenType
can now include language-specific features which can be triggered by the
presence of these tags. But in reality, most font features implemented today
operate not at the language level but at a finer-grained level below it. And
there's no similar way to tag the text with these features.
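
For readers unfamiliar with the language tag mechanism referred to here, a
minimal sketch: a language tag is U+E0001 LANGUAGE TAG followed by Plane 14 tag
characters, which mirror printable ASCII at an offset of 0xE0000. The function
names below are invented for this illustration, and nothing here implies that
OpenType feature tags are or should be encoded this way.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset

def encode_language_tag(lang: str) -> str:
    """Encode an ASCII language identifier as a Plane 14 language tag."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)

def decode_language_tag(tagged: str) -> str:
    """Recover the ASCII identifier from a string built by encode_language_tag."""
    if not tagged.startswith(LANGUAGE_TAG):
        raise ValueError("missing U+E0001 LANGUAGE TAG")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in tagged[1:])

tagged = encode_language_tag("fr")
print([f"U+{ord(c):05X}" for c in tagged])
print(decode_language_tag(tagged))
```

A renderer that does not understand the tag characters can simply ignore them,
which is the "optional for the renderer" property discussed in this message.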

Please don't consider this a proposal, just a question about the feasibility
of applications that need to use such script-specific features as part of their
regular text processing, without even needing them at the graphic level (when I
look at some OpenType features, their 4-character labels may become part of the
text-level processing, without any glyph processing being needed in the
application using these tags).

So reread my question (this was not a RFE) like this: are there semantics in
these feature tags (yes, just the 4-letter IDs of these tags, not the content
of the GSUB/GPOS tables to which they may be mapped in a specific font) which
would need a way to represent them as format controls within the plain-text
stream?

I think that such semantics exist for these, which may be used or left unused in
some presentation by a renderer, but may have their own application for plain-text
handling (without any glyph processing). I suggested that they may be encoded in
plane 14, possibly among language tags, but this was just a suggestion if they
ever need to be encoded somewhere.




RE: Printing and Displaying Dependent Vowels

2004-03-26 Thread Peter Constable

  This sounds suspiciously like courtyard codes. (Wonders to self: Are
  Philippe Verdy and William Overington aliases for the same person?
  :-)

 I can ensure you that this is not the same person (look at the country
 of origin detected in the IP address if you are still not convinced).

Well, the originating address that is reported when the message arrives
at the list server doesn't guarantee that that's really where it came
from, as we all know. You haven't convinced me yet. :-)



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division



Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Peter Kirk
On 26/03/2004 12:02, Peter Constable wrote:

...

This sounds suspiciously like courtyard codes. (Wonders to self: Are
Philippe Verdy and William Overington aliases for the same person?
:-)
 

...
Peter, I notice that you have found time while looking at this thread to 
criticise Philippe's ramblings and speculate about his identity. Perhaps 
you can use some of your time more profitably in answering the questions 
about Uniscribe and its treatment of sequences like <space, diacritic>
and <NBSP, diacritic>. Are these dependent on the font, as some have 
suggested, or are they prescribed by Uniscribe? Do different versions of 
Uniscribe differ in this respect, as I rather think?

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



RE: Printing and Displaying Dependent Vowels

2004-03-26 Thread Peter Constable
 From: Peter Kirk [mailto:[EMAIL PROTECTED]


 Peter, I notice that you have found time while looking at this thread
 to criticise Philippe's ramblings and speculate about his identity.

Yes, he and I have been having fun offline debating his identity :-)


 Perhaps
 you can use some of your time more profitably 

Hmmm... I perceive you don't approve of how I've been using my time.
You're right: I should spend less time replying on this list and more
time on the projects in my yearly objectives ;-)


 in answering the questions
 about Uniscribe and its treatment of sequences like <space, diacritic>
 and <NBSP, diacritic>...

I have been corresponding with the original inquirer offline to find out
more precisely what the issues and requirements of the users he's
representing are. It's more of a priority for me to discover that than
to discuss details regarding Uniscribe behaviour I don't actually know
about for certain (and that I can't change in the immediate future).
 

 Are these dependent on the font, as some have
 suggested, or are they prescribed by Uniscribe? Do different versions
 of Uniscribe differ in this respect, as I rather think?

At present, I don't know the answer. I know this is something we have
intended to support, but I don't get that behaviour on the particular
system I'm using at the moment. I will keep it in mind as an issue to
review in the next version of our Indic shaping engine.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Printing and Displaying Dependent Vowels

2004-03-25 Thread Avarangal



We are in the process of updating Tamil keyboard drivers, and one of the
requirements from educational establishments is the ability to print and
display dependent vowels without dotted circles.

Can anyone provide information on the sequences used for displaying and
printing dependent vowels as standalones?

Srivas


Re: Printing and Displaying Dependent Vowels

2004-03-25 Thread jcowan
Avarangal scripsit:

 Can anyone provide information on the sequences used for displaying
 and printing dependent vowels as standalones.

The standards-conforming way to do so is to precede the dependent vowel with a
space character (U+0020).  If this sequence is not displayed correctly, complain
to your software or font vendor, but it should be.
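
As a minimal sketch of the sequence described above, here is how such a string
could be built in Python. The helper name standalone() and its parameters are
invented for this illustration; NBSP (U+00A0) is offered as an alternative base
because it is discussed elsewhere in this thread. Whether the result displays
without a dotted circle still depends on the rendering engine and font.

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00A0"

def standalone(mark: str, base: str = SPACE) -> str:
    """Return base + mark, after checking that mark is one combining mark."""
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a single combining mark")
    return base + mark

tamil_vowel_sign_i = "\u0BBF"  # TAMIL VOWEL SIGN I, a dependent vowel
print([f"U+{ord(c):04X}" for c in standalone(tamil_vowel_sign_i)])
print([f"U+{ord(c):04X}" for c in standalone(tamil_vowel_sign_i, NBSP)])
```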

-- 
John Cowan   http://www.ccil.org/~cowan   [EMAIL PROTECTED]
You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
Clear all so!  `Tis a Jute (Finnegans Wake 16.5)



Re: Printing and Displaying Dependent Vowels

2004-03-25 Thread Chris Jacobs

- Original Message - 
From: [EMAIL PROTECTED]
To: Avarangal [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, March 25, 2004 10:33 PM
Subject: Re: Printing and Displaying Dependent Vowels


 Avarangal scripsit:

  Can anyone provide information on the sequences used for displaying
  and printing dependent vowels as standalones.

 The standards-conforming way to do so is to precede the dependent vowel
 with a space character (U+0020).

Yes.

 If this sequence is not displayed correctly, complain to your software or
 font vendor, but it should be.

Here I disagree. A font does not have to support each and every combining
sequence. If he needs fonts which support combining sequences starting with
a space character, he surely should look for those, but that is no reason to
complain about those fonts that don't.




Re: Printing and Displaying Dependent Vowels

2004-03-25 Thread Doug Ewell
Chris Jacobs chris dot jacobs at freeler dot nl wrote:

 If this sequence is not displayed correctly, complain to your
 software or font vendor, but it should be.

 Here I disagree. A font does not have to support each and every
 combining sequence. If he needs fonts which support combining
 sequences starting with a space char he surely should look for those,
 but that is no reason to complain about those fonts that don't.

What John meant was, don't complain to Unicode if this doesn't work,
because this is the standard way of doing it.

A font does not have to support everything, but it's not Unicode's fault
if one doesn't.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/