Re: Printing and Displaying Dependent Vowels
# In charts and illustrations in this standard,
# the combining nature of these marks
# is illustrated by applying them to a dotted circle,

How should such a chart be coded? The character U+25CC DOTTED CIRCLE was mentioned as a possible base character, but the online reference says: "note that the reference glyph for this character is intentionally larger than the dotted circle glyph used to indicate combining characters in this standard". IMO the correct base glyph (at least for Latin diacritics) should look like a dotted letter o.

P.A.
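For readers who want to experiment with the encodings debated in this thread, here is a minimal sketch that builds the three candidate sequences for a chart entry: the mark applied to U+25CC DOTTED CIRCLE, to U+0020 SPACE, and to U+00A0 NO-BREAK SPACE. The helper name `isolated_forms` and the two sample marks are assumptions chosen for illustration; how each sequence is actually drawn depends entirely on the font and the rendering engine.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
SPACE = "\u0020"
NBSP = "\u00A0"


def isolated_forms(mark):
    """Return candidate encodings of one combining mark shown in isolation.

    Hypothetical helper: it only concatenates code points and makes no
    claim about which sequence a given renderer will display well.
    """
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return {
        "dotted circle": DOTTED_CIRCLE + mark,
        "space": SPACE + mark,
        "no-break space": NBSP + mark,
    }


# COMBINING ACUTE ACCENT and TAMIL VOWEL SIGN AA as sample marks
for mark in ("\u0301", "\u0BBE"):
    print(unicodedata.name(mark))
    for label, seq in isolated_forms(mark).items():
        print(f"  {label}: " + " ".join(f"U+{ord(c):04X}" for c in seq))
```

The output lists code points rather than glyphs on purpose, since the visible result is exactly what is in dispute.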
Re: Printing and Displaying Dependent Vowels
On Monday, March 29, 2004 8:11 PM, John Cowan wrote:

Well, it depends on what the equivoque "combining marks" in the title of Section 7.7 means.

Ah! This is the place I did not look into! (It was not obvious to me that text about the dependent vowel marks had to be sought in the European alphabetic scripts section! But as Ken pointed out elsewhere, I should have known better: obviously, one must know the whole text of the standard, and its history, before making any assumption about the meaning of any given section: after all, this is not an ISO standard.)

Many thanks, John, for pointing this out. This is where (p. 187) the remarks about SP and NBSP appear:

# Marks as Spacing Characters. By convention, combining marks may be [...]

OK, this one says it should apply to all combining marks, and does not make any distinction between spacing and nonspacing. So the issue now appears clear (and we implementers of rendering tools now have work to do, haven't we?)

Now I will file erratum reports for all the discrepancies I have found.

Antoine
Re: Printing and Displaying Dependent Vowels
On 29/03/2004 16:28, Kenneth Whistler wrote: ... Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required. NBSP is not a fixed-width space. Yes it is, in Unicode 4.0.0. Ernest quoted from UAX #14 All other space characters have fixed width. This may be in the standard by mistake, but it is in the standard. Asmus says that this will be changed in 4.0.1, but that has not yet been released. If a statement is written in a standard, even in the introduction to a different section, that is normative. I would like to repeat my earlier proposal for a new character ISOLATED COMBINING MARK BASE. This character would have no glyph, and the general properties of a letter. Its spacing would be just as much as required for proper display of the combining mark - which would be zero for combining marks which have their own width. And after 15 years presence in the standard (or its earlier drafts) of the SP + CM recommendation, what makes you think that introduction of a *new* convention using a *new*, special purpose format control character sorta like a space only different, would lead to any better situation in actual practice? Use of such a character would *NOT* resolve the differences regarding how to display such a combination, by the way. I would be happy for NBSP to be used in this way, now that it has been clarified that this should not be considered fixed width when followed by a combining mark. I would like to see a clear recommendation (not a conformance requirement, I agree) that the sequence NBSP, non-spacing combining mark should be rendered as a spacing version of the mark with just enough space for the mark and no added glyph. 
My reason for preferring NBSP to SPACE is that it is unambiguously non-breaking and (I think) not a word boundary. But this doesn't solve the Tamil etc problem as what is needed there is a non-spacing non-breaking base character which can allow the vowel to display without the dotted circle. Perhaps ZWJ would be suitable.

...

Well, as I understand it NBSP is often expected to be a fixed-width space, and it is in many implementations. In fact I think it ought to be, whether or not this is actually specified. But there ought to be a character which is explicitly NOT fixed width to carry NSMs.

There are *two* such characters: SPACE and NBSP.

You mean, there will be in 4.0.1. The problem with SPACE is a different one.

...

The intent of the UTC and the editors has always seemed clear to me on this particular point -- and the fact that the text in question has survived 3 major reeditings of the entire standard without significant change indicates to me that this has not been a problematical part of the standard for the UTC.

Well, a text needs to be clear to its readers, not just to its authors. Obviously this text is not clear to readers, even ones as experienced as John Cowan, and so needs clarification.

So assuming that "combining mark" means "combining character" rather than "non-spacing mark" (the term does not appear in the Glossary), it seems that combining vowels should work fine with SP or NBSP.

This, however, is a textual problem which should be addressed. As it stands, Section 7.7, Combining Marks deals with various types of combining characters, including non-spacing combining marks and enclosing combining characters. It does not say anything explicit about Indic dependent vowels, in part because of its textual history.

In that case something clear and sensible needs to be added about Indic dependent vowels.
Peter Kirk continued: But it is a source of great confusion to everyone when a widely used application does something clearly different from what the standard intends, and yet claims conformance even if technically this is correct. What the standard intends is that the textual representation (encoding) of an isolated combining mark be done via the sequence SP, CM. It does not *require* or *not require* that the visual rendering of such a sequence be done with or without a dotted circle indicating the absence of an expected normal base letter. In fact, the standard itself makes widespread and explicit use of the convention to display such combinations *with* a dotted circle. Well, the standard clearly intends that the character for a is rendered with the glyph a and not the glyph b. It may not formally require this, but a system which breaks this rule, while possibly formally conformant, can hardly claim to support Unicode properly. One convention for display of isolated combining marks is to use a dotted circle. But this convention is far from universal across all writing systems. It is wrong to impose it on all systems -
Re: Printing and Displaying Dependent Vowels
On 30/03/2004 04:31, John Cowan wrote: Peter Kirk scripsit: Yes it is, in Unicode 4.0.0. Ernest quoted from UAX #14 All other space characters have fixed width. This may be in the standard by mistake, but it is in the standard. Asmus says that this will be changed in 4.0.1, but that has not yet been released. If a statement is written in a standard, even in the introduction to a different section, that is normative. This is just false. All standards known to me have both normative and informative parts; there can be no presumption that a certain text is normative merely because it is in the standard. It's true that the Unicode Standard in particular does not always clearly distinguish between normative and informative text; but in general it would surprise me if anything said in an introduction was to be taken as normative. I accept that some standards do have sections which are described as informative, and as such they are an exception to what I wrote. But as the purpose of a standard is to be normative, it is reasonable to assume, as I have, that its text is normative unless otherwise indicated. From the introductory material (not the oddly named section 3 Introduction) to UAX 14, http://www.unicode.org/reports/tr14/: /This document has been reviewed by Unicode members and other interested parties, and has been approved by the Unicode Technical Committee as a *Unicode Standard Annex*. This is a stable document and may be used as reference material or cited as a normative reference by other specifications./ The implication is that this whole document, not just parts of it, is normative. But this doesn't solve the Tamil etc problem as what is needed there is a non-spacing non-breaking base character which can allow the vowel to display without the dotted circle. Perhaps ZWJ would be suitable. The use of SP or NBSP works fine for vowels as well as other combining characters. No. 
At least it does not work for spacing combining marks unless the space of NBSP is compressed to zero width, which you said earlier was not permitted. Alphabets etc are commonly listed in columns, and those columns need to be straight. If one item in the column is preceded by a space of non-zero width, the column will not line up. I accept that formatting details like this are outside the scope of Unicode, but I do think that Unicode should not make it impossible to display spacing combining marks as part of an aligned column. In that case something clear and sensible needs to be added about Indic dependent vowels. +1 I would say that if specific products do not support dictionaries, indexes or literacy primers in Tamil, they cannot claim to support Tamil. This is extremist. Not only products, but whole standards, have rightly claimed to support English without being able to support the specialized requirements of dictionaries -- for IPA or another phonetic spelling system, for syllabication dots, for condensed typography, for the ability to set text in multiple tight columns. Indeed, it may be fairly said that even now Unicode does not provide full support for all the characters used in English lexicography. IPA and other phonetic spelling systems are not part of the English writing system, and so do not need to be supported as part of it. Tamil vowels are part of the Tamil writing system, even in isolation, and so do need to be supported by it. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
At 07:31 -0500 2004-03-30, John Cowan wrote:

Peter Kirk scripsit:

Yes it is, in Unicode 4.0.0. Ernest quoted from UAX #14 "All other space characters have fixed width". This may be in the standard by mistake, but it is in the standard. Asmus says that this will be changed in 4.0.1, but that has not yet been released. If a statement is written in a standard, even in the introduction to a different section, that is normative.

This is just false. All standards known to me have both normative and informative parts; there can be no presumption that a certain text is normative merely because it is in the standard.

John is correct here, but it is also true that "All other space characters have fixed width" is a fairly strong declaration.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Printing and Displaying Dependent Vowels
Peter Kirk scripsit: I accept that some standards do have sections which are described as informative, and as such they are an exception to what I wrote. But as the purpose of a standard is to be normative, it is reasonable to assume, as I have, that its text is normative unless otherwise indicated. History, the facts of other standards, and the explicit statements of participants in the Unicode Consortium argue otherwise. It is all very well for A.P. Herbert's justice to say that if Parliament does not mean what it says, it must say so, but the Unicode Standard is not a code of laws. /This document has been reviewed by Unicode members and other interested parties, and has been approved by the Unicode Technical Committee as a *Unicode Standard Annex*. This is a stable document and may be used as reference material or cited as a normative reference by other specifications./ The implication is that this whole document, not just parts of it, is normative. By no means. One may make a normative reference to a standard that contains informative material. The meaning of Standard A makes a normative reference to standard B is merely that it is as if the text of standard B were incorporated within standard A. For example, the XML Recommendation makes normative reference to the Unicode Standard; it is as if the former included the latter in its entirety, normative and informative parts both. An informative reference, OTOH, is one which the compiler of the referencing standard thinks will be useful in aiding interpretation; it is not implicitly incorporated in any way. No. At least it does not work for spacing combining marks unless the space of NBSP is compressed to zero width, which you said earlier was not permitted. Fair enough. Normally, SP and NBSP cannot disappear, but this is a context in which they plausibly could and should. IPA and other phonetic spelling systems are not part of the English writing system, and so do not need to be supported as part of it. 
Tamil vowels are part of the Tamil writing system, even in isolation, and so do need to be supported by it. But they form no part of texts written in Tamil, save those texts that make reference to Tamil orthography. If I am writing a book that teaches how to hand-write English, I will need to be able to represent components of graphemes, but that does not require a general mechanism for representing such components in isolation. -- John Cowan [EMAIL PROTECTED] www.reutershealth.com www.ccil.org/~cowan Consider the matter of Analytic Philosophy. Dennett and Bennett are well-known. Dennett rarely or never cites Bennett, so Bennett rarely or never cites Dennett. There is also one Dummett. By their works shall ye know them. However, just as no trinities have fourth persons (Zeppo Marx notwithstanding), Bummett is hardly known by his works. Indeed, Bummett does not exist. It is part of the function of this and other e-mail messages, therefore, to do what they can to create him.
Re: Printing and Displaying Dependent Vowels
At 04:28 PM 3/29/2004, Kenneth Whistler wrote:

I will say again as I have said before - but the above (and what I snipped) is extra evidence for it - that what is broke ... is the rule that the isolated (generally spacing) form of a combining mark should be formed by SPACE or NBSP followed by the combining mark.

This has been the *intent* of the standard since its inception in 1989.

There are many good reasons for not using SPACE for this, including default behaviour like inserting line breaks immediately after SPACE.

Nope. UAX #14 specifies the following regarding SPACE followed by combining marks: "If U+0020 SPACE is used as a base character, it is treated as AL instead of SP."

This is an unfortunate typo in UAX #14. The correct statement is: "If U+0020 SPACE is used as a base character, it is treated as ID instead of SP." See the description of these issues in the rules section of the UAX, which is quite explicit:

LB 7a In all of the following rules, if a space is the base character for a combining mark, the space is changed to type ID. In other words, break before SP CM* in the same cases as one would break before an ID.

Treat SP CM* as if it were ID

As stated in [Unicode], Section 7.7 Combining Marks, combining characters are shown in isolation by applying them to either U+0020 SPACE (SP) or U+00A0 NO-BREAK SPACE (NBSP). The visual appearance is the same, but the line breaking result is different. Correspondingly, if there is no base, or if the base character is SP, CM* or SP CM* behave like ID.

This means that a combining character sequence of this type is treated as a unit for the purposes of line breaking, and this overrides the behavior otherwise of SPACE to be treated as a line break opportunity.

There's never a line break opportunity between a SPACE and a combining mark, but since SP is treated like an ID (ideographic line breaking class), there are break opportunities *before* the SP that will not be there if an NBSP is used.

Of course UAX #14 only spells out default behavior, but then default behaviour is what was claimed just above.

Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required.

NBSP is not a fixed-width space.

Correct. Somewhere in the standard, we should point out that using a space/NBSP as base character does not require these spaces to be at the same widths as elsewhere in the text, but that they can (and should) be adjusted to best serve this 'base character' function.

A./
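The effect of the rule quoted in this message (a space that carries combining marks behaves like ID, while marks after any other base take the class of that base) can be sketched as a toy resolver. This is only an illustration under loud assumptions: the four-way classification in `toy_class` is invented for the example and is far coarser than the real UAX #14 property data, Python's standard library does not expose line-break classes at all, and later revisions of UAX #14 may treat these sequences differently.

```python
import unicodedata


def toy_class(ch):
    """Very rough stand-in for a line-break class (assumption, not UAX #14 data)."""
    if ch == "\u0020":
        return "SP"
    if ch == "\u00A0":
        return "GL"  # NO-BREAK SPACE is non-breaking "glue"
    if unicodedata.category(ch) in ("Mn", "Mc", "Me"):
        return "CM"
    return "AL"  # everything else is lumped together as ordinary alphabetic


def resolve(text):
    """Group each base with its combining marks and resolve the unit's class.

    SP CM* resolves to ID, a mark with no base resolves to ID,
    and X CM* keeps the class of X.
    """
    units = []
    for ch in text:
        cls = toy_class(ch)
        if cls == "CM" and units:
            base_text, base_cls = units[-1]
            if base_cls == "SP":
                base_cls = "ID"  # the space stops being a break opportunity
            units[-1] = (base_text + ch, base_cls)
        elif cls == "CM":
            units.append((ch, "ID"))  # no base at all
        else:
            units.append((ch, cls))
    return units


for sample in ("a \u0301b", "a\u00A0\u0301b"):
    print([(" ".join(f"U+{ord(c):04X}" for c in t), c_) for t, c_ in resolve(sample)])
```

With SPACE as the base the unit comes out as ID, so a break before it remains possible; with NBSP it comes out as GL, which matches the remark above that the break opportunities before SP "will not be there if an NBSP is used".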
Fixed Width Spaces (was: Printing and Displaying Dependent Vowels)
[Original Message]
From: Asmus Freytag [EMAIL PROTECTED]

At 12:19 PM 3/29/2004, Ernest Cline wrote:

UAX #14 makes a rather definitive statement on this issue, albeit in an obscure place, in Section 3: Introduction.

4.0.1 will amend that section to correct the wrong impression that NBSP is fixed width and to clarify that this statement is not intended to cover any specialized cases, but just ordinary typographical conventions:

I'm sorry if the placement and context of the text were not enough to guide the reader. Note that the 'obscure place' was in the introduction (!) of the UAX, where it was a mere note on a subject not actually covered by the UAX (i.e. line layout) that nevertheless forms the context in which linebreaking happens.

True, but it was the only guidance on the subject that is present in Unicode 4.0.0, and there do exist widely used applications that treat NBSP as a fixed width space.

Still, there is a need for a fixed width space with a width equal to the unjustified width of a normal space. With NBSP being ruled out for that job, that leaves FIGURE SPACE, MMSP, and FOUR-PER-EM SPACE as the closest alternatives, but none of them are guaranteed to be exactly that width, even if they are available.

I suppose suspending justification for just one space via a higher-level protocol could work, except I'm not aware of any such protocol that works at a fine-enough grain to do that. Also, one could by that argument also argue that many of the current fixed width spaces could be handled by higher level protocol as well.

Perhaps a possible U+2064 NONJUSTIFYING SPACE would make sense with line breaking class BA like most of the other fixed width spaces. (I would have preferred proposing U+205E to place it adjoining MMSP, but that code point is already in the pipeline.)
Re: Fixed Width Spaces (was: Printing and Displaying Dependent Vowels)
Peter Kirk scripsit:

In each of these cases FIGURE SPACE may be appropriate.

Are any of these alternative spaces non-breaking? That is also a requirement in my last two applications.

You can make anything non-breaking by putting ZWNBSP on both sides of it.

--
John Cowan www.ccil.org/~cowan www.reutershealth.com [EMAIL PROTECTED]
All isms should be wasms. --Abbie
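The wrapping trick John describes can be sketched as a one-line helper. The name `make_non_breaking` is hypothetical. The message names ZWNBSP (U+FEFF); note that since Unicode 3.2 the Standard recommends U+2060 WORD JOINER for this job and keeps U+FEFF mainly for its byte order mark role, so the sketch defaults to WORD JOINER and accepts U+FEFF as an option. Whether a particular line breaker honours either character is implementation dependent.

```python
ZWNBSP = "\uFEFF"       # ZERO WIDTH NO-BREAK SPACE, the character named in the message
WORD_JOINER = "\u2060"  # recommended replacement for that function since Unicode 3.2


def make_non_breaking(text, joiner=WORD_JOINER):
    """Surround text with a joiner on both sides (hypothetical helper)."""
    return f"{joiner}{text}{joiner}"


FOUR_PER_EM_SPACE = "\u2005"  # a fixed-width space that normally allows a break after it

wrapped = make_non_breaking(FOUR_PER_EM_SPACE)
print([f"U+{ord(c):04X}" for c in wrapped])
# prints ['U+2060', 'U+2005', 'U+2060']

legacy = make_non_breaking(FOUR_PER_EM_SPACE, joiner=ZWNBSP)
print([f"U+{ord(c):04X}" for c in legacy])
# prints ['U+FEFF', 'U+2005', 'U+FEFF']
```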
Re: Fixed Width Spaces (was: Printing and Displaying Dependent Vowels)
Asmus Freytag wrote:

and I don't know whether FOUR-PER-EM is the width of a typical space.

FOUR-PER-EM is 1/4 of an em, always. A typical space, however, varies in width depending on the font.

~fantasai
Re: Printing and Displaying Dependent Vowels
Hi James, All,

If this is treated as a Unicode issue rather than a display issue, then one solution would be for someone to propose a new character, (back on topic a little bit) COMBINING DOTTED CIRCLE FOR COMBINING MARKS. Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine could be changed to insert this new character. Then, these updated rendering engines could be distributed and font developers could add the new characters to fonts and distribute updated fonts. This might just take a while, but it wouldn't be too hard to find examples of the character in actual text use to accompany the proposal... If it ain't broke, don't fix it. So, is it 'broke'?

Your argument about not spotting errors when SPACE+COMBINING SOMETHING gets rendered without the dotted circle looks convincing, but lacks consistency: the SPACE character can be used to transform the combining marks from the U+0300..U+036F range into spacing characters.

But that aside, it would be better not to use SPACE for this purpose, for the reasons you mentioned. So any Unicode codepoint sequence which turns combining marks into spacing glyphs would be a solution (only the first answers to Srivas' question indicated that SPACE is to be used according to Unicode).

One may be able to conjure up a new Unicode codepoint ISOLATED COMBINING MARK for this purpose, but amongst all the spaces and dubious characters at U+20?? something existing should be found adequate.

Regards,
Peter Jacobi
Re: Printing and Displaying Dependent Vowels
On Sunday, March 28, 2004 12:03 AM, James Kass wrote:

So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font.

If you do so, you will end up defeating the normal behaviour, which is to draw a circle when someone makes an error while typing. Depending on the intent of the font, it may or may not be a good idea.

Since Avarangal now seems to be under a non-disclosure agreement with Microsoft, we do not know for sure what his intent is. We also do not know whether there are variations between releases (I hear there are, but do not feel it is my job to investigate), or generally what the real specifications are in this area (the official one being that the sequence SP+ZWJ+some_mark renders without displaying the circle, but we know it is not always enforced).

In the general case of a font intended for general use, and if the rendering without the circle is intended in special cases like drawing a keyboard layout for reference, I still believe it is better to have the circle and resort to special manipulations, like SP+ZWJ+vowel or drawing directly with ExtTextOut(ETO_GLYPH_INDEX), in order to draw the keyboard layout. If nothing else, complicating a font to cure a defect in one version of one (the) rendering engine does not seem to me an engineering solution. (I have since read your other post, which rather seems to agree with me.)

Another approach is to simply use a non-OpenType Unicode TrueType font for Tamil. The dotted circles don't seem to ever appear unless the font-in-use has OpenType tables covering the script-in-use.

Right. (The only remaining problem will then be the overhang and centering).

Antoine
Re: Printing and Displaying Dependent Vowels
Antoine Leca scripsit:

In the general case of a font intended for general use, and if the rendering without the circle is intended in special cases like drawing a keyboard layout for reference, I still believe it is better to have the circle and resort to special manipulations, like SP+ZWJ+vowel or drawing directly with ExtTextOut(ETO_GLYPH_INDEX), in order to draw the keyboard layout.

The bottom line is that SP+vowel and NBSP+vowel are prescribed by the Unicode Standard, and if they don't work (at least the former; for the latter, one can weasel out by claiming conformity with earlier versions of the Standard) the system is broken.

--
John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] http://www.reutershealth.com
A rabbi whose congregation doesn't want to drive him out of town isn't a rabbi, and a rabbi who lets them do it isn't a man. --Jewish saying
Re: Printing and Displaying Dependent Vowels
On 27/03/2004 17:17, John Hudson wrote:

[EMAIL PROTECTED] wrote:

So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font. This spacing glyph should be a no-contour glyph, perhaps with the same advance width as U+0020. I've not tried this, but it might just work.

It should work: Uniscribe inserts the U+25CC glyph that is in the font, so this could be something other than an actual dotted circle. Another option would be to map the dotted circle to a non-contour spacing glyph in one of the discretionary OpenType Layout features such as 'salt', which would allow users of apps supporting that feature (currently only InDesign ME, so far as I know) to choose whether or not to display the circle.

John Hudson

I don't like the look of this one. It might work as a kludge, but surely we should not encourage kludges in which the glyph (or non-glyph in this case) for one character, SPACE, is used with the code point of another character, U+25CC. This would cause considerable confusion for those deliberately trying to insert U+25CC, perhaps because they want to display the combining mark with a dotted circle.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
On 28/03/2004 18:35, [EMAIL PROTECTED] wrote: ... People generating texts for educational purposes will always have special needs. So, they'll always need to make special effort to get special effects. Workarounds concerning the original question have already been suggested. If this is treated as a Unicode issue rather than a display issue, then one solution would be for someone to propose a new character, (back on topic a little bit) COMBINING DOTTED CIRCLE FOR COMBINING MARKS. Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine could be changed to insert this new character. Then, these updated rendering engines could be distributed and font developers could add the new characters to fonts and distribute updated fonts. This might just take a while, but it wouldn't be too hard to find examples of the character in actual text use to accompany the proposal... If it ain't broke, don't fix it. So, is it 'broke'? Best regards, James Kass I will say again as I have said before - but the above (and what I snipped) is extra evidence for it - that what is broke (in the old or dialect sense broken rather than the modern sense without money) is the rule that the isolated (generally spacing) form of a combining mark should be formed by SPACE or NBSP followed by the combining mark. There are many good reasons for not using SPACE for this, including default behaviour like inserting line breaks immediately after SPACE. The good additional reason James has given is that SPACE followed by the combining mark is often a mistake (and so it is sensible to add the dotted circle), but there is a need in certain kinds of texts to display isolated combining marks. Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. 
But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required. I would like to repeat my earlier proposal for a new character ISOLATED COMBINING MARK BASE. This character would have no glyph, and the general properties of a letter. Its spacing would be just as much as required for proper display of the combining mark - which would be zero for combining marks which have their own width. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
On 29/03/2004 04:14, John Cowan wrote:

Antoine Leca scripsit:

In the general case of a font intended for general use, and if the rendering without the circle is intended in special cases like drawing a keyboard layout for reference, I still believe it is better to have the circle and resort to special manipulations, like SP+ZWJ+vowel or drawing directly with ExtTextOut(ETO_GLYPH_INDEX), in order to draw the keyboard layout.

The bottom line is that SP+vowel and NBSP+vowel are prescribed by the Unicode Standard, and if they don't work (at least the former; for the latter, one can weasel out by claiming conformity with earlier versions of the Standard) the system is broken.

I agree that this implies that the system is not conformant with the standard. But that could be because the standard is broken. So perhaps it is the standard that should be fixed, by specifying a new preferred sequence for isolated combining marks. I realise that for backward compatibility reasons the old encoding cannot be made illegal. But it can be deprecated, and a note can be added that this sequence may not always be displayed as preferred.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
On Monday, March 29, 2004 2:14 PM, John Cowan wrote:

The bottom line is that SP+vowel and NBSP+vowel are prescribed by the Unicode Standard,

I am sorry John, I must have missed a post of yours. I asked you where this is written, and did not find any answer; unless one considers that all marks, including spacing combining vowels, are (European) diacritics.

I did find some things in UAX #29 about grapheme clusters (as indicated by Philippe), but also found that Mc characters do not seem to be covered (Mn, on the other hand, seems to be). I now understand that any base followed by Grapheme_Extend characters is to be seen as a cluster. I found Grapheme_Extend defined as Other_Grapheme_Extend + Me + Mn in the UCD. (But I was not able to find this in the standard itself. Never mind, I must have missed something obvious.)

I am sorry to insist on these issues. I have real difficulty understanding where the specifications are, when chapter 2.10 of the Unicode book says one thing while dealing directly with the issue, while another document that is supposed to be just as standard says otherwise, or rather is to be interpreted otherwise, and still neither of them matches exactly what people in this forum are expecting. (And furthermore, when I asked about issues of conformance, the earlier answer was "it does not matter", or "it should not matter", or "it depends on what you are doing", etc.; in a word, ways to avoid answering the original question.)

if they don't work [...] the system is broken.

As James eloquently showed earlier today, I am not that sure we want things this way.

The text in The Unicode Standard explicitly refers to the case of the European diacritics. There (well, here!), because of typing habits (use of so-called dead keys), users expect that the combination of a diacritic and a space is rendered as a spacing clone of the diacritic. I read the 2.10 snippet as guarding this convention. (Of course, this is my interpretation, I can very easily be wrong.)

On the other hand, typing habits in other parts of the world are not that entrenched. After all, dead keys have been with us for more than a century, while keyboards for combining characters that may reorder before the preceding characters are only twenty years old. Furthermore, the custom is to provide a disambiguating signal, such as a bell (Thai) or a dotted circle, when a vowel is mistyped. Evidently, Microsoft followed this when they designed Uniscribe/Indic OpenType.

What you are saying is that when a mistyped vowel follows a space character, it should appear hanging from nothing, while the situation will be different if it is typed after a virama, or another vowel, or some other mark. As I said, I am not sure this is what we really want.

Antoine
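The Mn/Mc distinction Antoine raises can be inspected directly. The sketch below groups the combining marks of a code point range by General_Category using Python's `unicodedata` module. Two assumptions to keep in mind: the module reflects whatever Unicode version ships with the interpreter (not Unicode 4.0), and it does not expose Other_Grapheme_Extend, so this only approximates Grapheme_Extend by its Mn and Me parts. The function name `classify_marks` is made up for the example.

```python
import unicodedata


def classify_marks(first, last):
    """Group the combining marks in [first, last] by General_Category."""
    groups = {"Mn": [], "Mc": [], "Me": []}
    for cp in range(first, last + 1):
        cat = unicodedata.category(chr(cp))
        if cat in groups:
            groups[cat].append(cp)
    return groups


# Tamil block: nonspacing marks (Mn) versus spacing combining marks (Mc)
tamil = classify_marks(0x0B80, 0x0BFF)
for cat, cps in tamil.items():
    for cp in cps:
        print(cat, f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
```

Most Tamil dependent vowel signs land in the Mc group, which is why a rule phrased only in terms of nonspacing marks leaves their treatment open.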
RE: Printing and Displaying Dependent Vowels
The bottom line is that SP+vowel and NBSP+vowel are prescribed by the Unicode Standard, and if they don't work (at least the former; for the latter, one can weasel out by claiming conformity with earlier versions of the Standard) the system is broken. Or the system is conformant but doesn't support everything in the standard. Peter Constable
Re: Printing and Displaying Dependent Vowels
Peter Kirk scripsit: Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required. You don't actually say so, but you give me the impression that you think NBSP is a fixed-width space. It isn't; it can assume any width greater than zero, just as SPACE can; in particular, when used before a NSM, I would expect it to have the same width as the NSM. I would like to repeat my earlier proposal for a new character ISOLATED COMBINING MARK BASE. This character would have no glyph, and the general properties of a letter. Its spacing would be just as much as required for proper display of the combining mark - which would be zero for combining marks which have their own width. Except for not being letters, SP and NBSP have, or ought to have, exactly this behavior. -- John Cowan [EMAIL PROTECTED] Well, I'm back. --Sam
Re: Printing and Displaying Dependent Vowels
On 29/03/2004 06:35, Peter Constable wrote: The bottom line is that SP+vowel and NBSP+vowel are prescribed by the Unicode Standard, and if they don't work (at least the former; for the latter, one can weasel out by claiming conformity with earlier versions of the Standard) the system is broken. Or the system is conformant but doesn't support everything in the standard. Peter Constable You can't get away with it that easily. If the standard specifies that space, combining mark should be displayed as an isolated combining mark, then it would be conformant for a partial implementation to display this sequence as nothing or as an illegal sequence. But if the system attempts to display the sequence in a meaningful manner, it must do so according to the standard, i.e. not as dotted circle plus combining mark. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
On 29/03/2004 06:56, John Cowan wrote: Peter Kirk scripsit: Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required. You don't actually say so, but you give me the impression that you think NBSP is a fixed-width space. It isn't; it can assume any width greater than zero, just as SPACE can; in particular, when used before a NSM, I would expect it to have the same width as the NSM. Well, as I understand it NBSP is often expected to be a fixed-width space, and it is in many implementations. In fact I think it ought to be, whether or not this is actually specified. But there ought to be a character which is explicitly NOT fixed width to carry NSMs. Also you do say that NBSP must have a width greater than zero, but for some combining marks (those which are not non-spacing, and arguably even some which are) this base character should have zero width. I would like to repeat my earlier proposal for a new character ISOLATED COMBINING MARK BASE. This character would have no glyph, and the general properties of a letter. Its spacing would be just as much as required for proper display of the combining mark - which would be zero for combining marks which have their own width. Except for not being letters, SP and NBSP have, or ought to have, exactly this behavior. Well, there are several differences. An obvious one is that a line break is permitted after SP (but before the combining mark?) And they are different for a number of algorithms including those for text boundaries and bidi. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: Printing and Displaying Dependent Vowels
You can't get away with it that easily. If the standard specifies that space, combining mark should be displayed as an isolated combining mark, then it would be conformant for a partial implementation to display this sequence as nothing or as an illegal sequence. But if the system attempts to display the sequence in a meaningful manner, it must do so according to the standard, i.e. not as dotted circle plus combining mark. Are you saying that you'd like to see apps display text according to the correct behaviour for a given script, or not at all? I don't think that would be particularly helpful. And I think it's a good thing the conformance requirements don't attempt to define what not supporting such-and-such characters means at this level of detail. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: Printing and Displaying Dependent Vowels
Antoine Leca scripsit: I am sorry John, I must have missed a post of yours. I asked you where it is written, and did not find any answer to this; unless someone considers that all marks, including spacing combining vowels, are (European) diacritics. Well, it depends on what the equivoque combining marks in the title of Section 7.7 means. This is where (p. 187) the remarks about SP and NBSP appear: # Marks as Spacing Characters. By convention, combining marks may be exhibited # in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK # SPACE. This approach might be taken, for example, when referring to the # diacritical mark itself as a mark, rather than using it in its normal way # in text. The use of U+0020 SPACE versus U+00A0 NO-BREAK SPACE affects line # breaking behavior. # # In charts and illustrations in this standard, the combining nature of these # marks is illustrated by applying them to a dotted circle, as shown in the # examples throughout this standard. # # The Unicode Standard separately encodes clones of many common European # diacritical marks as spacing characters. These related characters are # cross-referenced in the character names list. So assuming that combining mark means combining character rather than non-spacing mark (the term does not appear in the Glossary), it seems that combining vowels should work fine with SP or NBSP. The reference to European diacriticals plainly applies only to the various spacing diacriticals, some of which are grandfathered in by ASCII or Latin-1. -- John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com In computer science, we stand on each other's feet. --Brian K. Reid
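The convention quoted from Section 7.7 can be sketched in a few lines of Python. This only builds the character sequences; how a given renderer displays them is exactly what the thread is arguing about. The function name `isolated` and its `no_break` parameter are hypothetical, chosen for illustration.

```python
import unicodedata

SPACE = "\u0020"  # U+0020 SPACE
NBSP = "\u00a0"   # U+00A0 NO-BREAK SPACE


def isolated(mark: str, no_break: bool = False) -> str:
    """Build the SP + mark or NBSP + mark sequence for one combining mark.

    Accepts any character whose General_Category starts with "M"
    (Mn, Mc or Me), following the reading that "combining mark"
    means "combining character".
    """
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a single combining mark")
    return (NBSP if no_break else SPACE) + mark


if __name__ == "__main__":
    for seq in (isolated("\u0327"), isolated("\u0bc6", no_break=True)):
        print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Choosing `no_break=True` only changes the base character; per the quoted text, the difference between the two bases lies in line breaking behaviour.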
Re: Printing and Displaying Dependent Vowels
On 29/03/2004 08:42, Peter Constable wrote: You can't get away with it that easily. If the standard specifies that space, combining mark should be displayed as an isolated combining mark, then it would be conformant for a partial implementation to display this sequence as nothing or as an illegal sequence. But if the system attempts to display the sequence in a meaningful manner, it must do so according to the standard, i.e. not as dotted circle plus combining mark. Are you saying that you'd like to see apps display text according to the correct behaviour for a given script, or not at all? I would prefer to see the text displayed according to the standard. In this particular case, I would prefer to see the standard fixed rather than the rendering system. But it is a source of great confusion to everyone when a widely used application does something clearly different from what the standard intends, and yet claims conformance even if technically this is correct. There is clearly a widespread need to display a variety of combining marks in isolation, and with no dotted circle. Unicode defines an encoding for this. Uniscribe apparently does not support this encoding. There is something wrong here. It seems, from what Srivas (Avarangal) wrote, to be part of the requirement for correct display of Tamil, and perhaps other Indic languages, to be able to display isolated forms of such characters as U+0BC6. If Uniscribe does not support this, even if it is technically Unicode conformant, Microsoft cannot claim to support Tamil and other languages. I don't think that would be particularly helpful. And I think it's a good thing the conformance requirements don't attempt to define what not supporting such-and-such characters means at this level of detail. I agree, I think. But a claim to support particular scripts or languages surely implies that all characters in that script (or at least in its modern form) are supported. 
That is not perhaps a Unicode requirement, but at least in the UK a failure here might be a breach of laws on truthful advertising and description of products. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
John Cowan quoted, Well, it depends on what the equivoque combining marks in the title of Section 7.7 means. This is where (p. 187) the remarks about SP and NBSP appear: # Marks as Spacing Characters. By convention, combining marks may be exhibited # in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK # SPACE. This approach might be taken, for example, when referring to the # diacritical mark itself as a mark, rather than using it in its normal way # in text. Note the use of may and might in the quoted text rather than must. The above could be interpreted in part as '... combining marks may be exhibited in (apparent) isolation by applying them to U+0020 SPACE, or they may not.' Such an interpretation might lead people to decide that the approach is up to the renderer. Semantics aside, if the default display appearance of a combining mark in isolation on a certain system is the mark on a dotted circle, then that system should be considered conformant when it displays space+mark as dotted_circle+mark. An observation, FWIW: on the system here, combiners in Indic scripts get the dotted circle, but combining diacritics from the (mostly) Western combining diacritics range don't. Space + U+0327 displays a stand-alone cedilla here; no dotted circle. Best regards, James Kass
Re: Printing and Displaying Dependent Vowels
On 29/03/2004 10:11, [EMAIL PROTECTED] wrote: Antoine Leca scripsit: I am sorry John, I must have missed a post of yours. I asked you where it is written, and did not find any answer to this; unless someone considers that all marks, including spacing combining vowels, are (European) diacritics. Well, it depends on what the equivoque combining marks in the title of Section 7.7 means. This is where (p. 187) the remarks about SP and NBSP appear: # Marks as Spacing Characters. By convention, combining marks may be exhibited # in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK # SPACE. This approach might be taken, for example, when referring to the # diacritical mark itself as a mark, rather than using it in its normal way # in text. The use of U+0020 SPACE versus U+00A0 NO-BREAK SPACE affects line # breaking behavior. These words are equivocal in more ways than one. What does By convention... may be exhibited mean? Does this mean that the sequence SPACE, mark should be rendered as an isolated mark, or does it mean that optionally it may be? Is the convention one which is optional for those encoding texts, or optional for implementers? Are these words intended to be in any way prescriptive, or are they intended merely to be descriptive of what some people have chosen to do? If This approach might be taken, for example, when referring to the diacritical mark itself as a mark, what other approach might be taken as an alternative? The language is altogether far too loose for a standard. The result is the current confusion, according to which people are trying to encode texts according to what they think Unicode expects them to do, and finding that the rendering engines they use do not provide either this or any other way to display what they want to display, and yet claim to conform to Unicode. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
[Original Message] From: Peter Kirk [EMAIL PROTECTED] On 29/03/2004 06:56, John Cowan wrote: Peter Kirk scripsit: Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required. You don't actually say so, but you give me the impression that you think NBSP is a fixed-width space. It isn't; it can assume any width greater than zero, just as SPACE can; in particular, when used before a NSM, I would expect it to have the same width as the NSM. Well, as I understand it NBSP is often expected to be a fixed-width space, and it is in many implementations. In fact I think it ought to be, whether or not this is actually specified. But there ought to be a character which is explicitly NOT fixed width to carry NSMs. Also you do say that NBSP must have a width greater than zero, but for some combining marks (those which are not non-spacing, and arguably even some which are) this base character should have zero width. UAX #14 makes a rather definitive statement on this issue, albeit in an obscure place, in Section 3: Introduction. When expanding or compressing inter-word space, only the space marked by U+0020 SPACE and U+3000 IDEOGRAPHIC SPACE are normally subject to compression, and only spaces marked by U+0020 SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters have fixed width. While one can argue as to whether this has anything to do with the effect on the width of NBSP with a combining character following it or not, it is clear that one should not assume that NBSP is treated exactly the same as SPACE except for not breaking a line.
Indeed, I would prefer to see NBSP treated as a fixed-width character that would only be affected by letter spacing in all contexts, including when it has an attached combining character. The idea of an explicit character to be used as a combining character base has merit in my opinion, but only if an acceptable standardization of the behavior of combining characters with some other character such as SPACE cannot be achieved so that it would always be expected to produce an isolated combining character. (except when in an intentional show the codes mode)
Re: Printing and Displaying Dependent Vowels
Peter Kirk said: I will say again as I have said before - but the above (and what I snipped) is extra evidence for it - that what is broke ... is the rule that the isolated (generally spacing) form of a combining mark should be formed by SPACE or NBSP followed by the combining mark. This has been the *intent* of the standard since its inception in 1989. There are many good reasons for not using SPACE for this, including default behaviour like inserting line breaks immediately after SPACE. Nope. UAX #14 specifies the following regarding SPACE followed by combining marks: If U+0020 SPACE is used as a base character, it is treated as AL instead of SP. This means that a combining character sequence of this type is treated as a unit for the purposes of line breaking, and this overrides the behavior otherwise of SPACE to be treated as a line break opportunity. Of course UAX #14 only spells out default behavior, but then default behaviour is what was claimed just above. Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required. NBSP is not a fixed-width space. I would like to repeat my earlier proposal for a new character ISOLATED COMBINING MARK BASE. This character would have no glyph, and the general properties of a letter. Its spacing would be just as much as required for proper display of the combining mark - which would be zero for combining marks which have their own width. And after 15 years presence in the standard (or its earlier drafts) of the SP + CM recommendation, what makes you think that introduction of a *new* convention using a *new*, special purpose format control character sorta like a space only different, would lead to any better situation in actual practice? 
Use of such a character would *NOT* resolve the differences regarding how to display such a combination, by the way. I realise that for backward compatibility reasons the old encoding cannot be made illegal. But it can be deprecated, and a note can be added that this sequence may not always be displayed as preferred. This is a recipe for prolonging the confusion and inconsistency in implementations of this feature. You can't get away with it that easily. If the standard specifies that space, combining mark should be displayed as an isolated combining mark, then it would be conformant for a partial implementation to display this sequence as nothing or as an illegal sequence. But if the system attempts to display the sequence in a meaningful manner, it must do so according to the standard, i.e. not as dotted circle plus combining mark. The standard does not *require* this rendering or anything else. For the most part, the Unicode Standard is *NOT* a text rendering standard -- it is a character encoding standard. All kinds of recommendations are put in regarding how to handle one kind or another of rendering problem, precisely so that every implementer doesn't start from scratch to reinvent the wheel here, and so as to provide some basis for people to represent the same text content with the same spellings for complex scripts. There are reasons why such recommendations are found in Chapters 7 (and 5 and 2) of the standard, and are not nailed down with conformance clauses in Chapter 3. The UTC has, over the years, not found it appropriate to try to make normative requirements on the details of text display, except insofar (as in the Bidirectional Algorithm) as they have a direct bearing on the interpretation of the logical content of the text itself. Well, as I understand it NBSP is often expected to be a fixed-width space, and it is in many implementations. In fact I think it ought to be, whether or not this is actually specified. 
But there ought to be a character which is explicitly NOT fixed width to carry NSMs. There are *two* such characters: SPACE and NBSP. John Cowan noted: Well, it depends on what the equivoque combining marks in the title of Section 7.7 means. and then quoted the relevant text from p. 187. By the way, the first part of that text has survived almost verbatim from Unicode 1.0, where it was printed on p. 40 in what was then Chapter 3, Character Blocks. It was written there as part of the section Generic Diacritical Marks U+0300 -- U+036F, as that was the most obviously a propos point in the text at the time. The text of the standard has since been morphed, restructured, and extensively added to, but some of its quirks result from the fact that the text has a *history*, and it isn't completely rewritten every time a new book is published. The intent of the UTC and the editors has always seemed clear to me on this particular point -- and the fact that the text in question has
Re: Printing and Displaying Dependent Vowels
At 12:19 PM 3/29/2004, Ernest Cline wrote: [Original Message] From: Peter Kirk [EMAIL PROTECTED] On 29/03/2004 06:56, John Cowan wrote: Peter Kirk scripsit: Using NBSP rather than SPACE has several advantages, and has long been specified in Unicode, although not widely implemented. It is less likely to occur accidentally. But it has disadvantages, especially that it will always be a spacing character, whereas for display of isolated Indic vowels no extra spacing is required. You don't actually say so, but you give me the impression that you think NBSP is a fixed-width space. It isn't; it can assume any width greater than zero, just as SPACE can; in particular, when used before a NSM, I would expect it to have the same width as the NSM. Well, as I understand it NBSP is often expected to be a fixed-width space, and it is in many implementations. In fact I think it ought to be, whether or not this is actually specified. But there ought to be a character which is explicitly NOT fixed width to carry NSMs. Also you do say that NBSP must have a width greater than zero, but for some combining marks (those which are not non-spacing, and arguably even some which are) this base character should have zero width. UAX #14 makes a rather definitive statement on this issue, albeit in an obscure place, in Section 3: Introduction. 4.0.1 will amend that section to correct the wrong impression that NBSP is fixed width and to clarify that this statement is not intended to cover any specialized cases, but just ordinary typographical conventions: When expanding or compressing inter-word space according to common typographical practice, only the spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and U+3000 IDEOGRAPHIC SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. 
When expanding or compressing inter-character space the presence of U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER are always ignored. I'm sorry if the placement and context of the text were not enough to guide the reader. Note that the 'obscure place' was in the introduction (!) of the UAX, where it was a mere note on a subject not actually covered by the UAX (i.e. line layout) that nevertheless forms the context in which linebreaking happens. Next, people will extract normative statements from the book cover. ;-0 Now that this is settled, all can go on discussing the main point: While one can argue as to whether this has anything to do with the effect on the width of NBSP with a combining character following it or not, it is clear that one should not assume that NBSP is treated exactly the same as SPACE except for not breaking a line. Indeed, I would prefer to see NBSP treated as a fixed-width character that would only be affected by letter spacing in all contexts, including when it has an attached combining character. The idea of an explicit character to be used as a combining character base has merit in my opinion, but only if an acceptable standardization of the behavior of combining characters with some other character such as SPACE cannot be achieved so that it would always be expected to produce an isolated combining character. (except when in an intentional show the codes mode)
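The amended UAX #14 wording quoted in this message amounts to a small lookup table, which the following Python sketch encodes. It is only an illustration of the quoted sentence, not part of any line layout implementation; the set names and the function `justification_behaviour` are hypothetical.

```python
import unicodedata

SPACE = "\u0020"        # U+0020 SPACE
NBSP = "\u00a0"         # U+00A0 NO-BREAK SPACE
THIN_SPACE = "\u2009"   # U+2009 THIN SPACE
IDEO_SPACE = "\u3000"   # U+3000 IDEOGRAPHIC SPACE

# Sets transcribed from the amended wording quoted above.
COMPRESSIBLE = {SPACE, NBSP, IDEO_SPACE}
EXPANDABLE = {SPACE, NBSP}
OCCASIONALLY_EXPANDABLE = {THIN_SPACE}


def justification_behaviour(ch: str) -> dict:
    """Report how a space separator (General_Category Zs) behaves in justification."""
    if unicodedata.category(ch) != "Zs":
        raise ValueError("expected a space separator (Zs)")
    return {
        "compress": ch in COMPRESSIBLE,
        "expand": ch in EXPANDABLE,
        "expand_occasionally": ch in OCCASIONALLY_EXPANDABLE,
    }


if __name__ == "__main__":
    for ch in (SPACE, NBSP, THIN_SPACE, IDEO_SPACE, "\u2003"):
        print(f"U+{ord(ch):04X}", justification_behaviour(ch))
```

A space that is in none of the three sets, such as U+2003 EM SPACE, falls under "all other space characters normally have fixed width".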
Re: Printing and Displaying Dependent Vowels
Unicode, rightly or wrongly, decided to implement partial grammar at the encoding level. Hence, having the possible solutions to this problem defined by the UC, rather than leaving them to others to get tangled in, may be the right way to go.
1/ Linear dependent with dotted circle - as stand-alone
2/ Linear dependent without dotted circle - as stand-alone
3/ Repositioned dependent with dotted circle - as stand-alone
4/ Repositioned dependent without dotted circle - as stand-alone
I think the above four need to be defined by the UC. Probably no. 1 above (or is it no. 3?) is already defined and we can build on this. Srivas - Original Message - From: Peter Jacobi [EMAIL PROTECTED] To: Avarangal [EMAIL PROTECTED]; Peter Constable [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Saturday, March 27, 2004 9:24 PM Subject: RE: Printing and Displaying Dependent Vowels Hi Srivas, Peter Kirk, Peter Constable, List Members Peter Constable wrote: Peter Kirk wrote: Are these dependent on the font, as some have suggested, or are they prescribed by Uniscribe? Do different versions of Uniscribe differ in this respect, as I rather think? At present, I don't know the answer. I know this is something we have intended to support, but I don't get that behaviour on the particular system I'm using at the moment. I will keep it in mind as an issue to review in the next version of our Indic shaping engine. With the help of members of the [EMAIL PROTECTED] mailing list, I can offer some empirical evidence on this whodunnit: Using the Linux version of Abiword, which uses the Pango renderer, both the Code 2000 and the MS Latha font display the vowel signs without the unwanted dotted circle. NBSP and normal SPACE give identical results. For Code 2000 only, the dotted circle or a similar ersatz glyph (the screenshot is not that clear) is drawn for the two-part vowel signs U+0BCA, U+0BCB and U+0BCC between the two parts.
Best Regards, Peter Jacobi
RE: Printing and Displaying Dependent Vowels
Hi James, List members, James Kass wrote: U+0B82 TAMIL SIGN ANUSVARA is substituted and re-positioned in the compound glyphs of Code2000 for the normal dotted circle in the default glyphs for U+0BCA, U+0BCB, and U+0BCC. This is only expected to appear with a rendering system which does not support OpenType. This is because the default glyphs for these surroundrant vowel signs would never be drawn on the screen. [...] I see. Thinking once more about it, also in the special contexts, where there is a desire to get a rendering of vowel signs without the dotted circle, U+0BCA, U+0BCB, and U+0BCC wouldn't be called for, but their components U+0BC6, U+0BC7, U+0BBE and U+0BD7. So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font. This spacing glyph should be a no-contour glyph, perhaps with the same advance width as U+0020. I've not tried this, but it might just work. The hard part (I assume) is not only to avoid the dotted circle, but to make the glyphs behave like normal spacing characters, so that e.g. when one of them is surrounded by parentheses, no extra or missing spacing is seen. So U+0BC0, U+0BC1 and U+0BC2 should acquire the width of SPACE, whereas the other vowel signs should use their glyph's width. Regards, Peter Jacobi
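The relationship between the two-part vowel signs and their components mentioned above can be checked with canonical decomposition. The following Python sketch uses only the standard `unicodedata` module; the helper name `components` is hypothetical.

```python
import unicodedata

# The three two-part Tamil vowel signs named in the message.
TWO_PART_VOWEL_SIGNS = ("\u0bca", "\u0bcb", "\u0bcc")


def components(ch: str) -> list:
    """Return the canonical (NFD) decomposition of ch as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    for sign in TWO_PART_VOWEL_SIGNS:
        print(f"U+{ord(sign):04X} {unicodedata.name(sign)} ->", components(sign))
```

Each sign decomposes into a left part (U+0BC6 or U+0BC7) followed by a right part (U+0BBE or U+0BD7), which is why only the four component signs would ever need a stand-alone rendering.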
Re: Printing and Displaying Dependent Vowels
John Hudson [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font. This spacing glyph should be a no-contour glyph, perhaps with the same advance width as U+0020. I've not tried this, but it might just work. It should work: Uniscribe inserts the U+25CC glyph that is in the font, so this could be something other than an actual dotted circle. Another option would be to map the dotted circle to a non-contour spacing glyph in one of the discretionary OpenType Layout features such as salt, which would allow users of apps supporting that feature (currently only InDesign ME, so far as I know) to choose whether or not to display the circle. John Hudson If someone wants this, isn't it possible to put a specific lookup in the font so that any dependent vowel following a space character renders as a spacing (stand-alone) dependent vowel? Surely a specific lookup should override it being displayed on a dotted circle by default. - Chris
Re: Printing and Displaying Dependent Vowels
C J Fynn wrote: If someone wants this, isn't it possible to put a specific lookup in the font so that any dependent vowel following a space character renders as a spacing (stand-alone) dependent vowel? Surely a specific lookup should override it being displayed on a dotted circle by default. Not necessarily. Applications or layout engines may insert the dotted circle character on the fly during rendering in what they consider invalid sequences. Clearly space+mark is not an invalid sequence according to Unicode, but there may still be some apps that handle this incorrectly. Also, space characters have layout behaviours that do not always make them an ideal base for combining marks, e.g. being swallowed at the end of lines. John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes In making him win. - Charles Peguy
Re: Printing and Displaying Dependent Vowels
C J Fynn responded to John Hudson, If someone wants this, isn't it possible to put a specific lookup in the font so that any dependent vowel following a space character renders as a spacing (stand-alone) dependent vowel? Surely a specific lookup should override it being displayed on a dotted circle by default. Has anyone tried this? Would the space glyph U+0020 be expected to trigger a look-up in the Tamil GSUB table as if it were a Tamil base character? The reason that I haven't tried this is because, in the OpenType look-ups here for the re-ordrant vowel signs of Tamil, the vowel sign is INPUT1 and the base letter is INPUT2. This is because the rendering engine has already re-ordered the character string before this look-up is performed. It doesn't seem likely that a rendering engine would re-order a vowel sign before a space. It could be tested both ways, I suppose... This seems to be OT for this list, but, here it is, and it will probably keep popping up from time to time unless clarified. I can only make inferences and suppositions based on observation of the behavior and reasoning behind the behavior of the rendering engine used here, Microsoft's Uniscribe. People who know all about this do follow this list, so they're free to offer corrections. inference and supposition Uniscribe inserts the dotted circle into the display for complex scripts in order to give a visual indication of an encoding or spelling error. This seems quite useful whether text is being entered or merely displayed. Allowing dependent vowels to follow the space character breaks this utility. In other words, somebody could write a Tamil word in a web page starting with the E-vowel-sign (U+0BC6), and there'd be no indication that this is improper, either to the author or the visitor. Someone searching for that word on that page wouldn't find it, and so on.
Maybe some kind of spell-checker should be used by the original author, but, there seems to be no way to assure that spell-checking was performed by the author of any web page one visits. It is the very appearance of that dotted circle unexpectedly in our texts which alerts us to the fact that we have made a mistake. That dotted circle jumps out of the page into our vision exclaiming, Hey, I'm wrong! I'm so wrong, don't even bother running your spell-checker on me! This is the basis upon which Uniscribe renders text which includes dependent vowel signs, not just for Tamil, but for the other so-called complex scripts, too. The dotted circle plus the matra is the default rendering for combining marks *in isolation*. Uniscribe seems to rightly treat a vowel sign following a space as being in isolation, and, how could it do otherwise? What goes for the space character also seems to go for any other character which is not a valid character *within the Unicode range*. Again, how could it be otherwise. If the first character in a string isn't a Tamil character, there's no reason for the renderer to consult the Tamil OpenType tables in a font. If it did, my gosh, imagine all the pointless look-ups just to display a page which was, for example, mostly Chinese with a few Tamil phrases. end of supposition and inference The good folks engineering the Uniscribe have been most responsive to all kinds of special requests and pointers related to complex script shaping. I think asking them to break the existing mechanism in order to support vowel signs on spaces asks too much, though. People generating texts for educational purposes will always have special needs. So, they'll always need to make special effort to get special effects. Workarounds concerning the original question have already been suggested. 
If this is treated as a Unicode issue rather than a display issue, then one solution would be for someone to propose a new character, (back on topic a little bit) COMBINING DOTTED CIRCLE FOR COMBINING MARKS. Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine could be changed to insert this new character. Then, these updated rendering engines could be distributed and font developers could add the new characters to fonts and distribute updated fonts. This might just take a while, but it wouldn't be too hard to find examples of the character in actual text use to accompany the proposal... If it ain't broke, don't fix it. So, is it 'broke'? Best regards, James Kass
RE: Printing and Displaying Dependent Vowels
Hi Srivas, Peter Kirk, Peter Constable, List Members Peter Constable wrote: Peter Kirk wrote: Are these dependent on the font, as some have suggested, or are they prescribed by Uniscribe? Do different versions of Uniscribe differ in this respect, as I rather think? At present, I don't know the answer. I know this is something we have intended to support, but I don't get that behaviour on the particular system I'm using at the moment. I will keep it in mind as an issue to review in the next version of our Indic shaping engine. With the help of members of the [EMAIL PROTECTED] mailing list, I can offer some empirical evidence on this whodunnit: Using the Linux version of Abiword, which uses the Pango renderer, both the Code 2000 and the MS Latha font display the vowel signs without the unwanted dotted circle. NBSP and normal SPACE give identical results. For Code 2000 only, the dotted circle or a similar ersatz glyph (the screenshot is not that clear) is drawn for the two-part vowel signs U+0BCA, U+0BCB and U+0BCC between the two parts. Best Regards, Peter Jacobi
RE: Printing and Displaying Dependent Vowels
Peter Jacobi wrote, Using the Linux version of Abiword, which uses the Pango renderer, both the Code 2000 and the MS Latha font display the vowel signs without the unwanted dotted circle. NBSP and normal SPACE give identical results. For Code 2000 only, the dotted circle or a similar ersatz glyph (the screenshot is not that clear) is drawn for the two-part vowel signs U+0BCA, U+0BCB and U+0BCC between the two parts. U+0B82 TAMIL SIGN ANUSVARA is substituted and re-positioned in the compound glyphs of Code2000 for the normal dotted circle in the default glyphs for U+0BCA, U+0BCB, and U+0BCC. This is only expected to appear with a rendering system which does not support OpenType. This is because the default glyphs for these surroundrant vowel signs would never be drawn on the screen. Rather, the expected approach from the rendering engine is to use the component glyphs for these three vowel signs, such as U+0BC6 for the left part of U+0BCA, and U+0BBE for the right-side portion. If the presence of these default glyphs in Code2000 is causing problems, they can be adjusted. (Just because I expect a rendering engine to take a certain approach doesn't mean that a rendering engine will take that approach!) On Windows, as others have noted, the rendering engine (Uniscribe) inserts the dotted circle glyph (if the font has a dotted circle glyph) into the display. The dotted circle character is not inserted into the text, of course. So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font. This spacing glyph should be a no-contour glyph, perhaps with the same advance width as U+0020. I've not tried this, but it might just work. Another approach is to simply use a non-OpenType Unicode TrueType font for Tamil. The dotted circles don't seem to ever appear unless the font-in-use has OpenType tables covering the script-in-use.
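As a cross-check on the component glyphs mentioned above, the canonical decompositions of the three two-part Tamil vowel signs can be read from the Unicode Character Database. The following is a minimal sketch using Python's standard unicodedata module; the helper name canonical_parts is made up for this illustration, and the output says nothing about how any particular font or rendering engine draws the parts.

```python
import unicodedata


def canonical_parts(code_point: int) -> list[int]:
    """Return the canonical decomposition of a code point as code points.

    An empty list means the character has no canonical decomposition.
    """
    fields = unicodedata.decomposition(chr(code_point)).split()
    # Compatibility mappings start with a tag such as "<compat>"; skip those.
    if fields and fields[0].startswith("<"):
        return []
    return [int(field, 16) for field in fields]


# The three two-part Tamil vowel signs discussed in the thread.
for cp in (0x0BCA, 0x0BCB, 0x0BCC):
    parts = " + ".join(f"U+{p:04X}" for p in canonical_parts(cp))
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} = {parts}")
```

In the database, U+0BCA splits into U+0BC6 and U+0BBE, U+0BCB into U+0BC7 and U+0BBE, and U+0BCC into U+0BC6 and U+0BD7.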
Best regards, James Kass
Re: Printing and Displaying Dependent Vowels
[EMAIL PROTECTED] wrote: So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font. This spacing glyph should be a no-contour glyph, perhaps with the same advance width as U+0020. I've not tried this, but it might just work. It should work: Uniscribe inserts the U+25CC glyph that is in the font, so this could be something other than an actual dotted circle. Another option would be to map the dotted circle to a non-contour spacing glyph in one of the discretionary OpenType Layout features such as salt, which would allow users of apps supporting that feature (currently only InDesign ME, so far as I know) to choose whether or not to display the circle. John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes in making him win. - Charles Peguy
Re: Printing and Displaying Dependent Vowels
At 01:55 +0100 2004-03-26, Chris Jacobs wrote: Avarangal scripsit: Can anyone provide information on the sequences used for displaying and printing dependent vowels as standalones? The standards-conforming way to do so is to precede the dependent vowel with a space character (U+0020). Yes. If this sequence is not displayed correctly, complain to your software or font vendor, but it should be. Here I disagree. A font does not have to support each and every combining sequence. If he needs fonts which support combining sequences starting with a space char he surely should look for those, but that is no reason to complain about those fonts that don't. Someone making an Indic font should consider this particular concern. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Printing and Displaying Dependent Vowels
On 25/03/2004 13:33, [EMAIL PROTECTED] wrote: Avarangal scripsit: Can anyone provide information on the sequences used for displaying and printing dependent vowels as standalones? The standards-conforming way to do so is to precede the dependent vowel with a space character (U+0020). If this sequence is not displayed correctly, complain to your software or font vendor, but it should be. There are two standards-conforming ways of doing this. One is to precede the dependent vowel with a space character; the other is to precede it with a non-breaking space. The latter method is preferable, especially if the standalone dependent vowel is likely to occur as part of a word rather than in isolation, to avoid unwanted line breaks. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
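The two sequences described above can be written out explicitly. This is a minimal sketch, assuming a helper named standalone (hypothetical, introduced only for this illustration) that prefixes a combining mark with SPACE or NO-BREAK SPACE. It only builds the character sequence; whether a dotted circle appears on screen still depends on the font and rendering engine.

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00A0"


def standalone(vowel_sign: str, base: str = NBSP) -> str:
    """Return base + vowel_sign, the sequence meant to show the sign on its own.

    vowel_sign must be a single combining mark (general category Mn, Mc or Me).
    """
    if not unicodedata.category(vowel_sign).startswith("M"):
        raise ValueError(f"U+{ord(vowel_sign):04X} is not a combining mark")
    return base + vowel_sign


tamil_vowel_sign_i = "\u0BBF"
for base in (SPACE, NBSP):
    sequence = standalone(tamil_vowel_sign_i, base)
    print(" ".join(f"U+{ord(ch):04X}" for ch in sequence))
```

Defaulting to NBSP follows the preference stated above, since a line break cannot then separate the sign from surrounding text.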
Re: Printing and Displaying Dependent Vowels
At 02:39 -0800 2004-03-26, Peter Kirk wrote: There are two standards-conforming ways of doing this. One is to precede the dependent vowel with a space character; the other is to precede it with a non-breaking space. The latter method is preferable, especially if the standalone dependent vowel is likely to occur as part of a word rather than in isolation, to avoid unwanted line breaks. Of course, one could always display it with a dotted circle as well. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Printing and Displaying Dependent Vowels
On 26/03/2004 03:04, Michael Everson wrote: At 02:39 -0800 2004-03-26, Peter Kirk wrote: There are two standards-conforming ways of doing this. One is to precede the dependent vowel with a space character; the other is to precede it with a non-breaking space. The latter method is preferable, especially if the standalone dependent vowel is likely to occur as part of a word rather than in isolation, to avoid unwanted line breaks. Of course, one could always display it with a dotted circle as well. Except that is precisely what Srivas (Avarangal) asked NOT to do. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Printing and Displaying Dependent Vowels
Avarangal asked about a requirement of educational establishments: the ability to print and display dependent vowels without dotted circles. John Cowan answered: Avarangal scripsit: Can anyone provide information on the sequences used for displaying and printing dependent vowels as standalones? The standards-conforming way to do so is to precede the dependent vowel with a space character (U+0020). Does it fulfil the need (i.e., displaying _without_ dotted circles)? If so, where is it written? Antoine
Re: Printing and Displaying Dependent Vowels
From: Antoine Leca [EMAIL PROTECTED] Avarangal asked about a requirement of educational establishments: the ability to print and display dependent vowels without dotted circles. John Cowan answered: Avarangal scripsit: Can anyone provide information on the sequences used for displaying and printing dependent vowels as standalones? The standards-conforming way to do so is to precede the dependent vowel with a space character (U+0020). Does it fulfil the need (i.e., displaying _without_ dotted circles)? If so, where is it written? Space is a base character, so it combines with the next diacritic, with which it creates a default grapheme cluster that should be interpreted as if it were a single character entity. It is NOT defective. Note that NBSP can be used instead of SPACE if you need SPACE to keep its role as a keyword separator. To display the dotted circle, you can use a defective combining sequence starting with the diacritic or vowel sign: for example, use a control character followed by the isolated diacritic or vowel sign, or code the diacritic or vowel sign at the beginning of a parsed plain-text element (in XML, HTML, XHTML or SGML, this is normally delimited after character entities have been parsed and resolved). You may also explicitly code the dotted circle symbol followed by the diacritic or vowel sign to create a non-defective combining sequence starting with that base symbol. Now how would you interpret SPACE+diacritic or SPACE+vowel sign differently? If you display a dotted circle there, then you'll display two separate glyphs for a single grapheme cluster, and this is not intended by the normal Unicode character model. It may be useful for debugging purposes or as a help tool to compose text, but not to render actual text outside an input context, and this should require special code in the renderer to disable that feature in fonts or renderers.
Note that some fonts may incorrectly display SPACE+diacritic or SPACE+vowel sign with a dotted circle after a space. This is not an issue with Unicode but with the font or with the renderer.
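The distinction drawn in this message, between a sequence that starts with a combining mark (defective) and one that starts with a base such as SPACE, NBSP or DOTTED CIRCLE, can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the check looks only at the first character of the string, whereas the message also mentions marks that follow a control character.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def is_combining_mark(ch: str) -> bool:
    """True for general categories Mn, Mc and Me."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def starts_defective(text: str) -> bool:
    """True if the text begins with a combining mark that has no base before it."""
    return bool(text) and is_combining_mark(text[0])


def on_dotted_circle(mark: str) -> str:
    """Encode the dotted circle explicitly as the base of the mark."""
    return DOTTED_CIRCLE + mark


e_sign = "\u0BC6"  # TAMIL VOWEL SIGN E
print(starts_defective(e_sign + "\u0B95"))          # mark first: defective
print(starts_defective("\u00A0" + e_sign))          # NBSP is the base
print(starts_defective(on_dotted_circle(e_sign)))   # U+25CC is the base
```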
Re: Printing and Displaying Dependent Vowels
Sorry to answer my own post. Avarangal asked about a requirement of educational establishments: the ability to print and display dependent vowels without dotted circles. John Cowan answered: Avarangal scripsit: Can anyone provide information on the sequences used for displaying and printing dependent vowels as standalones? The standards-conforming way to do so is to precede the dependent vowel with a space character (U+0020). Does it fulfil the need (i.e., displaying _without_ dotted circles)? If so, where is it written? It seems many are thinking about the section in 2.10, titled Spacing Clones of European Diacritical Marks. I read it as applying to diacritical marks (and perhaps only European ones, but the distinction looks blurry to me). The beginning of 2.10 makes quite clear that diacritics are only one class (the most important, though) of combining characters. Indic dependent vowels are another. Also, something which is probably very relevant to Avarangal: the fact is that the implementation from a major vendor in the field, Microsoft Uniscribe, does retain the dotted circle (if present in the font; if not, you would probably get the .missing glyph instead). Antoine
Re: Printing and Displaying Dependent Vowels
From: Antoine Leca [EMAIL PROTECTED] It seems many are thinking about the section in 2.10, titled Spacing Clones of European Diacritical Marks. I read it as applying to diacritical marks (and perhaps only European ones, but the distinction looks blurry to me). The beginning of 2.10 makes quite clear that diacritics are only one class (the most important, though) of combining characters. Indic dependent vowels are another. I answered you by saying diacritics or vowel signs, but yes, it also includes dependent vowels when they are used to create what is more generally called default grapheme clusters, which is a larger set than the set of combining sequences (made of a base character followed by combining characters). Indic scripts are a bit unique by the fact that they have a syllabic structure decomposed into separate letters with a base consonant and a combining (this is not the proper term for Unicode) vowel modifier after it. This differs from European alphabets (Latin, Greek, Cyrillic) or even from some Asian or African syllabaries (notably Hiragana/Katakana) where these grapheme clusters are (almost always) combining sequences coded with a base character and diacritics. But if one wants to show the isolated form of an Indic vowel, there's an orthographic convention to use a sort of vowel order, i.e. a default consonant, in a way which also happens in the Arabic and Hebrew scripts for the default base vowel coded with a base letter. Indic scripts offer several variations here because there are also half-forms for these vowels, which are not meant to be used in isolation but to complement a preceding syllable in the same grapheme cluster. It's hard to say which one of these forms an author would like to present for these isolated dependent vowels because, as their name suggests, they are normally dependent on another preceding consonant.
So the best way to represent these isolated dependent vowels would be to encode an empty/null base consonant to force the presentation of the dependent vowel. An Indic text would more probably use one base consonant and present all dependent vowels with that consonant. Trying to represent the isolated vowel creates a theoretical grapheme cluster, which is normally not part of the normal orthography of Indic-written words where these vowels are used. Another solution would be to code these Indic dependent vowels after the Indic letter A (for example after U+0905 DEVANAGARI LETTER A), because this letter also represents the default vowel implied by all other consonants. A sample with Devanagari could be: (U+0905 LETTER A, U+093E VOWEL SIGN AA) which should normally be presented like the precomposed: (U+0906 LETTER AA), but which incorrectly displays the dotted circle with the Mangal font. So an author has to make some notational compromises here. But still, I do think that using NBSP as this empty/null base consonant before the dependent vowel will create the intended Unicode default grapheme cluster. Then it's up to the font or renderer to show the NBSP+vowel cluster properly, without the dotted circle, but it's not a problem of Unicode itself. With NBSP, you get this result: (U+00A0 NBSP, U+093E VOWEL SIGN AA) which often shows a square, probably because many fonts don't have a glyph for the isolated form of the vowel sign. It is true that this looks like a problem because the dotted circle should not appear here after showing the NBSP character (because it creates a single grapheme cluster that should be recognized as such, even if this cluster contains two combining sequences as it contains two base characters), but the problem is in the Mangal font itself (or in the UniScribe engine in Windows), not in Unicode.
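One point worth checking about the Devanagari sample above is whether (U+0905, U+093E) and U+0906 are canonically equivalent. The sketch below, which uses only Python's standard unicodedata module (the helper name composes_to is made up for this illustration), suggests they are not: U+0906 carries no canonical decomposition, so normalization leaves the two-character sequence alone. That concerns encoding only; how a renderer chooses to draw the sequence is a separate question.

```python
import unicodedata


def composes_to(sequence: str, target: str) -> bool:
    """True if NFC normalization turns `sequence` into exactly `target`."""
    return unicodedata.normalize("NFC", sequence) == target


letter_a_plus_sign_aa = "\u0905\u093E"  # DEVANAGARI LETTER A + VOWEL SIGN AA
letter_aa = "\u0906"                    # DEVANAGARI LETTER AA

# No canonical decomposition is recorded for U+0906.
print(repr(unicodedata.decomposition(letter_aa)))

# So NFC does not fold the two-character sequence into U+0906 ...
print(composes_to(letter_a_plus_sign_aa, letter_aa))

# ... unlike a Latin base plus diacritic, shown here for contrast.
print(composes_to("e\u0301", "\u00E9"))
```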
In fact you could as well wonder how to represent an isolated form of other Indic combining characters like an anusvara or candrabindu, but here also Unicode specifies that they should be coded after a space or preferably an NBSP: (NBSP), (NBSP, ANUSVARA), (NBSP, CANDRABINDU), (NBSP, VISARGA) If dotted circles appear before the symbol, or if the symbol is shown with a square box for a missing glyph, it's not the fault of Unicode. So the best way would be to use a normal Indic base character, such as in: (LETTER A), (LETTER A, ANUSVARA), (LETTER A, CANDRABINDU), (LETTER A, VISARGA) where the sequences look more familiar with the normal Devanagari orthographic and calligraphic rendering rules implemented in usual fonts. Also, something which is probably very relevant to Avarangal: the fact is that the implementation from a major vendor in the field, Microsoft Uniscribe, does retain the dotted circle (if present in the font; if not, you would probably get the .missing glyph instead). I'm not sure that UniScribe is the cause of this problem. There just appears to exist no GSUB rule in some fonts like Mangal to handle the case of NBSP followed by an Indic vowel sign or combining character, to map them to a single glyph without
Re: Printing and Displaying Dependent Vowels
At the end of my response to Antoine Leca, I suggested something which may merit some comments: What is clear is that there's no way to enable these features explicitly in plain-text files, if there's no standard format control in Unicode to enable these OpenType font features. Maybe these could become new characters to allocate in plane 14? What I mean here is that there's currently no defined way to convey in plain text files the intended rendering features that are now common in OpenType fonts and engines. What we currently have is the script identification and the language identification with language tags in plane 14, but language tags turn out to be of little use, unlike the feature tags that we currently cannot encode. Is there some pending proposal to encode a new set of FEATURE TAGs, in the same spirit as LANGUAGE TAGs in plane 14? Or to use a new leading character in the LANGUAGE TAGs block to mark the beginning of a feature tag instead of a language tag (this would require only 1 codepoint allocation, for example E0002)? It would find an immediate application within OpenType renderers, which could be instructed to set or unset some rendering features found today in common fonts, and that could be transported in plain text files, rather than only in rich-text file formats like XML-based documents or Word documents or CSS stylesheets (if such a possibility gets added and standardized into CSS).
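For readers unfamiliar with the plane 14 mechanism referred to above, here is a minimal sketch of how an existing language tag is spelled: U+E0001 LANGUAGE TAG followed by tag characters, each being an ASCII character offset by 0xE0000. The function names are hypothetical. The sketch covers only the language tags already in the standard, whose use the standard itself discourages; the feature tags asked about in this message are not encoded, and nothing here should be read as a format for them.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag character = printable ASCII code + 0xE0000


def encode_language_tag(tag: str) -> str:
    """Spell an ASCII language identifier such as 'ta' with plane 14 tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in tag):
        raise ValueError("tag must consist of printable ASCII characters")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(ch)) for ch in tag)


def strip_tag_characters(text: str) -> str:
    """Drop every character in the Tags block (U+E0000..U+E007F)."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


tagged = encode_language_tag("ta") + "\u00A0\u0BBF"
print(" ".join(f"U+{ord(ch):04X}" for ch in tagged))
print(strip_tag_characters(tagged) == "\u00A0\u0BBF")
```

A process that does not understand the tags can ignore them, as strip_tag_characters does, and the remaining text is unchanged.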
Re: Printing and Displaying Dependent Vowels
Avarangal wrote: display dependent vowels without dotted circles. Can anyone provide information on the sequences used for displaying and printing dependent vowels as standalones? Microsoft's Uniscribe allows you to display a dependent vowel with the following sequence (to be followed precisely): U+0020 U+200D U+0Bxx. U+00A0 does not work. Neither does U+200D. Also, these should be the first characters in the string passed to the Windows API: if other characters precede them, the special behaviour is not triggered, and you will end up with the circle. Please note that trying to display something a bit more complex, like U+0020 U+200D U+0BC6 U+0BD7 or U+0020 U+200D U+0BBF U+0B82, will fail. [ I am sorry for the misleading words I had in earlier answers to others. It took me some time to figure out exactly what this tool does. ] Hope this helps, Antoine
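The sequence reported above can be built as follows. This is a minimal sketch: the function name uniscribe_standalone is hypothetical, the Tamil block range check is an added safeguard rather than part of the report, and the behaviour it targets is the one described in this message for the Uniscribe of the time, not something verified here.

```python
SPACE = "\u0020"
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def uniscribe_standalone(tamil_sign: str) -> str:
    """Return U+0020 U+200D <sign>, the sequence reported to work in Uniscribe.

    Per the report, the result must start the string handed to the Windows
    API, and only a single sign may follow the joiner.
    """
    if len(tamil_sign) != 1 or not 0x0B80 <= ord(tamil_sign) <= 0x0BFF:
        raise ValueError("expected exactly one character from the Tamil block")
    return SPACE + ZWJ + tamil_sign


sequence = uniscribe_standalone("\u0BC6")  # TAMIL VOWEL SIGN E
print(" ".join(f"U+{ord(ch):04X}" for ch in sequence))
```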
RE: Printing and Displaying Dependent Vowels
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy At the end of my response to Antoine Leca, I suggested something which may merit some comments: Does that imply that it might also *not* merit comments? What is clear is that there's no way to enable these features explicitly in plain-text files, if there's no standard format control in Unicode to enable these OpenType font features. Maybe these could become new characters to allocate in plane 14? This sounds suspiciously like courtyard codes. (Wonders to self: Are Philippe Verdy and William Overington aliases for the same person? :-) What I mean here is that there's currently no defined way to convey in plain text files the intended rendering features that are now common in OpenType fonts and engines. Nor should there be, any more than there should be ways in plain text to indicate typeface, point size, style, etc. There is a class of representations for such information called rich text, and such representation has been and will very likely continue to be beyond the scope of plain text. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: Printing and Displaying Dependent Vowels
On Friday, March 26, 2004 7:12 PM, Philippe Verdy wrote: Indic scripts are a bit unique by the fact that they have a syllabic structure decomposed into separate letters with a base consonant and a combining (this is not the proper term for Unicode) vowel modifier after it. This differs from European alphabets (Latin, Greek, Cyrillic) or even from some Asian or African syllabaries (notably Hiragana/Katakana) where these grapheme clusters are (almost always) combining sequences coded with a base character and diacritics. Where exactly is the difference with, say, IPA? And with vocalized Perso-Arabic? (And it is not all Indic scripts: Thai and Lao behave differently.) Indic scripts offer several variations here because there are also half-forms for these vowels, Please define half form for a vowel. This is new to me. A sample with Devanagari could be: (U+0905 LETTER A, U+093E VOWEL SIGN AA) which should normally be presented like the precomposed: (U+0906 LETTER AA), but which incorrectly displays the dotted circle with the Mangal font. Mangal has nothing to do with this. What you are seeing and criticizing is Uniscribe's implementation, the fruit of a compromise between performance and dealing with special or unusual cases. This case is not clearly specified by the Devanagari OpenType specifications, but it appears that the default behaviour (considering U+093E as a dependent vowel shown in isolation, and rendering it with the added circle) has been chosen here by the implementation. In my own implementation of the same specifications, I consider this a perfectly correct and useful sequence (used in India to teach the syllabary), so I do not insert the circle, and as a result (with Mangal) it is shown as you expect. So an author has to make some notational compromises here. But still, I do think that using NBSP as this empty/null base consonant before the dependent vowel will create the intended Unicode default grapheme cluster.
About NBSP: I hope Paul will read my other post (direct to Avarangal) and will enhance Uniscribe in this respect, allowing NBSP to behave the same as SP. I am not sure here (one should look at Unicode 2.0), but I seem to recall that the behaviour with NBSP was added around 3.0, and Uniscribe was designed against 2.0... Then it's up to the font or renderer to show the NBSP+vowel cluster properly, without the dotted circle, but it's not a problem of Unicode itself. OFF-TOPIC I have been reading the Unicode list for quite some time (and sorry Philippe, but I am speaking about the time before you came in). I do not know why, but every now and then there are comments from regulars that say This is not a defect of Unicode itself, even when nobody is even thinking such a thing. From a psychological point of view, this is quite interesting. ;-) /OFF-TOPIC If dotted circles appear before the symbol, or if the symbol is shown with a square box for a missing glyph, it's not the fault of Unicode. Again! ;-) Also, something which is probably very relevant to Avarangal: the fact is that the implementation from a major vendor in the field, Microsoft Uniscribe, does retain the dotted circle (if present in the font; if not, you would probably get the .missing glyph instead). I'm not sure that UniScribe is the cause of this problem. I am pretty sure it is! Because if he were using Freetype, he would not have any problem displaying the standalone glyph. :-D Something more complex would be to have some way to display *various* representations of the dependent vowels; Tamil U+0BC1 and U+0BC2, which come to mind, show too much variation, so the font is unlikely to have one such glyph. But for the well-known Burmese AA U+102C or in Traditional Malayalam U+0D41 and U+0D42 this might be an open question. Here again, using Freetype this is perhaps doable, but with some higher-level engine it would be much more complex.
If the need for it arises, probably the option would be to define a user-accessible OpenType feature (of the alternate kind). There just appears to exist no GSUB rule in some fonts like Mangal to handle the case of NBSP followed by an Indic vowel sign or combining character, Well, we are quite far from the original subject, but anyway... You are missing something important about the Indic OpenType specifications. Besides the substitutions and, after them, the positioning, which are encoded as the TTO tables GSUB and GPOS, there are two earlier stages, called analysing and then reordering. Analysing deals mainly with splitting the stream into clusters. Reordering then does a number of operations, and it is this step that will insert the dotted circle. Or will not, depending on how it is programmed. I'm not an expert of UniScribe programming, but there may exist some Indic features in Indic fonts, which can be enabled in UniScribe to change the rendering behavior by including some additional (optional) GSUB/GPOS
Re: Printing and Displaying Dependent Vowels
Philippe Verdy wrote: Space is a base character, so it combines with the next diacritic, with which it creates a default grapheme cluster that should be interpreted as if it were a single character entity. Agreed so far for diacritics. Agreed also for non-spacing dependent vowels like U+0BC0. Agreed for the special exceptions like U+0BBE. I disagree for U+093F or U+0BBF (Mc not included in Other_Grapheme_Extend, there is an allowed break before it), unless there is something I missed here. It is NOT defective. I do not understand. I did not say anything implying that, did I? I just remarked that I was not able to find in the text of the standard any wording that would give vendors and implementers (like me) a solid basis for modifying their engines to provide special exceptions to deal with the combination U+0020/U+00A0 then U+093F. And no, this is not the same as displaying a diacritic, because it should be re-ordered, rather than being a spacing representation of a diacritic. Now how would you interpret SPACE+diacritic or SPACE+vowel sign differently? See above. If you display a dotted circle there, then you'll display two separate glyphs for a single grapheme cluster, and this is not intended by the normal Unicode character model. How do you believe anybody will show, say, U+0063 U+0300? Which font has this as a single glyph? Furthermore, a single character like U+0916 (Devanagari KHA) is very often rendered with two glyphs (namely, Half-Kha then the glyph also used for the AA-matra, U+093E). Unicode does not concern itself with how this is handled. Antoine
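The distinction made above between non-spacing marks and spacing marks can be checked against the general categories in the Unicode Character Database. The sketch below uses Python's standard unicodedata module; the helper name category_of is made up for this illustration. Note that unicodedata does not expose the Other_Grapheme_Extend property, so the special status of U+0BBE mentioned in the message is not shown here.

```python
import unicodedata


def category_of(code_point: int) -> str:
    """Return the two-letter general category, e.g. 'Mn' or 'Mc'."""
    return unicodedata.category(chr(code_point))


# Code points named in the message, plus a Latin diacritic for comparison.
for cp in (0x0300, 0x0BC0, 0x0BBE, 0x0BBF, 0x093F):
    ch = chr(cp)
    print(f"U+{cp:04X} {category_of(cp)} {unicodedata.name(ch)}")
```

Mn is a non-spacing mark and Mc a spacing combining mark; U+0BBF and U+093F fall in the second group, which is the case the message singles out.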
Re: Printing and Displaying Dependent Vowels
From: Peter Constable [EMAIL PROTECTED] What is clear is that there's no way to enable these features explicitly in plain-text files, if there's no standard format control in Unicode to enable these OpenType font features. Maybe these could become new characters to allocate in plane 14? This sounds suspiciously like courtyard codes. (Wonders to self: Are Philippe Verdy and William Overington aliases for the same person? :-) I can assure you that this is not the same person (look at the country of origin detected in the IP address if you are still not convinced). What I mean here is that there's currently no defined way to convey in plain text files the intended rendering features that are now common in OpenType fonts and engines. Nor should there be, any more than there should be ways in plain text to indicate typeface, point size, style, etc. There is a class of representations for such information called rich text, and such representation has been and will very likely continue to be beyond the scope of plain text. Note that I was not speaking strictly about style, but about a way to mark the text to allow or disallow some script features. This remains something optional for the renderer, and it can be ignored as well without breaking the encoded text. What I mean here is a set of format controls which help renderers interpret the text. Yes, of course we could define all these at the rich text format level (for example in CSS, if it has such functions to select alternate rendering options). But when I look at what OpenType features perform (I don't mean the content of the associated extra GSUB/GPOS tables, which is not what I mean here), it looks like they are designed to be used for particular languages or scripts, in a way that can be used across multiple font designs. So one font may implement a feature and another may not.
This looks very similar to a sort of meta-tagging within the middle of the text to add semantics to it, which can then be used by various renderers and fonts to adapt its style on the fly. This was already the case when language tags were added to Unicode. And OpenType can now include language-specific features which can be triggered by the presence of these tags. But in reality, most font features implemented today are not performed at the language level but at a finer-grained level (after the language level). And there's no similar way to tag the text with these features. Please don't consider this a proposal, just a question about the feasibility of applications that need to use such script-specific features as part of their regular text processing, without even needing it at the graphic level (when I look at some OpenType features, their 4-character labels may become part of the text-level processing, without even needing any glyph processing in the application using these tags). So reread my question (this was not an RFE) like this: are there semantics in these feature tags (yes, just the 4-letter IDs of these tags, not the content of the GSUB/GPOS tables to which they may be mapped in a specific font) which would need a way to represent them as format controls within the plain-text stream? I think that such semantics exist for these, which may be used or left unused in some presentation by a renderer, but may have their own application for plain-text handling (without any glyph processing). I suggested that they may be encoded in plane 14, possibly among language tags, but this was just a suggestion if they ever need to be encoded somewhere.
RE: Printing and Displaying Dependent Vowels
This sounds suspiciously like courtyard codes. (Wonders to self: Are Philippe Verdy and William Overington aliases for the same person? :-) I can ensure you that this is not the same person (look at the country of origin detected in the IP address if you are still not convinced). Well, the originating address that is reported when the message arrives at the list server doesn't guarantee that that's really where it came from, as we all know. You haven't convinced me yet. :-) Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: Printing and Displaying Dependent Vowels
On 26/03/2004 12:02, Peter Constable wrote: ... This sounds suspiciously like courtyard codes. (Wonders to self: Are Philippe Verdy and William Overington aliases for the same person? :-) ... Peter, I notice that you have found time while looking at this thread to criticise Philippe's ramblings and speculate about his identity. Perhaps you can use some of your time more profitably in answering the questions about Uniscribe and its treatment of sequences like space, diacritic and NBSP, diacritic. Are these dependent on the font, as some have suggested, or are they prescribed by Uniscribe? Do different versions of Uniscribe differ in this respect, as I rather think? -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: Printing and Displaying Dependent Vowels
From: Peter Kirk [mailto:[EMAIL PROTECTED] Peter, I notice that you have found time while looking at this thread to criticise Philippe's ramblings and speculate about his identity. Yes, he and I are having fun offline debating his identity :-) Perhaps you can use some of your time more profitably Hmmm... I perceive you don't approve of how I've been using my time. You're right: I should spend less time replying on this list and more time on the projects in my yearly objectives ;-) in answering the questions about Uniscribe and its treatment of sequences like space, diacritic and NBSP, diacritic... I have been corresponding with the original inquirer offline to find out more precisely what the issues and requirements of the users he's representing are. It's more of a priority for me to discover that than to discuss details regarding Uniscribe behaviour I don't actually know about for certain (and that I can't change in the immediate future). Are these dependent on the font, as some have suggested, or are they prescribed by Uniscribe? Do different versions of Uniscribe differ in this respect, as I rather think? At present, I don't know the answer. I know this is something we have intended to support, but I don't get that behaviour on the particular system I'm using at the moment. I will keep it in mind as an issue to review in the next version of our Indic shaping engine. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Printing and Displaying Dependent Vowels
We are in the process of updating Tamil keyboard drivers, and one of the requirements from educational establishments is the ability to print and display dependent vowels without dotted circles. Can anyone provide information on the sequences used for displaying and printing dependent vowels as standalones? Srivas
Re: Printing and Displaying Dependent Vowels
Avarangal scripsit: Can any one provide information on the sequences used for displaying and printing dependent vowels as standalones. The standards-conforming way to do so is to precede the dependent vowel with a space character (U+0020). If this sequence is not displayed correctly, complain to your software or font vendor, but it should be. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] You tollerday donsk? N. You tolkatiff scowegian? Nn. You spigotty anglease? Nnn. You phonio saxo? Nnnn. Clear all so! `Tis a Jute (Finnegans Wake 16.5)
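A minimal sketch of the sequence described above, assuming Python 3 and its standard unicodedata module. The helper name standalone and the choice of U+0BBE TAMIL VOWEL SIGN AA as the sample are only for illustration. The code builds the character sequence; whether it actually displays without a dotted circle still depends on the rendering engine and the font.

```python
import unicodedata

SPACE = "\u0020"                 # base recommended in the message above
NBSP = "\u00A0"                  # alternative base discussed elsewhere in the thread
TAMIL_VOWEL_SIGN_AA = "\u0BBE"   # sample dependent vowel, chosen for illustration


def standalone(mark: str, base: str = SPACE) -> str:
    """Return base + mark so a dependent vowel can be shown in isolation.

    Hypothetical helper: it only refuses input that is not a combining
    mark (general category Mn, Mc or Me); rendering is up to the font.
    """
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a single combining mark")
    return base + mark


seq = standalone(TAMIL_VOWEL_SIGN_AA)
print([f"U+{ord(c):04X}" for c in seq])  # ['U+0020', 'U+0BBE']
```

A keyboard driver could emit the same two code points for a "standalone vowel sign" key; passing base=NBSP gives the variant with U+00A0 instead of U+0020.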
Re: Printing and Displaying Dependent Vowels
- Original Message - From: [EMAIL PROTECTED] To: Avarangal [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, March 25, 2004 10:33 PM Subject: Re: Printing and Displaying Dependent Vowels Avarangal scripsit: Can any one provide information on the sequences used for displaying and printing dependent vowels as standalones. The standards-conforming way to do so is to precede the dependent vowel with a space character (U+0020). Yes. If this sequence is not displayed correctly, complain to your software or font vendor, but it should be. Here I disagree. A font does not have to support each and every combining sequence. If he needs fonts which support combining sequences starting with a space character, he surely should look for those, but that is no reason to complain about those fonts that don't.
Re: Printing and Displaying Dependent Vowels
Chris Jacobs chris dot jacobs at freeler dot nl wrote: If this sequence is not displayed correctly, complain to your software or font vendor, but it should be. Here I disagree. A font does not have to support each and every combining sequence. If he needs fonts which support combining sequences starting with a space char he surely should look for those, but that is no reason to complain about those fonts that don't. What John meant was, don't complain to Unicode if this doesn't work, because this is the standard way of doing it. A font does not have to support everything, but it's not Unicode's fault if one doesn't. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/