RE: [indic] Re: Top Nukta... and double nuktas ... and more nuktas
I have been trying to track down a definition of Canonical Combining Class 7: Nuktas (and of the other combining classes). Can anyone point me in its direction? A clear definition of the Canonical Combining Classes would presumably form the basis of an evaluation of the viability of a spacing, headline-height nukta as a separate Unicode code point.

The document I posted previously, which I attach again for reference, lists printed documents in which various placements for (I hardly dare say) nuktas are used, including more than one use by more than one author, in both India and Bangladesh, of the double nukta on Ja. The document was prepared for Dr Anthony P. Stone, Project Leader, ISO/TC46/SC2/WG12 Transliteration of Indic scripts, by Abu Jar M Akkas.

Judging by this document, the dot is found, in the case of Perso-Arabic transcription, below, to the right, or aligned with the headline. Only in the first case is it non-spacing. In one case, both the position below and the position to the lower right are found in the same dictionary, which suggests fairly strongly that there is no real difference between those two positions, one a non-spacing, the other a spacing, form of the dot.

While the details of the schemes vary slightly, they are united in the principle that the dot does the trick: in other words, the simplest representation is of a Bengali character with a dot. There are personal, practical and typographic preferences for where the dot should be, but these are not basic.

Solaiman, I was not suggesting that the placement of the nukta should be controlled in any way, nor that it is not useful when placed at headline/matra height, nor that it has not been used in books. I meant merely that there doesn't seem to be much of a case for making a top nukta an additional letter in Unicode, when you can place the dot represented by the current code point anywhere you want in relation to graphemes in fancy text by constructing a font with ligatures in that form.
As it is, the Nukta is listed as having General Category Mn, which is Mark, Non-Spacing. It has the Canonical Combining Class 7: Nuktas. The Top Nukta you have identified definitely has the appearance of being General Category Mc, Mark, Spacing Combining. Nevertheless, the documentation also suggests that the combining classes are not to be taken literally as applied to fancy text, which is what your scan is: an example of real-world, fancy text.

Michael, when you say that a second nukta should be stacked on top of a first, do you mean in principle, in a plain text representation only, i.e. one in which, symptomatically, no conjunct forms at all would be found? That would seem fair enough. The only form of the double-nuktaed Ja that I have seen does have the nuktas side by side, and was prepared by Linotype. I presume this was not done without some research, probably taking it back to the Bose instance. However, this obviously reflects fancy text.

Typographically, the priority with nuktas is to place them so that they remain distinguishable at small sizes when other elements are combined within the same grapheme. Stacking (I presume this implies one above the other, both remaining visible) is a bit counter-productive in this instance, since it inevitably results either in an increase in line spacing or in the danger that a further stacked element will crash into an element of the line below, becoming illegible. This would apply in both plain and fancy text.

Mike

From: Omi Azad [mailto:[EMAIL PROTECTED]]
Sent: 05 August 2003 19:27
To: Solaiman Karim
Cc: Paul Nelson (TYPOGRAPHY); Kenneth Whistler; [EMAIL PROTECTED]

What will be the result man?

Solaiman Karim wrote:

hello all

I don't know if I misunderstood or not but someone said it is useless to add in unicode. Someone is saying something which he doesn't even know what is it he is talking about.
Are you guys saying is just made up what I show to you that it is not only Arabic it is also use in English to translate it some other language such as French so and so. Please let me know if I misunderstood you guys and it seems to me that Bangla should be limited.

Solaiman

-Original Message-
From: Paul Nelson (TYPOGRAPHY) [EMAIL PROTECTED]
To: Kenneth Whistler [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, August 04, 2003 7:16 PM
Subject: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

Sorry, I guess I totally misunderstood what Omi was stating then. It seems there are no fewer than 8 different ways to transliterate this stuff.

Paul

-Original Message-
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 04, 2003 4:14 PM
To: Paul Nelson (TYPOGRAPHY)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [indic]
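For readers who want to check the property values cited in this message, the following is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` and the constant names are illustrative only, and the reported values depend on the Unicode data bundled with the Python build in use. The last lines show one possible plain-text encoding of a double-nukta Ja (base letter followed by two nukta code points); that encoding is an assumption for illustration, not something the thread settles.

```python
import unicodedata

BENGALI_NUKTA = "\u09BC"     # BENGALI SIGN NUKTA
DEVANAGARI_NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA
BENGALI_JA = "\u099C"        # BENGALI LETTER JA


def describe(ch):
    """Return (name, General Category, Canonical Combining Class) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))


for ch in (BENGALI_NUKTA, DEVANAGARI_NUKTA):
    print(describe(ch))
# ('BENGALI SIGN NUKTA', 'Mn', 7)
# ('DEVANAGARI SIGN NUKTA', 'Mn', 7)

# Assumed encoding of a double-nukta Ja: base letter plus two nukta code points.
double_nukta_ja = BENGALI_JA + BENGALI_NUKTA + BENGALI_NUKTA

# Both marks share combining class 7, so canonical ordering leaves them in place,
# and no precomposed Ja-with-nukta exists, so NFC returns the sequence unchanged.
normalized = unicodedata.normalize("NFC", double_nukta_ja)
print(normalized == double_nukta_ja, len(normalized))
```

How the two dots are drawn (stacked, side by side, or at headline height) is not visible at this level; that is decided by the font and the rendering engine.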
RE: [indic] Re: Top Nukta... and double nuktas ... and more nuktas
Dear Kenneth,

I stand corrected, apologies.

Mike

-Original Message-
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Sent: 05 August 2003 21:15
To: Mike Meir
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]

One small correction to what Mike Meir stated:

The Unicode position that nukta modifies the sound is therefore a simplification. But in any event, the nukta, however it is represented, indicates a distinction, usually a change of sound, not what that distinction might be.

It is not the Unicode position that a nukta modifies the sound. This is neither a requirement of the Unicode Standard nor something that the UTC has stated. A combining nukta, as for any combining mark in the standard, is a character which graphically modifies a base *character*. What the nature of that modification *signifies* is entirely a matter for the users of the relevant orthography to determine.

Indeed, the standard mantra that the editors put in the names list for Indic nuktas is simply: "for extending the alphabet to new letters". What those new letters are used for -- whether they signify modified sounds and whether such modification is uniformly applied when such letters are used for different languages -- is up to the users of those letters.

--Ken
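As a small illustration of the point that a nukta combines with a base character to form a new letter, the sketch below decomposes a Bengali letter that is also encoded as a single code point. It uses only Python's standard `unicodedata` module; the variable names are illustrative. It shows the encoding relationship only and says nothing about what the resulting letter signifies in any particular language.

```python
import unicodedata

BENGALI_YYA = "\u09DF"  # BENGALI LETTER YYA, encoded as a single code point

# Canonical decomposition splits it into a base letter plus the nukta.
decomposed = unicodedata.normalize("NFD", BENGALI_YYA)
print([unicodedata.name(c) for c in decomposed])
# ['BENGALI LETTER YA', 'BENGALI SIGN NUKTA']

# U+09DF is excluded from recomposition, so NFC keeps the base + nukta sequence.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == decomposed)
```

The two spellings are canonically equivalent, so a conforming process treats them as the same letter however the font chooses to draw the dot.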
Some Char. to Glyph Statistics, Pan/Single Font
The problem with your glyph statistics is that they are based on mould counts employed by the Monotype hot metal typesetters. The Monotype system was capable of extensive kerning, and therefore many glyphs were constructed from the elements provided by the moulds at the time of composition. The Monotype list of elements therefore comprises:

- Full characters which are either basic or could not be composed satisfactorily by the system for whatever reason. These might properly be described as glyphs.
- Elements which were combined either with the first set, or with one another, to create glyphs, or approximations to glyphs, at the time of casting. These cannot really be considered to be glyphs as such.

However, if one allows that these elements are glyphs, the real number of glyphs employed by Monotype was limited by the matrix case: before 1962 to 225 sorts, and subsequently to 272 sorts. Although additional sorts might be available, they could only be used by substitution for another sort prior to any actual typesetting. More recent Monotype code pages for Bengali seem to be around 450 elements, which are combined with floating elements to create text.

To date all Indic script composition has been pretty much limited by technology. Taking Bengali as an example, Figgins, around 1826, employed 370 sorts, many of which are kerning versions of other sorts, allowing the composition either of consonant-vowel combinations or of approximations to complex conjuncts which were insufficiently common to warrant the creation of separate punches. But again, a number of his sorts exist only to allow the incorporation of combinations which could not be produced by the technology of the time.

Our recent revision of the Linotype Bengali code page extends to a font of some 980 elements.
136 of these are differently spaced floating elements, such as vowel signs and chandrabindus, which have no meaning separate from the main characters to which they may be attached, and which would be omitted from an OpenType version. It also includes 146 characters which duplicate the Unicode-encoded Bengali characters, which is required for current technological reasons: Microsoft's Office XP does not allow the display of Unicode-encoded Bengali characters in the font, or at the size, which is expected. So the "real" number of elements is 698. (I may also add that we have had to produce alternative versions of the same fonts in which non-spacing elements actually space quite considerably, because of the very strange behaviour of Microsoft's Internet Explorer 5.5, so the glyph count is larger than the 980: another case of technology determining counts.)

Turning to Devanagari, our researches indicate that the total number of script units (in Unicode terms, combinations of consonants, halants, vowel signs and other signs) in use, excluding the Unicode characters in the range 0951 to 0954, is around the 5550 mark. It is actually greater than this, since there are a number of characters relating to Sanskrit sandhi for which we do not have any conjunct-vowel statistics. In principle, all these should be regarded as glyphs, though few fonts are likely to implement them all (the slaves in this context needing to be human beings, since the issue of the spacing and modification of a smaller number of base elements to produce all these glyphs is an aesthetic rather than a mechanical problem). I have also not included in the count the many variant forms of glyphs which occur as a result of differences in formulation for particular combinations.
(I have also excluded the rather large number of glyphs which are to be found in the Mangal font supplied by Microsoft, but which seem to be there purely because of a rather strange and literal interpretation of the Unicode Devanagari shaping rules, on the grounds that these glyphs exist only in the font, and would never be used in text.)
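The Bengali element count quoted in this message can be tallied as a quick check. The sketch below uses only the three figures stated above (980, 136 and 146); the variable names are illustrative.

```python
# Figures quoted in the message for the revised Linotype Bengali code page.
total_elements = 980       # all elements in the font
floating_elements = 136    # differently spaced floating vowel signs, chandrabindus, etc.
unicode_duplicates = 146   # duplicates of the Unicode-encoded Bengali characters

# Removing both technology-driven groups leaves the "real" element count.
real_elements = total_elements - floating_elements - unicode_duplicates
print(real_elements)  # 698
```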