RE: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

2003-08-14 Thread Mike Meir
I have been trying to track down a definition of Canonical Combining
Class 7: Nuktas (and of the other combining classes): can anyone point
me in its direction? A clear definition of the Canonical Combining
Classes would presumably form the basis of an evaluation of the
viability of a spacing-headline-height nukta as a separate Unicode code
point.
 
The document I posted previously, which I attach again for reference,
lists printed documents in which various placements for (I hardly dare
say) nuktas are used, including more than one use by more than one
author, in both India and Bangladesh, of the double nukta on Ja. The
document was prepared for Dr Anthony P. Stone, Project Leader,
ISO/TC46/SC2/WG12 Transliteration of Indic scripts, by Abu Jar M Akkas.
 
Judging by this document, in the case of Perso-Arabic transcription the
dot is found below, to the right, or aligned with the headline. Only in
the first case is it non-spacing. In one case, both the below and the
lower-right positions are found in the same dictionary, which suggests
fairly strongly that there is no real difference between those two
positions, one a spacing, the other a non-spacing, form of the dot.
 
While the details of the schemes vary slightly, they are united in the
principle that the dot does the trick: in other words, the simplest
representation is of a Bengali Character, with a dot. There are
personal, practical and typographic preferences for where the dot should
be, but these are not basic. 
 
Solaiman, I was not suggesting that the placement of the nukta should be
controlled in any way, nor that it is not useful, placed at
headline/matra height, nor that it has not been used in books, but
merely that there doesn't seem to be much of a case for making a top
nukta an additional letter in Unicode, when you can place the dot which
is represented by the current code point anywhere you want in relation
to graphemes in fancy text by constructing a font with ligatures in that
form.
 
As it is, the Nukta is listed as having General Category Mn, which is
Mark, Non-Spacing. It has the Canonical Combining Class 7: Nuktas. The
Top Nukta you have identified definitely has the appearance of being
General Category Mc, Mark, Spacing Combining. Nevertheless, the
documentation also suggests that the combining classes are not to be
taken literally as applied to fancy text, which is what your scan is: an
example of real-world, fancy text.
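
For reference, the properties quoted above can be checked against the
Unicode Character Database. The following is a minimal sketch, assuming
Python's standard unicodedata module; the constant and function names are
made up for illustration. It also shows the one thing the combining class
is used for mechanically, namely canonical ordering of adjacent marks,
which says nothing about where a font draws the dot.

```python
import unicodedata

BENGALI_NUKTA = "\u09bc"   # BENGALI SIGN NUKTA
BENGALI_VIRAMA = "\u09cd"  # BENGALI SIGN VIRAMA (hasanta)
BENGALI_JA = "\u099c"      # BENGALI LETTER JA


def describe(ch):
    """Return (name, General Category, Canonical Combining Class) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)


print(describe(BENGALI_NUKTA))
print(describe(BENGALI_VIRAMA))

# Canonical ordering sorts adjacent combining marks by class, so a nukta
# (class 7) stored after a virama (class 9) is moved in front of it.
typed = BENGALI_JA + BENGALI_VIRAMA + BENGALI_NUKTA
normalized = unicodedata.normalize("NFD", typed)

# Two nuktas share class 7, so normalization leaves their order untouched.
double_nukta = BENGALI_JA + BENGALI_NUKTA + BENGALI_NUKTA
double_normalized = unicodedata.normalize("NFD", double_nukta)
```

The exact values reported depend on the Unicode version bundled with the
Python build in use, although these particular properties are stable.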
 
Michael, when you say that a second nukta should be stacked on top of a
first, do you mean, in principle, in a plain text representation only
- i.e., one in which, symptomatically, no conjunct forms at all would be
found? That would seem fair enough.
 
The only form of the double-nuktaed Ja that I have seen does have the
nuktas side-by-side, and was prepared by Linotype. I presume this was
not done without some research, taking it back to the Bose instance,
probably. However, this reflects fancy text, obviously.
 
Typographically, the priority with nuktas is to place them so that they
remain distinguishable at small sizes when other elements are combined
within the same grapheme. Stacking (I presume this implies one above
the other, both remaining visible) in this instance is a bit
counter-productive, since it inevitably results either in an increase in
line spacing, or the danger that a further stacked element will crash
into an element of the line below, becoming illegible. This would apply
in both plain and fancy text. 
 
 
Mike
 
 


From: Omi Azad [mailto:[EMAIL PROTECTED] 
Sent: 05 August 2003 19:27
To: Solaiman Karim
Cc: Paul Nelson (TYPOGRAPHY); Kenneth Whistler; [EMAIL PROTECTED]


What will be the result man?



Solaiman Karim wrote:


hello all

   I don't know if I misunderstood or not, but someone said it is useless
to add in Unicode. Someone is saying something when he doesn't even know
what it is he is talking about. Are you guys saying that what I showed
you is just made up? It is not only Arabic; it is also used in English to
translate it to some other language, such as French and so on. Please let
me know if I misunderstood you guys, and it seems to me that Bangla
should be limited.



Solaiman

- Original Message -
From: Paul Nelson (TYPOGRAPHY) [EMAIL PROTECTED]
To: Kenneth Whistler [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, August 04, 2003 7:16 PM
Subject: [indic] Re: Top Nukta... and double nuktas ... and more nuktas


Sorry,

I guess I totally misunderstood what Omi was stating then.

It seems there are no fewer than 8 different ways to transliterate this
stuff.

Paul



-Original Message-
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 04, 2003 4:14 PM
To: Paul Nelson (TYPOGRAPHY)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [indic] 

RE: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

2003-08-05 Thread Mike Meir
Dear Kenneth

I stand corrected, apologies

Mike

 
 -Original Message-
 From: Kenneth Whistler [mailto:[EMAIL PROTECTED] 
 Sent: 05 August 2003 21:15
 To: Mike Meir
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 
 One small correction to what Mike Meir stated:
 
  The Unicode position that nukta modifies the sound is therefore a 
  simplification. But in any event, the nukta, however it is 
  represented, indicates a distinction, usually a change of sound, 
  not what that distinction might be.
 
 It is not the Unicode position that a nukta modifies the sound.
 This is neither a requirement of the Unicode Standard nor 
 something that the UTC has stated.
 
 A combining nukta, as for any combining mark in the standard, 
 is a character which graphically modifies a base *character*.
 What the nature of that modification *signifies* is entirely 
 a matter for the users of the relevant orthography to determine.
 
 Indeed, the standard mantra that the editors put in the names 
 list for Indic nuktas is simply:
 
   for extending the alphabet to new letters
   
 What those new letters are used for -- whether they signify 
 modified sounds and whether such modification is uniformly 
 applied when such letters are used for different languages -- 
 is up to the users of those letters.
 
 --Ken
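
The point that a nukta simply extends the alphabet by modifying a base
character can be seen in the character data itself. The sketch below
assumes Python's standard unicodedata module, and its variable names are
illustrative only. It lists the three precomposed Bengali letters whose
canonical decomposition is a base letter plus U+09BC, and shows that JA
followed by a nukta has no precomposed form and so remains a sequence of
two code points.

```python
import unicodedata

NUKTA = "\u09bc"  # BENGALI SIGN NUKTA

# Precomposed Bengali letters whose canonical decomposition is base + nukta.
decompositions = {}
for ch in ("\u09dc", "\u09dd", "\u09df"):  # RRA, RHA, YYA
    base, mark = unicodedata.normalize("NFD", ch)
    decompositions[ch] = (base, mark)
    print(unicodedata.name(ch), "=", unicodedata.name(base), "+", unicodedata.name(mark))

# JA + nukta has no precomposed equivalent, so NFC leaves it as two code points.
ja_nukta = "\u099c" + NUKTA
composed = unicodedata.normalize("NFC", ja_nukta)
print(len(composed))
```

Nothing in this data says what sound, if any, the modified letter stands
for; that is left to the orthography, as described above.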
 
 
 



Some Char. to Glyph Statistics, Pan/Single Font

2001-05-31 Thread Mike Meir



The problem with your glyph statistics is that they are based on mould
counts employed by the Monotype hot metal typesetters.

The Monotype system was capable of extensive kerning, and therefore many
glyphs were constructed from the elements provided by the moulds at the
time of composition. The Monotype list of elements therefore comprises:

  1. Full characters which are either basic or could not be composed
     satisfactorily by the system for whatever reason. These might
     properly be described as glyphs.
  2. Elements which were combined either with the first set, or with one
     another, to create glyphs, or approximations to glyphs, at the time
     of casting. These cannot really be considered to be glyphs, as such.

However, if one allows that these elements are glyphs, the real number of
glyphs employed by Monotype was limited by the matrix case: before 1962
to 225 sorts, and subsequently to 272 sorts. Although additional sorts
might be available, they could only be used by substitution with another
sort prior to any actual typesetting.

More recent Monotype code pages for Bengali seem to be around 450
elements, which are combined with floating elements to create text.

To date all Indic script composition has been 
pretty much limited by technology. Taking Bengali as an example, Figgins, around 
1826, employed 370 sorts, many of which are kerning versions of other sorts, 
allowing the composition either of consonant-vowel combinations or 
approximations to complex conjuncts which were insufficiently common to warrant 
the creation of separate punches. But again, a number of his sorts exist only to 
allow the incorporation of combinations which could not be produced by the 
technology of the time.

Our recent revision of the Linotype Bengali code page extends to a font
of some 980 elements. 136 of these are differently spaced floating
elements, such as vowel signs and chandrabindus, which have no meaning
separate from the main characters to which they may be attached, and
which would be omitted from an OpenType version. It also includes 146
characters which duplicate the Unicode-encoded Bengali characters, which
is required for current technological reasons: Microsoft's Office XP does
not allow the display of Unicode-encoded Bengali characters in the font,
or at the size, which is expected. So the "real" number of elements is
698. (I may also add that we have had to produce alternative versions of
the same fonts in which non-spacing elements actually space quite
considerably, because of the very strange behaviour of Microsoft's
Internet Explorer 5.5, so the glyph count is larger than the 980: another
case of technology determining counts.)
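
The arithmetic behind the figure of 698 can be written out as follows.
The numbers are the ones quoted in the paragraph above; the variable
names are only labels chosen for this sketch.

```python
total_elements = 980        # full revised Linotype Bengali code page
floating_variants = 136     # differently spaced floating elements
unicode_duplicates = 146    # duplicates of the Unicode-encoded characters

# Elements that remain once the technology-driven entries are set aside.
real_elements = total_elements - floating_variants - unicode_duplicates
print(real_elements)  # 698
```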

Turning to Devanagari, our researches indicate that the total number of
script units (in Unicode terms, combinations of consonants, halants,
vowel signs and other signs) in use, excluding the Unicode characters in
the range 0951 to 0954, is around the 5550 mark. It is actually greater
than this, since there are a number of characters relating to Sanskrit
sandhi for which we do not have any conjunct-vowel statistics.
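
To illustrate what counting script units in Unicode terms might look
like, here is a rough sketch in Python. It is not the method behind the
figure above, which is not described here. The character classes and the
pattern are simplifying assumptions made for this example, and they ignore
many cases (U+0951 to U+0954, ZWJ and ZWNJ, vocalic vowel signs, and so
on).

```python
import re
from collections import Counter

# Assumed, simplified character classes for Devanagari (not a shaping model).
CONSONANT = "[\u0915-\u0939\u0958-\u095f]"
NUKTA = "\u093c"
HALANT = "\u094d"
VOWEL_SIGN = "[\u093e-\u094c]"
OTHER_SIGN = "[\u0901-\u0903]"  # candrabindu, anusvara, visarga
INDEPENDENT_VOWEL = "[\u0905-\u0914]"

# One unit: zero or more "consonant + halant" links, a final consonant, then
# either an explicit halant or an optional vowel sign plus other signs.
SCRIPT_UNIT = re.compile(
    f"(?:{CONSONANT}{NUKTA}?{HALANT})*{CONSONANT}{NUKTA}?"
    f"(?:{HALANT}|{VOWEL_SIGN}?{OTHER_SIGN}*)"
    f"|{INDEPENDENT_VOWEL}{OTHER_SIGN}*"
)


def script_units(text):
    """Split text into approximate script units, skipping anything unmatched."""
    return SCRIPT_UNIT.findall(text)


# KA + halant + SSA, TA + halant + RA + vowel sign I, YA
word = "\u0915\u094d\u0937\u0924\u094d\u0930\u093f\u092f"
units = script_units(word)
counts = Counter(script_units(word + " " + word))
print(len(units), len(counts))
```

Run over a large corpus, the number of distinct keys in such a counter
would give a figure comparable in kind to the one quoted, though the
result depends heavily on the corpus and on how the classes are drawn.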

In principle, all these should be regarded as glyphs, though few fonts
are likely to implement them all (the slaves in this context needing to
be human beings, since the issue of the spacing and modification of a
smaller number of base elements to produce all these glyphs is an
aesthetic rather than a mechanical problem).

I have also not included in the count the many variant forms of glyphs
which occur as a result of differences in formulation for particular
combinations.

(I have also excluded the rather large number of 
glyphs which are to be found in the Mangal font supplied by Microsoft, but which 
seem to be there purely because of a rather strange and literal interpretation 
of the Unicode Devanagari shaping rules, on the grounds that these glyphs exist 
only in the font, and would never be used in text.)