subject:"Ligatures"

Re: Arabic ligatures

2015-08-25 Thread Doug Ewell

Shawn Landden shawnlandden at tuta dot io wrote:

 Arabic ligatures have been deprecated[1], despite a need for both
 ligature and non-ligature versions of the same glyphs.

The only Arabic character that is deprecated in the standard is U+0673
ARABIC LETTER ALEF WITH WAVY HAMZA BELOW. The Wikipedia article cited as
[1] does not claim otherwise.

 Amiri uses contextual alternatives for الله.  These ligatures are
 used in religious documents[2] via pictures, which seems to be what
 the current Unicode standard recommends.

What is your source for this?

 Unlike the presentation forms, there is a case for these phrases and
 formulas to be available both in ligature and non-ligature form.

All Arabic letters and combinations can be rendered in ligated or
non-ligated forms as needed using some combination of ZWJ and ZWNJ. See
TUS 8.0, Section 9.2.
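
As a rough illustration of the ZWJ/ZWNJ mechanism (a sketch only, not
text from the standard): the snippet below builds the lam + alef sequence
with and without U+200C ZERO WIDTH NON-JOINER and lists the code points
involved. The helper name describe() is made up for this example, and
whether a given font and renderer honour the request cannot be shown in
code like this.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
LAM = "\u0644"   # ARABIC LETTER LAM
ALEF = "\u0627"  # ARABIC LETTER ALEF


def describe(text):
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


plain = LAM + ALEF            # the renderer is free to form the lam-alef ligature
unjoined = LAM + ZWNJ + ALEF  # asks the renderer not to join or ligate here

for label, s in (("plain", plain), ("with ZWNJ", unjoined)):
    print(label, describe(s))
```

The underlying letters are the same in both strings; only the invisible
control character differs, which is why no separate ligature code point
is needed.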

 These ligatures should be non-deprecated and subject to canonical
 decomposition, rather than compatibility decomposition.

Section 9.2 (page 386 ff.) explains the Arabic Presentation Forms-A
block (U+FB50..U+FDFF) in greater detail.

--
Doug Ewell | http://ewellic.org | Thornton, CO 

Re: interaction of Arabic ligatures with vowel marks

2014-01-08 Thread Naena Guru

Please see this page: (for IE, use v 2010 and up)
http://lovatasinhala.com/

The font is almost all ligatures. If you copy and inspect the text, you'll
notice that it is simple romanized Singhala. I am currently in Sri Lanka
demonstrating this. People at the President's office and one of the
powerful ministers have seen it. They are elated that, after all, Singhala,
the most complex of the abugidas, is much like a Western European language
and amazingly computer- and user-friendly. This is contrary to how it was
portrayed to them by local academics and technocrats, causing the poor
country unnecessary debt.

The ideas of "abugida" and "complex" fade away if a font is made with a full
understanding of Unicode's description of ligatures and how they are
implemented by OpenType (now OpenFont). I believe that Arabic and Hebrew
can follow this model so that typing the script is simplified for users
without compromising orthography.


On Wed, Jun 12, 2013 at 8:39 AM, Stephan Stiller
stephan.stil...@gmail.com wrote:

 Hi,

 How is the placement of vowel marks around ligatures handled in Arabic
 text?

 Does anyone have good pointers on this topic?

 My guess is that this does not come up often (just like the topic of
 pointing for handwritten Hebrew), as vowel marks are mostly not added in
 ordinary text. Nonetheless, any text making heavy use of ligatures will
 from time to time need to add vowel marks for a foreign name or as a
 reading aid, and (as many of us know) the Quran is traditionally printed
 with vowel marks.

 I'm also wondering how font designers normally handle this. I think there
 are analogous questions for various ligature-heavy abugidas, so there must
 be an existing body of knowledge. There should be better answers than
 "squeeze the vowels around the consonant clusters in whatever way seems
 most intuitive". Do traditional printing presses use extra metal types for
 such glyph clusters, or do they manually add and adjust the positioning of
 vowels?

 Stephan



___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode

Re: interaction of Arabic ligatures with vowel marks

2013-06-13 Thread Christopher Fynn

Andreas

Have you tried Mihail Bayaryn's Siddhanta font - (or his earlier
Chandas and Uttara fonts)?

http://svayambhava.org/index.php/en/fonts

This font supports many more vertical ligatures for Sanskrit than most
other Devanagari fonts.

- Chris

On 13/06/2013, Andreas Prilop apri...@freenet.de wrote:
 On Wed, 12 Jun 2013, Richard Wordingham wrote:

 While the same principle applies to Indic scripts (and indeed, to the
 Roman alphabet), there is only one Indic mark I can think of for which
 the issue of component association arises, and that is the nukta.

 Sanskrit requires candrabindu U+0901 inside (or on top of)
 two La U+0932.
 See
  http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0138.html

 Instead of
 http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/att-0135/image001.png
 I would like to see the two La on top of each other.

Re: interaction of Arabic ligatures with vowel marks

2013-06-12 Thread Khaled Hosny

On Tue, Jun 11, 2013 at 08:09:31PM -0700, Stephan Stiller wrote:
 Hi,
 
 How is the placement of vowel marks around ligatures handled in Arabic text?

OpenType has special support for placing combining marks over
ligatures (a subset of the general support for controlling the placement
of combining marks); it is handled entirely at the text rendering level,
and the input is the same whether or not the bases end up ligated.

No idea about other font technologies.

Regards,
Khaled

Re: interaction of Arabic ligatures with vowel marks

2013-06-12 Thread Richard Wordingham

On Tue, 11 Jun 2013 20:09:31 -0700
Stephan Stiller stephan.stil...@gmail.com wrote:

 Hi,
 
 How is the placement of vowel marks around ligatures handled in
 Arabic text?

For OpenType the clue lies in the three types of GPOS
(http://www.microsoft.com/typography/otspec/gpos.htm) lookup for marks
- mark to base, mark to mark, and mark to ligature.  As base characters
get ligated, the shaper keeps track of which marks were associated
with which component of the ligature, and separate vowel positions are
recorded in the font for each component.

There is more complicated logic to prevent various kinds of undesirable
behaviour, such as marks belonging to different components interacting
via mark to mark positioning lookups or ligature lookups.  The idea is to
relieve the font designer of the need to think about such issues.  I
haven't found any public Microsoft documentation on these lookups, and
for open source I can only suggest studying the source code and its
comments - HarfBuzz files hb-ot-layout-gdef-table.hh,
hb-ot-layout-gpos-table.hh and hb-ot-layout-gsubgpos-private.hh are
particularly relevant.

Obviously this will not work if the character sequence is defined in
terms of presentation forms that are already ligatures.
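
To make the component bookkeeping described above a bit more concrete,
here is a toy model in Python. It is not HarfBuzz's or any other
shaper's actual algorithm; the function names, the ligature table and
the glyph name lam_alef are all invented for illustration. It only shows
the idea that when bases are merged into one ligature glyph, each mark
keeps the index of the component it followed, which is the information a
mark to ligature lookup would then need in order to pick an anchor.

```python
import unicodedata

LAM, ALEF, BEH = "\u0644", "\u0627", "\u0628"
FATHA, KASRA = "\u064E", "\u0650"


def is_mark(ch):
    """Treat any character with a non-zero combining class as a mark."""
    return unicodedata.combining(ch) != 0


def cluster(chars):
    """Group each base with the marks that follow it: [(base, [marks])]."""
    clusters = []
    for ch in chars:
        if is_mark(ch) and clusters:
            clusters[-1][1].append(ch)
        else:
            clusters.append((ch, []))
    return clusters


def ligate(chars, ligatures):
    """Merge runs of bases found in `ligatures` (hypothetical table) and
    tag every mark with the index of the component it belonged to."""
    clusters = cluster(chars)
    out, i = [], 0
    while i < len(clusters):
        # Try the longest candidate run first, down to two bases.
        for n in range(len(clusters) - i, 1, -1):
            run = clusters[i:i + n]
            bases = tuple(base for base, _ in run)
            if bases in ligatures:
                marks = [(k, m) for k, (_, ms) in enumerate(run) for m in ms]
                out.append((ligatures[bases], marks))
                i += n
                break
        else:
            # No ligature starts here: keep the base, marks sit on component 0.
            base, ms = clusters[i]
            out.append((base, [(0, m) for m in ms]))
            i += 1
    return out


table = {(LAM, ALEF): "lam_alef"}  # invented glyph name
shaped = ligate([LAM, FATHA, ALEF, BEH, KASRA], table)
print(shaped)
```

In this run the fatha stays attached to component 0 (the lam) of the
ligature, while the kasra stays on the unligated beh.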

 I'm also wondering how font designers normally handle this. I think 
 there are analogous questions for various ligature-heavy abugidas, so 
 there must be an existing body of knowledge.

While the same principle applies to Indic scripts (and indeed, to the
Roman alphabet), there is only one Indic mark I can think of for which
the issue of component association arises, and that is the nukta.  That
could be handled by the ligation process instead, so I would not rely
on there being a large body of Indic-specific knowledge on the issues.
OpenType has special handling for consonant clusters with visible
internal halant.

Richard.

Re: interaction of Arabic ligatures with vowel marks

2013-06-12 Thread Stephan Stiller


Thank you, خالد and Richard.


there is only one Indic mark I can think of for which
the issue of component association arises, and that is the nukta

That is good to know, given the complexity of the Indic scripts.

Other thoughts:

 * One could simply break up Arabic ligatures in need of harakat. If
   someone knows whether or to what extent this is done in otherwise
   ligated text, I will be curious to know.
 * Just now it is occurring to me that {the fact that the shadda is
   often used in ordinary writing} should make it easier to find data
   on all this, unless gemination blocks ligation in certain ways.
 * If there are conventions on the relative placement of harakat in
   general (I mean: not necessarily print), I will be curious to know.
   Some letter/consonant clusters have quite vertical an appearance,
   and any type foundry will need to be familiar with common practice
   (to the extent there is any), no matter what medium or technology is
   used in the end to create a typeface.


Stephan

Re: interaction of Arabic ligatures with vowel marks

2013-06-12 Thread Andreas Prilop

On Tue, 11 Jun 2013, Stephan Stiller wrote:

 How is the placement of vowel marks around ligatures
 handled in Arabic text?

 I'm also wondering how font designers normally handle this.

Older fonts in older operating systems (like Windows XP)
often failed. See
 http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/0101.html
 http://www.unicode.org/mail-arch/unicode-ml/y2008-m05/thread.html#139

Re: interaction of Arabic ligatures with vowel marks

2013-06-12 Thread Andreas Prilop

On Wed, 12 Jun 2013, Richard Wordingham wrote:

 While the same principle applies to Indic scripts (and indeed, to the
 Roman alphabet), there is only one Indic mark I can think of for which
 the issue of component association arises, and that is the nukta.

Sanskrit requires candrabindu U+0901 inside (or on top of)
two La U+0932.
See
 http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0138.html

Instead of
 http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/att-0135/image001.png
I would like to see the two La on top of each other.

interaction of Arabic ligatures with vowel marks

2013-06-11 Thread Stephan Stiller


Hi,

How is the placement of vowel marks around ligatures handled in Arabic text?

Does anyone have good pointers on this topic?

My guess is that this does not come up often (just like the topic of 
pointing for handwritten Hebrew), as vowel marks are mostly not added in 
ordinary text. Nonetheless, any text making heavy use of ligatures will 
from time to time need to add vowel marks for a foreign name or as a 
reading aid, and (as many of us know) the Quran is traditionally printed 
with vowel marks.


I'm also wondering how font designers normally handle this. I think 
there are analogous questions for various ligature-heavy abugidas, so 
there must be an existing body of knowledge. There should be better
answers than "squeeze the vowels around the consonant clusters in
whatever way seems most intuitive". Do traditional printing presses use
extra metal types for such glyph clusters, or do they manually add and 
adjust the positioning of vowels?


Stephan

Ligatures

2004-11-27 Thread Flarn

Can you please give me a list of all the ligatures available? Thanks!
- Michael Norton (a.k.a. Flarn)
E-mail address: [EMAIL PROTECTED]

RE: Ligatures

2004-11-27 Thread Addison Phillips [wM]

I suppose one could construct such a list, but using them to encode text is a
Very Bad Idea. It is better, for example, to encode the "fi" ligature as the
letter "f" followed by the letter "i" and let rendering software, fonts, and so
forth provide the ligature. Encoding ligatures directly will make your life
harder. For example, most spell checkers will fail the word "final" when it is
spelled U+FB01 U+006E U+0061 U+006C (that is, the fi ligature followed by "nal").
If you are constructing a font, there are lots of good links on the Unicode
website which include information on how to handle ligation without having a
code point for every combination of characters you ligate.
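
A small Python sketch of the spell-checker problem, using only the
standard unicodedata module. The variable names are made up for the
example; the point is just that the ligature-encoded string is a
different string from "final" until compatibility normalization (NFKC)
folds U+FB01 back into "f" + "i".

```python
import unicodedata

with_ligature = "\uFB01nal"  # U+FB01 U+006E U+0061 U+006C
plain = "final"

# A naive comparison, as a simple spell checker might do, fails.
naive_match = with_ligature == plain

# NFKC applies the compatibility decomposition of U+FB01.
folded = unicodedata.normalize("NFKC", with_ligature)
folded_match = folded == plain

print(naive_match, folded_match, len(with_ligature), len(folded))
```

Software that never normalizes will therefore miss the word, which is
the practical argument for encoding the plain letters in the first place.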

I haven't time to write a good quality response right now, but no doubt someone 
will jump in with 37 pages of text about the small amount I've already written 
(please excuse my sarcasm, which isn't directed at you).

PS Flarn isn't the reference I think it is, is it?

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
http://www.webMethods.com

Chair, W3C Internationalization Working Group
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] Behalf Of Flarn
 Sent: 20041127 15:46
 To: [EMAIL PROTECTED]
 Subject: Ligatures
 
 
 Can you please give me a list of all the ligatures available? Thanks!
 
 - Michael Norton (a.k.a. Flarn)
 E-mail address: [EMAIL PROTECTED]

Re: Ligatures

2004-11-27 Thread Doug Ewell

Hopefully not adding 37 pages...

Michael Norton (a.k.a. Flarn) flarn2003 at megapipe dot net wrote:

 Can you please give me a list of all the ligatures available? Thanks!

If by "available" you mean "separately encoded in precomposed form," you
could start by checking the online, definitive Unicode data file:

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Upon searching this file, you would find 507 characters with the word
LIGATURE in their name.
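
For anyone who wants to repeat this kind of search without downloading
the file, here is a hedged Python sketch. Note the assumption: it
queries the Unicode database bundled with the Python interpreter rather
than UnicodeData.txt itself, so the count it prints depends on the
Unicode version Python ships with and need not equal 507. The function
name is invented for the example.

```python
import sys
import unicodedata


def ligature_named_chars():
    """Return (code point, name) for every character whose name
    contains the word LIGATURE in Python's bundled Unicode data."""
    found = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if "LIGATURE" in name:
            found.append((cp, name))
    return found


found = ligature_named_chars()
print(len(found), "characters with LIGATURE in their name")
for cp, name in found[:5]:
    print(f"U+{cp:04X} {name}")
```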

However, I'm guessing that what you are after is Latin-script ligatures,
so it probably won't help much that 477 of the 507 ligatures are
Arabic presentation forms.  Of the remaining 30, six are Armenian, six
are Cyrillic, five are Hebrew, and two are actually not ligatures at
all, but paired combining marks intended to show that the two letters
under them form a single sound.

That leaves 11 Latin ligatures encoded in Unicode.  The two IJ
characters, U+0132 (Ĳ) and U+0133 (ĳ), aren't really ligatures, so they
don't count.  If we count the OE characters, U+0152 (Œ) and U+0153 (œ),
as ligatures, then we have to count the AE characters as well,
U+00C6 (Æ) and U+00E6 (æ).

That leaves U+FB00 through U+FB06 (ﬀ ﬁ ﬂ ﬃ ﬄ ﬅ ﬆ).

The problem, as Addison pointed out, is that if you use these forms in
text, most searching and sorting operations will fail to recognize them.
It is better to use the regular letters and let higher-end software
ligate them as appropriate.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Ligatures

2004-11-27 Thread Asmus Freytag

At 07:44 PM 11/27/2004, Doug Ewell wrote:
 The problem, as Addison pointed out, is that if you use these forms in
 text, most searching and sorting operations will fail to recognize them.

That's not the only problem. In some languages other ligatures, such as
fj might be as commonly needed as fi - the set is (intentionally) not
complete and you should not build your text or technology around them.

 It is better to use the regular letters and let higher-end software
 ligate them as appropriate.

Note that for many languages you need to use ZWNJ to prohibit ligatures
where disallowed by the orthography. Without that information even
fairly high end software cannot correctly ligate these languages.
There are some (in)famous word pairs that are spelled identically,
except for differences in where the ligatures can go. No software
can figure this out - that information must come from the author.

Getting sorting and searching operations to consistently ignore
the ZWNJ is something that has a higher chance of success, compared
to making such software handle long lists of ligatures.
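
A minimal sketch of what "ignoring the ZWNJ" could look like in a search
routine, in Python. The function name search_key() and the sample word
are assumptions made for illustration (the German word is split where an
fl ligature would cross a morpheme boundary); a real collation library
would handle this through its own ignorable-character rules rather than
by string replacement.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def search_key(text):
    """Drop the joiner controls so they do not affect matching."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


authored = "Auf" + ZWNJ + "lage"  # author marks: no fl ligature here
query = "Auflage"                  # what a user would type in a search box

print(authored == query)                          # raw comparison
print(search_key(authored) == search_key(query))  # comparison on keys
```

The ligation hint stays in the stored text for the renderer, and only
the comparison key discards it.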
A./

the length in semantic meaning for ligatures

2004-03-02 Thread Leon Zhu

What is the string length, in semantic terms, of a ligature? For example,
what should a length(str) function return when applied to one?

Do all ligatures follow the same rule, or does it differ between
scripts such as Arabic, Latin, Devanagari, Syriac, etc.?

And what if the ligature has its own code point, as with the Latin
ligatures U+FB00 to U+FB06?

thanks,


Re: Ligatures with diacritics (was: Ancient Northwest Semitic Script)

2003-12-31 Thread John Hudson

At 01:13 PM 12/30/2003, Peter Kirk wrote:

 But if it were, this ligature would be very interesting and problematic
 because it is a ligature between a base character and a diacritic. This is
 not a problem if it is always used, in a particular font, but it is
 problematic if the ligature is optional. This is because ZWNJ and ZWJ
 cannot be used between base characters and diacritics because they break
 the combining sequence. We came across this problem before with Hebrew
 script, but in a rather different (and less ambiguous) context, that of
 the need for a ligature between meteg and hataf vowels.

We should probably be careful to distinguish between ligation explicitly
requested in text using ZWJ -- which is very much a minority case -- and
ligation that occurs as either default rendering or as the result of a
higher level font feature request. There are lots of ligatures of bases and
marks in lots of fonts: ligation is one possible method of rendering any
sequence of base plus mark(s), and in some cases is preferable to dynamic
mark positioning.

 OpenType etc fonts are currently able to make these distinctions
 consistently, with the mechanisms John described above; but these
 mechanisms fail if there is a need for the ligature to be optional, as
 ZWNJ and ZWJ cannot be used.

Again, there is the question of whether an optional ligation needs to be
requested or inhibited in plain text, using these control characters, or
can be handled at a higher level using markup. In OT rendering, only
lookups in the Required Ligatures (rlig) feature cannot be turned off, so
one would put optional ligatures in the Standard Ligatures (liga) feature
if one wanted them on by default, or in the Discretionary Ligatures (dlig)
feature if one wanted them off by default.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]
What was venerated as style  was nothing more than
an imperfection or flaw that revealed the guilty hand.
   - Orhan Pamuk, _My name is red_

Re: Ligatures with diacritics

2003-12-31 Thread Peter Kirk

On 30/12/2003 15:44, Chris Jacobs wrote:

 I wonder if there are other, better defined, cases of ligatures between
 base characters and diacritics in other scripts, i.e. cases where there
 is an optional alternative to base character plus diacritic which does
 not look like the base character plus the diacritic.

 Devanagari?

 Syllable + virama + ZWJ --> consonant.

 Note that the ZWJ is _after_ the virama.

Interesting. Is this actually valid at the end of a string? Would 
syllable, virama, ZWJ as an isolated string be rendered differently 
from syllable, virama? But it strikes me that this arrangement, 
however sensible within its own writing system, is a distortion of the 
regular rules for ZWJ.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

Re: Ligatures with diacritics

2003-12-31 Thread Chris Jacobs

See http://www.unicode.org/versions/Unicode4.0.0/ chapter 9

 Interesting. Is this actually valid at the end of a string?

Yes. Figure 9-6 is an example.

 Would
 syllable, virama, ZWJ as an isolated string be rendered differently
 from syllable, virama?

I don't know. syllable, virama, ZWJ is rendered differently from syllable,
virama, ZWNJ.
But I don't know which of the two is the default.

If it is not at the end of a string then the default is to try to include
yet some more in the ligature; ZWJ or ZWNJ prevents this.
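
For reference, the three sequences under discussion can be written out
as code points. This Python sketch only builds and prints them; it does
not render anything. The choice of KA as the sample consonant and the
helper name code_points() are assumptions for the example. As I read
chapter 9, virama + ZWJ asks for a half-form and virama + ZWNJ for a
visible virama, but check the chapter itself before relying on that.

```python
KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def code_points(text):
    """Format a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


sequences = {
    "ka + virama": KA + VIRAMA,
    "ka + virama + ZWJ": KA + VIRAMA + ZWJ,
    "ka + virama + ZWNJ": KA + VIRAMA + ZWNJ,
}

for label, seq in sequences.items():
    print(f"{label}: {code_points(seq)}")
```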

 But it strikes me that this arrangement,
 however sensible within its own writing system, is a distortion of the
 regular rules for ZWJ.

Ligatures with diacritics (was: Ancient Northwest Semitic Script)

2003-12-30 Thread Peter Kirk

On 30/12/2003 11:44, John Hudson wrote:

At 11:15 AM 12/30/2003, Peter Kirk wrote:

Even if it were verified, it isn't a good case for encoding a 
separate character *equivalent* to a combination of two existing 
characters: that's a glyph variant ligature.


Actually, I don't think so. The separate character was not formed by 
merging the dot into the letter, rather the distinction was made in a 
different way.


In modern digital font development, ligation refers to the mechanism 
of display, not the visual appearance, which is largely irrelevant. A 
ligature is any glyph that represents two or more characters, 
typically arrived at by a ligation lookup. If I wanted a special sin 
glyph *equivalent* to the character sequence shin, sindot, I would 
ligate the two characters to that single glyph, either directly

shin sindot -> sin

or via a two-stage stylistic variant lookup associated with a
different typographic feature

shin sindot -> shin_sindot
and then
shin_sindot -> sin
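
The two routes quoted above can be mimicked with a toy substitution
function in Python. This is only a sketch of the idea of a ligation
lookup working on glyph names; it is not OpenType's GSUB machinery, and
the function name, the dictionary layout and the extra glyph "lamed" are
invented for the example.

```python
def apply_ligatures(glyphs, lookup):
    """Replace each run of glyph names found in `lookup` with the single
    glyph it maps to; leave everything else unchanged."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, replacement in lookup.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(replacement)
                i += len(seq)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


# Direct route: one lookup maps the pair straight to the special glyph.
direct = {("shin", "sindot"): "sin"}

# Two-stage route: ligate first, then swap in the stylistic variant.
stage1 = {("shin", "sindot"): "shin_sindot"}
stage2 = {("shin_sindot",): "sin"}

text = ["shin", "sindot", "lamed"]
one_step = apply_ligatures(text, direct)
two_step = apply_ligatures(apply_ligatures(text, stage1), stage2)
print(one_step, two_step)
```

Both routes end with the same glyph sequence; the difference is that in
the two-stage version the second substitution can sit in a separate
feature and be switched on or off independently.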

I understand this, and, as I answered separately, I don't think this is
the appropriate mechanism in this case as the suggested ligature is not
fully equivalent to the sequence.

But if it were, this ligature would be very interesting and problematic 
because it is a ligature between a base character and a diacritic. This 
is not a problem if it is always used, in a particular font, but it is 
problematic if the ligature is optional. This is because ZWNJ and ZWJ 
cannot be used between base characters and diacritics because they break 
the combining sequence. We came across this problem before with Hebrew 
script, but in a rather different (and less ambiguous) context, that of 
the need for a ligature between meteg and hataf vowels.

I wonder if there are other, better defined, cases of ligatures between 
base characters and diacritics in other scripts, i.e. cases where there 
is an optional alternative to base character plus diacritic which does 
not look like the base character plus the diacritic. Candidates like ø 
as an alternative for ö are ruled out because they are already 
separately encoded. I have certainly seen glyphs rather like U+0255 used 
for c cedilla. In the light of recent discussions, I can easily imagine 
a script or style like Sütterlin having a special ligated form for u 
umlaut, but that this ligature must not be used, rather two dots should 
be written above the letter as in normal Latin script, in the name Saül 
in which the dots represent a diaeresis rather than an umlaut.

OpenType etc fonts are currently able to make these distinctions 
consistently, with the mechanisms John described above; but these 
mechanisms fail if there is a need for the ligature to be optional, as 
ZWNJ and ZWJ cannot be used.

Are there any real examples where this might be necessary?

As this is a more general issue, I am copying it back to the main Unicode
list.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

Re: Ligatures with diacritics (was: Ancient Northwest Semitic Script)

2003-12-30 Thread Chris Jacobs

 I wonder if there are other, better defined, cases of ligatures between 
 base characters and diacritics in other scripts, i.e. cases where there 
 is an optional alternative to base character plus diacritic which does 
 not look like the base character plus the diacritic. 

Devanagari?

Syllable + virama + ZWJ --> consonant.

Note that the ZWJ is _after_ the virama.

RE: Faulty ligatures in Adobe PhotoShop

2003-08-28 Thread Kent Karlsson


Doug Ewell wrote:
...
  My copy of Photoshop 7 has an interesting image in its (HTML format)
  help file, page 1_16_4_13.html on "Using ligatures and old style
  numerals". It shows three examples of "Type with Ligatures option
  unselected and selected": ct, fi and fh.
 
  The bad part of it is that the ligated characters shown (in the
  second and third examples) seem to include a long s instead of an
  f...  ty_06.gif attached for reference.
 
 There is no fh ligature in Unicode, 

No, but it is perfectly permissible to ligate f and h anyway, just like you
can (or should) ligate f and j, and g and j (if the glyphs would overlap).

 so Photoshop may have  been trying
 to substitute the closest available ligature to the one you wanted
 (which is wrong, of course).
 
 Substituting an unligated ſi (U+017F + U+0069) for fi (U+0066 + U+0069)
 makes no sense at all.  If the current font doesn't contain an ﬁ
 ligature (U+FB01), Photoshop should just leave the combination alone.

U+FB01 is a compatibility character that is best avoided altogether. Formation
of an f and i ligature should not depend on whether the character U+FB01 is
supported or not (though it is likely to be supported if f and i are ligated).

/kent k

Re: Faulty ligatures in Adobe PhotoShop

2003-08-27 Thread Eric Muller

Doug Ewell wrote:

 Anto'nio Martins-Tuva'lkin antonio at tuvalkin dot web dot pt wrote:

  The bad part of it is that the ligated characters shown (in the
  second and third examples) seem to include a long "s" instead of an
  "f"...  ty_06.gif attached for reference.

Thanks for the report, I'll forward it to the Photoshop guys. By the way,
the font is apparently Adobe Caslon Pro.

 Substituting an unligated ſi (U+017F + U+0069) for fi (U+0066 + U+0069)
 makes no sense at all.  If the current font doesn't contain an ﬁ
 ligature (U+FB01), Photoshop should just leave the combination alone.

More likely, the image was created in Illustrator or some such, and the
glyph selected manually by the author. I did not check explicitly, but
I am ready to bet a whole lot that the font does the correct thing.

Eric.

Faulty ligatures in Adobe PhotoShop

2003-08-26 Thread Anto'nio Martins-Tuva'lkin

My copy of Photoshop 7 has an interesting image in its (HTML format)
help file, page 1_16_4_13.html on Using ligatures and old style
numerals. It shows three examples of «Type with Ligatures option
unselected and selected»: ct, fi and fh.

The bad part of it is that the ligated characters shown (in the second
and third examples) seem to include a long s instead of an f...
ty_06.gif attached for reference.

I note that Adobe Photoshop has, OTOH, quite deep and (apparently) well
designed support for some relatively complex font manipulations, such as
East Asian width and composing oddities.

--   .
António MARTINS-Tuválkin,   |  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 934 821 700 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |

attachment: ty_06.gif

Re: Faulty ligatures in Adobe PhotoShop

2003-08-26 Thread Doug Ewell

Anto'nio Martins-Tuva'lkin antonio at tuvalkin dot web dot pt wrote:

 My copy of Photoshop 7 has an interesting image in its (HTML format)
 help file, page 1_16_4_13.html on Using ligatures and old style
 numerals. It shows three examples of Type with Ligatures option
 unselected and selected: ct, fi and fh.

 The bad part of it is that the ligated characters shown (in the
 second and third examples) seem to include a long s instead of an
 f...  ty_06.gif attached for reference.

There is no fh ligature in Unicode, so Photoshop may have been trying
to substitute the closest available ligature to the one you wanted
(which is wrong, of course).

Substituting an unligated ſi (U+017F + U+0069) for fi (U+0066 + U+0069)
makes no sense at all.  If the current font doesn't contain an ﬁ
ligature (U+FB01), Photoshop should just leave the combination alone.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Faulty ligatures in Adobe PhotoShop

2003-08-26 Thread John Hudson

At 02:59 AM 8/26/2003, Anto'nio Martins-Tuva'lkin wrote:

 My copy of Photoshop 7 has an interesting image in its (HTML format)
 help file, page 1_16_4_13.html on Using ligatures and old style
 numerals. It shows three examples of «Type with Ligatures option
 unselected and selected»: ct, fi and fh.

 The bad part of it is that the ligated characters shown (in the second
 and third examples) seem to include a long s instead of an f...
 ty_06.gif attached for reference.

Whoever made the image probably made a mistake; either that or the font
used has faulty lookups. Photoshop 7 uses OpenType glyph substitution, so
what you are seeing is not character mapping but glyph-space processing.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]
You need a good operator to make type. If it were a
DIY affair the caster would only run for about five
minutes before the DIYer burned his butt off.
  - Jim Rimmer

Re: Accented ij ligatures (and yery)

2003-07-30 Thread Anto'nio Martins-Tuva'lkin

On 2003.07.07, 00:25, Peter Kirk [EMAIL PROTECTED] wrote:

 Maybe originally U+044B (cyrillic y, yery) was two separate
 letters,

It sure was (though I should provide some references to back this up? Hm,
later...)

 but it is certainly considered and used as one letter in Cyrillic
 languages today.  Encoding it as two letters would be about as
 sensible as insisting that w should be encoded as two u's or that i
 should be encoded as dotless i plus combining dot.

Well, that was precisely my point when asking how much Dutch ij (as in
rijk, not as in bijectie) is an analogous case.

 Note that yery is also sometimes written with an acute accent
 centred over the two elements, to indicate stress.

Indeed, in (at least Russian) dictionaries and school books. It can
also receive an umlaut in Mari (precomposed as U+04F9), again centred
over the ensemble of both elements.

--   .
António MARTINS-Tuválkin|  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 934 821 700 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish andAzeri, was: Accented ij ligatures)

2003-07-15 Thread Addison Phillips [wM]

Philippe wrote:

 I have tried several times to do it. It does not work: you may
 effectively remove some tables you don't need, but trying
 to extract just the normalizer is a real nightmare. I tried it
 in the past, and abandoned it: too tricky to maintain, and I
 retried it recently (one month ago, from its CVS source) and
 this was even worse than the first time.

webMethods includes the ICU normalizer in a couple of our products. The
code for one of these products requires JDK 1.2.2, so, since I had to
compile ICU anyway, I took the time to figure out the dependencies and
build only what I needed.

The list of classes required for the normalizer is actually quite small. 
Of the 1.3MB ICU4j.jar, only 400K are required for the normalizer to 
operate correctly. Source changes are not required. I will gladly send a 
complete list of classes to anyone who would like it. It took me a day 
to do the work (it took longer to test it than to build it).

Adding the normalizer to the JDK itself would also not be a difficult 
thing for Sun to do: that's because a version of the normalizer is 
already in the JDK, but private.
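
For readers on a later JDK: the normalizer that was private at the time of this thread became public as java.text.Normalizer in Java 6. The following is a minimal sketch, assuming such a JDK, of what the normalizer does to a decomposed accented letter and to the Latin fi ligature. The class and method names (NormalizerSketch, compose, foldCompat) are made up for illustration and are not ICU4J's API.

```java
import java.text.Normalizer;

class NormalizerSketch {
    // Canonical composition (NFC): e + COMBINING ACUTE ACCENT becomes U+00E9.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Compatibility composition (NFKC): folds presentation forms such as
    // U+FB01 LATIN SMALL LIGATURE FI back to the plain letters.
    static String foldCompat(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(compose("e\u0301").equals("\u00e9"));
        System.out.println(foldCompat("\uFB01").equals("fi"));
    }
}
```

Nothing here needs transliterators or collation tailoring, which is the point made elsewhere in this thread about the normalizer's dependencies.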

I will admit that it used to be quite difficult, back in the ICU 1.x 
days, to separate out the normalizer, but I've done that too (for 
reasons I shan't enumerate). I had to modify some source code to make it 
work, but that was mostly because I needed JDK 1.1.x. That JAR file is 
even smaller, at 161K. Building updated data tables is actually easier 
with the old source code...

In any event, you really ought to try the newer versions of ICU4J out. 
They are a lot easier to work with. And a light version isn't that 
hard to create, if that's what you want.

Best Regards,

Addison

--
Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.
+1 408.962.5487  mailto:[EMAIL PROTECTED]
---
Internationalization is an architecture. It is not a feature.
Chair, W3C I18N WG Web Services Task Force
http://www.w3.org/International/ws

Re: Ligatures in Turkish and Azeri

2003-07-15 Thread Anto'nio Martins-Tuva'lkin

On 2003.07.12, 20:59, Anto'nio Martins-Tuva'lkin
[EMAIL PROTECTED] wrote:

 Just browsed some old book with that in mind

I meant "books", plural, here. And I'll keep an eye out for this in
the future.

--   .
António MARTINS-Tuválkin,   |  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 934 821 700 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-14 Thread Philippe Verdy

On Sunday, July 13, 2003 10:21 PM, John Cowan [EMAIL PROTECTED] wrote:

 Michael Everson scripsit:
 
   A good choice if you don't slash your DIGIT SEVENs and can make
   your DIGIT ONEs sufficiently distinct.
  
  Eh? I *do* slash my DIGITs SEVEN and I use a single vertical stroke
  for my DIGITs ONE. The TIRONIAN SIGN ET as used in Ireland has no
  horizontal stroke.
 
 I should have said "do slash your DIGIT SEVENs".  So the glyph in the
 Unicode 3.0 book is not typical of Irish practice?  It seems to have a
 horizontal stroke all right.

In French too: children at school learn to use a horizontal stroke when
drawing a digit seven, and the oblique stroke is often curved to become
vertical at its central base (not placed at the left corner), with a
small loop connecting it to the top horizontal stroke. I have always used
a medial horizontal stroke on my sevens, often starting it at the top left
corner with a tiny loop too, to create a vertical serif...

-- 
Philippe.
Spams non tolérés: tout message non sollicité sera
rapporté à vos fournisseurs de services Internet.

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-14 Thread Mark Davis

First, you should check again, since a significant amount of work was
done in modularization in 2.6.

Second, the phrase "IBM forgot to modularize ICU" is misleading, at
the least. Unlike some people, who appear to have unbounded time and
energy for, say, writing emails, we have to carefully pick and choose
where we spend our time. Whether very fine-grained modularization is
important depends a great deal on the client's requirements, and must
be traded off against the many other things we could be doing with our
time.

Third, ICU4J is a source product. Saying that "it is impossible to
integrate the ICU's Normalize..." is also misleading, since one can
clearly modify source to remove dependencies on code one doesn't want
to include, if it is not core to the functionality. (Of course, the
amount of effort required may vary.) And transliterators are
not, in any event, required for Normalization.

Mark
__
http://www.macchiato.com
  Eppur si muove 

- Original Message - 
From: Philippe Verdy [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, July 14, 2003 11:13
Subject: Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish
and Azeri, was: Accented ij ligatures)


 On Monday, July 14, 2003 5:34 AM, Mark Davis [EMAIL PROTECTED] wrote:

  ...
   Of course
   Java already includes some parts of ICU, but other things in
   ICU4J are difficult now to integrate in Java, simply because IBM
   forgot to modularize ICU so that it can be integrated slowly.
   Accepting ICU4J as part of the core is a big decision,
   because ICU4J is quite large, and there are certainly developers
   for Java that would not accept to have 1 additional MB of data and
   classes loaded in each JVM (particularly because the integration
   of ICU would affect a lot of core classes for the Java2 platform
   now also used for small devices).
  ...
   For example, it is impossible to integrate the ICU's Normalizer
   class in Java without also importing the UChar class and all its
   related services for UString, such as transliterators, and
  ...
 
  You are very misinformed about ICU4J.

 I have tried several times to do it. It does not work: you may
 effectively remove some tables you don't need, but trying
 to extract just the normalizer is a real nightmare. I tried it
 in the past, and abandoned: too tricky to maintain, and I
 retried it recently (one month ago, from its CVS source) and
 this was even worse than the first time.

 I know that there's now a recent announcement (less than 1
 month ago) for its modularization, but it's true that I did not
 check the new modularized sources. So my application
 of ICU4J is still only when I can accept the whole package,
 as maintaining a stripped-down customization is too tricky.

 But maybe this has changed, I just updated my ICU sources
 from CVS. I'll recheck it to see if an ICU Light version can be
 created (which would only keep the core features, without the
 support for tailoring rules compiled at run-time).

 -- 
 Philippe.
 Spams non tolérés: tout message non sollicité sera
 rapporté à vos fournisseurs de services Internet.

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread John Cowan

Jim Allan scripsit:

 What this doesn't indicate is that sometimes in medieval text the 
 ampersand ligature is used to spell _et_ as part of a longer word.  

Not just mediaeval text; &c. for etc. (= et cetera) was common
right through the 19th century if not later.

-- 
John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
In the sciences, we are now uniquely privileged to sit side by side
with the giants on whose shoulders we stand.
--Gerald Holton

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread John Cowan

Jim Allan scripsit:

 See http://www.adobe.com/type/topics/theampersand.html for a short 
 history of the ampersand and some of its variations in modern computer 
 fonts.

Unfortunately the explanation of the name "ampersand" given there
is exactly backwards:  it is not "& per se and", but "and per se &".
Anglophones used to recite the alphabet by saying "... x, y, z, and
per se [by itself] &", pronounced of course "and per se and" and later
"ampersand".

 Check common fonts like Trebuchet MS, Berkeley Book, Goudy Sans, Korinna 
  and Univers for recognizable _Et_ ampersands.

I hand-write & by making a tall lower-case epsilon glyph and then drawing
a solidus over it.

-- 
I am expressing my opinion.  When my    John Cowan
honorable and gallant friend is called, [EMAIL PROTECTED]
he will express his opinion.  This is   http://www.ccil.org/~cowan
the process which we call Debate.   --Winston Churchill

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Michael Everson

At 01:21 -0400 2003-07-13, John Cowan wrote:

 I hand-write & by making a tall lower-case epsilon glyph and then drawing
 a solidus over it.

I just use the TIRONIAN SIGN ET.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread James H. Cloos Jr.

 John == John Cowan [EMAIL PROTECTED] writes:

John Not just mediaeval text; &c. for etc. (= et cetera) was
John common right through the 19th century if not later.

And picked up steam again online in the 1980s; groups.google.com
should have lots of examples of &c.

-JimC

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread John Cowan

Michael Everson scripsit:

 I hand-write & by making a tall lower-case epsilon glyph and then drawing
 a solidus over it.
 
 I just use the TIRONIAN SIGN ET.

A good choice if you don't slash your DIGIT SEVENs and can make your
DIGIT ONEs sufficiently distinct.

-- 
Dream projects long deferred    John Cowan [EMAIL PROTECTED]
usually bite the wax tadpole.http://www.ccil.org/~cowan
--James Lileks  http://www.reutershealth.com

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Jim Allan

John Cowan posted:

 Not just mediaeval text; &c. for etc. (= et cetera) was common
 right through the 19th century if not later.

The combination _&c_ is still used. Search for &c in 
http://www.scotland.gov.uk/consultations/environment/tacnh-00.asp for 
example.

But in mentioning medieval use I was thinking of use of the ampersand as 
a replacement for _et_ in words where _et_ is not the Latin word _et_.

An article I read some years back discussed a medieval listing and 
explanation of the Icelandic alphabet which included the _&_ as a letter.

The author of the article explained this by noting that _&_ was used 
occasionally in manuscripts to spell _et_ in Icelandic words.

Jim Allan

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Michael Everson

At 14:09 -0400 2003-07-13, John Cowan wrote:

 Michael Everson scripsit:

   I hand-write & by making a tall lower-case epsilon glyph and then drawing
   a solidus over it.

  I just use the TIRONIAN SIGN ET.

 A good choice if you don't slash your DIGIT SEVENs and can make your
 DIGIT ONEs sufficiently distinct.

Eh? I *do* slash my DIGITs SEVEN and I use a single vertical stroke 
for my DIGITs ONE. The TIRONIAN SIGN ET as used in Ireland has no 
horizontal stroke.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread John Cowan

Michael Everson scripsit:

 A good choice if you don't slash your DIGIT SEVENs and can make your
 DIGIT ONEs sufficiently distinct.
 
 Eh? I *do* slash my DIGITs SEVEN and I use a single vertical stroke 
 for my DIGITs ONE. The TIRONIAN SIGN ET as used in Ireland has no 
 horizontal stroke.

I should have said "do slash your DIGIT SEVENs".  So the glyph in the
Unicode 3.0 book is not typical of Irish practice?  It seems to have a
horizontal stroke all right.

-- 
Where the wombat has walked,    John Cowan [EMAIL PROTECTED]
it will inevitably walk again.  http://www.ccil.org/~cowan

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Michael Everson

At 16:21 -0400 2003-07-13, John Cowan wrote:

 I should have said "do slash your DIGIT SEVENs".  So the glyph in the
 Unicode 3.0 book is not typical of Irish practice?  It seems to have a
 horizontal stroke all right.

It is utterly typical of Irish practice. I meant that it doesn't have 
an additional horizontal stroke as a slashed 7 does.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Doug Ewell

Philippe Verdy verdy_p at wanadoo dot fr wrote:

 All this discussion shows that there is an extremely large number of
 glyph variations for the ampersand, which is both (at the abstract
 level) a symbol character and a ligature of two lowercase abstract
 characters. But ligatures for the uppercase ET and titlecase Et
 do exist as well. For Unicode, only the abstract symbol is encoded,
 but not the ligatures, even though they share a common set of glyphs.

That is one of the essential features of Unicode.  Abstract characters
are encoded; glyph variants (in general) are not.

 Could the variant selectors be used? I see that Unicode
 does not allow a free use of variant selectors, which are defined
 only for cases where it would be important to preserve the
 precise semantic of the encoded text, but not as a way to
 preserve the glyphic information (so character variants are
 strictly limited).

That's correct.  The difference between the Arial-style glyph that looks
a bit like a tilted treble clef (U+1D11E) and John's
epsilon-with-solidus and Philippe's e-with-small-attached-t is one of
style only.  The distinction does not need to be encoded in plain text,
any more than the distinction between a lowercase g with one bowl versus
two.

Apparently the math experts really, really needed to make a distinction
in plain text between (e.g.) a less-than-or-equal sign with a horizontal
bottom stroke and one with a slanted bottom stroke.  We can take it on
faith that that distinction is important in plain text, but we don't
need to add more distinctions that probably aren't.

 I don't see a solution for this problem within Unicode itself
 (and neither in ISO/IEC 10646), unless a separate standard
 is started to encode glyphs mapped to characters
 (in the UCS-4 space, out of its 17 first planes?). For now the
 safest way is to use specific fonts encoding these glyphs
 in PUA positions, and bind these fonts to the abstract text
 using stylesheets, meta information, or markup languages.
 But with such a technique, the abstract text would be modified.

 A way to avoid it is to surround the text with markup that
 specifies an explicit substitution, like this in XML:

 <typo as="&#xF001;">et</typo>,

You probably don't want to start down the slippery slope of encoding
Latin glyph variants as PUA characters.  Check the archives of this
mailing list; you will find that proposals to use the PUA to turn
Unicode into a glyph registry are generally not well received.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-13 Thread Mark Davis

...
 Of course
 Java already includes some parts of ICU, but other things in
 ICU4J are difficult now to integrate in Java, simply because IBM
 forgot to modularize ICU so that it can be integrated slowly.
 Accepting ICU4J as part of the core is a big decision,
 because ICU4J is quite large, and there are certainly developers
 for Java that would not accept to have 1 additional MB of data and
 classes loaded in each JVM (particularly because the integration
 of ICU would affect a lot of core classes for the Java2 platform
 now also used for small devices).
...
 For example, it is impossible to integrate the ICU's Normalizer
 class in Java without also importing the UChar class and all its
 related services for UString, such as transliterators, and
...

You are very misinformed about ICU4J.

Mark
__
http://www.macchiato.com
  Eppur si muove 

- Original Message - 
From: Philippe Verdy [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, July 12, 2003 14:45
Subject: Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish
and Azeri, was: Accented ij ligatures)


 On Saturday, July 12, 2003 4:17 PM, Jony Rosenne [EMAIL PROTECTED] wrote:

  What has "iw" to do with Hebrew?
 
  I wasn't involved with the change, but I'm glad it was done. Java and
  other systems probably still use it because they never bothered to
  check the latest version of 639. I know for certain that this was the
  case with one of the major computer vendors.

 In the case of Java, I don't think so. Sun has certainly maintained the
 language code simply to avoid breaking existing localizations to
 Hebrew of Java-written software, waiting probably for a better way to
 locate locales than the fixed locales path resolution algorithm which
 is part of its core Classes since the beginning.

 As long as Java core classes will not use a locale resolver that allows
 tuning the resolution algorithm used to load resource bundles, while
 also maintaining the compatibility with the existing software that
 assumes that Hebrew resources are loaded with the "iw" language code,
 Sun will not change this code.

 In IBM ICU4J, there is such an extended resolver, but Sun takes a
 long time to approve such proposals, and have it first accepted,
 documented, balloted and voted in its JCP program. Of course
 Java already includes some parts of ICU, but other things in
 ICU4J are difficult now to integrate in Java, simply because IBM
 forgot to modularize ICU so that it can be integrated slowly.
 Accepting ICU4J as part of the core is a big decision,
 because ICU4J is quite large, and there are certainly developers
 for Java that would not accept to have 1 additional MB of data and
 classes loaded in each JVM (particularly because the integration
 of ICU would affect a lot of core classes for the Java2 platform
 now also used for small devices).

 For example, it is impossible to integrate the ICU's Normalizer
 class in Java without also importing the UChar class and all its
 related services for UString, such as transliterators, and
 advanced features such as the UCA tailoring rules run-time
 compiler. Some ICU open-sourcers, as well as its users seem
 to think now that the modularization of ICU is an important but
 complex project.

 -- 
 Philippe.
 Spams non tolérés: tout message non sollicité sera
 rapporté à vos fournisseurs de services Internet.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-12 Thread Peter_Constable

 Where does the fact of saying that a Grapheme Disjoiner...

The character you should be referring to is not a new character GDJ, but 
rather is the existing ZWNJ, the functions of which include prevention of 
a ligature.
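
A minimal sketch of the ZWNJ approach, written in Java only because Java comes up elsewhere in this thread. The class and method names (LigatureBreaker, breakFi) are hypothetical. The sketch inserts U+200C ZERO WIDTH NON-JOINER between f and i so that a renderer that honors it does not form the fi ligature; whether the ligature is actually suppressed on screen depends on the font and the rendering engine.

```java
class LigatureBreaker {
    // U+200C ZERO WIDTH NON-JOINER: requests that adjacent characters not be joined or ligated.
    static final char ZWNJ = '\u200C';

    // Inserts ZWNJ between every "f" + "i" pair; all other text is left untouched.
    static String breakFi(String text) {
        return text.replace("fi", "f" + ZWNJ + "i");
    }

    public static void main(String[] args) {
        String word = breakFi("fil");
        // The string is one char longer than the input; the extra char is invisible.
        System.out.println(word.length());
    }
}
```

Because ZWNJ is an ordinary character in the text, it survives copy and paste, which is what distinguishes it from a font-level setting.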



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-12 Thread Philippe Verdy

On Saturday, July 12, 2003 6:51 AM, Doug Ewell [EMAIL PROTECTED] wrote:

 Philippe Verdy verdy_p at wanadoo dot fr wrote:
 
  Good luck with ISO language codes which does not even
  define them, and contain many duplicate codes even in
  the Alpha-2 space (he/iw, in/id), or imprecise codes
  matching sometimes very imprecise families of languages
  overlapping other language codes...
 
 The codes "iw" for Hebrew and "in" for Indonesian were deprecated
 FOURTEEN YEARS AGO.  It is not accurate or fair to refer to them as
 duplicates of "he" and "id".  The Registration Authority deprecates
 such codes, rather than deleting them, for backward compatibility with
 any data that might contain the old codes.

I was also sure that "iw" was no longer used today, until I found that it is
still used in Java on Windows, for legacy reasons... A resource bundle
created in Hebrew with the code "he" was simply... ignored. So I had to
rename it to "iw".

Unfortunately, on Linux and various Unixes the correct code to use for
locales varies, and it comes from the user-environment settings, actually
set up by a system profile, most of the time... Users who want to get the
benefit of existing locales for Hebrew will constantly need to switch
between "he" and "iw". The normal installation solution is still today
to create a file link between the "he" and "iw" resources, so that both
can be used.

I was really disappointed when I saw that these legacy language codes
cannot be simplified the way we might think, by ignoring "iw" and "in".
Still today, Java does not offer a way to create links at runtime to resolve
locales with equivalent ids, without duplicating resources or creating
special rules such as: if (code.equals("he") || code.equals("iw"))
(don't forget that Java also has run-time resources with no files)...
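
The special rule above can at least be kept in one place. Here is a minimal sketch, assuming a hypothetical helper class (LegacyLanguageCodes is not part of the JDK or of ICU4J), that maps the deprecated codes mentioned in this thread to their replacements before any comparison or resource lookup. It only illustrates the idea; it does not change how the JDK itself resolves bundles.

```java
class LegacyLanguageCodes {
    // Maps a deprecated ISO 639 two-letter code to its replacement.
    // Only the pairs discussed in this thread are covered: iw/he, in/id, ji/yi.
    static String canonical(String code) {
        switch (code) {
            case "iw": return "he";
            case "in": return "id";
            case "ji": return "yi";
            default:   return code;
        }
    }

    // True when both codes name the same language once legacy forms are folded.
    static boolean sameLanguage(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        System.out.println(sameLanguage("he", "iw"));
        System.out.println(sameLanguage("he", "id"));
    }
}
```

A caller would then write sameLanguage(requested, available) instead of hard-coding both spellings at every comparison.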

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-12 Thread Peter Kirk

On 11/07/2003 11:18, Philippe Verdy wrote:

 # T: special case for uppercase I and dotted uppercase I
 #- For non-Turkic languages, this mapping is normally not used.
 #- For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.

 <snip>

 Is that what is called a character subset for a scripted language family? Well, I don't like the term "Turkic" to name it. I prefer the more common "Altaic Latin alphabet", seen as a standard subset of the Latin script, with additional properties.

 Maybe Unicode should not try to use language codes for families of languages, but it could define representative subsets of characters which may contain characters from several scripts, but would be minimized according to the tradition of a family of languages. Such families seem evident from the current ISO-8859-* and Mac/Windows/DOS charsets.

 -- Philippe.

Thank you, Philippe. Well, I am glad to read "not normally used" rather
than "must not be used" as this allows mapping T to be used for other
languages when appropriate.

I also don't like the word "Turkic" here. This is a linguistic term for a
language family, see
http://www.ethnologue.com/show_family.asp?subid=710. Turkish and Azeri
are Turkic languages, but there are many Turkic languages which don't
use this case mapping, either because they use other alphabets
(Cyrillic, Arabic, occasionally Hebrew, perhaps even Greek) or because
they use a Latin alphabet with the regular case mapping as in Uzbek and
Turkmen. There are also some non-Turkic minority languages which need
the T case mapping. "Altaic Latin alphabet" is a reasonable alternative,
although again "Altaic" is a language family name, covering Turkic,
Mongolian and Tungus, see
http://www.ethnologue.com/show_family.asp?subid=709, and as far as I
know mapping T is not needed for any Mongolian or Tungusic languages.
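
To make mapping T concrete, here is a minimal sketch in Java (chosen only because Java comes up elsewhere in this thread). It relies on the locale-sensitive String.toLowerCase(Locale) and String.toUpperCase(Locale) methods, which apply the dotted/dotless i mapping when given a Turkish locale. The class and method names are illustrative only.

```java
import java.util.Locale;

class TurkicCasing {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // With mapping T: I lowercases to U+0131 (dotless i),
    // and i uppercases to U+0130 (capital I with dot above).
    static String lowerTurkish(String s) { return s.toLowerCase(TURKISH); }
    static String upperTurkish(String s) { return s.toUpperCase(TURKISH); }

    // Regular mapping, independent of the machine's default locale.
    static String lowerRegular(String s) { return s.toLowerCase(Locale.ROOT); }

    public static void main(String[] args) {
        System.out.println(lowerTurkish("I").equals("\u0131"));
        System.out.println(upperTurkish("i").equals("\u0130"));
        System.out.println(lowerRegular("I").equals("i"));
    }
}
```

The choice of mapping is made by the caller through the locale argument, not by the text itself, which is why a language that uses a Latin alphabet with the regular mapping is unaffected as long as it is not tagged tr or az.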

Does anyone know of a good resource on the web, or elsewhere, listing
the alphabets used for different languages around the world? I know a
project was attempted a few years ago at least for Europe. It would be
useful to have this kind of data available somewhere even with no
official status.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-12 Thread Michael Everson

At 03:25 -0700 2003-07-12, Peter Kirk wrote:

 Does anyone know of a good resource on the web, or elsewhere, 
 listing the alphabets used for different languages around the world? 
 I know a project was attempted a few years ago at least for Europe. 
 It would be useful to have this kind of data available somewhere 
 even with no official status.

http://www.evertype.com/alphabets
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-12 Thread Patrick Andries



On Saturday, July 12, at 6:51, Doug Ewell [EMAIL PROTECTED] wrote:

 The codes "iw" for Hebrew and "in" for Indonesian were deprecated
 FOURTEEN YEARS AGO.  It is not accurate or fair to refer to them as
 duplicates of "he" and "id".  The Registration Authority deprecates
 such codes, rather than deleting them, for backward compatibility with
 any data that might contain the old codes.

Just out of curiosity, why was « iw » deprecated ? Seems perfectly fine to
me.
And why was « he » chosen (Herero, Hemba, Hellenic Greek) ?

P.A.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-12 Thread Peter Kirk

On 12/07/2003 04:18, Michael Everson wrote:

 At 03:25 -0700 2003-07-12, Peter Kirk wrote:

  Does anyone know of a good resource on the web, or elsewhere, listing 
  the alphabets used for different languages around the world? I know a 
  project was attempted a few years ago at least for Europe. It would 
  be useful to have this kind of data available somewhere even with no 
  official status.

 http://www.evertype.com/alphabets

Thank you, Michael. I knew you had this information, of course, as I 
helped to provide it, but I didn't know where it was now. This is of 
course restricted to Europe as you have defined it, and is not 
exhaustive for Turkey. Also it doesn't include recent Latin alphabets 
for minority languages of Azerbaijan, as used in schools to a rather 
limited extent, perhaps because I never sent you the data.

The link to http://www.evertype.com/alphabets/azerbaijan.pdf is broken; 
and in http://www.evertype.com/alphabets/turkish.pdf the dotted capital 
I is missing, as viewed in Acrobat Reader 5.1 on Windows 2000.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-12 Thread Michael Everson

At 08:11 -0400 2003-07-12, Patrick Andries wrote:

 Just out of curiosity, why was « iw » deprecated ? Seems perfectly fine to
 me. And why was « he » chosen (Herero, Hemba, Hellenic Greek) ?

Iwrit (iw), being a German transliteration of the name of the Hebrew 
language, and Jiddisch (ji) were both thought (by someone) to be less 
suitable than the English-based "he" and "yi" which replaced them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-12 Thread Jony Rosenne

What has "iw" to do with Hebrew?

I wasn't involved with the change, but I'm glad it was done. Java and other
systems probably still use it because they never bothered to check the
latest version of 639. I know for certain that this was the case with one of
the major computer vendors.

Jony

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Andries
 Sent: Saturday, July 12, 2003 2:12 PM
 To: Philippe Verdy; Doug Ewell
 Cc: [EMAIL PROTECTED]
 Subject: Re: ISO 639 duplicate codes (was: Re: Ligatures in 
 Turkish and Azeri, was: Accented ij ligatures)
 
 
 
 
 On Saturday, July 12, at 6:51, Doug Ewell [EMAIL PROTECTED] wrote:
 
  The codes "iw" for Hebrew and "in" for Indonesian were deprecated
  FOURTEEN YEARS AGO.  It is not accurate or fair to refer to them as
  duplicates of "he" and "id".  The Registration Authority deprecates
  such codes, rather than deleting them, for backward compatibility with
  any data that might contain the old codes.
 
 Just out of curiosity, why was « iw » deprecated ? Seems
 perfectly fine to me. And why was « he » chosen (Herero,
 Hemba, Hellenic Greek) ?
 
 P.A.

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-12 Thread Patrick Andries


Michael Everson [EMAIL PROTECTED] wrote:

 At 08:11 -0400 2003-07-12, Patrick Andries wrote:

  Just out of curiosity, why was « iw » deprecated ? Seems perfectly fine to
  me. And why was « he » chosen (Herero, Hemba, Hellenic Greek) ?

 Iwrit (iw), being a German transliteration of the name of the Hebrew
 language, and Jiddisch (ji) were both thought (by someone) to be less
 suitable than the English-based "he" and "yi" which replaced them.

This is also what I concluded, but  «iv» for ivrit could have pleased those
who thought the transliteration must be English-based (what a strange
idea!).

P. A.

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-12 Thread Philippe Verdy

On Saturday, July 12, 2003 4:17 PM, Jony Rosenne [EMAIL PROTECTED] wrote:

 What has "iw" to do with Hebrew?
 
 I wasn't involved with the change, but I'm glad it was done. Java and
 other systems probably still use it because they never bothered to
 check the latest version of 639. I know for certain that this was the
 case with one of the major computer vendors.

In the case of Java, I don't think so. Sun has certainly maintained the
language code simply to avoid breaking existing localizations to
Hebrew of Java-written software, waiting probably for a better way to
locate locales than the fixed locales path resolution algorithm which
is part of its core Classes since the beginning.

As long as Java core classes will not use a locale resolver that allows
tuning the resolution algorithm used to load resource bundles, while
also maintaining the compatibility with the existing software that
assumes that Hebrew resources are loaded with the "iw" language code,
Sun will not change this code.

In IBM ICU4J, there is such an extended resolver, but Sun takes a
long time to approve such proposals, and have it first accepted,
documented, balloted and voted in its JCP program. Of course
Java already includes some parts of ICU, but other things in
ICU4J are difficult now to integrate in Java, simply because IBM
forgot to modularize ICU so that it can be integrated slowly.
Accepting ICU4J as part of the core is a big decision,
because ICU4J is quite large, and there are certainly developers
for Java that would not accept to have 1 additional MB of data and
classes loaded in each JVM (particularly because the integration
of ICU would affect a lot of core classes for the Java2 platform
now also used for small devices).

For example, it is impossible to integrate the ICU's Normalizer
class in Java without also importing the UChar class and all its
related services for UString, such as transliterators, and
advanced features such as the UCA tailoring rules run-time
compiler. Some ICU open-sourcers, as well as its users seem
to think now that the modularization of ICU is an important but
complex project.

-- 
Philippe.
Spams non tolérés: tout message non sollicité sera
rapporté à vos fournisseurs de services Internet.

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-12 Thread Philippe Verdy

On Saturday, July 12, 2003 9:59 PM, Anto'nio Martins-Tuva'lkin [EMAIL PROTECTED] 
wrote:

 On 2003.07.10, 20:34, John Cowan [EMAIL PROTECTED] wrote:
 
  IIRC, Portuguese traditional typography also avoids the fi-ligature,
  even though the language has no dotless-i.
 
 Just browsed some old book with that in mind and I cannot really
 corroborate. I've even seen some other more exotic ligatures, such as
 st and ct.
 
 Maybe there was such a recommendation in some Portuguese type-setting
 manual, but its result doesn't show...

In French typography, we also find special ligatures for the French
(and Roman Latin) word "et" (meaning "and"), using old alternate forms for
the lowercase letter e, looking mostly like a Greek epsilon (or the Latin
Small Open E, still used in Tamazigh as a letter distinct from the
standard Latin Small E).

The resulting ligature glyph is very close to the ASCII ampersand
character, and I just wonder if the ampersand is not a variation of this
French or Latin ligature, which belongs to the same typographic
traditions as the "s, t", "c, t" and "long-s, t" ligatures (and
probably the "long-s, s" ligature too in German's sharp-s).

In French text, using the & character to replace the word "et" would
seem ugly (or lazy), even today, when it looks like a technical symbol
imported from English or used in trademarks (such as the new
France Telecom Orange logo, which clearly uses the common
association of this character with the Internet); it is called
"esperluète", "éperluète", or commonly "et commercial".

Conversely, the use of the "et" ligature (which really
represents the French word "et" with its two letters) is quite
common even in recent books and publications, and it looks
pretty good typographically, notably in its titlecase version
at the beginning of sentences.

There are many examples in various languages where what was a
typographic ligature of two letters came to be used as a separate
letter or character in another language... Now that computers can
generate these ligatures more easily, I think there is a renewal
of their use and creation, which will probably mean more
ligatures converted to plain letters in written languages in the future.

-- 
Philippe.
Spam not tolerated: any unsolicited message will be
reported to your Internet service providers.

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-12 Thread Jim Allan

Philippe Verdy posted:

 In French typography, we also find a special ligature for the French
 (and Roman Latin) word "et" (meaning "and"), using old alternate forms of
 the lowercase letter e that look much like a Greek epsilon (or the Latin
 Small Open E, still used in Tamazight as a letter distinct from the
 standard Latin Small E).

See http://www.adobe.com/type/topics/theampersand.html for a short 
history of the ampersand and some of its variations in modern computer 
fonts.

What this doesn't indicate is that sometimes in medieval text the 
ampersand ligature is used to spell _et_ as part of a longer word. So 
perhaps it should be considered a letter with alphabetic properties?

The forms you describe seem like some of those shown in my link, and all 
but the two earliest would be recognized by English readers as 
acceptable modern ampersand forms.

Check common fonts like Trebuchet MS, Berkeley Book, Goudy Sans, Korinna 
 and Univers for recognizable _Et_ ampersands.

In common proofreading practice in English, at least in my experience, 
the ampersand is often pronounced as et.

 By contrast, the use of the "et" ligature (which really
 represents the French word "et" with its two letters) is quite
 common even in recent books and publications, and it looks
 pretty good typographically, notably in its titlecase version
 at the beginning of sentences.

Possibly a capital ampersand is needed?

Jim Allan

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-12 Thread Patrick Andries


- Original Message - 
From: Jim Allan [EMAIL PROTECTED]


 See http://www.adobe.com/type/topics/theampersand.html for a short
 history of the ampersand and some of its variations in modern computer
 fonts.

A whole article (17 pages) about the ampersand ligature in French (and other
languages):

http://www.gutenberg.eu.org/pub/GUTenberg/publicationsPDF/22-blanchard.pdf

RE: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Kent Karlsson


 Note also: the Soft_Dotted property was created and considered
 specially for Turkish and Azeri.

Adding to the long, and unfortunately getting longer, list of misleading
statements from Philippe!  No, the reason for the Soft_Dotted property
was/is to mark which characters (regardless of language) don't display
their intrinsic dot-above subglyph(s) when (another) combining character
above is applied to them (and to keep the dot(s) in that case, a
combining dot above or a combining diaeresis, as appropriate, must be
used explicitly).

 In this language context the ASCII i is always rendered with a dot,
 kept also for uppercases.

I hope you don't mean to use a dotted glyph for U+0069!

B.t.w.  It is perfectly legal to use a ligature (in the TECHNICAL sense,
perhaps not the typographic sense) for <f, i> also for Turkish and
related languages, especially if the f and i would otherwise overlap.
The point is that <f, i> and <f, dotless i> must be clearly
distinguishable for these languages, and that may mean that one has to
use a TECHNICAL ligature for <f, i> having a glyph where the dot on the
i is clearly visible (the horizontal bar of the f and the top serif of
the i may still merge). That may be done by whatever means is
better-looking for that particular font, e.g. moving the loop of the f
to the left, right, or up. (Using ZWNJ should not do that, if correctly
implemented, but can instead, mistakenly, result in overlapping f and
dot-of-i glyphs, since not even a technical ligature, IIUC (correct me
if I'm wrong), would be allowed...)

/kent k

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Philippe Verdy

On Friday, July 11, 2003 1:12 PM, Kent Karlsson [EMAIL PROTECTED] wrote:

  Note also: the Soft_Dotted property was created and considered
  specially for Turkish and Azeri.
 
 Adding to the long, and unfortunately getting longer, list of
 misleading statements from Philippe!  No, the reason for the
 Soft_Dotted property was/is to mark which characters (regardless of
 language) don't display their intrinsic dot-above subglyph(s)
 when (another) combining character above is applied to them (and
 to keep the dot(s) in that case, a combining dot above or a
 combining diaeresis, as appropriate, must be used explicitly).

I don't know how I can say things, with my limited English, without
always being accused of writing misleading statements.

Correct things if you think my words could be misinterpreted,
but please don't make a show of it. I don't know
how non-native English writers can participate here if every difference
of interpretation caused by a possibly inappropriate English
term is answered with a flame. This is really frustrating...

The important words in my sentence are "considered specially",
where "specially" does not imply "only". It's just that Turkish and
Azeri are already given special treatment in Unicode, which already
includes language exceptions in its technical algorithms (notably
for character foldings).

And according to this treatment, the U+0069 character is already
intended to have a semantic value of a dotted i and not a dotless
i in languages where this creates a semantic difference, so the
question of the Soft_Dotted property is more glyphic than purely
semantic, and it has a semantic behavior (at the abstract text
level where Unicode is supposed to standardize things) mostly in
case folding operations where the actual encoding of the converted
abstract text is important.

The rest of the description of the Soft_Dotted property is mostly a
recommendation for authors of fonts and text renderers, so that
they *preserve this semantic difference* in the rendered text
between the abstract letters dotted i and dotless i... And this does
not affect the encoding of the abstract text or any algorithmic
transformation of the encoded abstract text.

By saying *preserve this semantic difference*, I do not imply that
U+0069 must/should have a dot above: that remains a font design
problem, out of scope for Unicode. There are certainly many ways
to preserve the semantic difference in the rendered text when this
is really appropriate (for example in Turkish and Azeri, or with a
distinct and emphasized rendering of the Turkish dot, including
in possible ligatures with other letters).

FLAME-OFF
And please, do not flame me if this message contains new
terms that also create confusion. I reread as best I can,
and there are certainly better ways to say the same thing
in English without these unintentionally confusing interpretations,
and I am sorry in advance if such confusion still persists.

Accept the fact that I'm not a Unicode member, that Unicode
is only one of my interests, and that I have a lot of other
terminologies to work with.

If you can't accept that approximate English may
be used by participants here, and refuse to understand the
real intent of users when they write here, then have this
group moderated, but don't say it is open to discussion
from anybody using Unicode.

For normative aspects, with all exact terms, Unicode has its
web site, its publications, its data files, its working draft
documents, its technical committees, its permanent members,
its chairmen, and even bug and comment report forms to
interact with users at the normative level.
And I am sure that permanent Unicode members do not even
need this newsgroup to exchange their work on normative
documents, which are sent directly to the working committee
bureaus, or via private email, phone calls, snail mail, or
their own web sites.
Please don't expect the same level of linguistic quality here.

Also don't complain if my messages are long: the constant
criticism of what I am supposed to imply gives me no
other choice than to always explain what I mean, and this is
particularly lengthy, and really boring in a newsgroup.
/FLAME-OFF

Thanks for your patience.

-- Philippe.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Peter Kirk

On 11/07/2003 05:56, Philippe Verdy wrote:

Note also: the Soft_Dotted property was created and considered
specially for Turkish and Azeri.
 

Whatever it was that was specially created or adjusted for Turkish and 
Azeri, was it specifically restricted to these two languages? These are 
I think the only relatively major languages which use the special dotted 
and dotless i case mappings. But they are also used, at least in a small 
way, for minority languages of Turkey and Azerbaijan. (Use of these 
minority languages in Turkey is illegal, but that's another matter.) 
They were used in the 1930's for many Central Asian languages, and were 
at least proposed in the 1990's for newly introduced Latin alphabets. So 
I hope that what is fixed by Unicode is the name not of two languages 
but of an extensible family of scripts.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Philippe Verdy

On Friday, July 11, 2003 3:50 PM, Peter Kirk [EMAIL PROTECTED] wrote:
 So I hope that what is fixed by Unicode is the name not
 of two languages but of an extensible family of scripts.

I think you speak about family of languages?

Good luck with the ISO language codes, which do not even
define them, and contain many duplicate codes even in
the Alpha-2 space (he/iw, in/id), or imprecise codes
sometimes matching very imprecise families of languages
that overlap other language codes...

Until it is demonstrated that a language needs such a fix
in the Unicode support tables, it's best to just say that these
fixes are needed for some recognized language codes,
that applications are allowed to add their own fixes or
language tailorings, and that the existing language
tailorings in the Unicode databases are just non-normative
samples.

-- Philippe.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Peter Kirk

On 11/07/2003 08:51, Philippe Verdy wrote:

On Friday, July 11, 2003 3:50 PM, Peter Kirk [EMAIL PROTECTED] wrote:
 

So I hope that what is fixed by Unicode is the name not
of two languages but of an extensible family of scripts.
   

I think you speak about family of languages?

Not really. A set of languages, but they are not all related in any way, 
and many of them have more than one script or alphabet so this is not 
really a property of the languages. Perhaps set of alphabets would be 
a better way to put it.

Good luck with the ISO language codes, which do not even
define them, and contain many duplicate codes even in
the Alpha-2 space (he/iw, in/id), or imprecise codes
sometimes matching very imprecise families of languages
that overlap other language codes...
Until it is demonstrated that a language needs such a fix
in the Unicode support tables, ...

If necessary I can collect some data to demonstrate this, at least for 
some languages.

... it's best to just say that these
fixes are needed for some recognized language codes and
that applications are allowed to add their own fixes or
language tailorings, and that the existing language
tailorings in Unicode databases are just non-normative
samples.
-- Philippe.



 

Agreed. But does Unicode actually treat them as non-normative samples?

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Philippe Verdy

On Friday, July 11, 2003 6:43 PM, Peter Kirk [EMAIL PROTECTED] wrote:

 Agreed. But does Unicode actually treat them as non-normative samples?

Not clear here: the reference documents say that these tables are
normative for applications that want to implement a conforming
case folding. But UTR#30 (character foldings) still contains many
areas marked as "to be done", so it is not clear that all folding issues
have been solved. It seems reasonable, however, that the
non-language-specific elements in the CaseFolding table are normative,
as they are computed from the UCD...

I see this comment:
[quote]
# The entries in this file are in the following machine-readable format:
#
# <code>; <status>; <mapping>; # <name>
#
# The status field is:
# C: common case folding, common mappings shared by both simple and full mappings.
# F: full case folding, mappings that cause strings to grow in length. Multiple
#    characters are separated by spaces.
# S: simple case folding, mappings to single characters where different from F.
# T: special case for uppercase I and dotted uppercase I
#    - For non-Turkic languages, this mapping is normally not used.
#    - For Turkic languages (tr, az), this mapping can be used instead of the normal
#      mapping for these characters.
#      Note that the Turkic mappings do not maintain canonical equivalence without
#      additional processing.
#      See the discussions of case mapping in the Unicode Standard for more
#      information.
#
# Usage:
#  A. To do a simple case folding, use the mappings with status C + S.
#  B. To do a full case folding, use the mappings with status C + F.
#
#    The mappings with status T can be used or omitted depending on the desired
#    case-folding behavior. (The default option is to exclude them.)
#
[/quote]

Simple case folding (C+S) is not marked as "to be done" in UTR#30, but the special 
mappings with status T are off by default (so they depend on a specific tailoring, a 
non-normative behavior if I interpret it correctly, as applications are free to use or 
not use them, under unspecified conditions, i.e. here the "desired behavior").
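
The usage notes quoted above can be illustrated with a minimal Python sketch. It assumes that Python's built-in str.casefold() applies the default full case folding (statuses C + F, with T excluded). The helper turkic_casefold and its two-entry table are hypothetical names introduced here; the table holds what I understand to be the two status T mappings for uppercase I and dotted uppercase I.

```python
# Hypothetical table of the status T mappings, applied before the default folding.
TURKIC_T = str.maketrans({
    "\u0049": "\u0131",  # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
    "\u0130": "\u0069",  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
})


def default_casefold(text: str) -> str:
    """Default folding (T mappings excluded), as provided by str.casefold()."""
    return text.casefold()


def turkic_casefold(text: str) -> str:
    """Folding with the T mappings switched on, for text known to be tr or az."""
    return text.translate(TURKIC_T).casefold()


if __name__ == "__main__":
    word = "DI\u015e"  # Turkish "DIŞ"
    print(default_casefold(word))  # folds I to dotted i
    print(turkic_casefold(word))   # folds I to dotless i
```

The choice between the two functions has to come from outside the text (a language tag, for instance), which is the sense in which the T mappings are a tailoring.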

This concerns many more characters than just those used by Turkish/Azeri, and there is some 
overlap with the informative and unfinished UTR#30 reference:

(1) Simple mappings (are they normative?):

1F88; S; 1F80; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
1F89; S; 1F81; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
1F8A; S; 1F82; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F8B; S; 1F83; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F8C; S; 1F84; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F8D; S; 1F85; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F8E; S; 1F86; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F8F; S; 1F87; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI

1F98; S; 1F90; # GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI
1F99; S; 1F91; # GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI
1F9A; S; 1F92; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F9B; S; 1F93; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F9C; S; 1F94; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F9D; S; 1F95; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F9E; S; 1F96; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F9F; S; 1F97; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI

1FA8; S; 1FA0; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI
1FA9; S; 1FA1; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI
1FAA; S; 1FA2; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1FAB; S; 1FA3; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1FAC; S; 1FA4; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1FAD; S; 1FA5; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1FAE; S; 1FA6; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1FAF; S; 1FA7; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI

1FBC; S; 1FB3; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FCC; S; 1FC3; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FFC; S; 1FF3; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
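
Entries in this "code; status; mapping; # name" format are straightforward to read mechanically. The following Python sketch is only an illustration: the names Folding, parse_line and build_tables are hypothetical, and the sample lines are copied from this message (including a truncated one, which the parser skips rather than guesses at).

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Folding(NamedTuple):
    code: str      # the source character
    status: str    # C, F, S or T
    mapping: str   # the folded string (one or more characters)
    name: str      # the character name from the trailing comment


def parse_line(line: str) -> Optional[Folding]:
    """Parse one entry; return None for comments, blanks and incomplete lines."""
    data, _, comment = line.partition("#")
    fields = [field.strip() for field in data.split(";")]
    if len(fields) < 3 or not all(fields[:3]):
        return None
    code = chr(int(fields[0], 16))
    mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
    return Folding(code, fields[1], mapping, comment.strip())


def build_tables(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (simple, full) tables: simple uses C + S, full uses C + F."""
    simple: Dict[str, str] = {}
    full: Dict[str, str] = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.status in ("C", "S"):
            simple[entry.code] = entry.mapping
        if entry.status in ("C", "F"):
            full[entry.code] = entry.mapping
    return simple, full


SAMPLE = [
    "1FBC; S; 1FB3; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI",
    "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
    "0587; F; 0565 0582; # ARMENIAN SMALL LIGATURE ECH YIWN",
    "1E97;",  # truncated entry: ignored
]
SIMPLE, FULL = build_tables(SAMPLE)
```

Status T entries are deliberately left out of both tables here, matching the default described in the file header.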

(2) Full mappings (clearly optional):

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0149; F; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
01F0; F; 006A 030C; # LATIN SMALL LETTER J WITH CARON

0390; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
03B0; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS

0587; F; 0565 0582; # ARMENIAN SMALL LIGATURE ECH YIWN

1E96; F; 0068 0331; # LATIN SMALL LETTER H WITH LINE BELOW
1E97;

ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-11 Thread Doug Ewell

Philippe Verdy verdy_p at wanadoo dot fr wrote:

 Good luck with the ISO language codes, which do not even
 define them, and contain many duplicate codes even in
 the Alpha-2 space (he/iw, in/id), or imprecise codes
 sometimes matching very imprecise families of languages
 that overlap other language codes...

The codes "iw" for Hebrew and "in" for Indonesian were deprecated
FOURTEEN YEARS AGO.  It is not accurate or fair to refer to them as
"duplicates" of "he" and "id".  The Registration Authority deprecates
such codes, rather than deleting them, for backward compatibility with
any data that might contain the old codes.
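
For software that still meets the old codes in existing data, one common approach is to map them to their replacements on input. This is a minimal Python sketch under that assumption; the table name and function name are hypothetical, and the table contains only the two pairs mentioned in this thread.

```python
# Deprecated code -> current code, limited to the pairs discussed above.
DEPRECATED_ISO639_1 = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
}


def canonical_language_code(code: str) -> str:
    """Return the current code for a deprecated one, or the code unchanged."""
    code = code.lower()
    return DEPRECATED_ISO639_1.get(code, code)
```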

The part about codes for language families overlapping other codes for
specific languages is, regrettably, true.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Philippe Verdy

On Thursday, July 10, 2003 12:08 PM, Peter Kirk [EMAIL PROTECTED] wrote:

 On 1st July Philippe Verdy wrote:
 
  If fonts still want to display dots on these characters, that's a
  rendering problem: there already exists a lot of fonts used for
  languages other than Turkish and Azeri, which do not display any
  dot on a lowercase ASCII i or j (dotted), and display a dot on their
  uppercase ASCII versions (normally not dotted with classic fonts)...
  
  The absence or presence of these dots is then seen as decorative
  even if these fonts are not suitable for Turkish and Azeri, but
  this is clearly not an encoding problem in the Unicode encoded text,
  and not a problem either for case conversions.
  
 
 Turkish and Azeri do not use the ij ligature. The sequences i - j and
 dotless i - j do occur (rarely, as j is a rare letter in both
 languages) but are treated as separate letters.

I know, and the quoted paragraph was not about the ij ligature
but about the separate dotted/dotless i/I letters, and about
decorative fonts in which the lowercase ASCII (dotted) i codepoint
uses a dotless glyph, or the uppercase ASCII (dotless) I codepoint
uses a dotted glyph (some fonts ligate the dot with decorative
curves). These fonts are indeed not suitable for Turkish and
Azeri.

 In Turkish and Azeri the sequences f - i and f - dotless i both occur,
 and are fairly frequent. So it is inappropriate in these languages to
 use fi ligatures in which the dot on the i is lost or invisible, at
 least where the second character is a dotted i. Has any thought been
 given to this issue? Is it possible to block such ligation on a
 language-dependent basis?

Isn't there a Grapheme Disjoiner format control character to force the
absence of a ligature like fi, i.e. f, GDJ, i?

 Also it is certainly possible that in dictionaries etc in these
 languages stress might be marked by an accent on the vowel - as
 certainly in the older Cyrillic Azeri just as in Bulgarian as just
 posted. In this case the dot should not be removed from the dotted i
 when the stress mark is added, so that the distinction from dotless i
 is not lost. Has that issue been addressed? (In my Latin script Azeri
 dictionary stress is marked by a spacing grave accent before the
 vowel, but this may have been done precisely to work around this
 problem.) 

This is part of the proposal for review: an explicit combining dot-above
diacritic can be inserted between the normal (soft-dotted) base letter
and the above diacritic (with combining class 230):
latin-small-i, dot-above, acute-accent
cyrillic-small-je, dot-above, grave-accent
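
To make the two sequences concrete, here is a small Python sketch using the standard unicodedata module. The code points chosen for "dot-above", "acute-accent" and "grave-accent" (U+0307, U+0301, U+0300) and for Cyrillic je (U+0458) are my reading of the informal names above, and describe is a hypothetical helper.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
GRAVE = "\u0300"      # COMBINING GRAVE ACCENT

# The two sequences from the message, with the explicit dot kept before the accent.
LATIN_I_STRESSED = "i" + DOT_ABOVE + ACUTE
CYRILLIC_JE_STRESSED = "\u0458" + DOT_ABOVE + GRAVE


def describe(sequence: str):
    """List (code point, name, canonical combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in sequence
    ]


if __name__ == "__main__":
    for row in describe(LATIN_I_STRESSED) + describe(CYRILLIC_JE_STRESSED):
        print(row)
```

All three marks report combining class 230, so their relative order in the encoded text is significant and is preserved by normalization.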

-- Philippe.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Peter Kirk

On 10/07/2003 08:21, Philippe Verdy wrote:

In Turkish and Azeri the sequences f - i and f - dotless i both occur,
and are fairly frequent. So it is inappropriate in these languages to
use fi ligatures in which the dot on the i is lost or invisible, at
least where the second character is a dotted i. Has any thought been
given to this issue? Is it possible to block such ligation on a
language-dependent basis?
   

Isn't there a Grapheme Disjoiner format control character to force the
absence of a ligature like fi, i.e. f, GDJ, i?

Maybe, but it is hardly realistic to expect all existing Turkish and 
Azeri text to be recoded to insert a character in the middle of each f - 
i sequence.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Philippe Verdy

On Thursday, July 10, 2003 5:41 PM, Peter Kirk [EMAIL PROTECTED] wrote:

  Isn't there a Grapheme Disjoiner format control character to
  force the absence of a ligature like fi, i.e. f, GDJ, i?
  
 Maybe, but it is hardly realistic to expect all existing Turkish and
 Azeri text to be recoded to insert a character in the middle of each
 f - i sequence.

Note also: the Soft_Dotted property was created and considered
specially for Turkish and Azeri.

In this language context the ASCII i is always rendered with a dot,
kept also for uppercases.

The other solution would be to use f, i, dot-above: the forced dot-above
diacritic avoids the ligature, and the sequence is rendered by two glyphs
for f and i, dot-above, i.e. the glyph for f, and the force-dotted
glyph for i.

Its uppercase conversion causes no problem:

F, I, dot-above
= F + I, dot-above
= F + I-dot-above

The same holds with additional stress diacritics:

f, i, dot-above, acute-accent
= f + i, dot-above, acute-accent
F, I, dot-above, acute-accent
= F + I, dot-above, acute-accent
= F + I-dot-above, acute-accent
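
The uppercase steps above can be checked with a short Python sketch. It assumes that "dot-above" is U+0307 and "I-dot-above" is U+0130, and that the last step corresponds to NFC composition; upper_nfc is a hypothetical helper name.

```python
import unicodedata


def upper_nfc(text: str) -> str:
    """Uppercase the text, then compose it (NFC)."""
    return unicodedata.normalize("NFC", text.upper())


if __name__ == "__main__":
    for sample in ("fi\u0307", "fi\u0307\u0301"):
        result = upper_nfc(sample)
        print([f"U+{ord(ch):04X}" for ch in result])
```

Note that this uses the default (non-Turkic) uppercase mapping; the explicit dot-above is what carries the dot through the conversion.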

-- Philippe.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Peter Kirk

On 10/07/2003 09:34, Stefan Persson wrote:

Peter Kirk wrote:

 Maybe, but it is hardly realistic to expect all existing Turkish and 
Azeri text to be recoded to insert a character in the middle of each f 
- i sequence.

Isn't most Turkish and Azeri text coded as ISO-8859-9 and similar 
code pages?  In that case, it would be enough to add the proper 
disjoiners to the proper Unicode conversion tables.

Stefan


There is no existing code page covering Azeri Latin, so everything is in 
Unicode or in one of a huge variety of custom solutions. See 
http://www.azer.com/aiweb/categories/magazine/81_folder/81_articles/81_standardfonts.html, 
and the article The Land of Azeri Fonts: It's a Jungle Out There in 
the same magazine issue, unfortunately not online, which summarises 20 
or so custom encodings all in current use.

Anyway, I understood from the recent discussion of Hebrew that it is 
Unicode policy not to do anything which could theoretically invalidate 
existing text even if it could be proved that no such text existed.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Stefan Persson

Peter Kirk wrote:

 Maybe, but it is hardly realistic to expect all existing Turkish and 
Azeri text to be recoded to insert a character in the middle of each f - 
i sequence.

Isn't most Turkish and Azeri text coded as ISO-8859-9 and similar code 
pages?  In that case, it would be enough to add the proper disjoiners to 
the proper Unicode conversion tables.

Stefan

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Philippe Verdy

On Thursday, July 10, 2003 6:42 PM, Peter Kirk [EMAIL PROTECTED] wrote:

 Anyway, I understood from the recent discussion of Hebrew that it is
 Unicode policy not to do anything which could theoretically invalidate
 existing text even if it could be proved that no such text existed.

How does saying that a Grapheme Disjoiner can be used in Turkish, to prevent the f 
from swallowing the dot above a following lowercase i, invalidate existing text?

This does not change anything: existing texts can still produce ligatures in a 
renderer, unless it is explicitly told not to with a Grapheme Disjoiner, or the 
renderer is specially tuned to support the Turkish/Azeri languages. Existing texts do 
not need to be reencoded if they are already correctly labelled with their language.

In the absence of such a language specifier, nothing forbids a renderer from choosing a fi 
ligature if one is available, unless the renderer is made conforming by correctly 
interpreting the Grapheme Disjoiner to mean "break the grapheme cluster here": it 
displays the previous character(s), then renders the Grapheme Disjoiner itself 
as a non-spacing empty glyph, then the rest of the string...

I'm still convinced that a ligature is still possible for a Turkish f, dotted-i 
sequence, using f, i, dot-above. The ligature would apply to the middle bar of the 
f joined with the top serif of the i, but the top-right loop of the f would simply 
be a small horizontal bar, disjoined from the dot still present on the i.

The same ligature could be used for the encoded sequence f, dotless-i, so an actual 
font would render the glyphs for f, i, dot-above as a base ligature glyph for f, 
dotless-i (with a top horizontal bar for the f part), and add separately the 
dot-above glyph kerned into the existing f-dotless-i ligature.

To force disable this last ligature, we would use the encoded sequence f, GDJ, 
dot-less-i

According to Unicode, the sequence i, dot-above has always been valid, even though it 
apparently has the same dotted glyph in all languages. It differs only in that 
the explicit dot-above suppresses the soft dot of the preceding i (its Soft_Dotted 
behavior), making it dotless, and the dot is then drawn as a forced diacritic.

So the encoded sequence i, dot-above is now made equivalent (for rendering 
purposes) to dotless-i, dot-above (although they are not canonically equivalent per 
UAX#15: NFC/D) and not equivalent to an isolated i (not followed by above 
diacritics)...
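
The canonical (non-)equivalence mentioned here can be tested directly. This is a minimal Python sketch using the standard unicodedata module; canonically_equivalent is a hypothetical helper that compares NFD forms, and it says nothing about how a renderer draws the sequences.

```python
import unicodedata

DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(canonically_equivalent("i" + DOT_ABOVE, DOTLESS_I + DOT_ABOVE))
    print(canonically_equivalent("\u0130", "I" + DOT_ABOVE))
```

The second comparison is included for contrast: the capital dotted I does decompose canonically to I plus dot-above, whereas the lowercase sequences have no such relationship.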

-- Philippe.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Kenneth Whistler

Peter Kirk asked:

  In Turkish and Azeri the sequences f - i and f - dotless i both occur,
  and are fairly frequent. So it is inappropriate in these languages to
  use fi ligatures in which the dot on the i is lost or invisible, at
  least where the second character is a dotted i. Has any thought been
  given to this issue? Is it possible to block such ligation on a
  language-dependent basis?
 

and Philippe Verdy responded with another question:

 Isn't there a Grapheme Disjoiner format control character to force the
 absence of a ligature like fi, i.e. f, GDJ, i?

The answer to Philippe's rejoinder question is no, there is not
a Grapheme Disjoiner format control character.

What Philippe has in mind, however, is covered in the standard
by the interaction of the joiner and non-joiner characters
with ligature control:

U+200C ZERO WIDTH NON-JOINER is intended to break both cursive
connections and ligatures in rendering.

ZWNJ requests that glyphs in the lowest available category
(for the given font) be used.

  -- Unicode 4.0, Section 15.2, Layout Controls

The categories referred to, from lowest to highest, are:

1. unconnected
2. cursively connected
3. ligated

As Peter pointed out, however, it is neither expected nor reasonable
to have to go back through and drop in ZWNJ's at every relevant
location in existing Turkish or Azeri text, simply to prevent
fi ligation. Such use of ZWNJ is intended to be exceptional,
to deal with special cases.
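
For the exceptional case, the mechanics are simple. The following Python sketch is an illustration only, not a recommendation to bulk-process Turkish or Azeri text; break_fi is a hypothetical helper that inserts U+200C ZERO WIDTH NON-JOINER between an f and an immediately following dotted i.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Match an "f" that is immediately followed by a dotted "i".
_F_BEFORE_I = re.compile(r"f(?=i)")


def break_fi(text: str) -> str:
    """Insert ZWNJ between f and i so that a renderer should not ligate them."""
    return _F_BEFORE_I.sub("f" + ZWNJ, text)


if __name__ == "__main__":
    marked = break_fi("fikir")
    print([f"U+{ord(ch):04X}" for ch in marked])
```

Applying the function twice leaves the text unchanged, since after the first pass the f is followed by ZWNJ rather than by i.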

The general solution depends on the use of fonts (or more
generally, renderers) which block such ligation across the
board. It is my understanding that modern font technologies
allow the choice of ligation to essentially be a style selection
for the font. How well various applications take advantage
of that and make the choice easily available to end users may
still be an open issue, but the fundamental pieces to do this
correctly are available.

--Ken

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Philippe Verdy

On Thursday, July 10, 2003 8:37 PM, Kenneth Whistler [EMAIL PROTECTED] wrote:

 Peter Kirk asked:
 
   In Turkish and Azeri the sequences f - i and f - dotless i both
   occur, and are fairly frequent. So it is inappropriate in these
   languages to use fi ligatures in which the dot on the i is lost
   or invisible, at least where the second character is a dotted i.
   Has any thought been given to this issue? Is it possible to block
   such ligation on a language-dependent basis?
  
 
 and Philippe Verdy responded with another question:
 
  Isn't there a Grapheme Disjoiner format control character to
  force the absence of a ligature like fi, i.e. f, GDJ, i?
 
 The answer to Philippe's rejoinder question is no, there is not
 a Grapheme Disjoiner format control character.

I did not refer to a specific Unicode character. I knew that there
is one already dedicated to this, but I did not want to comment on
this choice.

There's no contradiction. The Grapheme Disjoiner, for you, is
ZWNJ. OK.

And I did not want to promote any change in any validly
encoded legacy text, only to suggest ways to solve the
apparent rendering problem in Turkish, when the f, i
encoded character pair may be badly rendered. For the actual
rendering, selecting a fi ligature is not appropriate for
Turkish, and in fact the canonically decomposed character
has no linguistic ambiguity in Turkish.

So whatever the fi encoded codepoint designates, it is not
the fi ligature glyph but really two characters, whose ligation
may still be performed according to language context.

A font that would automatically select a fi ligature to represent
a sequence of f, i codepoints, from the fact that the fi
codepoint is canonically equivalent, is probably defective and not
conforming. Such selection of a ligature must be put under the
control of the renderer with additional markup, which can in fact
select among three ligatures in Turkish: the fi ligature glyph
where the f is ligated with the dot above the i (the normal ligature for
languages other than Turkish/Azeri), and the f-dotted-i and
f-dotless-i ligatures for Turkish/Azeri.

Markup is necessary to select the appropriate glyph, or it
can be selected by using the Grapheme Disjoiner (ZWNJ)
or the Grapheme Joiner (ZWJ) in addition to the use of
an i or dotless-i codepoint, possibly followed by the
dot-above diacritic. All this enrichment of the text is assumed
to be under the control of the markup added to the original
text, which does not need to specify whether ligatures should
or should not be used.

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread John Cowan

Philippe Verdy scripsit:

 Where does the fact of saying that a Grapheme Disjoiner can be used
 in Turkish to avoid that the f collapses the dot above a next lowercase i?

It is settled that ZWNJ is the correct character to break ligatures.
ZWJ means make a ligature if you can; if not, shape characters to
joining forms if you can; if not that either, do nothing.  ZWNJ means
break ligatures, if any, and shape characters to non-joining forms,
if possible.
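
A minimal sketch of what these two controls look like in encoded text, assuming Python's standard unicodedata module; the helper names describe and strip_join_controls are illustrative only. Whether a renderer honours the request depends on the font and the rendering engine, which this sketch does not model.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER: request a ligature or joining forms
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: request no ligature / non-joining forms

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

def strip_join_controls(text):
    """Recover the underlying letters, e.g. before searching or comparing."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")

# "f", "i" with a request that they not be ligated.
no_ligature = "f" + ZWNJ + "i"
# "f", "i" with a request that they be ligated if the font can do so.
want_ligature = "f" + ZWJ + "i"

for code_point, name in describe(no_ligature):
    print(code_point, name)
```

Both strings still contain the plain letters f and i; the control only sits between them, so stripping it gives back the original two-letter sequence.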

 I'm still convinced that a ligature is still possible for a turkish f,
 dotted-i sequence, using f, i, dot-above. The ligature would apply
 to the middle bar of the f joined with the top serif of the i,
 but the top-right loop of the f would simply be a small horizontal bar,
 disjoined from the dot still present on the i.

Yes, theoretically.  Whether that is good Turkish typography is a different
question, which AFAIK prefers simply an f-glyph followed by an i-glyph with
no ligaturing.

IIRC, Portuguese traditional typography also avoids the fi-ligature, even though
the language has no dotless-i.

 The same ligature could be used for the encoded sequence f, dotless-i, 

I doubt that any font has a ligature for this combination at all.

 So the encoded sequence i, dot-above is now made equivalent
 (for rendering purpose) to dotless-i, dot-above (despite they are
 not canonically equivalent per UAX#15: NFC/D) and not equivalent
 to an isolated i (not followed above diacritics)...

There is no guarantee that the native i dot looks the same as the dot above
in a given font (it may have different vertical kerning or even a different
shape), nor is there any guarantee that the i with its dot removed looks
the same as the dotless-i.
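
The lack of canonical equivalence mentioned in the quoted text can be checked with a short sketch, assuming Python's standard unicodedata module; the helper name canonically_equivalent is illustrative only. It says nothing about how a font draws the dots, which is the point made above.

```python
import unicodedata

DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

i_with_dot = "i" + DOT_ABOVE
dotless_with_dot = DOTLESS_I + DOT_ABOVE

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Neither pair below is canonically equivalent.
print(canonically_equivalent(i_with_dot, dotless_with_dot))
print(canonically_equivalent(i_with_dot, "i"))

# By contrast, the capital letter U+0130 does decompose canonically
# into "I" followed by COMBINING DOT ABOVE.
print(canonically_equivalent("\u0130", "I" + DOT_ABOVE))
```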

-- 
John Cowan  www.ccil.org/~cowan  www.reutershealth.com  [EMAIL PROTECTED]
'My young friend, if you do not now, immediately and instantly, pull
as hard as ever you can, it is my opinion that your acquaintance in the
large-pattern leather ulster' (and by this he meant the Crocodile) 'will
jerk you into yonder limpid stream before you can say Jack Robinson.'
--the Bi-Coloured-Python-Rock-Snake

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Peter Kirk

On 10/07/2003 11:37, Kenneth Whistler wrote:

As Peter pointed out, however, it is neither expected nor reasonable
to have to go back through and drop in ZWNJ's at every relevant
location in existing Turkish or Azeri text, simply to prevent
fi ligation. Such use of ZWNJ is intended to be exceptional,
to deal with special cases.
The general solutions depend either on use of fonts (or more
generally, renderers) which block such ligation across the
board. It is my understanding that modern font technologies
allow the choice of ligation to essentially be a style selection
for the font. How well various applications take advantage
of that and make the choice available easily to end users may
be an open issue still, but the fundamental pieces to do this
correctly are available.
 

Thank you, Ken. I think you get my point. I am not so interested in 
character level mechanisms for disabling the ligature as in higher level 
features. But I guess I am really thinking in terms of markup, so 
outside the domain of Unicode, which might disable ligation.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Laurentiu Iancu

See also
http://www.microsoft.com/typography/developers/opentype/detail.htm
which explains how ligatures can be turned off on a language-dependent basis.

Laurentiu


Peter Kirk asked:

 In Turkish and Azeri the sequences f - i and f - dotless i both occur,
 and are fairly frequent. So it is inappropriate in these languages to
 use fi ligatures in which the dot on the i is lost or invisible, at
 least where the second character is a dotted i. Has any thought been
 given to this issue? Is it possible to block such ligation on a
 language-dependent basis?

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread Kenneth Whistler


  and Philippe Verdy responded with another question:
  
   Isn't there a Grapheme Disjoiner format control character to
   force the absence of a ligature like fi, i.e. f, GDJ, i?
  
  The answer to Philippe's rejoinder question is no, there is not
  a Grapheme Disjoiner format control character.
 
 I did not refer to a specific unicode character, I knew that there
 is one already dedicated, but I did not want to comment about
 this choice.
 
 There's no contradiction. The Grapheme Disjoiner, for you is
 ZWNJ. OK.

<ad hominem>

Every so often, Philippe, it would be refreshing if, when someone
points out an error in your claims about the Unicode Standard,
you would simply acknowledge the error and discontinue
making the claim, instead of coming back trying to claim that
the error was just another way of being right.

</ad hominem>

There is a separate character, U+034F COMBINING GRAPHEME JOINER,
which is the grapheme joiner, abbreviation CGJ in the
standard. That character has nothing to do with ligation
control. There has also been debate, on several occasions,
within the UTC, regarding the advisability of encoding
a grapheme non-joiner, as a pair with the grapheme joiner.
But again, such a grapheme non-joiner -- which has *not* been
encoded, by the way -- would have nothing to do with ligation
control.
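
As a rough illustration of how distinct these characters are, the sketch below lists a few basic properties of each, assuming Python's standard unicodedata module; the helper name properties is illustrative only. It shows only that CGJ is a combining mark while ZWJ and ZWNJ are format controls; it does not model any rendering behaviour.

```python
import unicodedata

controls = {
    "CGJ": "\u034f",   # COMBINING GRAPHEME JOINER
    "ZWJ": "\u200d",   # ZERO WIDTH JOINER
    "ZWNJ": "\u200c",  # ZERO WIDTH NON-JOINER
}

def properties(ch):
    """Return (name, general category, canonical combining class) for ch."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

for abbreviation, ch in controls.items():
    print(abbreviation, f"U+{ord(ch):04X}", *properties(ch))
```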

So it is a disservice to the list, perpetuating confusion, to
invent the term Grapheme Disjoiner and use it in a series
of notes regarding ligation control, when the standard already
designates the ZWJ and the ZWNJ as the relevant controls
related to ligation control.

So it is not that for me the Grapheme Disjoiner is the ZWNJ;
rather, it is for the Unicode Standard that the ZWNJ is the
designated, standardized format control for ligation control
of the sort you are talking about. Please learn the terminology
and make correct use of it.

 A font that would automatically select a fi ligature to represent
 a sequence of f, i codepoints, from the fact that the fi
 codepoint is canonically equivalent

U+FB01 LATIN SMALL LIGATURE FI is not a *canonical* equivalent to
f, i; it is *compatibility* equivalent. That is an important
distinction.
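
The distinction can be seen directly in the character data. The following is a small sketch using Python's standard unicodedata module: the canonical normalization forms leave U+FB01 alone, and only the compatibility forms replace it with f, i.

```python
import unicodedata

FI_LIGATURE = "\ufb01"  # LATIN SMALL LIGATURE FI

# The decomposition mapping carries a <compat> tag, so it is not canonical.
print(unicodedata.decomposition(FI_LIGATURE))

forms = {
    form: unicodedata.normalize(form, FI_LIGATURE)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, result in forms.items():
    print(form, [f"U+{ord(ch):04X}" for ch in result])
```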

 is probably  defective and not
 conforming. 

Wrong. There is nothing nonconformant about fonts automatically
ligating f, i (or any other sequence). Such automatic
ligation may not always be appropriate or the desired result
for an end user, but that has nothing to do with the conformance
requirements of the standard.

 Such selection of ligature must be put under the
 
 
Wrong. must --> may

 control of the renderer with additional markup, which can in fact
 select among three ligatures in Turkish: the fi ligature glyph
 where the f is ligated with the dot above i (normal ligature for
 languages other than Turkish/Azeri), the f-dotted-i and
 f-dotless-i ligatures for Turkish/Azeri.

It is unclear that any such ligatures are required or desirable
for Turkish/Azeri, in any case.

 Markup is necessary to select the appropriate glyph, or this
  ^^^
  
Wrong. A higher-level protocol is needed, and that may involve
markup. But the Turkish requirements can equally well be
met by simply setting no ligature style settings for
the relevant fonts.

 can be selected by using the Grapheme Disjoiner (ZWNJ)
   
   
Wrong term. See above.

 or the Grapheme Joiner (ZWJ) in addition to the use of
 ^
 
Wrong term. See above.

 a i or dotless-i codepoint eventually followed by the
 i-above diacritic.

And in any case, it is inadvisable to be suggesting use of
ZWJ and ZWNJ in this way to solve the problem of assuring that
Turkish texts don't ligate inappropriately on rendering. 

 All this enrichment of text is assumed
 to be under the control of the markup added to the original
 text which does not need to specify whether ligatures should
 or should not be used.

This last clause I agree with. But the implication that
markup has to be added to Turkish text in order to get it
to render correctly regarding ligature usage is incorrect.
Adding markup to the text is adding to the original text
as surely as adding ZWNJ format controls would be. In any
case it is unnecessary, since alternatives exist which simply
specify suppression (or use) of ligatures stylistically in
the fonts.

--Ken

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-10 Thread James H. Cloos Jr.

 Peter == Peter Kirk [EMAIL PROTECTED] writes:

Peter Maybe, but it is hardly realistic to expect all existing
Peter Turkish and Azeri text to be recoded to insert a character in
Peter the middle of each f - i sequence.

But a lot of it already does do that.  In TeX Turkish uses f{}i to
block the (font's) ligation.  roff does something similar.  I'm
sure all of the other text-source publishing systems do as well.

Even the WYSI(NR)WYG must be doing something to accomplish that.

-JimC

 NR = Not Really

Re: Accented ij ligatures (and yery)

2003-07-03 Thread Anto'nio Martins-Tuva'lkin

On 2003.07.01, 15:09, Pim Blokland [EMAIL PROTECTED] wrote:

 Maybe it was a bad idea to include ĳ as a character in Unicode at all,
 but now it's there, there's no reason to ignore it when refining the
 rules, to deprecate it practically.

Food for thought: How would you compare U+0133 (ij digraph) with
U+044B (cyrillic y, yery)?

Consider that the latter also consists graphically of two separate
letters: U+044A (hard sign) and U+0456 (old i) -- though the first
looks rather like U+044C (soft sign). This is an obvious difference,
but everything else seems quite comparable. Except nobody in this list
is making a big fuss about how including U+044B in the standard was
such a bad idea... ;-)

--   .
António MARTINS-Tuválkin,   |  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 934 821 700 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |

RE: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-02 Thread Kent Karlsson

 Believe it or not, the IJ and ij digraphs *were* included for
 compatibility with an 8-bit legacy character set (ISO 6937).

6937 is a multibyte encoding (one or two bytes per character).
There are no combining characters at all in 6937, even though
there is a common misunderstanding that there are, since the
lead bytes are (almost) systematically assigned.

 Whether
 that automatically means they should have been assigned canonical
 instead of compatibility decompositions, I don't know.

I think in this case it is correct that the decomposition is a compatibility
one.  It could have been: none; like for the oe and ae ligatures.
This is in contrast to the MICRO SIGN which ideally should have had
a canonical decomposition; but Latin-1 characters got special treatment
(and ASCII characters have even more special treatment in this regard,
where some spacing accents are not decomposed at all).
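
The decomposition mappings being compared here can be listed with a short sketch, assuming Python's standard unicodedata module. An empty string means the character has no decomposition at all; a `<compat>` prefix marks a compatibility decomposition.

```python
import unicodedata

samples = {
    "ij ligature": "\u0133",   # LATIN SMALL LIGATURE IJ
    "oe ligature": "\u0153",   # LATIN SMALL LIGATURE OE
    "ae letter": "\u00e6",     # LATIN SMALL LETTER AE
    "micro sign": "\u00b5",    # MICRO SIGN (Latin-1)
    "acute accent": "\u00b4",  # ACUTE ACCENT (Latin-1 spacing accent)
    "grave accent": "\u0060",  # GRAVE ACCENT (ASCII spacing accent)
}

decompositions = {
    label: unicodedata.decomposition(ch) for label, ch in samples.items()
}

for label, mapping in decompositions.items():
    print(f"{label:13} {mapping or '(none)'}")
```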

/kent k

RE: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-02 Thread Kent Karlsson



 In either case, the Soft_Dotted property is probably overkill on
 the existing ij or IJ ligatures (which should have been better

There is no point in having a soft-dotted property for the capital
letter...

 named letters and not ligatures) for Dutch. Or is this update
 needed to document officially the expected rendering behavior for
 sequences ij,acute and ij,macron?

Yes. ij ligature, combining acute should give a dotless ij digraph
with an acute accent centred over it; ij ligature, combining double
acute should give a dotless ij digraph with an acute on top of each
dotless subletter glyph; I'm by now not sure which is the correct one,
but the first one can only be produced this way.  (And the others are
unrelated to the dotless-i and dotless-j, so keep these two out of the
pot.)

 The main interest of the Soft_Dotted property is not to describe the
 rendering for the character, 

Yes, it is.  I should know, the soft-dotted property was my suggestion
in the first place...  And please read the note accompanying the public
review issue.  Not all of the characters in my initial list were actually
given the property, however. This is what the current suggestion
tries to correct.  I know, there are Thai and Khmer letters where a glyph
appendage below is removed when there are other things below, like
a vowel or a subjoined consonant; and there is as yet no property for that...
(But those appendages don't have any similar combining character below
either.)

 but to document how case conversions
 (lowercase, uppercase, titlecase, folded) can be performed safely on

The soft-dotted property is not primarily defined for case mapping,
even though it is used there too.  Case mapping is documented in the UCD;
for non-same-always-1-1 cases, they are documented in SpecialCasing.txt.
There is no special rule for the ij/IJ combination (even for Dutch) there;
and it may be unlikely that there will be one.  It's easier to just use the
ij ligature characters (which do have the expected case mapping already)...
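
A small sketch of that difference, using only Python's built-in string methods (which apply the default, language-independent case mappings); the sample word is just an illustration. With the single ij character the titlecased form starts with the capital IJ character, while the two-letter sequence i, j only has its first letter capitalized.

```python
IJ_SMALL = "\u0133"    # LATIN SMALL LIGATURE IJ
IJ_CAPITAL = "\u0132"  # LATIN CAPITAL LIGATURE IJ

word_with_ligature = IJ_SMALL + "sselmeer"
word_with_two_letters = "ijsselmeer"

# Default titlecasing: the ligature character maps to its capital form,
# so both halves of the digraph end up capitalized.
titled_ligature = word_with_ligature.title()

# With separate letters only the "i" is capitalized, giving "Ijsselmeer",
# because no Dutch-specific rule is applied by default.
titled_two_letters = word_with_two_letters.title()

print(titled_ligature == IJ_CAPITAL + "sselmeer")
print(titled_two_letters)
```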

/kent k

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-02 Thread Doug Ewell

Kent Karlsson kentk at cs dot chalmers dot se wrote:

 Believe it or not, the IJ and ij digraphs *were* included for
 compatibility with an 8-bit legacy character set (ISO 6937).

 6937 is a multibyte encoding (one or two bytes per character).
 There are no combining characters at all in 6937, even though
 there is a common misunderstanding that there are, since the
 lead bytes are (almost) systematically assigned.

It's still an 8-bit character set.  Characters are defined in terms of
8-bit code units; some use one, others use two.  This is just like the
double-byte character sets used for CJK.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-01 Thread Philippe Verdy

On Tuesday, July 01, 2003 1:55 PM, Kent Karlsson [EMAIL PROTECTED] wrote:

  My feeling about the proposed Public Review document should
  exclude the ij ligature, waiting for the decision about the new
  dotless-ij ligature approved in the first rounds by UTC and
  waiting for approval by ISO JTC...
 
 There is no proposal to add any dotless ij ligature character.  Please
 read the pipeline documents more carefully before going off imagining
 a character not being proposed, and is unlikely to be seriously
 proposed.

Sorry, I should have written dotless-j in the last paragraph, for
the proposed character at U+0237 (LATIN SMALL LETTER DOTLESS J)

For me the ij ligature is mostly used for Dutch, and the few
applications where ij,acute and ij,macron are used should be
rendering them according to that language, where it is handled as a
single letter.

In all other cases, the ij ligature should be avoided, simply because
there are other better choices with i/dotless-i/I/dotted-I and
j/J/proposed-dotless-j, in combination with double diacritics
inserted between them to produce the desired effect.

In either case, the Soft_Dotted property is probably overkill on
the existing ij or IJ ligatures (which should have been better
named letters and not ligatures) for Dutch. Or is this update
needed to document officially the expected rendering behavior for
sequences ij,acute and ij,macron?

The main interest of the Soft_Dotted property is not to describe the
rendering for the character, but to document how case conversions
(lowercase, uppercase, titlecase, folded) can be performed safely on
the Unicode encoded string. I'd like to know exactly why it is needed
for Dutch, as such a ligature is not used in Turkish and Azeri written
with the Altaic Latin alphabet...

If fonts still want to display dots on these characters, that's a
rendering problem: there already exist a lot of fonts used for
languages other than Turkish and Azeri, which do not display any
dot on a lowercase ASCII i or j (dotted), and display a dot on their
uppercase ASCII versions (normally not dotted with classic fonts)...

The absence or presence of these dots is then seen as decorative
even if these fonts are not suitable for Turkish and Azeri, but this is
clearly not an encoding problem in the Unicode encoded text,
and not a problem either for case conversions.

The only reason that would justify adding a Soft_Dotted property
on ij would be that it is needed to allow the correct handling
of language-dependent case conversions.

-- Philippe.

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-01 Thread Pim Blokland

Michael Everson schreef:

 I think the answer is, regarding the soft dot property, please leave
 the ij ligature alone.

And I think not.
When putting accents on the ĳ (which does happen!), the dots must
go. Simple as that.
Maybe it was a bad idea to include ĳ as a character in Unicode at
all, but now it's there, there's no reason to ignore it when
refining the rules, to deprecate it practically.

Pim Blokland

Re: Accented ij ligatures

2003-07-01 Thread Stefan Persson

Pim Blokland wrote:

 When putting accents on the ĳ (which does happen!), the dots must
 go. Simple as that.

Where should the accent be placed in that case?  Should the accent be 
centered over ij?  Should there be one accent over i and then the 
same over j?  Or should the accent only be an accent over one of the 
letters?

Stefan

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-01 Thread Philippe Verdy

On Tuesday, July 01, 2003 4:09 PM, Pim Blokland [EMAIL PROTECTED] wrote:
 Maybe it was a bad idea to include ĳ as a character in Unicode at
 all, but now it's there, there's no reason to ignore it when
 refining the rules, to deprecate it practically.

No, that was needed for correct Dutch support. Look at the case
conversion of ij into IJ, even with titlecase...

The character itself is not breakable in Dutch where it is definitely
not a ligature, but a single character, with its own case conversion
rule, exactly like the ae and AE letters (considered as
ligatures or as unbreakable letters depending on the language that
uses them).

That's why ij and IJ are not canonically decomposable as
i, j and I, J (this is just a compatibility decomposition).

If it had only been a shortcut character mapped for compatibility
reasons from some 8-bit encodings, it would have been normalized
with a canonical decomposition.

(the exception to this rule is the inclusion of Arabic ligatures which
were clearly and always decomposable, but that could not be
canonically decomposed because it would have required more than
a character pair for the NFD equivalence, so they are only
given a NFKD decomposition and their usage is strongly
deprecated, and just included for an unnecessary roundtrip
conversion from legacy Arabic encodings).

-- Philippe.

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-01 Thread Doug Ewell

Philippe Verdy verdy_p at wanadoo dot fr wrote:

 Maybe it was a bad idea to include ĳ as a character in Unicode at
 all, but now it's there, there's no reason to ignore it when
 refining the rules, to deprecate it practically.

 No, that was needed for correct Dutch support. Look at the case
 conversion of ij into IJ, even with titlecase...

You don't need a separate character for that.  You can use special
casing rules.  That's why Unicode doesn't have special I and i
characters for Turkish.
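
As a rough sketch of what such a special casing rule amounts to, the functions below tailor Python's default case conversion for Turkish and Azeri. The names turkish_upper and turkish_lower are illustrative only, not part of any library; a production system would more likely rely on a locale-aware library.

```python
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

def turkish_upper(text):
    """Uppercase with the Turkish/Azeri tailoring: i -> U+0130, U+0131 -> I."""
    # Map dotted i first; the default mapping already sends dotless i to "I".
    return text.replace("i", DOTTED_CAPITAL_I).upper()

def turkish_lower(text):
    """Lowercase with the Turkish/Azeri tailoring: I -> U+0131, U+0130 -> i."""
    return text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i").lower()

# The default mapping loses the dot; the tailored one keeps it.
print("fiyat".upper())
print(turkish_upper("fiyat"))
print(turkish_lower("ISI"))
```

The same four characters (i, I, dotless i, dotted I) serve every language; only the mapping between them changes with the language context.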

Believe it or not, the IJ and ij digraphs *were* included for
compatibility with an 8-bit legacy character set (ISO 6937).  Whether
that automatically means they should have been assigned canonical
instead of compatibility decompositions, I don't know.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread Pim Blokland

Philippe Verdy schreef:

 Interesting issue for the Latin Small ij Ligature (U+0133):
 Normally the Soft_Dotted is supposed to make disappear one dot when
 there's an additional diacritic above, but many applications may
 keep these two dots above, fitting the diacritic in the middle.

 This proposal would mean that this becomes illegal, and it promotes the
 use of an additional intermediate dot-above diacritic if the dot must
 be kept.

I don't know of any instances where a ij digraph would keep the dots
AND get additional accent marks, nor of any where the ij would
appear with a dotless i and dotless j and a single dot above,
centered between them. Can you give examples?

Pim Blokland

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread Philippe Verdy

On Monday, June 30, 2003 1:58 PM, Pim Blokland [EMAIL PROTECTED] wrote:

 Philippe Verdy schreef:
 
  Interesting issue for the Latin Small ij Ligature (U+0133):
  Normally the Soft_Dotted is supposed to make disappear one dot when
  there's an additional diacritic above, but many applications may
  keep these two dots above, fitting the diacritic in the middle.
  
  This proposal would mean that this becomes illegal, and it promotes
  the use of an additional intermediate dot-above diacritic if the
  dot must be kept.
 
 I don't know of any instances where a ij digraph would keep the dots
 AND get additional accent marks, nor of any where the ij would
 appear with a dotless i and dotless j and a single dot above,
 centered between them. Can you give examples?

No, of course: the only sequence I know is a dotless ij digraph with
a centered acute accent. I just wonder if this public review makes
it clear that the presence of an acute accent is supposed to
remove both dots. For now I have seen some fonts keeping
the two dots, when centering an additional acute accent.
The text of this update should specify that for this pair, the
intended option is to remove both soft dots, if there are other
diacritics.

But if one wants to restore the previous visual behavior, even if it's
incorrect for languages using this digraph as a letter, what would be
the behavior of using the following sequence:
ij, combining dot above, combining acute
(i.e. should this display 1 or 2 dots?)

Should the previous incorrect rendering be approximated with:
ij, combining diaeresis, combining acute
or
ij, combining dot above, combining dot above, combining acute
???

-- Philippe.

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread James H. Cloos Jr.

 Philippe == Philippe Verdy [EMAIL PROTECTED] writes:

Philippe But if one wants to restore the previous visual behavior,
Philippe even if it's incorrect for languages using this digraph as a
Philippe letter, what would be the behavior of using the following
Philippe sequence: ij, combining dot above, combining acute
Philippe (i.e. should this display 1 or 2 dots?)

Seems clear to me that if ij has soft dots (and I agree it should)
then to get a pair of dots via a combining accent one should use a
two dot combining accent:  U+0308 COMBINING DIAERESIS.

So if you want two dots and an acute use ij, U+0308, U+0301: 

Of course a given font's diaeresis will often not line up with the
stems of its ij, and a custom one should be used instead.  Or
features and/or ligs as appropriate to the font technology could
just use the ij glyph w/ an extra acute.  Either way it is a glyph
issue rather than a character issue.
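
For reference, the sequence suggested above can be spelled out as follows; this is a minimal sketch assuming Python's standard unicodedata module. It only inspects the encoded characters; how the dots and the acute are drawn remains a matter for the font, as noted.

```python
import unicodedata

# ij ligature, combining diaeresis, combining acute accent
sequence = "\u0133\u0308\u0301"

names = [unicodedata.name(ch) for ch in sequence]
combining_classes = [unicodedata.combining(ch) for ch in sequence]

for ch, name, ccc in zip(sequence, names, combining_classes):
    print(f"U+{ord(ch):04X}", name, f"(combining class {ccc})")

# Both marks have combining class 230 (above), so canonical reordering
# keeps them in the order written, and NFD leaves the sequence unchanged.
unchanged_by_nfd = unicodedata.normalize("NFD", sequence) == sequence
print(unchanged_by_nfd)
```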

But it really seems to be just an academic issue, yes?

-JimC

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread Philippe Verdy

On Monday, June 30, 2003 9:13 PM, James H. Cloos Jr. [EMAIL PROTECTED] wrote:

 So if you want two dots and an acute use ij, U+0308, U+0301: 
 
 Of course a given font's diaeresis will often not line up with the
 stems of its ij, and a custom one should be used instead.  Or
 features and/or ligs as appropriate to the font technology could
 just use the ij glyph w/ an extra acute.  Either way it is a glyph
 issue rather than a character issue.

Doesn't it create a new equivalence for the sequences
ij, diaeresis and ij
if neither of them are followed by another combining above diacritic ?
If we don't want such equivalences, the Unicode standard should
say then that it's illegal to use two consecutive identical combining
diacritics. Or simply forbid using ij,diaeresis alone (not followed
by another diacritic with CC=230).

Yes this is really tricky, and academic, I admit. But what forbids
encoding two superposed arrows above any letter? Or encoding
a ij,macron (with the dots removed from ij) followed by
diaeresis, which could have a mathematical meaning?

-- Philippe.

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread Michael Everson

I think the answer is, regarding the soft dot property, please leave 
the ij ligature alone.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: List of ligatures for languages of the Indian subcontinent.

2003-03-18 Thread William Overington

Thank you for your comments.

I am not going to attempt to produce the list of ligatures myself.

I am writing the paper to draw attention to the problem which exists in
relation to the DVB-MHP (Digital Video Broadcasting - Multimedia Home
Platform) system of interactive broadcasting and its application to the
languages of the Indian subcontinent and hopefully provide a software format
for resolving it.

It appears that the software requirement is essentially as follows, if one
wishes to use a font-based method of display with an ordinary font.



Receive a stream of input characters encoded in regular Unicode UTF-16
format suitable for processing as Java char items.

Output a local stream of Java char suitable to be used in a Java drawString
method with an ordinary font.



As far as I can tell at present, the eutocode typography file format could
be used to produce char codes for conjunct forms and for dealing with matras
by scanning whole words, in that the changes needed seem always to be within
a word and that there is no carry over to a following word.

http://www.users.globalnet.co.uk/~ngo/ast03300.htm

The discussion has led me to believe that it would be helpful for me to add
an additional possibility to a eutocode typography file, using two presently
unallocated codes.  I have not yet finally decided which particular two,
nor finalized their definitions, as I am open to any suggestions for
improvement, yet here is the idea.  For the moment I refer to them as U+EBEX
and U+EBEY.

A eutocode typography file could have a line as follows.

sequence1 U+EBEX sequence2 U+EBEY sequence3

The spaces in the above line are for setting out the line clearly here; in
use the spaces would not be there.

Such a line would have the meaning as follows.

Carry out the replacement

sequence2 U+EBEF sequence3

if and only if sequence1 matches the sequence stored in the language choice
string.

The sequence1 sequence is expressed using zero or more characters from the
range U+0020 to U+007E and is the decoded result of the latest use of a
sequence of plane 14 language tags.  The idea is that the plane 14 tags
would be used to signal particular languages, represented as in
international standards, though the eutocode typography file will only
define a sequence as such, not compliance with any list of languages.
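
The decoding step described in the previous paragraph might be sketched as follows. The Plane 14 tag characters U+E0020 to U+E007E mirror ASCII U+0020 to U+007E at an offset of 0xE0000, and U+E0001 LANGUAGE TAG introduces a tag. The function names are hypothetical and the eutocode format itself is not modelled here.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000           # offset between tag characters and ASCII

def encode_language_tag(tag):
    """Encode an ASCII tag such as 'hi' as a Plane 14 tag sequence."""
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in tag)

def decode_latest_language_tag(text):
    """Return the ASCII form of the most recent language tag in text."""
    current = ""
    collecting = False
    for ch in text:
        code_point = ord(ch)
        if ch == LANGUAGE_TAG:
            # A new tag starts; discard the previous one.
            current = ""
            collecting = True
        elif collecting and 0xE0020 <= code_point <= 0xE007E:
            current += chr(code_point - TAG_BASE)
        else:
            # Ordinary text ends the tag but the decoded value is kept.
            collecting = False
    return current

stream = encode_language_tag("hi") + "text " + encode_language_tag("mr") + "more"
print(decode_latest_language_tag(stream))
```

The decoded string would then be what sequence1 is compared against.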

Would this be sufficient to provide a way to guide a Java program to produce
an output stream of Java char to use to access an ordinary font in order to
render languages of the Indian subcontinent, provided that a eutocode
typography file and a font were supplied?

I recognize that the preparing of the eutocode typography file and the
ordinary font containing the glyphs is a large task and I am not going to
try to do it myself.  However, if I can publish a software format which has
the capability to solve the problem and can draw attention to the need to
prepare the list and to prepare fonts which implement the list in part or in
full together with eutocode typography files which can be used so that the
fonts can be applied in applications, and can also produce a wish for the
list to be a published open resource with a view to helping interoperability
then I feel that that is about as far as I can go in this topic at the
moment.  However, I do feel that acting now may well be beneficial as a well
known infrastructural method will be available for consideration when people
want to produce such displays on interactive television displays.

This is but one of a number of ideas for techniques to use in content
authorship for the DVB-MHP platform.

http://www.users.globalnet.co.uk/~ngo/ast03200.htm

In relation to the font of colour codes downloadable from the following
page.

http://www.users.globalnet.co.uk/~ngo/font7001.htm

I have now produced a test version which includes those colour codes and
also four for point size and 28 others for various aspects of access level
multimedia authoring.  This includes codes for variations of object
replacement character defined within the Private Use Area.  One is OBJECT
REPLACEMENT CHARACTER SYNONYM because trying to place a U+FFFC into some
wordprocessors can cause problems if the wordprocessor also accepts graphics
and uses U+FFFC for that.  The others are OBJECT REPLACEMENT CHARACTER with
left, centre and right alignment.  The rest are mostly to do with producing
a basic programmed learning capability within a plain text file, including
such items as GREEN MARKER and so on so that when a push button is pushed
all input characters are skipped until a marker of the corresponding colour
is reached.  There are also a SKIP UNTIL CONTINUE and a CONTINUE MARKER so
that programmed learning layouts following simple flow charts may be
expressed in a sequential manner within a file.

Thank you for your interest in reading through all of this posting.  I have
recently produced an ornaments font, which I am hoping to write up
for the web, and wonder if you

RE: List of ligatures for languages of the Indian subcontinent.

2003-03-18 Thread Marco Cimarosti

Kenneth Whistler wrote:
 Dream on. The information needed exists in books and other
 reference source in libraries, book shops, and other collections
 across India -- and, for that matter, around the world. It is
 merely a matter of collecting the relevant information and
 distilling it into succinct, yet complete, statements of the
 relevant information needed for proper typographic practice
 for each script, for each style of each script, for each local
 typographic tradition for each style, and so on.

A couple of hints for William and other people interested in this issue:

-   Akira Nakanishi, Writing Systems of the World -- Alphabets,
Syllabaries, Pictograms, Tuttle 1980(1999), ISBN 0804816549.
This charming little book explores all the scripts used in the
world today, giving for each one of them a table of all the signs (apart
from Chinese, of course) and an explanation of how the script works. For each
script, it also reproduces a page from a daily newspaper written in that
script. The information is not always 100% accurate; however, the book
remains an invaluable introduction to the scripts of the world, and a
perfect complement to the reading of the Unicode Standard.

-   The grammars in the National Integration Series by Balaji
Publications, Madras, India.
Each grammar in this series is a small A5-format book bearing a
title like: Learn <language name> in 30 Days through English. The grammars
are not very valid from the linguistic point of view (it's unlikely that the
reader will actually learn an Indian language in one month!), but they all
have a very interesting introduction to the script used by each language,
which also normally includes a table of all the combinations of
consonant+vowel, and a table of the essential consonant clusters, and of
half or subjoined consonants. If you compare the grammars of languages
sharing the same script (such as Sanskrit, Hindi, and Marathi, all written
with the Devanagari script), you can verify how the list of required
ligatures varies from one language to another. Notice that these books
are also far from being 100% accurate.

All the above books are inexpensive and are easily found in bookshops in the
UK and elsewhere.

Other good sources for making a list of required glyphs are the existing
non-Unicode fonts for Indic languages. The nicest free collection I have
seen so far is the Akruti GNU TrueType fonts, which contain a set of glyphs
appropriate for most modern usages:

http://www.akruti.com/freedom/

_ Marco

List of ligatures for languages of the Indian subcontinent. (from Re: per-character stories in a database)

2003-03-17 Thread William Overington

 And nobody out there is volunteering to do it.

I would do it gladly, but I do not have any skills at Indian languages.  My
opinion is that the list is important for the future of digital interactive
broadcasting so I am trying to get the list done so that it is ready for use
in displaying distance education texts in interactive broadcasting
situations across the Indian subcontinent using my telesoftware invention.

I was told that I could commission it.  I described what I thought was a
good design brief for the list and asked how much it would cost.  I am still
waiting to find out.

A lot of the information needed to prepare the numbered list is apparently
in files, it is just that it is not available to people.

If the Unicode Consortium really does not wish to include this important
project within its scope, then it will need to be achieved in some other
manner.  I would have thought that whether the Unicode Consortium will take
this project on or not should go to a formal board meeting of the Unicode
Consortium so that there can be no doubt whatsoever of the provenance of any
decision.

William Overington

17 March 2003

Re: List of ligatures for languages of the Indian subcontinent. (from Re: per-character stories in a database)

2003-03-17 Thread John Hudson

A few observations, so that William will understand the scope and some of 
the issues of what he is proposing.

1. For some Indic scripts, including Devanagari, there is no fixed set of 
'ligatures' that would be normative for every typeface, or for every 
language using the script. So even for a single script you would be looking 
at multiple lists, with the same combination of characters likely 
represented in different ways for different languages.

2. The idea of a 'ligature', as it exists in the Latin script, is not 
really found in Indic scripts. This terminology derives from the 
application of particular typecasting and typesetting technologies to Indic 
scripts. So while some aspects of some Indic scripts may, with relative 
accuracy, be spoken of as ligatures in some font formats (e.g. the 'akhand' 
feature of OpenType that forms obligatory 'ligatures'), it is not necessary 
that Indic scripts require mapping of multiple characters to single glyphs. 
This is simply one model for rendering one aspect of Indic scripts. [As a 
parallel, consider Tom Milo's ligature-free approach to Arabic, another 
script widely and erroneously assumed to involve ligatures.]

3. As Rick has already alluded to re. Tibetan, it is far from necessary for 
all the *graphemes* of a script to be represented by individual, ligature 
glyphs. A grapheme may be composed of single glyphs and/or ligatures 
combined with dynamically positioned mark glyphs. Building or even 
cataloguing every possible grapheme -- every combination of base glyph, 
ligature and mark(s) in a script -- is an incredibly inefficient approach 
to Indic rendering.

4. Cataloguing and publishing known consonant conjunct forms for Indic 
scripts is a good idea and a worthwhile goal, which would indeed be a 
valuable resource for font developers. Michael Everson has indicated that 
he has what he considers a comprehensive list for Devanagari, and I 
probably have something close to comprehensive in my own files and books. 
However, William should not delude himself that such a catalogue would 
represent all that is necessary to rendering Indic scripts in the 
technologies that interest him. Once you have the conjuncts catalogued, and 
have identified subsets of conjuncts that are appropriate to the languages 
that you intend to support, you still need to implement shaping and 
positioning for matras relative to every base glyph and every conjunct.
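
The point that a conjunct catalogue is keyed by character sequences, and that different languages sharing one script may need different subsets, can be sketched in a few lines of Python. This is only an illustration: the language labels and the membership of each subset are hypothetical placeholders, not researched typographic data, and the function name is invented for the example.

```python
# Illustrative only: the per-language subsets below are placeholders,
# not researched typographic data.
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Conjuncts are identified by character *sequences*, never by single code points.
K_SSA = "\u0915" + VIRAMA + "\u0937"  # KA + VIRAMA + SSA
J_NYA = "\u091C" + VIRAMA + "\u091E"  # JA + VIRAMA + NYA
D_YA = "\u0926" + VIRAMA + "\u092F"   # DA + VIRAMA + YA

# Hypothetical catalogue: one script, several language-specific subsets.
CONJUNCTS = {
    ("Deva", "hypothetical-lang-A"): {K_SSA, J_NYA, D_YA},
    ("Deva", "hypothetical-lang-B"): {K_SSA, J_NYA},
}


def find_conjuncts(text, script, language):
    """Return (offset, sequence) for each catalogued conjunct found in text."""
    # Try longer sequences first so a long conjunct wins over a shorter prefix.
    catalogue = sorted(CONJUNCTS[(script, language)], key=len, reverse=True)
    found = []
    i = 0
    while i < len(text):
        for seq in catalogue:
            if text.startswith(seq, i):
                found.append((i, seq))
                i += len(seq)
                break
        else:
            i += 1
    return found
```

The same input text yields different results depending on which subset is selected, which is the reason a single fixed list per script would not be enough. Nothing here addresses shaping or positioning of matras, which remains a separate task.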

William writes: '...I do not have any skills at Indian languages.' While 
some may find his enthusiasm admirable, it would be a good idea for him to 
develop such skills before he starts writing papers on implementing such 
languages for digital interactive broadcasting or any other technology.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]
Anyone who has both children and house pets has
surely noticed that the children exposed to language
will develop language, in turn, whereas the house
pets will not.   - Stephen Pinker

Re: List of ligatures for languages of the Indian subcontinent.

2003-03-17 Thread Kenneth Whistler

William Overington asked:

  And nobody out there is volunteering to do it.

 I was told that I could commission it.

That statement by Michael Everson was not a *permission*, but
merely a statement of fact. Anyone can commission any expert
they like, under contract to produce whatever output or
specification the purchaser would like. That includes you.
  
 I described what I thought was a
 good design brief for the list and asked how much it would cost.  I am still
 waiting to find out.

Well, the short answer is that it would cost a *lot*. But don't
expect the Unicode discussion list to price out contracts for
you. :-)

 
 A lot of the information needed to prepare the numbered list is apparently
 in files, it is just that it is not available to people.

Dream on. The information needed exists in books and other
reference sources in libraries, book shops, and other collections
across India -- and, for that matter, around the world. It is
merely a matter of collecting the relevant information and
distilling it into succinct, yet complete, statements of the
relevant information needed for proper typographic practice
for each script, for each style of each script, for each local
typographic tradition for each style, and so on.

And once you start down that road -- as John Hudson pointed out --
you would quickly find that the problem is not one of
enumerating the list of required ligatures, but is rather
more complicated than that -- and that the term ligature is
not even the pertinent typographic construct of most interest
to Indian rendering.

 If the Unicode Consortium really does not wish to include this important
 project within its scope, 

It does not.

 then it will need to be achieved in some other
 manner.

Just so.

--Ken

Re: Ligatures fj etc (from Re: Ligatures (qj) )

2003-03-14 Thread William Overington

Yesterday, 13 March 2003, I wrote as follows.

quote

So I reasoned that the system might scan through a font when it is loaded
and decide upon the lowest point for the whole font and then proceed on that
basis.

end quote

An email correspondent has kindly written to me privately and I now know
that it is not necessary for an application such as a wordprocessing package
to make a complete survey of all the glyphs in a font as the font is being
loaded, because the information on what are the high and low points for the
font is readily available in predefined locations within the font.
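
As a rough illustration of "predefined locations", the sketch below reads the font-wide bounding box from a TrueType 'head' table using only the Python standard library. It assumes the 'head' table is one of the locations meant (the message does not say which), and it works on a synthetic table built in memory rather than a real font file. The helper names and the sample x values are hypothetical.

```python
import struct

HEAD_MAGIC = 0x5F0F3CF5  # magicNumber field of the 'head' table


def read_head_bounds(head):
    """Return (units_per_em, x_min, y_min, x_max, y_max) from raw 'head' bytes."""
    if len(head) < 54:
        raise ValueError("'head' table is too short")
    (magic,) = struct.unpack_from(">I", head, 12)
    if magic != HEAD_MAGIC:
        raise ValueError("bad magic number in 'head' table")
    (units_per_em,) = struct.unpack_from(">H", head, 18)
    # xMin, yMin, xMax, yMax are signed 16-bit values starting at byte offset 36.
    x_min, y_min, x_max, y_max = struct.unpack_from(">hhhh", head, 36)
    return units_per_em, x_min, y_min, x_max, y_max


def make_sample_head(units_per_em, x_min, y_min, x_max, y_max):
    """Build a synthetic 54-byte 'head' table for demonstration (not a real font)."""
    return struct.pack(
        ">IIIIHHqqhhhhHHhhh",
        0x00010000,  # version 1.0
        0,           # fontRevision
        0,           # checkSumAdjustment
        HEAD_MAGIC,
        0,           # flags
        units_per_em,
        0, 0,        # created, modified
        x_min, y_min, x_max, y_max,
        0,           # macStyle
        8,           # lowestRecPPEM
        2,           # fontDirectionHint
        0,           # indexToLocFormat
        0,           # glyphDataFormat
    )


# Sample values: 2048 units per em is an assumption; the x bounds are made up.
sample = make_sample_head(2048, 0, -768, 1500, 2048)
bounds = read_head_bounds(sample)
```

Because these values are stored once for the whole font, an application has no need to walk every glyph outline when the font is loaded.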

I expect that many readers of this list already know that, yet I feel that I
should post this note in case some readers do not because I would not want
to have set them off on a wrong way of looking at how a system works.

William Overington

14 March 2003

Ligatures fj etc (from Re: Ligatures (qj) )

2003-03-13 Thread William Overington

Thank you both for your responses.

Yes, U+2502 or U+2503 would achieve the desired effect for which I devised
U+E700 STAFF without resorting to the Private Use Area.

The only reason for my not using one of those was that I was unaware of
those codes as such.  An interesting point is that they appear to be usable
with fonts which have descenders yet still fill the entire height of the
font.  I suppose that when, some time ago, I saw the box drawing characters
while looking through what Unicode offers in a general context, not looking
for the STAFF effect at that time, I thought of those characters in
the context of the character set of the old PET computer from the 1970s and
of the way that some software on older non-graphics terminals on mainframe
computers makes an attempt at message windows using such characters to
construct boxes.

Indeed, an interesting footnote to U+2502 states = Videotex Mosaic DG14.  I
cannot quite remember what Videotex was.  I remember Videotext (with a t at
the end) and seem to remember that Videotex (no t at the end) was a
different system, possibly from the USA or maybe France.  There was also a
system which started called NAPLPS, which was an acronym for something like
North American something and the word Presentation was in it, though I
forget the exact acronym derivation.

I was unaware of the VDMX table and so had a look at http://www.yahoo.com
and found a couple of useful documents.

However, VDMX appears to refer specifically to OpenType rather than ordinary
TrueType.

My reason for including the STAFF character, the intended effect of which I
can now produce using U+2502 or U+2503, was that, being fairly new to
producing fonts and just, thus far, using the Softy editor to produce
ordinary TrueType fonts, I had noticed, when trying it out in 2002, that if
I produce a font with a b c d e f then the font displays with lines packed
together, yet that if I then add g the line spacing for all lines increases,
even if there is no g in that line.  So I reasoned that the system might
scan through a font when it is loaded and decide upon the lowest point for
the whole font and then proceed on that basis.  Now, in defining Quest text
I wanted to have the possibility of accents on capital letters and
descenders such as y and g and always look clear, so I decided effectively
to lock some leading into the font and set the maximum height right from the
start.

Features of Quest text are that it is designed so that characters are
produced directly from drawings in the Softy editor, not from template
graphics, and that Quest text is designed, as far as possible, by the
application of a set of rules, such as that verticals are all 256 font units
wide, with both edges at a font unit value which is a multiple of 256 and
that horizontals are all 168 font units in vertical height with one edge at
a font unit value which is a multiple of 256, corners which are curved are
curved with a single Bézier curve which has an action length, as I call it,
of 128 font units in both horizontal and vertical directions.  Some
characters, such as x and k are exceptions to the general rules, yet Quest
text is largely made up of horizontals and verticals, including for letters
such as A O e and s.  The idea is that hopefully Quest text will be very
clear at both 12 point and 18 point and that, as point size increases, it
will display its artistic look.  At 300 point, Quest text looks smooth and
rounded with an elegant combining of wider verticals with narrower
horizontals, almost as if drawn with a pen with a nib 256 font units wide
and 168 font units high.  The rules do produce the effect though that
capitals look lighter than lowercase letters as they are overall wider and
yet use the same width verticals.  I am wondering whether to consider that a
fault or a feature!  :-)
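
The stem rules described above lend themselves to a mechanical check. The following Python sketch tests whether a vertical or horizontal stroke obeys them; the function names are invented for the example, and it covers only the two stem rules, not the curve rule or the stated exceptions such as x and k.

```python
VERTICAL_WIDTH = 256     # font units, as stated in the rules above
HORIZONTAL_HEIGHT = 168  # font units, as stated in the rules above
GRID = 256               # edges are placed on multiples of this value


def vertical_ok(x_left, x_right):
    """A vertical is 256 units wide with both edges on the 256-unit grid."""
    return (
        x_right - x_left == VERTICAL_WIDTH
        and x_left % GRID == 0
        and x_right % GRID == 0
    )


def horizontal_ok(y_bottom, y_top):
    """A horizontal is 168 units high with one edge on the 256-unit grid."""
    return (
        y_top - y_bottom == HORIZONTAL_HEIGHT
        and (y_bottom % GRID == 0 or y_top % GRID == 0)
    )
```

For instance, a horizontal running from 856 to 1024 font units passes because its top edge sits on the grid, while one running from 10 to 178 fails even though its height is correct.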

An important part of the development process of Quest text is to display
some text at 12 point in WordPad, make a Print Screen graphic and paste it
into Paint and then study the graphic at 8x magnification.  Hopefully Quest
text combines great clarity with an artistic look.

William Overington

13 March 2003

RE: Ligatures

2003-03-13 Thread Kent Karlsson






  probably didn't come out right. I never meant to say moving the
  characters apart was the best solution.
  Moving only the offending accent mark rather than the entire (composite)
  character might help in some cases, but this technique also should be used
  with care. Like in the case of "Te", if you have a very wide T and a very
  small e, any accent on the e would end up to the far right of it if you
  force avoiding collision with the T. So in this case I think you can't help
  putting the e and the T further apart if the e has an accent than if it
  doesn't.

Then you have kerned the T and (unaccented) e too close to begin with,
which is bad (taste)...

  This also depends on the font. There is no universal solution!

I may agree with that. But changing the kerning (relative to what is done
for the base letters) is WAY down in the list of actions that should be
taken.

/kent k

Re: Ligatures fj etc (from Re: Ligatures (qj) )

2003-03-13 Thread John Hudson

At 02:21 AM 3/13/2003, William Overington wrote:

My reason for including the STAFF character, the intended effect of which I
can now produce using U+2502 or U+2503, was that, being fairly new to
producing fonts and just, thus far, using the Softy editor to produce
ordinary TrueType fonts, I had noticed, when trying it out in 2002, that if
I produce a font with a b c d e f then the font displays with lines packed
together, yet that if I then add g the line spacing for all lines increases,
even if there is no g in that line.  So I reasoned that the system might
scan through a font when it is loaded and decide upon the lowest point for
the whole font and then proceed on that basis.

Linespacing in typical Windows apps is controlled by OS/2 table vertical 
metrics WinAscent and WinDescent. My guess, from your description, is that 
Softy automatically prevents clipping by assigning OS/2 table values based 
on the max height of the font bounding box (the height from the lowest 
descent to the highest ascent). Is there no way to manually set OS/2 
values in Softy? If not, you should get yourself a proper font tool. 
FontLab is best, but Font Creator from High Logic is a pretty good and much 
cheaper option.
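
To make the effect concrete, here is a small Python sketch of how a default line height might follow from the two OS/2 metrics (stored in the table as usWinAscent and usWinDescent, the latter as a positive distance below the baseline). It is an approximation under stated assumptions: 2048 units per em and 96 dpi are assumed, and real applications may add extra leading or round differently. The function name is hypothetical.

```python
def line_height_px(win_ascent, win_descent, units_per_em, point_size, dpi=96):
    """Approximate line height in pixels from OS/2 usWinAscent and usWinDescent."""
    pixels_per_em = point_size * dpi / 72
    return (win_ascent + win_descent) * pixels_per_em / units_per_em


# Assumed 2048 units per em, 12 point text at 96 dpi (16 pixels per em).
no_descenders = line_height_px(2048, 0, 2048, 12)    # font with only a b c d e f
with_g = line_height_px(2048, 512, 2048, 12)         # after adding g down to -512
with_staff = line_height_px(2048, 768, 2048, 12)     # STAFF from -768 to 2048
```

If a tool derives the OS/2 values from the font bounding box, adding any glyph that reaches lower raises the line height for every line, whether or not that glyph appears in it, which matches the behaviour described in the quoted message.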

I think this is getting off topic for this list.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]
It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467

Ligatures fj etc (from Re: Ligatures (qj) )

2003-03-12 Thread William Overington

John Hudson wrote as follows.

quote

If you don't intend to use the PUA codepoint in text, there really is no
point in having it at all.

end quote

Well, one useful scenario is as follows.  Suppose please that one wishes to
process incoming regular Unicode text, using a eutocode typography file to
influence the process (details of the format are on the
http://www.users.globalnet.co.uk/~ngo/ast03300.htm web page), and then use
the output Unicode format text stream as codes to look up glyphs in an
ordinary TrueType font, so as to produce a display which includes using some
ligature glyphs.  Having a code such as U+E70B for fj and codes for other
characters as part of a consistent set which is published has the advantage
that if various software authors use the eutocode typography file format,
and various people spend time encoding specific eutocode typography files,
(such as for 18th Century English printing with long s ligatures, German
Fraktur printing and the ligatures of languages of the Indian subcontinent),
and various people produce ordinary TrueType fonts with ligature glyphs
encoded using consistent lists of published Private Use Area code points for
ligatures, then the existence of the list of Private Use Area code points
may well help in interoperability, so that, for example, having looked at
the result using a font produced by one artist one may have a look at the
result using a font produced by another artist without needing to change
the contents of the particular eutocode typography file being used for the
processing and having then to reprocess the original text using that second
eutocode typography file.
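
The processing step in this scenario, turning ordinary Unicode text into a stream in which listed sequences are replaced by single Private Use Area code points, can be sketched in Python as follows. The table is a hypothetical stand-in: the actual eutocode typography file format is the one described on the web page cited above and is not reproduced here. Only the U+E70B assignment for fj comes from the message; the function name is invented.

```python
# Hypothetical stand-in for a substitution table; not the eutocode file format.
LIGATURE_TABLE = {
    "fj": "\uE70B",  # Private Use Area code point given in the message for fj
}


def apply_ligatures(text, table):
    """Replace each listed character sequence with its single code point."""
    # Try longer sequences first so that, say, a three-letter entry
    # would win over a two-letter entry sharing the same prefix.
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this table, "fjord" becomes U+E70B followed by "ord", and text with no listed sequence passes through unchanged. Swapping fonts then requires no change to the table, provided both fonts place their fj glyph at the same code point.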

Another use is that preparing some text using WordPad and other programs,
not for interchange but just for, say, producing a local print of a poster,
having a consistent, widely used set of Private Use Area code points for
ligatures would mean that a poster designer could try out a number of fonts
from various artists without needing to reset the text each time using
whatever code points each font designer used for each particular ligature
glyph.

I would mention that my thinking on using Private Use Area codes for
ligatures has gradually moved towards the use of the eutocode typography
file rather than interchanging files using Private Use Area code points for
ligatures, yet I do feel that, for local use such Private Use Area
allocations for ligatures as the golden ligatures collection provides are
potentially useful as they do provide for interoperability of fonts which
contain ligatures which fonts are produced by a variety of artists.  Use of
the golden ligatures collection is entirely optional, yet it can be used to
try to achieve some level of interoperability of fonts.  Indeed, font
designers who produce fonts using advanced font technologies, where the
conversion tables are internal to the font rather than external as with the
eutocode typography file, and where the glyphs for ligatures are not accessed
directly, may, if they choose, make use of the code point allocations of the
golden ligatures collection so as to allow the glyphs also to be accessed
from other platforms with a hope of some level of interoperability.
Certainly, using the code points of the golden ligatures collection is not
using regular Unicode code point allocations, yet as a self-help facility
amongst end users so that use of fonts containing ligatures is easier, the
golden ligatures collection is perhaps of some practical use.

I accept that the use of Private Use Area encodings does not guarantee
compatibility, yet one can take care to try to make the use of Private Use
Area codes for ligatures and other characters as graceful as possible.

For example, although there is absolutely no requirement at all for me to do
so, and no one has asked me to do so, I decided to make sure that no golden
ligatures code point allocations made in the future will clash with the code
points used for Phaistos Disc Script in the ConScript Registry.

I am happy to point out, in addition, that I do quite like the idea of a
link with traditional letterpress printing where each ligature character was
cast as one piece of metal for the whole ligature and one could actually
pick them up and place them in a composing stick, so the golden ligatures
collection is about art and nostalgia as well as about technology and
practicality of achieving a stylish display using computing equipment.

I have added a new code recently, which is U+E700 STAFF which is a vertical
line from the very top of the glyph and going as far below the 0 line as one
chooses for a particular font.  With Quest text I encoded this character
early with a line going vertically from -768 font units to 2048 font units.
This forces the overall display height of the font before I added either of
lowercase y and g, which in fact go down to -512 font units in Quest text,
so the U+E700 character within the font helps in the display process even
though the character is not usually

Re: Ligatures fj etc (from Re: Ligatures (qj) )

2003-03-12 Thread jameskass

.
William Overington wrote,

 I have added a new code recently, which is U+E700 STAFF which is a vertical
 line from the very top of the glyph and going as far below the 0 line as one
 chooses for a particular font.  With Quest text I encoded this character
 early with a line going vertically from -768 font units to 2048 font units.

Since the full height box drawing glyphs are supposed to join vertically,
wouldn't adding something like U+2502 or U+2503 to a font achieve the
desired effect without resorting to the PUA?

Best regards,

James Kass
.
