Re: RTL PUA?

2011-09-03 Thread Christopher Fynn

 What is needed is a way to specify the properties in a
 platform-independent way, where platform means not only OS but also
 font technology.

The font formats used by all smart font technologies (OT, AAT,
Graphite) are all based on the TrueType font file format, which allows
you to add any number of custom tables. If the people responsible for
the OT, AAT and Graphite specs agreed on it amongst themselves, it might
be possible to specify an embedded table of properties for PUA
characters that all the different rendering engines could read and
make use of.

That might not be completely font-technology independent - but pretty close.
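
To make the idea of an embedded property table concrete, here is a minimal sketch of what it might carry. The layout is purely hypothetical (no such table exists in the OT, AAT or Graphite specs): a version, a record count, then ranges of code points with a bidi class tag, stored big-endian like other tables in the TrueType container.

```python
import struct

# Hypothetical layout (not part of any font specification):
#   uint16 version, uint16 record count, then per record:
#   uint32 first code point, uint32 last code point, 2-byte bidi class tag.
HEADER = struct.Struct(">HH")
RECORD = struct.Struct(">II2s")

def build_table(ranges):
    """Pack (first, last, bidi_class) tuples into table bytes."""
    data = HEADER.pack(1, len(ranges))
    for first, last, bidi_class in ranges:
        # Pad one-letter classes such as "R" with a space to two bytes.
        data += RECORD.pack(first, last, bidi_class.encode("ascii").ljust(2))
    return data

def parse_table(data):
    """Unpack table bytes back into (first, last, bidi_class) tuples."""
    version, count = HEADER.unpack_from(data, 0)
    if version != 1:
        raise ValueError("unknown table version")
    records = []
    for i in range(count):
        offset = HEADER.size + i * RECORD.size
        first, last, tag = RECORD.unpack_from(data, offset)
        records.append((first, last, tag.decode("ascii").strip()))
    return records
```

Any engine that can already read the font's other tables could read such a block the same way; the hard part is the agreement between spec owners, not the parsing.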

 - C

Re: RTL PUA?

2011-08-25 Thread Philippe Verdy

2011/8/25 Peter Constable peter...@microsoft.com:
 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On 
 Behalf Of Philippe Verdy

 But I suspect that the strong opposition given by Peter Constable...

 Yet again, I think you're putting words in my mouth. The only thing I think 
 I've explicitly spoken against in this thread is changing the default bidi 
 category of PUA characters to ON.

Something that will break all existing implementations, but will not
solve the problem; it will just reduce the number of Bidi controls
needed in texts: BC=ON only means that the resolved direction of
PUA characters will come from the resolved direction of previous
(non-PUA) characters. It does not work at the beginning of paragraphs.
The actual direction properties should be overridable with a
*strong* RTL direction in place of the default, instead of changing it to be
extremely weak and contextual.
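
The paragraph-start problem can be illustrated with a sketch of rules P2 and P3 of the Unicode Bidirectional Algorithm, under which the first strong character sets the paragraph level. This is a simplification that ignores isolates and explicit embeddings; the `pua_class` parameter is a hypothetical knob for simulating a different bidi class for private-use characters.

```python
import unicodedata

def is_pua(ch):
    """True for private-use code points (general category Co)."""
    return unicodedata.category(ch) == "Co"

def paragraph_level(text, pua_class=None):
    """Simplified rules P2/P3: the first strong character sets the level.

    pua_class, if given, replaces the bidi class of private-use characters.
    Isolates and explicit embeddings are ignored in this sketch.
    """
    for ch in text:
        if pua_class and is_pua(ch):
            bc = pua_class
        else:
            bc = unicodedata.bidirectional(ch)
        if bc == "L":
            return 0
        if bc in ("R", "AL"):
            return 1
    return 0  # no strong character found: default to left-to-right
```

With `pua_class="ON"`, a paragraph made only of PUA characters has no strong character at all and falls back to level 0 (LTR), whereas `pua_class="R"` yields level 1.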

 In fact when Peter says that the Bidi processing and the OpenType layout
 engine are in separate layers (so that the OpenType layout works in a lower
 layer and all BiDi processing is done before any font details are inspected),
 I think that this is a perfect lie:

 The Unicode Bidi Algorithm uses _character_ properties and operates on 
 _characters_. OpenType Layout tables deal only with glyphs.

You're repeating again what I also know and used in my arguments. I
have never stated that the Bidi algorithm operates at the glyph level,
I have clearly said the opposite. You are only searching for a
contradiction that does not exist.

 At least the Uniscribe layout already has to inspect the content of any 
 OpenType
 font, at least to process its cmap and implement the font fallback 
 mechanism,
 just to see which font will match the characters in the input string to 
 render.

 If it can do that, it can also inspect later a table in the selected font to 
 see which
 PUAs are RTL or LTR. And it can do that as a source of information for BiDi 
 ...

 In theory, that could be done. A huge problem with your suggestion, though, 
 is that the bidi algorithm deals only with characters and makes no references 
 whatsoever to font data, and for that reason -- I would hazard to guess -- 
 most implementations of the Unicode bidi algorithm do not rely in any way on 
 font data and would need significant re-engineering to do so.

You repeat again your argument, which I have not contradicted, but this
has nothing to do with what I want to express. And anyway,
reengineering will be needed in all the proposed solutions (except if
we have to encode the Bidi controls around those PUAs, something that
we really want to avoid as often as we avoid them for non-PUA
characters).

The Bidi algorithm is not changed in any way, it still uses the
character properties, except that the source of the property values
for PUA should be overridable (not only from the standard UCD, for PUA
characters), as already permitted in the Unicode standard which just
assigns them *default* property values.
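
A minimal sketch of such an overridable property source, assuming the overrides arrive as a plain mapping from code point to bidi class (where that mapping comes from, a font table or an external file, is left open here):

```python
import unicodedata

def bidi_class(ch, overrides=None):
    """Return the bidi class of ch, consulting private overrides for PUA only."""
    if overrides and unicodedata.category(ch) == "Co":
        # Private-use character: an override wins over the UCD default.
        return overrides.get(ord(ch), unicodedata.bidirectional(ch))
    # Encoded characters always keep their normative UCD value.
    return unicodedata.bidirectional(ch)
```

The algorithm that consumes these values does not need to know whether a value was a default or an override.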

If a Bidi algorithm implementation does not allow such overrides, it
is already broken and has to be fixed, because it was insufficiently
engineered. The fact that it cannot process font data at the step
specified in OpenType specifications is a defect of this
specification, which is incomplete. But even if you don't want to add
such a data table in fonts, the external data will have to come from
somewhere else. Otherwise only the default property values will be
used.

Re: RTL PUA?

2011-08-25 Thread Philippe Verdy

2011/8/25 Peter Constable peter...@microsoft.com:
 From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

2011/8/22 Joó Ádám a...@jooadam.hu:
 Speaking of actual implementation, I’m convinced that this format
 should be the same as it is for encoded characters ...

 As well, the small properties files can be embedded, in a very compact
 form, in the PUA font.

 In one sense having data regarding PUA character properties embedded within a 
 font could make sense since the interpretation of instances of those PUA 
 characters will be tied to particular fonts.

 However, I don't see this as really being workable: rendering implementations 
 will typically do certain types of processes without access to any font data.

Remove the future "will" in your sentence... you're assuming how future
implementations will work.

And the "certain types of processes" element is extremely fuzzy. Those
who want to use PUA as RTL characters will never be satisfied; they
want access to some property data that are not only those from
the UCD.

But you're right in one thing: the font is not expected to contain all
those properties. I am still convinced that this is the best place for
BC property values which are tied to the font, for rendering purpose.
Only the properties for PUA characters that have absolutely no use in
rendering should not be in fonts (for example collation weights, case
mappings, custom character name aliases if one wants).

Some other properties may be needed for rendering purposes: notably
text segmentation data for handling line breaks (many PUAs are
currently used for custom sinograms in the Han script, which allows
line breaks to occur before and after each of them; but this behavior
would not be perceived as correct for most scripts).

However, I don't think that line breaking property data fit very
well in fonts, because such segmentation is not needed only
for rendering. Still, for most of those non-rendering purposes (e.g.
plain-text search), we generally don't want to have the search result
depending on soft line breaks. Soft line breaks are only meant for
rendering purposes, and so this breakability may also come under the
control of the font.

In contrast, hard line breaks are controlled by existing non-PUA
control characters, so they are not a problem and don't need to be
overridden. Those hard line breaks are very often expected to be
searchable, unlike soft line breaks which should remain invisible in
plain-text searches as they are only the result of some rendering
process.

Re: RTL PUA?

2011-08-24 Thread Philippe Verdy

2011/8/24 John Hudson j...@tiro.ca:
 Philippe, I'll need to think about this some more and try to get a better
 grasp of what you're suggesting. But some immediate thoughts come to mind:

 If BiDi is to be applied to shaped glyph strings, surely that means needing
 to step backwards through the processing that arrived at those shaped glyph
 strings in order to correctly identify their relationship to underlying
 character codes, since it is the characters, not the glyphs, that have
 directional properties. There's nothing in an OT font that says e.g. GID 456
 /lam_alif.fina/ is an RTL glyph, so the directionality has to be processed
 at the character level and mapped up through the GSUB features to the
 glyphs.

No backward stepping is needed: process the text using grapheme
cluster boundaries as a minimum unit of processing: apply
normalization, try to cmap all their characters from the same font
(use fallback fonts if needed), then if this fails try to cmap their
individual character components to find a font match.

This done, each character is now mapped to a definitive font and a
putative (incompletely resolved) glyph id in that font. Note that PUAs
will be isolated at this point (they form their own grapheme cluster).
You can then check if the font provides an override for the BC
property, from the default strong LTR value.

Then independently:
- you can process the list of glyphs one by one, trying to match all
applicable GSUB's only if they occur on the same font as the font
associated with the previous character. You can also easily select the
typographic variants of that font, for a single glyph.
- you can update the current Bidi level of the character, using the BC
property value overrides specified in the font
containing the PUA, or the normative value for non-PUA, otherwise the
default BC property value for PUA.

If finally the remaining glyph id's are no longer substitutable, you
can then apply GPOS rules (or legacy tables for base-to-base kerning)
reliably, because you also know if the BiDi level is even (LTR) or odd
(RTL). You can then consider the glyph metrics to accumulate widths in
order to detect if an automatic line-break can occur.

When a forced or automatic linebreak does occur, you can then adjust
the justification of glyph ids, because you also know at that point
the directionality of all characters (including the first
glyph of the line and, if this line starts a paragraph, the main
direction of the baseline that you have determined from it).

You can also automatically adjust the widths of kashidas (or even
automatically insert them for microjustification of glyphs, according
to the joining properties of the associated characters).

Then you can reorder the glyph ids that are in runs opposed to the
main direction of the baseline for the paragraph.
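
The reordering step can be sketched with rule L2 of the Unicode Bidirectional Algorithm, applied here to a list of glyph ids rather than characters. This is only an illustration under the assumption that each glyph already carries a resolved embedding level; the function name is made up for the example.

```python
def reorder_visual(items, levels):
    """Rule L2: from the highest level down to the lowest odd level,
    reverse each contiguous run of items whose level is at least that high."""
    items = list(items)
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return items  # purely left-to-right line: nothing to reorder
    highest = max(levels)
    lowest_odd = min(odd_levels)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(items):
            if levels[i] >= level:
                j = i
                while j < len(items) and levels[j] >= level:
                    j += 1
                # Reverse the run occupying positions i..j-1 in place.
                items[i:j] = reversed(items[i:j])
                i = j
            else:
                i += 1
    return items
```

For example, with lowercase standing for LTR glyphs and uppercase for RTL glyphs, `reorder_visual(list("abCDe"), [0, 0, 1, 1, 0])` gives the glyphs in the order a, b, D, C, e.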

Some more refinements are needed for handling some text decorations
(such as underlines, which are not necessarily continuous in all styles
and may need to avoid cutting through strokes; but this would require
some metrics from the font, associated with glyphs with descenders).

All the above can be done in parallel (i.e. character per character,
each one being handled glyph id by glyph id, as long as there are
matchable GSUB or GPOS). The memory requirement is limited to as many
glyphs as can fit within the margins of a single line.

Finally the line can be fully drawn with the reordered glyphs (you may
need to clip the kashidas to their autojustified width, to keep them
from overlapping the surrounding joined characters too far).

RE: Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

2011-08-24 Thread William_J_G Overington

Thank you to Doug and to Asmus for replying.
  
Originally I was thinking of the format simply being so as to help to level the 
infrastructural ground as between a PUA (Private Use Area) application using 
left-to-right characters and a PUA application using right-to-left characters.
 
However, the research needs to proceed in the best direction so as to get the 
best possible result, so I am happy for my original idea to be augmented and 
changed if that is what is needed.
 
Do any people who would like to use PUA applications that use right-to-left 
characters have any views on a format please? Is such a format regarded as 
useful? What does it need to do?
 
What would be the features of a very minimal RTL constructed script that would 
exhibit all of the features for which a researcher might want to use the 
Private Use Area for research with a real-world RTL script please?
 
I am thinking of making a small font with some characters that consist of a 
leftward pointing arrow with a broad tail with the tail having markings to give 
a clue to the sound. These markings would be based on the hatching system used 
for representing colours in monochrome. For example, vertical lines for r 
because that is red or rouge, horizontal lines for b because that is blue or 
bleu. I thought of having an o as an o drawn with a left arrow attached to it. 
I could then produce a glyph for a br ligature and maybe a rb ligature. I am 
thinking that the ligature glyphs could be wider, have only one leftward 
pointing arrow yet have two types of markings on the tail of the arrow, side by 
side.
 
Would that and a space be enough for a constructed script that would exhibit 
the needed properties for a demonstration or would some more glyphs be needed?
 
My thinking is that the font, complete with its PUA.RTL assignment statement, 
could be a benchmark test font for testing a special researcher's edition of 
a wordprocessing application or a desktop publishing application. By using a 
font for a minimal constructed script, the task of producing and testing the 
special researcher's edition of a software application could be separated 
from the complexities of a full real script, perhaps therefore increasing the 
chances of the special researcher's edition of a software package being 
produced.
  
I feel that I could make the font as a TrueType font. In order to produce an 
OpenType font I would need to consolidate what I have started to learn about 
OpenType fonts, though I would be happy for the TrueType font to be adapted by 
other people if they so wish.
 
William Overington
 
24 August 2011

Re: RTL PUA?

2011-08-24 Thread John H. Jenkins


John Hudson 於 2011年8月23日 下午9:08 寫道：

 I think you may be right that quite a lot of existing OTL functionality 
 wouldn't be affected by applying BiDi after glyph shaping: logical order and 
 resolved order are often identical in terms of GSUB input. But it is in the 
 cases where they are not identical that there needs to be a clearly defined 
 and standard way to do things on which font developers can rely. [A parallel 
 is canonical combining class ordering and GPOS mark positioning: there are 
 huge numbers of instances, even for quite complicated combinations of base 
 plus multiple marks, in which it really doesn't matter what order the marks 
 are in for the typeform to display correctly; but there are some instances in 
 which you absolutely need to have a particular mark sequence.]

And this is really the key point.  There really isn't anything inherent to 
OpenType that absolutely *requires* the bidi algorithm be run in character 
space.  It would theoretically be possible to manage things in a fashion so 
that it's run afterwards, à la AAT.  But font designers *must* know which way 
it's being done in practice, and, in practice, all OT engines run the bidi 
algorithm in character space and not in glyph space.  At this point, trying to 
arrange things so that it can be done in glyph space instead is a practical 
impossibility.

=
Hoani H. Tinikini
John H. Jenkins
jenk...@apple.com

Re: RTL PUA?

2011-08-24 Thread Philippe Verdy

2011/8/24 John H. Jenkins jenk...@apple.com:

 John Hudson 於 2011年8月23日 下午9:08 寫道：

 I think you may be right that quite a lot of existing OTL functionality 
 wouldn't be affected by applying BiDi after glyph shaping: logical order and 
 resolved order are often identical in terms of GSUB input. But it is in the 
 cases where they are not identical that there needs to be a clearly defined 
 and standard way to do things on which font developers can rely. [A parallel 
 is canonical combining class ordering and GPOS mark positioning: there are 
 huge numbers of instances, even for quite complicated combinations of base 
 plus multiple marks, in which it really doesn't matter what order the marks 
 are in for the typeform to display correctly; but there are some instances 
 in which you absolutely need to have a particular mark sequence.]

 And this is really the key point.  There really isn't anything inherent to 
 OpenType that absolutely *requires* the bidi algorithm be run in character 
 space.  It would theoretically be possible to manage things in a fashion so 
 that it's run afterwards, à la AAT.  But font designers *must* know which way 
 it's being done in practice, and, in practice, all OT engines run the bidi 
 algorithm in character space and not in glyph space.  At this point, trying 
 to arrange things so that it can be done in glyph space instead is a 
 practical impossibility.

One problem of interpretation: I have never suggested that the Bidi
algorithm would need to run in the glyph space. You can still run it
in the character space.

Reread my suggestions, where I clearly and explicitly spoke about how
boundaries between runs of characters that are in a resolved direction
and runs of glyphs that are in the same resolved direction just have to
be kept.

The only borderline case occurs if one wants to create some
ligaturing feature (substitution and/or positioning) between glyphs
belonging to distinct successive runs, something that is still for now
unsupported, even though it is visually possible (and may even be
wanted, notably for kerning or microjustification of lines displaying
runs in both directions).

This does not even mean that glyph ids will be reordered for RTL runs
of glyphs or RTL runs of characters. In OpenType, there is clearly the
need in all cases to maintain at least a mapping from positions in the
character streams for each directional run to the positions in the
glyphs stream. But such mapping is evidently not needed for each
character or even for each grapheme cluster, and it does not have to
be bijectively reversible (for example, distinct positions of
directional runs in the characters streams may map to the same
position in the glyphs stream).

And this surjective(*) mapping does not even have to be
monotonic(*) between each character or grapheme cluster, but only
strictly monotonic(*) between non-empty directional runs (otherwise
it would be impossible, in the final drawing step, to compute the
relative positions of runs in the rendered line, because it would be
impossible to sort these non-empty runs along the baseline axis; note
also that empty runs that may occur in the glyphs space can be
skipped, and in fact must be skipped to assert the condition of
strict monotonicity).

Note (*):  mathematical meaning of these terms.

For example, most Indic scripts exhibit a *non-monotonic* surjective
function that maps the positions of successive grapheme clusters in
the characters stream, to the positions in the glyphs stream (but
given that Indic scripts are only strong LTR, this is not a
limitation: all streams of Indic characters or streams of Indic glyphs
will never include in their middle any boundary between non-empty runs
with opposite resolved directions).
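
The two conditions can be written down as small checks. This is a sketch in which a mapping is simply a list whose index is a position in the character stream and whose value is a position in the glyph stream; the sample values in the usage note are made up for illustration.

```python
def is_monotonic(mapping):
    """True if glyph positions never decrease as character positions increase."""
    return all(a <= b for a, b in zip(mapping, mapping[1:]))

def is_strictly_monotonic(mapping):
    """True if glyph positions always increase (no two entries share one)."""
    return all(a < b for a, b in zip(mapping, mapping[1:]))
```

A preposed vowel sign gives a mapping such as `[1, 0]`, which is not monotonic; a ligature gives `[0, 0, 1]`, which is monotonic but not strictly so; start positions of non-empty directional runs such as `[0, 3, 7]` must pass the strict check.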

RE: RTL PUA?

2011-08-24 Thread Peter Constable

From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

2011/8/22 Joó Ádám a...@jooadam.hu:
 Speaking of actual implementation, I’m convinced that this format 
 should be the same as it is for encoded characters ...

 As well, the small properties files can be embedded, in a very compact 
 form, in the PUA font.

In one sense having data regarding PUA character properties embedded within a 
font could make sense since the interpretation of instances of those PUA 
characters will be tied to particular fonts.

However, I don't see this as really being workable: rendering implementations 
will typically do certain types of processes without access to any font data.




Peter

RE: RTL PUA?

2011-08-24 Thread Peter Constable

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Philippe Verdy

 Lookup tables in fonts (at least OpenType) do not work at the character 
 level, but at the glyph level: they substitute glyph ids by other glyph ids. 

That much is true.


 Sequences of glyph ids are already reordered in visual order by the layout 
 engine when they are searched in OpenType lookups, should they be RTL 
 glyphs, or Indic glyphs with special reordering requirements (independent 
 of the logical ordering of characters/code points).

OpenType lookup tables are agnostic wrt LTR or RTL; sequences of glyphs IDs in 
a lookup are from start to finish. For Indic scripts, some re-orderings are 
assumed to have been applied before lookups are processed. As for bidi, it is 
_not_ the case that a glyph sequence in a lookup table is ordered in LTR visual 
order, as Philippe's statement suggests. Rather, they are ordered from start to 
finish. One might choose to perceive that in LTR/RTL terms; you certainly don't 
have to, though which way you perceive it will have to correlate with whether 
you think of an implementation as actually having done some level reordering 
before OpenType Layout tables are processed--which certainly is not mandatory 
for implementations.


 The only lookup table in fonts that work at the character/code point level is 
 their cmap 

Note that the 'cmap' is not typically referred to as a lookup table since 
there is a distinct set of data structures in OpenType that are formally called 
Lookup tables.


 Not all fonts need a cmap; for some of them, a default cmap may be implied 
 or automatically constructed -- for example Symbol fonts in Windows, that are 
 implicitly mapped in a PUA range; 

Not true. All OpenType fonts require a cmap table. This is true even of 
symbol encoded fonts. Strictly speaking, symbol-encoded fonts are not encoded 
using Unicode, and so are not mapped in a PUA range. It is true, though, that 
they use 16-bit code points and that in many symbol-encoded fonts the code 
point range used does have numerical values that correlate to those of Unicode 
PUA characters in the BMP. But years ago Bob Hallissy and I confirmed that 
symbol-encoded fonts could work with code points in other numerical ranges.


Peter

RE: RTL PUA?

2011-08-24 Thread Peter Constable

From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

 2011/8/22 Peter Constable peter...@microsoft.com:


 Of course _OpenType_ cannot, but any rendering engine that uses OpenType 
 _must_ resolve the bidi level of _all_ characters in a sequence that it is 
 given to 
 render. Given our current situation, a default rendering implementation 
 would 
 resolve PUA characters to an even (LTR) level unless, of course, bidi 
 control 
 characters -- particularly RLO -- are used to override the directionality of 
 the 
 character, as you mention.

[snip]

 So now I perceive your opinion :

 - you don't want the solution proposed by Michael Everson (simply adding a 
 range 
 of RTL PUA), that I also think is not necessary, but is clearly a possible 
 solution.

You're putting words in my mouth: I don't think I've expressed any opinion in 
this thread for or against Michael's proposal. All I've commented on is that 
the OpenType spec has a model for glyph mirroring that can work for mirroring 
of PUA characters as well, that bidi category is one of several properties that 
can affect text processing, and that people shouldn’t expect PUA characters to 
behave as they desire in all software titles.


 - you propose to use BiDi overrides. 

Again, you're putting words in my mouth. All I said is that default behaviour 
can be expected to observe bidi categories unless bidi controls are used to 
override the assigned categories for characters in a run.




Peter

Re: RTL PUA?

2011-08-23 Thread Vinod Kumar

On 22 August 2011 22:40, John Hudson j...@tiro.ca wrote:





 Glyph ID inputs for OTL processing are according to reading/resolved order.
 This is typically the same as logical order, but the term logical order
 really applies to character strings, not glyph strings, which are much more
 malleable. The order of input strings in GSUB lookups or contexts is
 dependent not only on the underlying character order, but also on the
 results of previous GSUB lookups. So while, unlike AAT and Graphite,
 OpenType Layout doesn't explicitly provide for glyph re-ordering, some kinds
 of glyph reordering are possible using sequences of contextual lookups to
 duplicate a glyph in a second location in the string and then remove the
 first instance. We use this in some Devanagari fonts to enable subsequent
 ligation of short ikar variants to the left of a consonant base with reph
 marks to the right of that base.
 र्क्मि

 JH


Open Font Format GSUB tables contain glyphs in visual (or reading order)
and not in logical order. The logical order of characters is transformed to
a visual order of characters before the font layer. Thus for Devanagari, the
I matra is moved left and the reph is moved right. It is this visual
syllable of characters that is transformed to a visual syllable of glyphIds
using the cmap table in the Open font.
Consider Devanagari RKMI. The logical order of characters is Ra Halant Ka
Halant Ma IMatra. This logical order is transformed to the visual order of
characters IMatra Ka Halant Ma Ra Halant, then to the visual order of
GIds IMatraGId KaGId HalantGId MaGId RaGId HalantGId. GSUB tables will
transform KaGId HalantGId -> HalfKaGId.
RaGId HalantGId will be transformed to rephGId. So finally, the visual
order of GIds will be IMatraGId HalfKaGId MaGId rephGId. This can be
further beautified but is already in a readable form. It will appear as
र्क्मि
So no reordering is needed in the Open font for general case.

For special cases (example RDMI, logical Ra Halant Da Halant Ma
IMatra), reordering in the font is needed. Here, DaGId HalantGId MaGId
does not reduce to HalfDaGId MaGId in most Hindi fonts, but remains as
DaGId HalantGId MaGId. The visual order of GIds for the syllable is
  IMatraGId DaGId HalantGId MaGId RaGId HalantGId.
It is converted to IMatraGId DaGId HalantGId MaGId rephGId by the font.
Now the font has to reorder the IMatraGId and the rephGId over the retained
HalantGId. The correct glyph sequence would be
DaGId rephGId HalantGId IMatraGId MaGId
र्द,मि

(Using Gmail I have created RDMI. The halant is shown by the comma. How does
one create Halant?)

Comparing RKMI and RDMI we can see that font-level reordering of
glyphIds is needed only for RDMI.
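
The two stages described in this message can be played through with a toy model. It is only a sketch: glyph ids are replaced by names, the reordering handles just the single-syllable pattern discussed here, and the rule table is a stand-in for real GSUB lookups, not how any shaping engine is implemented.

```python
def to_visual(syllable):
    """Reorder one logical syllable: a leading Ra+Halant goes to the end,
    and an IMatra goes to the front."""
    chars = list(syllable)
    if chars[:2] == ["Ra", "Halant"] and len(chars) > 2:
        chars = chars[2:] + chars[:2]
    if "IMatra" in chars:
        chars.remove("IMatra")
        chars.insert(0, "IMatra")
    return chars

# Toy GSUB-like rules: a pair of glyph names is replaced by one glyph name.
RULES = {("Ka", "Halant"): "HalfKa", ("Ra", "Halant"): "reph"}

def substitute(glyphs, rules=RULES):
    """Apply pair substitutions left to right in a single pass."""
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in rules:
            out.append(rules[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

For RKMI the result is IMatra HalfKa Ma reph, already in readable order. For RDMI there is no half form for Da in the rule table, so the result is IMatra Da Halant Ma reph, and the further font-level move of the IMatra and the reph over the retained Halant is not modelled here.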


vinod kumar
-- 
पृथिवी सस्यशालिनी
the earth be green

Re: RTL PUA?

2011-08-23 Thread Richard Wordingham

On Mon, 22 Aug 2011 20:58:23 +0200
Philippe Verdy verd...@wanadoo.fr wrote:

 The computing order of features should not then be:
  - BiDi algorithm for reordering grapheme clusters
(I trust you mean the ordering of clusters relative to one another, not
the ordering within clusters.)
  - font search and font fallback (using cmap)
  - GSUB (lookups of ligatures or discretionary glyph variants)
  - GPOS
 but really:
  - font lookup and font fallback (using cmap)
  - GSUB (lookups of ligatures or discretionary glyph variants)
  - BiDi algorithm for reordering glyphs representing the grapheme
 clusters or ligatured grapheme clusters
  - GPOS

You've forgotten the conversion from encoding order to mechanical
typing order.  That is done before GSUB, but needs some assistance from
GSUB for multipart characters (typically circumposed vowels).  

 The BiDi algorithm absolutely does not have to be changed.

But you have to remember that preposed combining marks (and
fragments) must inherit the BiDi class of the base letter.  I'm glad
you know what a circumposed Indic vowel looks like when subject to a
right-to-left override.

Richard.

Re: RTL PUA?

2011-08-23 Thread Philippe Verdy

2011/8/23 Richard Wordingham richard.wording...@ntlworld.com:
 The BiDi algorithm absolutely does not have to be changed.

 But you have to remember that preposed combining marks (and
 fragments) must inherit the BiDi class of the base letter.  I'm glad
 you know what a circumposed Indic vowel looks like when subject to a
 right-to-left override.

Yes, I know the case of preposed Indic vowels, e.g. vowel I in
Devanagari, as well as vowels that are split in two parts (one
before the consonant cluster and one after it). However, this applies
here to an LTR script, for which we don't have BiDi issues. PUA Indic
characters can safely be represented using existing PUA characters
without needing any directionality property override from their
default strong LTR value.

The case of RTL PUA will in fact be much rarer than other PUAs
(except if someone creates an RTL conscript). Typically, it will be
used for rare or special characters that are not encoded (or won't be
encoded), such as specific glyph variants of letters in an existing RTL
script, including Arabic, for which a text author wants a PUA to
maintain a distinction that he cannot manage by other means just in
the encoded text; or characters that have not yet demonstrated
sufficient proof of usage, due to extremely rare usage, found
for example in a single old book or manuscript; or characters that
were invented specifically by some author, for esoteric reasons.

It could also apply to the need for encoding things that we don't
consider as characters (for example if someone wants to add some
custom decorating swash to Arabic text, one that has no logical or
phonetic reading and no other semantics by itself). Note that in this
case, the PUA will act as a custom variation sequence (variation
sequences must be assigned if we want to use something else than a
PUA) or as a custom diacritic...

Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

2011-08-23 Thread William_J_G Overington

On Monday 22 August 2011, William_J_G Overington wjgo_10...@btinternet.com 
wrote:
 
 Would a third option work?
  
 In the Description section of the Macintosh Roman section of a TrueType font, 
 include a line of text in a plain text format of which the following line of 
 text is an example.
  
 PUA.RTL=$E000-$E1FF,$E440-$E447,$E541,$E549,$E57C,$EA00-$EA0F,$EC07;
  
 One could specify precisely which Private Use Area characters were to become 
 RTL when using that particular font.
  
 One would need rendering software that looked for such a string of text in 
 the font file, yet, as far as I am aware, no approval from any committee in 
 order to put this solution into practical use.
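
The PUA.RTL statement quoted above is simple enough to read with a few lines of code. A minimal sketch, assuming exactly the syntax of the example line (a `PUA.RTL=` prefix, comma-separated `$`-prefixed hexadecimal code points or ranges, and a closing semicolon); the function names are invented for the illustration.

```python
def parse_pua_rtl(statement):
    """Parse 'PUA.RTL=$E000-$E1FF,$E541;' into a list of (first, last) ranges."""
    statement = statement.strip()
    if not (statement.startswith("PUA.RTL=") and statement.endswith(";")):
        raise ValueError("not a PUA.RTL assignment statement")
    body = statement[len("PUA.RTL="):-1]
    ranges = []
    for item in filter(None, body.split(",")):
        first, _, last = item.partition("-")
        lo = int(first.lstrip("$"), 16)
        # A single code point is stored as a one-element range.
        hi = int(last.lstrip("$"), 16) if last else lo
        ranges.append((lo, hi))
    return ranges

def is_rtl(code_point, ranges):
    """True if the code point falls in any of the parsed ranges."""
    return any(lo <= code_point <= hi for lo, hi in ranges)
```

An empty statement, `PUA.RTL=;`, parses to an empty list, so no Private Use Area character is treated as right-to-left.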
 
Thinking further on this, I am putting forward the following suggestion for 
discussion, in the hope that it might be of use.
 
Suppose that a special researcher's edition of a wordprocessing 
application or a desktop publishing application at start up looks in a 
specified directory for a file with the following file name.
 
pua_major.txt
 
If pua_major.txt exists, then it is opened and it is searched for a PUA.RTL 
assignment statement. If a PUA.RTL assignment statement is not found in the 
file, it is taken as if the following had been included in the file.
 
PUA.RTL=;
 
If pua_major.txt is found, then that is an end of the searching process and no 
search for PUA.RTL would take place in a font file.
 
If pua_major.txt is not found, then the application looks in a specified 
directory for a file with the following file name.
 
pua_minor.txt
 
If pua_minor.txt exists, then it is opened and it is searched for a PUA.RTL 
assignment statement. If a PUA.RTL assignment statement is not found in the 
file, it is taken as if the following had been included in the file.
 
PUA.RTL=;
 
Also, if the file is not found, the PUA.RTL assignment statement is taken as 
the following.
 
PUA.RTL=;
 
However, the value of PUA.RTL thus determined would be kept in reserve and only 
used if there were no PUA.RTL assignment statement in the font that is being 
used.
 
This method would allow the choice of where to specify right-to-left 
directionality for some Private Use characters to be made either as being in a 
font file or in a text file, with the choice of whether the text file is an 
override or a backup of any such information within a font.
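
The search order described above can be sketched as follows. This is only an illustration: how the statement is extracted from the font is left out, so the caller passes it in as `font_statement` (or None if the font has none), and the helper names are invented.

```python
import os
import re

STATEMENT = re.compile(r"PUA\.RTL=[^;]*;")
EMPTY = "PUA.RTL=;"

def statement_in_file(path):
    """Return the first PUA.RTL statement in the file, or the empty one."""
    with open(path, encoding="utf-8") as handle:
        match = STATEMENT.search(handle.read())
    return match.group(0) if match else EMPTY

def effective_statement(directory, font_statement=None):
    """font_statement is the statement found in the current font, or None."""
    major = os.path.join(directory, "pua_major.txt")
    if os.path.exists(major):
        return statement_in_file(major)      # override: the font is not consulted
    if font_statement is not None:
        return font_statement                # the font wins over the backup file
    minor = os.path.join(directory, "pua_minor.txt")
    if os.path.exists(minor):
        return statement_in_file(minor)      # backup value
    return EMPTY
```

So pua_major.txt acts as an override of the font, while pua_minor.txt acts as a backup that is used only when the font says nothing.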
 
Would such a format solve the needs of those who want to use right-to-left 
Private Use characters? If not, could people say what other features are needed 
please in the hope that a suitable system can be specified by consensus within 
this thread?
 
William Overington
 
23 August 2011

RE: Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

2011-08-23 Thread Doug Ewell

William_J_G Overington wjgo underscore 10009 at btinternet dot com
wrote:

 Suppose that a special researcher's edition of a wordprocessing 
 application or a desktop publishing application at start up looks in a 
 specified directory for a file with the following file name.

 pua_major.txt

 If pua_major.txt exists, then it is opened and it is searched for a PUA.RTL 
 assignment statement. If a PUA.RTL assignment statement is not found in the 
 file, it is taken as if the following had been included in the file.

 PUA.RTL=;
 ...

Of all applications, a word processor or DTP application would want to
know more about the properties of characters than just whether they are
RTL.  Line breaking, word breaking, and case mapping come to mind.

I would think the format used by standard UCD files, or the XML
equivalent, would be preferable to making one up:

E100;ENGSVANYALI LETTER P;Lo;0;R;N;
E101;ENGSVANYALI LETTER B;Lo;0;R;N;
E102;ENGSVANYALI LETTER M;Lo;0;R;N;
...
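
A minimal sketch of reading lines in this style, assuming only the first five semicolon-separated fields matter (code point, name, General_Category, combining class, Bidi_Class) and that anything else is ignored. The PuaProps type and parse_ucd_lines function are made up for illustration.

```python
from typing import Dict, NamedTuple


class PuaProps(NamedTuple):
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_ucd_lines(text: str) -> Dict[int, PuaProps]:
    """Parse UnicodeData-style lines into a table keyed by code point."""
    table: Dict[int, PuaProps] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        fields = line.split(";")
        if len(fields) < 5:
            continue                          # not a data line
        try:
            cp = int(fields[0], 16)
            ccc = int(fields[3])
        except ValueError:
            continue                          # malformed line, skip it
        table[cp] = PuaProps(fields[1], fields[2], ccc, fields[4])
    return table


SAMPLE = """\
E100;ENGSVANYALI LETTER P;Lo;0;R;N;
E101;ENGSVANYALI LETTER B;Lo;0;R;N;
E102;ENGSVANYALI LETTER M;Lo;0;R;N;
"""
props = parse_ucd_lines(SAMPLE)
```

An application could then consult props[0xE100].bidi_class instead of the default PUA value.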

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Re: RTL PUA?

2011-08-23 Thread John Hudson


Philippe Verdy verd...@wanadoo.fr wrote:


The computing order of features should not then be:
 - BiDi algorithm for reordering grapheme clusters
 - font search and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - GPOS



but really:



 - font lookup and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - BiDi algorithm for reordering glyphs representing the grapheme
clusters or ligatured grapheme clusters
 - GPOS


I can see the advantages of such an approach -- performing GSUB prior to 
BiDi would enable cross-directional contextual substitutions, which are 
currently impossible -- but the existing model in which BiDi is applied 
to characters *not glyphs* isn't likely to change. Switching from 
processing GSUB lookups in logical order rather than reading order would 
break too many things.


JH


--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_

Re: Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

2011-08-23 Thread Asmus Freytag


On 8/23/2011 7:22 AM, Doug Ewell wrote:

Of all applications, a word processor or DTP application would want to
know more about the properties of characters than just whether they are
RTL.  Line breaking, word breaking, and case mapping come to mind.

I would think the format used by standard UCD files, or the XML
equivalent, would be preferable to making one up:





The right answer would follow the XML format of the UCD.

That's the only format that allows all necessary information to be contained 
in one file, and it would leverage any effort that users of the main 
UCD have made in parsing the XML format.


An XML format should also be flexible in that you can add/remove not just 
characters, but properties as needed.


The worst thing to do, other than designing something from scratch, 
would be to replicate the UnicodeData.txt layout with its random, but 
fixed collection of properties and insanely many semi-colons. None of 
the existing UCD txt files carries all the needed data in a single file.
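
As a rough sketch of what reading such a file could look like: the namespace, the char element and the short property aliases used as attribute names (cp, na, gc, bc, lb) follow the UCD XML format, but the sample entry, its lb value and the load_props function are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

UCD_NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"

# Hypothetical private-use entry written in the style of the UCD XML files.
SAMPLE = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <char cp="E100" na="ENGSVANYALI LETTER P" gc="Lo" bc="R" lb="AL"/>
  </repertoire>
</ucd>"""


def load_props(xml_text):
    """Return {code point: {property alias: value}} for single-cp entries."""
    table = {}
    root = ET.fromstring(xml_text)
    for char in root.iter(UCD_NS + "char"):
        cp = char.get("cp")
        if cp is None:
            continue  # range entries (first-cp/last-cp) are skipped here
        table[int(cp, 16)] = {k: v for k, v in char.attrib.items() if k != "cp"}
    return table


props = load_props(SAMPLE)
```

Because each property is just an attribute, adding or dropping a property does not change the parsing code.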


A./

RE: Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

2011-08-23 Thread Doug Ewell

Asmus Freytag asmusf at ix dot netcom dot com wrote:

 The right answer would follow the XML format of the UCD.

Question: Since the ucdxml formats became available, has any consensus
emerged as to whether the flat or grouped formats are preferred? 
Obviously they both contain the same data, but one is much smaller and
the other might be more convenient in some ways.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Re: RTL PUA?

2011-08-23 Thread John Hudson


Behdad Esfahbod wrote:


I can see the advantages of such an approach -- performing GSUB prior to BiDi
would enable cross-directional contextual substitutions, which are currently
impossible -- but the existing model in which BiDi is applied to characters
*not glyphs* isn't likely to change. Switching from processing GSUB lookups in
logical order rather than reading order would break too many things.



You can't get cross-directional-run GSUB either way because  by definition
GSUB in an RTL run runs RTL, and GSUB in an LTR run runs LTR.  If you do it
before Bidi, you get, eg, kerning between two glyphs which end up being
reordered far apart from each other.  You really want GSUB to be applied on the
visual glyph string, but which direction it runs is a different issue.


Kerning is GPOS, not GSUB.

But generally I agree. My point was that Philippe's suggestion, although 
it could be the basis of an alternative form of layout that might have 
some benefits if fully worked out, is a radical departure from how 
OpenType works.


J.



Re: RTL PUA?

2011-08-23 Thread John H. Jenkins


On 23 August 2011, at 2:33 PM, John Hudson wrote:

 Behdad Esfahbod wrote:
 
 I can see the advantages of such an approach -- performing GSUB prior to 
 BiDi
 would enable cross-directional contextual substitutions, which are currently
 impossible -- but the existing model in which BiDi is applied to characters
 *not glyphs* isn't likely to change. Switching from processing GSUB lookups 
 in
 logical order rather than reading order would break too many things.
 
 You can't get cross-directional-run GSUB either way because  by definition
 GSUB in an RTL run runs RTL, and GSUB in an LTR run runs LTR.  If you do it
 before Bidi, you get, eg, kerning between two glyphs which end up being
 reordered far apart from each other.  You really want GSUB to be applied on 
 the
 visual glyph string, but which direction it runs is a different issue.
 
 Kerning is GPOS, not GSUB.
 
 But generally I agree. My point was that Philippe's suggestion, although it 
 could be the basis of an alternative form of layout that might have some 
 benefits if fully worked out, is a radical departure from how OpenType works.
 

I'll toss in my obligatory "That's how AAT does it" reference.  It has 
advantages and disadvantages, but, as you say, OT would have to be heavily 
redesigned to do it.  

=
John H. Jenkins
井作恆
Жбь А. ЖЩэпЮьц
jenk...@apple.com

Re: RTL PUA?

2011-08-23 Thread Philippe Verdy

2011/8/23 John Hudson j...@tiro.ca:
 Behdad Esfahbod wrote:

 I can see the advantages of such an approach -- performing GSUB prior to
 BiDi
 would enable cross-directional contextual substitutions, which are
 currently
 impossible -- but the existing model in which BiDi is applied to
 characters
 *not glyphs* isn't likely to change. Switching from processing GSUB
 lookups in
 logical order rather than reading order would break too many things.

 You can't get cross-directional-run GSUB either way because  by definition
 GSUB in an RTL run runs RTL, and GSUB in an LTR run runs LTR.  If you do
 it
 before Bidi, you get, eg, kerning between two glyphs which end up being
 reordered far apart from each other.  You really want GSUB to be applied on
 the
 visual glyph string, but which direction it runs is a different issue.

 Kerning is GPOS, not GSUB.

 But generally I agree. My point was that Philippe's suggestion, although it
 could be the basis of an alternative form of layout that might have some
 benefits if fully worked out, is a radical departure from how OpenType
 works.

Rereading closely the OpenType spec, in fact I don't see any major
problem if even the Bidi algorithm is applied last, even after
applying not only the GSUB's (ligaturing, custom Indic reordering of
multipart vowels or ra forms), but also the GPOS (yes, this is for
kerning, i.e. base-to-base, but also for mark-to-base and mark-to-mark
positioning).

I admit that this would violate some existing rules implied in some
implementations, but at least it would offer some additional benefits.
However, if one really wants to implement kerning between LTR runs and
RTL runs (e.g. between an Arabic letter and a Latin letter), one would
need to make sure that Bidi reordering has been performed before GPOS
(and this is really the case...).

Processing such kerning pairs would require another convention than
the resolved direction. It would require that such kerning pairs are
scanned only so that the first item of the pair will always be the
left-most. GPOS is in fact more powerful than that because it can also
involve more than simple pairs, using contexts longer on both the
right and the left of tested glyphs.

But the existence of such complex positioning rules would create
difficulties for the actual readers of the rendered text, because they
will not know from which side they must start to read a word that
displays for example a run of Latin letters on one side, and a run of
Arabic letters on the other side. Suppose the reader starts by reading
the Arabic part, in normal order: how should the Latin part of this
strange «word» then be read?

It is still not a stupid case: such positioning problems occur at
the boundaries of words, where there are whitespaces. Once you have
resolved the direction of those whitespaces, there's then a boundary
with the next word which may use another direction. What happens on
those whitespaces is that you may find typographic elements (such as
swashes) which should not overflow on the next part.

Currently it is assumed that writers will use a larger whitespace
character if needed, to avoid collisions. But if the whitespace is
very narrow, or is zero-width, the problem of kerning immediately
resurfaces. Kerning, in its traditional typographic definition, is
meant to improve the legibility of the rendered text by exhibiting a
visually constant spacing between words and between letters, so that
inter-letter separation will not be confused with interword
separation.

I admit that this (extremely rare) problem is much less critical with
the Arabic script (because it is always cursive and most letters in
the same word are joined), but this means that the problem may be more
significant between Latin and Hebrew, or more probably between Greek
and Hebrew (in very old historic texts, where even the Greek script
did not have a strong LTR directionality, and where whitespace was not
always used between words).

Re: RTL PUA?

2011-08-23 Thread John Hudson


Philippe Verdy wrote:


Rereading closely the OpenType spec...


I suggest you read also the script-specific OT layout specifications.

http://www.microsoft.com/typography/SpecificationsOverview.mspx

You'll note, for example, that the Arabic font spec doesn't even mention 
BiDi, because it is assumed that this has been resolved before glyph 
runs for OTL processing are even identified. This makes sense to me 
because BiDi is a character-centric operation.


The Microsoft font specs describe what Uniscribe (and DWrite) do with 
text and fonts for particular scripts, and there may be some differences 
in other implementations. For example, Uniscribe performs invalid mark 
sequence checks that others, preferring to see this as a task for 
spellcheckers, do not. But the glyph selection and positioning results 
should be the same across implementations. Font makers need to know how 
text is processed and OTL features applied in order to make fonts that 
work with resulting glyph runs and input strings. Changing the point in 
the glyph string resolution when BiDi is applied breaks everything. It's 
a complete non-starter.


JH



Re: RTL PUA?

2011-08-23 Thread Philippe Verdy

2011/8/24 John Hudson j...@tiro.ca:
 Philippe Verdy wrote:

 Rereading closely the OpenType spec...

 I suggest you read also the script-specific OT layout specifications.

 http://www.microsoft.com/typography/SpecificationsOverview.mspx

 You'll note, for example, that the Arabic font spec doesn't even mention
 BiDi, because it is assumed that this has been resolved before glyph runs
 for OTL processing are even identified. This makes sense to me because BiDi
 is a character-centric operation.

 The Microsoft font specs describe what Uniscribe (and DWrite) do with text
 and fonts for particular scripts, and there may be some differences in other
 implementations. For example, Uniscribe performs invalid mark sequence
 checks that others, preferring to see this as a task for spellcheckers, do
 not. But the glyph selection and positioning results should be the same
 across implementations. Font makers need to know how text is processed and
 OTL features applied in order to make fonts that work with resulting glyph
 runs and input strings. Changing the point in the glyph string resolution
 when BiDi is applied breaks everything. It's a complete non-starter.

I had already read these subspecs. And I think you're wrong, because
the list of glyphs is in resolved order; even after all ligature
substitution, glyph breaking (for Indic scripts) has an order
completely independent of the logical reading order of characters.

You can perfectly well run the BiDi algorithm after the glyph
substitutions. All the Bidi algorithm does is delimit runs of
characters that are to be rendered in one direction or the other. The
same limits will also be boundaries across the associated runs of
glyph ids.

There's in fact absolutely no need for the Bidi algorithm in order to
process all glyph substitutions, because they will be performed exactly
the same way. The two algorithms are in fact completely independent of
each other, at least if you don't need to apply substitutions that
span distinct runs.

However there's a dependency between the BiDi algorithm and the glyph
positioning, because each RTL or LTR run needs to have its own
left-side bearing, and its own right-side bearing, in order to
mutually space these runs correctly. It also influences the direction
in which you'll advance the coordinates along the baseline for
positioning the fully resolved glyph ids. This then requires knowing
the principal direction of each run of glyph ids.

In fact you have not demonstrated at all that this
concept would break anything, except ligatures between RTL and
LTR characters, i.e. between resolved RTL and LTR glyphs, something
that can only occur across a boundary between a resolved RTL run of
glyph ids and a resolved LTR run of glyph ids. I was told
that OpenType layout does not support such a thing, or that this
possible behavior is for now undocumented in the OpenType specs, whereas
this is not the case for AAT layout and Graphite layout. I admit that
this would cause problems in how to position such ligature glyphs, which
would have an ambiguous direction (because each would then belong to two
successive directional runs at the character level).

As the above paragraph may not be very clear, let's
suppose that you wanted to create a GSUB ligature between ARABIC LAM
(resolved to RTL at the character level) and LATIN CAPITAL LETTER A
(resolved to LTR at the character level, in the Bidi algorithm). You
would map this ligature to a LAM_A glyph id. Technically, nothing
in OpenType GSUB forbids you from doing that in your font. But the
OpenType engine that needs to maintain an equivalence of boundaries
between runs of characters (from Bidi) and runs of glyph ids (from the
cmap, then after GSUB substitutions) will not know whether the LAM_A
glyph belongs to the first run (terminated by the RTL character LAM) or
the second run (starting with the LTR character A) unless *each* GSUB
rule provides an indication of where to place the new direction
boundary if there was a direction boundary in the middle of the source
list of glyphs, before its substitution.

Yes this is a very borderline case, because I have never seen it or
needed it in practice. Unicode prefers encoding a new similar
character with the opposite strong direction (for example the
ALEF SYMBOL (U+2135) for maths, which is very similar to the Hebrew
letter but has the opposite direction; but here I wonder how it would
create a ligature with another strong LTR character that is also not a
diacritic, even if there's evidence that such a pair can be
GPOS'itioned, i.e. kerned).

What is only assumed is that GSUB will preserve the boundaries between
runs of characters that are in the same direction; but of course it
does not always preserve the boundaries between the logical character
clusters. This may explain your concern that this could potentially
break something, but only if you don't care about preserving
unambiguously the boundaries between directional runs, and

Re: RTL PUA?

2011-08-23 Thread John Hudson

Philippe, I'll need to think about this some more and try to get a 
better grasp of what you're suggesting. But some immediate thoughts come 
to mind:


If BiDi is to be applied to shaped glyph strings, surely that means 
needing to step backwards through the processing that arrived at those 
shaped glyph strings in order to correctly identify their relationship 
to underlying character codes, since it is the characters, not the 
glyphs, that have directional properties. There's nothing in an OT font 
that says e.g. GID 456 /lam_alif.fina/ is an RTL glyph, so the 
directionality has to be processed at the character level and mapped up 
through the GSUB features to the glyphs.
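
To make the character-to-glyph mapping concrete, here is a small sketch. It assumes a shaper that reports, for every output glyph, the indices of the characters it came from; the glyph names, the run structure and the glyph_bidi_classes function are hypothetical, and only Python's unicodedata lookups are real.

```python
import unicodedata


def glyph_bidi_classes(text, glyph_run):
    """Give each glyph the set of Bidi_Class values of its source characters.

    glyph_run is a list of (glyph_name, character_indices) pairs.
    """
    result = []
    for glyph, cluster in glyph_run:
        classes = {unicodedata.bidirectional(text[i]) for i in cluster}
        result.append((glyph, classes))
    return result


# LATIN A followed by ARABIC LAM + ALEF shaped into one ligature glyph.
text = "A\u0644\u0627"
run = [("A", (0,)), ("lam_alif", (1, 2))]
simple = glyph_bidi_classes(text, run)

# The hypothetical LAM + A ligature discussed earlier in the thread:
# its source characters disagree, so the glyph has no single direction.
mixed = glyph_bidi_classes("\u0644A", [("LAM_A", (0, 1))])
```

The second case shows why a ligature spanning a direction boundary would need extra information from somewhere: the glyph inherits two different classes.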


I think you may be right that quite a lot of existing OTL functionality 
wouldn't be affected by applying BiDi after glyph shaping: logical order 
and resolved order are often identical in terms of GSUB input. But it is 
in the cases where they are not identical that there needs to be a 
clearly defined and standard way to do things on which font developers 
can rely. [A parallel is canonical combining class ordering and GPOS 
mark positioning: there are huge numbers of instances, even for quite 
complicated combinations of base plus multiple marks, in which it really 
doesn't matter what order the marks are in for the typeform to display 
correctly; but there are some instances in which you absolutely need to 
have a particular mark sequence.]


I've lost track of what the putative benefit of processing BiDi post 
glyph shaping is. I think I missed part of your earlier exchange with 
Behdad.



JH



Re: RTL PUA?

2011-08-22 Thread Asmus Freytag


On 8/21/2011 7:34 PM, Doug Ewell wrote:

So what you are asking about is a directional control character that would 
assign subsequent characters a BC of 'AL', right?

You don't want to call this a LANGUAGE MARK or anything else that implies language 
identification, because of the existence of real language identification 
mechanisms and the history of Unicode and language tagging.


An ARM (Arabic RTL Mark) would be a sensible addition to the standard. 
It would close a small gap in design that currently prevents a fully 
faithful plain text export of bidi text from rich text (higher level 
protocol) formats.


In a HLP you can assign any run to behave as if it was following a 
character with bidi property AL.


When you export this text as plain text, unless there is an actual AL 
character, you cannot get the same behavior (other than by the 
heavy-handed method of completely overriding the directionality, making 
your plain text less editable).


So, yes, there's a bit of a use case for such a mark.

(Its effect is limited to treatment of numeric expressions, so it's not 
an Arabic language mark, but one that triggers the same bidi context 
as the presence of an Arabic Script (AL) character.)


A./


--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by ATT

-Original Message-
From: Richard Wordingham richard.wording...@ntlworld.com
Sender: unicode-bou...@unicode.org
Date: Mon, 22 Aug 2011 03:19:39
To: Unicode Mailing List unicode@unicode.org
Subject: Re: RTL PUA?

On Sun, 21 Aug 2011 23:55:46 +
Doug Ewell d...@ewellic.org  wrote:


What's a LANGUAGE MARK?

There are *three* strong directionalities - 'L' left-to-right, 'AL'
right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
suspect).  'AL' and 'R' have different effects on certain characters
next to digits - it's the mind-numbing part of the BiDi algorithm.
With one a $ sign after a string of European (or is it Arabic?) digits
appears on the left and in the other it appears on the right.  I
can't remember whether 'higher-level protocols' have an effect on this
logic. LRM has a BC of L, RLM has a BC of R, but no invisible character
has a BC of AL. That's why I tentatively raised the notion of ARABIC
LANGUAGE MARK.  Incidentally, an RLO gives characters with a
temporary BC of R, not AL.
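
The classes mentioned above can be checked against whatever Unicode data ships with a given Python build. This is only a quick illustration using the standard unicodedata module; the labels in the dictionary are informal names chosen for this sketch.

```python
import unicodedata

samples = {
    "LRM": "\u200e",          # LEFT-TO-RIGHT MARK
    "RLM": "\u200f",          # RIGHT-TO-LEFT MARK
    "HEBREW ALEF": "\u05d0",
    "ARABIC ALEF": "\u0627",
    "PUA U+E000": "\ue000",   # first BMP private use code point
}

# Map each label to the Bidi_Class reported for its character.
classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bc in classes.items():
    print(f"{label}: {bc}")
```

The private use code point reports the default strong left-to-right class, which is the behaviour this thread is about.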

Richard.

RE: RTL PUA?

2011-08-22 Thread Jonathan Rosenne

I don't buy the assumption that all the world is either AAT, Graphite or 
Uniscribe.

Anyhow, this discussion is going off topic, the issue is should Unicode specify 
an RTL PUA area, not whether some products, however respectable, provide a 
bypass.

Jony

 -Original Message-
 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
 Behalf Of Shriramana Sharma
 Sent: Monday, August 22, 2011 8:12 AM
 To: unicode@unicode.org
 Subject: Re: RTL PUA?
 
 On 08/22/2011 08:24 AM, Peter Constable wrote:
  I'm not saying that there shouldn't be _some_ software that can do
  what you expect. But there will likely be some different views on
  what ought to be included within that some.
 
 Peter, given that both AAT and Graphite have provisions for assigning
 custom properties including BC to PUA characters, it seems Uniscribe is
 the only one missing out. Those advocating RTL PUA areas seem to reject
 AAT and Graphite as "hacks" or "wow, *one* application" [*].
 
 [* = LibreOffice is the *only* multipurpose application running on
 /Windows/ to support Graphite and I'm not counting SIL WorldPad. On *nix
 platforms, *any* number of applications that use HB-NG for rendering
 will be able to handle Graphite in the near future because HB-Graphite
 integration is already done. That is to say, once GTK and Qt fully
 switch to HB-NG.]
 
 Anyhow, if you Microsoft guys added support in Uniscribe for ascribing
 custom properties including BC to PUA characters (or have you already
 done it) it would be what would satisfy these PUA RTL users and convince
 them that no RTL PUA zones are needed, it seems.
 
 The suggestion has been made that fonts should be able to carry some
 additional custom tables specifying custom properties for PUA
 characters, which seems reasonable. I'm not sure if the OT GDEF table or
 the AAT PROP table completely satisfies this requirement. People
 interested in using custom properties for the PUA (which includes me
 for Indic script) should then sit up and formulate the syntax for such
 tables.
 
 If Uniscribe, AAT, and Harfbuzz then provided generic support for
 parsing such tables and rendering PUA characters accordingly, it would
 be an all-around solution both for RTL PUA as well as Indic PUA, I
 suppose. (But I'm not sure how such a custom table would interact with
 the innate ability of Graphite to handle custom properties. It should
 probably be either the new proposed custom table or Graphite.)
 
 [sigh]
 
 --
 Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Michael Everson

On 22 Aug 2011, at 03:57, Peter Constable wrote:

 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On 
 Behalf Of Asmus Freytag
 
 Treating PUA characters as ON is very problematic
 
 As would be changing the default property of PUA characters from L to ON.

Which is why that will not be proposed.

Michael Everson * http://www.evertype.com/

Re: RTL PUA?

2011-08-22 Thread Michael Everson

On 22 Aug 2011, at 05:53, Shriramana Sharma wrote:

 While I don't know much about RTL scripts, if the logic order is ALEF + 
 LAMED, but the presentation order is LAMED + ALEF *because of the RTL nature* 
 do you write the rule as ALEF + LAMED = ALEF_LAMED_LIGATURE or LAMED + ALEF = 
 ALEF_LAMED_LIGATURE ?

The specific shape of that ligature is not a result of the directionality 
property.

Michael Everson * http://www.evertype.com/

Re: RTL PUA?

2011-08-22 Thread Petr Tomasek

On Mon, Aug 22, 2011 at 10:42:05AM +0530, Shriramana Sharma wrote:
 On 08/22/2011 08:24 AM, Peter Constable wrote:
 I'm not saying that there shouldn't be _some_ software that can do
 what you expect. But there will likely be some different views on
 what ought to be included within that some.
 
 Peter, given that both AAT and Graphite have provisions for assigning 
 custom properties including BC to PUA characters, it seems Uniscribe is 
 the only one missing out. Those advocating RTL PUA areas seem to reject 
 AAT and Graphite as "hacks" or "wow, *one* application" [*].

I personally would say to make some blocks in Plane 16 default to R, some
AL and some ON. For fonts based on rendering engines that
don't allow fonts to change character properties this would be
crucial; for those engines that are capable of changing the properties
it would present no problem (the font can change these properties
arbitrarily even if it defaults to RTL...).

 [* = LibreOffice is the *only* multipurpose application running on 
 /Windows/ to support Graphite and I'm not counting SIL WorldPad. On *nix 
 platforms, *any* number of applications that use HB-NG for rendering 
 will be able to handle Graphite in the near future because HB-Graphite 
 integration is already done. That is to say, once GTK and Qt fully 
 switch to HB-NG.]

That said, HarfBuzz-ng itself (i.e. its own engine) tries to imitate
Uniscribe. Most probably, Graphite fonts will still be an exception
on these systems...

 [sigh]
 
 -- 
 Shriramana Sharma
 

-- 
Petr Tomasek http://www.etf.cuni.cz/~tomasek
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 04:34 PM, Behdad Esfahbod wrote:

On 08/22/11 06:53, Shriramana Sharma wrote:



  While I don't know much about RTL scripts, if the logic order is ALEF + LAMED,
  but the presentation order is LAMED + ALEF *because of the RTL nature* do you
  write the rule as ALEF + LAMED = ALEF_LAMED_LIGATURE or LAMED + ALEF =
  ALEF_LAMED_LIGATURE ?

Depends on your specific shaping engine logic.  OpenType assumes native
direction per script.  So if you have Arabic text between LRO/PDF, you have to
reverse the order then apply OpenType shaping.  Other engines may decide to
handle these differently.  But the general statement is true: ligatures are
visual artifacts and hence only form in one direction, not the other (except
if it's, say, the ff ligature).


Hi Behdad. I only asked whether the OT *tables* would contain the 
entries in the logical order or the visual order. Clearly it would still 
be the visual order (but Philippe Verdy seemed to imagine/suggest 
otherwise).


It is clear that in the *script itself* the ligature would form in the 
direction of writing.


--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 05:26 PM, Behdad Esfahbod wrote:

OpenType tables contain entries in the logical order of the script in
question.  Ie. Arabic tables are always RTL.


Yes I understand, but still, to clarify:

The font tables themselves contain only ASCII characters I presume. In 
them, do you write:


ALEF + LAMED = ALEF_LAMED_LIGATURE

or

LAMED + ALEF = ALEF_LAMED_LIGATURE ?

IIUC, in logical order ALEF precedes LAMED, and in visual order, ALEF 
stands to the right of LAMED.


--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 12:21 PM, Jonathan Rosenne wrote:

I don't buy the assumption that all the world is either AAT, Graphite
or Uniscribe.


Nobody asserted that either. It is only pointed out that major 
implementations are able to provide what you seek.



Anyhow, this discussion is going off topic, the issue is should
Unicode specify an RTL PUA area, not whether some products, however
respectable, provide a bypass.


I don't see why you call it a *bypass*. Only if the road in front of you 
presents obstacles and does not allow you to proceed further, you need 
to take a bypass. If we are considering the Standard as the road which 
we need to take, the road doesn't present any obstacle to using PUA 
characters as RTL, so Graphite etc are not providing a *bypass* but in 
fact just being good generous implementations that allow custom 
properties for the PUA as the Standard allows.


The request being made to allocate BC=R areas in the PUA is sure to 
generate an impression that conformant implementations should consider 
such a property normative, which then would violate the definition of 
the PUA that conformant implementations need not treat any property of 
the PUA as normative.


Returning to your concerns, it is being asserted that since 
implementations are *already* able to provide for custom properties for 
the PUA, there is *no* need for Unicode to specify an RTL PUA area and 
furthermore as such a specification would violate the definition of the 
PUA, it should also *not* be done. One both *need* not do it and 
*should* not do it.


--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Joó Ádám

 Um... Computers are hardware, and don't understand a thing. What I think you 
 mean is computer _software_. (I know, I'm being pedantic, but with good 
 reason.)

Sorry, I just can’t resist pointing out that the difference between
hardware and software is only the fact that the former is material,
with all the consequences that follow. In any other way they are
completely interchangeable.

As for the other part of your mail, Peter, sorry, but it really
doesn’t make any sense to me. As John has pointed out, you can adjust
the properties of private use characters on Apple computers. Perhaps
there is a way to do so on Windows, Unix and other systems as well.
What Philippe and Doug are proposing, and I also strongly agree with,
is to have a standard way of interchanging these properties. I don’t
think it is necessary to go into the advantages of standards.

Speaking of actual implementation, I’m convinced that this format
should be the same as it is for encoded characters (whether it is the
plain text format of the Unicode Character Database, XML or anything
else). Rendering engines should – maybe they already do so – accept
multiple files containing character properties, which could make
upgrades to the newer versions of the standard a matter of downloading
the new standard set, and provide a way of overriding private use (or
even standard if one is so inclined) characters’ properties.
Introduction of unencoded scripts would therefore become a matter of
distributing a small properties file and the corresponding fonts.
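
A sketch of how several property files might be layered, assuming each file has already been parsed into a dictionary keyed by code point. The property values and the merge_layers function are invented for illustration; no existing rendering engine is claimed to work this way.

```python
def merge_layers(*layers):
    """Merge property tables; later layers override earlier ones per property."""
    merged = {}
    for layer in layers:
        for cp, props in layer.items():
            merged.setdefault(cp, {}).update(props)
    return merged


# Defaults as a standard data file might give them (abbreviated).
standard = {
    0x05D0: {"gc": "Lo", "bc": "R"},
    0xE100: {"gc": "Co", "bc": "L"},
}

# A small private file distributed alongside a font for an unencoded script.
private = {
    0xE100: {"bc": "R"},
}

merged = merge_layers(standard, private)
```

Merging per property rather than per code point means a private file only needs to list the values it actually changes.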


Á

Re: RTL PUA?

2011-08-22 Thread Mark E. Shoulson


On 08/22/2011 08:26 AM, Shriramana Sharma wrote:

On 08/22/2011 05:26 PM, Behdad Esfahbod wrote:

OpenType tables contain entries in the logical order of the script in
question.  Ie. Arabic tables are always RTL.


Yes I understand, but still, to clarify:

The font tables themselves contain only ASCII characters I presume. In 
it do you write:


ALEF + LAMED = ALEF_LAMED_LIGATURE

or

LAMED + ALEF = ALEF_LAMED_LIGATURE ?

IIUC, in logical order ALEF precedes LAMED, and in visual order, ALEF 
stands to the right of LAMED.


In the ligature tables, it's recorded as ALEF + LAMED = 
ALEF_LAMED_LIGATURE.  The font tables are concerned with what happens 
when this character follows that one, not what happens when this 
character stands on the right of that one.  So it's stored in logical 
order.


~mark

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Peter Constable peter...@microsoft.com:
 From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

 As I explained in an earlier message, the layout engine doesn't use
 the default property value but the resolved bidi level.

 Once again, you refuse to understand my arguments.

 I don't think I'm refusing to understand anything. I'm merely taking your 
 assertions _as stated_ and evaluating whether I think they are accurate or 
 not. Perhaps what you intend to convey assumes things not clear in what 
 you've stated, since you think I'm not understanding you.


 What I'm saying is that OpenType CANNOT resolve the bidi level of
 PUAs (with the exception where we use additional BiDi controls,

 Of course _OpenType_ cannot, but any rendering engine that uses OpenType 
 _must_ resolve the bidi level of _all_ characters in a sequence that it is 
 given to render. Given our current situation, a default rendering 
 implementation would resolve PUA characters to an even (LTR) level unless, of 
 course, bidi control characters -- particularly RLO -- are used to override 
 the directionality of the character, as you mention.

 which remains a hack, because it adds unnecessary invisible markup
 around the encoded texts, and complicates the use of strings and
 substrings).

 Well, depending on how you define hack, some might reasonably suggest that 
 any usage of PUA is a hack. (Of course, some who may not use the term in 
 the same way might argue that it is certainly not a hack.)

 You can turn the problem as you want, but PUAs (as well as unknown
 characters) still have default properties that, in fine, will get used in 
 absence of a more precise definition (i.e. an explicit override) of the 
 actual BiDi property needed for the character.

So now I perceive your opinion :

- you don't want the solution proposed by Michael Everson (simply
adding a range of RTL PUA), that I also think is not necessary, but is
clearly a possible solution.

- you propose to use BiDi overrides. I also think (like Michael
Everson) that this is an impractical hack. Michael Everson, who has to
work with old scripts, or with many new unencoded characters to
add to existing scripts (notably Arabic), trying to encode them,
finding various ways to represent them, and *testing* his solutions, will
certainly think that embedding each occurrence of a PUA substring in
BiDi controls, including in the middle of Arabic words, is a
very bad hack.

- He must certainly think (and I think so too) that PUA characters
are NOT hacks. They are architectural to the well-being of the UCS,
essential in various situations to preserve software conformance
with the standard. In fact, for old and rare scripts, using PUAs will
remain essential for a long time, because those scripts will now need
more and more time to get encoded, requiring more extensive research
and more collaboration with less technically aware people who cannot
understand why they have to test the proposed solutions using test
fonts and test input methods that require them to enter BiDi controls
around all those PUA characters.

The only problem here is the strong LTR property of all existing PUAs,
as if they were only needed for rare Han sinograms, or for symbols.

Note that, for using a PUA for rare letters found in Arabic, it is
impossible to embed the whole Arabic text in Bidi overrides: this
would completely break the normal behavior of the non-PUA characters
found in the text, notably sequences of Arabic digits, because the
BiDi controls are effectively disabling the BiDi algorithm so that it
will return a single RTL run for all the text in these controls. If
BiDi controls are used, they have to be inserted ONLY around the
subranges containing the PUAs, and only those.
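
The constraint above (controls only around the PUA subranges, never
around the whole text) can be sketched as follows. This is only an
illustration: the function names and the list of "RTL" PUA ranges are
assumptions, and it uses RLO/PDF because those are the controls
mentioned in this thread, not because they are the only possible choice.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

def is_rtl_pua(ch, rtl_ranges):
    """True if ch falls in one of the privately agreed RTL PUA ranges."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in rtl_ranges)

def wrap_rtl_pua_runs(text, rtl_ranges):
    """Wrap each maximal run of RTL PUA characters in RLO ... PDF."""
    out = []
    in_run = False
    for ch in text:
        rtl = is_rtl_pua(ch, rtl_ranges)
        if rtl and not in_run:
            out.append(RLO)   # a PUA run starts here
        elif not rtl and in_run:
            out.append(PDF)   # the PUA run ended before this character
        out.append(ch)
        in_run = rtl
    if in_run:
        out.append(PDF)       # close a run that reaches the end of the text
    return "".join(out)

# Hypothetical agreement: U+E000..U+E1FF are right-to-left letters.
wrapped = wrap_rtl_pua_runs("a\ue000\ue001b", [(0xE000, 0xE1FF)])
print([hex(ord(c)) for c in wrapped])
```

All non-PUA characters stay outside the overrides, so the normal Bidi
algorithm still applies to them (digits included).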

The solution proposed by Michael (a new block of RTL PUAs, probably in
plane 14) still has an advantage: no BiDi controls are needed at all.
The BiDi algorithm does not have to be disabled. All other aspects of
RTL scripts (or mixed RTL/LTR scripts) are preserved, including
mirroring behaviors for auto-LTR characters (at the beginning of
paragraphs) and characters whose directionality depends on the
resolved direction of the preceding text.

I don't think this is necessary though: I see no reason why
implementations *have to* keep the strong LTR property of existing
PUAs. This strong LTR property is only the consequence of the fact
that this is only the *default* value of those PUAs, and applications
should not be restricted from changing this property as they want,
especially for PUAs.

But to change this property value, we need an explicit PUA agreement
about their usage, in such a way that it can be understood by a
computer. This means an external source of character properties. My
opinion is that this need is most often sufficient if it solves just
the problem of correct display order. Given that the encoded texts
(using those existing strong LTR PUAs that we want to adopt a RTL

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Peter Constable peter...@microsoft.com:
 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On 
 Behalf Of Asmus Freytag

 Treating PUA characters as ON is very problematic

 As would be changing the default property of PUA characters from L to ON.

I also agree with that. This is a bad option that would break
compatibility (the solution advocated by Michael Everson seems better,
in that perspective, because it does not change any existing property
given to existing assigned PUAs).

Anyway when I spoke about a computer note that I did not use the
definite article. It is evidently implied that there's also a need for
software changes as well (so this does not mean *all* computers, but
this could someday reach *most* computers with their installed or
upgraded software). Your last remark in another message of this
thread was really pedantic.

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Shriramana Sharma samj...@gmail.com:
 On 08/22/2011 12:01 AM, Peter Constable wrote:

 If you mean a rule to substitute [g1 g2] with [g3] won't apply if the
 sequence processed by the OpenType Layout lookup processor is [g2
 g1],

 Peter, actually I suspect Philippe is thinking that in the case of RTL, the
 *glyphs* are placed in reverse order and then he is asking how can the
 ligation take place.

No, I've not said anything about ligation. But yes the problem is
related to the expected reverse order of glyphs, for some PUAs, but
not necessarily all of them (not the LTR runs of PUAs, after Bidi
resolution). Ligation is a completely orthogonal problem (not really a
problem because it is already solved).

RE: RTL PUA?

2011-08-22 Thread Murray Sargent

It's actually quite easy to convince Uniscribe to treat specific characters as 
RTL, others as LTR, and, in general, with whatever classifications you desire. 
Pass a preprocessed string to Uniscribe's ScriptItemize(). RichEdit has used 
that approach to some degree starting with RichEdit 3.0 (Windows/Office 2000). 
It's also a handy way to force all operators to be treated as LTR in an LTR 
math zone and as RTL in an RTL math zone (aside from numeric contexts for '.' 
and ','). And you can force IRIs to display LTR or RTL that way by classifying 
the delimiters such as the dots in the domain name accordingly. Some of my blog 
posts on http://blogs.msdn.com/b/murrays/ discuss this in greater detail.

So there's no need to change the properties of the PUA to establish PUA RTL 
conventions. They won't be generally interchangeable, but that's the nature of 
the PUA. You also have to implement such choices using rich/structured text. 
Plain text doesn't have a place to store the necessary properties. Most text is 
rich text anyway <grin>.

Murray

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Mark E. Shoulson m...@kli.org:
 I'm not certain I understand the question, but if I have it right... The
 logical order is ALEF + LAMED, and the presentation... places those in a
 right-to-left sequence, shall we say (since talking about the presentation
 *order* is confusing here).  The font table contains the lookup that ALEF +
 LAMED = ALEF_LAMED_LIGATURE.  It all goes according to the logical order,
 since the presentation order isn't really an order, it's just a direction.
  (this is different from things like devanagari short-i vowel, which moves
 with respect to the other letters in the script.)

Lookup tables in fonts (at least OpenType) do not work at the
character level, but at the glyph level: they substitute glyph ids by
other glyph ids. Sequences of glyph ids are already reordered in
visual order by the layout engine when they are searched in OpenType
lookups, should they be RTL glyphs, or Indic glyphs with special
reordering requirements (independent of the logical ordering of
characters/code points).

In addition, the same sequence of characters may be sometimes searched
in several distinct sequences of glyph ids (this depends on the kind
of OpenType table being consulted, as well as on character properties
which also determine which lookup table will be searched and the
relative order of successive lookups).

The only lookup table in fonts that works at the character/code point
level is the cmap (which maps a default glyph id from each encoded
character, independently of their logical or visual ordering, as well
as independently of the script/language in which those characters or
glyphs are used, but possibly depending on the encoding used and the
software platform supporting that encoding).

Not all fonts need a cmap; for some of them, a default cmap may be
implied or automatically constructed -- for example Symbol fonts in
Windows, that are implicitly mapped in a PUA range; another example is
Type1 or CFF fonts that have a default standardEncoding inherited
from PostScript, based on glyph names (rather than glyph ids or code
points) that may have themselves an implicit mapping to UCS codepoints
(if these names are those defined in the AGL). Not all these mappings
are 1-to-1, which means that they are not reversible, in the general
case.

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Shriramana Sharma samj...@gmail.com:
 Hi Behdad. I only asked whether the OT *tables* would contain the entries in
 the logical order or the visual order. Clearly it would still be the visual
 order (but Philippe Verdy seemed to imagine/suggest otherwise).

No ! I've not imagined that. You incorrectly reinterpret
imaginatively another incorrect imaginative reinterpretation, made by
someone else, of what I wrote, which did not even suggest that.

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Shriramana Sharma samj...@gmail.com:
 On 08/22/2011 05:26 PM, Behdad Esfahbod wrote:

 OpenType tables contain entries in the logical order of the script in
 question.  Ie. Arabic tables are always RTL.

 Yes I understand, but still, to clarify:

 The font tables themselves contain only ASCII characters I  presume.

No. The lookup tables contain sequences of numeric glyph ids (16 bit
integers in TrueType and OpenType). Which are also not the code point
values, and not the character names or glyph names.

 you write:

 ALEF + LAMED = ALEF_LAMED_LIGATURE

 or

 LAMED + ALEF = ALEF_LAMED_LIGATURE ?

Let's say that:
- the LAMED character is cmap'ped (by its code point value in a cmap
for Unicode, or by its code position in a cmap for another legacy
8-bit encoding) to the glyph id 1012,
- and the ALEF character is cmapped to the glyph id 1001 (the values
of glyph ids are not important, not even their relative order or
differences, they don't need to obey any standard),
- and the ALEF-LAMED ligature is in glyph id 1540 (the ALEF-LAMED
character of the UCS may also be cmapped separately, but this is not a
requirement)

Then the lookup to perform the ligature will contain: (1012, 1001) -> (1540).

Glyph id's are presented and scanned in the lookup table, in sequences
preordered in visual order by the text layout/shaping engine.

However, given that the ALEF-LAMED is also a character of the UCS, the
text layout/shaping engine that knows the Arabic script can also
perform a character-based substitution itself, even in absence of the
lookup of glyph ids in fonts; then it can render the ligature
character according to the glyph id to which it is cmapped in that
font.

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Joó Ádám a...@jooadam.hu:
 Um... Computers are hardware, and don't understand a thing. What I think you 
 mean is computer _software_. (I know, I'm being pedantic, but with good 
 reason.)

 Sorry, I just can’t resist pointing out that the difference between
 hardware and software is only the fact that the former is material,
 with all the consequences that follow. In any other way they are
 completely interchangeable.

Same opinion for me.

 As for the other part of your mail, Peter, sorry, but it really
 doesn’t make any sense to me. As John has pointed out, you can adjust
 the properties of private use characters on Apple computers. Perhaps
 there is a way to do so on Windows, Unix and other systems as well.
 What Philippe and Doug are proposing, and I also strongly agree with,
 is to have a standard way of interchange of these properties. I don’t
 think it is necessary to go into the advantages of standards.

 Speaking of actual implementation, I’m convinced that this format
 should be the same as it is for encoded characters (whether it is the
 plain text format of the Unicode Character Database, XML or anything
 else). Rendering engines should – maybe they already do so – accept
 multiple files containing character properties, which could make
 upgrades to the newer versions of the standard a matter of downloading
 the new standard set, and provide a way of overriding private use (or
 even standard if one is so inclined) characters’ properties.
 Introduction of unencoded scripts would therefore become a matter of
 distributing a small properties file and the corresponding fonts.

As well, the small properties files can be embedded, in a very compact
form, in the PUA font.

This small table can be limited to just listing the ranges of PUA code
points that are strong RTL instead of LTR. Most often, there will be
only one range, and this just requires a couple of integers in that
embedded table (possibly more, only if you want to represent more
properties), without requiring a complex XML parser or a complex
parser for the tabulated ASCII format used in the UCD, which is
overkill for just the few properties that are needed for correct
display.
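
To make the "couple of integers" concrete, here is one way such an
embedded table could be laid out and read back. The layout (a 16-bit
count followed by pairs of 32-bit big-endian code points) is purely
hypothetical; it is not an existing TrueType, OpenType, AAT or Graphite
table, and the function names are invented for this sketch.

```python
import struct

def pack_rtl_ranges(ranges):
    """Serialize [(first, last), ...] as: uint16 count, then uint32 pairs."""
    data = struct.pack(">H", len(ranges))
    for start, end in ranges:
        data += struct.pack(">II", start, end)
    return data

def unpack_rtl_ranges(data):
    """Read the table back into a list of (first, last) tuples."""
    (count,) = struct.unpack_from(">H", data, 0)
    return [struct.unpack_from(">II", data, 2 + 8 * i) for i in range(count)]

# One strong-RTL range, as in the "most often only one range" case.
table = pack_rtl_ranges([(0xE000, 0xE1FF)])
print(len(table), unpack_rtl_ranges(table))
```

A single range costs ten bytes in this layout, which is the sense in
which duplicating the table in each font is cheap.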

So the duplication in each font is not a real problem (note that there
won't be a lot of fonts, most often there will be only one that
matches the PUA agreement and that is suitable to render the
UCS-encoded PUA text).

RE: RTL PUA?

2011-08-22 Thread Doug Ewell

Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 As well, the small properties files can be embedded, in a very compact
 form, in the PUA font.

As soon as you embed all the information in the font, you require
different solutions for systems that use different font technologies.
I was thinking of something more portable.

 This small table can be limited to just listing the ranges of PUA code
 points that are strong RTL instead of LTR. Most often, there will be
 only one range, and this just requires a couple of integers in that
 embedded table (possibly more, only if you want to represent more
 properties), without requiring a complex XML parser or a complex
 parser for the tabulated ASCII format used in the UCD, which is
 overkill for just the few properties that are needed for correct
 display.

I generally assume there is more to character handling than display.

 So the duplication in each font is not a real problem (note that there
 won't be a lot of fonts, most often there will be only one that
 matches the PUA agreement and that is suitable to render the
 UCS-encoded PUA text).

Depending on how you count, there are already two to four fonts that
support Ewellic in the PUA.  There are probably many more that support
Tengwar or Cirth or Klingon.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/20/2011 10:54 AM, Shriramana Sharma wrote:

On 08/19/2011 10:05 PM, Mark Davis ☕ wrote:

All of the property assignments to PUA characters (except the GC) are
purely informative.


I just now noticed that you had excepted the GC in the above. Why is
that? How are applications supposed to handle combining marks etc if in
the PUA?


Mark, can you please reply to the above --

It seems that while it is true that GC=Co should be retained *in the 
standard* to clearly identify the character as a PUA character, the 
applications will still be changing that GC to Lo, Mc, Mn, No etc for 
their internal private-agreement processing. So what is the exact nature 
of your excepting the GC in your statement above?


--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 05:20 PM, Shriramana Sharma wrote:


Hi Behdad. I only asked whether the OT *tables* would contain the
entries in the logical order or the visual order. Clearly it would still
be the visual order


My mistake: I should have said *logical* order.


(but Philippe Verdy seemed to imagine/suggest
otherwise).


This one is correct w.r.t. what I had *intended* to say above: i.e. 
Philippe thinks the entries contain the glyphs in *visual* order.


See other mail replying to Philippe pointing this out.

--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 09:00 PM, Philippe Verdy wrote:


The font tables themselves contain only ASCII characters I  presume.


No. The lookup tables contain sequences of numeric glyph ids (16 bit
integers in TrueType and OpenType). Which are also not the code point
values, and not the character names or glyph names.


And numeric glyph IDs are still ASCII aren't they? I was just noting 
that the glyph tables themselves don't *use* the actual codepoints of 
the characters getting ligated (while they *refer* to them).



Let's say that:
- the LAMED character is cmap'ped (by its code point value in a cmap
for Unicode, or by its code position in a cmap for another legacy
8-bit encoding) to the glyph id 1012,
- and the ALEF character is cmapped to the glyph id 1001 (the values
of glyph ids are not important, not even their relative order or
differences, they don't need to obey any standard),
- and the ALEF-LAMED ligature is in glyph id 1540 (the ALEF-LAMED
character of the UCS may also be cmapped separately, but this is not a
requirement)

Then the lookup to perform the ligature will contain: (1012, 1001) -> (1540).


No! See Behdad's post -- it is clearly said that the lookup will still 
be in logical order (1001, 1012) -> (1540) and not in visual order as 
you say. See? This is what I meant in the other mail when I said you 
suggested that the tables contain the characters in visual order and 
not in logical order, to which you replied (without much real 
explanation I'm afraid):


<quote>No ! I've not imagined that. You incorrectly reinterpret
imaginatively another incorrect imaginative reinterpretation, made by
someone else, of what I wrote, which did not even suggest that.</quote>


Glyph id's are presented and scanned in the lookup table, in sequences
preordered in visual order by the text layout/shaping engine.


Nope -- they are placed in the lookup table in *logical* order. IIUC the 
entire sequence of glyphs is only reordered from RTL at the very end. 
Peter or Behdad, can you corroborate this?


--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 09:31 PM, Doug Ewell wrote:

Philippe Verdy verdy underscore p at wanadoo dot fr wrote:


As well, the small properties files can be embedded, in a very compact
form, in the PUA font.


As soon as you embed all the information in the font, you require
different solutions for systems that use different font technologies.


Why? In the end all the systems base upon the character properties 
specified by the standard. For the PUA characters in question, what is 
needed is a table of properties to override the default ones. The 
systems would then handle those new properties in the same way that they 
would handle the regular ones. Granted, if the renderers hardcode the 
properties (as most OT ones do) then some parsing is required to import 
all the override data provided by the extra font table into a struct or 
such -- after which (I presume) it would be possible (to a large 
extent?) to treat it the same as an encoded script. [Actually, this 
seems quite difficult to implement in OT, where the philosophy is to 
explicitly hardcode the properties, but Graphite and AAT should be fine 
I guess.]



I generally assume there is more to character handling than display.


True -- so if someone wanted a PUA script to be handled properly in 
sorting etc one would have to prepare collation tables which would 
obviously go *outside* the font.


--
Shriramana Sharma

RE: RTL PUA?

2011-08-22 Thread Doug Ewell

Shriramana Sharma samjnaa at gmail dot com wrote:

 As soon as you embed all the information in the font, you require
 different solutions for systems that use different font technologies.
 
 Why? In the end all the systems base upon the character properties 
 specified by the standard. For the PUA characters in question, what is 
 needed is a table of properties to override the default ones. The 
 systems would then handle those new properties in the same way that they 
 would handle the regular ones.

Right, so if you embed that table in an OT font, the information is not
available to a system that uses a font technology other than OT.

What is needed is a way to specify the properties in a
platform-independent way, where platform means not only OS but also
font technology.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Re: RTL PUA?

2011-08-22 Thread Petr Tomasek

On Mon, Aug 22, 2011 at 07:51:22AM -0700, Doug Ewell wrote:
 Some PUA properties, like glyph shapes and maybe directionality, can be
 stored in a font.  Others, like numeric values and casing, might not or
 cannot.  An interchangeable format needs to be agreed upon for the

Why not?

P.T.

-- 
Petr Tomasek http://www.etf.cuni.cz/~tomasek
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 10:12 PM, Doug Ewell wrote:

Right, so if you embed that table in an OT font, the information is not
available to a system that uses a font technology other than OT.


I don't understand why you would say so -- assuming we are all talking 
about TrueType fonts, AAT just uses some tables, OT others and Graphite 
still others. They are all just tables appended to the TrueType font 
data. Any software that is able to read TT font data can also read the 
tables. So what's the problem?


--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread John Hudson


Shriramana Sharma wrote:

The font tables themselves contain only ASCII characters I presume. 


OpenType Layout tables use Glyph IDs. OTL development tools typically 
use glyph names, which may be particular to the tool or the same names 
used in the post or CFF tables.


OTL tables work on glyphs, not characters, and bidi will have been 
resolved prior to application of OTL substitution and positioning. Input 
glyph strings for substitution lookups are always in the resolved 
direction of the glyph run, so Arabic and Hebrew alphabetic runs are 
processed right-to-left, i.e.


alef lamed -> alef_lamed

*not*

lamed alef -> alef_lamed

Similarly, context strings for glyph positioning (if present) will be 
right-to-left, although anchor attachment positions on individual glyphs 
are relative to the 0,0 coordinate, i.e. the left sidebearing.
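
As a rough illustration of the two steps described here (characters to
glyph IDs through the cmap, then a ligature substitution on the glyph
string in resolved order), consider the toy model below. The glyph IDs
1001, 1012 and 1540 are the hypothetical values used earlier in this
thread, the dictionaries merely stand in for the real binary cmap and
GSUB structures, and the function names are invented for the sketch.

```python
# Toy stand-ins for font tables; real cmap/GSUB data is binary and richer.
CMAP = {0x05D0: 1001,   # HEBREW LETTER ALEF  -> hypothetical glyph ID
        0x05DC: 1012}   # HEBREW LETTER LAMED -> hypothetical glyph ID
LIGATURES = {(1001, 1012): 1540}  # alef lamed -> alef_lamed

def map_characters(text, cmap):
    """Character string -> glyph ID string, one glyph per character."""
    return [cmap[ord(ch)] for ch in text]

def apply_ligatures(glyphs, ligatures):
    """Replace each matching pair of adjacent glyph IDs by its ligature."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in ligatures:
            out.append(ligatures[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

glyphs = map_characters("\u05D0\u05DC", CMAP)   # ALEF then LAMED
print(apply_ligatures(glyphs, LIGATURES))       # the pair matches
glyphs = map_characters("\u05DC\u05D0", CMAP)   # LAMED then ALEF
print(apply_ligatures(glyphs, LIGATURES))       # no match, left unchanged
```

The lookup keys on which glyph follows which in the input string, not on
which glyph ends up to the right of the other on screen.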


JH



--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_

RE: RTL PUA?

2011-08-22 Thread Doug Ewell

Petr Tomasek tomasek at etf dot cuni dot cz wrote:

 Some PUA properties, like glyph shapes and maybe directionality, can
 be stored in a font.  Others, like numeric values and casing, might
 not or cannot.  An interchangeable format needs to be agreed upon for
 
 Why not?

Where does one store numeric values in a font?  Maybe this should be
taken off-list.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Re: RTL PUA?

2011-08-22 Thread John Hudson


Shriramana Sharma wrote:

I was just noting 
that the glyph tables themselves don't *use* the actual codepoints of 
the characters getting ligated (while they *refer* to them).


Characters are mapped to glyph IDs in the font cmap tables.

Glyph IDs are mapped to other glyph IDs (one-to-one, one-to-many, 
many-to-one, or one-to-one-of-many) in the layout GSUB table.


No! See Behdad's post -- it is clearly said that the lookup will still 
be in logical order (1001, 1012) -> (1540) and not in visual order as 
you say.


I think there may be some confusion in this discussion over what 
constitutes 'visual order'. I try to avoid the term because it is 
difficult for right-to-left readers to accustom themselves to thinking 
of visual order as anything other than right-to-left. I prefer the term 
'reading order' or 'resolved order', i.e. resolved bidi and script 
shaping order, which may have involved integrated reordering (reordering 
within the glyph processing) as in the case of Indic scripts.


Nope -- they are placed in the lookup table in *logical* order. IIUC the 
entire sequence of glyphs is only reordered from RTL at the very end. 
Peter or Behdad, can you corroborate this?


Glyph ID inputs for OTL processing are according to reading/resolved 
order. This is typically the same as logical order, but the term logical 
order really applies to character strings, not glyph strings, which are 
much more malleable. The order of input strings in GSUB lookups or 
contexts is dependent not only on the underlying character order, but 
also on the results of previous GSUB lookups. So while, unlike AAT and 
Graphite, OpenType Layout doesn't explicitly provide for glyph 
re-ordering, some kinds of glyph reordering are possible using sequences 
of contextual lookups to duplicate a glyph in a second location in the 
string and then remove the first instance. We use this in some 
Devanagari fonts to enable subsequent ligation of short ikar variants to 
the left of a consonant base with reph marks to the right of that base.


JH



--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_

Re: RTL PUA?

2011-08-22 Thread William_J_G Overington

On Monday 22 August 2011, Philippe Verdy verd...@wanadoo.fr wrote:
 
 So there are only two options:
 
[snipped]
 
 ... : this requires an approval either by the UTC & WG2 (solution 1) or by 
 the OpenType working group (solution 2).
 
Would a third option work?
 
In the Description section of the Macintosh Roman section of a TrueType font, 
include a line of text in a plain text format of which the following line of 
text is an example.
 
PUA.RTL=$E000-$E1FF,$E440-$E447,$E541,$E549,$E57C,$EA00-$EA0F,$EC07;
 
One could specify precisely which Private Use Area characters were to become 
RTL when using that particular font.
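 
A minimal sketch of how rendering software might read such a line, 
assuming it has already obtained the description text from the font. 
The function name is invented for illustration, and nothing here claims 
that any existing renderer looks for this string.

```python
def parse_pua_rtl(description):
    """Extract (first, last) code point ranges from a 'PUA.RTL=...;' entry."""
    prefix = "PUA.RTL="
    start = description.find(prefix)
    if start < 0:
        return []                      # no such entry in this font
    end = description.find(";", start)
    if end < 0:
        return []                      # unterminated entry, ignore it
    ranges = []
    for item in description[start + len(prefix):end].split(","):
        first, _, last = item.strip().partition("-")
        lo = int(first.lstrip("$"), 16)
        hi = int(last.lstrip("$"), 16) if last else lo
        ranges.append((lo, hi))
    return ranges

line = "PUA.RTL=$E000-$E1FF,$E440-$E447,$E541,$E549,$E57C,$EA00-$EA0F,$EC07;"
for lo, hi in parse_pua_rtl(line):
    print(f"U+{lo:04X}..U+{hi:04X}")
```

Single code points are returned as ranges whose first and last values 
are equal, so callers only need one kind of membership test.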
 
One would need rendering software that looked for such a string of text in the 
font file, yet, as far as I am aware, no committee approval would be needed in order 
to put this solution into practical use.
 
William Overington
 
22 August 2011

Re: RTL PUA?

2011-08-22 Thread John H. Jenkins


Doug Ewell 於 2011年8月22日 上午10:59 寫道：

 Petr Tomasek tomasek at etf dot cuni dot cz wrote:
 
 Some PUA properties, like glyph shapes and maybe directionality, can
 be stored in a font.  Others, like numeric values and casing, might
 not or cannot.  An interchangeable format needs to be agreed upon for
 
 Why not?
 
 Where does one store numeric values in a font?  Maybe this should be
 taken off-list.
 


This is actually a relevant point.  The major TrueType variants all work 
primarily with glyphs, not characters.  Using them as a place to store 
information about the *characters* in the text is therefore not a reliable way 
to provide an override for default system behavior.  By the time the rendering 
engine consults the fonts for layout specifics, large chunks of the text 
processing will already be completed.  

OpenType, for example, expects that the bidi algorithm is largely run in 
character space, not glyph space, and therefore without regard for the specific 
font involved.  (AAT does almost everything in glyph space, including bidi.  
I'm not sure about Graphite.)  

The net result is that a font is an unreliable way of storing 
character-specific information useful on multiple platforms.  This is one 
reason why embedding the existing directionality controls within the text 
itself is currently the most reliable way of getting the behavior one might 
want in a platform-agnostic way.

=
Siôn ap-Rhisiart
John H. Jenkins
jenk...@apple.com

Re: RTL PUA?

2011-08-22 Thread Joó Ádám

 True -- so if someone wanted a PUA script to be handled properly in sorting
 etc one would have to prepare collation tables which would obviously go
 *outside* the font.

If a proper definition of an unencoded script needs additional
properties which cannot be stored in the font anyway, why would you
want to store part of it in OT tables? It’s just not the right place.
Fonts’ sole purpose is to display already defined characters, not to
define them. Tails shouldn’t be made wagging dogs.

Á

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/22/2011 10:55 PM, Joó Ádám wrote:

If a proper definition of an unencoded script needs additional
properties which cannot be stored in the font anyway, why would you
want to store part of it in OT tables? It’s just not the right place.
Fonts’ sole purpose is to display already defined characters, not to
define them. Tails shouldn’t be made wagging dogs.


True, but we are only trying to help those who find themselves unable to 
even *display* PUA characters as RTL (or as Indic with reordering, which 
can be handled by IndicMatraCategory). Since collation never cares about 
whether the script is LTR or RTL or Indic (with the exception of Thai etc 
where the encoding is as per visual order and not logical order) the 
collation data can be outside the font, since it is not needed for display.


--
Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread John H. Jenkins


William_J_G Overington 於 2011年8月22日 上午10:49 寫道：

 In the Description section of the Macintosh Roman section of a TrueType font, 
 include a line of text in a plain text format of which the following line of 
 text is an example.
 
 PUA.RTL=$E000-$E1FF,$E440-$E447,$E541,$E549,$E57C,$EA00-$EA0F,$EC07;
 

Forgive my asking, but this reference to the description section of the 
Macintosh Roman section of a TrueType font has me puzzled, because I don't 
know what you're talking about.  What table contains this string?

=
井作恆
John H. Jenkins
jenk...@apple.com

Re: RTL PUA?

2011-08-22 Thread William_J_G Overington

On Monday 22 August 2011, John H. Jenkins jenk...@apple.com wrote:
 
 Forgive my asking, but this reference to the description section of the 
 Macintosh Roman section of a TrueType font has me puzzled, because I don't 
 know what you're talking about.  What table contains this string?
 
When I use FontCreator, made by High-Logic, http://www.high-logic.com is the 
webspace: with a font file open, I can select Format from the menu bar and then 
select Naming... from the drop down menu.
 
That leads to a dialogue panel.
 
From that dialogue panel one may select, for an ordinary, basic Unicode font, 
either of two platforms, namely Macintosh Roman and Microsoft Unicode BMP only.
 
Having selected a platform, one may view the text content of various fields for 
that platform, such as font family name and copyright notice, version string 
and postscript name. There is then a button that is labelled Advanced... that, 
if clicked, opens another dialogue panel with various other text fields, 
including Font Designer and Description, which are the two that I often use.
 
Now, when the text values in the fields are stored in the font file, the values 
for the Macintosh Roman platform are stored in plain text and the values for 
the Microsoft Unicode BMP only platform are stored in some encoded format.
 
So, if one opens a TrueType font file in WordPad and one searches for an item 
of plain text that is in one of the fields of the font, then the text that is 
in the Macintosh platform can be found, yet the text that is in the Microsoft 
Unicode BMP only platform cannot be found.
 
So, I thought that if a manufacturer of a wordprocessing application or a 
desktop publishing application decided to make a special researcher's edition 
of the software, then that software could, when a font is selected, first scan 
the font for a PUA.RTL string and, if one is found, override the left-to-right 
nature of the identified characters to be a right-to-left nature, just while 
that font is selected.
 
Whether such a software package ever becomes available is something that only 
time will tell, yet it seems to me that it is a method that could be used 
without needing any changes by any committee.
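
For illustration only, here is a rough Python sketch of how an application might read such a string once it has been extracted from the font. The PUA.RTL syntax is just the suggestion made in this thread, not part of any font specification, and the function names are hypothetical.

```python
def parse_pua_rtl(decl: str) -> list[range]:
    """Parse 'PUA.RTL=$E000-$E1FF,$E541;' into a list of code point ranges."""
    prefix = "PUA.RTL="
    decl = decl.strip()
    if not (decl.startswith(prefix) and decl.endswith(";")):
        raise ValueError("not a PUA.RTL declaration")
    ranges = []
    for item in decl[len(prefix):-1].split(","):
        first, _, last = item.partition("-")
        lo = int(first.lstrip("$"), 16)
        # A single value such as $E541 is a one-element range.
        hi = int(last.lstrip("$"), 16) if last else lo
        ranges.append(range(lo, hi + 1))
    return ranges

def is_rtl(cp: int, ranges: list[range]) -> bool:
    """True if the code point falls in any declared right-to-left range."""
    return any(cp in r for r in ranges)
```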
 
William Overington
  
22 August 2011

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Doug Ewell d...@ewellic.org:
 Depending on how you count, there are already two to four fonts that
 support Ewellic in the PUA.  There are probably many more that support
 Tengwar or Cirth or Klingon.

First, these fonts can work fine with the default LTR directionality.
So there's no need for additional data for them. Second, even if they
were RTL, the needed info for each of these fonts, embedded in them
would be extremely small, reduced to just specifying the range of RTL
characters they need to contain.

So I don't see that as a problem. Those fonts do exist and are used
exactly because there was no problem for rendering them with texts
encoded in logical order (the same as the visual order). It's still
strange that we can have several fonts for esoteric scripts that have
been used effectively by very few people, when there are centuries of
traditions, and many interested users (but spread in very small
communities worldwide) that cannot use computer technologies to render
their favorite scripts, or that want to teach them, or make books and
other publications to expose them, as an important human cultural
heritage, even if this was only to translate them or transcribe them
in a more modern script.

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Shriramana Sharma samj...@gmail.com:
 On 08/22/2011 09:00 PM, Philippe Verdy wrote:

 The font tables themselves contain only ASCII characters I  presume.

 No. The lookup tables contain sequences of numeric glyph ids (16 bit
 integers in TrueType and OpenType). Which are also not the code point
 values, and not the character names or glyph names.

 And numeric glyph IDs are still ASCII aren't they? I was just noting that
 the glyph tables themselves don't *use* the actual codepoints of the
 characters getting ligated (while they *refer* to them).

 Let's say that;
 - the LAMED character is cmap'ped (by its code point value in an cmap
 for Unicode, or by its code position in a cmap for another legacy
 8-bit encoding) to the glyph id 1012,
 - and the ALEF character is cmapped to the glyph id 1001 (the values
 of glyph ids are not important, not even their relative order or
 differences, they don't need to obey any standard),
 - and the ALEF-LAMED ligature is in glyph id 1540 (the ALEF-LAMED
 character of the UCS may also be cmapped separately, but this is not a
 requirement)

 Then the lookup to perform the ligature will contain: (1012, 1001) -> (1540).

 No! See Behdad's post -- it is clearly said that the lookup will still be in
 logical order (1001, 1012) -> (1540) and not in visual order as you say.
 See? This is what I meant in the other mail by you suggesting that the
 tables containing the characters in visual order and not in logical order,
 to which you replied (without much real explanation I'm afraid):

 "No! I've not imagined that. You incorrectly reinterpret
 imaginatively another incorrect imaginative reinterpretation, made by
 someone else, of what I wrote, which did not even suggest that."

 Glyph id's are presented and scanned in the lookup table, in sequences
 preordered in visual order by the text layout/shaping engine.

 Nope -- they are placed in the lookup table in *logical* order. IIUC the
 entire sequence of glyphs is only reordered from RTL at the very end. Peter
 or Behdad, can you corroborate this?

Hmmm... then this is not very clear in the OpenType specification.
Well, it does not matter which order is physically used in the
stored table, as long as it is consistent.
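
To make the example concrete, here is a small Python sketch of a ligature lookup over glyph ids, assuming the logical-order reading given in the quoted reply. The glyph ids are the ones from the example; the Hebrew code points and the table layout are assumptions for illustration, and a real GSUB lookup is a binary structure, not a dictionary.

```python
# Hypothetical cmap: code point -> glyph id (ids taken from the example above).
CMAP = {
    0x05D0: 1001,  # HEBREW LETTER ALEF
    0x05DC: 1012,  # HEBREW LETTER LAMED
}

# Ligature lookup keyed by glyph-id sequences in logical order.
LIGATURES = {(1001, 1012): 1540}

def shape(text: str) -> list[int]:
    """Map characters to glyph ids, then substitute ligatures left to right."""
    glyphs = [CMAP[ord(ch)] for ch in text]
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

Note that the lookup never sees code points, only glyph ids, which is the one thing both sides of this exchange agree on.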

But this confirms that the OpenType rendering algorithm, the way it is
presented in the OpenType specification, is completely wrong: the Bidi
algorithm is definitely not the first step needed before performing
glyph substitutions.

However, the Bidi algorithm really needs to reorder the glyphs at least
relatively, for correct application of GPOS (glyph positioning). As
a consequence, the font to use will be completely known (all
cmap'pings will have been applied already, and no glyph substitution
can occur across distinct fonts that have independent glyph ids). As
such the PUA agreement implied by the PUA font would have been
asserted. Nothing then forbids using the font as THE reliable source
of information about which PUAs are RTL and which ones are LTR.

The computing order of features should not then be:
 - BiDi algorithm for reordering grapheme clusters
 - font search and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - GPOS
but really:
 - font lookup and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - BiDi algorithm for reordering glyphs representing the grapheme
clusters or ligatured grapheme clusters
 - GPOS

The BiDi algorithm absolutely does not have to be changed. This time
there's absolutely no PUA with unknown directionality if the font
defines the RTL property for these PUA (using the normative LTR only
as a default when the font does not specify it)
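
A minimal Python sketch of that last point, under the assumption that the font has somehow supplied a list of RTL ranges (no such table exists in OpenType today; `font_rtl_ranges` is hypothetical): private-use code points take the font's value and fall back to the normative L, while all other characters keep their standard bidi class.

```python
import unicodedata

def is_private_use(cp: int) -> bool:
    """True for the BMP PUA and the two supplementary private use planes."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)

def bidi_class(cp: int, font_rtl_ranges) -> str:
    """Bidi class with a font-supplied override that applies to PUA only."""
    if is_private_use(cp):
        return "R" if any(cp in r for r in font_rtl_ranges) else "L"
    # Assigned characters always keep their standard property value.
    return unicodedata.bidirectional(chr(cp))
```

The resulting classes would then be fed to an unchanged bidi algorithm.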

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 Shriramana Sharma samj...@gmail.com:
 True -- so if someone wanted a PUA script to be handled properly in sorting
 etc one would have to prepare collation tables which would obviously go
 *outside* the font.

Collation tables can already be tailored very easily with existing
technologies. And anyway this has nothing to do with directionality of
characters, or their rendering, on which they absolutely do not
depend.

Tailored collations already have a working standard and syntax in the
CLDR project or ICU and in a few other libraries (notably in CPAN for
Perl).
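
As a rough illustration that such ordering data can live entirely outside the font, here is a plain-Python stand-in for a tailoring. It is not CLDR or ICU syntax; the three PUA letters and their order are invented for the example.

```python
# Hypothetical tailoring: desired sort order of three PUA letters.
TAILORED_ORDER = ["\uE002", "\uE000", "\uE001"]
_RANK = {ch: i for i, ch in enumerate(TAILORED_ORDER)}

def sort_key(word: str):
    """Tailored letters sort by rank; anything else sorts after them by code point."""
    return [(0, _RANK[ch]) if ch in _RANK else (1, ord(ch)) for ch in word]

words = ["\uE000\uE001", "\uE002", "\uE001"]
ordered = sorted(words, key=sort_key)
```

Nothing in the key depends on directionality or on glyphs.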

Re: RTL PUA?

2011-08-22 Thread John H. Jenkins


William_J_G Overington wrote on 22 August 2011 at 12:36 PM:

 On Monday 22 August 2011, John H. Jenkins jenk...@apple.com wrote:
 
 Forgive my asking, but this reference to the description section of the 
 Macintosh Roman section of a TrueType font has me puzzled, because I don't 
 know what you're talking about.  What table contains this string?
 
 When I use FontCreator, made by High-Logic, http://www.high-logic.com is the 
 webspace: with a font file open, I can select Format from the menu bar and 
 then select Naming... from the drop down menu.
 
 That leads to a dialogue panel.
 
 From that dialogue panel one may select, for an ordinary, basic Unicode font, 
 either of two platforms, namely Macintosh Roman and Microsoft Unicode BMP 
 only.
 
 Having selected a platform, one may view the text content of various fields 
 for that platform, such as font family name and copyright notice, version 
 string and postscript name. There is then a button that is labelled 
 Advanced... that, if clicked, opens another dialogue panel with various other 
 text fields, including Font Designer and Description, which are the two that 
 I often use.
 
 Now, when the text values in the fields are stored in the font file, the 
 values for the Macintosh Roman platform are stored in plain text and the 
 values for the Microsoft Unicode BMP only platform are stored in some encoded 
 format.
 
 So, if one opens a TrueType font file in WordPad and one searches for an item 
 of plain text that is in one of the fields of the font, then the text that is 
 in the Macintosh platform can be found, yet the text that is in the Microsoft 
 Unicode BMP only platform cannot be found.
 
 So, I thought that if a manufacturer of a wordprocessing application or a 
 desktop publishing application decided to make a special researcher's 
 edition of the software, then that software could, when a font is selected, 
 first scan the font for a PUA.RTL string and, if one is found, override the 
 left-to-right nature of the identified characters to be a right-to-left 
 nature, just while that font is selected.
 
 Whether such a software package ever becomes available is something that only 
 time will tell, yet it seems to me that it is a method that could be used 
 without needing any changes by any committee.
 

Ah.  You're referring to an entry in the 'name' table, then.  The intention of 
the 'name' table is to provide localizable strings for the UI.  Using it to 
store data of any sort for the rendering engine would be very, very 
inappropriate.  

In general, one should not be using a text editor to examine the contents of a 
TrueType font. It would be like using a text editor to examine the contents of 
an application.  Even if you see some plain text, you really don't have any 
sense for how it's actually being used.  

You may want to bone up on the structure of TrueType/OpenType fonts.

=
John H. Jenkins
井作恆
Жбь А. ЖЩэпЮьц
jenk...@apple.com

RE: RTL PUA?

2011-08-22 Thread Doug Ewell

There is more to displaying characters than LTR versus RTL, and there is
more to handling characters than just displaying them.  This point
continues to be lost on several people responding to this thread.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

RE: RTL PUA?

2011-08-22 Thread Doug Ewell

Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 Depending on how you count, there are already two to four fonts that
 support Ewellic in the PUA.  There are probably many more that
 support Tengwar or Cirth or Klingon.
 
 First, these fonts can work fine with the default LTR directionality.
 So there's no need for additional data for them. Second, even if they
 were RTL, the needed info for each of these fonts, embedded in them
 would be extremely small, reduced to just specifying the range of RTL
 characters they need to contain.

This isn't my point.  Multiple fonts can exist for PUA scripts and the
user should not have to be constrained to using just the one font which
happens to contain property information, because someone decided
properties should be stored in the font.

 So I don't see that as a problem. Those fonts do exist and are used
 exactly because there was no problem for rendering them with texts
 encoded in logical order (the same as the visual order).

Not my point.

 It's still
 strange that we can have several fonts for esoteric fonts that have
 been used effectively by very few people, when there are centuries of
 traditions, and many interested users (but spread in very small
 communities worldwide) that cannot use computer technologies to render
 their favorite scripts, or that want to teach them, or make books and
 other publications to expose them, as an important humane cultural
 heritage, even if this was only to translate them or transcribe them
 in a more modern script.

One person added Ewellic to his shareware font as an experiment, and I
paid another person to do a font for me.  Sorry if this was culturally
insensitive.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

RE: RTL PUA?

2011-08-22 Thread Doug Ewell

Shriramana Sharma samjnaa at gmail dot com wrote:

 Right, so if you embed that table in an OT font, the information is not
 available to a system that uses a font technology other than OT.
 
 I don't understand why you would say so -- assuming we are all talking 
 about TrueType fonts, AAT just uses some tables, OT others and Graphite 
 still others. They are all just tables appended to the TrueType font 
 data. Any software that is able to read TT font data can also read the 
 tables. So what's the problem?

OK, so it's obvious by now I'm not a font guy.

But I still maintain that there's more to proper handling of Unicode
characters, PUA or otherwise, than whether their directionality is LTR
or Arabic-RTL or non-Arabic-RTL or what have you.  That's why all those
other properties exist.  And I maintain that PUA users need a place to
store those other properties, and that the font doesn't seem like the
right place for non-display properties.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Re: RTL PUA?

2011-08-22 Thread Philippe Verdy

2011/8/22 William_J_G Overington wjgo_10...@btinternet.com:
 Having selected a platform, one may view the text content of various fields 
 for that platform, such as font family name and copyright notice, version 
 string and postscript name. There is then a button that is labelled 
 Advanced... that, if clicked, opens another dialogue panel with various other 
 text fields, including Font Designer and Description, which are the two that 
 I often use.

 Now, when the text values in the fields are stored in the font file, the 
 values for the Macintosh Roman platform are stored in plain text and the 
 values for the Microsoft Unicode BMP only platform are stored in some encoded 
 format.

Note "some encoded format". The strings are encoded using the encoding
specified in the platform selectors. The strings for the Macintosh
Roman platform will be encoded using MacRoman. The strings for the MS
Unicode BMP platform will be encoded with the BMP part of UTF-16
(without support for surrogates). The strings for the Unicode platform
will use the UTF-32 encoding.

 So, if one opens a TrueType font file in WordPad and one searches for an item 
 of plain text that is in one of the fields of the font, then the text that is 
 in the Macintosh platform can be found:

It just happens that you are opening the TrueType font as if it were
plain text encoded with Windows-1252, or some other 8-bit encoding
based on ASCII.  You are also searching for ASCII characters that are
encoded identically in Windows-1252 and in the MacRoman
encoding, so you find a match.

 yet the text that is in the Microsoft Unicode BMP only platform cannot be 
 found.

Because you would have to insert null bytes in your search strings to
find an exact match in a UTF-16 encoded string. Without these nulls,
you'll get no match. What you are doing is a search in a text loaded
after assuming the wrong encoding. TrueType fonts are binary
containers that can mix several encodings for their plain-text
elements, but that also embed much other non-text data. This happens
even if your text editor is capable of loading Unicode-encoded texts:
this fails here if you try to load it as UTF-16, because the TTF
container as a whole cannot meet the conformance requirements for
correctly encoded UTF-16 text; only fragments of it can. By contrast,
there's no conformance problem if you try to read the file as if it
were Windows-1252 or ISO-8859-1...
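
The effect is easy to reproduce with a few lines of Python. This is only a sketch: `blob` is a made-up stand-in for font file contents, and it assumes the Windows-platform string is stored as big-endian UTF-16, as name table strings for that platform are.

```python
needle = "Description"
mac_bytes = needle.encode("mac_roman")  # one byte per ASCII character
win_bytes = needle.encode("utf-16-be")  # a 0x00 byte precedes each ASCII character

# Hypothetical stand-in for a font file: binary data around two name strings.
blob = b"\x00\x01\xff" + mac_bytes + b"\x80\x00" + win_bytes + b"\x00\x03"

# A byte-for-byte ASCII search finds only the MacRoman copy.
ascii_hits = blob.count(needle.encode("ascii"))

# The UTF-16 copy is found only when the search string includes the null bytes.
utf16_found = blob.find(win_bytes) != -1
```

A text editor searching for "Description" is doing the first kind of search, which is why only the Macintosh string turns up.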

ALM (was: Re: RTL PUA?)

2011-08-22 Thread Ken Whistler


On 8/21/2011 3:31 PM, Richard Wordingham wrote:

I expect ARABIC LANGUAGE MARK would not go down well
- has it already been proposed and rejected?.


ARABIC *LETTER* MARK, not *LANGUAGE* mark. (And suggested
to just be renamed to AL MARK.)

Proposed? Yes.

Discussed? Yes.

Rejected? No.

The last UTC meeting took a consensus to issue a public review
issue on the proposed ALM and ELM (embedding level mark)
characters. So there will be further discussion and chance for input.

Nothing has been decided yet.

--Ken

Re: RTL PUA?

2011-08-22 Thread Richard Wordingham

On Mon, 22 Aug 2011 07:51:22 -0700
Doug Ewell d...@ewellic.org wrote:

 Some PUA properties, like glyph shapes and maybe directionality, can
 be stored in a font.  Others, like numeric values and casing, might
 not or cannot.  An interchangeable format needs to be agreed upon for
 the properties in the latter category.

I suggest that the obvious format is that used for capturing the UCD in
XML.  Only the characters in which you are interested need be
specified.  One very important property for several scripts is the
script to which a character belongs.
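
As a sketch of what that could look like, the Python below reads a small fragment modeled on the UCD-in-XML format (UAX #42). The attribute names follow that format as far as I understand it; the fragment itself and the property values in it are a hypothetical private agreement, invented for the example.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"

# Hypothetical private agreement covering a PUA range and one PUA digit.
DOC = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <char first-cp="E000" last-cp="E01F" gc="Lo" bc="R" sc="Zzzz"/>
    <char cp="E020" gc="Nd" bc="R" nt="De" nv="0"/>
  </repertoire>
</ucd>"""

def load_properties(xml_text: str) -> dict[int, dict[str, str]]:
    """Return {code point: {property: value}} for every listed character."""
    props = {}
    for el in ET.fromstring(xml_text).iter(NS + "char"):
        first = int(el.get("cp") or el.get("first-cp"), 16)
        last = int(el.get("cp") or el.get("last-cp"), 16)
        attrs = {k: v for k, v in el.attrib.items()
                 if k not in ("cp", "first-cp", "last-cp")}
        for cp in range(first, last + 1):
            props[cp] = attrs
    return props
```

Only the characters of interest appear in the file; everything else keeps its default.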

One reason for associating properties with a font is that text that is
to be displayed is at that point tentatively associated with a font.
Another is that in a multi-font document, a PUA character could
have multiple implicit properties dependent on the font it appears in.

Richard.

RE: RTL PUA?

2011-08-22 Thread Doug Ewell

Richard Wordingham richard dot wordingham at ntlworld dot com wrote:

 One reason for associating properties with a font is that text that is
 to be displayed is at that point tentatively associated with a font.

I thought John said fonts dealt with glyph IDs, not characters per se.

 Another is that in a multi-font document, a PUA character could
 have multiple implicit properties dependent on the font it appears in.

Normal, assigned characters don't change their Unicode properties
depending on font.  I don't see why PUA characters would be different.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Re: RTL PUA?

2011-08-22 Thread N. Ganesan

On Sat, Aug 20, 2011 at 7:08 AM, Shriramana Sharma samj...@gmail.com
wrote:
 On 08/20/2011 01:57 PM, Martin Hosken wrote:

 D49 states that all properties of PUA characters are overridable by a
 higher protocol. But in 'normal' implementations, there are no higher
 level protocols to override the properties and so they use the
 defaults in the Unicode Database. So while in *theory* it's possible
 to override these values, nobody does. (This happens to also be the
 case with other tailoring algorithms in Unicode). Adding the
 configuration that tailoring requires is usually prohibitive and so
 it just doesn't get done.

 Good point -- Michael should note this.

 Somebody remarked that Apple Mac OS's rendering engine already
 supports an extended OT table which would signal that the glyphs in
 a PUA font are RTL. If other rendering don't support it, again it
 is not the fault of the standard.

 Is there a specification for that OT table? Are you implementing this
 in anything?

 Read a previous post by John Jenkins. He's the one who said they have a
 prop table in Apple's implementation of OT (or is it their own AAT) that
 enables one to do this.


Is it correct that Apple solves the problem of RTL PUA user requirements?

See John Jenkins's latest mail, which says:
[Begin Quote]
To be honest, I don't know if using the 'prop' table to override
directionality for glyphs still works.  A quick-and-dirty test on Lion
suggests that it doesn't, so I may have spoken too quickly.  This is not a
part of the functionality of AAT which gets much exercise, so it's entirely
possible that it was lost at some point without anyone noticing.  In any
event, my apologies for raising any false hopes.
[End Quote]

Hope a new proposal or a UTN from UC will make things clear, and RTL
community benefits.

N. Ganesan

Jonathan Kew wrote on 21 August 2011 at 10:48 AM:

 On 21 Aug 2011, at 17:21, Behdad Esfahbod wrote:

 On 08/21/11 16:44, Shriramana Sharma wrote:

 BTW can John Jenkins show us a few entries from the prop table of some font
 supporting the custom Apple PUA characters, especially the RTL and GC=No ones?

 Like this?

 https://developer.apple.com/fonts/ttrefman/RM06/Chap6prop.html

 However, note that this documentation is very old, and does not make it
 clear whether there is any support for overriding directionality in current
 Mac OS X software.


Yes, it's very old, largely because we haven't done anything with the
structure of the 'prop' table for a long, long time.  Still, anything
referring to QuickDraw GX is obviously overdue for an update.

To be honest, I don't know if using the 'prop' table to override
directionality for glyphs still works.  A quick-and-dirty test on Lion
suggests that it doesn't, so I may have spoken too quickly.  This is not a
part of the functionality of AAT which gets much exercise, so it's entirely
possible that it was lost at some point without anyone noticing.  In any
event, my apologies for raising any false hopes.

=
井作恆
John H. Jenkins
jenk...@apple.com




 If the application doesn't do this and allows Graphite to break the
 text into runs, then Graphite can treat PUA characters as having BC
 other than L? /myunderstanding

 Yes that understanding is correct.

 Great! Could you then place some sample characters from your
 Scheherezade font in the PUA and render them RTL and show to us then
 Michael would be convinced.

 --
 Shriramana Sharma

Re: RTL PUA?

2011-08-22 Thread Shriramana Sharma


On 08/23/2011 03:29 AM, N. Ganesan wrote:

Hope a new proposal or a UTN from UC will make things clear, and RTL
community benefits.


Dear Ganesan,

I wonder if you have actually understood all the issues here. As usual 
you have done your copy-paste from somebody else's post. Please say 
something if you have something to actually contribute instead of just 
saying "I support Oriya OM", "I support PUA RTL" or such.


If you support PUA RTL, and since you are so interested in Grantha, you 
should do a proposal for regions in the PUA to be allocated proper 
IndicMatraCategory properties so that today we can put Grantha in the 
PUA and get it rendered properly by existing rendering engines.


--
Shriramana Sharma

Re: RTL PUA?

2011-08-21 Thread Philippe Verdy

2011/8/19 Michael Everson ever...@evertype.com:
 There is plenty of space. There would be no difficulty in assigning some rows 
 to a RTL PUA. Mucking about with the directionality of the existing PUA would 
 be extremely unwise.

 Conceivably certain closed user-groups could be using closed-distribution 
 rendering engines which would support bidi and glyph reordering or such for 
 PUA codepoints.

 Not everyone is a programmer and can devise a rendering engine. But lots of 
 people can make fonts that could support a RTL conscript or some private 
 Arabic characters.

Hmmm... Given the current standard in OpenType, and the fact that
OpenType fonts cannot reorder glyphs to support the BiDi algorithm and
correctly handle features like ligatures, I have serious doubts about
the feasibility of an OpenType font capable of supporting an RTL
conscript or some private Arabic characters, that will work with
existing OpenType engines, simply because there's absolutely nothing
to describe such properties.

This would be possible only if the engine can not only use the
existing OpenType fonts, but also include some supplementary character
properties tables for PUA assignments used in that font, or these
custom properties can be integrated in extension tables added in the
OpenType fonts, notably: directionality and mirroring, but also as
well the combining classes, some decomposition mappings, and probably
also fallback mapping. There would also be a need to represent, at
least, a finite state machine that recognizes grapheme cluster
boundaries, and to list the feature names under which the substitution
and positioning rules apply to recognized sequences of PUA characters
(or their mapped glyphs).

What this means is that, in practice, PUA are only usable in fonts for
characters with strong LTR directionality, excluding all reordering
and mirroring. Those conscripts will then have to be represented in
PUAs as if they consisted entirely of strong LTR characters, like the
sinograms. It's not impossible to do that, but you have to completely
forget the logical encoding order and only use a strict visual order
for these PUA-encoded conscripts, and even for unencoded rare Arabic
letters/clusters for which you'd want to just use a PUA.

The alternative is to not use OpenType features, but use one of the
alternatives: Apple's AAT or SIL's Graphite, which are less restricted
than OpenType, or some newer font formats (in this case, you won't
need any newer PUA ranges with strong RTL properties, you can just use
the existing assignments).

-- Philippe.

Re: RTL PUA?

2011-08-21 Thread Petr Tomasek

On Sun, Aug 21, 2011 at 12:21:28AM +, Doug Ewell wrote:
 The more I think of it, the more I like the idea of reassigning the default 
 BC of Plane 16 to 'R'. What would the arguments against this be?
 

I found a font (Asana Math) installed on my system that occupies 
U+10fddf..U+10fffd.

P.

-- 
Petr Tomasek http://www.etf.cuni.cz/~tomasek
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA

Re: RTL PUA?

2011-08-21 Thread Philippe Verdy

2011/8/20 Ken Whistler k...@sybase.com:
 There are 131,068 private use code points in the standard. That is all there
 ever  will be.

I also fully agree (with apologies to Michael Everson, who supports
such new RTL PUA assignments).
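
For reference, the figure quoted above can be checked with a little arithmetic; it corresponds to the two supplementary private use planes, with the BMP private use area counted separately. A quick Python check (the variable names are only for this example):

```python
bmp_pua = 0xF8FF - 0xE000 + 1        # U+E000..U+F8FF
plane15 = 0xFFFFD - 0xF0000 + 1      # U+F0000..U+FFFFD
plane16 = 0x10FFFD - 0x100000 + 1    # U+100000..U+10FFFD

supplementary_total = plane15 + plane16
overall_total = bmp_pua + supplementary_total
```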

All that can be done is to fix the software, notably the font formats,
where you'll be able to define the necessary overrides for
directionality and mirroring mappings (for RTL conscripts), and other
reordering properties that may be needed to support Indic conscripts
(such as prepended letters).

Adding new RTL PUAs will require modification of renderers/layout
engines to support them anyway. These same engines can just as well
be modified to support the external character property tables needed to
override the existing PUAs, so that they can be rendered correctly.

Maybe it was the desire of the OpenType designers not to use any such
overrides, but this was only intended for normal non-PUA characters.
A revised OpenType specification could perfectly well integrate the
possibility of some new extension table, and assert, as a font
validation constraint, that these custom properties stored in fonts
will be valid and usable ONLY for PUA characters.

Re: RTL PUA?

2011-08-21 Thread Michael Everson

On 21 Aug 2011, at 02:44, Doug Ewell wrote:

 Would that really be a better default? I thought the main RTL needs for the 
 PUA would be for unencoded scripts, not for even more Arabic letters.

Could easily be for work on new Arabic-script orthographies which use new 
letters. Or for similar scripts that treat numbers as Arabic does.

 (How many more are there anyway?)

No one knows. :-)

Michael Everson * http://www.evertype.com/

RE: RTL PUA?

2011-08-21 Thread Jonathan Rosenne

Several RTL scripts do not require shaping nor ligatures.

Jony

 -Original Message-
 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
 Behalf Of Philippe Verdy
 Sent: Sunday, August 21, 2011 10:29 AM
 To: Michael Everson
 Cc: unicore UnicoRe Discussion; Unicode Discussion List
 Subject: Re: RTL PUA?

 2011/8/19 Michael Everson ever...@evertype.com:
  There is plenty of space. There would be no difficulty in assigning some
 rows to a RTL PUA. Mucking about with the directionality of the existing
 PUA would be extremely unwise.

  Conceivably certain closed user-groups could be using closed-
 distribution rendering engines which would support bidi and glyph
 reordering or such for PUA codepoints.

  Not everyone is a programmer and can devise a rendering engine. But lots
 of people can make fonts that could support a RTL conscript or some private
 Arabic characters.

 Hmmm Given the current standard in OpenType, and the fact that
 OpenType fonts cannot reorder glyphs to support the BiDi algorithm and
 correctly handle featues like ligatures, I have serious doubt about
 the feasibility of an OpenType font capable of supporting an RTL
 conscript or some private Arabic characters, that will work with
 existing OpenType engines, simply because there's absolutely nothing
 to describe such properties.

 This would be possible only if the engine can not only use the
 existing OpenType fonts, but also include some supplementary character
 properties tables for PUA assignments used in that font, or these
 custom properties can be integrated in extension tables added in the
 OpenType fonts, notably: directionality and mirroring, but also as
 well the combining classes, some decomposition mappings, and probably
 also fallback mapping. There would also be the need to represent a
 finite state machine needed to recognize grapheme cluster boundaries,
 at least, and list the feature names in which the substitution 
 positioning rules for recognized sequences of PUA characters (or their
 mapped glyphs).

 What this means is that, in practice, PUA are only usable in fonts for
 characters with strong LTR directionality, excluding all reordering
 and mirroring. Those conscripts will then have to be represented in
 PUAs as if they were completely with strong LTR characters, like the
 sinograms. It's not impossible to do that, but you have to completely
 forget the logical encoding order and only use a strict visual order
 for these PUA-encoded conscripts, and even for unencoded rare Arabic
 letters/clusters for which you'd want to just use a PUA.

 The alternative is to not use OpenType features, but use one of the
 alternatives: Apple's AAT or SIL's Graphite, which are less restricted
 than OpenType, or some newer font formats (in this case, you won't
 need any newer PUA ranges with strong RTL properties, you can just use
 the existing assignments).

 -- Philippe.

RE: RTL PUA?

2011-08-21 Thread Peter Constable

From: unicore-boun...@unicode.org [mailto:unicore-boun...@unicode.org] On 
Behalf Of Michael Everson


 Yeah OK maybe simply base+diacritic stuff or even ligatures would be 
 easy to do via simple substitution rules in tables, but how about glyph 
 reordering?

 No problem unless you are using Uniscribe.

Which of these are you saying? 

- That mark positioning and simple substitution rules involving PUA characters 
are not a problem unless you're using Uniscribe

- That glyph re-ordering of PUA characters is not a problem unless you're using 
Uniscribe

(Unless we have a bug I haven't encountered, the first is incorrect. The second 
suggests that you've missed Sharma's point entirely.)


 Indic scripts involving reordering and split-positioning vowel signs can't 
 be handled by placing them in the PUA.

 There are other ways of handling such clusters. 

Oh? You must mean something like ignoring Unicode. If not, please clarify.



Peter

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham

On Sun, 21 Aug 2011 01:44:02 +
Doug Ewell d...@ewellic.org wrote:

 The more I think of it, the more I like the idea of reassigning the
 default BC of Plane 16 to 'R'. What would the arguments against this
 be?
 
 BC of 'AL'?

 Would that really be a better default? I thought the main RTL needs
 for the PUA would be for unencoded scripts, not for even more Arabic
 letters. (How many more are there anyway?)

Not necessarily better, I'm just suggesting that both need to be
supported.  However, we need to look at use cases.

(1) Unencoded Arabic script letters with joining behaviour, for use with
any application.

(a) We need the character to have AL, R or ON for it to be included in
BiDi runs.  If we use ON we may need RLM when the character is at the
edge of a run, and even then, its behaviour may be no better than a
character with a BC of R.

(b) It may get left out of script runs.  There were problems on
Windows with the Tamil ligature k.SS not rendering, despite font
support, when the character U+0BB7 TAMIL LETTER SSA was new.  And
that's in a left-to-right script with a character in the appropriate
block!

(2) Complete right-to-left script.  I'm presuming the difference
between AL and R is then a matter of what right-to-left script the
potential users chiefly also use.

(a) As a practical implementation, the distinction between AL and R
would matter if the script has modern use.  Otherwise, any of ON, AL
and R would do, though one might face the annoyance of having to start
chunks of text with RLM.  If a script with modern use should be encoded
using a BC of R, then I believe ON would also do as a stop-gap until
the script is encoded.

How fiendish is BiDi-sensitive transliteration?

(b) For experimentation, I believe the difference between AL, R and ON
would matter little, even though it would be irritating to have to
use RLM.

(c) Complex script support is patchy - one might be restricted to
applications that allow the font to provide full complex script support.
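
The effect of an ON default, and of RLM at the edge of a run, can be
sketched with a toy model. This is only a sketch under stated
assumptions, not the Unicode Bidirectional Algorithm: it knows just
three classes (L, R, ON), folds AL into R, treats everything that is
neither L, R nor AL as neutral, and ignores embeddings and numbers.
The function name simple_bidi_classes and the pua_class parameter are
hypothetical.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, a strong R character


def simple_bidi_classes(text, pua_class="ON"):
    """Resolve each character to 'L' or 'R' in a three-class toy model."""
    def bidi_class(ch):
        if unicodedata.category(ch) == "Co":  # private-use code point
            return pua_class                  # hypothetical default under test
        c = unicodedata.bidirectional(ch)
        if c in ("R", "AL"):
            return "R"
        if c == "L":
            return "L"
        return "ON"  # simplification: all other classes act as neutrals

    classes = [bidi_class(ch) for ch in text]
    # Paragraph direction: first strong character, else L (rules P2/P3).
    para = next((c for c in classes if c != "ON"), "L")
    resolved = list(classes)
    i, n = 0, len(classes)
    while i < n:
        if classes[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "ON":
            j += 1
        # Start and end of text count as the paragraph direction.
        before = classes[i - 1] if i > 0 else para
        after = classes[j] if j < n else para
        # N1: neutrals between matching strong types take that type.
        # N2: otherwise they take the paragraph direction.
        direction = before if before == after else para
        for k in range(i, j):
            resolved[k] = direction
        i = j
    return resolved
```

In this model a paragraph made only of ON-class PUA characters comes
out LTR unless it starts with RLM, and a PUA character at the end of
an RTL run inside an LTR paragraph falls back to L unless RLM follows
it.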

The big issue in all this, though, is (i) how to update the rendering
system with a new set of values for Unicode properties, including
script, and (ii) the scope of such an update.  (The distinction between
the PUA and the rest is that it makes sense for PUA properties to
change as freely as fonts.) This, incidentally, is analogous to locales
reflecting code page selections.  There is also, though less pressing,
the issue of tailoring collations.  (The worst issue is that there are
distinct, canonically inequivalent characters of type Lo comparing equal
- I've seen it for Canadian Aboriginal Syllabics for Windows XP and for
Thai in Ubuntu 10.04 - surely that's not the normal British collation
of such characters.)

One minor problem with (i) *was* that it wasn't clear how one should
annotate a copy of UnicodeData.txt to show that it has been modified.
The standard XML alternative allows for comments, thereby
solving that problem.

If Issue (i) can be readily solved at the machine or user level or
lower, then the default properties of the PUA become irrelevant.

Richard.

RE: RTL PUA?

2011-08-21 Thread Peter Constable

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Philippe Verdy

 Hmmm Given the current standard in OpenType, and the fact that 
 OpenType fonts cannot reorder glyphs to support the BiDi algorithm 
 and correctly handle features like ligatures...

I agree that OpenType font tables cannot do glyph re-ordering. But you are 
totally incorrect in saying that it cannot handle ligatures.


 What this means is that, in practice, PUA are only usable in fonts for 
 characters with strong LTR directionality, excluding all reordering and 
 mirroring. 

In the OpenType specification, the only data related to glyph mirroring that a 
rendering engine is assumed to have is the bidi mirroring data from TUS 5.1. 
(See http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) All other 
glyph mirroring is to be handled using glyph substitution data in OpenType 
Layout tables in fonts.




Peter

Re: RTL PUA?

2011-08-21 Thread Doug Ewell

I think as soon as we start talking about this many scenarios, we are no 
longer talking about what the *default* bidi class of the PUA (or some 
part of it) should be.  Instead, we are talking about being able to 
specify private customizations, so that one can have 'AL' runs and 'ON' 
runs and so forth.


There really isn't any way the UTC is going to approve changing one part 
of the PUA to be default 'AL', another part 'R', another part 'ON', etc. 
Asmus just said that merely assigning one plane to be different from the 
others should be a non-starter.


For this discussion, I really don't find it very interesting that 
existing technologies A, B, and C don't currently provide a way to 
override the default PUA properties.  Through most of the 1990s, most 
existing applications and technologies didn't support Unicode at all, or 
very small parts of it, and the solution generally was to update them so 
that they would.  The same should be true here.  I would suggest that 
installing a modified copy of UnicodeData.txt seems like a rather clumsy 
solution; if text files are involved, I'd suggest leaving 
UnicodeData.txt alone and creating some sort of overrides file.
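
A minimal sketch of what reading such an overrides file might look
like. The file format is an assumption made for illustration (a code
point or range, a semicolon, a bidi class, optional # comments,
loosely modeled on the UCD data files); no such format has been
defined. The names load_overrides and is_private_use are hypothetical.
The loader refuses any code point outside the three private-use
ranges, so the standard properties cannot be overridden by accident.

```python
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Plane 15
    (0x100000, 0x10FFFD),  # Plane 16
)


def is_private_use(cp):
    """True if cp lies in one of the three private-use ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def load_overrides(lines):
    """Parse lines like 'E000..E0FF ; R  # note' into {code point: class}."""
    table = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        span, value = (part.strip() for part in line.split(";", 1))
        first, _, last = span.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        for cp in range(lo, hi + 1):
            if not is_private_use(cp):
                raise ValueError(f"U+{cp:04X} is not a private-use code point")
            table[cp] = value
    return table
```

A renderer could consult the resulting table before falling back to
its built-in copy of the UCD, leaving UnicodeData.txt itself
untouched.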


--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell 



Re: RTL PUA?

2011-08-21 Thread John Hudson


Jonathan Rosenne wrote:


People do all kinds of fancy things. I guess old manuscripts contain many
ligatures...


Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.


JH


--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_

Re: RTL PUA?

2011-08-21 Thread Philippe Verdy

2011/8/21 Peter Constable peter...@microsoft.com:
 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On 
 Behalf Of Philippe Verdy

 Hmmm Given the current standard in OpenType, and the fact that
 OpenType fonts cannot reorder glyphs to support the BiDi algorithm
 and correctly handle features like ligatures...

 I agree that OpenType font tables cannot do glyph re-ordering. But you are 
 totally incorrect in saying that it cannot handle ligatures.

I meant recognizing and generating ligatures in the context where
re-ordering has been performed externally by the renderer. Ligatures
can only be recognized in OpenType, provided that the layout engine
has performed the reordering itself, because OpenType fonts won't
recognize ligatures with glyphs in arbitrary order or interspersed
with other unrelated characters coming from an unreordered glyph
sequence.

 What this means is that, in practice, PUA are only usable in fonts for
 characters with strong LTR directionality, excluding all reordering and
 mirroring.

 In the OpenType specification, the only data related to glyph mirroring that 
 a rendering engine is assumed to have is the bidi mirroring data from TUS 
 5.1. (See http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) 
 All other glyph mirroring is to be handled using glyph substitution data in 
 OpenType Layout tables in fonts.

Exactly, but mirroring data for remapping glyphs will not be part
of that font. Glyph mirroring substitution data in substitution rules
of OpenType fonts does not work because it cannot solve the ambiguity
of the expected direction, as the context length is limited (otherwise
the number of contextual pairs to recognize would explode
combinatorially, making such an implementation impractical within
decent table sizes in fonts, even if we use class-based
substitution, because the necessary character-to-class mappings would
also require large mapping tables, including for a lot of characters
that are not even mapped in the font and for which the font was never
designed).

Mirroring behavior is then best handled in the layout engine, which
has a more global and centralized view of properties of the whole UCS.
Here, we just want to complement this view of character properties by
allowing a set of character properties to be specified for PUA characters
only, expecting that the layout engine will handle all the other
character properties for non-PUA characters, using the standard data
of the UCD...

Re: RTL PUA?

2011-08-21 Thread Mark E. Shoulson


On 08/21/2011 01:09 PM, John Hudson wrote:

Jonathan Rosenne wrote:

People do all kinds of fancy things. I guess old manuscripts contain 
many ligatures...


Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.

Closest you might have to ligatures is idiosyncratic 
letters-getting-joined-together by rapid writing, etc.  There are some 
examples in Ada Yardeni's book.  But they're not really ligatures; at 
best _maybe_ they're calligraphic variants (tho mostly they're quite the 
opposite of calligraphic).


Alef-Lamed did get a fair amount of use as a true ligature, though.

~mark

RE: RTL PUA?

2011-08-21 Thread Peter Constable

From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

 I agree that OpenType font tables cannot do glyph re-ordering. But you are 
 totally incorrect in saying that it cannot handle ligatures.

 I meant recognizing and generating ligatures in the context where 
 re-ordering has been performed externally by the renderer. 

That statement isn't adequate: re-ordering may itself produce 
contexts in which ligatures will occur. That can happen, for instance, in 
displaying Indic scripts.


 Ligatures can only be recognized in OpenType, provided that the layout
 engine has performed the reordering itself, because OpenType fonts
 won't recognize ligatures with glyphs in arbitrary order or interspersed 
 with other unrelated characters coming from an unreordered glyph sequence.

I'm not sure what it means to create a ligature of glyphs in arbitrary order. 
If you mean a rule to substitute [g1 g2] with [g3] won't apply if the sequence 
processed by the OpenType Layout lookup processor is [g2 g1], then that's true: 
if the behaviour of the script is such that glyph re-ordering is appropriate, 
then a rendering engine for OpenType should do that reordering, and 
substitution lookups in OpenType fonts should be written to assume that that 
reordering has taken place.
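
The ordering point can be illustrated with a toy ligature pass. This
is a sketch only, assuming glyphs are plain strings and a rule table
maps an input sequence to one output glyph; it is not the OpenType
Layout lookup processor, and apply_ligatures is a hypothetical name.

```python
def apply_ligatures(glyphs, rules):
    """Replace each matching glyph sequence with its ligature glyph.

    rules maps a tuple of input glyph names to a single output glyph.
    """
    out = []
    i = 0
    while i < len(glyphs):
        for seq, ligature in rules.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(ligature)
                i += len(seq)
                break
        else:
            # No rule matched at this position: pass the glyph through.
            out.append(glyphs[i])
            i += 1
    return out


RULES = {("g1", "g2"): "g3"}
in_order = apply_ligatures(["g1", "g2"], RULES)   # rule applies
swapped = apply_ligatures(["g2", "g1"], RULES)    # rule does not apply
```

The rule fires only when the sequence reaches the lookup in the order
the font author assumed, which is why the engine has to do any
reordering first.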


 What this means is that, in practice, PUA are only usable in fonts 
 for characters with strong LTR directionality, excluding all 
 reordering and mirroring.

 In the OpenType specification, the only data related to glyph mirroring 
 that a rendering engine is assumed to have is the bidi mirroring data from 
 TUS 5.1. (See 
 http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) 
 All other glyph mirroring is to be handled using glyph substitution data in 
 OpenType Layout tables in fonts.

 Exactly, but mirroring data for remapping glyphs will not be part of that 
 font. 

Um... Why not? If the mirroring isn't reflected in 
http://www.unicode.org/Public/5.1.0/ucd/BidiMirroring.txt, then it must be 
handled by glyph substitution in the font as a normal GSUB operation.



Peter

Re: RTL PUA?

2011-08-21 Thread Petr Tomasek

On Sun, Aug 21, 2011 at 10:09:22AM -0700, John Hudson wrote:
 Jonathan Rosenne wrote:
 
 People do all kinds of fancy things. I guess old manuscripts contain many
 ligatures...
 
 Not in Hebrew. The only common ligature is the aleph_lamed, a 
 post-classical import from Judaeo-Arabic.
 
 JH

Not true. See:

Colette Sirat. Hebrew Manuscripts of the Middle Ages. Cambridge University 
Press 2002,
fig. 114 (p. 176) or fig. 127 (p. 189) or fig. 134 (p. 193).

-- 
Petr Tomasek http://www.etf.cuni.cz/~tomasek
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA

Re: RTL PUA?

2011-08-21 Thread Philippe Verdy

2011/8/21 Peter Constable peter...@microsoft.com:
 In the OpenType specification, the only data related to glyph mirroring that 
 a rendering engine is assumed to have is the bidi mirroring data from TUS 
 5.1. (See http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) 
 All other glyph mirroring is to be handled using glyph substitution data in 
 OpenType Layout tables in fonts.

In addition, this specification highly depends on two things:
- the layout engine fully knows the properties of all characters in
order to implement BiDi reordering as well as BiDi mirroring
- the layout engine fully knows the necessary mappings for the OMPL
table (this assumes that it always implements the latest version of
the UCD)

This is not the case because:
- an OpenType layout engine will always implement a specific version
of the UCD. Standard properties defined in the UCD will never cover
unassigned characters that will be assigned in a later version. Nor
will it provide any normative property for the PUA. All it
can then do is apply default properties for unassigned
(still unknown) characters, as well as for all PUAs.
- as such it will never be able to assert which runs of text
containing PUAs or unassigned characters are in RTL order or LTR
order.
- if it uses the default LTR order, it will not be able to find any
mirroring mapping in the OMPL, because the OMPL lookup table will only
be searched for runs that have been identified as RTL
- if it uses the default RTL order assumed from some blocks, the OMPL
will still not work with unknown characters/code points (the OMPL only
contains a list of pairs of known (assigned) non-PUA characters), so
character-level mirroring will not work as expected.
- in addition, if it cannot know if a run of reordered characters is
LTR or RTL, after mapping them to the glyph id's from the cmap (where
it exists in a font for the unknown non-PUA character or the PUA
character), it won't know which of the ltrm or rtlm tables to use
(if it incorrectly assumes the default LTR order, which is the default
for PUA, it will only look in the ltrm table, not in the rtlm
table). Mirroring will then not work if the LTR or RTL guess was wrong.

The only way to change this would be that the OpenType layout engine
allows overriding its default properties for unassigned or PUA
characters. For the case of BiDi reordering, this would require the
support of an additional lookup table in the OpenType font, containing
overrides for the BiDi character class assigned to characters. Of
course, this lookup table should NEVER be used if the character is
non-PUA and known in the implementation of the UCD by the layout
engine. The rule would be:
- if the character is not a PUA and is known in the current
implemented version of the UCD, use the known character property of
the UCD (allow no override).
- otherwise if the character (which is then either a PUA or an unknown
non-PUA) is mapped in the font's cmap table, and there's a BiDi
lookup table in the OpenType font, and that lookup table provides the
property value for that character, use that property
- otherwise use the default property value (indicated in the UCD and
Unicode specifications).
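
The three-step rule above can be sketched as follows. This is a
sketch under assumptions: the font-level BiDi override table is the
hypothetical table proposed here and does not exist in OpenType;
font_cmap and font_bidi_overrides are plain dicts keyed by code point;
Python's unicodedata module stands in for "the implemented version of
the UCD"; and the default for every unknown or private-use code point
is simplified to L, whereas the real UCD defaults for unassigned code
points vary by block.

```python
import unicodedata


def default_bidi_class(ch):
    """Simplified default: L for any PUA or unassigned code point."""
    return "L"


def resolve_bidi_class(ch, font_cmap, font_bidi_overrides):
    """Pick a bidi class for ch following the three-step rule."""
    category = unicodedata.category(ch)
    # Step 1: assigned, non-PUA characters always use the UCD value;
    # the font is never allowed to override them.
    if category not in ("Co", "Cn"):
        return unicodedata.bidirectional(ch)
    # Step 2: PUA or unknown character that the font maps in its cmap
    # and lists in its (hypothetical) override table.
    cp = ord(ch)
    if cp in font_cmap and cp in font_bidi_overrides:
        return font_bidi_overrides[cp]
    # Step 3: fall back to the default property value.
    return default_bidi_class(ch)
```

For example, with font_cmap = {0xE000: 5} and font_bidi_overrides =
{0xE000: "R"}, U+E000 resolves to R, while an attempted override for
U+05D0 HEBREW LETTER ALEF is ignored in favour of the UCD value.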

A similar rule can be used as well for the character-level mirroring:
the standard OMPL will be used if and only if the character is not a
PUA and is known in the implemented version of the UCD. Otherwise, an
OMPL table in the OpenType font will contain additional character
pairs to look up. Such a lookup will however never be performed if the
character is in an LTR run (which means that this feature is dependent
on the correct implementation of the BiDi override above, which must
be implemented first).

Only then can the existing ltrm and rtlm lookup tables in OpenType
be used as they are today, because the OpenType layout engine knows
reliably which one to use. This allows standard glyph-level mirroring
to be specified (between pairs of glyph-id's).

Also the existing ltra and rtla lookup tables will be workable to
provide lists of alternate mirrored glyphs (but only for advanced
applications that allow selecting alternate variants). It may be
possible that this first requires the support of additional variation
sequences (using variation selectors), which are unknown in the
implemented version of the UCD, using an additional lookup table
working under the same rule as above, in order to allow sequences of
PUA+VSn (which will never be part of the UCD, but may be needed under
the PUA convention agreement that the font provides).

One difficulty in this scheme is that all those properties in OpenType
were never meant to be overridable in specific fonts. This means that
they were assumed to be consistent across all fonts. The difficulty
can come from the behavior of font substitutions. I don't think
this is critical because this also means that we change the PUA
agreement in this case: the encoded PUA text is then dependent on the
PUA font used to

Re: RTL PUA?

2011-08-21 Thread John Hudson


Petr Tomasek wrote:

Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.



Not true. See:
Colette Sirat. Hebrew Manuscripts of the Middle Ages. Cambridge University 
Press 2002,
fig. 114 (p. 176) or fig. 127 (p. 189) or fig. 134 (p. 193).


I wouldn't classify any of those examples as 'common'. I also wouldn't 
classify all examples of touching letters -- of which many occur in 
rapidly written text -- as ligatures. Aleph+lamed on the other hand is a 
regularly occurring distinct formation in whole classes of manuscripts 
(and persisting in typography). I have a good collection of books on 
Hebrew palaeography, and while there are many examples of Hebrew letters 
being very tightly spaced there are relatively few instances of what I 
would consider ligatures, i.e. formations in which the ductus or spacing 
of the specific sequences of letters is modified to facilitate connection.


JH


--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_

Re: RTL PUA?

2011-08-21 Thread Philippe Verdy

2011/8/21 Peter Constable peter...@microsoft.com:
 Exactly, but mirroring data for remapping glyphs will not be part of that
 font.

 Um... Why not? If the mirroring isn't reflected in 
 http://www.unicode.org/Public/5.1.0/ucd/BidiMirroring.txt, then it must be 
 handled by glyph substitution in the font as a normal GSUB operation.

A GSUB operation will only be used if it is specified in the correct
feature table. The problem here is which feature to use: rtlm or
ltrm? It's impossible to know because it first depends on the layout
engine to KNOW exactly if the run of text is RTL or LTR.

Without a font-level support of BiDi properties of PUAs (or unassigned
characters), the layout engine will assume the wrong guess from the
default property value. And then it won't find the expected GSUB
operation, because it won't match it in the correct feature subtable.

RE: RTL PUA?

2011-08-21 Thread Peter Constable

From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

 In the OpenType specification

 In addition, this specification highly depends on two things:
 - the layout engine fully knows the properties of all characters in 
 order to implement BiDi reordering as well as BiDi mirroring

Not true: mirroring depends on the resolved directionality, not the Unicode 
character properties.


 - the layout engine fully knows the necessary mappings for the OMPL
 table (this assumes that it always implements the latest version of the UCD)

No. The OMPL is fixed at TUS 5.1.




Peter

RE: RTL PUA?

2011-08-21 Thread Peter Constable

From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

 A GSUB operation will only be used if it is specified in the correct feature 
 table. The problem here is which feature to use: rtlm or ltrm? It's 
 impossible to know because it first depends on the layout engine to KNOW 
 exactly if the run of text is RTL or LTR.

The layout engine already _has_ to know the bidi level of a run regardless.


 Without a font-level support of BiDi properties of PUAs (or unassigned 
 characters), 

I'm trying to tell you that, wrt mirroring, that's already defined in the 
OpenType spec.


 the layout engine will assume the wrong guess from the default property 
 value. And then it won't find the expected GSUB operation, because it won't 
 match it in the correct feature subtable.

As I explained in an earlier message, the layout engine doesn't use the 
default property value but the resolved bidi level.
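
As a rough sketch of that point: the choice between the two mirroring
features can be keyed on the parity of the run's resolved bidi level
alone. The assumptions here are that an even level means an LTR run
and an odd level an RTL run, that character-level (OMPL) mirroring is
applied only in RTL runs, and that OMPL_SAMPLE is a tiny illustrative
subset rather than the real table. mirroring_plan is a hypothetical
name, and the glyph-level substitution itself is left to the font.

```python
# Tiny illustrative subset of character-level mirroring pairs.
OMPL_SAMPLE = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def mirroring_plan(text, level):
    """Return (text after character-level mirroring, feature tag to apply).

    Only the resolved level of the run is consulted; the default bidi
    class of the individual characters plays no part here.
    """
    if level % 2 == 0:
        # Even level: left-to-right run, no character-level mirroring.
        return text, "ltrm"
    # Odd level: right-to-left run, swap paired characters first.
    mirrored = "".join(OMPL_SAMPLE.get(ch, ch) for ch in text)
    return mirrored, "rtlm"
```

Characters that have no entry in the table, PUA characters included,
pass through unchanged and are left for the font's glyph substitution
under the selected feature.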


Btw, in the past few weeks, you've written several posts in which you make 
assertions about how rendering implementations work and, in some cases, why 
more is needed. And then I or others have to spend a bunch of time writing 
responses so that you get the correct understanding and, more importantly, so 
that others don't get misled. It would be a lot easier if you just asked, "How 
is this done?"


Peter

Re: RTL PUA?

2011-08-21 Thread Philippe Verdy

2011/8/21 Peter Constable peter...@microsoft.com:
 From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

 A GSUB operation will only be used if it is specified in the correct feature
 table. The problem here is which feature to use: rtlm or ltrm? It's
 impossible to know because it first depends on the layout engine to KNOW
 exactly if the run of text is RTL or LTR.

 The layout engine already _has_ to know the bidi level of a run regardless.


 Without a font-level support of BiDi properties of PUAs (or unassigned
 characters),

 I'm trying to tell you that, wrt mirroring, that's already defined in the 
 OpenType spec.


 the layout engine will assume the wrong guess from the default property
 value. And then it won't find the expected GSUB operation, because it won't
 match it in the correct feature subtable.

 As I explained in an earlier message, the layout engine doesn't use the 
 default property value but the resolved bidi level.

Once again, you refuse to understand my arguments. What I'm saying is
that OpenType CANNOT resolve the bidi level of PUAs (except when we
use additional BiDi controls, which remains a hack, because it adds
unnecessary invisible markup around the encoded texts, and
complicates the use of strings and substrings).

You can turn the problem around as you want, but PUAs (as well as unknown
characters) still have default properties that will ultimately get used
in the absence of a more precise definition (i.e. an explicit override) of
the actual BiDi property needed for the character.

 Btw, in the past few weeks, you've written several posts in which you make 
 assertions about how rendering implementations work and, in some cases, why 
 more is needed. And then I or others have to spend a bunch of time writing 
 responses so that you get the correct understanding and, more importantly, so 
 that others don't get misled. It would be a lot easier if you just asked, 
 "How is this done?"

Ok, you've replied, but not completely.

And at least on this point, Michael Everson is also right when he says
that PUAs do not properly handle RTL scripts only because of their
default BiDi property value. But I don't support his idea of encoding
new PUAs, when in fact we can effectively provide the additional
character properties needed, for example in fonts, without changing
the default property of PUA (I don't support that at all, and probably
neither do you) and without allocating more (unneeded) PUA block(s) for RTL
scripts (and also without hacking on top of another existing set of
RTL assigned characters).

I did not post any assertion about how OpenType could be used, just
wanted to explain that with the current specifications, it cannot
*currently* resolve the problem (and Michael Everson certainly fully
agrees with that, but he can reply as well if he thinks that I
misinterpret his last few messages).

We really need a reliable way to transport a PUA agreement in such a
way that it can be understood by a computer. An encoded font can
transport this information reliably, which at least must include some
necessary character property values, and it offers a smooth way for
transitions during the whole encoding process of new scripts (notably
during the experimentation), as well as after that, for its adoption
for more general use (before a large majority of users can use updated
implementations of their text renderers, which will automatically
provide those properties for newly encoded characters and
scripts).

Simply because it's MUCH easier to upgrade a font (especially a PUA
font which is not part of the core fonts of the operating system),
than to upgrade a rendering engine (bound to the OS, for the case of
Microsoft APIs and libraries in Windows). An extensible set of
properties, managed with a good rule of priorities to avoid hacks or
non-compliant implementations, can certainly accelerate the
development and adoption rate by many years, can improve the number of
experimentations possible, and can help avoid errors during the
encoding process for new characters and scripts.

It could reduce this delay from about 10 years (during which even if
the script or characters are encoded, it will not be available or
usable reliably), to just a few months (even anticipating the final
encoding in the UCS, by a reliable way to represent it as PUAs,
managed with help of a PUA font, and after the UCD encoding, with a
font that provides the upward upgrade for older implementations of the
layout engine only knowing an older UCD version).

I am completely convinced that we don't need more PUAs due to
continuous lack of support in existing software. But software can
still be updated to provide the support with the help of transitional
subtables in fonts (that can easily be ignored by newer engines that
won't require such extension tables), for integrating the additional
character properties.

Philippe.

Re: RTL PUA?

2011-08-21 Thread Doug Ewell

For once, I am in strong agreement with something Philippe had to say:

 We really need a reliable way to transport a PUA agreement in such a
 way that it can be understood by a computer.

I don't necessarily agree that fonts, or (especially) any particular font 
technology, are the one and only way to accomplish this, because there's more 
to character handling than display. Maybe some sort of open format could be 
devised that could be used as a plug-in to a variety of existing components.

--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by ATT

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham

On Sun, 21 Aug 2011 11:00:26 -0600
Doug Ewell d...@ewellic.org wrote:

 I think as soon as we start talking about this many scenarios, we are
 no longer talking about what the *default* bidi class of the PUA (or
 some part of it) should be.  Instead, we are talking about being able
 to specify private customizations, so that one can have 'AL' runs and
 'ON' runs and so forth.

I was exploring the consequences to see if there was a one size fits
all solution.  Someone (you?) suggested ON as a default, and I like
it.  I think it would also work fairly well for practical CJK
applications as well - the only problems are that LRM and RLM would
occasionally be needed, and the subtle differences between AL and R
would be lost.  I expect ARABIC LANGUAGE MARK would not go down well
- has it already been proposed and rejected?

 Through most of the 1990s, most 
 existing applications and technologies didn't support Unicode at all,
 or very small parts of it, and the solution generally was to update
 them so that they would.  The same should be true here.

Agreed.  I also noted that changes would be of limited assistance for
extending existing supported scripts.

 I would
 suggest that installing a modified copy of UnicodeData.txt seems like
 a rather clumsy solution; if text files are involved, I'd suggest
 leaving UnicodeData.txt alone and creating some sort of overrides
 file.

While partial overrides are cleaner, that appears to be the way to fix
Pango, albeit via recompilation.  According to the comments, its BiDi
settings are derived from the file automatically.  Also, one needs a
method of updating the properties of codepoints as they become assigned
and properties change.  There are also advantages to trying out proposed
changes.

Richard.
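
The "overrides file" idea discussed above could be sketched roughly as follows. This is only an illustration, not anything Pango or any other library actually does: the file syntax is borrowed from the `range ; value` layout of the UCD's derived property files, and the sample ranges, `parse_overrides` and `bidi_class` are all hypothetical names and values chosen for the example.

```python
import unicodedata

# Hypothetical overrides text; the ranges and values are illustrative only
# and do not describe any real private-use agreement.
SAMPLE_OVERRIDES = """
# private agreement: bidi classes for some PUA ranges
E000..E0FF     ; R   # hypothetical RTL script
E100           ; AL  # single code point
100000..10FFFD ; R   # Plane 16 treated as right-to-left
"""

def parse_overrides(text):
    """Return a list of (first, last, bidi_class) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        cp_range, value = (part.strip() for part in line.split(";", 1))
        first, _, last = cp_range.partition("..")
        entries.append((int(first, 16), int(last or first, 16), value))
    return entries

def bidi_class(ch, overrides):
    """Bidi class of ch: the override if one applies, else the standard value."""
    cp = ord(ch)
    for first, last, value in overrides:
        if first <= cp <= last:
            return value
    return unicodedata.bidirectional(ch)       # unmodified Unicode data

overrides = parse_overrides(SAMPLE_OVERRIDES)
print(bidi_class("\uE000", overrides), bidi_class("A", overrides))
```

The standard data is left untouched and consulted only as a fallback, so code points that later become assigned keep picking up their properties from whatever Unicode version the platform ships.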

Re: RTL PUA?

2011-08-21 Thread Philippe Verdy

2011/8/21 Doug Ewell d...@ewellic.org:
 For once, I am in strong agreement with something Philippe had to say:

 We really need a reliable way to transport a PUA agreement in such a
 way that it can be understood by a computer.

 I don't necessarily agree that fonts, or (especially) any particular font 
 technology, are the one and only way to accomplish this, because there's more 
 to character handling than display. Maybe some sort of open format could be 
 devised that could be used as a plug-in to a variety of existing components.

Yes, but without display support, at least, none of the other needs will
ever be addressed, because you won't have encoded text to work with.
So don't even dream, for example, of performing plain-text search if
you don't have encoded texts to search in! Collation is then a
secondary target. Proper display is an immediate need (one that even
comes before the development of easy input methods, or later
developments of spell checkers, content indexers, semantic analyzers,
and localization of software to use a given script in its UI).

For proper display of PUAs, all that is needed is a minimum set of
character properties. I have argued, against what Peter Constable
thinks, that OpenType cannot handle RTL characters with PUAs, because
it has absolutely no source of information to know whether a run of
text is RTL or LTR when implementing the BiDi algorithm.

OK, the mirroring property is probably not essential, because most
mirrored characters today are punctuation marks, which already cover a
very wide range. If needed, additional PUA punctuation marks may be
added, and even coded in two mirrored code positions, even if they are
not automatically mirrored according to their context: for such rare
cases, using BiDi format controls around them, or equivalent CSS
embedding styles in HTML, and similar techniques, will be enough.

But for most RTL text using PUAs in long runs, or mixed within
other sequences of standard RTL characters (for example in the middle
of words), format controls are clearly not the solution: they do not
work reliably in HTML, for example, if you have to split words across
separate spans, and inserting those controls in the middle of words is
really a nightmare. In addition, they completely defeat the plain-text
searchability and editability of encoded texts. This will slow down
the production of encoded texts so much that, in fact, almost no work
will be done with those PUAs. As a consequence, most texts will wait
indefinitely for some encoding effort.

The need will become even more urgent now that the UTC and WG2 will
spend most of their time discussing scripts that are rarely used,
where the cultural knowledge will be difficult to find. If we don't
have an easy way to experiment with their encodings, at least with
PUAs, for extended periods (because a long research period will be
needed, with conflicting experiments), those scripts will remain
unencoded in the UCS for a very long time. And in fact I doubt that
even WG2 or the UTC will have the resources to provide all this effort
without committing many critical errors that will plague the
long-term future.

We absolutely need a transition mechanism, and PUAs can be part of
this transition. For the same reason, the ability to support external
character properties, for characters that are not yet encoded or are
encoded in separate efforts via PUAs, and that will later be encoded
with low levels of implementation and deployment for many years, would
certainly help keep the resources needed (at the UTC and WG2) at a low
level, since most of the experiments would be performed independently,
without depending on the release of a putative version of the UCS that
finally accepts the script.

But even in this case, for historic scripts, the encoding effort will
be hard to finalize: it is highly probable that those scripts will be
encoded progressively, starting with a minimum subset on which most
people agree, with many other characters remaining that need longer
experimentation or research. Those scripts will then need to support
a mix of standard assignments and PUAs for a long time, for distinct
small communities that will need to share and discuss their
agreements.

The current problem is that there is absolutely no transition
mechanism in the UCS encoding process: a character gets fully encoded
with most of its essential properties becoming normative, some of them
impossible to change later (even if there was an error or an
unexpected caveat that the interested communities had no chance to
experiment with before the properties were finally approved by the UTC
and WG2).

Unicode should not interfere with what users will want to do with
PUAs. After all, PUAs were made specifically for that. If users need to
assign their own property values to PUAs, they must be able to do
that. And these properties must find a way to be representable in the
current technology

Re: RTL PUA?

2011-08-21 Thread Asmus Freytag


On 8/21/2011 3:31 PM, Richard Wordingham wrote:

On Sun, 21 Aug 2011 11:00:26 -0600
Doug Ewell d...@ewellic.org wrote:


I think as soon as we start talking about this many scenarios, we are
no longer talking about what the *default* bidi class of the PUA (or
some part of it) should be.  Instead, we are talking about being able
to specify private customizations, so that one can have 'AL' runs and
'ON' runs and so forth.

I was exploring the consequences to see if there was a one size fits
all solution.  Someone (you?) suggested ON as a default, and I like
it.  I think it would also work fairly well for practical CJK
applications as well - the only problems are that LRM and RLM would
occasionally be needed, and the subtle differences between AL and R
would be lost.  I expect ARABIC LANGUAGE MARK would not go down well
- has it already been proposed and rejected?


If your implementation supported the directional overrides, it would be 
possible to use these to lay out any RTL text in a portable manner. Just 
enclose any RTL run with RLO and PDF (pop directional formatting).


No impact on any existing implementation, no impact on the standard.

Those who produce rendering engines that do not support these overrides 
today could be leaned on to upgrade their implementations - that change 
would benefit users of non-PUA RTL languages as well (because sometimes, 
the bidi-algorithm can fail, such as for part numbers, and being able to 
use RLO is a simple way to stabilize such problematic text).


Treating PUA characters as ON is very problematic - their display would 
become context sensitive in unintended ways. No users of CJK characters 
would think of using LRM characters, but if text is inserted or viewed 
in RTL context, it could behave randomly.


In contrast, always supplying a RLO override for RTL text (containing 
PUA characters) would be a simple thing to remember and to get right.


A./
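
As a rough illustration of the RLO/PDF suggestion above, the wrapping could be automated along these lines. This is a sketch under a loud assumption: it treats every private-use character in the text as right-to-left, which only holds if the private agreement says so. The helper name `wrap_pua_runs` is hypothetical.

```python
import re

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Maximal runs of private-use code points: the BMP PUA plus Planes 15 and 16.
PUA_RUN = re.compile(
    "[\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD]+"
)

def wrap_pua_runs(text):
    """Enclose each run of PUA characters in RLO ... PDF.

    Assumption: all PUA characters in this text belong to an RTL script.
    """
    return PUA_RUN.sub(lambda m: RLO + m.group(0) + PDF, text)

sample = "abc \uE000\uE001\uE002 123"
print([f"U+{ord(c):04X}" for c in wrap_pua_runs(sample)])
```

Only the PUA runs themselves are enclosed, so digits and other standard characters next to them stay outside the override and keep their normal bidi behaviour.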

Re: RTL PUA?

2011-08-21 Thread Doug Ewell

I suggested 'R' for Plane 16, not 'ON'.

What's a LANGUAGE MARK?

--
Doug Ewell • d...@ewellic.org

Re: RTL PUA?

2011-08-21 Thread Michael Everson

On 22 Aug 2011, at 00:37, Asmus Freytag wrote:

 If your implementation supported the directional overrides, it would be 
 possible to use these to lay out any RTL text in a portable manner. Just 
 enclose any RTL run with RLO and PDF (pop directional formatting).
 
 No impact on any existing implementation, no impact on the standard.

Useful for RTL'ing the Phaistos Disc text or even Latin for the Jabberwocky 
text. Not so desirable for nonce or novel Arabic (or other RTL script) 
characters intended to be used within RTL text strings.

 Those who produce rendering engines that do not support these overrides today 
 could be leaned on to upgrade their implementations - that change would 
 benefit users of non-PUA RTL languages as well (because sometimes, the 
 bidi-algorithm can fail, such as for part numbers, and being able to use RLO 
 is a simple way to stabilize such problematic text).

The problem is that existing PUA characters are all strong L.

 Treating PUA characters as ON is very problematic - their display would 
 become context sensitive in unintended ways. No users of CJK characters would 
 think of using LRM characters, but if text is inserted or viewed in RTL 
 context, it could behave randomly.

Easy to fix: Add RTL PUA characters. 

 In contrast, always supplying a RLO override for RTL text (containing PUA 
 characters) would be a simple thing to remember and to get right.

Not, I think, practical and certainly not putting RTL and LTR users on the same 
level in terms of PUA usage. 

Michael Everson * http://www.evertype.com/

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham

On Sun, 21 Aug 2011 23:55:46 +
Doug Ewell d...@ewellic.org wrote:

 What's a LANGUAGE MARK?

There are *three* strong directionalities - 'L' left-to-right, 'AL'
right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
suspect).  'AL' and 'R' have different effects on certain characters
next to digits - it's the mind-numbing part of the BiDi algorithm.
With one a $ sign after a string of European (or is it Arabic?) digits
appears on the left and in the other it appears on the right.  I
can't remember whether 'higher-level protocols' have an effect on this
logic. LRM has a BC of L, RLM has a BC of R, but no invisible character
has a BC of AL. That's why I tentatively raised the notion of ARABIC
LANGUAGE MARK.  Incidentally, an RLO gives characters with a
temporary BC of R, not AL.

Richard.
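
The bidi classes mentioned above can be checked against whatever Unicode data a platform ships. A minimal sketch using Python's standard `unicodedata` module follows; the choice of sample characters and the helper name `bidi_classes` are only for illustration.

```python
import unicodedata

# Sample characters for the classes discussed in the message above.
SAMPLES = {
    "LRM": "\u200E",
    "RLM": "\u200F",
    "Hebrew alef": "\u05D0",
    "Arabic alef": "\u0627",
    "PUA U+E000": "\uE000",
    "European digit 1": "1",
    "Arabic-Indic digit 1": "\u0661",
    "dollar sign": "$",
}

def bidi_classes(samples):
    """Map each sample name to the Bidi_Class reported by unicodedata."""
    return {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, bc in bidi_classes(SAMPLES).items():
    print(f"{name:22} {bc}")
```

LRM reports L and RLM reports R, as stated above, while the Arabic letter reports AL and the private-use character reports the default strong L.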

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham

On Sun, 21 Aug 2011 16:37:34 -0700
Asmus Freytag asm...@ix.netcom.com wrote:

 Treating PUA characters as ON is very problematic - their display
 would become context sensitive in unintended ways. No users of CJK
 characters would think of using LRM characters, but if text is
 inserted or viewed in RTL context, it could behave randomly.

I think a problem would be immediately obvious.  Also, the CJK PUA
characters would usually be guarded by non-PUA CJK characters.

 In contrast, always supplying a RLO override for RTL text (containing 
 PUA characters) would be a simple thing to remember and to get right.

So long as you remembered to pop before digits.  This could easily go
wrong if the text were amended.  For example, if two paragraphs were
merged, one could easily delete a PDF, and then digits at the bottom of
the second paragraph, quite possibly off-screen at the time, would
suddenly flip.

Richard.
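
The editing hazard described above (a PDF lost when paragraphs are merged) is the kind of thing a simple check could flag. The following is a minimal sketch, not a bidi implementation: it only counts explicit embedding and override controls against PDFs within one paragraph, and `unbalanced_controls` is a hypothetical helper name.

```python
LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
OPENERS = {LRE, RLE, LRO, RLO}

def unbalanced_controls(paragraph):
    """Return (unclosed_openers, stray_pdfs) for one paragraph of text."""
    depth = 0   # embeddings/overrides currently open
    stray = 0   # PDFs seen with nothing left to pop
    for ch in paragraph:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth:
                depth -= 1
            else:
                stray += 1
    return depth, stray

merged = RLO + "\uE000\uE001" + " more text 2011"   # PDF lost in the merge
print(unbalanced_controls(merged))
```

A non-zero first value means an override is still open at the end of the paragraph, so everything after it, including trailing digits, is laid out under that override.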
