Re: RTL PUA?

2011-08-21 Thread Asmus Freytag

On 8/21/2011 7:34 PM, Doug Ewell wrote:

So what you are asking about is a directional control character that would 
assign subsequent characters a BC of 'AL', right?

You don't want to call this a LANGUAGE MARK or anything else that implies language 
identification, because of the existence of "real" language identification 
mechanisms and the history of Unicode and language tagging.


An ARM (Arabic RTL Mark) would be a sensible addition to the standard. 
It would close a small gap in design that currently prevents a fully 
faithful plain text export of bidi text from rich text (higher level 
protocol) formats.


In an HLP you can assign any run to behave as if it were following a 
character with bidi property AL.


When you export this text as plain text, unless there is an actual AL 
character, you cannot get the same behavior (other than by the 
heavy-handed method of completely overriding the directionality, making 
your plain text less editable).


So, yes, there's a bit of a use case for such a mark.

(Its effect is limited to the treatment of numeric expressions, so it's not 
an "Arabic language" mark, but one that triggers the same bidi context 
as the presence of an Arabic Script (AL) character.)


A./


--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by AT&T

-----Original Message-----
From: Richard Wordingham
Sender: unicode-bou...@unicode.org
Date: Mon, 22 Aug 2011 03:19:39
To: Unicode Mailing List
Subject: Re: RTL PUA?

On Sun, 21 Aug 2011 23:55:46 +
"Doug Ewell"  wrote:


What's a LANGUAGE MARK?

There are *three* strong directionalities - 'L' left-to-right, 'AL'
right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
suspect).  'AL' and 'R' have different effects on certain characters
next to digits - it's the mind-numbing part of the BiDi algorithm.
With one a $ sign after a string of European (or is it Arabic?) digits
appears on the left and in the other it appears on the right.  I
can't remember whether 'higher-level protocols' have an effect on this
logic. LRM has a BC of L, RLM has a BC of R, but no invisible character
has a BC of AL. That's why I tentatively raised the notion of ARABIC
LANGUAGE MARK.  Incidentally, an RLO gives characters with a
temporary BC of R, not AL.

Richard.









Re: RTL PUA?

2011-08-21 Thread Mark E. Shoulson

On 08/22/2011 12:53 AM, Shriramana Sharma wrote:

On 08/22/2011 12:01 AM, Peter Constable wrote:

If you mean a rule to substitute [g1 g2] with [g3] won't apply if the
sequence processed by the OpenType Layout lookup processor is [g2
g1],


Peter, actually I suspect Philippe is thinking that in the case of 
RTL, the *glyphs* are placed in reverse order and then he is asking 
how can the ligation take place.


While I don't know much about RTL scripts, if the logical order is ALEF 
+ LAMED, but the presentation order is LAMED + ALEF *because of the 
RTL nature*, do you write the rule as ALEF + LAMED = 
ALEF_LAMED_LIGATURE or LAMED + ALEF = ALEF_LAMED_LIGATURE?




I'm not certain I understand the question, but if I have it right... The 
logical order is ALEF + LAMED, and the presentation... places those in a 
right-to-left sequence, shall we say (since talking about the 
presentation *order* is confusing here).  The font table contains the 
lookup that ALEF + LAMED ⇒ ALEF_LAMED_LIGATURE.  It all goes according 
to the logical order, since the presentation "order" isn't really an 
order, it's just a direction.  (This is different from things like the 
Devanagari short-i vowel, which "moves" with respect to the other 
letters in the script.)
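
To make that concrete, here is a toy sketch in Python (not how any
particular layout engine is implemented). The glyph names, the
one-entry rule table and both function names are made up for
illustration. The lookup is matched against the logical-order run, and
only afterwards is the shaped run laid out in a direction.

```python
# Hypothetical glyph names and a one-rule ligature table, for illustration only.
LIGATURES = {("ALEF", "LAMED"): "ALEF_LAMED_LIGATURE"}


def apply_ligatures(glyphs, rules=LIGATURES):
    """Substitute matching pairs, scanning the run in logical order."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in rules:
            out.append(rules[pair])
            i += 2  # both input glyphs are consumed by the ligature
        else:
            out.append(glyphs[i])
            i += 1
    return out


def visual_order(glyphs, rtl):
    """List shaped glyphs as they sit left to right; an RTL run is reversed."""
    return list(reversed(glyphs)) if rtl else list(glyphs)


logical = ["ALEF", "LAMED", "BET"]
shaped = apply_ligatures(logical)
print(visual_order(shaped, rtl=True))
```

The rule is written once, as ALEF + LAMED, and it fires whether the run
is later laid out right-to-left or left-to-right.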


~mark



Re: RTL PUA?

2011-08-21 Thread Shriramana Sharma

On 08/22/2011 08:24 AM, Peter Constable wrote:

I'm not saying that there shouldn't be _some_ software that can do
what you expect. But there will likely be some different views on
what ought to be included within that "some".


Peter, given that both AAT and Graphite have provisions for assigning 
custom properties including BC to PUA characters, it seems Uniscribe is 
the only one missing out. Those advocating RTL PUA areas seem to reject 
AAT and Graphite as "hacks" or "wow *one* application" [*].


[* = LibreOffice is the *only* multipurpose application running on 
/Windows/ to support Graphite and I'm not counting SIL WorldPad. On *nix 
platforms, *any* number of applications that use HB-NG for rendering 
will be able to handle Graphite in the near future because HB-Graphite 
integration is already done. That is to say, once GTK and Qt fully 
switch to HB-NG.]


Anyhow, if you Microsoft guys added support in Uniscribe for ascribing 
custom properties including BC to PUA characters (or have you already 
done it?), that would, it seems, satisfy these PUA RTL users and convince 
them that no RTL PUA zones are needed.


The suggestion has been made that fonts should be able to carry some 
additional custom tables specifying custom properties for PUA 
characters, which seems reasonable. I'm not sure if the OT GDEF table or 
the AAT PROP table completely satisfies this requirement. People 
interested in using custom properties for the PUA (which includes me 
for Indic script) should then sit up and formulate the syntax for such 
tables.


If Uniscribe, AAT, and Harfbuzz then provided generic support for 
parsing such tables and rendering PUA characters accordingly, it would 
be an all-around solution both for RTL PUA as well as Indic PUA, I 
suppose. (But I'm not sure how such a custom table would interact with 
the innate ability of Graphite to handle custom properties. It should 
probably be either the new proposed custom table or Graphite.)


[sigh]

--
Shriramana Sharma



Re: RTL PUA?

2011-08-21 Thread Shriramana Sharma

On 08/22/2011 12:01 AM, Peter Constable wrote:

If you mean a rule to substitute [g1 g2] with [g3] won't apply if the
sequence processed by the OpenType Layout lookup processor is [g2
g1],


Peter, actually I suspect Philippe is thinking that in the case of RTL, 
the *glyphs* are placed in reverse order and then he is asking how can 
the ligation take place.


While I don't know much about RTL scripts, if the logical order is ALEF + 
LAMED, but the presentation order is LAMED + ALEF *because of the RTL 
nature*, do you write the rule as ALEF + LAMED = ALEF_LAMED_LIGATURE or 
LAMED + ALEF = ALEF_LAMED_LIGATURE?


--
Shriramana Sharma



RE: RTL PUA?

2011-08-21 Thread Peter Constable
Um... Computers are hardware, and don't understand a thing. What I think you 
mean is computer _software_. (I know, I'm being pedantic, but with good 
reason.) 

The question is, whether you need a protocol that can be understood by _all_ 
computer software, or by _some_ computer software. Obviously, it is not 
feasible to have something that will be understood by all computer software. 
So, it would be _some_ computer software. But, then, I think such things exist; 
e.g. there appear to be ways to do this in some Mac software, and in some 
Graphite software.

But something tells me you expect more than that: not _all_ software; just 
_most_ software; or just most of the really commonly used software that gets 
used by people every day all over the world. 

To keep this in perspective, note that 99.% of those people (or fewer) have 
no understanding or even knowledge of PUA characters, and apart from some CJK 
scenarios* 99.% have no need to use them. (*The biggest exception is the 
invention of characters for personal names in places like Taiwan. But RTL is 
not relevant there.)

I'm not saying that there shouldn't be _some_ software that can do what you 
expect. But there will likely be some different views on what ought to be 
included within that "some".


Peter



-----Original Message-----
From: Doug Ewell [mailto:d...@ewellic.org] 
Sent: Sunday, August 21, 2011 2:20 PM
To: verd...@wanadoo.fr; Peter Constable
Cc: Michael Everson; Unicode Discussion List
Subject: Re: RTL PUA?

For once, I am in strong agreement with something Philippe had to say:

> We really need a reliable way to transport a PUA agreement in such a
> way that it can be understood by a computer.

I don't necessarily agree that fonts, or (especially) any particular font 
technology, are the one and only way to accomplish this, because there's more 
to character handling than display. Maybe some sort of open format could be 
devised that could be used as a plug-in to a variety of existing components.

--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by AT&T




RE: RTL PUA?

2011-08-21 Thread Peter Constable
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Asmus Freytag

> Treating PUA characters as ON is very problematic

As would be changing the default property of PUA characters from L to ON.




Peter




Re: RTL PUA?

2011-08-21 Thread Doug Ewell
So what you are asking about is a directional control character that would 
assign subsequent characters a BC of 'AL', right?

You don't want to call this a LANGUAGE MARK or anything else that implies 
language identification, because of the existence of "real" language 
identification mechanisms and the history of Unicode and language tagging.
 
--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by AT&T

-----Original Message-----
From: Richard Wordingham 
Sender: unicode-bou...@unicode.org
Date: Mon, 22 Aug 2011 03:19:39 
To: Unicode Mailing List
Subject: Re: RTL PUA?

On Sun, 21 Aug 2011 23:55:46 +
"Doug Ewell"  wrote:

> What's a LANGUAGE MARK?

There are *three* strong directionalities - 'L' left-to-right, 'AL'
right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
suspect).  'AL' and 'R' have different effects on certain characters
next to digits - it's the mind-numbing part of the BiDi algorithm.
With one a $ sign after a string of European (or is it Arabic?) digits
appears on the left and in the other it appears on the right.  I
can't remember whether 'higher-level protocols' have an effect on this
logic. LRM has a BC of L, RLM has a BC of R, but no invisible character
has a BC of AL. That's why I tentatively raised the notion of ARABIC
LANGUAGE MARK.  Incidentally, an RLO gives characters with a
temporary BC of R, not AL.

Richard.





RE: RTL PUA?

2011-08-21 Thread Peter Constable
From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

>> As I explained in an earlier message, the layout engine doesn't use 
>> the "default" property value but the resolved bidi level.
>
> Once again, you refuse to understand my arguments. 

I don't think I'm refusing to understand anything. I'm merely taking your 
assertions _as stated_ and evaluating whether I think they are accurate or not. 
Perhaps what you intend to convey assumes things not clear in what you've 
stated, since you think I'm not understanding you.


> What I'm saying is that OpenType CANNOT resolve the bidi level of 
> PUAs (with the exception where we use additional BiDi controls, 

Of course _OpenType_ cannot, but any rendering engine that uses OpenType _must_ 
resolve the bidi level of _all_ characters in a sequence that it is given to 
render. Given our current situation, a default rendering implementation would 
resolve PUA characters to an even (LTR) level unless, of course, bidi control 
characters -- particularly RLO -- are used to override the directionality of 
the character, as you mention.
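
A rough sketch of the step that follows, assuming the bidi algorithm
has already produced a resolved level for the run: direction comes from
the parity of the level, and direction-specific OpenType features can
be chosen from that. 'ltra', 'ltrm', 'rtla' and 'rtlm' are registered
feature tags; the two function names are hypothetical and do not come
from any engine's API.

```python
def run_direction(level):
    """Even resolved levels are left-to-right, odd levels right-to-left."""
    return "rtl" if level % 2 else "ltr"


def direction_features(level):
    """Pick the direction-specific feature tags for a run (hypothetical helper)."""
    if run_direction(level) == "rtl":
        return ("rtla", "rtlm")
    return ("ltra", "ltrm")


# A PUA run left at its default class in an LTR paragraph resolves to level 0;
# the same run enclosed in RLO ... PDF resolves to an odd level.
for level in (0, 1, 2):
    print(level, run_direction(level), direction_features(level))
```

Nothing in this step consults the default property of the characters;
only the resolved level is used.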

> which remains a hack, because it adds unnecessary invisible markup 
> around the encoded texts, and complicates the use of strings and 
> substrings).

Well, depending on how you define "hack", some might reasonably suggest that 
any usage of PUA is "a hack". (Of course, some who may not use the term in the 
same way might argue that it is certainly not "a hack".)

> You can turn the problem as you want, but PUAs (as well as unknown
> characters) still have default properties that will ultimately get used in 
> the absence of a more precise definition (i.e. an explicit override) of the 
> actual BiDi property needed for the character.


> And at least on this point, Michael Everson is also right when he says 
> that PUAs do not properly handle RTL scripts only because of their 
> default BiDi property value. 

That depends on what your expectation is of PUA code points in stock software 
implementations, without any further tailoring. Sharma has made the point that, 
even if there were PUA code points with default RTL bidi properties, a user who 
wanted PUA characters for an as-yet-unencoded Indic script would still not have 
PUA code points with the default properties they'd need to get their script to 
display correctly. We could start naming off any kind of text process and any 
number of unencoded characters or scripts; we can't possibly (literally) assign 
PUA code points with default values for every different combination of 
character property values. But that is what would be needed if one were to 
argue that there should be PUA code points available for an arbitrary user to 
get desired behaviour in their text process of concern using only available PUA 
code positions and stock software without further tailoring.

I don't hold to that expectation (certainly not in the strong form as I've 
described).


> I did not post any assertion about how OpenType could be used, just 
> wanted to explain that with the current specifications, it cannot
> *currently* resolve the problem 

OpenType cannot resolve the problem of the directionality of PUA characters for 
the particular reason that OpenType doesn't deal with bidi layout at all! It 
sits at a lower level in a text rendering stack. But what it can do -- and 
this is _all_ that I said -- is handle glyph mirroring: once the layers above 
it have resolved bidi levels, the OpenType spec has details covering the 
necessary mechanisms.
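
A minimal, character-level sketch of that division of labour (an
illustration only, not the OpenType mechanism itself): the caller
supplies the resolved levels, and mirroring is applied only at odd
levels. MIRROR_PAIRS is a small hand-written subset and
mirror_for_levels is a made-up name; a real implementation would take
its pairs from BidiMirroring.txt or from the font.

```python
import unicodedata

# Small hand-written subset of mirror pairs, for illustration only.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def mirror_for_levels(text, levels):
    """Swap mirrored characters that sit at an odd (RTL) resolved level.

    `levels` holds one resolved bidi level per character and is assumed
    to come from a bidi implementation higher up the stack.
    """
    out = []
    for ch, level in zip(text, levels):
        if level % 2 == 1 and unicodedata.mirrored(ch):
            out.append(MIRROR_PAIRS.get(ch, ch))
        else:
            out.append(ch)
    return "".join(out)


# A PUA character between parentheses, once at level 1 and once at level 0.
print(repr(mirror_for_levels("(\ue000)", [1, 1, 1])))
print(repr(mirror_for_levels("(\ue000)", [0, 0, 0])))
```

The function never asks what the bidi class of U+E000 is; it only needs
the levels it is handed.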



Peter




Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 16:37:34 -0700
Asmus Freytag  wrote:

> Treating PUA characters as ON is very problematic - their display
> would become context sensitive in unintended ways. No users of CJK
> characters would think of using LRM characters, but if text is
> inserted or viewed in RTL context, it could behave randomly.

I think a problem would be immediately obvious.  Also, the CJK PUA
characters would usually be guarded by non-PUA CJK characters.

> In contrast, always supplying a RLO override for RTL text (containing 
> PUA characters) would be a simple thing to remember and to get right.

So long as you remembered to pop before digits.  This could easily go
wrong if the text were amended.  For example, if two paragraphs were
merged, one could easily delete a PDF, and then digits at the bottom of
the second paragraph, quite possibly off-screen at the time, would
suddenly flip.
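
The failure mode can be caught mechanically. Below is a small sketch,
assuming a checker that an editor or exporter might run; the function
name is hypothetical. It counts embeddings and overrides that are still
open at the end of a paragraph, which is what happens when a PDF is
deleted during a merge.

```python
LRE, RLE, LRO, RLO = "\u202a", "\u202b", "\u202d", "\u202e"
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
OPENERS = {LRE, RLE, LRO, RLO}


def unclosed_formatting(paragraph):
    """Count embeddings/overrides still open at the end of the paragraph."""
    depth = 0
    for ch in paragraph:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1  # a PDF with nothing open is simply ignored
    return depth


first = RLO + "\ue000\ue001" + PDF   # overridden PUA run, properly closed
second = " total 123"                # digits that should stay as they are

merged_ok = first + second
merged_broken = first[:-1] + second  # the PDF was deleted during the merge

print(unclosed_formatting(merged_ok), unclosed_formatting(merged_broken))
```

A non-zero result means everything up to the end of the paragraph,
digits included, is still inside the override.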

Richard.



Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 23:55:46 +
"Doug Ewell"  wrote:

> What's a LANGUAGE MARK?

There are *three* strong directionalities - 'L' left-to-right, 'AL'
right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
suspect).  'AL' and 'R' have different effects on certain characters
next to digits - it's the mind-numbing part of the BiDi algorithm.
With one a $ sign after a string of European (or is it Arabic?) digits
appears on the left and in the other it appears on the right.  I
can't remember whether 'higher-level protocols' have an effect on this
logic. LRM has a BC of L, RLM has a BC of R, but no invisible character
has a BC of AL. That's why I tentatively raised the notion of ARABIC
LANGUAGE MARK.  Incidentally, an RLO gives characters with a
temporary BC of R, not AL.
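
The classes involved can be checked with Python's unicodedata module.
This is a minimal sketch: the sample code points and the labels are my
own choice, and the results reflect whatever version of the Unicode
data the interpreter was built with.

```python
import unicodedata

# Sample code points chosen for illustration.
SAMPLES = {
    "LEFT-TO-RIGHT MARK": "\u200e",
    "RIGHT-TO-LEFT MARK": "\u200f",
    "HEBREW LETTER ALEF": "\u05d0",
    "ARABIC LETTER ALEF": "\u0627",
    "first BMP private-use code point": "\ue000",
    "first Plane 16 private-use code point": "\U00100000",
}


def bidi_classes(samples):
    """Return {label: Bidi_Class} using the interpreter's bundled Unicode data."""
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}


for label, bc in bidi_classes(SAMPLES).items():
    print(f"{label}: {bc}")
```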

Richard.



Re: RTL PUA?

2011-08-21 Thread Michael Everson
On 22 Aug 2011, at 00:37, Asmus Freytag wrote:

> If your implementation supported the directional overrides, it would be 
> possible to use these to lay out any RTL text in a portable manner. Just 
> enclose any RTL run with RLO and PDF (pop directional formatting).
> 
> No impact on any existing implementation, no impact on the standard.

Useful for RTL'ing the Phaistos Disc text or even Latin for the Jabberwocky 
text. Not so desirable for nonce or novel Arabic (or other RTL script) 
characters intended to be used within RTL text strings.

> Those who produce rendering engines that do not support these overrides today 
> could be leaned on to upgrade their implementations - that change would 
> benefit users of non-PUA RTL languages as well (because sometimes, the 
> bidi-algorithm can fail, such as for part numbers, and being able to use RLO 
> is a simple way to stabilize such problematic text).

The problem is that existing PUA characters are all strong L.

> Treating PUA characters as ON is very problematic - their display would 
> become context sensitive in unintended ways. No users of CJK characters would 
> think of using LRM characters, but if text is inserted or viewed in RTL 
> context, it could behave randomly.

Easy to fix: Add RTL PUA characters. 

> In contrast, always supplying a RLO override for RTL text (containing PUA 
> characters) would be a simple thing to remember and to get right.

Not, I think, practical and certainly not putting RTL and LTR users on the same 
level in terms of PUA usage. 

Michael Everson * http://www.evertype.com/





Re: RTL PUA?

2011-08-21 Thread Doug Ewell
I suggested 'R' for Plane 16, not 'ON'.

What's a LANGUAGE MARK?

--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by AT&T

-----Original Message-----
From: Richard Wordingham 
Sender: unicode-bou...@unicode.org
Date: Sun, 21 Aug 2011 23:31:58 
To: 
Subject: Re: RTL PUA?

On Sun, 21 Aug 2011 11:00:26 -0600
"Doug Ewell"  wrote:

> I think as soon as we start talking about this many scenarios, we are
> no longer talking about what the *default* bidi class of the PUA (or
> some part of it) should be.  Instead, we are talking about being able
> to specify private customizations, so that one can have 'AL' runs and
> 'ON' runs and so forth.

I was exploring the consequences to see if there was a one size fits
all solution.  Someone (you?) suggested ON as a default, and I like
it.  I think it would also work fairly well for practical CJK
applications as well - the only problems are that LRM and RLM would
occasionally be needed, and the subtle differences between AL and R
would be lost.  I expect ARABIC LANGUAGE MARK would not go down well
- has it already been proposed and rejected?

> Through most of the 1990s, most 
> existing applications and technologies didn't support Unicode at all,
> or very small parts of it, and the solution generally was to update
> them so that they would.  The same should be true here.

Agreed.  I also noted that changes would be of limited assistance for
extending existing supported scripts.

> I would
> suggest that installing a modified copy of UnicodeData.txt seems like
> a rather clumsy solution; if text files are involved, I'd suggest
> leaving UnicodeData.txt alone and creating some sort of "overrides"
> file.

While partial overrides are cleaner, that appears to be the way to fix
Pango, albeit via recompilation.  According to the comments, its BiDi
settings are derived from the file automatically.  Also, one needs a
method of updating the properties of codepoints as they become assigned
and properties change.  There are also advantages to trying out proposed
changes.

Richard.





Re: RTL PUA?

2011-08-21 Thread Asmus Freytag

On 8/21/2011 3:31 PM, Richard Wordingham wrote:

On Sun, 21 Aug 2011 11:00:26 -0600
"Doug Ewell"  wrote:


I think as soon as we start talking about this many scenarios, we are
no longer talking about what the *default* bidi class of the PUA (or
some part of it) should be.  Instead, we are talking about being able
to specify private customizations, so that one can have 'AL' runs and
'ON' runs and so forth.

I was exploring the consequences to see if there was a one size fits
all solution.  Someone (you?) suggested ON as a default, and I like
it.  I think it would also work fairly well for practical CJK
applications as well - the only problems are that LRM and RLM would
occasionally be needed, and the subtle differences between AL and R
would be lost.  I expect ARABIC LANGUAGE MARK would not go down well
- has it already been proposed and rejected?


If your implementation supported the directional overrides, it would be 
possible to use these to lay out any RTL text in a portable manner. Just 
enclose any RTL run with RLO and PDF (pop directional formatting).


No impact on any existing implementation, no impact on the standard.

Those who produce rendering engines that do not support these overrides 
today could be leaned on to upgrade their implementations - that change 
would benefit users of non-PUA RTL languages as well (because sometimes, 
the bidi-algorithm can fail, such as for part numbers, and being able to 
use RLO is a simple way to stabilize such problematic text).


Treating PUA characters as ON is very problematic - their display would 
become context sensitive in unintended ways. No users of CJK characters 
would think of using LRM characters, but if text is inserted or viewed 
in RTL context, it could behave randomly.


In contrast, always supplying a RLO override for RTL text (containing 
PUA characters) would be a simple thing to remember and to get right.
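
As a sketch of how little is involved, assuming (and this is only an
assumption for the example) that every private-use character in the
text belongs to an RTL script: each maximal run of private-use
characters is enclosed in RLO ... PDF. The function names are
hypothetical.

```python
import unicodedata
from itertools import groupby

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def is_private_use(ch):
    """True for code points with General_Category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def wrap_pua_runs(text):
    """Enclose each maximal run of private-use characters in RLO ... PDF.

    Assumes every private-use character in `text` is meant to be RTL.
    """
    parts = []
    for is_pua, group in groupby(text, key=is_private_use):
        run = "".join(group)
        parts.append(RLO + run + PDF if is_pua else run)
    return "".join(parts)


print(ascii(wrap_pua_runs("ab\ue000\ue001 12")))
```

Digits and other non-PUA characters stay outside the override, so they
are still handled by the ordinary rules.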


A./




Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/21 Doug Ewell :
> For once, I am in strong agreement with something Philippe had to say:
>
>> We really need a reliable way to transport a PUA agreement in such a
> way that it can be understood by a computer.
>
> I don't necessarily agree that fonts, or (especially) any particular font 
> technology, are the one and only way to accomplish this, because there's more 
> to character handling than display. Maybe some sort of open format could be 
> devised that could be used as a plug-in to a variety of existing components.

Yes, but without display support, at least, all the other needs will
never be addressed, because you won't have encoded text to work with.
So don't even dream, for example, about performing plain-text search if
you don't have encoded texts to search in! Collation is then a
secondary target. Proper display is an immediate need (one that even comes
before the development of easy input methods, or later developments of
spell checkers, content indexers, semantic analyzers, and localization
of software to use a given script through its UI).

For proper display of PUAs, all that is needed is a minimum set of
character properties. I have argued, against what Peter Constable
thinks, that OpenType cannot handle RTL characters with PUAs, because
it has absolutely no source of information to know if a run of text is
RTL or LTR when implementing the BiDi algorithm.

OK, the mirroring property is probably not essential, because most
mirrored characters today are punctuation marks, which already cover a
very wide range. If needed, additional PUA punctuation marks may be added,
and even coded in two mirrored code positions, even if they are not
automatically mirrored according to their context: for such rare
cases, using BiDi format controls around them, or equivalent CSS
embedding styles in HTML and similar techniques, will be enough.

But for most RTL text using PUAs in long runs, or mixed within
other sequences of standard RTL characters (for example in the middle
of words), format controls are clearly not the solution: they do not
work reliably in HTML, for example, if you have to split words across
separate spans, and inserting those controls in the middle of words is
really a nightmare. In addition, they completely defeat the plain-text
searchability and editability of encoded texts. This will slow
down the production of encoded texts so much that, in fact, almost no
work will be done with those PUAs. As a consequence, most texts will wait
indefinitely for some encoding effort.

The need will become even more urgent now that the UTC and WG2 will
spend most of their time discussing scripts that are rarely used,
where the cultural knowledge will be difficult to find. If we don't
have an easy way to experiment with their encodings, at least with PUAs,
for extended periods (because there will be a need for a long research
period, with conflicting experiments), those scripts will remain
unencoded in the UCS for a very long time. And in fact I doubt that even
WG2 or the UTC will have the resources to provide all this effort
without committing many critical errors that will plague the
long-term future.

We absolutely need a transition mechanism, and PUAs can be part of
this transition. For the same reason, the possibility of supporting
external character properties, for characters that are not encoded or
are encoded in separate efforts via PUAs, and that will later be
encoded with low levels of implementation and deployment for many
years, would certainly help keep the resources needed (at UTC
and WG2) at a low level, since most of the experiments would be
performed independently, without depending on the release of a putative
version of the UCS that finally agrees to encode the script.

But even in this case, for historic scripts, the encoding effort will
be hard to finalize: it is highly probable that those scripts will be
encoded progressively, starting with a minimum subset on which most
people agree, with many other characters remaining that need
longer experimentation or research. Those scripts will then need to
support a mix of standard assignments and PUAs for a long time, for
distinct small communities that will need to share and
discuss their agreements.

The current problem is that there is absolutely no transition
mechanism in the UCS encoding process: a character gets fully encoded
with most of its essential properties becoming normative, some of them
impossible to change later (even if there was an error or an
unexpected caveat, that the interested communities have not had any
chance to experiment before they were finally approved by the UTC and
WG2).

Unicode should not interfere with what users will want to do with
PUAs. After all, PUAs were made specifically for that. If users need to
assign their own property values to PUAs, they must be able to do
that. And these properties must find a way to be representable in the
current technology frameworks.

If

Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 11:00:26 -0600
"Doug Ewell"  wrote:

> I think as soon as we start talking about this many scenarios, we are
> no longer talking about what the *default* bidi class of the PUA (or
> some part of it) should be.  Instead, we are talking about being able
> to specify private customizations, so that one can have 'AL' runs and
> 'ON' runs and so forth.

I was exploring the consequences to see if there was a one size fits
all solution.  Someone (you?) suggested ON as a default, and I like
it.  I think it would also work fairly well for practical CJK
applications as well - the only problems are that LRM and RLM would
occasionally be needed, and the subtle differences between AL and R
would be lost.  I expect ARABIC LANGUAGE MARK would not go down well
- has it already been proposed and rejected?

> Through most of the 1990s, most 
> existing applications and technologies didn't support Unicode at all,
> or very small parts of it, and the solution generally was to update
> them so that they would.  The same should be true here.

Agreed.  I also noted that changes would be of limited assistance for
extending existing supported scripts.

> I would
> suggest that installing a modified copy of UnicodeData.txt seems like
> a rather clumsy solution; if text files are involved, I'd suggest
> leaving UnicodeData.txt alone and creating some sort of "overrides"
> file.

While partial overrides are cleaner, that appears to be the way to fix
Pango, albeit via recompilation.  According to the comments, its BiDi
settings are derived from the file automatically.  Also, one needs a
method of updating the properties of codepoints as they become assigned
and properties change.  There are also advantages to trying out proposed
changes.
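
For comparison, here is a minimal sketch of the "overrides" approach in
Python. The file format ('START..END;CLASS' or 'CODE;CLASS', with '#'
comments) and both function names are invented for this example and
are not an existing convention. The stock data is left untouched and is
consulted only when no override matches.

```python
import unicodedata


def parse_overrides(lines):
    """Parse 'START..END;CLASS' or 'CODE;CLASS' lines (a made-up format)."""
    table = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, bidi_class = (field.strip() for field in line.split(";"))
        start, _, end = span.partition("..")
        table.append((int(start, 16), int(end or start, 16), bidi_class))
    return table


def bidi_class(ch, overrides):
    """Return the overridden Bidi_Class if one applies, else the stock value."""
    cp = ord(ch)
    for start, end, value in overrides:
        if start <= cp <= end:
            return value
    return unicodedata.bidirectional(ch)


overrides = parse_overrides([
    "# private agreement: hypothetical RTL script in part of the BMP PUA",
    "E000..E0FF;R",
    "E100;AL",
])
for ch in ("\ue000", "\ue100", "\ue200", "A"):
    print(f"U+{ord(ch):04X}", bidi_class(ch, overrides))
```

An engine that reads its properties through a function like bidi_class
could pick up a private agreement without recompilation, and the
overrides could be dropped once the characters are formally encoded.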

Richard.



Re: ARABIC LETTER KORANIC YEH WITH HAMZA

2011-08-21 Thread Philippe Verdy
2011/8/21 Arno Schmitt :
> Philippe,
> Philippe> The rule relative to the shadda is so strong that this is even one of
> Philippe> the very first things you're taught in some didactic tutorials on how
> Philippe> to read Arabic.
>
> the rule is not valid for most orthographies of the Koran

OK, but are these Koranic variants semantically different? Maybe the
placement rule was not so strong historically. For modern Arabic,
the two placements will be perceived as equivalent, with a strong
preference for the raised vowel in the presence of the consonantal shadda
modifier. In fact, those two placements should probably have been
unified in a single codepoint, with only a variation selector for
maintaining the vowel at the lower position. (But now is not the time
to discuss this disunification, even if I don't know of any
contrasting example where the different placements imply distinct
semantics or readings of the long vowel, such as some dialectal
diphthong.)

Another thing that I know is that the preferred repetition of the
vowel between the previous consonant, where it is added as a
diacritic, and the long vowel after it (using a mater lectionis
letter) is not universal: the diacritic vowel can frequently be
omitted, as it is implicit (and many texts that are supposed to be
consistent in displaying this repetition everywhere contain frequent
cases where this repetition is omitted, notably when the long vowel
is an unmodified Alef "mater lectionis"). I can find many examples of
this even in modern didactic courses (I'm not sure that the omission
was made on purpose; it's probably because the diacritic vowel adds no
value, and is not mandatory anyway, whereas the presence of the
mater lectionis is absolutely required by orthographic rules in all
writing styles, including unpointed texts).

The cases where the diacritic vowel is less frequently omitted are usually
those where it explicitly marks a vowel mutation for a declension,
feminine or plural, or helps interpret the liaison that may occur
with a nearby word. But in the middle of radicals (not altered by
vowel mutations by grammar), such repetition is frequently omitted; only
the mater lectionis long vowel letter remains. The split between
modern Arabic and Koranic texts is not so strict. I also see similar
omissions in old Koranic texts even though they are pointed in great
detail (for correct reading): this superfluous implicit vowel mark
does not change the reading, and it may be more valuable to place
other diacritics than this vowel, or to use larger and more visible
glyphs for the base letter.



Re: RTL PUA?

2011-08-21 Thread Doug Ewell
For once, I am in strong agreement with something Philippe had to say:

> We really need a reliable way to transport a PUA agreement in such a
> way that it can be understood by a computer.

I don't necessarily agree that fonts, or (especially) any particular font 
technology, are the one and only way to accomplish this, because there's more 
to character handling than display. Maybe some sort of open format could be 
devised that could be used as a plug-in to a variety of existing components.

--
Doug Ewell • d...@ewellic.org
Sent via BlackBerry by AT&T




Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/21 Peter Constable :
> From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy
>
>> A GSUB operation will only be used if it is specified in the correct feature
>> table. The problem here is which feature to use: "rtlm" or "ltrm" ? It's
>> impossible to know because it first depends on the layout engine to KNOW
>> exactly if the run of text is RTL or LTR.
>
> The layout engine already _has_ to know the bidi level of a run regardless.
>
>
>> Without a font-level support of BiDi properties of PUAs (or unassigned
>> characters),
>
> I'm trying to tell you that, wrt mirroring, that's already defined in the 
> OpenType spec.
>
>
>> the layout engine will assume the wrong guess from the "default" property
>> value. And then it won't find the expected GSUB operation, because it won't
>> match it in the correct feature subtable.
>
> As I explained in an earlier message, the layout engine doesn't use the 
> "default" property value but the resolved bidi level.

Once again, you refuse to understand my arguments. What I'm saying is
that OpenType CANNOT resolve the bidi level of PUAs (with the
exception where we use additional BiDi controls, which remains a hack,
because it adds unnecessary invisible markup around the encoded texts,
and complicates the use of strings and substrings).

You can turn the problem around any way you want, but PUAs (as well as unknown
characters) still have default properties that, in the end, will get used
in the absence of a more precise definition (i.e. an explicit override) of
the actual BiDi property needed for the character.

> Btw, in the past few weeks, you've written several posts in which you make 
> assertions about how rendering implementations work and, in some cases, why 
> more is needed. And then I or others have to spend a bunch of time writing 
> responses so that you get the correct understanding and, more importantly, so 
> that others don't get misled. It would be a lot easier if you just asked, 
> "How is this done?"

Ok, you've replied, but not completely.

And at least on this point, Michael Everson is also right when he says
that PUAs do not properly handle RTL scripts only because of their
default BiDi property value. But I don't support his idea of encoding
new PUAs, when in fact we can effectively provide the additional
character properties needed, for example in fonts, without changing
the default property of the PUA (I don't support that at all, and probably
neither do you) and without allocating more (unneeded) PUA block(s) for RTL
scripts (and also without hacking on top of another existing set of
RTL assigned characters).

I did not post any assertion about how OpenType could be used, just
wanted to explain that with the current specifications, it cannot
*currently* resolve the problem (and Michael Everson certainly fully
agrees with that, but he can reply as well if he thinks that I
misinterpret his last few messages).

We really need a reliable way to transport a PUA agreement in such a
way that it can be understood by a computer. An encoded font can
transport this information reliably, which at least must include some
necessary character property values, and it offers a smooth way for
transitions during all the encoding process of new scripts (notably
during the experimentation), as well as after that, for its adoption
for more general use (before a large majority of users can use updated
implementations of their text renderers, that will provide
automatically those properties for newly encoded characters and
scripts).

Simply because it's MUCH easier to upgrade a font (especially a PUA
font which is not part of the core fonts of the operating system),
than to upgrade a rendering engine (bound to the OS, for the case of
Microsoft APIs and libraries in Windows). An extensible set of
properties, managed with a good rule of priorities to avoid hacks or
non-compliant implementations, can certainly accelerate the
development and adoption rate by many years, can increase the number of
experiments possible, and can help avoid errors during the
encoding process for new characters and scripts.

It could reduce this delay from about 10 years (during which even if
the script or characters are encoded, it will not be available or
usable reliably), to just a few months (even anticipating the final
encoding in the UCS, by a reliable way to represent it as PUAs,
managed with the help of a PUA font, and after the UCD encoding, with a
font that provides the upward upgrade for older implementations of the
layout engine only knowing an older UCD version).

I am completely convinced that we don't need more PUAs due to
continuous lack of support in existing software. But software can
still be updated to provide the support with the help of transitional
subtables in fonts (that can easily be ignored by newer engines that
won't require such extension tables), for integrating the additional
character properties.

Philippe.




RE: RTL PUA?

2011-08-21 Thread Peter Constable
From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

> A GSUB operation will only be used if it is specified in the correct feature 
> table. The problem here is which feature to use: "rtlm" or "ltrm" ? It's 
> impossible to know because it first depends on the layout engine to KNOW 
> exactly if the run of text is RTL or LTR.

The layout engine already _has_ to know the bidi level of a run regardless.
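
To make the point about resolved levels concrete, here is a minimal Python 
sketch. The function name is an illustrative assumption and not part of any 
OpenType API; it relies only on the UAX #9 convention that even embedding 
levels are left-to-right and odd levels are right-to-left, so the choice 
between the two features never consults a character's default property:

```python
def mirroring_feature_for_level(resolved_level: int) -> str:
    """Pick the OpenType mirroring feature tag for a run.

    `resolved_level` is the embedding level the bidi algorithm has already
    resolved for the run; the name of this helper is hypothetical.
    """
    if resolved_level < 0:
        raise ValueError("embedding levels are non-negative")
    # Odd levels are right-to-left, even levels are left-to-right (UAX #9).
    return "rtlm" if resolved_level % 2 == 1 else "ltrm"
```

How the level itself gets resolved for a run of PUA characters is a separate 
question, which is what the rest of this thread is arguing about.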


> Without a font-level support of BiDi properties of PUAs (or unassigned 
> characters), 

I'm trying to tell you that, wrt mirroring, that's already defined in the 
OpenType spec.


> the layout engine will assume the wrong guess from the "default" property 
> value. And then it won't find the expected GSUB operation, because it won't 
> match it in the correct feature subtable.

As I explained in an earlier message, the layout engine doesn't use the 
"default" property value but the resolved bidi level.


Btw, in the past few weeks, you've written several posts in which you make 
assertions about how rendering implementations work and, in some cases, why 
more is needed. And then I or others have to spend a bunch of time writing 
responses so that you get the correct understanding and, more importantly, so 
that others don't get misled. It would be a lot easier if you just asked, "How 
is this done?"


Peter




RE: RTL PUA?

2011-08-21 Thread Peter Constable
From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

>> In the OpenType specification

> In addition, this specification highly depends on two things:
> - the layout engine fully knows the properties of all characters in 
> order to implement BiDi reordering as well as BiDi mirroring

Not true: mirroring depends on the resolved directionality, not the Unicode 
character properties.


> - the layout engine fully knows the necessary mappings for the OMPL
> table (this assumes that it always implements the latest version of the UCD)

No. The OMPL is fixed at TUS 5.1.




Peter





Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/21 Peter Constable :
>> Exactly, but mirroring data for remapping glyphs will not be part of that
>> font.
>
> Um... Why not? If the mirroring isn't reflected in 
> http://www.unicode.org/Public/5.1.0/ucd/BidiMirroring.txt, then it must be 
> handled by glyph substitution in the font as a normal GSUB operation.

A GSUB operation will only be used if it is specified in the correct
feature table. The problem here is which feature to use: "rtlm" or
"ltrm" ? It's impossible to know because it first depends on the layout
engine to KNOW exactly if the run of text is RTL or LTR.

Without a font-level support of BiDi properties of PUAs (or unassigned
characters), the layout engine will assume the wrong guess from the
"default" property value. And then it won't find the expected GSUB
operation, because it won't match it in the correct feature subtable.



Re: RTL PUA?

2011-08-21 Thread John Hudson

Petr Tomasek wrote:

Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.



Not true. See:
Colette Sirat. Hebrew Manuscripts of the Middle Ages. Cambridge University 
Press 2002,
fig. 114 (p. 176) or fig. 127 (p. 189) or fig. 134 (p. 193).


I wouldn't classify any of those examples as 'common'. I also wouldn't 
classify all examples of touching letters -- of which many occur in 
rapidly written text -- as ligatures. Aleph+lamed on the other hand is a 
regularly occurring distinct formation in whole classes of manuscripts 
(and persisting in typography). I have a good collection of books on 
Hebrew palaeography, and while there are many examples of Hebrew letters 
being very tightly spaced there are relatively few instances of what I 
would consider ligatures, i.e. formations in which the ductus or spacing 
of the specific sequences of letters is modified to facilitate connection.


JH


--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/21 Peter Constable :
> In the OpenType specification, the only data related to glyph mirroring that 
> a rendering engine is assumed to have is the bidi mirroring data from TUS 
> 5.1. (See http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) 
> All other glyph mirroring is to be handled using glyph substitution data in 
> OpenType Layout tables in fonts.

In addition, this specification highly depends on two things:
- the layout engine fully knows the properties of all characters in
order to implement BiDi reordering as well as BiDi mirroring
- the layout engine fully knows the necessary mappings for the OMPL
table (this assumes that it always implements the latest version of
the UCD)

This is not the case because:
- an OpenType layout engine will always implement a specific version
of the UCD. Standard properties defined in the UCD will never concern
unassigned characters that will be assigned in a later version. As
well, it will not provide any normative property for the PUA. All it
can then do is apply "default" properties for unassigned
(still unknown) characters, as well as for all PUAs.
- as such it will never be able to assert which runs of text
containing PUAs or unassigned characters are in RTL order or LTR
order.
- if it uses the default LTR order, it will not be able to find any
mirroring mapping in the OMPL, because the OMPL lookup table will only
be searched for runs that have been identified as RTL
- if it uses the default RTL order assumed from some blocks, the OMPL
will still not work with unknown characters/code points (the OMPL only
contains a list of pairs of known (assigned) non-PUA characters), so
character-level mirroring will not work as expected.
- in addition, if it cannot know if a run of reordered characters is
LTR or RTL, after mapping them to the glyph id's from the cmap (where
it exists in a font for the unknown non-PUA character or the PUA
character), it won't know which of the "ltrm" or "rtlm" tables to use
(if it incorrectly assumes the default LTR order, which is the default
for PUA, it will only look in the "ltrm" table, not in the "rtlm"
table). Mirroring will then not work if the LTR or RTL guess was wrong.

The only way to change this would be that the OpenType layout engine
allows overriding its default properties for unassigned or PUA
characters. For the case of BiDi reordering, this would require the
support of an additional lookup table in the OpenType font, containing
overrides for the BiDi character class assigned to characters. Of
course, this lookup table should NEVER be used if the character is
non-PUA and known in the implementation of the UCD by the layout
engine. The rule would be:
- if the character is not a PUA and is known in the current
implemented version of the UCD, use the known character property of
the UCD (allow no override).
- otherwise if the character (which is then either a PUA or an unknown
non-PUA) is mapped in the font's "cmap" table, and there's a "BiDi"
lookup table on the OpenType font, and that lookup table provides the
property value for that character, use that property
- otherwise use the default property value (indicated in the UCD and
Unicode specifications).
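
The three-step priority rule above can be sketched in Python as follows. This
is only an illustration of the proposal, not an existing OpenType mechanism:
the "BiDi" font table does not exist today, so `font_bidi_overrides` and
`font_cmap` are hypothetical stand-ins for it and for the font's cmap, and the
flat default of 'L' is an assumption that ignores the block-specific defaults
given in the UCD.

```python
import unicodedata

# The three Private Use ranges of the UCS.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def is_pua(cp):
    """True if the code point lies in one of the Private Use ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def resolve_bidi_class(cp, font_cmap, font_bidi_overrides, default="L"):
    """Apply the proposed priority rule for a code point's BiDi class.

    font_cmap: set of code points mapped by the font (hypothetical input).
    font_bidi_overrides: dict of code point -> BiDi class, standing in for
    the proposed (non-existent) "BiDi" font table.
    """
    ch = chr(cp)
    # "Known" here means assigned in the UCD version bundled with Python.
    known = unicodedata.category(ch) != "Cn"
    if not is_pua(cp) and known:
        # Step 1: assigned non-PUA character, no override allowed.
        return unicodedata.bidirectional(ch)
    if cp in font_cmap and cp in font_bidi_overrides:
        # Step 2: PUA or unknown character with a font-supplied value.
        return font_bidi_overrides[cp]
    # Step 3: fall back to the default property value.
    return default
```

With this ordering a font can never change the class of an assigned
character such as U+05D0, while a PUA code point picks up whatever the font
declares for it.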

A similar rule can be used as well for the character-level mirroring:
the standard OMPL will be used if and only if the character is not a
PUA and is known in the implemented version of the UCD. Otherwise, an
"OMPL" table in the OpenType font will contain additional character
pairs to look up. Such a lookup will however never be performed if the
character is in a LTR run (which means that this feature is dependent
on the correct implementation of the BiDi override above, which must
be implemented first).
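
As a rough Python sketch of that mirroring rule: `BUILTIN_PAIRS` below is a
four-entry excerpt standing in for the engine's fixed mirroring data, and
`font_pairs` is a hypothetical stand-in for the proposed font-level "OMPL"
table, which does not exist in OpenType today.

```python
import unicodedata

# Tiny excerpt standing in for the engine's built-in mirroring pairs.
BUILTIN_PAIRS = {0x0028: 0x0029, 0x0029: 0x0028,
                 0x005B: 0x005D, 0x005D: 0x005B}

PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def is_pua(cp):
    """True if the code point lies in one of the Private Use ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def mirror_code_point(cp, run_is_rtl, font_pairs):
    """Return the mirrored code point for cp, or cp itself if none applies.

    font_pairs: dict of code point -> mirrored code point supplied by the
    font (hypothetical; models the proposed font-level table).
    """
    if not run_is_rtl:
        # Character-level mirroring is never performed in an LTR run.
        return cp
    known = unicodedata.category(chr(cp)) != "Cn"
    if known and not is_pua(cp):
        # Assigned non-PUA characters use only the standard pairs.
        return BUILTIN_PAIRS.get(cp, cp)
    # PUA or unknown characters may use the font-supplied pairs.
    return font_pairs.get(cp, cp)
```

Note that `run_is_rtl` has to be decided before this function is called,
which is the dependency on the BiDi override described above.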

Then only, the existing "ltrm" and "rtlm" lookup tables in OpenType
can be used like today, because the OpenType layout engine knows
reliably which one to use. This allows standard glyph-level mirroring
to be specified (between pairs of glyph-id's).

Also the existing "ltra" and "rtla" lookup tables will be workable to
provide lists of alternate mirrored glyphs (but only for advanced
applications that allow selecting alternate variants). It may be
possible that this first requires the support of additional variation
sequences (using variation selectors), which are unknown in the
implemented version of the UCD, using an additional lookup table
working under the same rule as above, in order to allow sequences of
PUA+VSn (which will never be part of the UCD, but may be needed under
the PUA convention agreement that the font provides).

One difficulty in this scheme is that all those properties in OpenType
were never meant to be overridable in specific fonts. This means that
they were assumed to be consistent across all fonts. The difficulty
can arise from the behavior of font substitutions. I don't think
this is critical because this also means that we change the PUA
agreement in this case: the encoded PUA text is then dependent on the
PUA font used 

Re: RTL PUA?

2011-08-21 Thread Petr Tomasek
On Sun, Aug 21, 2011 at 10:09:22AM -0700, John Hudson wrote:
> Jonathan Rosenne wrote:
> 
> >People do all kinds of fancy things. I guess old manuscripts contain many
> >ligatures...
> 
> Not in Hebrew. The only common ligature is the aleph_lamed, a 
> post-classical import from Judaeo-Arabic.
> 
> JH

Not true. See:

Colette Sirat. Hebrew Manuscripts of the Middle Ages. Cambridge University 
Press 2002,
fig. 114 (p. 176) or fig. 127 (p. 189) or fig. 134 (p. 193).

-- 
Petr Tomasek 
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA






RE: RTL PUA?

2011-08-21 Thread Peter Constable
From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy

>> I agree that OpenType font tables cannot do glyph re-ordering. But you are 
>> totally incorrect in saying that it cannot handle ligatures.

> I meant "recognizing and generating ligatures in the context where 
> re-ordering has been performed externally by the renderer". 

That statement isn't adequate: re-ordering may itself produce 
contexts in which ligatures will occur. That can happen, for instance, in 
displaying Indic scripts.


> Ligatures can only be recognized in OpenType, provided that the layout
> engine has performed the reordering itself, because OpenType fonts
> won't recognize ligatures with glyphs in arbitrary order or interspersed 
> with other unrelated characters coming from an unreordered glyph sequence.

I'm not sure what it means to create a ligature of glyphs in arbitrary order. 
If you mean a rule to substitute [g1 g2] with [g3] won't apply if the sequence 
processed by the OpenType Layout lookup processor is [g2 g1], then that's true: 
if the behaviour of the script is such that glyph re-ordering is appropriate, 
then a rendering engine for OpenType should do that reordering, and 
substitution lookups in OpenType fonts should be written to assume that that 
reordering has taken place.
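
A minimal Python sketch of why glyph order matters to a ligature rule. The 
glyph names and the `apply_ligatures` helper are purely illustrative and not 
how any real OpenType lookup processor is written; it only models a 
two-glyph substitution matched left to right:

```python
def apply_ligatures(glyphs, rules):
    """Replace adjacent glyph pairs according to `rules`.

    glyphs: list of glyph names in the order the lookup processor sees them.
    rules: dict mapping a (first, second) pair to the ligature glyph name.
    """
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if len(pair) == 2 and pair in rules:
            # The pair matches in this exact order: emit the ligature.
            out.append(rules[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

With the single rule `{("g1", "g2"): "g3"}`, the sequence `["g1", "g2"]` 
becomes `["g3"]`, while `["g2", "g1"]` is left untouched, so the font's rules 
have to be written for the order the engine produces after reordering.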


>>> What this means is that, in practice, PUA are only usable in fonts 
>>> for characters with strong LTR directionality, excluding all 
>>> reordering and mirroring.
>>
>> In the OpenType specification, the only data related to glyph mirroring 
>> that a rendering engine is assumed to have is the bidi mirroring data from 
>> TUS 5.1. (See 
>> http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) 
>> All other glyph mirroring is to be handled using glyph substitution data in 
>> OpenType Layout tables in fonts.
>
> Exactly, but mirroring data for remapping glyphs will not be part of that 
> font. 

Um... Why not? If the mirroring isn't reflected in 
http://www.unicode.org/Public/5.1.0/ucd/BidiMirroring.txt, then it must be 
handled by glyph substitution in the font as a normal GSUB operation.



Peter




Re: RTL PUA?

2011-08-21 Thread Mark E. Shoulson

On 08/21/2011 01:09 PM, John Hudson wrote:

Jonathan Rosenne wrote:

People do all kinds of fancy things. I guess old manuscripts contain many
ligatures...


Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.

Closest you might have to "ligatures" is idiosyncratic 
letters-getting-joined-together by rapid writing, etc.  There are some 
examples in Ada Yardeni's book.  But they're not really ligatures; at 
best _maybe_ they're calligraphic variants (tho mostly they're quite the 
opposite of calligraphic).


Alef-Lamed did get a fair amount of use as a true ligature, though.

~mark



Re: ARABIC LETTER KORANIC YEH WITH HAMZA

2011-08-21 Thread Lorna Priest

mmarx wrote:

I am so pleased that there are some people on
the list interested in RTL scripts after all.

encoded are
ARABIC LETTER YEH (U+064A)   with two dots under the
  main part of the letter
  ييي ي
ARABIC LETTER FARSI YEH (U+06CC) with two dots under
  init and medi forms only
  ییی ی
ARABIC LETTER ALEF MAKSURA (U+0649) without dots ىىى ى
and
ARABIC LETTER YEH WITH HAMZA ABOVE (U+0626)
this letter has in "normal" Arabic no dots under
the main part of the letter :  ئئئ ئ

As you can see in the picture, there is a variant
of this character that has dots in init and medi, and
no dots in fina and isol. Should it be called
ARABIC LETTER FARSI YEH WITH HAMZA
or
ARABIC LETTER KORANIC YEH WITH HAMZA ?
A third option (ARABIC LETTER MAGHRIBI KORANIC YEH
WITH HAMZA) is not a good idea, because the letter,
although nowadays used in editions of the qur'an
_for_ the Maghribi, used to be used in the East
as well. (I stress the "for", because the pictures
in the attachment are from an edition printed in
Medina.)
In the Qur'an the hamza is not always above:
if there is kasra on the letter, the hamza
is below too.

Is there someone preparing a proposal for
missing Quranic characters  or should one
do it one by one?

My understanding when I was at the February UTC was that no more 
characters "with hamza" will be accepted *unless* there is something 
different about the shaping of the letter "with hamza" or if the letter 
"with hamza" means something totally different than the normal hamza 
purpose. Otherwise, it is expected that the hamza above or hamza below 
will be sufficient.


I don't think this has been documented as yet.

Lorna






Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/21 Peter Constable :
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On 
> Behalf Of Philippe Verdy
>
>> Hmmm Given the current standard in OpenType, and the fact that
>> OpenType fonts cannot reorder glyphs to support the BiDi algorithm
>> and correctly handle features like ligatures...
>
> I agree that OpenType font tables cannot do glyph re-ordering. But you are 
> totally incorrect in saying that it cannot handle ligatures.

I meant "recognizing and generating ligatures in the context where
re-ordering has been performed externally by the renderer". Ligatures
can only be recognized in OpenType, provided that the layout engine
has performed the reordering itself, because OpenType fonts won't
recognize ligatures with glyphs in arbitrary order or interspersed
with other unrelated characters coming from an unreordered glyph
sequence.

>> What this means is that, in practice, PUA are only usable in fonts for
>> characters with strong LTR directionality, excluding all reordering and
>> mirroring.
>
> In the OpenType specification, the only data related to glyph mirroring that 
> a rendering engine is assumed to have is the bidi mirroring data from TUS 
> 5.1. (See http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) 
> All other glyph mirroring is to be handled using glyph substitution data in 
> OpenType Layout tables in fonts.

Exactly, but mirroring data for remapping glyphs will not be part
of that font. Glyph mirroring substitution data in substitution rules
of OpenType fonts does not work because it cannot solve the ambiguity
of the expected direction, as the context length is limited (otherwise
the number of contextual pairs to recognize would explode
combinatorially, making it impractical to implement
in decent table sizes in fonts, even if we use class-based
substitution, because the necessary character-to-class mappings would
also require large mapping tables, including for a lot of characters
that are not even mapped in the font and for which the font was never
designed).

Mirroring behavior is then best handled in the layout engine, which
has a more global and centralized view of properties of the whole UCS.
Here, we just want to complement this view of character properties, by
permitting to specify a set of character properties for PUA characters
only, expecting that the layout engine will handle all the other
character properties for non-PUA characters, using the standard data
of the UCD...




Re: C1 Control Pictures Proposal

2011-08-21 Thread Doug Ewell
Perhaps it would help for you to do a quick survey of applications that 
already make use of the existing C0 control pictures, and include the 
results in your argument.  That might help convince some of us who feel 
the C0 pictures are only there for compatibility with previous character 
encodings, and aren't really used by anyone, and that a new set of C1 
pictures would meet with similar disuse.


--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell





Re: RTL PUA?

2011-08-21 Thread John Hudson

Jonathan Rosenne wrote:


People do all kinds of fancy things. I guess old manuscripts contain many
ligatures...


Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.


JH


--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: ARABIC LETTER KORANIC YEH WITH HAMZA

2011-08-21 Thread Arno Schmitt
Philippe,
what you say about "no exception" is wrong,
as you can see from the attachments.

Philippe> I was told long ago that the normative placement of kasra below the
Philippe> letter was also requiring it to go below the shadda (above the letter)
Philippe> when there was one, and this suffered no exception, at least in
Philippe> Koranic texts: the shadda effectively modifies the consonant, not the
Philippe> vowel, and defines the new higher baseline of the consonant cluster,
Philippe> under which the kasra is simply positioned.

Philippe> So the case is similar here, going in the reverse direction for the
Philippe> placement of hamza, relative to kasra that logically comes after the
Philippe> hamza and that may be omitted if vowel precision is not needed.

Philippe> Both exceptions are highly related to the logical order of binding for
Philippe> those hamza and shadda diacritics.

since hamza never is shaddad/geminated/doubled,
I do not see what this means concretely.


Philippe> The rule relative to the shadda is so strong that this is even one of
Philippe> the very first thing you're taught in some didactic tutorials on how
Philippe> to read Arabic.

the rule is not valid for most orthographies of the Koran

Arno

Re: RTL PUA?

2011-08-21 Thread Doug Ewell
I think as soon as we start talking about this many scenarios, we are no 
longer talking about what the *default* bidi class of the PUA (or some 
part of it) should be.  Instead, we are talking about being able to 
specify private customizations, so that one can have 'AL' runs and 'ON' 
runs and so forth.


There really isn't any way the UTC is going to approve changing one part 
of the PUA to be default 'AL', another part 'R', another part 'ON', etc. 
Asmus just said that merely assigning one plane to be different from the 
others "should be a non-starter."


For this discussion, I really don't find it very interesting that 
existing technologies A, B, and C don't currently provide a way to 
override the default PUA properties.  Through most of the 1990s, most 
existing applications and technologies didn't support Unicode at all, or 
very small parts of it, and the solution generally was to update them so 
that they would.  The same should be true here.  I would suggest that 
installing a modified copy of UnicodeData.txt seems like a rather clumsy 
solution; if text files are involved, I'd suggest leaving 
UnicodeData.txt alone and creating some sort of "overrides" file.
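
As a rough sketch of what such an "overrides" file could look like, here is 
a small Python parser. The file format (a code point or `first..last` range, 
a semicolon, a bidi class, optional `#` comments) is entirely hypothetical 
and only loosely modelled on the layout of other UCD data files; no existing 
tool reads it.

```python
def parse_overrides(text):
    """Parse hypothetical override lines such as 'E000..E0FF ; R  # note'.

    Returns a list of (first, last, bidi_class) tuples in file order.
    """
    overrides = []
    for raw in text.splitlines():
        # Strip comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        rng, value = (part.strip() for part in line.split(";", 1))
        first, _, last = rng.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        overrides.append((lo, hi, value))
    return overrides


def lookup_override(cp, overrides):
    """Return the overriding class for cp, or None if no line covers it."""
    for lo, hi, value in overrides:
        if lo <= cp <= hi:
            return value
    return None
```

A caller would consult `lookup_override` first and fall back to the 
unmodified UnicodeData.txt values when it returns None, which keeps the 
standard data file untouched.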


--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell


-Original Message- 
From: Richard Wordingham

Sent: Sunday, August 21, 2011 9:48
To: unicode@unicode.org
Subject: Re: RTL PUA?

On Sun, 21 Aug 2011 01:44:02 +
"Doug Ewell"  wrote:


The more I think of it, the more I like the idea of reassigning the
default BC of Plane 16 to 'R'. What would the arguments against this
be?



BC of 'AL'?



Would that really be a better default? I thought the main RTL needs
for the PUA would be for unencoded scripts, not for even more Arabic
letters. (How many more are there anyway?)


Not necessarily better, I'm just suggesting that both need to be
supported.  However, we need to look at use cases.

(1) Unencoded Arabic script letters with joining behaviour, for use with
any application.

(a) We need the character to have AL, R or ON for it to be included in
BiDi runs.  If we use ON we may need RLM when the character is at the
edge of a run, and even then, its behaviour may be no better than a
character with a BC of R.

(b) It may get left out of script runs.  There were problems on
Windows with the Tamil ligature k.SS not rendering, despite font
support, when the character U+0BB7 TAMIL LETTER SSA was new.  And
that's in a left-to right script with a character in the appropriate
block!

(2) Complete right-to-left script.  I'm presuming the difference
between AL and R is then a matter of what right-to-left script the
potential users chiefly also use.

(a) As a practical implementation, the distinction between AL and R
would matter if the script has modern use.  Otherwise, any of ON, AL
and R would do, though one might face the annoyance of having to start
chunks of text with RLM.  If a script with modern use should be encoded
using a BC of R, then I believe ON would also do as a stop-gap until
the script is encoded.

How fiendish is BiDi-sensitive transliteration?

(b) For experimentation, I believe the difference between AL, R and ON
would matter little, even though it would be irritating to have to
use RLM.

(c) Complex script support is patchy - one might be restricted to
applications that allow the font to provide full complex script support.

The big issue in all this, though, is (i) how to update the rendering
system with a new set of values for Unicode properties, including
script, and (ii) the scope of such an update.  (The distinction between
the PUA and the rest is that it makes sense for PUA properties to
change as freely as fonts.) This, incidentally, is analogous to locales
reflecting code page selections.  There is also, though less pressing,
the issue of tailoring collations.  (The worst issue is that there are
distinct canonically inequivalent characters of type Lo comparing equal
- I've seen it for Canadian Aboriginal Syllabics for Windows XP and for
Thai in Ubuntu 10.04 - surely that's not the normal British collation
of such characters.)

One minor problem with (i) *was* that it wasn't clear how one should
annotate a copy of UnicodeData.txt to show that it has been modified.
The standard XML alternative allows for comments, thereby
solving that problem.

If Issue (i) can be readily solved at the machine or user level or
lower, then the default properties of the PUA become irrelevant.

Richard.




Re: C1 Control Pictures Proposal

2011-08-21 Thread Sean Leonard
Hi Ken et al.,

On Aug 17, 2011, at 2:49 PM, Ken Whistler wrote:

> 
> Further comments:
> 
> On 8/13/2011 10:48 AM, Sean Leonard wrote:
>> In accordance with this and other text in the Standard, it is not really 
>> possible to assign glyphs uniformly and interchangeably to the code points 
>> in U+0000-U+001F and U+0080-U+009F.
> 
> Of course it is. The Unicode Standard has done so for years: they are called
> code chart display glyphs. What one cannot expect is that plain text renderers
> will display control characters as visible glyphs in a uniform fashion -- they
> aren't supposed to, because the control codes aren't graphic characters. That
> is, rather, what "show hidden" modes are all about, and there really aren't any
> constraints on the details of exactly how a show hidden implementation may
> choose to display the undisplayable, as it were.

Can you please explain where in the Unicode standard you are referring to? Is 
there a "show hidden" mode or code point sequence in the Unicode Standard? If 
you are referring to "code chart display glyphs" meaning the glyphs in the 
literal document for U+0080, that is beside the point. If you are referring to 
a "show hidden code points" mode in an editor (such as a terminal emulator, 
Emacs, Notepad++, or another editor), I understand what you are getting at, but 
that is exactly what is unhelpful. As you point out, "there really aren't any 
constraints on the details of exactly how
a show hidden implementation may choose to display the undisplayable"--and that 
is exactly the problem. One advantage of my proposal is that fonts that provide 
glyphs for these code points can have glyphs that are visually similar (e.g., 
in monospace dimensions yet remain readable) between that code point and other 
graphic characters. For those who say "oh, just have an editor show [HOP] or 
whatever", that is exactly the problem: the editor cannot show [HOP] in a 
uniform way along with the rest of the glyphs that represent U+0000 - U+007F 
and U+00A0-U+00FF [modulo U+00A0 and U+00AD]. How ironic is it that fonts can 
encode the characters U+0000-U+001F (and space and delete) uniformly for 
display, yet can do no such similar thing for the other half of these 
characters?

This is definitely not a confusion between glyphs and characters. This is about 
having character code points for a uniform representation of these characters 
as-displayed in interchange, so that two systems (e.g., an application and the 
graphics rendering subsystem of the operating system, or the graphics rendering 
subsystem of an operating system and the font software that the OS uses) can 
interchange data unambiguously.

The Unicode Standard does not dictate the precise glyphs; it only shows 
representative glyphs. A font designer could choose among alternative glyphs 
for the graphic character code point. For example, for U+001B -> U+241B ESCAPE, 
the font designer could choose ESC (scrunched horizontally), ESC (diagonally), 
^[ (scrunched horizontally--^[ is a common legacy rendering of ESC) or ESC with 
a box around it. But because the user has chosen that particular font in that 
particular editor or rendering session, the user would be guaranteed that ESC 
-> ^[ (scrunched) would be visually similar to ^\ (file separator, scrunched), 
which would be visually similar to the C1s and to the graphic characters. No 
such guarantee can currently be made without C1 Control Pictures.
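
A minimal sketch of the existing mapping for the lower half, assuming Python's 
standard unicodedata module; the function name control_picture is hypothetical 
and not part of any standard API. It maps U+0000-U+001F, space and delete onto 
the Control Pictures block and shows that a C1 value has nowhere to go:

```python
import unicodedata


def control_picture(cp: int) -> str:
    """Return the Control Pictures character for a C0 control, SPACE or DELETE.

    Hypothetical helper for illustration only. C1 controls (U+0080-U+009F)
    have no counterpart in the Control Pictures block, so they raise.
    """
    if 0x00 <= cp <= 0x1F:
        # U+2400-U+241F parallel U+0000-U+001F one to one.
        return chr(0x2400 + cp)
    if cp == 0x20:
        return "\u2420"  # SYMBOL FOR SPACE
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    raise ValueError(f"no control picture defined for U+{cp:04X}")


if __name__ == "__main__":
    for cp in (0x1B, 0x1C, 0x7F):
        pic = control_picture(cp)
        print(f"U+{cp:04X} -> U+{ord(pic):04X} {unicodedata.name(pic)}")
    try:
        control_picture(0x81)  # HOP, a C1 control
    except ValueError as err:
        print(err)
```

Which glyph a font draws for U+241B is still up to the font designer; the 
sketch only covers the code point mapping.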

> 
>>  Variation selectors (sec. 16.4), for example, "provide a mechanism for 
>> specifying a restriction on the set of glyphs that are used to represent a 
>> particular character [examples given of CJK ideographs and Mongolian 
>> letters]." Variation selectors and other Unicode-defined control code points 
>> are ill-suited to causing C1 values to be displayed, because C1 values have 
>> no "display representation" in and of themselves.
> 
> That whole discussion of variation selectors is beside the point. Variation 
> sequences can
> only be defined for *base* characters. Base characters are a subset of graphic
> characters (see D51 in Chapter 3 of the Unicode Standard). Control characters
> aren't graphic characters. Hence they are not base characters, either, and 
> could
> never be used in variation sequences, anyway.

Correct. As per above, C1 control characters lack graphical variations. Let's 
give them graphics. To display is to know.

-Sean

> 
> --Ken
> 
> 





RE: RTL PUA?

2011-08-21 Thread Peter Constable
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Philippe Verdy

> Hmmm Given the current standard in OpenType, and the fact that 
> OpenType fonts cannot reorder glyphs to support the BiDi algorithm 
> and correctly handle features like ligatures...

I agree that OpenType font tables cannot do glyph re-ordering. But it is totally 
incorrect to say that they cannot handle ligatures.


> What this means is that, in practice, PUA are only usable in fonts for 
> characters with strong LTR directionality, excluding all reordering and 
> mirroring. 

In the OpenType specification, the only data related to glyph mirroring that a 
rendering engine is assumed to have is the bidi mirroring data from TUS 5.1. 
(See http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) All other 
glyph mirroring is to be handled using glyph substitution data in OpenType 
Layout tables in fonts.
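
A rough way to see which characters carry the Bidi_Mirrored property, assuming 
Python's standard unicodedata module (the helper name is hypothetical). Note 
that this exposes only the yes/no property, not the character-to-character 
pairs in the bidi mirroring data file, and that private use code points are 
not mirrored by default:

```python
import unicodedata


def is_bidi_mirrored(ch: str) -> bool:
    """True if the character has Bidi_Mirrored=Yes in Python's Unicode data."""
    return unicodedata.mirrored(ch) == 1


if __name__ == "__main__":
    samples = {
        "(": "LEFT PARENTHESIS",
        "\u2208": "ELEMENT OF",
        "a": "LATIN SMALL LETTER A",
        "\ue000": "first BMP private use code point",
    }
    for ch, label in samples.items():
        print(f"U+{ord(ch):04X} {label}: mirrored={is_bidi_mirrored(ch)}")
```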




Peter




Re: RTL PUA?

2011-08-21 Thread Richard Wordingham
On Sun, 21 Aug 2011 01:44:02 +
"Doug Ewell"  wrote:

>> The more I think of it, the more I like the idea of reassigning the
>> default BC of Plane 16 to 'R'. What would the arguments against this
>> be?
 
>> BC of 'AL'?

> Would that really be a better default? I thought the main RTL needs
> for the PUA would be for unencoded scripts, not for even more Arabic
> letters. (How many more are there anyway?)

Not necessarily better, I'm just suggesting that both need to be
supported.  However, we need to look at use cases.

(1) Unencoded Arabic script letters with joining behaviour, for use with
any application.

(a) We need the character to have AL, R or ON for it to be included in
BiDi runs.  If we use ON we may need RLM when the character is at the
edge of a run, and even then, its behaviour may be no better than a
character with a BC of R.

(b) It may get left out of script runs.  There were problems on
Windows with the Tamil ligature k.SS not rendering, despite font
support, when the character U+0BB7 TAMIL LETTER SSA was new.  And
that's in a left-to-right script with a character in the appropriate
block!

(2) Complete right-to-left script.  I'm presuming the difference
between AL and R is then a matter of what right-to-left script the
potential users chiefly also use.

(a) As a practical implementation, the distinction between AL and R
would matter if the script has modern use.  Otherwise, any of ON, AL
and R would do, though one might face the annoyance of having to start
chunks of text with RLM.  If a script with modern use should be encoded
using a BC of R, then I believe ON would also do as a stop-gap until
the script is encoded.

How fiendish is BiDi-sensitive transliteration?

(b) For experimentation, I believe the difference between AL, R and ON
would matter little, even though it would be irritating to have to
use RLM.

(c) Complex script support is patchy - one might be restricted to
applications that allow the font to provide full complex script support.

The big issue in all this, though, is (i) how to update the rendering
system with a new set of values for Unicode properties, including
script, and (ii) the scope of such an update.  (The distinction between
the PUA and the rest is that it makes sense for PUA properties to
change as freely as fonts.) This, incidentally, is analogous to locales
reflecting code page selections.  There is also, though less pressing,
the issue of tailoring collations.  (The worst issue is distinct,
canonically inequivalent characters of type Lo comparing equal
- I've seen it for Canadian Aboriginal Syllabics for Windows XP and for
Thai in Ubuntu 10.04 - surely that's not the normal British collation
of such characters.)

One minor problem with (i) *was* that it wasn't clear how one should
annotate a copy of UnicodeData.txt to show that it has been modified.
The standard XML alternative allows for comments, thereby
solving that problem.

If Issue (i) can be readily solved at the machine or user level or
lower, then the default properties of the PUA become irrelevant.
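
For reference, the default values under discussion can be inspected with
Python's standard unicodedata module. This is only a sketch (the helper
name bidi_class is hypothetical), and it reports whatever Unicode version
the local Python build ships with:

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Bidi_Class (BC) short name that Python reports for ch."""
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    samples = [
        ("A", "Latin letter"),
        ("\u05d0", "HEBREW LETTER ALEF"),
        ("\u0627", "ARABIC LETTER ALEF"),
        ("\u200f", "RIGHT-TO-LEFT MARK"),
        ("1", "European digit"),
        ("\u0661", "Arabic-Indic digit"),
        ("\ue000", "BMP private use"),
        ("\U0010fffd", "Plane 16 private use"),
    ]
    for ch, label in samples:
        print(f"U+{ord(ch):04X} {label}: BC={bidi_class(ch)}")
```

The private use code points come out as 'L', which is why a right-to-left
PUA script needs either overridden properties or explicit marks such as RLM.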

Richard.



RE: RTL PUA?

2011-08-21 Thread Peter Constable
From: unicore-boun...@unicode.org [mailto:unicore-boun...@unicode.org] On 
Behalf Of Michael Everson


>> Yeah OK maybe simply base+diacritic stuff or even ligatures would be 
>> easy to do via simple substitution rules in tables, but how about glyph 
>> reordering?

> No problem unless you are using Uniscribe.

Which of these are you saying? 

- That mark positioning and simple substitution rules involving PUA characters 
are not a problem unless you're using Uniscribe

- That glyph re-ordering of PUA characters is not a problem unless you're using 
Uniscribe

(Unless we have a bug I haven't encountered, the first is incorrect. The second 
suggests that you've missed Sharma's point entirely.)


>> Indic scripts involving reordering and split-positioning vowel signs can't 
>> be handled by placing them in the PUA.

> There are other ways of handling such clusters. 

Oh? You must mean something like ignoring Unicode. If not, please clarify.



Peter




Re: RTL PUA?

2011-08-21 Thread Shriramana Sharma

On 08/21/2011 08:19 AM, Asmus Freytag wrote:

> The best default would be an explicit "PU" - undefined behavior in the
> absence of a private agreement.


Hm -- but really this would only serve to allay concerns like Michael's 
stemming from a presumption that the BC is "deeper" than other 
characters (which I should concede is not entirely false). But you can't 
define explicit undefined values for *all* properties (even those that 
you can change despite stability) can you?



> There are some properties where stability guarantees prevent adding a
> new value. In that case, the documentation should point out that the
> intended effect was to have a PU value, but for historical / stability
> reasons, the tables contain a different entry.


What are these properties? The standard says that the canonical 
decomposition will not be changed. Mark Davis said the GC can not be 
changed[*]. What else?


[* There is no need to *officially* change the GC of the PUA characters, 
but PUA-supporting implementations will certainly need to be able to 
handle letters, marks and numbers etc as if they were encoded 
characters, and Mark has expressed he is fine by that.]



> Suggesting a "structure" on the private use area, by suggesting
> different default properties, ipso facto makes the PUA less private.
> That should be a non-starter.


I entirely agree (obviously).

--
Shriramana Sharma



RE: RTL PUA?

2011-08-21 Thread Jonathan Rosenne
People do all kinds of fancy things. I guess old manuscripts contain many
ligatures, but I don't think this kind of joining should be required for the
RTL PUA.

Jony

> -Original Message-
> From: unicore-boun...@unicode.org [mailto:unicore-boun...@unicode.org] On
> Behalf Of Michael Everson
> Sent: Sunday, August 21, 2011 12:51 PM
> To: unicore UnicoRe Discussion; Unicode Discussion List
> Subject: Re: RTL PUA?
> 
> On 21 Aug 2011, at 10:43, Jonathan Rosenne wrote:
> 
> > Yes, this is why an RTL etc. PUA area is quite useful.
> >
> > BTW, I am not aware of joining in properly written cursive Hebrew.
> 
> I've seen a nice shin-lamed ligature in some styles.
> 
> Michael Everson * http://www.evertype.com/





Re: RTL PUA?

2011-08-21 Thread Michael Everson
On 21 Aug 2011, at 10:43, Jonathan Rosenne wrote:

> Yes, this is why an RTL etc. PUA area is quite useful.
> 
> BTW, I am not aware of joining in properly written cursive Hebrew.

I've seen a nice shin-lamed ligature in some styles. 

Michael Everson * http://www.evertype.com/




RE: RTL PUA?

2011-08-21 Thread Jonathan Rosenne
Yes, this is why an RTL etc. PUA area is quite useful.

BTW, I am not aware of joining in properly written cursive Hebrew.

Jony

> -Original Message-
> From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe
> Verdy
> Sent: Sunday, August 21, 2011 12:16 PM
> To: Jonathan Rosenne
> Cc: unicore UnicoRe Discussion; Unicode Discussion List
> Subject: Re: RTL PUA?
> 
> 2011/8/21 Jonathan Rosenne :
> > Several RTL scripts do not require shaping nor ligatures.
> 
> Yes but they still require BiDi reordering by the layout engine...
> Something that is not specified by OpenType fonts because it is
> implicit (and in fact mandatory) for the known RTL scripts assigned in
> non-PUA blocks.
> 
> I think you were meaning that those RTL scripts do not require the
> joining behavior for selection of contextual shapes (and after all
> this is how the basic unpointed Hebrew script behaves in its usual
> non-cursive styles : it still requires at least the implicit BiDi
> reordering by the layout engine, but not by specifications found in
> Hebrew fonts).
> 
> -- Philippe.




Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/21 Jonathan Rosenne :
> Several RTL scripts do not require shaping nor ligatures.

Yes but they still require BiDi reordering by the layout engine...
Something that is not specified by OpenType fonts because it is
implicit (and in fact mandatory) for the known RTL scripts assigned in
non-PUA blocks.

I think you were meaning that those RTL scripts do not require the
joining behavior for selection of contextual shapes (and after all
this is how the basic unpointed Hebrew script behaves in its usual
non-cursive styles : it still requires at least the implicit BiDi
reordering by the layout engine, but not by specifications found in
Hebrew fonts).

-- Philippe.



Re: ARABIC LETTER KORANIC YEH WITH HAMZA

2011-08-21 Thread Philippe Verdy
2011/8/20 mmarx :
> In the Qur'an the hamza is not always above:
> if there is kasra on the letter, the hamza
> is below too.
>
> Is there someone preparing a proposal for
> missing Quranic characters  or should one
> do it one by one?

I was told long ago that the normative placement of kasra below the
letter also required it to go below the shadda (above the letter)
when there was one, and that this suffered no exception, at least in
Koranic texts: the shadda effectively modifies the consonant, not the
vowel, and defines the new higher baseline of the consonant cluster,
under which the kasra is simply positioned.

So the case is similar here, going in the reverse direction for the
placement of hamza, relative to kasra that logically comes after the
hamza and that may be omitted if vowel precision is not needed.

Both exceptions are highly related to the logical order of binding for
those hamza and shadda diacritics.

The rule relative to the shadda is so strong that this is even one of
the very first thing you're taught in some didactic tutorials on how
to read Arabic.

An example here, in the first lesson (written for French learners,
with audio description, and videos showing how to manually draw the
glyphs, with just three simple consonants, the 3 basic short vowels,
the sukun as a mark for vowelless, and the shadda for gemination, and
the 3 basic long vowels represented using a mandatory matres
lectionis, after the optional short vowel on the previous consonant):

http://www.e-apprendrearabe.com/member/index.php?page=lecon1

Then the rule for the placement of kasra relative to shadda is
constantly used in all the 12 lessons, in a systematic list of
syllables, as well as in sample words explained completely, and in
longer lists of words left as an exercise to the reader (there are a
few minor errors in this web version, which you can correct easily
even if you're a beginner). That's one of the free initiations to
Arabic writing I've found on the web that is the easiest to understand
and memorize rapidly (in about 15-30 minutes per lesson, you can
hear/pronounce and read/write correctly almost all the Arabic script,
at least phonetically, even if you don't know the vocabulary and
grammar and you're a complete beginner to the script and language).

This last method looks to me much simpler than the one I had to
use years ago (with a lot of difficulty because I could not have the
pronunciation, reading and writing rules all at the same time)... And
it is definitely simpler to understand than the complex introduction
to the Arabic script given in the Unicode standard (which immediately
formulates the joining behavior of Arabic letters, in a way that looks
much more complicated than necessary, but still forgets some essential
things about the 5 basic diacritics).

Yes, the case of hamza is tricky; that's why the lessons above do not
go into its details (these are left to the advanced lessons after
this initiation, only available on paid subscription). For the same
reason, there's nothing in this initiation about the variants of other
consonant letters, turned into small diacritics used in Koranic texts
to annotate the correct reading or interpretation, where some basic
letters are not written in texts that are not fully pointed.

-- Philippe.




RE: RTL PUA?

2011-08-21 Thread Jonathan Rosenne
Several RTL scripts do not require shaping nor ligatures.

Jony

> -Original Message-
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
> Behalf Of Philippe Verdy
> Sent: Sunday, August 21, 2011 10:29 AM
> To: Michael Everson
> Cc: unicore UnicoRe Discussion; Unicode Discussion List
> Subject: Re: RTL PUA?
> 
> 2011/8/19 Michael Everson :
> > There is plenty of space. There would be no difficulty in assigning some
> > rows to a RTL PUA. Mucking about with the directionality of the existing
> > PUA would be extremely unwise.
> >
> >> Conceivably certain closed user-groups could be using closed-distribution
> >> rendering engines which would support bidi and glyph reordering or such
> >> for PUA codepoints.
> >
> > Not everyone is a programmer and can devise a rendering engine. But lots
> > of people can make fonts that could support a RTL conscript or some
> > private Arabic characters.
> 
> Hmmm Given the current standard in OpenType, and the fact that
> OpenType fonts cannot reorder glyphs to support the BiDi algorithm and
> correctly handle features like ligatures, I have serious doubt about
> the feasibility of an OpenType font capable of supporting an RTL
> conscript or some private Arabic characters, that will work with
> existing OpenType engines, simply because there's absolutely nothing
> to describe such properties.
> 
> This would be possible only if the engine can not only use the
> existing OpenType fonts, but also include some supplementary character
> properties tables for PUA assignments used in that font, or these
> custom properties can be integrated in extension tables added in the
> OpenType fonts, notably: directionality and mirroring, but also as
> well the combining classes, some decomposition mappings, and probably
> also fallback mapping. There would also be the need to represent a
> finite state machine needed to recognize grapheme cluster boundaries,
> at least, and list the feature names in which the substitution &
> positioning rules for recognized sequences of PUA characters (or their
> mapped glyphs).
> 
> What this means is that, in practice, PUA are only usable in fonts for
> characters with strong LTR directionality, excluding all reordering
> and mirroring. Those conscripts will then have to be represented in
> PUAs as if they were completely with strong LTR characters, like the
> sinograms. It's not impossible to do that, but you have to completely
> forget the logical encoding order and only use a strict visual order
> for these PUA-encoded conscripts, and even for unencoded rare Arabic
> letters/clusters for which you'd want to just use a PUA.
> 
> The alternative is to not use OpenType features, but use one of the
> alternatives: Apple's AAT or SIL's Graphite, which are less restricted
> than OpenType, or some newer font formats (in this case, you won't
> need any newer PUA ranges with strong RTL properties, you can just use
> the existing assignments).
> 
> -- Philippe.




Re: RTL PUA?

2011-08-21 Thread Michael Everson
On 21 Aug 2011, at 02:44, Doug Ewell wrote:

> Would that really be a better default? I thought the main RTL needs for the 
> PUA would be for unencoded scripts, not for even more Arabic letters.

Could easily be for work on new Arabic-script orthographies which use new 
letters. Or for similar scripts that treat numbers as Arabic does.

> (How many more are there anyway?)

No one knows. :-)

Michael Everson * http://www.evertype.com/





Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/20 Ken Whistler :
> There are 131,068 private use code points in the standard. That is all there
> ever  will be.

I also fully agree (with apologies to Michael Everson, who supports such
new RTL PUA assignments).
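
As a quick arithmetic check on the figure quoted above (a sketch only; the
names are hypothetical): 131,068 matches the two supplementary private use
planes, and the BMP Private Use Area contributes a further 6,400 code
points.

```python
# Private use ranges as (first, last) code points, inclusive.
PUA_RANGES = {
    "BMP Private Use Area": (0xE000, 0xF8FF),
    "Plane 15 (Supplementary PUA-A)": (0xF0000, 0xFFFFD),
    "Plane 16 (Supplementary PUA-B)": (0x100000, 0x10FFFD),
}


def range_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


sizes = {name: range_size(*bounds) for name, bounds in PUA_RANGES.items()}
supplementary_total = sum(
    size for name, size in sizes.items() if name.startswith("Plane")
)
grand_total = sum(sizes.values())

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name}: {size}")
    print("supplementary planes only:", supplementary_total)
    print("all private use code points:", grand_total)
```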

All that can be done is to fix the software, notably the font formats,
where you'll be able to define the necessary overrides for
directionality and mirroring mappings (for RTL conscripts), and other
reordering properties that may be needed to support Indic conscripts
(such as prepended letters).

Adding new RTL PUAs will in any case require modifying
renderers/layout engines to support them. These same engines can just as
well be modified to support the external character property tables needed
to override the existing PUAs, so that they can be rendered correctly.

Maybe it was the desire of the OpenType designers not to use any such
overrides, but this was only intended for normal non-PUA characters.
A revised OpenType specification could perfectly well integrate
some new extension table, and assert, as a font validation constraint,
that these custom properties stored in fonts are valid and usable ONLY
for PUA characters.



Re: RTL PUA?

2011-08-21 Thread Petr Tomasek
On Sun, Aug 21, 2011 at 12:21:28AM +, Doug Ewell wrote:
> The more I think of it, the more I like the idea of reassigning the default 
> BC of Plane 16 to 'R'. What would the arguments against this be?
> 

I found a font ("Asana Math") installed on my system that occupies 
U+10fddf..U+10fffd.

P.

-- 
Petr Tomasek 
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA






Re: RTL PUA?

2011-08-21 Thread Philippe Verdy
2011/8/19 Michael Everson :
> There is plenty of space. There would be no difficulty in assigning some rows 
> to a RTL PUA. Mucking about with the directionality of the existing PUA would 
> be extremely unwise.
>
>> Conceivably certain closed user-groups could be using closed-distribution 
>> rendering engines which would support bidi and glyph reordering or such for 
>> PUA codepoints.
>
> Not everyone is a programmer and can devise a rendering engine. But lots of 
> people can make fonts that could support a RTL conscript or some private 
> Arabic characters.

Hmmm... Given the current standard in OpenType, and the fact that
OpenType fonts cannot reorder glyphs to support the BiDi algorithm and
correctly handle features like ligatures, I have serious doubts about
the feasibility of an OpenType font capable of supporting an RTL
conscript or some private Arabic characters that will work with
existing OpenType engines, simply because there's absolutely nothing
to describe such properties.

This would be possible only if the engine could not only use the
existing OpenType fonts, but also include some supplementary character
property tables for the PUA assignments used in that font, or if these
custom properties could be integrated in extension tables added to the
OpenType fonts, notably: directionality and mirroring, but also
the combining classes, some decomposition mappings, and probably
also fallback mappings. There would also be the need to represent a
finite state machine to recognize grapheme cluster boundaries,
at least, and to list the feature names under which the substitution &
positioning rules apply to recognized sequences of PUA characters (or
their mapped glyphs).

What this means is that, in practice, PUA are only usable in fonts for
characters with strong LTR directionality, excluding all reordering
and mirroring. Those conscripts will then have to be represented in
PUAs as if they consisted entirely of strong LTR characters, like the
sinograms. It's not impossible to do that, but you have to completely
forget the logical encoding order and only use a strict visual order
for these PUA-encoded conscripts, and even for unencoded rare Arabic
letters/clusters for which you'd want to just use a PUA.
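
A minimal sketch of what such a visual-order workaround amounts to, assuming
a run made up purely of right-to-left PUA letters with no digits, combining
marks or embedded LTR text (the function name is hypothetical, and this is
in no way an implementation of the BiDi algorithm):

```python
def logical_to_visual_rtl(text: str) -> str:
    """Reverse each line of a purely right-to-left text.

    An engine that treats every character as strong LTR paints code
    points left to right, so storing each line reversed makes the first
    logical letter appear at the right-hand end of the line.
    """
    return "\n".join(line[::-1] for line in text.split("\n"))


if __name__ == "__main__":
    # Three hypothetical conscript letters assigned to U+E000..U+E002,
    # given in logical (reading) order.
    logical = "\ue000\ue001\ue002"
    visual = logical_to_visual_rtl(logical)
    print([f"U+{ord(c):04X}" for c in visual])
```

The stored text no longer matches reading order, so searching, sorting and
line breaking all operate on reversed data, and lines have to be broken
before the reversal is applied.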

The alternative is to not use OpenType features, but use one of the
alternatives: Apple's AAT or SIL's Graphite, which are less restricted
than OpenType, or some newer font formats (in this case, you won't
need any newer PUA ranges with strong RTL properties, you can just use
the existing assignments).

-- Philippe.