On 21 Apr 2008, at 11:55 am, Anuradha Ratnaweera wrote:
On Sun, Apr 20, 2008 at 11:08 PM, Norbert Preining
<[EMAIL PROTECTED]> wrote:
If you have the *smallest* doubts let me know...
Adding Harshula to the CC list.
If you are looking for doubtful areas in the patch, check the
following. The rest of the patch is *adding* Sinhala related
variables, functions and switch conditions. Even this change only makes
sure ZWJ is not discarded, which shouldn't be a problem for other
scripts that don't use it.
--- layout/LEFontInstance.cpp
+++ layout/LEFontInstance.cpp
@@ -75,7 +75,7 @@
         return 0xFFFF;
     }
-    if (mappedChar == 0x200C || mappedChar == 0x200D) {
+    if (mappedChar == 0x200C) {
         return 1;
     }
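
To make the effect of that one-line change concrete, here is a minimal standalone sketch of the zero-width check before and after the patch. It is not the ICU code: the function name filterZeroWidth, the `patched` flag and the kNotFiltered sentinel are all hypothetical, and the only things taken from the hunk are the two code points and the returned glyph ID 1.

```cpp
#include <cstdint>

constexpr std::uint32_t kZWNJ = 0x200C;  // ZERO WIDTH NON-JOINER
constexpr std::uint32_t kZWJ  = 0x200D;  // ZERO WIDTH JOINER

// Hypothetical sentinel: "not filtered, go on to the normal font lookup".
constexpr std::uint32_t kNotFiltered = 0xFFFFFFFFu;

// Hypothetical helper modelling only the `if` touched by the patch.
// Returns glyph ID 1 (as in the hunk) when the character is filtered,
// otherwise kNotFiltered.
std::uint32_t filterZeroWidth(std::uint32_t mappedChar, bool patched) {
    const bool filtered = patched
        ? (mappedChar == kZWNJ)                         // after the patch
        : (mappedChar == kZWNJ || mappedChar == kZWJ);  // before the patch
    return filtered ? 1u : kNotFiltered;
}
```

Under this model, ZWNJ is filtered both before and after the patch, while ZWJ is filtered only before it; every other character is unaffected either way.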
Anuradha
--
http://www.sayura.net/anuradha/
Yes, I realize the patch touches very little existing functionality,
as it is adding support for a new script rather than modifying an
existing one. One other part that might interact somehow would be the
change to the state table:
--- texlive-bin-2007.orig/build/source/libs/icu-xetex/layout/IndicReordering.cpp
+++ texlive-bin-2007/build/source/libs/icu-xetex/layout/IndicReordering.cpp
@@ -326,14 +346,15 @@
     { 1,  1,  1,  5,  8,  3,  2,  1,  5,  9,  5,  1,  1,  1}, //  0 - ground state
     {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  1 - exit state
     {-1,  6,  1, -1, -1, -1, -1, -1,  5,  9,  5,  5,  4, -1}, //  2 - consonant with nukta
-    {-1,  6,  1, -1, -1, -1, -1,  2,  5,  9,  5,  5,  4, -1}, //  3 - consonant
+    {-1,  6,  1, -1, -1, -1, -1,  2,  5,  9,  5,  5,  4, 11}, //  3 - consonant
     {-1, -1, -1, -1, -1,  3,  2, -1, -1, -1, -1, -1, -1,  7}, //  4 - consonant virama
     {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  5 - dependent vowels
     {-1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  6 - vowel mark
     {-1, -1, -1, -1, -1,  3,  2, -1, -1, -1, -1, -1, -1, -1}, //  7 - ZWJ, ZWNJ
     {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  4, -1}, //  8 - independent vowels that can take a virama
     {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, 10,  5, -1, -1}, //  9 - first part of split vowel
-    {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1,  5, -1, -1}  // 10 - second part of split vowel
+    {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1,  5, -1, -1}, // 10 - second part of split vowel
+    {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  7, -1}  // 11 - <ct> <zwj>
 };
This adds the new state 11, and also changes one of the transitions
in the existing state 3 (in order to use the new state). So it
presumably could affect the processing of certain sequences in any
Indic script. I'm not saying this is wrong, or even that it would
make any difference to the actual results for other scripts; it's
probably fine. I simply haven't studied it in order to understand
what is really happening here. I notice that it has similarities to
the newer version in ICU 3.8.1, but is not identical to that (3.8.1
has transitions from both states 2 and 3 to this new state, and also
inserts another new state related to vowels; it also extensively
revises the ground state row of the table). But to feel really
confident about all this, I'll need to understand the Indic shaping
engine better.
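
As a way of poking at the question, here is a rough standalone sketch that walks the table from the hunk above, with and without the change. It is a simplified model, not the ICU code: the real loop in IndicReordering.cpp may differ in detail, and the names StateTable, patchedTable, originalTable and syllableLength are made up for illustration. The column indices are assumptions read off the row comments (state 0 reaches "consonant" on column 5, state 3 reaches "consonant virama" on column 12, state 4 reaches "ZWJ, ZWNJ" on column 13), not ICU's own names.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed column indices, inferred from the row comments in the hunk.
constexpr int kConsonant = 5;
constexpr int kVirama    = 12;
constexpr int kZeroWidth = 13;

using Row = std::array<int, 14>;
using StateTable = std::vector<Row>;

// The patched table, values copied from the hunk.
StateTable patchedTable() {
    return {
        Row{ 1,  1,  1,  5,  8,  3,  2,  1,  5,  9,  5,  1,  1,  1}, //  0 ground
        Row{-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  1 exit
        Row{-1,  6,  1, -1, -1, -1, -1, -1,  5,  9,  5,  5,  4, -1}, //  2
        Row{-1,  6,  1, -1, -1, -1, -1,  2,  5,  9,  5,  5,  4, 11}, //  3 consonant
        Row{-1, -1, -1, -1, -1,  3,  2, -1, -1, -1, -1, -1, -1,  7}, //  4
        Row{-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  5
        Row{-1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  6
        Row{-1, -1, -1, -1, -1,  3,  2, -1, -1, -1, -1, -1, -1, -1}, //  7
        Row{-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  4, -1}, //  8
        Row{-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, 10,  5, -1, -1}, //  9
        Row{-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1,  5, -1, -1}, // 10
        Row{-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  7, -1}, // 11 <ct> <zwj>
    };
}

// The table before the patch: no state 11, and state 3 exits on column 13.
StateTable originalTable() {
    StateTable t = patchedTable();
    t[3][kZeroWidth] = -1;
    t.pop_back();
    return t;
}

// Follow transitions from the ground state until a negative entry is hit;
// return how many character classes were consumed as one cluster.
std::size_t syllableLength(const StateTable& table,
                           const std::vector<int>& classes) {
    int state = 0;
    std::size_t cursor = 0;
    while (cursor < classes.size()) {
        state = table[static_cast<std::size_t>(state)]
                     [static_cast<std::size_t>(classes[cursor])];
        if (state < 0) {
            break;
        }
        ++cursor;
    }
    return cursor;
}
```

Under these assumptions, the class sequence consonant, zero-width, virama, consonant is consumed as a single cluster of four by the patched table, whereas the unpatched table stops after the first consonant. The plain sequence consonant, virama, consonant comes out as three under both tables, so in this model the change only affects input where a zero-width character directly follows a bare consonant.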
Jonathan