Robert Gurol created BATIK-1074:
-----------------------------------
Summary: ArrayIndexOutOfBoundsException in ArabicTextHandler with Arabic diacritics
Key: BATIK-1074
URL: https://issues.apache.org/jira/browse/BATIK-1074
Project: Batik
Issue Type: Bug
Affects Versions: 1.7
Reporter: Robert Gurol
Priority: Minor
While trying out some Arabic characters, I got an ArrayIndexOutOfBoundsException in
ArabicTextHandler when the text contained Arabic diacritics.
Here's a fix that works for my input:
ArabicTextHandler.doubleCharRemappings is missing the array entries for 0x062A through 0x0630:
<pre>
...
null, // 0x0629
// those were missing!
null, // 0x062A
null, // 0x062B
null, // 0x062C
null, // 0x062D
null, // 0x062E
null, // 0x062F
null, // 0x0630
...
</pre>
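The following is a minimal, hypothetical sketch of why omitted entries show up as an ArrayIndexOutOfBoundsException. It assumes a table with one slot per code point, indexed by `ch - FIRST`. The class name, `FIRST`, `LAST` and the table contents are illustrative assumptions; Batik's actual layout of `doubleCharRemappings` may differ. If seven slots are left out of an array literal, the array becomes seven entries shorter than the range it is indexed over. Code points near the top of that range, such as the fatha (U+064E), then index past the end.

```java
/**
 * Hypothetical sketch of a per-code-point lookup table in the style of
 * ArabicTextHandler.doubleCharRemappings. FIRST, LAST and the table
 * contents are assumptions for illustration, not Batik's real values.
 */
public class RemapTableSketch {
    // Assumed first and last code points covered by the table.
    static final char FIRST = '\u0621';
    static final char LAST = '\u0652';

    // Builds a table that should hold LAST - FIRST + 1 slots, minus
    // 'missingEntries' slots to mimic entries omitted from the literal.
    static int[][] buildTable(int missingEntries) {
        int size = (LAST - FIRST + 1) - missingEntries;
        int[][] table = new int[size][];
        // Placeholder entry, illustrative only; other slots stay null.
        table['\u0622' - FIRST] = new int[] {0x0644, 0xFEF5};
        return table;
    }

    // Unchecked lookup: throws ArrayIndexOutOfBoundsException when the
    // table is shorter than the range it claims to cover.
    static int[] lookupUnchecked(int[][] table, char ch) {
        return table[ch - FIRST];
    }

    // Defensive lookup: characters outside the table get no remapping.
    static int[] lookupChecked(int[][] table, char ch) {
        int idx = ch - FIRST;
        if (idx < 0 || idx >= table.length) {
            return null;
        }
        return table[idx];
    }

    public static void main(String[] args) {
        int[][] complete = buildTable(0);  // correct length (50 slots)
        int[][] truncated = buildTable(7); // 0x062A..0x0630 left out
        char fatha = '\u064E';             // ARABIC FATHA, a diacritic

        System.out.println(lookupUnchecked(complete, fatha) == null);
        try {
            lookupUnchecked(truncated, fatha);
            System.out.println("no exception");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE for U+064E");
        }
        System.out.println(lookupChecked(truncated, fatha) == null);
    }
}
```

Restoring the missing entries, as the fix above does, keeps both the array length and the position of every later entry correct. The bounds check in `lookupChecked` only prevents the crash. It would not repair entries that have shifted into the wrong slot, so it is at most a secondary guard and not what the reporter proposed.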
Some strings from my test SVG (I copied those from Wikipedia):
...
<text ns0:align="left middle" xmlns:ns1="http://oryx-editor.org"
ns1:anchors="left" fill="#000000" xmlns:ns2="http://oryx-editor.org"
ns2:fittoelem="sid-c3179252-02f3-48bd-8363-31952f62def3textannotationrect"
font-size="14" xmlns:ns3="http://oryx-editor.org" ns3:fontSize="14"
id="sid-c3179252-02f3-48bd-8363-31952f62def3text" letter-spacing="-0.01px"
stroke="black" stroke-width="0pt" text-anchor="start"
xmlns:ns4="http://oryx-editor.org" ns4:textWidth="360.61" transform="rotate(0)"
x="4" y="93.184">
<tspan
dy="-30" x="4" y="93.184">The Arabic script has numerous
diacritics,<v:newlineChar/>
</tspan>
<tspan
dy="-16" x="4" y="93.184">including i'jam 〈إِعْجَام〉 (i‘jām,
consonant<v:newlineChar/>
</tspan>
<tspan
dy="-2" x="4" y="93.184">pointing), and tashkil 〈تَشْكِيل〉
(tashkīl,<v:newlineChar/>
</tspan>
<tspan
dy="12" x="4" y="93.184">supplementary diacritics). The latter include
the<v:newlineChar/>
</tspan>
<tspan
dy="26" x="4" y="93.184">ḥarakāt 〈حَرَكَات〉 (vowel marks;
singular:<v:newlineChar/>
</tspan>
<tspan
dy="40" x="4" y="93.184">ḥarakah 〈حَرَكَة〉).</tspan>
</text>
...
<text xmlns:ns0="http://oryx-editor.org" ns0:align="center middle"
fill="#000000" xmlns:ns1="http://oryx-editor.org"
ns1:fittoelem="sid-408ec19b-8a4b-43a4-8787-36de6d17dc68unvisibleBorder"
font-size="14" xmlns:ns2="http://oryx-editor.org" ns2:fontSize="14"
id="sid-408ec19b-8a4b-43a4-8787-36de6d17dc68text_name" letter-spacing="-0.01px"
stroke="black" stroke-width="0pt" text-anchor="middle"
xmlns:ns3="http://oryx-editor.org" ns3:textWidth="360.323"
transform="rotate(0)" x="180.161" y="374.994">
<tspan
dy="-296" x="180.161" y="374.994">The ḥarakāt, which literally means 'motions',
are<v:newlineChar/>
</tspan>
<tspan
dy="-282" x="180.161" y="374.994">the short vowel marks.<v:newlineChar/>
</tspan>
<tspan
dy="-268" x="180.161" y="374.994">* The fatḥah 〈فَتْحَة〉 is a small diagonal
line<v:newlineChar/>
</tspan>
<tspan
dy="-254" x="180.161" y="374.994">placed above a letter, and represents a short
/a/.<v:newlineChar/>
</tspan>
<tspan
dy="-240" x="180.161" y="374.994">The word fatḥah itself (فَتْحَة) means
opening,<v:newlineChar/>
</tspan>
<tspan
dy="-226" x="180.161" y="374.994">and refers to the opening of the mouth
when<v:newlineChar/>
</tspan>
<tspan
dy="-212" x="180.161" y="374.994">producing an /a/. Example with dāl
(henceforth,<v:newlineChar/>
</tspan>
<tspan
dy="-198" x="180.161" y="374.994">the base consonant in the following
examples):<v:newlineChar/>
</tspan>
<tspan
dy="-184" x="180.161" y="374.994">〈دَ〉 /da/.<v:newlineChar/>
</tspan>
<tspan
dy="-170" x="180.161" y="374.994">* A similar diagonal line below a letter is
called a<v:newlineChar/>
</tspan>
<tspan
dy="-156" x="180.161" y="374.994">kasrah 〈كَسْرَة〉 and designates a short
/i/.<v:newlineChar/>
</tspan>
<tspan
dy="-142" x="180.161" y="374.994">Example: 〈دِ〉 /di/.<v:newlineChar/>
</tspan>
<tspan
dy="-128" x="180.161" y="374.994">* The ḍammah 〈ضَمَّة〉 is a small
curl-like<v:newlineChar/>
</tspan>
<tspan
dy="-114" x="180.161" y="374.994">diacritic placed above a letter to represent
a short<v:newlineChar/>
</tspan>
<tspan
dy="-100" x="180.161" y="374.994">/u/. Example: 〈دُ〉 /du/.<v:newlineChar/>
</tspan>
<tspan
dy="-86" x="180.161" y="374.994">* The maddah 〈مَدَّة〉 is a tilde-like
diacritic<v:newlineChar/>
</tspan>
<tspan
dy="-72" x="180.161" y="374.994">which can appear only on top of an alif
and<v:newlineChar/>
</tspan>
<tspan
dy="-58" x="180.161" y="374.994">indicates a glottal stop /ʔ/ followed by a
long /aː/.<v:newlineChar/>
</tspan>
<tspan
dy="-44" x="180.161" y="374.994">Example: 〈قُرْآن〉 /qurˈʔaːn/.<v:newlineChar/>
</tspan>
<tspan
dy="-30" x="180.161" y="374.994">* The superscript (or dagger) alif
〈أَلِف<v:newlineChar/>
</tspan>
<tspan
dy="-16" x="180.161" y="374.994">خَنْجَرِيَّة〉 (alif khanjarīyah), is written
as<v:newlineChar/>
</tspan>
<tspan
dy="-2" x="180.161" y="374.994">short vertical stroke on top of a consonant.
It<v:newlineChar/>
</tspan>
<tspan
dy="12" x="180.161" y="374.994">indicates a long /aː/ sound where alif is
normally<v:newlineChar/>
</tspan>
<tspan
dy="26" x="180.161" y="374.994">not written, e.g. 〈هٰذَا〉 (hādhā) or
〈رَحْمٰن〉<v:newlineChar/>
</tspan>
<tspan
dy="40" x="180.161" y="374.994">(raḥmān).<v:newlineChar/>
</tspan>
<tspan
dy="54" x="180.161" y="374.994">* The waṣlah 〈وَصْلَة〉, alif waṣlah
〈أَلِف<v:newlineChar/>
</tspan>
<tspan
dy="68" x="180.161" y="374.994">وَصْلَة〉 or hamzat waṣl 〈هَمْزَة
وَصْل〉<v:newlineChar/>
</tspan>
<tspan
dy="82" x="180.161" y="374.994">looks like a small letter ṣād on top of an alif
〈ٱ〉<v:newlineChar/>
</tspan>
<tspan
dy="96" x="180.161" y="374.994">* Sukun Example: 〈دَدْ〉 dad.<v:newlineChar/>
</tspan>
<tspan
dy="110" x="180.161" y="374.994">* Tanwin The sign 〈ـً〉 is most
commonly<v:newlineChar/>
</tspan>
<tspan
dy="124" x="180.161" y="374.994">written in combination with 〈ـًا〉 (alif),
〈ةً〉<v:newlineChar/>
</tspan>
<tspan
dy="138" x="180.161" y="374.994">(tā’ marbūṭah) or stand-alone 〈ءً〉
(hamzah).<v:newlineChar/>
</tspan>
<tspan
dy="152" x="180.161" y="374.994">* Shaddah Example: 〈دّ〉 /dd/;
madrasah<v:newlineChar/>
</tspan>
<tspan
dy="166" x="180.161" y="374.994">〈مَدْرَسَة〉 ('school') vs.
mudarrisah<v:newlineChar/>
</tspan>
<tspan
dy="180" x="180.161" y="374.994">〈مُدَرِّسَة〉 ('teacher',
female).<v:newlineChar/>
</tspan>
<tspan
dy="194" x="180.161" y="374.994">* The ijam 〈إِعْجَام〉 (i‘jām) are the
pointing<v:newlineChar/>
</tspan>
<tspan
dy="208" x="180.161" y="374.994">diacritics that distinguish various consonants
that<v:newlineChar/>
</tspan>
<tspan
dy="222" x="180.161" y="374.994">have the same form (rasm), such as 〈ـبـ〉
/b/,<v:newlineChar/>
</tspan>
<tspan
dy="236" x="180.161" y="374.994">〈ـتـ〉 /t/, 〈ـثـ〉 /θ/, 〈ـنـ〉 /n/, and 〈ـيـ〉
/j/.<v:newlineChar/>
</tspan>
<tspan
dy="250" x="180.161" y="374.994">Typically ijam are not considered diacritics
but<v:newlineChar/>
</tspan>
<tspan
dy="264" x="180.161" y="374.994">part of the letter.<v:newlineChar/>
</tspan>
<tspan
dy="278" x="180.161" y="374.994">* Hamza (glottal stop
semi-consonant)<v:newlineChar/>
</tspan>
<tspan
dy="292" x="180.161" y="374.994">Main article: Hamza<v:newlineChar/>
</tspan>
<tspan
dy="306" x="180.161" y="374.994">ئ ؤ إ أ</tspan>
</text>
...
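When reproducing the problem, it can help to list which Arabic combining marks a test string actually contains. The sketch below uses hypothetical class and method names and relies only on standard `java.lang.Character` APIs. It collects the code points in the Arabic block whose Unicode general category is non-spacing mark (Mn). That category covers the harakat, shaddah, sukun and superscript alif that appear in the strings above. The sample word is tashkil, written with Unicode escapes so the source stays ASCII.

```java
import java.util.TreeSet;

public class ArabicMarkScanner {
    // Returns the distinct Arabic-block code points in 'text' that Unicode
    // classifies as non-spacing marks, in ascending order.
    static TreeSet<Integer> arabicMarks(String text) {
        TreeSet<Integer> marks = new TreeSet<>();
        text.codePoints()
            .filter(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC)
            .filter(cp -> Character.getType(cp) == Character.NON_SPACING_MARK)
            .forEach(marks::add);
        return marks;
    }

    public static void main(String[] args) {
        // "tashkil" from the test SVG: teh, fatha, sheen, sukun, kaf, kasra, yeh, lam.
        String sample = "\u062A\u064E\u0634\u0652\u0643\u0650\u064A\u0644";
        for (int cp : arabicMarks(sample)) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```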
--
This message was sent by Atlassian JIRA
(v6.2#6252)
