[ 
https://issues.apache.org/jira/browse/FOP-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158802#comment-17158802
 ] 

Kelly H Wilkerson edited comment on FOP-2918 at 7/16/20, 12:36 AM:
-------------------------------------------------------------------

So far, I've tracked it down to a word break happening between the two halves 
of the surrogate pair that encodes #x10826 (#xD802 and #xDC26). (Edit: I 
removed the ampersands so that the code points can actually be read.)
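
For reference, here is a minimal sketch of how that code point maps to the two UTF-16 code units mentioned above. It uses only standard JDK calls; the class name SurrogatePairDemo and its helper methods are made up for illustration and are not part of FOP.

```java
// Illustration only: SurrogatePairDemo is a hypothetical class, not FOP code.
final class SurrogatePairDemo {

    // Returns the UTF-16 code units for a code point (two for supplementary ones).
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Counts whole code points, so a surrogate pair counts once.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        char[] units = toUtf16(0x10826);
        // prints: high=D802 low=DC26
        System.out.printf("high=%04X low=%04X%n", (int) units[0], (int) units[1]);

        String text = new String(units);
        // The string holds two chars but only one code point.
        System.out.println(text.length() + " chars, " + countCodePoints(text) + " code point");
    }
}
```

Any per-char loop over such a string sees #xD802 and #xDC26 as two separate items, which is where a break between them can creep in.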


In org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(), 
there's a check for whether the current position is a word break opportunity, 
and the bidi class property check says a break is allowed between the two 
halves of the pair. This is the check that succeeds when evaluating the 
character #xDC26:

[https://github.com/apache/xmlgraphics-fop/blob/ccd9f65340c38010da0b96c2933abee9093fa984/fop-core/src/main/java/org/apache/fop/layoutmgr/inline/TextLayoutManager.java#L817]
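
To make the failure mode concrete, here is a rough sketch of the kind of guard that would reject a break position falling inside a surrogate pair. This is not the FOP code at the link above; BreakGuardDemo, splitsSurrogatePair and isBreakAllowed are hypothetical names, and the classChanged flag merely stands in for whatever the real bidi class comparison computes.

```java
// Illustration only: hypothetical helper, not taken from TextLayoutManager.
final class BreakGuardDemo {

    // True if a break before text[index] would separate a high surrogate
    // from the low surrogate that follows it.
    static boolean splitsSurrogatePair(CharSequence text, int index) {
        return index > 0
                && index < text.length()
                && Character.isHighSurrogate(text.charAt(index - 1))
                && Character.isLowSurrogate(text.charAt(index));
    }

    // classChanged is an assumed input: the result of some property
    // comparison between the previous and the current char.
    static boolean isBreakAllowed(CharSequence text, int index, boolean classChanged) {
        return classChanged && !splitsSurrogatePair(text, index);
    }

    public static void main(String[] args) {
        String text = "a\uD802\uDC26b";
        // Index 2 is the low surrogate #xDC26, so no break is allowed there.
        System.out.println(isBreakAllowed(text, 2, true)); // prints: false
        System.out.println(isBreakAllowed(text, 1, true)); // prints: true
    }
}
```

Whether the proper fix belongs in a guard like this or in the property data itself is a separate question; the sketch only shows the condition being hit.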

I think, though I'm not sure, that the cause is simply that more right-to-left 
blocks need to be added to 
codegen/unicode/java/org/apache/fop/complexscripts/bidi/GenerateBidiClass.
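
As a point of comparison, the JDK's own Unicode tables can be queried for the bidi class of the whole code point. The sketch below only consults java.lang.Character and says nothing about what FOP's generated BidiClass table contains; BidiClassDemo and isRightToLeft are hypothetical names.

```java
// Illustration only: queries the JDK's Unicode data, not FOP's generated table.
final class BidiClassDemo {

    // True if the JDK classifies the code point as strong right-to-left
    // (bidi class R or AL).
    static boolean isRightToLeft(int codePoint) {
        byte d = Character.getDirectionality(codePoint);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        // The Cypriot syllable at #x10826, looked up as a single code point.
        System.out.println(isRightToLeft(0x10826)); // prints: true
        System.out.println(isRightToLeft('A'));     // prints: false
    }
}
```

The lookup has to be done per code point; asking for the class of #xD802 or #xDC26 on its own does not give the class of #x10826.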


> Surrogate pairs not handled in U+10800-U+1083F
> ----------------------------------------------
>
>                 Key: FOP-2918
>                 URL: https://issues.apache.org/jira/browse/FOP-2918
>             Project: FOP
>          Issue Type: Bug
>          Components: renderer/pdf
>    Affects Versions: 2.4
>         Environment: Windows 10
>            Reporter: Jan Driesen
>            Priority: Major
>         Attachments: NotoSansCypriot-Regular.ttf, fop.xconf, input.fo
>
>
> FOP is not properly handling surrogate pairs for characters in the Unicode 
> block 'Cypriot Syllabary' when rendering PDF.
> It tries to resolve the individual surrogate entities. This results in errors 
> saying the glyphs cannot be found.
> The attached test shows a font that supports characters in this range, and an 
> FO file holding the surrogate characters to be rendered.
> Similar issues arise with the fonts "MPH 2B Damase" 
> ([https://fedoraproject.org/wiki/MPH_2B_Damase_fonts]) and "Segoe UI 
> Historic" 
> ([https://docs.microsoft.com/en-us/typography/font-list/segoe_ui_historic]), 
> but the error may differ. (I am unsure whether licensing allows me to add 
> these.)
> Some fonts (Damase & Noto) result in a "String index out of range". Other 
> fonts (Segoe) deliver an "ill-formed UTF-16 sequence, contains isolated high 
> surrogate at end of sequence" FOPException.
> We expected this to work thanks to FOP-1969 (fop 2.3).



