On Thu, 21 Sep 2023 16:21:05 GMT, Ichiroh Takiguchi <[email protected]> wrote:
>> The "character set of font" (font charset) table was created from the "Rich Text
>> Format Specification 1.9.1"
>> https://interoperability.blob.core.windows.net/files/Archive_References/[MSFT-RTF].pdf
>> It refers to wingdi.h
>> https://learn.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-textmetrica
>>
>> Test files and testcase are in bugid
>> [JDK-6928542](https://bugs.openjdk.org/browse/JDK-6928542)
>>
>> Additional change:
>> Special character `\line` should be `\n`
>>
>> Additional information:
>>
>> Add 2 hash tables
>> - fcharsetToCP: Predefined conversion table, `fcharset` with number control
>> word, from control word to Java charset name, `fcharset0` refers to
>> `windows-1252` Java charset name
>> - fcharsetTable: Conversion table for each RTF file, `f` control word with
>> number, from integer font numbers to Charset font charsets. In case of
>> `{\f0\fnil\fcharset0 Segoe UI;}`, `0` refers to Java Charset `windows-1252`
>>
>> When an RTF Character Set control word (like `\mac`) is used, an unmappable
>> character returns \u0000 and it is not written into the RTF text.
>> When the fcharset control word is used, an unmappable character returns \uFFFD
>> (the same as the replacement character of the decoder); \u0000 is used for DBCS
>> lead byte detection.
>> If an `f` or `par` control word is encountered while a lead byte remains in the
>> byte buffer for the decoder, that byte is treated as an invalid character and
>> \uFFFD is written into the RTF text.
>>
>> If the `f` control word is used without `fcharset`, the `translationTable` char
>> array is used.
>> If the `f` control word is used with `fcharset`, the predefined Java Charset name
>> is used (if missing, ISO8859_1 is used as fallback).
>>
>> **Note:** The following GitHub action failed:
>> linux-cross-compile / build (riscv64). I opened the following JBS issue.
>>> [JDK-8314624](https://bugs.openjdk.org/browse/JDK-8314624) GHA: RISC-V
>>> cross-build was failed
>
> Ichiroh Takiguchi has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   6928542: Chinese characters in RTF are not decoded

For me, the added regression test still fails with the fix on Windows 10. Is there anything more I need to do as a prerequisite?

Read data
=========
Gr\\u00fcezi - Switzerland 0
\\u0082\\u00b1\\u0082\\u00f1\\u0082\\u00c9\\u0082\\u00bf\\u0082\\u00cd - Japanese 128
\\u00be\\u00c8\\u00b3\\u00e7\\u00c7\\u00cf\\u00bc\\u00bc\\u00bf\\u00e4 - Korean 129
\\u00c4\\u00e3\\u00ba\\u00c3 - China 134
\\u00bbO\\u00c6W - Traditional Chinese - Taiwan 136
\\u00e3\\u00e5\\u00e9\\u00e1 \\u00f3\\u00ef\\u00f5 - Greek 161
A\\u00f0a\\u00e7 - Turkish (Tree) 162
\\u00fe - Vietnam currency 163
\\u00f9\\u00c8\\u00d1\\u00ec\\u00e5\\u00c9\\u00ed - Hebrew 177
\\u00e3\\u00d1\\u00cd\\u00c8\\u00c7 - Arabic 178
A\\u00e8i\\u00fb - Lithuanian (Thank you) 186
\\u00c7\\u00e4\\u00f0\\u00e0\\u00e2\\u00f1\\u00f2\\u00e2\\u00f3\\u00e9\\u00f2\\u00e5 - Russian 204
\\u00ca\\u00c7\\u00d1\\u00ca\\u00b4\\u00d5 - Thailand 222
cze\\uc48f - Polish 238

Expected data
=============
Gr\\u00fcezi - Switzerland 0
\\u3053\\u3093\\u306b\\u3061\\u306f - Japanese 128
\\uc548\\ub155\\ud558\\uc138\\uc694 - Korean 129
\\u4f60\\u597d - China 134
\\u81fa\\u7063 - Traditional Chinese - Taiwan 136
\\u03b3\\u03b5\\u03b9\\u03b1 \\u03c3\\u03bf\\u03c5 - Greek 161
A\\u011fa\\u00e7 - Turkish (Tree) 162
\\u20ab - Vietnam currency 163
\\u05e9\\u05b8\\u05c1\\u05dc\\u05d5\\u05b9\\u05dd - Hebrew 177
\\u0645\\u0631\\u062d\\u0628\\u0627 - Arabic 178
A\\u010di\\u016b - Lithuanian (Thank you) 186
\\u0417\\u0434\\u0440\\u0430\\u0432\\u0441\\u0442\\u0432\\u0443\\u0439\\u0442\\u0435 - Russian 204
\\u0e2a\\u0e27\\u0e31\\u0e2a\\u0e14\\u0e35 - Thailand 222
cze\\u015b\\u0107 - Polish 238

java.lang.RuntimeException: Test failed
        at RTFReadFontCharsetTest.main(RTFReadFontCharsetTest.java:114)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13553#issuecomment-1781050285
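
For reference, here is a small standalone sketch of the `fcharset` lookup with ISO8859_1 fallback described in the quoted PR text. It is not the code from the PR: the class name `FcharsetDecodeSketch`, the methods `charsetFor` and `decode`, and the four-entry table are assumptions made for illustration only, and `Shift_JIS` is merely one plausible Java charset for `fcharset128`. The Japanese bytes are taken from the "Read data" line above, on the assumption that each escaped character there corresponds to one undecoded byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class FcharsetDecodeSketch {
    // Hypothetical subset of an fcharset -> Java charset name table
    // (illustrative only; the table in the PR may differ).
    static final Map<Integer, String> FCHARSET_TO_CP = Map.of(
            0, "windows-1252",
            128, "Shift_JIS",
            161, "windows-1253",
            204, "windows-1251");

    // Look up the charset for an fcharset number; fall back to
    // ISO-8859-1 when the number is unknown or the charset is unsupported.
    static Charset charsetFor(int fcharset) {
        String name = FCHARSET_TO_CP.get(fcharset);
        if (name != null && Charset.isSupported(name)) {
            return Charset.forName(name);
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decode raw RTF text bytes with the charset selected by fcharset.
    static String decode(int fcharset, byte[] raw) {
        return new String(raw, charsetFor(fcharset));
    }

    public static void main(String[] args) {
        // Bytes behind the "Japanese 128" line of the read data.
        byte[] japanese = {
            (byte) 0x82, (byte) 0xB1, (byte) 0x82, (byte) 0xF1, (byte) 0x82,
            (byte) 0xC9, (byte) 0x82, (byte) 0xBF, (byte) 0x82, (byte) 0xCD};

        // With the font charset applied, the bytes match the expected data.
        System.out.println(
                decode(128, japanese).equals("\u3053\u3093\u306b\u3061\u306f"));

        // With the fallback charset, each byte becomes one Latin-1 character,
        // which has the same shape as the read data in the failing run.
        System.out.println(
                decode(-1, japanese).equals(
                        "\u0082\u00b1\u0082\u00f1\u0082\u00c9\u0082\u00bf\u0082\u00cd"));
    }
}
```

If this reading is right, the "Read data" column looks like text decoded byte-for-byte rather than through the per-font charset, which may mean the run did not pick up the patched classes. That is only a guess from the output shape, not something the log confirms.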
