[Bug 161514] Invalid unicode mappings in PDF output for combining diacritics (regression in 24.2)
https://bugs.documentfoundation.org/show_bug.cgi?id=161514

--- Comment #11 from Dieter ---
(In reply to David Huggins-Daines from comment #10)
> Interesting, what PDF viewer are you using?

I'm using Adobe Acrobat Reader.

--
You are receiving this mail because:
You are the assignee for the bug.

QA Administrators changed:

What       |Removed          |Added
-----------+-----------------+------
Whiteboard | QA:needsComment |

--- Comment #10 from David Huggins-Daines ---
(In reply to Dieter from comment #9)
> David, thank you for reporting the bug. Unfortunately I'm not an expert with
> unicode, so I can only follow your steps.

Thanks for checking this out. As mentioned below, you may be experiencing the problem anyway, but have a smarter PDF reader that is able to repair the broken Unicode map.

> I can't confirm the problem with
> Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
> Build ID: d2eab48f697a1e6097778158f623f11306ac7a3d

Still present for me with

Version: 24.8.0.0.beta1 (X86_64) / LibreOffice Community
Build ID: 318462181c709ed29c01eb3239b4d600d7b82ecc
CPU threads: 4; OS: macOS 13.6.7; UI render: Skia/Metal; VCL: osx
Locale: fr-CA (fr_CA.UTF-8); UI: en-US
Calc: threaded

> And when I open attachment 194661 (PDF with wrong unicode), copy the text and
> paste it into LO writer, it also gives the correct result.

Interesting, what PDF viewer are you using? Apple Preview and the GNOME document viewer insert a blank (or "tofu") character for the character <02>, which is missing from the Unicode mapping, giving:

x̌ ux̌ ux̌ ux̌
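One way to confirm the gap described here is to parse the bfchar entries of the ToUnicode CMap and list the shown codes that have no entry. The Python sketch below is a simplified illustration, assuming single-byte codes and a CMap that uses only bfchar sections (no bfrange). The CMap text is a hypothetical reconstruction from comments #5 and #7, not the contents of attachment 194661, and `parse_bfchar` / `unmapped_codes` are illustrative helpers, not part of any PDF library.

```python
import re

# Hypothetical ToUnicode CMap fragment modelled on comments #5 and #7:
# <01> has been coalesced to U+0078 U+030C and <02> has no entry at all.
BROKEN_CMAP = """
2 beginbfchar
<01> <0078030C>
<03> <0075>
endbfchar
"""

# Matches one "<src> <dst>" pair inside a bfchar section.
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Map each source code to its Unicode text (destinations are UTF-16BE hex)."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in BFCHAR_ENTRY.findall(section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def unmapped_codes(codes: bytes, mapping: dict[int, str]) -> list[int]:
    """Return the codes in a shown string that the CMap does not cover."""
    return [code for code in codes if code not in mapping]


if __name__ == "__main__":
    cmap = parse_bfchar(BROKEN_CMAP)
    span = bytes([0x01, 0x02, 0x03])  # hypothetical shown string <010203>
    print([f"<{c:02X}>" for c in unmapped_codes(span, cmap)])  # ['<02>']
```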

Dieter changed:

What      |Removed |Added
----------+--------+------------------------
CC        |        |dgp-m...@gmx.de
Component |Writer  |Printing and PDF export

--- Comment #9 from Dieter ---
David, thank you for reporting the bug. Unfortunately I'm not an expert with Unicode, so I can only follow your steps. I can't confirm the problem with

Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: d2eab48f697a1e6097778158f623f11306ac7a3d
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL threaded

And when I open attachment 194661 (PDF with wrong Unicode), copy the text and paste it into LO Writer, it also gives the correct result.

QA Administrators changed:

What       |Removed |Added
-----------+--------+-----------------
Whiteboard |        | QA:needsComment

--- Comment #7 from David Huggins-Daines ---
Ah, okay. In actual fact the "coalescing" should just not be done, because the font embedded in the PDF still contains the three separate characters <01> (=U+0078), <02> (=U+030C) and <03> (=U+0075) for display. The <02> character is not there by mistake; it is the actual character in the font.

I suggest finding whatever change caused entries in the ToUnicode CMap to be clustered in this way and simply reverting it, because there is no way the extracted text can ever be valid other than by using the /ActualText tag, which none of the PDF viewers I've tried this on actually look at.
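The one-to-one mapping argued for here (one ToUnicode entry per glyph code, with <02> kept as U+030C) can be generated mechanically. The Python sketch below illustrates the idea under stated assumptions and is not LibreOffice's code: it assumes single-byte glyph codes, takes a hypothetical `glyph_text` dict from glyph code to Unicode text, and writes destinations as UTF-16BE hex, as the bfchar syntax requires. It also splits the output into blocks of at most 100 entries, the per-block limit in the CMap format.

```python
def utf16be_hex(text: str) -> str:
    """Encode Unicode text as the UTF-16BE hex used for CMap destinations."""
    return text.encode("utf-16-be").hex().upper()


def bfchar_sections(glyph_text: dict[int, str], chunk: int = 100) -> str:
    """Emit bfchar blocks with exactly one entry per single-byte glyph code.

    No code is merged into a neighbour, so every code that appears in the
    content stream (including a combining mark) keeps its own mapping.
    """
    items = sorted(glyph_text.items())
    out = []
    for start in range(0, len(items), chunk):
        block = items[start:start + chunk]
        out.append(f"{len(block)} beginbfchar")
        for code, text in block:
            out.append(f"<{code:02X}> <{utf16be_hex(text)}>")
        out.append("endbfchar")
    return "\n".join(out)


if __name__ == "__main__":
    # Glyph codes from comment #7: x, combining caron and u, kept separate.
    print(bfchar_sections({0x01: "x", 0x02: "\u030c", 0x03: "u"}))
```

Run as a script, this prints a single block with the entries `<01> <0078>`, `<02> <030C>` and `<03> <0075>`, so a viewer finds a mapping for every code it encounters.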

--- Comment #6 from David Huggins-Daines ---
Right, 243 and -242 are positioning adjustments within a TJ array, as in section 9.4.3 of PDF 1.7:

  array TJ
  Show one or more text strings, allowing individual glyph positioning. Each element of array shall be either a string or a number. If the element is a string, this operator shall show the string. If it is a number, the operator shall adjust the text position by that amount; that is, it shall translate the text matrix, Tm. The number shall be expressed in thousandths of a unit of text space (see 9.4.4, "Text Space Details"). This amount shall be subtracted from the current horizontal or vertical coordinate, depending on the writing mode. In the default coordinate system, a positive adjustment has the effect of moving the next glyph painted either to the left or down by the given amount. Figure 46 shows an example of the effect of passing offsets to TJ.

  [ (A) 120 (W) 120 (A) 95 (Y again) ] TJ

Anyway, the problem seems pretty straightforward to fix, but I have no knowledge of the relevant code, so perhaps that's optimistic :)
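The TJ semantics quoted above can be sketched in Python to show that the numbers only move the pen, while the string elements alone determine which codes are shown (and therefore which ToUnicode lookups a text extractor performs). This is a simplified horizontal-writing model of the 9.4.4 displacement rule with character spacing, word spacing and horizontal scaling at their defaults (Tc = 0, Tw = 0, Th = 100%). The glyph widths and the array are hypothetical, loosely modelled on the 243/-242 values discussed in this bug, not copied from the attached PDFs.

```python
from typing import Union

# A TJ array element: a string of glyph codes or a numeric adjustment.
TJElement = Union[bytes, float]


def walk_tj(
    array: list[TJElement], widths: dict[int, float], font_size: float
) -> tuple[list[float], bytes]:
    """Return (x position of each glyph, shown codes) for a TJ array.

    Each glyph advances the pen by width/1000 * font_size; each number n
    moves it by -n/1000 * font_size, so a positive n pulls the next glyph left.
    """
    x = 0.0
    positions = []
    shown = bytearray()
    for element in array:
        if isinstance(element, bytes):
            for code in element:
                positions.append(x)
                shown.append(code)
                x += widths[code] / 1000 * font_size
        else:
            x -= element / 1000 * font_size
    return positions, bytes(shown)


if __name__ == "__main__":
    # Hypothetical array and widths (glyph space units, 1000 = 1 em).
    tj = [b"\x01", 243, b"\x02", -242, b"\x03"]
    positions, shown = walk_tj(tj, {1: 500, 2: 250, 3: 500}, font_size=10)
    print([round(p, 2) for p in positions])  # [0.0, 2.57, 7.49]
    print(shown.hex())                       # 010203
```

The adjustments shift where the glyphs land, but the shown codes are still <01><02><03>, which is consistent with comment #4: the numbers are not what breaks text extraction.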

--- Comment #5 from David Huggins-Daines ---
And upon looking at this again, it seems pretty clear what's happening: the text span has been created as <01><02>, with one-to-one mappings to code points:

<01> = U+0078
<02> = U+030C

Then later on some other code has coalesced these into a single mapping (since it is a single grapheme):

<01> = U+0078 U+030C

But the text span itself is not actually getting updated to reflect this, and that's why the <02> is still there.
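The effect described above can be reproduced with a naive per-code lookup, which is essentially what a text extractor does with the ToUnicode map. In this Python sketch both maps are hypothetical reconstructions from the comment, and U+FFFD is an assumed stand-in for whatever a viewer substitutes for an unmapped code (per comment #10, Apple Preview and the GNOME viewer show a blank or tofu, and David suggests some readers may repair the map instead).

```python
# Hypothetical maps reconstructed from comment #5: code -> Unicode text.
ONE_TO_ONE = {0x01: "x", 0x02: "\u030c"}
COALESCED = {0x01: "x\u030c"}  # <02> no longer has an entry

# Assumed placeholder for an unmapped code; real viewers differ.
UNMAPPED = "\ufffd"


def extract(codes: bytes, to_unicode: dict[int, str]) -> str:
    """Look up each shown code independently in the ToUnicode map."""
    return "".join(to_unicode.get(code, UNMAPPED) for code in codes)


span = b"\x01\x02"  # the span is still <01><02> after coalescing
print(ascii(extract(span, ONE_TO_ONE)))  # 'x\u030c'
print(ascii(extract(span, COALESCED)))   # 'x\u030c\ufffd'
```

With the one-to-one map the two codes rebuild x followed by a combining caron; with the coalesced map the same text appears from <01> alone, and <02> then produces an extra, unmappable character.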

--- Comment #4 from David Huggins-Daines ---
Just an extra comment (I am always rather lost in PDF internals): the 242/243 values are not of any consequence here, since they also appear in the 7.4 output. The problem is the <02> that doesn't map to anything.

--- Comment #3 from David Huggins-Daines ---
Created attachment 194663
  --> https://bugs.documentfoundation.org/attachment.cgi?id=194663=edit
Document used to create the two PDFs

--- Comment #2 from David Huggins-Daines ---
Created attachment 194662
  --> https://bugs.documentfoundation.org/attachment.cgi?id=194662=edit
PDF from 7.4 with correct Unicode mapping

--- Comment #1 from David Huggins-Daines ---
Created attachment 194661
  --> https://bugs.documentfoundation.org/attachment.cgi?id=194661=edit
PDF from 24.2 with incorrect Unicode