[Bug 161514] Invalid unicode mappings in PDF output for combining diacritics (regression in 24.2)

2024-06-27 bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=161514

--- Comment #11 from Dieter ---
(In reply to David Huggins-Daines from comment #10)
> Interesting, what PDF viewer are you using?
I'm using Adobe Acrobat Reader

-- 
You are receiving this mail because:
You are the assignee for the bug.

2024-06-26 bugzilla-daemon

QA Administrators changed:

       What|Removed         |Added
 Whiteboard|QA:needsComment |

2024-06-26 bugzilla-daemon

--- Comment #10 from David Huggins-Daines ---
(In reply to Dieter from comment #9)
> David, thank you for reporting the bug. Unfortunately I'm not an expert with
> unicode, so I can only follow your steps.

Thanks for checking this out. As mentioned below, you may be experiencing the
problem anyway, but with a smarter PDF reader that is able to repair the
broken Unicode map.

> I can't confirm the problem with
> Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
> Build ID: d2eab48f697a1e6097778158f623f11306ac7a3d

Still present for me with
Version: 24.8.0.0.beta1 (X86_64) / LibreOffice Community
Build ID: 318462181c709ed29c01eb3239b4d600d7b82ecc
CPU threads: 4; OS: macOS 13.6.7; UI render: Skia/Metal; VCL: osx
Locale: fr-CA (fr_CA.UTF-8); UI: en-US
Calc: threaded

> And when I open attachment 194661 (PDF with wrong Unicode), copy the text and
> paste it into LO Writer, it also gives the correct result.

Interesting, what PDF viewer are you using? Apple Preview and the GNOME
document viewer insert a blank (or "tofu") character for the character <02>,
which is missing from the Unicode mapping, giving:

x̌ ux̌ ux̌ ux̌

2024-06-26 bugzilla-daemon

Dieter changed:

       What|Removed |Added
         CC|        |dgp-m...@gmx.de
  Component|Writer  |Printing and PDF export

--- Comment #9 from Dieter ---
David, thank you for reporting the bug. Unfortunately I'm not an expert with
unicode, so I can only follow your steps.

I can't confirm the problem with
Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: d2eab48f697a1e6097778158f623f11306ac7a3d
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL threaded

And when I open attachment 194661 (PDF with wrong Unicode), copy the text and
paste it into LO Writer, it also gives the correct result.

2024-06-25 bugzilla-daemon

QA Administrators changed:

       What|Removed |Added
 Whiteboard|        |QA:needsComment

2024-06-11 bugzilla-daemon

--- Comment #8 from David Huggins-Daines ---
Ah, okay. In actual fact the "coalescing" should just not be done, because the
font embedded in the PDF still contains the three separate characters
<01> (=U+0078), <02> (=U+030C) and <03> (=U+0075) for display. The <02>
character is not there by mistake; it is the actual character in the font.

I suggest finding whatever change caused entries in the ToUnicode CMap to be
clustered in this way and simply reverting it, because otherwise the extracted
text can never be valid except through the /ActualText tag, which none of the
PDF viewers I've tried actually look at.
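To make the one-to-one mapping concrete, here is a minimal Python sketch that writes the `bfchar` section of a ToUnicode CMap for the three codes listed above. The helpers `utf16be_hex` and `bfchar_block` are hypothetical and not LibreOffice code; the sketch assumes one-byte character codes and writes each destination as UTF-16BE hex, which is the form ToUnicode CMaps use. The second table reproduces the coalesced mapping described in comment #5, where <02> has no entry.

```python
from typing import Dict


def utf16be_hex(text: str) -> str:
    """Encode a Unicode string as uppercase UTF-16BE hex (CMap destination form)."""
    return text.encode("utf-16-be").hex().upper()


def bfchar_block(mapping: Dict[int, str]) -> str:
    """Hypothetical helper: build 'beginbfchar ... endbfchar', one line per code."""
    lines = [f"{len(mapping)} beginbfchar"]
    for code in sorted(mapping):
        lines.append(f"<{code:02X}> <{utf16be_hex(mapping[code])}>")
    lines.append("endbfchar")
    return "\n".join(lines)


# One entry per glyph actually drawn: x, combining caron, u.
correct = {0x01: "x", 0x02: "\u030C", 0x03: "u"}

# Coalesced table: <01> carries both code points and <02> has no entry.
coalesced = {0x01: "x\u030C", 0x03: "u"}

print(bfchar_block(correct))
print(bfchar_block(coalesced))
```

The first call prints `<01> <0078>`, `<02> <030C>` and `<03> <0075>` between `3 beginbfchar` and `endbfchar`. In the coalesced block, <01> maps to the two-code-point string `<0078030C>`, while <02> is still used for display but has no entry at all.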

2024-06-11 bugzilla-daemon

--- Comment #6 from David Huggins-Daines ---
Right, 243 and -242 are glyph positioning adjustments within a TJ array, as
described in section 9.4.3 of PDF 1.7:

array TJ    Show one or more text strings, allowing individual glyph
positioning. Each element of array shall be either a string or a number. If
the element is a string, this operator shall show the string. If it is a
number, the operator shall adjust the text position by that amount; that is,
it shall translate the text matrix, Tm. The number shall be expressed in
thousandths of a unit of text space (see 9.4.4, "Text Space Details"). This
amount shall be subtracted from the current horizontal or vertical
coordinate, depending on the writing mode. In the default coordinate system,
a positive adjustment has the effect of moving the next glyph painted either
to the left or down by the given amount. Figure 46 shows an example of the
effect of passing offsets to TJ.

[ (A) 120 (W) 120 (A) 95 (Y again) ] TJ
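As a rough illustration of how the numbers in a TJ array move the text position, the Python sketch below applies the horizontal form of the 9.4.4 displacement formula, tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, to the example array. It assumes horizontal writing mode, zero character and word spacing (Tc = Tw = 0), 100% horizontal scaling (Th = 1) and a hypothetical fixed glyph width of 500 glyph-space units; a real font would supply w0 from its /Widths array.

```python
from typing import List, Tuple, Union

# A TJ array element is either a string to show or a numeric adjustment.
TJElement = Union[str, float]


def tj_positions(
    array: List[TJElement], font_size: float, glyph_width: float = 500.0
) -> Tuple[List[Tuple[str, float]], float]:
    """Return (glyph, x) pairs in text space units, plus the final x position.

    Assumes horizontal writing mode, Tc = Tw = 0, Th = 1, and the same
    hypothetical width (in thousandths of text space) for every glyph.
    """
    x = 0.0
    placed: List[Tuple[str, float]] = []
    for element in array:
        if isinstance(element, str):
            for glyph in element:
                placed.append((glyph, x))
                # Advance by w0 * Tfs, with w0 = glyph_width / 1000.
                x += glyph_width / 1000 * font_size
        else:
            # A number is subtracted from the current position, so a
            # positive value moves the next glyph to the left.
            x -= element / 1000 * font_size
    return placed, x


placed, end_x = tj_positions(["A", 120, "W", 120, "A", 95, "Y again"], font_size=10)
for glyph, x in placed[:4]:
    print(f"{glyph!r} at x = {x:.2f}")
print(f"end of array at x = {end_x:.2f}")
```

With these assumptions the glyphs A, W, A and Y start at x = 0, 3.8, 7.6 and 11.65 text space units instead of 0, 5, 10 and 15: each positive number pulls the following glyph to the left, as the quoted text says.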

Anyway, the problem seems pretty straightforward to fix, but I have no
knowledge of the relevant code, so perhaps that's optimistic :)

2024-06-11 bugzilla-daemon

--- Comment #5 from David Huggins-Daines ---
And upon looking at this again, it seems pretty clear what's happening: the
text span has been created as <01><02>, with one-to-one mappings to code points
<01> = U+0078 and <02> = U+030C.

Then later on, some other code coalesced these into a single mapping,
<01> = U+0078 U+030C (since together they form a single grapheme).

But the text span itself is not actually getting updated to reflect this, and
that's why the <02> is still there.
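The mismatch described here can be reproduced with a few lines of Python. This is a simulation, not LibreOffice or viewer code: `extract_text` is a hypothetical stand-in for what a PDF reader does when copying text, and substituting U+FFFD for an unmapped code is an assumption (comment #10 reports that Preview and the GNOME viewer show a blank or tofu instead).

```python
# U+FFFD stands in for whatever a viewer shows for an unmapped code (assumption).
REPLACEMENT = "\uFFFD"


def extract_text(codes: bytes, to_unicode: dict) -> str:
    """Map each one-byte character code through a ToUnicode table."""
    return "".join(to_unicode.get(code, REPLACEMENT) for code in codes)


# Codes as they still appear in the content stream: x, combining caron, u.
shown = bytes([0x01, 0x02, 0x03])

# Correct one-to-one mapping versus the coalesced mapping described above.
one_to_one = {0x01: "x", 0x02: "\u030C", 0x03: "u"}
coalesced = {0x01: "x\u030C", 0x03: "u"}

print(ascii(extract_text(shown, one_to_one)))
print(ascii(extract_text(shown, coalesced)))

# The coalesced map would only be self-consistent if the span dropped 0x02.
print(ascii(extract_text(bytes([0x01, 0x03]), coalesced)))
```

Per comment #8, rewriting the span is not the right fix either, because glyph <02> is still needed to draw the caron; keeping the one-to-one entries avoids the problem entirely.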

2024-06-11 bugzilla-daemon

--- Comment #4 from David Huggins-Daines ---
Just an extra comment (I am always rather lost in PDF internals): the
242/243 values are of no consequence here, since they appear in the 7.4 output
as well. The problem is the <02> that doesn't map to anything.
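A check along these lines could flag this kind of problem automatically: collect every character code used in the shown strings and report those with no ToUnicode entry. The function is hypothetical; it takes the shown strings as already-extracted one-byte codes, since pulling them out of a real content stream (and handling multi-byte codes) is beyond the scope of this sketch.

```python
def unmapped_codes(shown_strings: list, to_unicode: dict) -> list:
    """Return, sorted, the one-byte codes used in shown text but absent from the map."""
    used = {code for s in shown_strings for code in s}
    return sorted(used - set(to_unicode))


# The span <01><02> followed by <03>, checked against both tables.
spans = [bytes([0x01, 0x02]), bytes([0x03])]
coalesced = {0x01: "x\u030C", 0x03: "u"}
one_to_one = {0x01: "x", 0x02: "\u030C", 0x03: "u"}

for name, table in (("coalesced", coalesced), ("one-to-one", one_to_one)):
    missing = unmapped_codes(spans, table)
    print(name, [f"<{code:02X}>" for code in missing] or "all codes mapped")
```

Against the coalesced table the check reports <02>; against the one-to-one table every code is mapped.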

2024-06-11 bugzilla-daemon

--- Comment #3 from David Huggins-Daines ---
Created attachment 194663
  --> https://bugs.documentfoundation.org/attachment.cgi?id=194663&action=edit
Document used to create the two PDFs

2024-06-11 bugzilla-daemon

--- Comment #2 from David Huggins-Daines ---
Created attachment 194662
  --> https://bugs.documentfoundation.org/attachment.cgi?id=194662&action=edit
PDF from 7.4 with correct Unicode mapping

2024-06-11 bugzilla-daemon

--- Comment #1 from David Huggins-Daines ---
Created attachment 194661
  --> https://bugs.documentfoundation.org/attachment.cgi?id=194661&action=edit
PDF from 24.2 with incorrect Unicode
