https://bugs.documentfoundation.org/show_bug.cgi?id=156079
--- Comment #5 from خالد حسني <kha...@libreoffice.org> ---
Copying from Adobe Reader, I get:

012345 6789

(no funny characters, but there is an extra space, which is not surprising, as many PDF readers will interpret a large gap between glyphs as a space even if the PDF does not have a space character there)

If I use pdftotext, I get:

0123456789

The number grouping is a “feature” of the Linux Libertine G font, but it is done in a very odd way that affects PDF export.

$ hb-shape LinBiolinum_R_G.ttf "0123456789" --no-positions
[zero=0|uni202F=1|one=1|two=2|three=3|uni202F=4|four=4|five=4|six=6|uni202F=7|seven=7|eight=7|nine=7]

(the text before the equals sign is the glyph name, and the number after it is the index of the input string character that the glyph corresponds to)

The font outputs zero fine, no funny business. Then it outputs the glyph for NNBSP (narrow no-break space, U+202F) followed by the glyph for one and gives both the same input string index, then two and three normally, then NNBSP, four and five, giving all three of them the same input string index, then six normally, then NNBSP, seven, eight and nine, giving all four of them the same input string index.

This funny business with the input string index leads us to group the output into the following mapping between glyphs and input characters:

zero => "0"
uni202F,one => "1"
two => "2"
three => "3"
uni202F,four,five => "45"
six => "6"
uni202F,seven,eight,nine => "789"

This mapping of multiple glyphs to multiple input characters is problematic for text extraction in PDF, since PDF can represent only a one-glyph-to-one-character or a one-glyph-to-multiple-characters mapping. To keep the text copy-able we have to resort to tagging the problematic glyph groups using /ActualText spans, and not all PDF viewers support this.

So this is a combination of an oddly built font and buggy PDF viewers. We are doing our best, and there is not much more we can do about this.
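
The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code LibreOffice actually uses: the function names parse_hb_shape and group_by_cluster are made up for this example, and it assumes left-to-right text whose input string indices never decrease, with ASCII input so that each index equals a character offset.

```python
from itertools import groupby


def parse_hb_shape(output):
    """Turn '[name=index|name=index|...]' into a list of (glyph_name, index) pairs."""
    pairs = []
    for item in output.strip().strip("[]").split("|"):
        # Split on the last '=' so the index is always the trailing number.
        name, _, index = item.rpartition("=")
        pairs.append((name, int(index)))
    return pairs


def group_by_cluster(pairs, text):
    """Group consecutive glyphs sharing an input index and pair each group
    with the input characters it covers (up to the next group's index).

    Assumption for this sketch: indices are non-decreasing (left-to-right text).
    """
    runs = [(index, [name for name, _ in run])
            for index, run in groupby(pairs, key=lambda p: p[1])]
    mapping = []
    for i, (start, names) in enumerate(runs):
        end = runs[i + 1][0] if i + 1 < len(runs) else len(text)
        mapping.append((names, text[start:end]))
    return mapping


if __name__ == "__main__":
    text = "0123456789"
    shaped = ("[zero=0|uni202F=1|one=1|two=2|three=3|uni202F=4|four=4|five=4|"
              "six=6|uni202F=7|seven=7|eight=7|nine=7]")
    for names, chars in group_by_cluster(parse_hb_shape(shaped), text):
        # A group with more than one glyph is neither one-glyph-to-one-character
        # nor one-glyph-to-multiple-characters, so it cannot be expressed directly.
        note = "" if len(names) == 1 else "  (multi-glyph group)"
        print(f'{",".join(names)} => "{chars}"{note}')
```

Run on the hb-shape output quoted above, this prints the same seven groups listed in the comment, and marks the three groups that start with uni202F as the ones that do not fit a per-glyph mapping.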