[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

postix changed:

What     |Removed |Added
See Also |        |https://bugs.kde.org/show_bug.cgi?id=401044

--
You are receiving this mail because:
You are the assignee for the bug.
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

Bug Janitor Service changed:

What     |Removed |Added
Priority |NOR     |HI
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

Albert Astals Cid changed:

What     |Removed |Added
Priority |HI      |NOR
CC       |        |aa...@kde.org
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

Bug Janitor Service changed:

What     |Removed |Added
Priority |NOR     |HI
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What |Removed |Added
CC   |        |zbwu1...@gmail.com

--- Comment #9 from David Hurka ---
*** Bug 459447 has been marked as a duplicate of this bug. ***
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What     |Removed |Added
See Also |        |https://bugs.kde.org/show_bug.cgi?id=445851
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #8 from David Hurka ---
Created attachment 140133
  --> https://bugs.kde.org/attachment.cgi?id=140133&action=edit
Diagonal watermark text breaks text entity reordering

I just got this link:
http://files.pine64.org/doc/datasheet/pine64/AXP803_Datasheet_V1.0.pdf

Text selection doesn’t work because of that “conf i dent i al” watermark.
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What     |Removed |Added
See Also |        |https://bugs.kde.org/show_bug.cgi?id=361538
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What     |Removed |Added
See Also |        |https://bugs.kde.org/show_bug.cgi?id=207748
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What |Removed |Added
CC   |        |ea...@cornell.edu

--- Comment #7 from David Hurka ---
*** Bug 181559 has been marked as a duplicate of this bug. ***
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

Postix changed:

What           |Removed  |Added
Ever confirmed |0        |1
CC             |         |pos...@posteo.eu
Status         |REPORTED |CONFIRMED
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What |Removed |Added
CC   |        |vap...@gentoo.org

--- Comment #6 from David Hurka ---
*** Bug 300400 has been marked as a duplicate of this bug. ***
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What |Removed |Added
CC   |        |martin.marmso...@gmail.com

--- Comment #5 from David Hurka ---
*** Bug 426171 has been marked as a duplicate of this bug. ***
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What |Removed |Added
CC   |        |yury.tarasiev...@gmail.com

--- Comment #4 from David Hurka ---
*** Bug 338563 has been marked as a duplicate of this bug. ***
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka changed:

What |Removed |Added
CC   |        |lan...@web.de

--- Comment #3 from David Hurka ---
*** Bug 318768 has been marked as a duplicate of this bug. ***
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #2 from David Hurka ---
Created attachment 120017
  --> https://bugs.kde.org/attachment.cgi?id=120017&action=edit
Diagonal text is not recognized as line

Looking into core/textpage.cpp tells me that the generators just output characters with their bounding rectangles. (This information becomes TinyTextEntity objects.) There seems to be no information about orientation.

There are some functions in core/textpage.cpp whose code I haven’t read yet:

removeSpace()
    Claims to remove spaces, to make output from different generators uniform.
makeWordFromCharacters()
    Claims to rearrange characters into words, using spaces to distinguish between adjacent words. (But spaces are removed?)
makeAndSortLines()
    Claims to look for adjacent words to make a line of them, and to sort the lines.
calculateStatisticalInformation()
    Claims to be able to distinguish between character spacing, word spacing, and column spacing. Needed for multi-column layouts.
XYCutForBoundingBoxes()
    Claims to apply the XY-cut algorithm, to separate... something
addNecessarySpace()
    Inserts the space that was probably removed by removeSpace(), so selecting text does not result in words that are squashed together.
TextPagePrivate::correctTextOrder()
    Calls the above, statically declared functions.

Unfortunately, these functions don’t seem to be designed for vertical text. Even slightly diagonal text causes problems, see screenshot. (Possible reasons: XY-cut can’t “see” diagonal text, makeAndSortLines() collects characters in a bad order.)

There are many commits on these functions, mainly done in 2011 by Albert Astals Cid and Mohammad Mahfuzur Rahman Mamun. The beginning was probably this commit?

> commit 2eb5f270fd4befb6a84ff2e9bdd921271930e046
> Author: Mohammad Mahfuzur Rahman Mamun
> Date: Mon Jun 27 19:58:24 2011 +0600
>
> three functions added in textpage
>
> [snip a lot]

Maybe these two people can give more information on how vertical text is supposed to be handled.
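The suspicion that makeAndSortLines() collects characters in a bad order can be illustrated with a small sketch. This is not Okular's code: `Box`, `layout()` and `group_into_lines()` are hypothetical names, and the grouping rule (a glyph joins a line if it overlaps that line's last glyph vertically) is only an assumption about how a line builder that ignores orientation might behave.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a character with its bounding rectangle."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def layout(word, x0, y0, dx, dy, size=10.0):
    """Place one box per character, stepping by (dx, dy) between characters."""
    return [
        Box(ch, x0 + i * dx, y0 + i * dy, x0 + i * dx + size, y0 + i * dy + size)
        for i, ch in enumerate(word)
    ]


def overlaps_vertically(a, b):
    """True if the two boxes share some vertical extent."""
    return a.top < b.bottom and b.top < a.bottom


def group_into_lines(boxes):
    """Assumed rule: a box joins the first line whose last box it overlaps vertically."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b.top, b.left)):
        for line in lines:
            if overlaps_vertically(line[-1], box):
                line.append(box)
                break
        else:
            lines.append([box])
    # Within a line, read left to right.
    return ["".join(b.text for b in sorted(line, key=lambda b: b.left)) for line in lines]


body = layout("ab", 0, 0, 10, 0) + layout("cd", 0, 20, 10, 0)
watermark = layout("XY", 25, 0, 10, 20)  # steep diagonal crossing both body lines

print(group_into_lines(body))              # ['ab', 'cd']
print(group_into_lines(body + watermark))  # ['abX', 'cdY']
```

Under this assumed rule the diagonal word never forms a line of its own: each of its letters is absorbed into whichever horizontal line happens to share its height.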
[okular] [Bug 407133] Copy text from rotated pdf gives rubbish
https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #1 from David Hurka ---
Created attachment 120007
  --> https://bugs.kde.org/attachment.cgi?id=120007&action=edit
Vertical texts are used for diagrams, but Okular can’t search for them

You can fix the clipboard content with the following command ;)

perl -e 'print reverse split //, <>;'

It seems the TextPage, which is used for search and text copying, is filled this way: while the Generator adds horizontal words as words, vertical words are split into letters. Okular then assumes that the uppermost letter is the first letter.

Letters or words are stored in TextEntity objects in the TextPage. The TextEntity stores the letter/word as a string, together with its bounding rectangle.

The problem is one of these two: (choose what you like more)

1. TextPage and TextEntity can’t store transformations, or even simple rotation. So, the generator splits vertical words into single letters. *1
2. The generator, which uses poppler to read the pdf, gets vertical words already split into letters.

*1) Possible reason: this way, one can (theoretically *2) use the Text Selection tool to select the word.
*2) Practically not, because Okular adds any other letter at the same height to the selection.

I have attached a screenshot which illustrates the practical relevance of this problem: In many datasheets (not only TI), vertical text is used to label the vertical axes of diagrams. Splitting these labels into single letters prevents searching for a specific diagram.
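A minimal sketch of why the copied text comes out reversed, assuming the behaviour described in Comment #1 (one entity per letter, uppermost letter first). `Glyph`, `vertical_label()` and `naive_reading_order()` are hypothetical names for illustration, not Okular or poppler API.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical stand-in for a text entity: one letter plus its bounding box."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def vertical_label(word, x=10.0, y_bottom=100.0, size=8.0):
    """Lay out `word` so that it reads bottom-to-top, like an axis label.

    The first letter sits lowest on the page, i.e. it has the largest `top`
    value (y grows downwards here).
    """
    glyphs = []
    for i, ch in enumerate(word):
        bottom = y_bottom - i * size
        glyphs.append(Glyph(ch, x, bottom - size, x + size, bottom))
    return glyphs


def naive_reading_order(glyphs):
    """Assumed behaviour: the uppermost letter is treated as the first letter."""
    return "".join(g.text for g in sorted(glyphs, key=lambda g: (g.top, g.left)))


copied = naive_reading_order(vertical_label("Voltage"))
print(copied)        # egatloV
print(copied[::-1])  # Voltage
```

Reversing the string, as the perl one-liner does for a single line, only undoes the ordering. It does not help search, which would still see the label as separate letters.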