[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2023-04-12 Thread postix
https://bugs.kde.org/show_bug.cgi?id=407133

postix  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=401044

-- 
You are receiving this mail because:
You are the assignee for the bug.

[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2022-09-21 Thread Bug Janitor Service
https://bugs.kde.org/show_bug.cgi?id=407133

Bug Janitor Service  changed:

   What|Removed |Added

   Priority|NOR |HI


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2022-09-21 Thread Albert Astals Cid
https://bugs.kde.org/show_bug.cgi?id=407133

Albert Astals Cid  changed:

   What|Removed |Added

   Priority|HI  |NOR
 CC||aa...@kde.org


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2022-09-21 Thread Bug Janitor Service
https://bugs.kde.org/show_bug.cgi?id=407133

Bug Janitor Service  changed:

   What|Removed |Added

   Priority|NOR |HI


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2022-09-21 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

 CC||zbwu1...@gmail.com

--- Comment #9 from David Hurka  ---
*** Bug 459447 has been marked as a duplicate of this bug. ***


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2021-11-21 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=445851


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2021-07-17 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #8 from David Hurka  ---
Created attachment 140133
  --> https://bugs.kde.org/attachment.cgi?id=140133&action=edit
Diagonal watermark text breaks text entity reordering

I just got this link:
http://files.pine64.org/doc/datasheet/pine64/AXP803_Datasheet_V1.0.pdf

Text selection doesn’t work because of that “conf i dent i al” watermark.


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2021-07-13 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=361538


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2021-06-22 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=207748


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2020-12-20 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

 CC||ea...@cornell.edu

--- Comment #7 from David Hurka  ---
*** Bug 181559 has been marked as a duplicate of this bug. ***


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2020-09-05 Thread Postix
https://bugs.kde.org/show_bug.cgi?id=407133

Postix  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||pos...@posteo.eu
 Status|REPORTED|CONFIRMED


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2020-09-05 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

 CC||vap...@gentoo.org

--- Comment #6 from David Hurka  ---
*** Bug 300400 has been marked as a duplicate of this bug. ***


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2020-09-05 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

 CC||martin.marmso...@gmail.com

--- Comment #5 from David Hurka  ---
*** Bug 426171 has been marked as a duplicate of this bug. ***


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2020-09-05 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

 CC||yury.tarasiev...@gmail.com

--- Comment #4 from David Hurka  ---
*** Bug 338563 has been marked as a duplicate of this bug. ***


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2020-09-05 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

David Hurka  changed:

   What|Removed |Added

 CC||lan...@web.de

--- Comment #3 from David Hurka  ---
*** Bug 318768 has been marked as a duplicate of this bug. ***


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2019-05-12 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #2 from David Hurka  ---
Created attachment 120017
  --> https://bugs.kde.org/attachment.cgi?id=120017&action=edit
Diagonal text is not recognized as line

Looking into core/textpage.cpp tells me that the generators just output
characters with their bounding rectangles. (This information becomes
TinyTextEntity objects.) There seems to be no information about orientation.
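
To make the missing piece concrete, here is a minimal Python sketch (not Okular
code; the Entity class and the estimate_angle() helper are hypothetical names
made up for this illustration). It assumes that characters arrive in reading
order and shows that a direction could, in principle, be estimated from the
bounding rectangles alone:

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a text entity: a string plus a bounding rectangle."""
    text: str
    left: float
    top: float
    right: float
    bottom: float

    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def estimate_angle(entities):
    """Angle in degrees of the vector from the first to the last character center.

    Page coordinates are assumed (y grows downwards), so 0 means
    left-to-right and -90 means bottom-to-top.
    """
    (x0, y0), (x1, y1) = entities[0].center(), entities[-1].center()
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


# "Vout" written left to right, and the same word written bottom to top.
horizontal = [Entity(c, 10 * i, 0, 10 * i + 8, 12) for i, c in enumerate("Vout")]
vertical = [Entity(c, 0, 100 - 10 * i, 12, 108 - 10 * i) for i, c in enumerate("Vout")]

print(estimate_angle(horizontal), estimate_angle(vertical))
```

Whether the generators really deliver characters in reading order is exactly
the open question here, so this only illustrates the idea.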

There are some functions in core/textpage.cpp whose code I haven’t read yet:


removeSpace()
Claims to remove space, to make output from different generators uniform.

makeWordFromCharacters()
Claims to rearrange characters into words, using spaces to distinguish between
adjacent words. (But spaces are removed?)

makeAndSortLines()
Claims to look for adjacent words to make a line of them, and to sort the
lines.

calculateStatisticalInformation()
Claims to be able to distinguish between character spacing, word spacing, and
column spacing. Needed for multi-column layouts.

XYCutForBoudingBoxes()
Claims to apply the XY-cut algorithm, to separate... something

addNecessarySpace()
Inserts the space that was probably removed by removeSpace(), so selecting text
does not result in words that are squashed together.

TextPagePrivate::correctTextOrder()
Calls the above, statically declared functions.
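
One plausible reading of the “spaces are removed, then added again” puzzle is
that word boundaries are recovered from the gaps between bounding rectangles
instead of from space characters. The following Python sketch illustrates only
that general idea; it is not the Okular implementation, and
group_into_words() and gap_threshold are hypothetical names:

```python
def group_into_words(chars, gap_threshold):
    """Group characters of one horizontal line into words.

    chars: list of (text, left, right) tuples.
    A new word starts whenever the horizontal gap to the previous
    character is larger than gap_threshold.
    """
    words = []
    prev_right = None
    for text, left, right in sorted(chars, key=lambda c: c[1]):
        if prev_right is None or left - prev_right > gap_threshold:
            words.append(text)      # gap is wide: start a new word
        else:
            words[-1] += text       # gap is narrow: extend the current word
        prev_right = right
    return words


# No space characters in the input, only geometry.
chars = [("H", 0, 8), ("i", 9, 12), ("y", 20, 28), ("o", 29, 37), ("u", 38, 46)]
words = group_into_words(chars, gap_threshold=4)
line = " ".join(words)   # re-insert one space between words
print(line)
```

Note that the sketch only looks at horizontal gaps, which is the kind of
assumption that would break for vertical or diagonal text.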


Unfortunately, these functions don’t seem to be designed for vertical text.
Even slightly diagonal text causes problems, see screenshot. (Possible reasons:
XY-cut can’t “see” diagonal text; makeAndSortLines() collects characters in
the wrong order.)
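
Why diagonal text could defeat an XY-cut style split can be shown with a small
Python sketch. This is an illustration of the general projection idea only,
not the code in core/textpage.cpp; projection_gaps() is a hypothetical helper.
The algorithm looks for empty stripes in the projection of all bounding boxes
onto one axis, and a diagonal run of characters fills those stripes:

```python
def projection_gaps(boxes, axis):
    """Return the empty intervals between boxes projected onto one axis.

    boxes: list of (left, top, right, bottom); axis: "x" or "y".
    An XY-cut style algorithm would cut the region at such gaps.
    """
    lo, hi = (0, 2) if axis == "x" else (1, 3)
    intervals = sorted((b[lo], b[hi]) for b in boxes)
    gaps = []
    reach = intervals[0][1]          # farthest coordinate covered so far
    for start, end in intervals[1:]:
        if start > reach:
            gaps.append((reach, start))
        reach = max(reach, end)
    return gaps


# Two horizontal text lines, at y = 0..10 and y = 30..40.
two_lines = [(x, 0, x + 8, 10) for x in range(0, 50, 10)] + \
            [(x, 30, x + 8, 40) for x in range(0, 50, 10)]
# A diagonal run of characters crossing both lines, like a watermark.
diagonal = [(5 * i, 5 * i, 8 + 5 * i, 10 + 5 * i) for i in range(7)]

print(projection_gaps(two_lines, "y"))             # one gap between the lines
print(projection_gaps(two_lines + diagonal, "y"))  # the gap is gone
```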

Many commits touch these functions, mostly made in 2011 by Albert Astals
Cid and Mohammad Mahfuzur Rahman Mamun. The beginning was probably this commit?

> commit 2eb5f270fd4befb6a84ff2e9bdd921271930e046
> Author: Mohammad Mahfuzur Rahman Mamun 
> Date:   Mon Jun 27 19:58:24 2011 +0600
> 
> three functions added in textpage
> 
> [snip a lot]

Maybe these two people can give more information on how vertical text is
supposed to be handled.


[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

2019-05-12 Thread David Hurka
https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #1 from David Hurka  ---
Created attachment 120007
  --> https://bugs.kde.org/attachment.cgi?id=120007&action=edit
Vertical texts are used for diagrams, but Okular can’t search for them

You can fix the clipboard content with the following command ;)
perl -e 'print reverse split //, <>;'

It seems that the TextPage, which is used for search and text copying, is
filled this way: the Generator adds horizontal words as whole words, but
splits vertical words into letters. Okular then assumes that the uppermost
letter is the first letter.
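
A tiny Python sketch of that effect, assuming the vertical text reads from
bottom to top (as axis labels usually do) and that letters are ordered purely
by their top coordinate. The coordinates and the copy_order() helper are made
up for illustration:

```python
def copy_order(letters):
    """Join letters in the order 'uppermost first'.

    letters: list of (char, top) with y growing downwards.
    """
    return "".join(ch for ch, top in sorted(letters, key=lambda l: l[1]))


# "Vout" rotated so that it reads bottom to top: "V" is lowest on the page.
letters = [("V", 130), ("o", 120), ("u", 110), ("t", 100)]

copied = copy_order(letters)   # what ends up in the clipboard
restored = copied[::-1]        # reversing the characters restores the word
print(copied, restored)
```

Reversing the characters is the same repair the Perl one-liner above applies
to the clipboard content.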

Letters or words are stored in TextEntity objects in the TextPage. The
TextEntity stores the letter/word as string and the bounding rectangle.

The cause is one of these two (choose whichever you like more):
1. TextPage and TextEntity can’t store transformations, or even simple
rotation. So, the generator splits vertical words into single letters. *1
2. The generator, which uses poppler to read the pdf, gets vertical words
already split into letters.

*1) Possible reason: this way, one can (theoretically *2) use the Text
Selection tool to select the word.
*2) Practically not, because Okular adds any other letter on the same height to
the selection.
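
Footnote *2 can be pictured with a deliberately simplified selection model in
Python. It assumes that a selection covers everything between its start and
end entity in row-by-row reading order; this is a guess at the observable
behaviour, not Okular’s actual algorithm, and select_between() is a
hypothetical helper:

```python
def select_between(entities, start, end):
    """Return the texts between start and end (inclusive) in reading order.

    entities: list of (text, left, top); reading order is top to bottom,
    then left to right within the same height.
    """
    ordered = sorted(entities, key=lambda e: (e[2], e[1]))
    i, j = sorted((ordered.index(start), ordered.index(end)))
    return [e[0] for e in ordered[i:j + 1]]


# Letters of a vertical axis label, and body text at the same heights.
label = [("t", 0, 100), ("u", 0, 110), ("o", 0, 120), ("V", 0, 130)]
body = [("Load", 50, 100), ("current", 50, 110), ("rises", 50, 120)]

# Dragging from the top letter to the bottom letter of the label.
selected = select_between(label + body, label[0], label[-1])
print(selected)
```

In this model, every word that shares a height with one of the label’s letters
is pulled into the selection.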

I have attached a screenshot which illustrates the practical relevance of this
problem: In many datasheets (not only TI), vertical text is used to label the
vertical axes of diagrams. Splitting these labels into letters prevents
searching for a specific diagram.
