[Desktop-packages] [Bug 33288]
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/516.

-- 
You received this bug notification because you are a member of Desktop Packages, which is subscribed to poppler in Ubuntu.
https://bugs.launchpad.net/bugs/33288

Title:
  Evince doesn't handle columns properly

Status in Poppler: Unknown
Status in poppler package in Ubuntu: Fix Released
Status in poppler source package in Lucid: Fix Released

Bug description:
  So, now that RC is here, let's propose it as an SRU. I've pushed it to lucid-proposed. The debdiff poppler_0.12.4-0ubuntu4_2_0.12.4-0ubuntu5.debdiff is attached there for information. I'm removing the old debdiff to avoid confusion.

  poppler (0.12.4-0ubuntu5) lucid-proposed; urgency=low
    * debian/patches/11_column_selection.patch:
      - backport from upstream git commit to fix wrong selection in pdf when containing tables, long text, broken flow and so on. (fixing most of known issues with selection in pdf) (LP: #33288)

  When making a multi-column selection from a PDF like this:
  http://www.specialist-games.com/mordheim/assets/lrb/1Rules.pdf
  and pasting the result into OpenOffice.org, the columns are not maintained. The result is unusable because the text from both columns becomes mixed.

  Please note, this is not a problem with the PDF: Adobe Acrobat Reader 7.x under Windows does properly copy and paste columned text into OpenOffice.org.

  Regards,
  Pascal de Bruijn

To manage notifications about this bug go to:
https://bugs.launchpad.net/poppler/+bug/33288/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to     : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp
[Desktop-packages] [Bug 33288] Re: Evince doesn't handle columns properly
** Changed in: poppler
       Status: Confirmed => Unknown
[Desktop-packages] [Bug 33288]
Created attachment 121848
Cache result of inner loop in visitDepthFirst

This is an alternative to Brian's patch in comment 65. It speeds up the visitDepthFirst function by caching the result of the inner loop, which gives a similar speedup without changing the output of pdftotext.
[Desktop-packages] [Bug 33288]
Comment on attachment 121848
Cache result of inner loop in visitDepthFirst

Looks good to me, pushed. Thanks!
[Desktop-packages] [Bug 33288]
I went through my patch and unfortunately I cannot make it return the same result as before. I separated the axes in which it searches for the immediate up/down/left/right neighbours, sorted them, and found the neighbours in a single pass (plus the number of possible neighbour candidates in the other axis for a given block, which should be sqrt(n) in the average case).

The difference is that the patch finds the right-down neighbour by looking at the down neighbour of the block's right neighbour and at the right neighbour of its down neighbour; if they match, it selects that block as the right-down neighbour. The previous version simply searched for the closest block that is both to the right of and below the block (and, looking at the code, the result could depend on the order of the blocks in the searched array). Modifying the patch so that it returns the same results as before would cost all of its efficiency gain.

Anyway, the efficiency improvement of my patch is not as big as Brian's, so you can reject it (but I think that it improves the conversion :) ).
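To make the two definitions of the right-down neighbour concrete, here is a small Python sketch. It is not poppler code: the Block type, the helper names, and the Manhattan metric used for "closest" in the old definition are all assumptions, and the neighbour searches are brute-force scans rather than the sorted single-pass search the patch uses. It only shows how the two definitions can pick different blocks when a small stray block sits in the gutter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    # Hypothetical block bounding box; y grows downward, as on a rendered page.
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the open intervals (a0, a1) and (b0, b1) overlap."""
    return a0 < b1 and b0 < a1


def right_of(b: Block, blocks: List[Block]) -> Optional[Block]:
    """Closest block starting at or after b's right edge that shares some vertical extent."""
    cands = [c for c in blocks
             if c is not b and c.x0 >= b.x1 and overlaps(b.y0, b.y1, c.y0, c.y1)]
    return min(cands, key=lambda c: c.x0 - b.x1, default=None)


def below(b: Block, blocks: List[Block]) -> Optional[Block]:
    """Closest block starting at or below b's bottom edge that shares some horizontal extent."""
    cands = [c for c in blocks
             if c is not b and c.y0 >= b.y1 and overlaps(b.x0, b.x1, c.x0, c.x1)]
    return min(cands, key=lambda c: c.y0 - b.y1, default=None)


def diagonal_by_agreement(b: Block, blocks: List[Block]) -> Optional[Block]:
    """Patch-style definition: the block below b's right neighbour, accepted only
    if it is also the block to the right of b's down neighbour."""
    r, d = right_of(b, blocks), below(b, blocks)
    if r is None or d is None:
        return None
    rd, dr = below(r, blocks), right_of(d, blocks)
    return rd if rd is not None and rd is dr else None


def diagonal_by_closest(b: Block, blocks: List[Block]) -> Optional[Block]:
    """Old-style definition (metric assumed): the closest block lying both right of
    and below b. min() keeps the first of several equal candidates, so ties depend
    on list order."""
    cands = [c for c in blocks if c.x0 >= b.x1 and c.y0 >= b.y1]
    return min(cands, key=lambda c: (c.x0 - b.x1) + (c.y0 - b.y1), default=None)


# A 2x2 grid of blocks plus a small stray block E in the gutter.
A = Block("A", 0, 0, 10, 10)
B = Block("B", 20, 0, 30, 10)
C = Block("C", 0, 20, 10, 30)
D = Block("D", 20, 20, 30, 30)
E = Block("E", 12, 12, 14, 14)
blocks = [A, B, C, D, E]
print(diagonal_by_agreement(A, blocks).name)  # D
print(diagonal_by_closest(A, blocks).name)    # E
```

Without E, both definitions return D for block A; E changes only the old definition's answer, because E shares no horizontal or vertical extent with A, B or C and so never appears as a direct neighbour.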
[Desktop-packages] [Bug 33288]
I just realized we never really followed up on the last two patches. My concern is that they change the pdftotext output. I thought they were for a) speed and b) text selection, so I'd prefer that they did not change the pdftotext output. I've checked a few of the changed files, and the changes can be argued to be neither better nor worse, but the problem is that 1 out of 3 files I have gets a changed pdftotext output, and with 1600 files in the test suite it is impossible for me to go over them all and verify that the changes are "not better nor worse". Is there any chance we can get patches that don't influence the pdftotext output, or at least not so drastically? And yes, I know it's been ages since you wrote the patches, sorry.
[Desktop-packages] [Bug 33288]
Created attachment 97356
Improve efficiency of searching for tables

(In reply to comment #68)
> Created attachment 40308
> optimization of search for tables

This is an updated version of the patch. It needs Brian's patch to be applied first. (see #77087 for additional info)
[Desktop-packages] [Bug 33288]
(In reply to comment #70)

(I originally replied on launchpad, which is supposed to copy it through to here, but it hasn't.)

Carlos: it isn't a regression that lines outside the rectangle formed by the start and end points are included; it's the intent. Consider selecting in a document with two columns, starting in the 1st column 2/3 of the way down the page and ending in the 2nd column 1/3 of the way down. In this case, the correct selection consists entirely of lines that lie outside the rectangle formed by the start and end points (i.e., the bottom 1/3 of the 1st column and the top 1/3 of the 2nd column). You get situations like this even for single-column text; just choose start and end points vertically above each other.

The motivation for this patch was that text selection by rectangles is fundamentally wrong. The correct approach is to reconstruct the reading order of the text; then, from two points on the page, find the nearest insertion points (where an edit cursor would go); swap the insertion points if necessary; then return the characters between them.

The difficulties lie in inferring the reading order and in determining what 'nearest insertion point' means. Clicking inside a word, the nearest insertion point is obvious: it's the nearest character boundary. Click in a blank area, and it's less clear. In Breuel's algorithms, which I used for determining reading order, there is something that helps here: line width is determined by expanding the line left and right to fit the column containing it. So the line 'box' contains the initial indent if it is the first line of a paragraph, the trailing space in the last line, or the ragged space for left- or right-justified text.

Poppler doesn't have columns as such, but blocks instead, and as I recall the line boxes are the tight bounding boxes of the words contained in the line. So we can try to determine the insertion point by looking for the nearest block (horizontally and vertically), then the nearest line (vertically ONLY, so that we ignore indents/ragged space), then the nearest character (horizontally). I mean these to be three different comparisons, discarding block, line and character candidates at each stage, not some single distance you sum up. The upshot would be that clicking in blank areas of a line that lie within its block's bounding box, or even nearby, will choose that line, not the one above or below.

(It's been ages since I looked at the poppler code; I can't remember if this heuristic is what the patches do already.)
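The three-stage insertion-point lookup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not poppler's implementation: the Block/Line types, the helper names, the Euclidean point-to-box distance used in stage 1, and the assumption that the block list is already in reading order are all hypothetical. Each stage discards candidates before the next one runs, rather than summing one combined distance.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Line:
    y0: float
    y1: float
    text: str
    edges: List[float]  # x-coordinates of the len(text) + 1 character boundaries


@dataclass
class Block:
    x0: float
    y0: float
    x1: float
    y1: float
    lines: List[Line]


def gap(v: float, lo: float, hi: float) -> float:
    """Distance from v to the interval [lo, hi]; 0 when v lies inside it."""
    return max(lo - v, 0.0, v - hi)


def insertion_point(blocks: List[Block], x: float, y: float) -> Tuple[int, int, int]:
    """Return (block, line, boundary) indices of the insertion point nearest to (x, y)."""
    # Stage 1: nearest block, measured horizontally and vertically.
    bi = min(range(len(blocks)),
             key=lambda i: math.hypot(gap(x, blocks[i].x0, blocks[i].x1),
                                      gap(y, blocks[i].y0, blocks[i].y1)))
    lines = blocks[bi].lines
    # Stage 2: nearest line in that block, measured vertically ONLY,
    # so indents and ragged margins cannot pull the point onto another line.
    li = min(range(len(lines)), key=lambda i: gap(y, lines[i].y0, lines[i].y1))
    edges = lines[li].edges
    # Stage 3: nearest character boundary, measured horizontally.
    ci = min(range(len(edges)), key=lambda i: abs(x - edges[i]))
    return bi, li, ci


def select_text(blocks: List[Block], p: Tuple[float, float],
                q: Tuple[float, float]) -> str:
    """Text between the insertion points nearest to p and q (blocks in reading order)."""
    a, b = insertion_point(blocks, *p), insertion_point(blocks, *q)
    if b < a:  # tuples compare in reading order: block, then line, then character
        a, b = b, a
    parts = []
    for bi in range(a[0], b[0] + 1):
        for li, line in enumerate(blocks[bi].lines):
            if (bi, li) < a[:2] or (bi, li) > b[:2]:
                continue
            start = a[2] if (bi, li) == a[:2] else 0
            end = b[2] if (bi, li) == b[:2] else len(line.text)
            parts.append(line.text[start:end])
    return "\n".join(parts)


def make_line(x0: float, y0: float, text: str) -> Line:
    # Monospaced glyphs, 1 unit wide and 1 unit tall.
    return Line(y0, y0 + 1, text, [x0 + i for i in range(len(text) + 1)])


# Two columns; select from the bottom of column 1 to the top of column 2.
left = Block(0, 0, 5, 5, [make_line(0, 0, "aaaaa"), make_line(0, 2, "bbbbb"),
                          make_line(0, 4, "ccccc")])
right = Block(10, 0, 15, 5, [make_line(10, 0, "ddddd"), make_line(10, 2, "eeeee"),
                             make_line(10, 4, "fffff")])
page = [left, right]
print(select_text(page, (2.2, 4.5), (12.1, 0.5)))  # "ccc" then "dd"
```

None of the selected lines lies inside the rectangle spanned by the two points, which is the intended behaviour described above. Because stage 2 ignores horizontal distance, a click in the left margin beside an indented line lands at the start of that line rather than on the line above it.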
[Desktop-packages] [Bug 33288]
While working on bug #71160 I've found another regression introduced by this fix. In some cases, additional lines are added to the selection. For example, open the HIG document and go to the first page. Start selecting the second line by dragging from the margin, and you will see that the first line is selected too. This is because the second line is more indented than the first one. This fix changed the way blocks and lines are included in the selection to use the Manhattan distance, and in this case the distance to the first line is less than the distance to the second line, even though the first line doesn't intersect the selection rectangle at all. If you start the selection closer to the beginning of the second line, the first line is not included, because in that case the distance to the second line is smaller.

You can play with it now using the text demo of poppler-glib-demo; I've added an area selector to get the text of a given area. Try using X1=0, Y1=122, which should discard the first line, but doesn't. However, using X1=257, Y1=122 discards the first line entirely.

So I think we need to somehow discard blocks and lines that don't intersect the selection rectangle, even if their Manhattan distance is less than that of any other block/line.
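A minimal Python sketch of the failure and of the proposed fix follows. The box coordinates are made up to mimic an indented second line (they are not measured from the HIG document), the Box type and helper names are hypothetical, and falling back to all lines when none intersects the selection is an assumption the comment does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float


def gap(v: float, lo: float, hi: float) -> float:
    """Distance from v to the interval [lo, hi]; 0 when v lies inside it."""
    return max(lo - v, 0.0, v - hi)


def manhattan(box: Box, x: float, y: float) -> float:
    """Manhattan distance from the point (x, y) to the box."""
    return gap(x, box.x0, box.x1) + gap(y, box.y0, box.y1)


def intersects(a: Box, b: Box) -> bool:
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def nearest_line(lines: List[Box], x: float, y: float,
                 selection: Optional[Box] = None) -> Box:
    """Line nearest to (x, y). If a selection rectangle is given, lines that do not
    intersect it are discarded first (falling back to all lines if none intersect)."""
    candidates = lines
    if selection is not None:
        hits = [line for line in lines if intersects(line, selection)]
        if hits:
            candidates = hits
    return min(candidates, key=lambda line: manhattan(line, x, y))


# Made-up geometry: the second line is indented further than the first.
first = Box(72, 100, 500, 112)
second = Box(257, 122, 500, 134)
drag = Box(0, 122, 500, 200)  # selection dragged from the left margin
print(nearest_line([first, second], 0, 122) is first)          # True: 82 vs 257
print(nearest_line([first, second], 0, 122, drag) is second)   # True
```

Starting at x=257 instead, the plain Manhattan rule already picks the second line (distance 0 versus 10), matching the X1=257 observation.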
[Desktop-packages] [Bug 33288]
(In reply to comment #42)
> Created an attachment (id=31406)
> 3/5 reading order (bug fixed)

It seems this patch introduced a significant performance regression; see the detailed analysis on the poppler mailing list:
http://lists.freedesktop.org/archives/poppler/2010-October/006566.html
[Desktop-packages] [Bug 33288]
Correctness trumps performance, right?
[Desktop-packages] [Bug 33288]
It should be possible to improve this substantially. When I wrote the patch I was being very conservative with the existing poppler data structures, so essentially that method is traversing an unordered list. If the block list were in isBeforeByRule1 order, most of those comparisons would go away. I can't remember whether this would break clients wanting access to the text in physical order; it's been a while since I looked at the code, and I'm reading this on a phone. I can take a deeper look tomorrow.
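Brian's point about ordering can be illustrated with a minimal sketch. It assumes a hypothetical, simplified `Block` bounding box and a hypothetical rule-1-style predicate (x ranges overlap and one block lies entirely above the other); neither is poppler's actual `TextBlock` or `isBeforeByRule1`. With an unordered list, finding a block's predecessors means comparing it against every block; if the list is kept sorted by bottom edge, a binary search (`std::partition_point`) bounds the candidates to a prefix and the remaining comparisons are skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified bounding box; not poppler's TextBlock.
// Page y grows downward, so "above" means a smaller y.
struct Block {
  double xMin, xMax, yMin, yMax;
};

// Orders blocks by bottom edge (yMax).
bool byBottomEdge(const Block &l, const Block &r) { return l.yMax < r.yMax; }

// Hypothetical rule-1-style test: a precedes b when their x ranges
// overlap and a lies entirely above b.
bool aboveWithXOverlap(const Block &a, const Block &b) {
  bool xOverlap = a.xMin < b.xMax && b.xMin < a.xMax;
  return xOverlap && a.yMax <= b.yMin;
}

// Unordered list: every block has to be compared against b.
std::size_t countPredecessorsUnsorted(const std::vector<Block> &blocks,
                                      const Block &b) {
  std::size_t n = 0;
  for (const Block &a : blocks)
    if (aboveWithXOverlap(a, b)) ++n;
  return n;
}

// List kept sorted by yMax: only the prefix with yMax <= b.yMin can be
// "above" b, so a binary search bounds the scan and the rest is skipped.
std::size_t countPredecessorsSorted(const std::vector<Block> &byYMax,
                                    const Block &b) {
  // Debug-only precondition check (compiled out with NDEBUG).
  assert(std::is_sorted(byYMax.begin(), byYMax.end(), byBottomEdge));
  auto end = std::partition_point(
      byYMax.begin(), byYMax.end(),
      [&](const Block &a) { return a.yMax <= b.yMin; });
  std::size_t n = 0;
  for (auto it = byYMax.begin(); it != end; ++it)
    if (aboveWithXOverlap(*it, b)) ++n;
  return n;
}
```

Sorting once costs O(n log n); each lookup then costs O(log n) plus the size of the candidate prefix, instead of O(n). Whether such an ordering is compatible with clients that need the text in physical order is the open question Brian raises, and the sketch does not address it.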
[Desktop-packages] [Bug 33288]
Created attachment 40061: improved patch. Had another look and tidied the code a bit, removing repeated page orientation checks and a redundant test for overlap in rule(2). This is noticeably faster at rendering the bus map (down to ~14.8s).
[Desktop-packages] [Bug 33288]
Created attachment 40056: patch to fix performance regression. Here's what I've got so far. On my very slow VM, this renders the Paris bus map reported on the mailing list in ~15.2s, compared to ~60s without the patch. YX-sorting alone got the time down to ~16.2s. Rendering of other documents is as fast as ever. Almost all of the time spent rendering the bus map occurs before the sort, so there must still be some quadratic algorithms in there unrelated to reading order. There is one obvious fix on my list that I didn't implement (track the first unvisited block and start loops there), but I don't think it will make much difference for the effort it requires. I'll be offline until Monday 8 Nov, but I'd be grateful if more eyes could look at this to make sure I haven't regressed anything.
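The YX sort and the unimplemented "track the first unvisited block" idea can be sketched together as follows. Everything here is hypothetical: the `Block` struct, the `readingOrder` loop, and the `columnBefore` predicate (above with x overlap, or entirely to the left), which is a deliberately crude stand-in for poppler's reading-order rules. Blocks are first sorted top-to-bottom, then left-to-right; each step emits the first unvisited block that no other unvisited block must precede, and the `first` cursor skips the already-visited prefix instead of rescanning it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified bounding box; page y grows downward.
struct Block {
  double xMin, xMax, yMin, yMax;
};

// YX order: top edge first, then left edge.
bool yxLess(const Block &a, const Block &b) {
  if (a.yMin != b.yMin) return a.yMin < b.yMin;
  return a.xMin < b.xMin;
}

// Hypothetical two-column predicate: a precedes b if it is above b with
// overlapping x ranges, or lies entirely to the left of b.
bool columnBefore(const Block &a, const Block &b) {
  bool xOverlap = a.xMin < b.xMax && b.xMin < a.xMax;
  return (xOverlap && a.yMax <= b.yMin) || a.xMax <= b.xMin;
}

// Emits blocks in reading order: at each step, take the first unvisited
// block (in YX order) that no other unvisited block must precede.
template <typename Before>
std::vector<Block> readingOrder(std::vector<Block> blocks, Before before) {
  std::sort(blocks.begin(), blocks.end(), yxLess);
  const std::size_t n = blocks.size();
  std::vector<bool> visited(n, false);
  std::vector<Block> out;
  std::size_t first = 0;  // every index below `first` is already visited
  while (out.size() < n) {
    while (visited[first]) ++first;  // skip the visited prefix
    std::size_t pick = first;        // fallback if `before` has a cycle
    for (std::size_t i = first; i < n; ++i) {
      if (visited[i]) continue;
      bool ready = true;
      for (std::size_t j = first; j < n && ready; ++j)
        if (j != i && !visited[j] && before(blocks[j], blocks[i])) ready = false;
      if (ready) { pick = i; break; }
    }
    visited[pick] = true;
    out.push_back(blocks[pick]);
  }
  assert(out.size() == n);
  return out;
}
```

On a two-column page, plain YX order interleaves the columns (left top, right top, left bottom, right bottom), while this loop reads the left column first. The cursor only trims the visited prefix; each step is still quadratic in the worst case, which is in line with Brian's expectation that the change would not gain much.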
[Desktop-packages] [Bug 33288]
Can you please attach a patch without unnecessary spacing changes like -before = gTrue; + before = gTrue; Thanks :-)
[Desktop-packages] [Bug 33288]
There are also two nested for loops in the block preceded by this comment: /* set extended bounding boxes of all other blocks * so that they extend in x without hitting neighbours */ I'm working on an optimization of it. Marek
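For reference, the quadratic pattern Marek points to amounts to comparing every block with every other block. The sketch below is not poppler's code: the `Block` and `Extent` structs and `extendInXNaive` are hypothetical, and it only stretches each block in x until it would touch the nearest neighbour that overlaps it in y (or the page edge).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified bounding box and its x-extension.
struct Block {
  double xMin, xMax, yMin, yMax;
};
struct Extent {
  double exMin, exMax;
};

bool yOverlap(const Block &a, const Block &b) {
  return a.yMin < b.yMax && b.yMin < a.yMax;
}

// Quadratic version: every block is compared with every other block.
// Each block is stretched left and right until it would touch the
// nearest y-overlapping neighbour, or the page edge if there is none.
std::vector<Extent> extendInXNaive(const std::vector<Block> &blocks,
                                   double pageXMin, double pageXMax) {
  assert(pageXMin <= pageXMax);
  std::vector<Extent> ext(blocks.size(), Extent{pageXMin, pageXMax});
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    for (std::size_t j = 0; j < blocks.size(); ++j) {
      if (i == j || !yOverlap(blocks[i], blocks[j])) continue;
      if (blocks[j].xMin >= blocks[i].xMax)       // neighbour on the right
        ext[i].exMax = std::min(ext[i].exMax, blocks[j].xMin);
      else if (blocks[j].xMax <= blocks[i].xMin)  // neighbour on the left
        ext[i].exMin = std::max(ext[i].exMin, blocks[j].xMax);
      // Blocks overlapping in both x and y are ignored in this sketch.
    }
  }
  return ext;
}
```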
[Desktop-packages] [Bug 33288]
(In reply to comment #65) > Created an attachment (id=40124): patch without extraneous whitespace changes > Oops! Ok, here's the patch without the whitespace changes. Tested with the new patch: it renders the map PDFs much faster in cairo, and other PDFs haven't changed much.
[Desktop-packages] [Bug 33288]
Created attachment 40308: optimization of search for tables. Hi, this patch improves the efficiency of searching for blocks that belong to a table. In the worst case it is still quadratic, but it should be O(n*sqrt(n)) in the average case. It creates a list of all y coordinates and then sorts it. It then goes through this list, and for each y coordinate that begins a block it starts a local while loop. This loop searches for blocks that overlap the current block in y, and during the search it finds the closest blocks to the left and to the right. It does the same for the x coordinate to find the adjacent blocks above and below. It then uses this information to compute ExMin, ..., EyMax and to determine whether the current block is part of a table. (You need to have Brian's patch applied already.) Regards, Marek
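Marek's approach can be approximated by a sweep over blocks sorted by their top edge. This is a simplified sketch under stated assumptions, not his patch: it sorts only the top edges rather than a list of all y coordinates, it omits the symmetric x pass that finds the blocks above and below, and it omits table detection. `Block`, `Extent`, `updatePair` and `extendInXSweep` are hypothetical names. For each block, a local loop walks forward only through blocks whose top edge lies above its bottom edge; each y-overlapping pair is found from the side of the block that starts first, and both blocks' extents are updated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical, simplified bounding box and its x-extension.
struct Block {
  double xMin, xMax, yMin, yMax;
};
struct Extent {
  double exMin, exMax;
};

// Records a y-overlapping pair: the block entirely to the right caps the
// other's exMax, and vice versa. Pairs overlapping in x are left alone.
void updatePair(const Block &a, const Block &b, Extent &ea, Extent &eb) {
  if (b.xMin >= a.xMax) {         // b is entirely to the right of a
    ea.exMax = std::min(ea.exMax, b.xMin);
    eb.exMin = std::max(eb.exMin, a.xMax);
  } else if (a.xMin >= b.xMax) {  // b is entirely to the left of a
    ea.exMin = std::max(ea.exMin, b.xMax);
    eb.exMax = std::min(eb.exMax, a.xMin);
  }
}

std::vector<Extent> extendInXSweep(const std::vector<Block> &blocks,
                                   double pageXMin, double pageXMax) {
  assert(pageXMin <= pageXMax);
  const std::size_t n = blocks.size();
  std::vector<Extent> ext(n, Extent{pageXMin, pageXMax});
  // Block indices sorted by top edge.
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&](std::size_t l, std::size_t r) {
    return blocks[l].yMin < blocks[r].yMin;
  });
  for (std::size_t k = 0; k < n; ++k) {
    const std::size_t a = order[k];
    // Local loop: later blocks start at or below a's top edge, so they can
    // only overlap a in y while they start above a's bottom edge.
    for (std::size_t m = k + 1; m < n && blocks[order[m]].yMin < blocks[a].yMax; ++m) {
      const std::size_t b = order[m];
      if (blocks[b].yMax > blocks[a].yMin)  // completes the strict y-overlap test
        updatePair(blocks[a], blocks[b], ext[a], ext[b]);
    }
  }
  return ext;
}
```

When blocks are spread out vertically the local loops stay short; when many blocks share the same y range, every pair is still visited, which matches Marek's note that the worst case remains quadratic.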
[Desktop-packages] [Bug 33288]
(In reply to comment #62) > Created an attachment (id=40061): improved patch > Had another look and tidied the code a bit removing repeated page orientation checks, and a redundant test for overlap in rule(2). This is noticeably faster rendering the bus map. (down to ~14.8s) I tested this in glib. It significantly improved rendering for the PDFs prone to be affected. I tested other PDFs as well and didn't notice anything. I did a little testing of text selection too, though not as much; text selection seemed OK.
[Desktop-packages] [Bug 33288]
Brian, your patch changes the output of pdftotext for one file; is this expected?
[Desktop-packages] [Bug 33288]
Created attachment 40124 patch without extraneous whitespace changes Oops! Ok, here's the patch without the whitespace changes.