[ https://issues.apache.org/jira/browse/PDFBOX-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751920#action_12751920 ]
Andreas Lehmkühler commented on PDFBOX-508:
-------------------------------------------

First of all, thanks for the contribution. I ran some tests and the patch works with your sample, but it causes unwanted side effects with other documents. I guess we have to do some more testing, as your patch affects a really fragile part of PDFBox's text extraction.

> Lost spacing as a result of operator "Tc" ignoring.
> ---------------------------------------------------
>
>                 Key: PDFBOX-508
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-508
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>         Environment: JDK 1.6.0_16
>            Reporter: Dmitry Gutso
>         Attachments: 2a.pdf, 2a_repl2.pdf, PDFStreamEngine_For_Spacing.diff, TextPosition_for_Spacing.diff
>
>
> Continuation of https://issues.apache.org/jira/browse/PDFBOX-234
> Spacing is lost because the "Tc" operator is ignored.
> Ex:
> ****************************************
> BT
> 6 0 0 6 244.0800018311 795.8400268555 Tm
> 6.5475001335 Tc
> (41) Tj
> ****************************************
> Here PDFTextStripper.writeText() returns "41" (without spacing)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
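
For reference, here is a rough sketch of why the quoted example should produce a visible gap between "4" and "1". It follows the glyph displacement formula from the PDF specification, tx = (w0 * Tfs + Tc + Tw) * Th, and is not the code from the attached patch or from PDFBox itself. The glyph width (0.5), the font size (1.0, since the sample shows no Tf operator), the space width and the 0.5 threshold in `looksLikeSpace` are all assumptions made only for illustration, and the class and method names are hypothetical.

```java
public class TcSpacingSketch {

    /**
     * Horizontal displacement of one glyph in text space, following the
     * PDF specification: tx = (w0 * Tfs + Tc + Tw) * Th.
     * w0 is the glyph width already divided by 1000, th is Tz / 100.
     */
    static double glyphAdvance(double w0, double fontSize, double tc, double tw, double th) {
        return (w0 * fontSize + tc + tw) * th;
    }

    /**
     * Hypothetical heuristic (not PDFBox's actual rule): treat a gap as a
     * word break when it exceeds half the width of a space character.
     */
    static boolean looksLikeSpace(double gap, double spaceWidth) {
        return gap > 0.5 * spaceWidth;
    }

    public static void main(String[] args) {
        double w0 = 0.5;            // assumed glyph width of "4" (500/1000)
        double fontSize = 1.0;      // assumed; the sample shows no Tf operator
        double tc = 6.5475001335;   // character spacing from the sample
        double scale = 6.0;         // horizontal scale from "6 0 0 6 ... Tm"
        double spaceWidth = 0.25 * fontSize * scale; // assumed space width in user space

        double advance = glyphAdvance(w0, fontSize, tc, 0.0, 1.0);
        // Extra room left after the glyph itself, converted to user space.
        double gap = (advance - w0 * fontSize) * scale;

        System.out.println("advance in text space: " + advance);
        System.out.println("gap in user space:     " + gap);
        System.out.println("treated as a space:    " + looksLikeSpace(gap, spaceWidth));
    }
}
```

Under these assumptions the gap contributed by Tc is many times wider than the glyph, so an extractor that ignores Tc places both glyphs next to each other and emits "41" without a separator.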