[ https://issues.apache.org/jira/browse/PDFBOX-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson updated PDFBOX-800:
-------------------------------
    Fix Version/s:     (was: 2.0.0)

> Wrong text extract from vertical textboxes in pdf files
> -------------------------------------------------------
>
>                 Key: PDFBOX-800
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-800
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.0
>         Environment: Windows 7, VS 2010 C#, Tika Library
>            Reporter: Sandor Dj
>         Attachments: problemdoc.doc, problemdoc.pdf
>
>
> Vertical text boxes in PDF files are not extracted correctly (using the Tika
> library in C#).
> For example, if a PDF file contains a vertical text box "hello" (WITHOUT
> line breaks):
> H
> E
> L
> L
> O
> the parser returns 5 strings, each with a single letter, even though there is
> NO line break after each letter.
> Is there an option to avoid this problem?
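
The symptom in the report (one string per letter for a vertical text box) can sometimes be worked around after extraction instead of inside the parser. The following is only a minimal sketch of that idea, not a PDFBox or Tika option: it assumes the extracted text is already available as a plain string, and the function name `merge_single_char_lines` and its `min_run` parameter are hypothetical, introduced here for illustration. It joins each run of consecutive one-character lines back into a single line.

```python
def merge_single_char_lines(text: str, min_run: int = 2) -> str:
    """Join runs of consecutive one-character lines into one line.

    Hypothetical post-processing helper; not part of PDFBox or Tika.
    Runs shorter than `min_run` are left untouched.
    """
    merged = []  # output lines
    run = []     # current run of one-character lines (original form)

    def flush() -> None:
        # Emit the pending run, merged if it is long enough.
        if len(run) >= min_run:
            merged.append("".join(line.strip() for line in run))
        else:
            merged.extend(run)
        run.clear()

    for line in text.splitlines():
        if len(line.strip()) == 1:
            run.append(line)
        else:
            flush()
            merged.append(line)
    flush()
    return "\n".join(merged)


if __name__ == "__main__":
    extracted = "Title\nH\nE\nL\nL\nO\nbody text"
    print(merge_single_char_lines(extracted))
```

This heuristic cannot tell a vertical text box apart from genuine one-character lines (list markers, initials, table cells), so raising `min_run` or restricting it to known regions may be needed in practice.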