[ https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917887#comment-13917887 ]
Vicente commented on PDFBOX-1956:
---------------------------------

When I convert file A to text, the result is OK, but when I convert file B, the result is not. For example, the original text (Object) is converted to the wrong characters (2EMHFWV). Could this be an encoding problem?

> Wrong character on conversion PDF to TXT
> ----------------------------------------
>
>                 Key: PDFBOX-1956
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1956
>             Project: PDFBox
>          Issue Type: Task
>          Components: Parsing
>    Affects Versions: 1.8.4
>         Environment: Windows
>            Reporter: Vicente
>              Labels: parser
>         Attachments: example a.pdf, example b.pdf
>
>
> I am trying to convert PDF to TXT, and for some PDFs the resulting String
> contains wrong characters. Could this be a Unicode problem? Can somebody help me?
> I observed the problem when trying to convert a PDF created by PDFCreator to
> text. The characters are wrong. Any suggestions?
> The code:
> import java.io.File;
> import java.io.FileInputStream;
> import org.apache.pdfbox.cos.COSDocument;
> import org.apache.pdfbox.pdfparser.PDFParser;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDDocumentInformation;
> import org.apache.pdfbox.util.PDFTextStripper;
>
> public class PDFTextParser {
>
>     PDFParser parser;
>     String parsedText;
>     PDFTextStripper pdfStripper;
>     PDDocument pdDoc;
>     COSDocument cosDoc;
>     PDDocumentInformation pdDocInfo;
>
>     // PDFTextParser Constructor
>     public PDFTextParser() {
>     }
>
>     // Extract text from PDF Document
>     public String pdftoText(String fileName) {
>
>         System.out.println("Parsing text from PDF file " + fileName + "....");
>         File f = new File(fileName);
>
>         if (!f.isFile()) {
>             System.out.println("File " + fileName + " does not exist.");
>             return null;
>         }
>
>         try {
>             parser = new PDFParser(new FileInputStream(f));
>         } catch (Exception e) {
>             System.out.println("Unable to open PDF Parser.");
>             return null;
>         }
>
>         try {
>             parser.parse();
>             cosDoc = parser.getDocument();
>             pdfStripper = new PDFTextStripper();
>             pdDoc = new PDDocument(cosDoc);
>             parsedText = pdfStripper.getText(pdDoc);
>         } catch (Exception e) {
>             System.out.println("An exception occurred in parsing the PDF Document.");
>             e.printStackTrace();
>             try {
>                 if (cosDoc != null) cosDoc.close();
>                 if (pdDoc != null) pdDoc.close();
>             } catch (Exception e1) {
>                 // Report the failure from close(), not the outer exception again
>                 e1.printStackTrace();
>             }
>             return null;
>         }
>         System.out.println("Done.");
>         return parsedText;
>     }
> }



--
This message was sent by Atlassian JIRA
(v6.2#6252)
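
One way to probe the "encoding problem" question is to compare the expected and extracted strings character by character. The sketch below is only a diagnostic aid and not part of PDFBox; the class `OffsetCheck` and its methods are hypothetical names introduced here. It checks whether the two strings differ by a single constant code-point offset. For the sample in the comment, "Object" against "2EMHFW" gives a difference of 29 at every position, and shifting all of "2EMHFWV" by 29 yields "Objects" (the seventh character suggests the original word may have been plural, which is an assumption). A constant shift like this would be consistent with a character-mapping issue rather than random corruption, though it does not by itself establish the cause in example b.pdf.

```java
import java.util.OptionalInt;

public class OffsetCheck {

    // Returns d if expected[i] - extracted[i] == d at every compared position
    // (up to the shorter length), or empty if the differences vary.
    public static OptionalInt constantOffset(String expected, String extracted) {
        int n = Math.min(expected.length(), extracted.length());
        if (n == 0) {
            return OptionalInt.empty();
        }
        int offset = expected.charAt(0) - extracted.charAt(0);
        for (int i = 1; i < n; i++) {
            if (expected.charAt(i) - extracted.charAt(i) != offset) {
                return OptionalInt.empty();
            }
        }
        return OptionalInt.of(offset);
    }

    // Adds the same offset to every character of the text.
    public static String shift(String text, int offset) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            sb.append((char) (text.charAt(i) + offset));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample strings taken from the comment above.
        OptionalInt offset = constantOffset("Object", "2EMHFWV");
        if (offset.isPresent()) {
            System.out.println("constant offset: " + offset.getAsInt());
            System.out.println("shifted text: " + shift("2EMHFWV", offset.getAsInt()));
        } else {
            System.out.println("no constant offset");
        }
    }
}
```

Running the same check on a few other words from the affected file would show whether the offset holds across the whole document or only for one font.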