[
https://issues.apache.org/jira/browse/PDFBOX-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vicente updated PDFBOX-1956:
----------------------------
Comment: was deleted
(was: When I convert file A to text the result is OK, but when I convert file
B the result is not. For example, the original text (Object) is converted to
the wrong characters (2EMHFWV). Could it be an encoding problem?)
> Wrong character on conversion PDF to TXT
> ----------------------------------------
>
> Key: PDFBOX-1956
> URL: https://issues.apache.org/jira/browse/PDFBOX-1956
> Project: PDFBox
> Issue Type: Task
> Components: Parsing
> Affects Versions: 1.8.4
> Environment: Windows
> Reporter: Vicente
> Labels: parser
> Attachments: example a.pdf, example b.pdf
>
>
> I am trying to convert PDF to TXT, and for some PDFs the extracted String
> contains wrong characters. Could this be a Unicode problem? Can somebody help me?
> I observed that the problem occurs when converting PDFs created by PDFCreator
> to text: the characters are wrong. Any suggestions?
> The code:
> import java.io.File;
> import java.io.FileInputStream;
>
> import org.apache.pdfbox.cos.COSDocument;
> import org.apache.pdfbox.pdfparser.PDFParser;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.util.PDFTextStripper; // package name as of PDFBox 1.8.x
>
> public class PDFTextParser {
>
>     // Extract text from a PDF document; returns null on any failure.
>     public String pdftoText(String fileName) {
>
>         System.out.println("Parsing text from PDF file " + fileName + "....");
>         File f = new File(fileName);
>
>         if (!f.isFile()) {
>             System.out.println("File " + fileName + " does not exist.");
>             return null;
>         }
>
>         PDFParser parser;
>         try {
>             parser = new PDFParser(new FileInputStream(f));
>         } catch (Exception e) {
>             System.out.println("Unable to open PDF Parser.");
>             return null;
>         }
>
>         COSDocument cosDoc = null;
>         PDDocument pdDoc = null;
>         try {
>             parser.parse();
>             cosDoc = parser.getDocument();
>             pdDoc = new PDDocument(cosDoc);
>             String parsedText = new PDFTextStripper().getText(pdDoc);
>             System.out.println("Done.");
>             return parsedText;
>         } catch (Exception e) {
>             System.out.println("An exception occurred in parsing the PDF Document.");
>             e.printStackTrace();
>             return null;
>         } finally {
>             // Release the document on success as well as on failure.
>             try {
>                 if (pdDoc != null) pdDoc.close(); // also closes cosDoc
>                 else if (cosDoc != null) cosDoc.close();
>             } catch (Exception e1) {
>                 e1.printStackTrace();
>             }
>         }
>     }
> }
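
The deleted comment reports "Object" being extracted as "2EMHFWV". A quick way
to see whether such garbling is random or systematic is to compare code points
character by character. The sketch below uses only the two strings quoted in
this issue; the class and method names (GlyphShiftCheck, constantOffset, shift)
are hypothetical and not part of PDFBox. A constant offset would suggest, though
not prove, that the extractor is emitting raw character codes or glyph indices
because the embedded font carries no usable mapping back to Unicode.

```java
public class GlyphShiftCheck {

    /**
     * Returns the code-point difference (expected - extracted) if it is the
     * same for every compared character pair, or null if it varies.
     * Only the overlapping prefix of the two strings is compared.
     */
    static Integer constantOffset(String expected, String extracted) {
        int n = Math.min(expected.length(), extracted.length());
        if (n == 0) {
            return null;
        }
        int delta = expected.charAt(0) - extracted.charAt(0);
        for (int i = 1; i < n; i++) {
            if (expected.charAt(i) - extracted.charAt(i) != delta) {
                return null;
            }
        }
        return delta;
    }

    /** Adds delta to every character of text. */
    static String shift(String text, int delta) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            sb.append((char) (text.charAt(i) + delta));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String extracted = "2EMHFWV";            // garbled output quoted in the issue
        Integer delta = constantOffset("Object", extracted);
        System.out.println("offset = " + delta);
        if (delta != null) {
            // Undo the shift on the whole extracted string.
            System.out.println(shift(extracted, delta));
        }
    }
}
```

For the strings in this issue the program prints "offset = 29" followed by
"Objects", so every character is shifted by the same amount (and the original
word was probably "Objects", since the garbled string has seven characters).
Whether a missing Unicode mapping in the PDFCreator output is really the cause
would still need to be confirmed by inspecting the fonts in example b.pdf.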
--
This message was sent by Atlassian JIRA
(v6.2#6252)