[
https://issues.apache.org/jira/browse/PDFBOX-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425697#comment-17425697
]
Tilman Hausherr edited comment on PDFBOX-5290 at 10/7/21, 5:11 PM:
-------------------------------------------------------------------
No, they're the same (or rather, based on the same subprojects: the app is a
merge of several jars). Please try a clean build and remove all old versions
from the classpath, i.e. check the directories to see which jars are actually
there. If it still happens, please share the stack trace.
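
One way to check for stale jars without browsing directories by hand is to ask the class loader where a class is found. The following is only a sketch, not part of PDFBox: the class name ClasspathCheck and its methods are hypothetical, and it assumes the application uses an ordinary class loader (containers with custom loaders may behave differently). More than one location for the same class suggests that several PDFBox versions are on the classpath.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of PDFBox.
public class ClasspathCheck {

    // Converts a fully qualified class name to its classpath resource name.
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every location from which the class loader can see the class.
    static List<URL> locationsOf(String className) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourceName(className)));
    }

    public static void main(String[] args) throws IOException {
        // The two classes named in the reported exception.
        String[] names = {
            "org.apache.pdfbox.cos.COSArray",
            "org.apache.pdfbox.cos.COSDictionary"
        };
        for (String name : names) {
            List<URL> hits = locationsOf(name);
            System.out.println(name + ": " + hits.size() + " location(s)");
            for (URL url : hits) {
                System.out.println("  " + url);
            }
        }
    }
}
```

Run it with the same classpath as the failing application; each printed URL names the jar (or directory) that supplies the class.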
was (Author: tilman):
No, they're the same. Please try a clean build and remove all old versions
from the classpath, i.e. check the directories to see which jars are actually
there. If it still happens, please share the stack trace.
> ClassCastException during Text Extraction
> -----------------------------------------
>
> Key: PDFBOX-5290
> URL: https://issues.apache.org/jira/browse/PDFBOX-5290
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.20, 2.0.24
> Reporter: Eric R Manzitti
> Priority: Major
> Attachments: newBroke.pdf, newBroke.txt
>
>
> I am getting:
>
> java.lang.ClassCastException: org.apache.pdfbox.cos.COSDictionary cannot be
> cast to org.apache.pdfbox.cos.COSArray
> when executing the following code:
>
> import java.io.File;
> import java.io.IOException;
> import java.nio.charset.StandardCharsets;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.text.PDFTextStripper;
>
> public byte[] extractTextPDFBox(String fileNamePath) throws PQException {
>     PDFLibraryProperties pdfLibraryProperties = PDFLibraryProperties.getInstance();
>     String regex = pdfLibraryProperties.getAsString(
>             PDFLibraryConstants.REGEX_TO_REMOVE_FROM_EXTRACTED_TEXT);
>     // try-with-resources closes the document even if extraction throws
>     try (PDDocument pdfDoc = PDDocument.load(new File(fileNamePath))) {
>         PDFTextStripper pdfStripper = new PDFTextStripper();
>         String textFromPDF = pdfStripper.getText(pdfDoc);
>         // Apply the regex to the String itself, then encode once as UTF-8
>         String textStr = textFromPDF.replaceAll(regex, PDFLibraryConstants.BLANK_SPACE);
>         return textStr.getBytes(StandardCharsets.UTF_8);
>     } catch (IOException e) {
>         pqUtilityLogger.logError(e.getMessage());
>         throw new PQException(e.getMessage());
>     }
> }
>
> It dies on String textFromPDF = pdfStripper.getText(pdfDoc);
>
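
Regarding the stack trace requested in the comment: a ClassCastException is a RuntimeException, so a catch block for IOException never sees it, and logging only getMessage() would drop the frames anyway. A minimal sketch of capturing the full trace as text follows; the class name StackTraceDump and the method stackTraceOf are hypothetical, and the cast in main is only a stand-in for the failing getText call.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper for collecting a complete stack trace as a String.
public class StackTraceDump {

    // Renders the throwable, its frames and any causes into one String.
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        t.printStackTrace(writer);
        writer.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            // Stand-in for the failing call: an invalid cast at runtime.
            Object value = new HashMap<String, String>();
            List<?> list = (List<?>) value;
            System.out.println(list);
        } catch (RuntimeException e) {
            // Catch RuntimeException (or Exception) so the trace is not lost.
            System.err.println(stackTraceOf(e));
        }
    }
}
```

The resulting text can be pasted into the issue as is; the frames inside org.apache.pdfbox show which parser or stripper code performed the cast.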