Re: [jira] [Commented] (TIKA-2917) Extract metadata from inline images in PDFs

Tim Allison Fri, 20 Nov 2020 11:21:19 -0800

Yes, that should do it.  I don't think we currently document this.  It
looks like POI and PDFBox also require the JCE unlimited strength policy
files to build.


Hmmm... Should we use assumeTrue to check that JCE unlimited strength is
installed and skip that unit test if it isn't, or do we want to require it
to build Tika?
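
For the skip option, here is a minimal sketch of the check, not what Tika
actually does.  The class and method names (JcePolicyCheck,
isUnlimitedStrength, isUnlimitedStrengthAvailable) are made up for
illustration, and the "more than 128 bits of AES" threshold is an
assumption based on the limit the restricted policy files impose.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical helper, not part of Tika or POI.
public class JcePolicyCheck {

    // Assumption: the restricted policy caps AES at 128 bits, so any
    // larger limit is treated as "unlimited strength is available".
    public static boolean isUnlimitedStrength(int maxAesKeyBits) {
        return maxAesKeyBits > 128;
    }

    // Asks the running JVM for its maximum allowed AES key length.
    public static boolean isUnlimitedStrengthAvailable() {
        try {
            return isUnlimitedStrength(Cipher.getMaxAllowedKeyLength("AES"));
        } catch (NoSuchAlgorithmException e) {
            // No AES support at all: treat as unavailable so the test skips.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("JCE unlimited strength available: "
                + isUnlimitedStrengthAvailable());
    }
}
```

With JUnit 4, testEncrypted could then start with
org.junit.Assume.assumeTrue(JcePolicyCheck.isUnlimitedStrengthAvailable()),
so the test is reported as skipped rather than as an error when the policy
files are missing.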

On Fri, Nov 20, 2020 at 1:43 PM Ken Krugler <[email protected]>
wrote:

> Hi all,
>
> I was trying to build the 1.25-rc1 branch, and ran into this same issue
> while building the Tika parsers:
>
> > Tests run: 87, Failures: 0, Errors: 1, Skipped: 3, Time elapsed: 6.816 s <<< FAILURE! - in org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest
> > org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted Time elapsed: 0.286 s  <<< ERROR!
> > org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@c0de6c9
> >       at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1120)
> > Caused by: org.apache.poi.EncryptedDocumentException: Export Restrictions in place - please install JCE Unlimited Strength Jurisdiction Policy files
> >       at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1120)
>
> I assume I need to follow instructions at say
> https://dzone.com/articles/install-java-cryptography-extension-jce-unlimited
> to get the appropriate files installed, yes?
>
> And is this documented for Tika somewhere?
>
> Thanks,
>
> — Ken
>
>
> > On Jul 31, 2019, at 9:45 AM, Tim Allison <[email protected]> wrote:
> >
> > Dave,
> >  So that I can fix stuff in the future...can you share with me how to
> > fix this issue on Hudson?
> >
> > org.apache.tika.parser.microsoft.OfficeParser@6f1fd7c1
> > at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1234)
> > Caused by: org.apache.poi.EncryptedDocumentException: Export Restrictions in place - please install JCE Unlimited Strength Jurisdiction Policy files
> > at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1234)
> >
> > Many thanks!
> >
> >        Cheers,
> >
> >              Tim
> >
> > On Wed, Jul 31, 2019 at 12:43 PM Hudson (JIRA) <[email protected]> wrote:
> >>
> >>
> >>    [
> https://issues.apache.org/jira/browse/TIKA-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897315#comment-16897315
> ]
> >>
> >> Hudson commented on TIKA-2917:
> >> ------------------------------
> >>
> >> UNSTABLE: Integrated in Jenkins build tika-2.x-windows #446 (See [
> https://builds.apache.org/job/tika-2.x-windows/446/])
> >> TIKA-2917 -- extract metadata that accompanies inline images (tallison:
> rev 86325105ab206dca88d076dc865fcb17404c4531)
> >> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
> >> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
> >> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/JempboxExtractor.java
> >> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
> >> * (add)
> tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDMetadataExtractor.java
> >>
> >>
> >>> Extract metadata from inline images in PDFs
> >>> -------------------------------------------
> >>>
> >>>                Key: TIKA-2917
> >>>                URL: https://issues.apache.org/jira/browse/TIKA-2917
> >>>            Project: Tika
> >>>         Issue Type: Improvement
> >>>           Reporter: Tim Allison
> >>>           Assignee: Tim Allison
> >>>           Priority: Minor
> >>>
> >>> Inline images may have XMP associated with them.  We are not currently
> extracting this metadata.
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v7.6.14#76016)
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
