Hi Tim,

> On Nov 20, 2020, at 11:21 AM, Tim Allison wrote:
>
> Y. That should do it. I don't think we're currently documenting this. It
> looks like POI and PDFBox also require jce unlimited to build.
>
> Hmmm... Should we assumeTrue that jce is installed and then skip that unit
> test if not, or do we want to require it to build Tika?
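One way to act on the assumeTrue option is to detect the installed policy at runtime. The sketch below is not Tika or POI code: the class JceCheck and its methods are hypothetical names, and treating any AES key limit above 128 bits as "unlimited" is an assumption, not the exact check POI performs. It relies only on the standard javax.crypto.Cipher.getMaxAllowedKeyLength, which returns Integer.MAX_VALUE when no export restriction applies and 128 for AES under the default limited policy.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

public class JceCheck {

    /** AES key-size cap (bits) under the default "limited" JCE jurisdiction policy. */
    static final int LIMITED_AES_KEY_BITS = 128;

    /**
     * Maximum AES key length (in bits) permitted by the installed JCE policy.
     * Integer.MAX_VALUE means no restriction is in place.
     */
    static int maxAesKeyLength() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support AES,
            // so reaching this branch indicates a broken JRE.
            throw new IllegalStateException("AES is not available", e);
        }
    }

    /** True when keys longer than 128 bits (e.g. AES-256) are allowed. */
    static boolean isUnlimitedStrengthAvailable() {
        return maxAesKeyLength() > LIMITED_AES_KEY_BITS;
    }

    public static void main(String[] args) {
        System.out.println("Max allowed AES key length: " + maxAesKeyLength() + " bits");
        if (isUnlimitedStrengthAvailable()) {
            System.out.println("Unlimited strength policy is active.");
        } else {
            System.out.println("Export restrictions in place; install the "
                    + "JCE Unlimited Strength Jurisdiction Policy files.");
        }
    }
}
```

In a JUnit 4 test, the helper could guard the encrypted-document test, e.g. Assume.assumeTrue(JceCheck.isUnlimitedStrengthAvailable()) as the first statement of testEncrypted, so the test is reported as skipped rather than erroring with EncryptedDocumentException. Java 8u161 and later, and Java 9+, enable the unlimited policy by default, so on those JDKs the check should pass without copying any policy files.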
I think we should document that building Tika requires the JCE Unlimited Strength Jurisdiction Policy Files to be installed. For Java 8 on my Mac, this worked:

1. Go to https://www.oracle.com/java/technologies/javase-jce8-downloads.html and click the download link.
2. Sign in with your Oracle account, accept the license, and wait for the (small) file to download.
3. Expand the downloaded zip, then from a terminal run:

    sudo cp ~/Downloads/UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security/
    sudo cp ~/Downloads/UnlimitedJCEPolicyJDK8/local_policy.jar $JAVA_HOME/jre/lib/security/

— Ken

> On Fri, Nov 20, 2020 at 1:43 PM Ken Krugler wrote:
>
>> Hi all,
>>
>> I was trying to build the 1.25-rc1 branch, and ran into this same issue
>> while building the Tika parsers:
>>
>>> Tests run: 87, Failures: 0, Errors: 1, Skipped: 3, Time elapsed: 6.816 s <<< FAILURE! - in org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest
>>> org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted  Time elapsed: 0.286 s  <<< ERROR!
>>> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@c0de6c9
>>>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1120)
>>> Caused by: org.apache.poi.EncryptedDocumentException: Export Restrictions in place - please install JCE Unlimited Strength Jurisdiction Policy files
>>>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1120)
>>
>> I assume I need to follow instructions at, say,
>> https://dzone.com/articles/install-java-cryptography-extension-jce-unlimited
>> to get the appropriate files installed, yes?
>>
>> And is this documented for Tika somewhere?
>>
>> Thanks,
>>
>> — Ken
>>
>>
>>> On Jul 31, 2019, at 9:45 AM, Tim Allison wrote:
>>>
>>> Dave,
>>> So that I can fix stuff in the future...can you share with me how to
>>> fix this issue on Hudson?
>>>
>>> org.apache.tika.parser.microsoft.OfficeParser@6f1fd7c1
>>>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1234)
>>> Caused by: org.apache.poi.EncryptedDocumentException: Export Restrictions in place - please install JCE Unlimited Strength Jurisdiction Policy files
>>>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest.testEncrypted(OOXMLParserTest.java:1234)
>>>
>>> Many thanks!
>>>
>>> Cheers,
>>>
>>> Tim
>>>
>>> On Wed, Jul 31, 2019 at 12:43 PM Hudson (JIRA) wrote:
>>>>
>>>> [ https://issues.apache.org/jira/browse/TIKA-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897315#comment-16897315 ]
>>>>
>>>> Hudson commented on TIKA-2917:
>>>> ------------------------------
>>>>
>>>> UNSTABLE: Integrated in Jenkins build tika-2.x-windows #446 (See [https://builds.apache.org/job/tika-2.x-windows/446/])
>>>> TIKA-2917 -- extract metadata that accompanies inline images (tallison: rev 86325105ab206dca88d076dc865fcb17404c4531)
>>>> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
>>>> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
>>>> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/JempboxExtractor.java
>>>> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
>>>> * (add) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDMetadataExtractor.java
>>>>
>>>>> Extract metadata from inline images in PDFs
>>>>> -------------------------------------------
>>>>>
>>>>>                 Key: TIKA-2917
>>>>>                 URL: https://issues.apache.org/jira/browse/TIKA-2917
>>>>>             Project: Tika
>>>>>          Issue Type: Improvement
>>>>>            Reporter: Tim Allison
>>>>>            Assignee: Tim Allison
>>>>>            Priority: Minor
>>>>>
>>>>> Inline images may have XMP associated with them. We are not currently extracting this metadata.
>>
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
