[ 
https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318748#comment-14318748
 ] 

Tim Allison commented on TIKA-1548:
-----------------------------------

I'll fix our PDFParser to stop any processing if decryption is unsuccessful.  
This will prevent the logging errors among other things.

I was able to replicate the issue with pure PDFBox 1.8.8 on rhel, but not with 
PDFBox-1.8-branch.

{noformat}
java.lang.AssertionError:
Expected: null
     got: "sun.awt.X11FontManager"

        at org.junit.Assert.assertThat(Assert.java:778)
        at org.junit.Assert.assertThat(Assert.java:736)
        at org.apache.pdfbox.TestPropSet.testSystem(TestPropSet.java:29)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
{noformat}


with this:

{noformat}
    @Test
    public void testSystem() throws Exception  {
        Properties props = System.getProperties();
        assertThat(props.get("sun.font.fontmanager"), nullValue());
        String password = "";
        try {
            File pdfFile = new File("encrypted.pdf");
            String txt = extractText(pdfFile, password);

        } catch (Throwable e) {
        }
        assertThat(props.get("sun.font.fontmanager"), nullValue());
    }

    private String extractText(File pdfFile, String password) throws Exception {
        PDDocument document =
                PDDocument.load(pdfFile);
        PDFTextStripper stripper = null;
            stripper = new PDFTextStripper("UTF-8");
        StringWriter writer = new StringWriter();
        stripper.writeText(document, writer);
        return writer.toString();
    }
{noformat}

> System property added while catching exception on parsing PDF encrypted doc
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-1548
>                 URL: https://issues.apache.org/jira/browse/TIKA-1548
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.7
>         Environment: Mac OS 10.10.2
> java version "1.7.0_60"
>            Reporter: David Pilato
>
> I'm using Tika 1.7. I'm parsing an encrypted PDF document which raise an 
> exception. So far, so good.
> My concern is that after that I have a new System property set 
> {{sun.font.CFontManager}}. 
> Code to reproduce the error:
> {code:java}
> @Test
> public void testSystem() {
>     Properties props = System.getProperties();
>     assertThat(props.get("sun.font.fontmanager"), nullValue());
>     try {
>         tika().parseToString(new 
> URL("https://github.com/elasticsearch/elasticsearch-mapper-attachments/raw/master/src/test/resources/org/elasticsearch/index/mapper/xcontent/encrypted.pdf";));
>     } catch (Throwable e) {
>     }
>     assertThat(props.get("sun.font.fontmanager"), nullValue());
> }
> {code}
> With Tika 1.7:
> {code}
> [2015-02-11 16:43:36,166][INFO ][org.apache.pdfbox.pdfparser.PDFParser] 
> Document is encrypted
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,837][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,838][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,839][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,840][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,841][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> [2015-02-11 16:43:36,842][ERROR][org.apache.pdfbox.filter.FlateFilter] 
> FlateFilter: stop reading corrupt stream due to a DataFormatException
> java.lang.AssertionError: 
> Expected: null
>      but: was "sun.font.CFontManager"
>  <Click to see difference>
>       at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>       at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
>       at 
> org.elasticsearch.plugin.mapper.attachments.test.TikaSystemTest.testSystem(TikaSystemTest.java:41)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>       at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
>       at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
>       at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
>       at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> {code}
> With Tika 1.6:
> {code}
> [2015-02-11 16:38:42,922][INFO ][org.apache.pdfbox.pdfparser.PDFParser] 
> Document is encrypted
> {code}
> Note also that it logs a lot of errors which was not the case in Tika 1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to