[ 
https://issues.apache.org/jira/browse/TIKA-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613123#comment-15613123
 ] 

Frank Refol commented on TIKA-2148:
-----------------------------------

Linking TIKA-1761 because it sounds like the same issue. However, the 
reporter of that issue states that the problem does not occur when the 
document is created with Office 2007, which does not match my experience.

> Tika app is unable to parse a password protected PowerPoint (97-2003) 
> document 
> -------------------------------------------------------------------------------
>
>                 Key: TIKA-2148
>                 URL: https://issues.apache.org/jira/browse/TIKA-2148
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.13
>         Environment: Windows console.
>            Reporter: Frank Refol
>              Labels: Office, PowerPoint
>         Attachments: This is password protected (Created with MS 2003).ppt,
> This is password protected (Created with MS 2007).ppt,
> This is password protected (Created with MS 2010).ppt
>
>
> Using the Tika command-line application to extract text from a 
> password-protected PowerPoint 97-2003 document fails. Here is the basic 
> command that was used:
> {quote}
> java -jar tika-app-1.13.jar -t --password=password "This is password protected (Created with MS 2003).ppt"
> {quote}
> The following exception is thrown on the console:
> {noformat}
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@62204612
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
>       at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
>       at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
> Caused by: org.apache.poi.hslf.exceptions.EncryptedPowerPointFileException: PowerPoint file is encrypted. The correct password needs to be set via Biff8EncryptionKey.setCurrentUserPassword()
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowEncrypted.<init>(HSLFSlideShowEncrypted.java:106)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:284)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:275)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:179)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:182)
>       at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 5 more
> {noformat}
> {noformat}
> Note that this happens with a PPT file that is created using Office 2010, 
> Office 2007, or Office 2003.
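
In the trace above, the encryption failure only shows up as the "Caused by" 
entry underneath a generic TikaException. For anyone triaging similar 
reports, here is a rough sketch of how a caller might tell an 
encrypted-file failure apart from other wrapped runtime errors by walking 
the cause chain. It uses only the JDK; the class names CauseChain and 
EncryptedFileException are hypothetical stand-ins (the latter for POI's 
EncryptedPowerPointFileException) and are not part of Tika or POI.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

public class CauseChain {

    /** Hypothetical stand-in for POI's EncryptedPowerPointFileException. */
    static class EncryptedFileException extends RuntimeException {
        EncryptedFileException(String message) {
            super(message);
        }
    }

    /**
     * Returns the first throwable in the cause chain (starting with top itself)
     * that is an instance of the given type, or an empty Optional if none is.
     */
    static <T extends Throwable> Optional<T> findCause(Throwable top, Class<T> type) {
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = top; t != null && seen.add(t); t = t.getCause()) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Mimics the shape of the reported trace: a generic wrapper around
        // the encryption-specific exception.
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from parser",
                new EncryptedFileException("PowerPoint file is encrypted"));

        String verdict = findCause(wrapped, EncryptedFileException.class)
                .map(e -> "encrypted: " + e.getMessage())
                .orElse("other failure");
        System.out.println(verdict);
    }
}
```

With the real libraries on the classpath, the same helper could presumably 
be called with the POI exception class instead of the stand-in; that has 
not been verified against Tika 1.13.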



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
