[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768105#comment-15768105 ]

Tim Allison commented on TIKA-1946:
-----------------------------------

If we look at 978849.wp and 506544.wp, both are apparently the same version as 
our testWordPerfect.wpd:
{{filetype=10, productype=1, minorversion=1}}.  If we revert readWP() to 
read(), the parse finishes without exception, but the content is corrupt -- 
{{รงรค}}.
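To illustrate why reverting may only hide the failure: assuming readWP() behaves like a strict read that throws EOFException at end of stream (an assumption here, not checked against the Tika source), a plain read() instead returns -1, and those values can end up in the decoded text. The class and helpers below (StrictReadDemo, readStrict, readLenient, readAllStrict) are hypothetical and are not Tika's WPInputStream API.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StrictReadDemo {

    // Hypothetical strict read, modelled on the ASSUMPTION that readWP()
    // throws at end of stream instead of returning -1.
    static int readStrict(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            throw new EOFException("unexpected end of stream");
        }
        return b;
    }

    // Lenient: InputStream.read() returns -1 at end of stream, so a
    // truncated record silently yields -1 values instead of an error.
    static int[] readLenient(InputStream in, int count) throws IOException {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = in.read();
        }
        return out;
    }

    // Strict: fails as soon as the data runs out.
    static int[] readAllStrict(InputStream in, int count) throws IOException {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = readStrict(in);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x41, 0x42}; // only two bytes available, four requested
        // Lenient read pads the missing bytes with -1.
        System.out.println(Arrays.toString(readLenient(new ByteArrayInputStream(data), 4)));
        try {
            readAllStrict(new ByteArrayInputStream(data), 4);
        } catch (EOFException e) {
            // Strict read surfaces the truncation as an exception.
            System.out.println("strict read failed: " + e.getMessage());
        }
    }
}
```

The lenient variant returns [65, 66, -1, -1] for the two-byte input, so a parser that maps those -1 values to characters may "finish" with garbage rather than report that the stream ended early.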

If the length stored in the header is meant to be close to, or even equal to, 
the actual file length (testWordPerfect.wpd has a stored file size of 2395, 
but an actual file size of 2044 bytes), then something may already be going 
wrong in the header.  File 506544.wp stores 17825842 as its length, but the 
file is actually only 3117 bytes.  File 978849.wp has a stored length of 50, 
but an actual length of 1389 bytes.
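A minimal sketch of the sanity check this suggests: read the header fields and compare the stored length with the real file size. It ASSUMES the common WordPerfect prefix layout (0xFF 'WPC' magic, product type at offset 8, file type at 9, major/minor version at 10/11) and further ASSUMES a 4-byte little-endian file-size field at offset 20. Those offsets, the class WpHeaderCheck, its helpers, and the 512-byte tolerance are all hypothetical and should be checked against the WordPerfect format documentation and Tika's own prefix reader.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WpHeaderCheck {

    // ASSUMED offsets in the WordPerfect prefix; verify against the format spec.
    static final int OFF_PRODUCT_TYPE = 8;
    static final int OFF_FILE_TYPE = 9;
    static final int OFF_MAJOR_VERSION = 10;
    static final int OFF_MINOR_VERSION = 11;
    static final int OFF_FILE_SIZE = 20; // ASSUMED 4-byte little-endian field
    static final int MIN_PREFIX_LENGTH = 24;

    // Header fields discussed above (file type, product type, versions, stored size).
    static final class Prefix {
        final int productType, fileType, majorVersion, minorVersion;
        final long storedFileSize;

        Prefix(int productType, int fileType, int majorVersion, int minorVersion, long storedFileSize) {
            this.productType = productType;
            this.fileType = fileType;
            this.majorVersion = majorVersion;
            this.minorVersion = minorVersion;
            this.storedFileSize = storedFileSize;
        }
    }

    static boolean hasMagic(byte[] b) {
        return b.length >= 4 && (b[0] & 0xFF) == 0xFF
                && b[1] == 'W' && b[2] == 'P' && b[3] == 'C';
    }

    static Prefix parse(byte[] b) {
        if (b.length < MIN_PREFIX_LENGTH || !hasMagic(b)) {
            throw new IllegalArgumentException("not a WordPerfect prefix");
        }
        ByteBuffer buf = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        return new Prefix(
                b[OFF_PRODUCT_TYPE] & 0xFF,
                b[OFF_FILE_TYPE] & 0xFF,
                b[OFF_MAJOR_VERSION] & 0xFF,
                b[OFF_MINOR_VERSION] & 0xFF,
                buf.getInt(OFF_FILE_SIZE) & 0xFFFFFFFFL); // treat as unsigned
    }

    // Hypothetical heuristic: stored size within `tolerance` bytes of the real size.
    static boolean sizeLooksPlausible(Prefix p, long actualSize, long tolerance) {
        return Math.abs(p.storedFileSize - actualSize) <= tolerance;
    }

    // Builds a synthetic prefix for testing; not a complete WordPerfect file.
    static byte[] buildHeader(int productType, int fileType, int minorVersion, long storedSize) {
        byte[] h = new byte[MIN_PREFIX_LENGTH];
        h[0] = (byte) 0xFF; h[1] = 'W'; h[2] = 'P'; h[3] = 'C';
        h[OFF_PRODUCT_TYPE] = (byte) productType;
        h[OFF_FILE_TYPE] = (byte) fileType;
        h[OFF_MINOR_VERSION] = (byte) minorVersion;
        ByteBuffer.wrap(h).order(ByteOrder.LITTLE_ENDIAN).putInt(OFF_FILE_SIZE, (int) storedSize);
        return h;
    }

    public static void main(String[] args) {
        // {stored, actual} pairs as reported for testWordPerfect.wpd, 506544.wp, 978849.wp
        long[][] samples = {{2395, 2044}, {17825842, 3117}, {50, 1389}};
        for (long[] s : samples) {
            Prefix p = parse(buildHeader(1, 10, 1, s[0]));
            System.out.printf("stored=%d actual=%d plausible=%b%n",
                    p.storedFileSize, s[1], sizeLooksPlausible(p, s[1], 512));
        }
    }
}
```

With the three stored/actual pairs reported above, a 512-byte tolerance accepts testWordPerfect.wpd (off by 351) and flags both 506544.wp and 978849.wp. The tolerance is arbitrary; the point is only that the stored length may be usable as an early signal that the header is already off.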

> Add mime detection and parser for WordPerfect
> ---------------------------------------------
>
>                 Key: TIKA-1946
>                 URL: https://issues.apache.org/jira/browse/TIKA-1946
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>            Reporter: Nick C
>             Fix For: 2.0, 1.15
>
>
> I noticed some code on GitHub for parsing WordPerfect files 
> (https://github.com/Norconex/importer). It also looks like the author 
> [~pascal.essiembre] has contributed to Tika before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
