[ 
https://issues.apache.org/jira/browse/TIKA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726598#comment-14726598
 ] 

mungeol heo commented on TIKA-330:
----------------------------------

HWP files now come in two formats: HWP 3.0 and HWP 5.0.
A signature string starting with "HWP Document File V" can only detect HWP 3.0.
It should be changed to "HWP Document File" so that both versions of the HWP 
file format are detected.
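
To illustrate the proposed change, here is a minimal sketch of a prefix check 
in Java. It is not Tika's actual detector code: the class name, constants, and 
helper method are hypothetical, and it assumes the signature sits at offset 0 
of the file, as the issue description suggests.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: compares leading bytes to a signature.
class HwpSignatureCheck {
    // Pattern currently used according to the issue description.
    static final byte[] OLD_SIGNATURE =
            "HWP Document File V".getBytes(StandardCharsets.US_ASCII);
    // Shorter pattern proposed in this comment.
    static final byte[] NEW_SIGNATURE =
            "HWP Document File".getBytes(StandardCharsets.US_ASCII);

    // Returns true if data begins with every byte of prefix, in order.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the new signature is a prefix of the old one, anything the old pattern 
matched still matches; the shorter pattern additionally accepts headers that 
continue with something other than " V".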

> Better HWP (Hangul Word Processor) detection pattern
> ----------------------------------------------------
>
>                 Key: TIKA-330
>                 URL: https://issues.apache.org/jira/browse/TIKA-330
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.6
>
>
> The current magic byte pattern we have for the HWP (Hangul Word Processor, 
> application/x-hwp) file format also matches the test-outlook.msg test file we 
> have. I looked for a better detection pattern and found one in 
> OpenOffice.org.
> The hwpfilter/source/hwpfile.cpp file suggests that all HWP files start with 
> the signature string "HWP Document File V", so I'll change the detection 
> pattern accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
