[ 
https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734531#comment-14734531
 ] 

Nick Burch commented on TIKA-1728:
----------------------------------

Detection of the v5 file is handled by the OLE2 container-aware detector. We 
can't do it with magic, as there is no predictable place in the file to look 
for some unique bytes.
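
To illustrate the point about magic, here is a minimal sketch, assuming the 
standard 8-byte OLE2 compound-file signature. A check on the leading bytes can 
tell that a file is an OLE2 container, but nothing more, which is consistent 
with the generic {{application/x-tika-msoffice}} result in the report. The 
class and method names are hypothetical; this is not Tika's detector code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not part of Tika).
final class Ole2MagicCheck {
    // The 8-byte signature that opens every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given leading bytes start with the OLE2 signature.
    // This says nothing about which application wrote the container.
    static boolean isOle2Container(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int off = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Read up to 8 bytes; stop early if the file is shorter.
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) {
                    break;
                }
                off += n;
            }
        }
        boolean ole2 = isOle2Container(Arrays.copyOf(header, off));
        System.out.println(args[0] + (ole2 ? ": OLE2 container" : ": not OLE2"));
    }
}
```

Telling an HWP 5.0 file apart from any other OLE2 file means opening the 
container and looking at its entries, which is the job of the container-aware 
detector rather than a magic rule.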

I think we still need to keep one of the formats as {{application/x-hwp}}, as 
that's what most other libraries/programs use. We just need to pick which one 
to make the default.

If you're able to put some time into building a parser with java-hwp, that'd be 
great! It's probably best to track that in a separate JIRA, though.

> Detection is not working properly for detecting HWP 5.0 file
> ------------------------------------------------------------
>
>                 Key: TIKA-1728
>                 URL: https://issues.apache.org/jira/browse/TIKA-1728
>             Project: Tika
>          Issue Type: Bug
>         Environment: OS: windows 7 and centos 6
> Java: 1.7
> Tika jar: tika-app-1.10.jar
> File: HWP 5.0
>            Reporter: mungeol heo
>         Attachments: HWP-document-file-formats-3.0-Korean.pdf, 
> HWP-document-file-formats-5.0-Korean.pdf, error-message.png, test_3.0.hwp, 
> test_5.0.hwp
>
>
> The HWP file format has two versions, HWP 3.0 and HWP 5.0.
> 'tika-app-1.10.jar' detects HWP 3.0 files correctly,
> but not HWP 5.0 files.
> The commands used and the results returned are listed below.
> > java -jar tika-app-1.10.jar --detect test_3.0.hwp
> > application/x-hwp
> > java -jar tika-app-1.10.jar --detect test_5.0.hwp
> > application/x-tika-msoffice



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
