[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737990#comment-14737990 ]
mungeol heo edited comment on TIKA-1731 at 9/10/15 1:52 AM:
------------------------------------------------------------

{quote}did hwp ever go the ooxml route after its OLE phase{quote}
After a little search, I think it did.

{quote}does it diverge from standard ooxml at all{quote}
It supports Microsoft OOXML (Office Open XML). You can load an OOXML document, or save a document in OOXML format, from the HWP editor. (I am not sure whether this information helps.) For instance, loading an ms-doc file or storing as an ms-doc file.

{quote}can Tika+POI as they are handle it{quote}
I think so? The author of java-hwp says he used Apache POI's POIFS file system to handle the compound file of HWP 5.0.


was (Author: mungeol):
{quote}did hwp ever go the ooxml route after its OLE phase{quote}
After a little search, I think it did.

{quote}does it diverge from standard ooxml at all{quote}
It supports Microsoft OOXML (Office Open XML). You can load an OOXML document, or save a document in OOXML format, from the HWP editor. (I am not sure whether this information helps.) For instance, loading an ms-doc file or storing as an ms-doc file.

{quote}can Tika+POI as they are handle it{quote}
I think so(?), since the author of java-hwp says he used Apache POI's POIFS file system to handle the compound file of HWP 5.0.

> Try to integrate java-hwp into Tika
> -----------------------------------
>
>                 Key: TIKA-1731
>                 URL: https://issues.apache.org/jira/browse/TIKA-1731
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Minor
>
> Now that we have detection working for hwp files, it would be great to add a
> parser.
> [java-hwp|https://github.com/ddoleye/java-hwp] looks like a promising
> candidate. We'd need to ask ddoleye about a potential change in license and
> then interest in maintenance + pushing to maven.
> Any other candidates?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)