[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison reopened TIKA-1238:
-------------------------------

Doh. Reopening until we get the mods into POI, and then the updated Tika code after the next POI release.

> Update OutlookExtractor to handle codepage identification more rigorously
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1238
>                 URL: https://issues.apache.org/jira/browse/TIKA-1238
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 1.10
>
>
> Since OutlookExtractor's codepage detection code was written, POI's HSMF has
> added more robust capabilities for identifying codepages in Outlook .msg
> files. As a first step toward integrating those improvements, I'll copy
> some of POI's code into OutlookExtractor. As a second step, I'll expose
> more of HSMF's capabilities within POI and then factor out the duplicate
> code in Tika.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)