[ https://issues.apache.org/jira/browse/TIKA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308429#comment-16308429 ]
Nick Burch commented on TIKA-2462:
----------------------------------

While we wait for the re-license to go through, I've had a look at writing a parser.

Outputting as CSV is very easy, as they've got a great class to do all the work. SAX events for an HTML table will be trickier, as the logic to format a raw value in a given column into "a string of how it looks in SAS" is currently in a private method. I've raised [#24|https://github.com/epam/parso/issues/24] to see if that can be refactored out, so we don't need to duplicate lots of their code.

The Tika questions on column metadata, test files etc. still remain for us, though!

> Add a parser for sas7bdat
> -------------------------
>
>                 Key: TIKA-2462
>                 URL: https://issues.apache.org/jira/browse/TIKA-2462
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>
> EPAM recently agreed to migrate to Apache 2.0 so that we can incorporate
> parso into Tika for sas7bdat files: https://github.com/epam/parso/issues/19
> !!!

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
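To make the "SAX events of an HTML table" route from the comment above more concrete, here is a minimal sketch that uses only the JDK's SAX/TrAX classes. It assumes the parser already has the column names and each row as an `Object[]` of raw values. `SasHtmlTableSketch`, `writeTable` and `formatValue` are hypothetical names, not Tika's or parso's API. In particular, `formatValue` is a placeholder that just calls `toString()`. It stands in for the SAS-style formatting logic that is currently private in parso, which is the subject of issue #24.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: emits an XHTML table as SAX events; not Tika's or parso's API.
public class SasHtmlTableSketch {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";
    private static final AttributesImpl NO_ATTRS = new AttributesImpl();

    // Placeholder only: parso's real (private) formatting honours each column's SAS format.
    static String formatValue(Object raw) {
        return raw == null ? "" : raw.toString();
    }

    private static void start(ContentHandler h, String name) throws SAXException {
        h.startElement(XHTML, name, name, NO_ATTRS);
    }

    private static void end(ContentHandler h, String name) throws SAXException {
        h.endElement(XHTML, name, name);
    }

    private static void text(ContentHandler h, String s) throws SAXException {
        h.characters(s.toCharArray(), 0, s.length());
    }

    // Writes <table><thead>header row</thead><tbody>data rows</tbody></table> as SAX events.
    static void writeTable(ContentHandler h, List<String> columns, List<Object[]> rows)
            throws SAXException {
        h.startPrefixMapping("", XHTML); // declare XHTML as the default namespace
        start(h, "table");
        start(h, "thead");
        start(h, "tr");
        for (String column : columns) {
            start(h, "th");
            text(h, column);
            end(h, "th");
        }
        end(h, "tr");
        end(h, "thead");
        start(h, "tbody");
        for (Object[] row : rows) {
            start(h, "tr");
            for (Object value : row) {
                start(h, "td");
                text(h, formatValue(value));
                end(h, "td");
            }
            end(h, "tr");
        }
        end(h, "tbody");
        end(h, "table");
        h.endPrefixMapping("");
    }

    // Serialises the SAX events to a string with the JDK identity transformer, for inspection.
    public static String render(List<String> columns, List<Object[]> rows) throws Exception {
        SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler th = tf.newTransformerHandler();
        th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));
        th.startDocument();
        writeTable(th, columns, rows);
        th.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(
                List.of("NAME", "AGE"),
                List.of(new Object[] {"Alice", 42.0}, new Object[] {"Bob", 7L})));
    }
}
```

In a real Tika parser the events would go to the `ContentHandler` passed to `parse()`, not to a string serialiser. Swapping `formatValue` for parso's logic, once #24 makes it reachable, is the part that avoids duplicating their code.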