[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277080#comment-14277080 ]
Konstantin Gribov commented on TIKA-1511:
-----------------------------------------

[~talli...@mitre.org], working with tables as separate files looks good. Maybe Excel parsing should also be migrated to the same behavior. Consistent behavior is good from the point of view of the principle of least surprise.

Treating BLOBs as embedded documents gives library users the ability to configure their detection, parsing and extraction via {{ParseContext}}, AFAIK. E.g. a Tika user can choose to only detect the MIME type (and maybe extract metadata) of BLOBs when parsing a database table.

But this leads to one issue: a user may want different behavior at different levels of embedded documents, e.g. parse the first level (the table) and only extract metadata for the second (a BLOB in some field). For me this will be a real use case in some projects. In such a case the user may want to pass some {{ParseContext}}, or a factory for one, to the {{EmbeddedDocumentExtractor}}. Such an improvement can be done later, though.

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide
> range of applications. Opening the ticket to track it.
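Purely to illustrate the per-level behavior described in the comment (parse tables, only extract metadata from BLOBs), here is a minimal, self-contained Java sketch. It deliberately does not use Tika: `DepthAwareExtraction`, `Node`, `Action`, `walk` and the depth policy are hypothetical names, and the tree is a toy stand-in for a database file, its tables and the BLOB fields in their rows. The policy is passed in as an `IntFunction`, playing the role of the "ParseContext, or a factory for one" that a user would hand to the extractor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical model only, NOT Tika's API: illustrates choosing an action per embedding depth.
final class DepthAwareExtraction {

    /** What to do with an embedded document found at a given depth (hypothetical). */
    enum Action { PARSE, METADATA_ONLY, DETECT_ONLY }

    /** Toy document: a name plus its embedded children (e.g. tables, or BLOBs in rows). */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    /** Walks the container's embedded documents, asking the policy what to do at each depth. */
    static List<String> walk(Node container, IntFunction<Action> policy) {
        List<String> log = new ArrayList<>();
        for (Node child : container.children) {
            visit(child, 1, policy, log);
        }
        return log;
    }

    private static void visit(Node node, int depth, IntFunction<Action> policy, List<String> log) {
        Action action = policy.apply(depth);
        log.add(depth + " " + node.name + " -> " + action);
        // Only fully parsed documents are descended into; for METADATA_ONLY and
        // DETECT_ONLY we just record the action and stop at this level.
        if (action == Action.PARSE) {
            for (Node child : node.children) {
                visit(child, depth + 1, policy, log);
            }
        }
    }

    /** Toy database: two tables (depth 1), each with BLOB fields (depth 2). */
    static Node sampleDatabase() {
        return new Node("app.db",
                new Node("users", new Node("users.avatar#1")),
                new Node("messages",
                        new Node("messages.attachment#1"),
                        new Node("messages.attachment#2")));
    }

    public static void main(String[] args) {
        // Parse tables fully, but only extract metadata for BLOBs inside them.
        IntFunction<Action> policy = depth -> depth <= 1 ? Action.PARSE : Action.METADATA_ONLY;
        walk(sampleDatabase(), policy).forEach(System.out::println);
    }
}
```

Swapping in a different `IntFunction` (for example one that returns `DETECT_ONLY` everywhere) changes the behavior per level without touching the walker, which is the kind of configurability the comment asks for.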