[ 
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277080#comment-14277080
 ] 

Konstantin Gribov commented on TIKA-1511:
-----------------------------------------

[~talli...@mitre.org], treating tables as separate files looks good. Maybe 
Excel parsing should also migrate to the same behavior. Consistent behavior is 
good from the point of view of the principle of least surprise.

Treating BLOBs as embedded documents gives the library user the ability to 
configure their detection, parsing and extraction via {{ParseContext}}, AFAIK. 
E.g., a Tika user can just detect the MIME type (and, maybe, metadata) when 
parsing a database table.

But this leads to one issue: the user may want different behavior for different 
levels of embedded documents, e.g. parse the first level (table) and only 
extract metadata for the second (a BLOB in some field). For me this will be a 
real case in some projects. In such a case the user may want to pass some 
{{ParseContext}}, or a factory for it, to {{EmbeddedDocumentExtractor}}. Such an 
improvement can be done later.
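
To make the per-level idea concrete, here is a minimal sketch in plain Java. 
It is not Tika API: {{DepthAwarePolicy}}, {{Action}}, {{at}} and {{actionFor}} 
are hypothetical names invented for illustration, and the sketch only models 
the lookup "embedding depth -> what to do". How such a policy would actually 
be handed to an {{EmbeddedDocumentExtractor}} is left open.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not part of Tika: maps the nesting depth of an
 * embedded document to the handling the user wants at that depth.
 */
class DepthAwarePolicy {

    /** Possible ways to handle an embedded document (illustrative only). */
    enum Action { PARSE, METADATA_ONLY, SKIP }

    private final Map<Integer, Action> byDepth = new HashMap<>();
    private final Action fallback;

    /** @param fallback action used for depths with no explicit rule */
    DepthAwarePolicy(Action fallback) {
        this.fallback = fallback;
    }

    /** Registers the action for one depth; returns this for chaining. */
    DepthAwarePolicy at(int depth, Action action) {
        byDepth.put(depth, action);
        return this;
    }

    /** Looks up the action for a depth, using the fallback if none is set. */
    Action actionFor(int depth) {
        return byDepth.getOrDefault(depth, fallback);
    }

    public static void main(String[] args) {
        // Depth 1 = table, depth 2 = BLOB in some field, deeper = skipped.
        DepthAwarePolicy policy = new DepthAwarePolicy(Action.SKIP)
                .at(1, Action.PARSE)
                .at(2, Action.METADATA_ONLY);

        for (int depth = 1; depth <= 3; depth++) {
            System.out.println("depth " + depth + ": " + policy.actionFor(depth));
        }
    }
}
```

An extractor that tracks its current nesting depth could consult such a policy 
before deciding whether to parse an embedded document fully or only collect 
its metadata.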

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide 
> range of applications. Opening the ticket to track it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
