[ 
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280433#comment-14280433
 ] 

Tim Allison commented on TIKA-1511:
-----------------------------------

Hmmmm... This will fail if someone sends in a custom EmbeddedDocumentExtractor 
because there is no way to pass the StatementTablePair to that interface via 
ParseContext. 

Some options:
1) We could go back to treating the db as one big doc, as we do with xls, but I 
think I'd prefer to treat each table as a separate doc.

2) We could get rid of the StatementTablePair hack, extract the text from each 
table into a String, and then pass that into EmbeddedDocumentExtractor as the 
InputStream.  The drawback is that we'd bypass the handler and lose any 
potential <tr>/<td> markup.
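
For what it's worth, a rough sketch of what option 2 might look like, using only 
the JDK. The class and method names (TableTextSketch, tableToText, toStream) are 
made up for illustration and are not in the patch; the rows are assumed to have 
already been read out of the table as Strings, and tab/newline delimiters are 
just one possible choice.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika or the attached patch.
class TableTextSketch {

    // Flatten one table's rows into tab-separated text, one row per line.
    // Any row/cell structure beyond the delimiters is lost at this point.
    static String tableToText(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }

    // Wrap the flattened text as a plain stream. This is what would be
    // handed to EmbeddedDocumentExtractor.parseEmbedded(...) in place of
    // the StatementTablePair (assumption: UTF-8 is acceptable downstream).
    static InputStream toStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "alice"),
                Arrays.asList("2", "bob"));
        System.out.print(tableToText(rows));
    }
}
```

Since the stream carries only delimited text, a custom extractor would work 
unchanged, but it would have no way to recover row or cell boundaries as markup.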

Any ideas on this?

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide 
> range of applications. Opening the ticket to track it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
