[ 
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285568#comment-14285568
 ] 

Tim Allison commented on TIKA-1511:
-----------------------------------

{quote}
A) I think it will work, as the patch works now. But I think an inputStream 
that can not be read is a bit strange.
{quote}
Agreed.  The new proposal is to make the InputStream readable.  In the regular 
use case, though, an AutoDetectParser sent in via ParseContext won't bother to 
read the InputStream; instead, it will "read" the table object and use the 
user-supplied ContentHandler.

{quote}
B) Could it be better to send a xHTML inputStream with markup to client instead 
of simple UTF-8 encoded CSV?
{quote}
We could, but there are other ways of getting that: RecursiveParserWrapper, a 
custom recursive embedded parser handler, or even just sending in the plain 
AutoDetectParser as the EmbeddedDocumentExtractor/Parser in ParseContext.  The 
idea behind this is to support a ParserContainerExtractor, which would normally 
pull just the bytes from embedded documents.  Because there are no bytes for a 
table object (i.e. it never exists as an actual standalone file), I propose a 
CSV proxy.
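
To make the CSV proxy idea concrete, here is a minimal sketch using only the 
JDK.  It is not taken from the patch: the class and method names 
(CsvProxySketch, toCsv, toCsvStream) are hypothetical, and the RFC 4180 style 
quoting is an assumption about what "simple UTF-8 encoded CSV" would look like.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the TIKA-1511 patch.
final class CsvProxySketch {

    // Quote a cell only if it contains a comma, a double quote, or a line break.
    static String escapeCell(String cell) {
        if (cell == null) {
            return ""; // assumption: SQL NULL becomes an empty cell
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        String escaped = cell.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Render the header and rows of one table as CSV text.
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header);
        for (List<String> row : rows) {
            appendRow(sb, row);
        }
        return sb.toString();
    }

    // Expose the CSV as UTF-8 bytes, standing in for the file that never existed.
    static InputStream toCsvStream(List<String> header, List<List<String>> rows) {
        return new ByteArrayInputStream(toCsv(header, rows).getBytes(StandardCharsets.UTF_8));
    }

    private static void appendRow(StringBuilder sb, List<String> cells) {
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escapeCell(cells.get(i)));
        }
        sb.append("\r\n");
    }

    public static void main(String[] args) throws IOException {
        InputStream in = toCsvStream(
                Arrays.asList("id", "note"),
                Arrays.asList(
                        Arrays.asList("1", "plain text"),
                        Arrays.asList("2", "has, a comma")));
        System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

A ParserContainerExtractor-style client could then treat these bytes like those 
of any other embedded document.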

{quote}
C) I agree, but it will work only if he adds the correct parser (eg TableParser 
or CompositeParser) to ParseContext, right?
{quote}
The user will have to add an AutoDetectParser to the ParseContext, and we will 
need to add
    org.apache.tika.parser.jdbc.SQLite3Parser
    org.apache.tika.parser.jdbc.JDBCTableParser
to the parser services file (META-INF/services/org.apache.tika.parser.Parser).

I have a draft of this proposal working.  The current downside is that if the 
client resets and rereads the InputStream, the blobs/clobs are processed twice 
via the EmbeddedDocumentExtractor.  
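
To illustrate the reset/reread issue, here is a rough JDK-only sketch of one 
possible guard.  This is not how the draft works: OnceOnlyStream and its 
embeddedHandler callback are hypothetical stand-ins for the stream handed to 
the client and for the blob/clob handoff to the EmbeddedDocumentExtractor.  The 
assumption is that the handoff is triggered by the first read.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: runs the embedded-document handoff on the first read only,
// so a client that resets and rereads the stream does not trigger it again.
final class OnceOnlyStream extends FilterInputStream {

    private final Runnable embeddedHandler; // stand-in for blob/clob processing
    private boolean handled = false;

    OnceOnlyStream(InputStream in, Runnable embeddedHandler) {
        super(in);
        this.embeddedHandler = embeddedHandler;
    }

    private void handleOnce() {
        if (!handled) {
            handled = true;
            embeddedHandler.run();
        }
    }

    @Override
    public int read() throws IOException {
        handleOnce();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        handleOnce();
        return super.read(b, off, len);
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger calls = new AtomicInteger();
        byte[] csv = "id,note\r\n1,x\r\n".getBytes(StandardCharsets.UTF_8);
        InputStream in = new OnceOnlyStream(new ByteArrayInputStream(csv), calls::incrementAndGet);

        in.readAllBytes(); // first pass triggers the handler
        in.reset();        // ByteArrayInputStream rewinds to the start
        in.readAllBytes(); // second pass does not trigger it again

        System.out.println("handler calls: " + calls.get());
    }
}
```

The trade-off is that a client that really wants the embedded documents 
reprocessed on a second pass would need another way to ask for that.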

Any problems with the above?  Recommendations for an alternate design?

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide 
> range of applications. Opening the ticket to track it. 



