[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285568#comment-14285568 ]
Tim Allison commented on TIKA-1511:
-----------------------------------

{quote}
A) I think it will work, as the patch works now. But I think an inputStream that can not be read is a bit strange.
{quote}
Agreed. The new proposal is to make the InputStream readable, but the regular use case of an AutoDetectParser sent in via ParseContext won't bother to read the InputStream, rather, it will "read" the table object and use the user-supplied ContentHandler.

{quote}
B) Could it be better to send a xHTML inputStream with markup to client instead of simple UTF-8 encoded CSV?
{quote}
We could, but there are other ways of getting that...RecursiveParserWrapper or custom recursive embedded parser handler or even just sending in the plain AutoDetectParser as the EmbeddedDocumentExtractor/Parser in ParseContext. The idea behind this is to support a ParserContainerExtractor that would normally pull just the bytes from embedded documents...because there are no bytes for a table object (i.e. it never exists as an actual standalone file), I propose a csv proxy.

{quote}
C) I agree, but it will work only if he adds the correct parser (eg TableParser or CompositeParser) to ParseContext, right?
{quote}
The user will have to add an AutoDetectParser to the ParseContext, and we will need to add
org.apache.tika.parser.jdbc.SQLite3Parser
org.apache.tika.parser.jdbc.JDBCTableParser
to the parser services file.

I have a draft of this proposal working. The current downside is that if the client resets and rereads the InputStream, the blobs/clobs are processed twice via the EmbeddedDocumentExtractor.

Any problems with the above? Recommendations for an alternate design?
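To make the "csv proxy" idea concrete, here is a minimal stdlib-only sketch of serializing table rows to a UTF-8 CSV InputStream. It is not the code in the attached patches: the class name CsvProxySketch, its methods, and the quoting rules (RFC 4180 style) are all assumptions made for illustration. Because the sketch materializes the bytes once, a reset and reread would replay the buffer rather than walk the rows again, but it does not model blob/clob handling or the EmbeddedDocumentExtractor.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the TIKA-1511 patch.
public class CsvProxySketch {

    // Quote a cell when it contains a comma, quote, or line break
    // (RFC 4180 style quoting; an assumption, not Tika's behavior).
    static String escape(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        String escaped = cell.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Serialize rows (first row may be the column names) to CSV text.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(escape(row.get(i)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Expose the table as a readable stream of UTF-8 bytes, standing in
    // for the bytes of a standalone file that never actually existed.
    static InputStream asCsvStream(List<List<String>> rows) {
        return new ByteArrayInputStream(
                toCsv(rows).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("id", "name"),
                Arrays.asList("1", "Smith, J."));
        try (InputStream in = asCsvStream(rows)) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.write(b);
            }
            System.out.flush();
        }
    }
}
```

A client that only wants bytes could read this stream like any other embedded document, while a parser that understands the table object could ignore it.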
> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide
> range of applications. Opening the ticket to track it.