[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318232#comment-14318232 ]
Tim Allison commented on TIKA-1511:
-----------------------------------

Bottom line: it will be simpler to treat the full db with all tables as one big file. We can still treat clobs and blobs as embedded documents.

Details:

When I tried to cut out the {{JDBCInputStream}} and just send in a zero byte {{InputStream}}, regular parsing worked properly. However, if a user tries to use a {{ParserContainerExtractor}}, that fails to reach the BLOBs because of this:

{code}
MediaType type = detector.detect(tis, metadata);

if (extractor == null) {
    // Let the handler process the embedded resource
    handler.handle(filename, type, tis);
} else {
    // Use a temporary file to process the stream twice
    File file = tis.getFile();

    // Let the handler process the embedded resource
    InputStream input = TikaInputStream.get(file);
    try {
        handler.handle(filename, type, input);
    } finally {
        input.close();
    }

    // Recurse
    extractor.extract(tis, extractor, handler);
}
{code}

When the extractor is called below the {{//Recurse}} comment, it only sees the zero-byte {{TikaInputStream}}. It does not see the {{type}} or the {{metadata}}. So, in the case of {{AutoDetectParser}}, it only sees a zero byte {{InputStream}} and therefore detects it as {{application/octet-stream}}.

In short, there is no current way to pass the detected type through to the extractor. We could, of course, add a parameter for {{type}} or {{metadata}} to the ParserContainerExtractor's {{extract}} signature...

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, TIKA-1511v3.patch, testSQLLite3b.db, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide range of applications. Opening the ticket to track it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)