[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298416#comment-14298416 ]
Nick Burch commented on TIKA-1511:
----------------------------------

A few minor things on Tim's GitHub branch for this: I'm seeing some wildcard imports being added, and some assertContains being replaced with assertTrue(str.contains). The latter doesn't give as helpful an exception when the assert fails. Does the branch need updating, or are there spurious changes that've come in?

I've had a quick look at the diff to the branch, but not a full one. My initial impression is that there was more logic than I'd expected in JDBCResultSetInputStream and JDBCRowReader, but not necessarily a problematic amount.

I'm still not entirely sure of the idea that depending on how you access the embedded stream, you get different behaviour. If you have a Word document embedded in a PDF, the embedded stream doesn't say "I'll give you Word if you ask one way, Plain Text if you ask another", it just says "here's the content type, you'll need to find a suitable parser or fail trying".

For the specific use case of "something that iterates through a file, dumping out all embedded resources without parsing them", if we do support it for these JDBC tables (I'm tempted to say that for that use case we don't return anything for the table), we could just have a special-case wrapper which parses to HTML as normal and returns that, rather than messing around with "maybe html via jdbc, maybe magically csv".

Also, it'd be good to have implementations for 2 different jdbc-based formats if we can. That should help us verify we've got the split between the abstract jdbc and sqlite parts correct!
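
To illustrate the assertContains point above, here is a minimal sketch of why the two styles fail differently. The helpers below are hypothetical stand-ins written for this example (they are not copied from Tika's test utilities or from JUnit), and the sample strings are invented:

```java
// Hypothetical stand-ins for illustration only; not Tika's or JUnit's actual code.
class AssertContainsDemo {

    // Fails with a message naming both the expected substring and the text searched.
    static void assertContains(String needle, String haystack) {
        if (!haystack.contains(needle)) {
            throw new AssertionError(needle + " not found in:\n" + haystack);
        }
    }

    // Mimics a bare assertTrue(boolean): on failure there is no context at all.
    static void assertTrue(boolean condition) {
        if (!condition) {
            throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        // Invented sample of extracted content.
        String extracted = "<table><tr><td>row1</td></tr></table>";

        try {
            assertTrue(extracted.contains("row2"));
        } catch (AssertionError e) {
            System.out.println("assertTrue message: " + e.getMessage());
        }

        try {
            assertContains("row2", extracted);
        } catch (AssertionError e) {
            System.out.println("assertContains message: " + e.getMessage());
        }
    }
}
```

In this sketch the assertTrue-style failure carries no message, while the assertContains-style failure reports what was being looked for and the text it was missing from, which is usually enough to diagnose a failed build without re-running the test.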

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>   Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, TIKA-1511v3.patch, testSQLLite3b.db, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide range of applications. Opening the ticket to track it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)