[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298416#comment-14298416 ]

Nick Burch commented on TIKA-1511:
----------------------------------

A few minor things on Tim's github branch for this - I'm seeing some wildcard
imports being added, and some assertContains calls being replaced with
assertTrue(str.contains) - the latter doesn't give as helpful an exception
message when the assert fails. Does the branch need updating, or are there
spurious changes that've come in?
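To illustrate the point about failure messages: in JUnit, a bare assertTrue(boolean) fails with an AssertionError that has no message, while an assertContains-style helper can report both the missing text and the content it searched. The sketch below uses simplified stand-ins written for illustration (AssertMessages and failureMessage are hypothetical names, not the JUnit or Tika implementations), so it runs without either library.

```java
// Illustrative stand-ins only: NOT the JUnit or Tika source code.
public class AssertMessages {

    /** assertContains-style check: the failure message shows what was missing and where we looked. */
    public static void assertContains(String needle, String haystack) {
        if (!haystack.contains(needle)) {
            throw new AssertionError("'" + needle + "' not found in:\n" + haystack);
        }
    }

    /** Mimics a bare assertTrue(boolean): fails with no message, so no context. */
    public static void assertTrue(boolean condition) {
        if (!condition) {
            throw new AssertionError();
        }
    }

    /** Runs a check and returns its failure message ("null" if it had none), or null if it passed. */
    public static String failureMessage(Runnable check) {
        try {
            check.run();
            return null;
        } catch (AssertionError e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        String content = "<table><tr><td>ID</td><td>NAME</td></tr></table>";
        // The bare boolean assert only tells you that something failed.
        System.out.println(failureMessage(() -> assertTrue(content.contains("PRICE"))));
        // The assertContains-style helper tells you what was missing and shows the content.
        System.out.println(failureMessage(() -> assertContains("PRICE", content)));
    }
}
```

When a parser test fails in CI, the second form usually saves a round trip of re-running the test just to see what the extracted text actually was.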

I've had a quick look at the diff to the branch, but not a full one. My initial
impression is that there was more logic than I'd expected in
JDBCResultSetInputStream and JDBCRowReader, but not necessarily a problematic
amount. I'm still not entirely sure about the idea that, depending on how you
access the embedded stream, you get different behaviour. If you have a Word
document embedded in a PDF, the embedded stream doesn't say "I'll give you Word
if you ask one way, Plain Text if you ask another"; it just says "here's the
content type, you'll need to find a suitable parser or fail trying".

For the specific use case of "something that iterates through a file, dumping
out all embedded resources without parsing them", if we do support it for these
JDBC tables (I'm tempted to say that for that use case we don't return anything
for the table), we could just have a special-case wrapper which parses to HTML
as normal and returns that, rather than messing around with "maybe HTML via
JDBC, maybe magically CSV".
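A minimal sketch of the "always HTML" wrapper idea, assuming the table has already been read into column names and string rows. TableHtmlRenderer and render are hypothetical names, not Tika classes; the point is only that an embedded table gets one fixed rendering, whatever path is used to reach it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an embedded table always comes back as HTML,
// rather than HTML via one access path and CSV via another.
public class TableHtmlRenderer {

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders a header row plus data rows as a single HTML table; null cells become empty. */
    public static String render(List<String> columns, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table><tr>");
        for (String col : columns) {
            sb.append("<th>").append(escape(col)).append("</th>");
        }
        sb.append("</tr>");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell == null ? "" : cell)).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "name");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "Tom & Jerry"),
                Arrays.asList("2", null));
        System.out.println(render(cols, rows));
    }
}
```

Inside Tika the output would normally be emitted as SAX events through the usual XHTML content handler rather than built as a string; the string form just keeps the sketch self-contained.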

Also, it'd be good to have implementations for 2 different JDBC-based formats
if we can. That should help us verify we've got the split between the abstract
JDBC and SQLite-specific parts correct!

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, TIKA-1511v3.patch, 
> testSQLLite3b.db, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide 
> range of applications. Opening the ticket to track it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
