[ 
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280345#comment-14280345
 ] 

Tim Allison edited comment on TIKA-1511 at 1/16/15 3:08 PM:
------------------------------------------------------------

First draft of patch attached.  Need to build out tests, obviously, and I'll 
fix spelling of SQLLite in the class names! :)

For the design, I created a public parser that creates a new *DBParser 
instance for each call to parse (like many other parsers) to avoid thread 
safety issues. 
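
To illustrate the per-call delegation described above, here is a minimal 
sketch using only the JDK. It is not the code from the patch: every class and 
method name below is hypothetical. It only shows the pattern of a public entry 
point with no mutable fields that builds a fresh worker for each parse call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not the patch.
public class PerCallDelegateSketch {

    /** Worker holding mutable per-parse state, so it must not be shared across threads. */
    public static final class DbWorker {
        private final List<String> tablesSeen = new ArrayList<>();

        public List<String> run(List<String> tableNames) {
            for (String name : tableNames) {
                tablesSeen.add("table:" + name);
            }
            return tablesSeen;
        }
    }

    /** Public entry point: keeps no mutable fields and creates a new worker per call. */
    public static final class PublicParser {
        public List<String> parse(List<String> tableNames) {
            return new DbWorker().run(tableNames);
        }
    }

    public static void main(String[] args) {
        PublicParser parser = new PublicParser();
        System.out.println(parser.parse(Arrays.asList("users", "orders")));
        System.out.println(parser.parse(Arrays.asList("logs")));
    }
}
```

Because each call gets its own worker, state left over from one parse cannot 
leak into another, even when a single PublicParser is shared between threads.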

The *DBParser, in turn, calls the EmbeddedDocumentExtractor for each table and 
specifies, via a special mime-type, which *TableParser will be called. 

The *TableParser ignores the empty InputStream, and grabs the 
StatementTablePair from the ParseContext to parse each table.
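
The hand-off described above (ignore the stream, pull the real input out of 
the context) can be sketched with a small class-keyed map. This is a JDK-only 
stand-in written for illustration: Context and TableHandle are hypothetical 
names, with Context playing the role of Tika's ParseContext and TableHandle 
the role of StatementTablePair. It makes no claim about how the patch itself 
is written.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these classes come from Tika or the patch.
public class ContextHandOffSketch {

    /** Class-keyed context: stores at most one value per type. */
    public static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        public <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        public <T> T get(Class<T> key) {
            // Class.cast accepts null, so a missing entry simply yields null.
            return key.cast(entries.get(key));
        }
    }

    /** Placeholder payload identifying the table to parse. */
    public static final class TableHandle {
        public final String tableName;

        public TableHandle(String tableName) {
            this.tableName = tableName;
        }
    }

    /** The stream is deliberately unused; the table to read comes from the context. */
    public static String parseTable(InputStream ignored, Context context) {
        TableHandle handle = context.get(TableHandle.class);
        if (handle == null) {
            throw new IllegalStateException("no TableHandle in context");
        }
        return "parsing " + handle.tableName;
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.set(TableHandle.class, new TableHandle("users"));
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        System.out.println(parseTable(empty, context));
    }
}
```

Failing loudly when the handle is missing is one possible choice; it makes a 
misrouted call visible instead of silently producing an empty table.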

Also, as part of the design, the EmbeddedDocumentExtractor is called for each 
BLOB and each CLOB.

The JDBC wrapper around SQLite is not able to read CLOBs (apparently?), 
although I could write them without an exception (which doesn't mean they 
were actually written). It also does some other things that are not standard 
JDBC, but all of that is handled in SQLiteTableParser, a subclass of 
AbstractTableParser.

Any and all feedback is welcomed.  This is still drafty.



was (Author: talli...@mitre.org):
First draft of patch attached.  Need to build out tests, obviously, and I'll 
fix spelling of SQLLite in the class names! :)

For the design, I had to create a public parser that called a new *DBParser 
class for each call to parse (like many other parsers) to avoid thread safety 
issues. 

The *DBParser, in turn, calls the EmbeddedDocumentParser for each table, and it 
specifies via special mime-type, which *TableParser will be called. 

The *TableParser ignores the InputStream, and grabs the StatementTablePair from 
the ParseContext to parse each table.

The jdbc wrapper around sqlite is not able to read CLOBs (apparently?), 
although I could write them without exception (doesn't mean they were actually 
written), and it does some other stuff that is not standard JDBC, but that is 
all handled in SQLiteTableParser, a subclass of AbstractTableParser.

Any and all feedback is welcomed.  This is still drafty.


> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide 
> range of applications. Opening the ticket to track it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
