[ 
https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167891#comment-15167891
 ] 

Ken Krugler commented on TIKA-1855:
-----------------------------------

The things I don't like about this approach are that (a) core becomes a dumping 
ground for everyone's test data, and (b) it couples module development with the 
core. Plus I'm waiting for the next crazy parser to be added that has 100MB of 
binary test data, which will create an el grande jar that everybody is going to 
be unzipping. So I guess I'd add scalability as another concern.

I haven't looked into where the test files end up, but I suspect that many of 
the core tests that would have to move into parsers because of data 
dependencies aren't really tests that belong in core. Mime-type detection is 
one case where I can see wanting one of each file type, and (maybe) some of 
the app/server tests, so I'd be fine with having a tika-test-corpus (or 
whatever you want to call it) that holds a good sampling of docs for use 
in these situations.

Finally, to make myself really popular, I'd prefer that we use the jar as a 
test dependency (vs. zip/unzip), and for cases where we need an actual file, 
use some utility code to extract/create it.
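
To make the "utility code" idea concrete, here is a rough sketch of what such a 
helper might look like. It assumes the test documents are on the test classpath 
(for example via a test-scoped jar dependency). The class name 
TestResourceExtractor and the method extractToTempFile are hypothetical, not 
existing Tika APIs.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of Tika): copies a classpath resource to a temp file. */
public final class TestResourceExtractor {

    private TestResourceExtractor() {
    }

    /**
     * Copies the named classpath resource to a temporary file and returns its path.
     * The resource is resolved relative to the given anchor class, so a leading
     * slash makes the name absolute on the classpath.
     */
    public static Path extractToTempFile(Class<?> anchor, String resourcePath)
            throws IOException {
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new FileNotFoundException(
                        "Resource not on classpath: " + resourcePath);
            }
            // Keep the original extension so code that sniffs file names still works.
            String name = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            int dot = name.lastIndexOf('.');
            String suffix = dot >= 0 ? name.substring(dot) : ".tmp";

            Path tmp = Files.createTempFile("tika-test-", suffix);
            tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }
}
```

Tests that can work from a stream would keep calling getResourceAsStream 
directly; only the ones that truly need a java.io.File or Path would go through 
a helper like this, so nothing has to be unzipped up front.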

Maybe we should have a Skype chat to discuss VF2F :)

> Tika 2.0 - Move shared test-code back to tika-core and distribute test files 
> to parser modules
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1855
>                 URL: https://issues.apache.org/jira/browse/TIKA-1855
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>
> Undo TIKA-1851, and divide test docs to appropriate parser modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
