[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167891#comment-15167891 ]
Ken Krugler commented on TIKA-1855:
-----------------------------------

The things I don't like about this approach are that (a) core becomes a dumping ground for everyone's test data, and (b) it couples module development with the core.

Plus I'm waiting for the next crazy parser to be added that has 100MB of binary test data, which will create an el grande jar that everybody is going to be unzipping. So I guess I'd add scalability as another concern.

I haven't looked into where test files wind up, but I'd suspect that many of the core tests that wind up needing to be in parsers due to data dependencies aren't really the tests that should be run in core. I can see mime-type detection being an example of wanting to have one of each, and (maybe) some of the app/server tests, so I'd be fine with having a tika-test-corpus (or whatever you want to call it) that has a good sampling of docs which are used in these situations.

Finally, to make myself really popular, I'd prefer that we use the jar as a test dependency (vs. zip/unzip), and for cases where we need to have an actual file then use some utility code to extract/create the file.

Maybe we should have a Skype chat to discuss VF2F :)

> TIka 2.0 - Move shared test-code back to tika-core and distribute test files
> to parser modules
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1855
>                 URL: https://issues.apache.org/jira/browse/TIKA-1855
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>
> Undo TIKA-1851, and divide test docs to appropriate parser modules.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)