[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090285#comment-15090285 ]
Bob Paulin edited comment on TIKA-1824 at 1/9/16 1:43 AM:
----------------------------------------------------------

* Perhaps rename artifact names in parser sub-components to include "Parser(s?)", e.g. Apache Tika Parser Advanced Module, so that the names sort more clearly (at least in the Maven window in IntelliJ)?
  I think I felt it was redundant, but in a Maven repo it could be helpful, so I can make that change.
* Perhaps add "parser(s?)" to the artifactId, e.g. tika-parser-cad-module?
  Same as above.
* Perhaps lowercase names in parser sub-components so that they're in line with legacy: "Apache Tika parser advanced module"?
  I think I'm missing where this convention is coming from.
* Pkcs7Parser ... should that be under advanced ... or somewhere else ... its own crypto package?
  I don't feel strongly that it needs to be under advanced, but I do want to be careful not to overdo the number of modules. Do you feel crypto has room for growth, or is this just going to be a one-parser project forever?
* iwork ... should we move that to office?
  Actually, I had it that way initially. The issue is that the iwork parser is used inside the ZipContainerDetector, which makes the dependency graph awkward. We would need to find a way to break that dependency to make this work.
* tika-test-resources ... should we move TikaTest into that and change the name to tika-test?
  I have a vague memory of wanting to carve out a separate test package earlier and adding TikaTest and something else... I think it could work in tika-core or tika-test. I don't feel strongly either way.
* OutlookPSTParser ... move that to office?
  I'd like to keep this class with all the other mbox classes. Maybe move mbox to office?
* Does MBox belong in web?
  Not sure where to put it. Move to office?
* Move CommonsDigester to core if we're willing to add a dependency on commons-codec into core?
  I'm fine with this.
* Move Activator to tika-bundle?
  I believe tika-bundle already has an activator. We could just remove this.
* Move pot to multimedia or add tika-parsers-multimedia-advanced-module?
  Not sure I understand POT in multimedia. Can you elaborate?
* Move geo.topic to "advanced" ... perhaps we rename "advanced" to ner?
  Is NER only applied to geo? My understanding of this domain is limited.
* Move ctakes to "advanced/ner"?
  Again, my understanding of the domain is limited, so I'm not sure what ctakes fits with.
* Collapse web and text?
  Not sure I like that, since a number of modules depend on text but not on web. It seems like we'd be adding a lot of needless dependencies.

> Tika 2.0 - Create Initial Parser Modules
> -----------------------------------------
>
>                 Key: TIKA-1824
>                 URL: https://issues.apache.org/jira/browse/TIKA-1824
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 2.0
>            Reporter: Bob Paulin
>            Assignee: Bob Paulin
>
> Create initial break down of parser modules.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
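The comment notes that iwork cannot easily move to office because the iwork parser is used inside the ZipContainerDetector. One possible way to break that dependency is to invert it. The container detector would depend only on a small interface in a shared module, and the iwork code would register an implementation of that interface.

The Java sketch below is a hypothetical illustration of that idea. `ZipSubDetector`, `ContainerDetector` and `IWorkSubDetector`, the `index.xml` entry check, and the returned media type are all invented for the example. They are not Tika APIs, and they are not Tika's actual iWork detection logic. Registration is done by hand here. In a real multi-module build, implementations could instead be discovered at runtime, for example with `java.util.ServiceLoader`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these types are real Tika classes.
public class ZipDetectionSketch {

    /** Contract that would live in a shared module (e.g. core). */
    interface ZipSubDetector {
        /** Returns a media type if this detector recognizes the zip entry names. */
        Optional<String> detect(List<String> entryNames);
    }

    /** Lives in the package module and depends only on the interface, never on iwork. */
    static class ContainerDetector {
        private final List<ZipSubDetector> subDetectors = new ArrayList<>();

        void register(ZipSubDetector subDetector) {
            subDetectors.add(subDetector);
        }

        /** Asks each registered sub-detector in order; falls back to plain zip. */
        String detect(List<String> entryNames) {
            for (ZipSubDetector subDetector : subDetectors) {
                Optional<String> type = subDetector.detect(entryNames);
                if (type.isPresent()) {
                    return type.get();
                }
            }
            return "application/zip";
        }
    }

    /** Would live in whichever module ends up owning iwork (e.g. office). */
    static class IWorkSubDetector implements ZipSubDetector {
        @Override
        public Optional<String> detect(List<String> entryNames) {
            // Simplified stand-in check (assumption), not real iWork detection.
            return entryNames.contains("index.xml")
                    ? Optional.of("application/vnd.apple.iwork")
                    : Optional.empty();
        }
    }

    public static void main(String[] args) {
        ContainerDetector detector = new ContainerDetector();
        // Only the code that assembles the modules needs to know about both sides.
        detector.register(new IWorkSubDetector());
        System.out.println(detector.detect(List.of("index.xml", "Data/image.png")));
        System.out.println(detector.detect(List.of("word/document.xml")));
    }
}
```

When run, `main` prints `application/vnd.apple.iwork` for the first entry list and falls back to `application/zip` for the second. With this shape, the package module never imports anything from the module that owns iwork, so the dependency edge points only toward the shared interface.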