[ 
https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090285#comment-15090285
 ] 

Bob Paulin commented on TIKA-1824:
----------------------------------

* Perhaps rename artifact names in parser sub-components to include 
"Parser(s?)", e.g. Apache Tika Parser Advanced Module so that the names sort 
more clearly (at least in the Maven window in IntelliJ)?

I think I felt it was redundant, but in a Maven repo it could be helpful, so I 
can make that change.

* Perhaps add "parser(s?)" to the artifactId, e.g. tika-parser-cad-module

Same as above.

* Perhaps lowercase names in parser-subcomponents so that they're in line with 
legacy: "Apache Tika parser advanced module"

I think I'm missing where this convention is coming from.

* Pkcs7Parser ... should that be under advanced...or somewhere else ...own 
crypto package?

So I don't feel strongly that it needs to be under advanced, but I do want to be 
careful not to overdo the number of modules.  Do you feel crypto has room for 
growth, or is this just going to forever be a one-parser project?

* iwork ...should we move that to office?

I think it could fit there too.  No issues moving.

* tika-test-resources...should we move TikaTest into that and change the name 
to tika-test? I have a vague memory of wanting to carve out a separate test 
package earlier and adding TikaTest and something else...

I think it could work in tika-core or tika-test.  I don't think I feel strongly 
either way.

* OutlookPSTParser...move that to office?

I'd like to keep this class with all the other mbox classes.  Maybe move mbox to 
office?

* Does MBox belong in web? Not sure where to put it?

Move to office?

* Move CommonsDigester to core if we're willing to add a dependency on 
commons-codec into core?

I'm fine with this.

* Move Activator to tika-bundle?

I believe tika-bundle already has an activator.  Could just remove this.

* Move pot to multimedia or add tika-parsers-multimedia-advanced-module?

Not sure I understand POT in multimedia.  Can you elaborate?

* Move geo.topic to "advanced"...perhaps we rename "advanced" to ner?

Is ner only applied to geo?  My understanding of this domain is limited.

* Move ctakes to "advanced/ner"?

Again, my understanding of the domain is too limited to say what ctakes fits with.


* Collapse web and text?

Not sure I like that since a number of modules depend on text but not web.  
Seems like we'd be adding a lot of needless dependencies.

> Tika 2.0 -  Create Initial Parser Modules
> -----------------------------------------
>
>                 Key: TIKA-1824
>                 URL: https://issues.apache.org/jira/browse/TIKA-1824
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 2.0
>            Reporter: Bob Paulin
>            Assignee: Bob Paulin
>
> Create initial break down of parser modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
