I’d like the discriminators on the parser modules to say more about the type of
parser and what it’s going to drag along or impact my system with; the current
names reflect more the history of Tika’s evolution.
Starting with the descriptive paragraphs, here is some brainstorming of names:
For what is currently tika-parsers-classic: with the exception of optional OCR,
these should be lightish weight dependencies in pure Java with no
parsers/resources that require network calls.
—tika-parsers-files
—tika-parsers-alljava
—tika-parsers-local
—tika-parsers-simple
—tika-parsers-lightweight
—tika-parsers-aluminum
For what is currently tika-parsers-extended: these can require native libs
and/or have heavier dependencies, including network calls.
—tika-parsers-heavy
—tika-parsers-complex
—tika-parsers-extended-dependencies
—tika-parsers-iron
For what is currently tika-parsers-advanced: anything goes, e.g. dl4j as a dependency.
—tika-parsers-anything-goes
—tika-parsers-sandbox
—tika-parsers-deep
—tika-parsers-model-driven
—tika-parsers-lead
> On Mar 9, 2021, at 12:03 PM, Tim Allison <[email protected]> wrote:
>
> All,
> I was recently chatting about Tika 2.x with some Tika friends and
> they had some hesitation about the names for the three high level
> parser modules.
>
> They are currently:
>
> tika-parsers-classic
> tika-parsers-extended
> tika-parsers-advanced
>
> The quibbles weren't with the delineation, but with the naming.
>
> In my mind, this is what I've been thinking as definitions:
>
> tika-parsers-classic -- with the exception of optional OCR, these
> should be lightish weight dependencies in pure java with no
> parsers/resources that require network calls.
>
> tika-parsers-extended -- these can require native libs and/or have
> heavier dependencies, including network calls.
>
> tika-parsers-advanced -- anything goes. dl4j as a dependency, etc.
>
> Some options for classic-> basic, base, ...what else?
>
> Any other recommendations for these names? Thank you!
>
> Best,
>
> Tim
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com |
My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>