Update title for this thread...

It has been 7 months since we had consensus to move to Java 11 for 3.x
[0]. Should we reopen the discussion of moving to Java 17 for 3.x as
proposed by Eric, or should we stick with the Java 11 plan for now?

[0] https://lists.apache.org/thread/c330b12h1fvmq8x1099mgw3tfs0gcp6q



On Mon, Apr 8, 2024 at 12:09 PM Tim Allison <talli...@apache.org> wrote:
>
> From October 2023:
> https://www.brilworks.com/blog/java-11-countdown-to-end-of-support/
>
> Getting 3.x out has taken longer than I had anticipated. Should we
> reopen the 17 vs 11 discussion given Eric's input? Or do we continue
> with the plan to target 11 in 3.x for the foreseeable future?
>
> On Mon, Apr 8, 2024 at 9:22 AM Eric Pugh
> <ep...@opensourceconnections.com> wrote:
> >
> > Time to move on? Lucene 10 will be on 17+, Solr 10 will be on 17+,
> > OpenNLP is already there... Java 11 is EOL and has been for a while...
> >
> > Are there any other file parsers we know of that are being optimized
> > to take advantage of features in recent Java versions?
> >
> > > On Apr 8, 2024, at 7:02 AM, Tim Allison <talli...@apache.org> wrote:
> > >
> > > Sorry, more correctly:
> > >
> > > OpenNLP is effectively EOL'd for our 3.x because OpenNLP >= 2.3.0
> > > requires Java 17 and our 3.x is still on 11.
> > >
> > > On Mon, Apr 8, 2024 at 6:30 AM Tim Allison <talli...@apache.org> wrote:
> > >>
> > >> All,
> > >>  As Brian pointed out, optimaize is no longer maintained, and it has
> > >> some dependencies that have aged out. Should we replace our baseline
> > >> langdetect in tika-app and tika-server in 3.x?
> > >>  I'd say that we should go with our OpenNLP based language detection,
> > >> but that, too, is effectively EOL'd because OpenNLP >= 2.3.0 requires
> > >> Java 17.
> > >>  Thoughts?
> > >>
> > >>            Best,
> > >>
> > >>                Tim
> > >>
> > >> ---------- Forwarded message ---------
> > >> From: Brian Laskey <blas...@us.ibm.com>
> > >> Date: Fri, Mar 8, 2024 at 2:38 PM
> > >> Subject: RE: Replacing full tika-app.jar to directly using tiki-core /
> > >> and parsers
> > >> To: u...@tika.apache.org <u...@tika.apache.org>
> > >>
> > >>
> > >> Hi Tim
> > >>
> > >>
> > >>
> > >> Thanks, this is helpful.
> > >>
> > >>
> > >>
> > >> For tika-app we found the dependency on org.apache.tika »
> > >> tika-langdetect-optimaize brings in some older 3rd party jars, and
> > >> unfortunately it appears that the com.optimaize.languagedetector »
> > >> language-detector 0.6 is unmaintained so its dependencies on
> > >> vulnerable versions of guava (18.0) cause us problems with security
> > >> scans. I could be wrong but I don’t believe we need this component for
> > >> our usage of just detect and parse?
> > >>
> > >>
> > >>
> > >> We have a sort of microservice process (java based) which is ingesting
> > >> files parsed from tika. It was nice that we could separate the tika
> > >> process in its own heap space as a separate java process rather than
> > >> adding it to our app, but I suppose we could work around that.
> > >>
> > >>
> > >>
> > >> Thank you
> > >>
> > >> Brian Laskey
> > >>
> > >>
> > >>
> > >> From: Tim Allison <talli...@apache.org>
> > >> Reply-To: "u...@tika.apache.org" <u...@tika.apache.org>
> > >> Date: Friday, March 8, 2024 at 9:44 AM
> > >> To: "u...@tika.apache.org" <u...@tika.apache.org>
> > >> Subject: [EXTERNAL] Re: Replacing full tika-app.jar to directly using
> > >> tiki-core / and parsers
> > >>
> > >>
> > >>
> > >> Hi Brian,
> > >>
> > >>  A few thoughts:
> > >>
> > >>
> > >>
> > >> 1) tika-app is basically tika-core + tika-parsers-standard-package.
> > >> Which components are you trying to avoid? tika-serialization and
> > >> jackson? boilerpipecontenthandler and some of its dependencies? I ask,
> > >> because we could factor out a tika-app-core with no parsers in Tika
> > >> 3.x, which is what we do now with tika-server-core and
> > >> tika-server-standard.
> > >>
> > >>
> > >>
> > >> 2) Unrelated, there are probably more efficient ways of running Tika
> > >> than calling it per file on the commandline. That is a robust option,
> > >> at least!
> > >>
> > >>
> > >>
> > >> If all you want is detect and text extraction, and you want to run it
> > >> from the commandline, write two classes, whose main()s call:
> > >>
> > >> System.out.println(new Tika().detect(file));
> > >>
> > >>
> > >>
> > >> or
> > >>
> > >>
> > >>
> > >> System.out.println(new Tika().parseToString(file));
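
A minimal sketch of the wrapper suggested above, assuming tika-core (plus tika-parsers-standard-package for real text extraction) is on the classpath. The class name TikaCli and the single-class layout with -d/-t flags are illustrative assumptions, not part of Tika; the advice above was two separate classes, which works just as well. detect and parseToString are instance methods on the org.apache.tika.Tika facade, so the sketch creates one Tika object and reuses it.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

/** Hypothetical stand-in for "tika-app.jar -d <file>" and "-t <file>". */
public class TikaCli {

    // One facade instance, built with Tika's default configuration.
    private static final Tika TIKA = new Tika();

    /** Returns the detected media type, e.g. "text/plain". */
    static String detect(File file) throws IOException {
        return TIKA.detect(file);
    }

    /** Returns the extracted text using whatever parsers are on the classpath. */
    static String extractText(File file) throws IOException, TikaException {
        return TIKA.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: TikaCli (-d|-t) <file>");
            System.exit(2);
        }
        File file = new File(args[1]);
        switch (args[0]) {
            case "-d":
                System.out.println(detect(file));
                break;
            case "-t":
                System.out.println(extractText(file));
                break;
            default:
                System.err.println("unknown option: " + args[0]);
                System.exit(2);
        }
    }
}
```

One caveat worth checking against the Javadoc for your version: as far as I know, parseToString truncates its output at the facade's default maximum string length, and setMaxStringLength(-1) lifts that limit, so output on large files may differ from tika-app -t unless the limit is changed.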
> > >>
> > >>
> > >>
> > >> On Thu, Mar 7, 2024 at 5:04 PM Brian Laskey <blas...@us.ibm.com> wrote:
> > >>
> > >> Hello Tika community,
> > >>
> > >>
> > >>
> > >> Our team is migrating away from usage of tika-app.jar (2.6 currently)
> > >> to something with more minimal third party dependencies which we can
> > >> control.
> > >>
> > >>
> > >>
> > >> Is there any good documentation or pathway to describe how a team
> > >> could map the tika-app functionality we use to the same behavior using
> > >> just tika-core and tika-parsers-standard-package
> > >>
> > >> (I assume)?
> > >>
> > >>
> > >>
> > >> The tika-app functions we use today are:
> > >>
> > >>
> > >>
> > >> Mime-type detection
> > >>
> > >> java -jar tika-app.jar -d <file>
> > >>
> > >>
> > >>
> > >> and
> > >>
> > >> Text extraction attempts
> > >>
> > >> java -jar tika-app.jar -t <file>
> > >>
> > >>
> > >>
> > >> Is there a subset of tika parser jars we would need to include to have
> > >> equivalent functionality if we wrote our own wrapper main class?
> > >>
> > >>
> > >>
> > >> Thank you,
> > >>
> > >> Brian Laskey
> >
> > _______________________
> > Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 | 
> > http://www.opensourceconnections.com 
> > <http://www.opensourceconnections.com/> | My Free/Busy 
> > <http://tinyurl.com/eric-cal>
> > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
> > <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> > This e-mail and all contents, including attachments, is considered to be 
> > Company Confidential unless explicitly stated otherwise, regardless of 
> > whether attachments are marked as such.
> >
