The use case is really embedding Tika along with its transitive dependencies.  I
prefer the Tika 2 modular approach as it pulls in fewer jars; however, I
don't have much control over my existing version of PDFBox.  I will
explore using Tika Server!

On 25 October 2017 at 17:44, Allison, Timothy B. <talli...@mitre.org> wrote:

> Sorry, Tika 2.0 will require PDFBox 2.x at least.  There were some
> breaking changes between PDFBox 1.x and 2.x, and our PDFParser relies on 2.x
> now.
>
> Is there something in PDFBox 1.8.x that you need that doesn't exist in 2.x?
>
> -----Original Message-----
> From: Gethin James [mailto:gja...@nuxeo.com]
> Sent: Wednesday, October 25, 2017 8:20 AM
> To: dev@tika.apache.org
> Subject: Re: Tika 2 parsers
>
> Thanks for the help, I gave the parsers a go.  Just a question on the
> PDFBox dependency you mentioned.  Will Tika 2.0 require a minimum PDFBox
> version? I am embedding Tika and have pdfbox 1.8.9, so I am wondering if that
> will work?
>
> On 25 October 2017 at 10:49, Sergey Beryozkin <sberyoz...@gmail.com>
> wrote:
>
> > As Tim indicated, the 2.x line is not actively developed at the moment,
> > but what is already there now is sufficient for an initial try (e.g.
> > with the PDF/ODT parsers)
> >
> > Sergey
> >
> >
> >
> > On 25/10/17 08:30, Gethin James wrote:
> >
> >> I did have a look for the source, what branch is it?
> >> https://github.com/apache/tika/tree/2.x doesn't seem to have been
> >> updated since May.
> >>
> >> On 24 October 2017 at 22:15, Sergey Beryozkin <sberyoz...@gmail.com>
> >> wrote:
> >>
> >>> I did try the modules in the earlier version of the CXF demo,
> >>>
> >>> see the right panel,
> >>>
> >>> https://github.com/apache/cxf/commit/c2ccecb23ba23497c95be89f9b37f38c69faba7a#diff-b5ed531ebf92978dcbcf1ac6cc6331c0
> >>>
> >>> They should be available in the snapshot repo
> >>>
> >>> Cheers, Sergey
> >>>
> >>> On 24/10/17 19:45, Allison, Timothy B. wrote:
> >>>
> >>>> We'll switch master over to the 2.0 layout after our next release,
> >>>> which should happen shortly after the release of PDFBox 2.0.8...roughly
> >>>> in the next week for PDFBox, next month for Tika.
> >>>>
> >>>> We have abandoned keeping the current 2.x up to date, and I was
> >>>> hoping there would at least be a build here:
> >>>> https://builds.apache.org/view/T/view/Tika/job/tika-2.x/, but there
> >>>> isn't a clean build there.
> >>>>
> >>>> So, unfortunately, for now, your best bet is to build it yourself
> >>>> from source.  Sorry.
> >>>>
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Gethin James [mailto:gja...@nuxeo.com]
> >>>> Sent: Tuesday, October 24, 2017 12:19 PM
> >>>> To: dev@tika.apache.org
> >>>> Subject: Tika 2 parsers
> >>>>
> >>>> Hi, I am interested in trying the more modular approach of using
> >>>> the Tika 2 parsers.  Are the Tika 2 artifacts available in a maven
> >>>> repo somewhere?  Is there any documentation on how to use them or
> >>>> how they differ from Tika 1?
> >>>>
> >>>> Thanks,
> >>>> Gethin.
> >>>>
> >>>>
> >>>>
> >>
>
