The use case is really embedding Tika and its transitive dependencies. I prefer the Tika 2 modular approach as it pulls in fewer jars; however, I don't have much control over my existing version of PDFBox. I will explore using Tika Server!

On 25 October 2017 at 17:44, Allison, Timothy B. <talli...@mitre.org> wrote:

> Sorry, Tika 2.0 will require PDFBox 2.x at least. There were some
> breaking changes between PDFBox 1.x and 2.x, and our PDFParser relies
> on 2.x now.
>
> Is there something in PDFBox 1.8.x that you need that doesn't exist in 2.x?
>
> -----Original Message-----
> From: Gethin James [mailto:gja...@nuxeo.com]
> Sent: Wednesday, October 25, 2017 8:20 AM
> To: dev@tika.apache.org
> Subject: Re: Tika 2 parsers
>
> Thanks for the help, I gave the parsers a go. Just a question on the
> PDFBox dependency you mentioned. Will Tika 2.0 require a minimum PDFBox
> version? I am embedding Tika and have pdfbox 1.8.9, so I am wondering
> if that will work?
>
> On 25 October 2017 at 10:49, Sergey Beryozkin <sberyoz...@gmail.com>
> wrote:
>
> > As Tim indicated, the 2.x line is not actively developed at the moment,
> > but what is already there now is sufficient for an initial try (e.g.
> > with the PDF/ODT parsers).
> >
> > Sergey
> >
> > On 25/10/17 08:30, Gethin James wrote:
> >
> >> I did have a look for the source; what branch is it?
> >> https://github.com/apache/tika/tree/2.x doesn't seem to have been
> >> updated since May.
> >>
> >> On 24 October 2017 at 22:15, Sergey Beryozkin <sberyoz...@gmail.com>
> >> wrote:
> >>
> >>> I did try the modules in the earlier version of the CXF demo,
> >>> see the right panel:
> >>>
> >>> https://github.com/apache/cxf/commit/c2ccecb23ba23497c95be89f9b37f38c69faba7a#diff-b5ed531ebf92978dcbcf1ac6cc6331c0
> >>>
> >>> They should be available in the snapshot repo.
> >>>
> >>> Cheers, Sergey
> >>>
> >>> On 24/10/17 19:45, Allison, Timothy B. wrote:
> >>>
> >>>> We'll switch master over to the 2.0 layout after our next release,
> >>>> which should happen shortly after the release of PDFBox
> >>>> 2.0.8...roughly in the next week for PDFBox, next month for Tika.
> >>>>
> >>>> We have abandoned keeping the current 2.x up to date, and I was
> >>>> hoping there would at least be a build here:
> >>>> https://builds.apache.org/view/T/view/Tika/job/tika-2.x/, but there
> >>>> isn't a clean build there.
> >>>>
> >>>> So, unfortunately, for now, your best bet is to build it yourself
> >>>> from source. Sorry.
> >>>>
> >>>> -----Original Message-----
> >>>> From: Gethin James [mailto:gja...@nuxeo.com]
> >>>> Sent: Tuesday, October 24, 2017 12:19 PM
> >>>> To: dev@tika.apache.org
> >>>> Subject: Tika 2 parsers
> >>>>
> >>>> Hi, I am interested in trying the more modular approach of using
> >>>> the Tika 2 parsers. Are the Tika 2 artifacts available in a maven
> >>>> repo somewhere? Is there any documentation on how to use them or
> >>>> how they differ from Tika 1?
> >>>>
> >>>> Thanks,
> >>>> Gethin.