If someone from the large Tika team could give the extension a try whenever time allows, that would be super; it will help me improve it. If you do decide to try it, please post feedback to https://groups.google.com/forum/#!forum/quarkus-dev, or, if it fails miserably on your documents, maybe here first :-)

Cheers, Sergey
On Thu, Aug 15, 2019 at 3:15 PM Sergey Beryozkin <sberyoz...@gmail.com> wrote:
> Hi,
> The initial documentation is here:
> https://quarkus.io/guides/tika-guide
>
> Lots more to come over time, and we have already had users trying it (not
> many, but I hope to see more feedback from them soon).
> Sergey
>
> On Fri, May 10, 2019 at 6:04 PM Sergey Beryozkin <sberyoz...@gmail.com> wrote:
>
>> I've managed to get the PDFParser running in native mode, but I had
>> to delay the initialization of
>> org.apache.pdfbox.pdmodel.font.PDType1Font. This class has static
>> PDType1Font instances, one of which leads to
>> org.apache.fontbox.ttf.RAFDataStream, which opens a file handle. Graal
>> therefore cannot convert it to native code at build time, so one needs
>> to delay the initialization of PDType1Font until run time.
>>
>> If we start from the PDF parser, the call path to RAFDataStream starts
>> from:
>>
>>   org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.verifyOrCreateDefaults(PDAcroForm.java:106)
>>   at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.<init>(PDAcroForm.java:93)
>>   at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:108)
>>
>>   org.apache.tika.parser.pdf.PDFParser.handleXFAOnly(PDFParser.java:534)
>>
>> I guess I may need to create a PR for PDFBox where RAFDataStream opens a
>> stream lazily, with a check like ensureOpen() added to its read
>> methods...
>>
>> Sergey
>>
>> On Fri, May 3, 2019 at 1:22 PM Sergey Beryozkin <sberyoz...@gmail.com> wrote:
>>
>>> Yes, please add 'sergeyb'; I've just assigned myself a CXF issue as
>>> 'sergeyb'. Sorry about these multiple IDs, but I will indeed try to keep
>>> using a single one.
>>>
>>> Thanks, Sergey
>>>
>>> On Fri, May 3, 2019 at 12:13 PM Tim Allison <talli...@apache.org> wrote:
>>>
>>>> I can add 'sergeyb' if you'd prefer!
>>>>
>>>> On Fri, May 3, 2019 at 5:43 AM Sergey Beryozkin <sberyoz...@gmail.com> wrote:
>>>> >
>>>> > Though I might need to settle on 'sergeyb' eventually, since it is my
>>>> > Apache committer ID.
>>>> > Thanks...
>>>> >
>>>> > On Fri, May 3, 2019 at 10:29 AM Sergey Beryozkin <sberyoz...@gmail.com> wrote:
>>>> >
>>>> > > Oh, I forgot I had a 'sergey_beryozkin' ID as well. This is not good;
>>>> > > it shows how long ago I last contributed :-) (I did try sergey.beryozkin
>>>> > > though).
>>>> > >
>>>> > > Thanks for checking it, I've just assigned this issue to myself.
>>>> > > Cheers, Sergey
>>>> > >
>>>> > > On Thu, May 2, 2019 at 6:08 PM Sergey Beryozkin <sberyoz...@gmail.com> wrote:
>>>> > >
>>>> > >> Hi Tim
>>>> > >>
>>>> > >> I can't assign
>>>> > >> https://issues.apache.org/jira/browse/TIKA-2862
>>>> > >> to myself. I used to be able to assign issues; I know I had some time
>>>> > >> away from Tika, but I'm keen to return with a few contributions :-)
>>>> > >> Please update my record so that I can assign issues again.
>>>> > >>
>>>> > >> Cheers, Sergey
>>>> > >>
>>>> > >> On Tue, Apr 30, 2019 at 6:22 PM Sergey Beryozkin <sberyoz...@gmail.com> wrote:
>>>> > >>
>>>> > >>> Hi Tim, All
>>>> > >>>
>>>> > >>> I've started working on integrating Tika with Quarkus [1]. The main idea
>>>> > >>> is to be able to use Tika in native image mode.
>>>> > >>> It's quite likely I'll start creating PRs soon to get the native
>>>> > >>> image related issues resolved; these relate to some libraries
>>>> > >>> statically initializing FileDescriptors, etc.
>>>> > >>>
>>>> > >>> Thanks, Sergey
>>>> > >>>
>>>> > >>> [1] https://github.com/sberyozkin/quarkus/tree/tika_extension/extensions/tika
>>>> > >>> [2] https://github.com/sberyozkin/quarkus-quickstarts/tree/tika/getting-started-tika
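Sergey's May 10 message suggests a PDFBox change in which RAFDataStream opens its file lazily, with an ensureOpen()-style check in its read methods. Below is a minimal sketch of that idea, not PDFBox code: the class name LazyFileDataStream and its methods are hypothetical and deliberately simpler than the real RAFDataStream/TTFDataStream API. It only shows that constructing the object performs no I/O, so a static instance created during build-time class initialization would not hold an open file handle, assuming nothing reads from it during that initialization.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical, simplified random-access data stream that defers opening
 * its file until the first read, seek or position query. Construction
 * only stores the File reference and does no I/O.
 */
final class LazyFileDataStream implements Closeable {
    private final File file;
    private RandomAccessFile raf; // null until first use
    private boolean closed;

    LazyFileDataStream(File file) {
        this.file = file; // no file handle is opened here
    }

    /** Opens the underlying file on first use; fails once closed. */
    private RandomAccessFile ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("Stream is closed");
        }
        if (raf == null) {
            raf = new RandomAccessFile(file, "r");
        }
        return raf;
    }

    /** True only after the file has actually been opened and not yet closed. */
    boolean isOpen() {
        return raf != null && !closed;
    }

    /** Reads one unsigned byte, or returns -1 at end of file. */
    int read() throws IOException {
        return ensureOpen().read();
    }

    /** Reads up to len bytes into b starting at off; returns -1 at end of file. */
    int read(byte[] b, int off, int len) throws IOException {
        return ensureOpen().read(b, off, len);
    }

    void seek(long pos) throws IOException {
        ensureOpen().seek(pos);
    }

    long getCurrentPosition() throws IOException {
        return ensureOpen().getFilePointer();
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }
}
```

Every access path goes through ensureOpen(), so the first read opens the file and later calls reuse the same handle, while close() makes further reads fail instead of silently reopening the file. Like RandomAccessFile itself, the sketch is not thread-safe; a shared static instance would need external synchronization. On the Quarkus/GraalVM side, the workaround described in the thread (initializing PDType1Font at run time) remains necessary unless such a change lands in PDFBox.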