I've managed to get the PDFParser running in native mode, but I had to
delay the initialization of org.apache.pdfbox.pdmodel.font.PDType1Font.
This class has static PDType1Font instances, one of which leads to
org.apache.fontbox.ttf.RAFDataStream, which opens a file handle. Graal
cannot convert that to native code at build time, so the initialization
of PDType1Font has to be delayed until run time.

If we start from the PDF parser, the call path to RAFDataStream starts
from:

org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.verifyOrCreateDefaults(PDAcroForm.java:106)
     at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.<init>(PDAcroForm.java:93)
     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:108)
     at org.apache.tika.parser.pdf.PDFParser.handleXFAOnly(PDFParser.java:534)

I guess I may need to create a PR for PDFBox where RAFDataStream opens a
stream lazily, with a check like ensureOpen() being added to its read
methods...
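
To make the lazy-open idea concrete, here is a rough sketch of what I have
in mind. It is not the actual RAFDataStream code: the class name
LazyRafStream, its fields and the wasOpened() helper are made up for
illustration, and the real PDFBox class has more methods than shown here.
The only point is that the constructor does not touch the file, and each
read method calls ensureOpen() first:

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration only; this is not the actual RAFDataStream code.
class LazyRafStream implements Closeable {
    private final File file;
    private RandomAccessFile raf; // stays null until the first read or seek

    LazyRafStream(File file) {
        this.file = file; // no file handle is opened here
    }

    // Opens the underlying file on first use only.
    private void ensureOpen() throws IOException {
        if (raf == null) {
            raf = new RandomAccessFile(file, "r");
        }
    }

    // Helper for the sketch: reports whether the file has been opened yet.
    boolean wasOpened() {
        return raf != null;
    }

    int read() throws IOException {
        ensureOpen();
        return raf.read();
    }

    int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        return raf.read(b, off, len);
    }

    void seek(long pos) throws IOException {
        ensureOpen();
        raf.seek(pos);
    }

    @Override
    public void close() throws IOException {
        if (raf != null) {
            raf.close();
        }
    }
}
```

With something along these lines, creating the stream in a static
initializer would no longer open a file handle by itself; the handle would
only appear once something actually reads from the stream.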

Sergey

On Fri, May 3, 2019 at 1:22 PM Sergey Beryozkin <sberyoz...@gmail.com>
wrote:

> Yes, please add 'sergeyb', I've just assigned myself a CXF issue as
> 'sergeyb'. Sorry about these multiple ids, but indeed I'll try to keep
> using a single one.
>
> Thanks, Sergey
>
>
>
> On Fri, May 3, 2019 at 12:13 PM Tim Allison <talli...@apache.org> wrote:
>
>> I can add 'sergeyb' if you'd prefer!
>>
>> On Fri, May 3, 2019 at 5:43 AM Sergey Beryozkin <sberyoz...@gmail.com>
>> wrote:
>> >
>> > Though I might need to settle on the 'sergeyb' eventually since it is my
>> > apache committer id.
>> > Thanks...
>> >
>> > On Fri, May 3, 2019 at 10:29 AM Sergey Beryozkin <sberyoz...@gmail.com>
>> > wrote:
>> >
>> > > Oh, I forgot I had a 'sergey_beryozkin' id as well, this is not good,
>> > > shows how long ago I did contribute :-) (did try sergey.beryozkin though).
>> > >
>> > > Thanks for checking it, I've just assigned this issue to myself.
>> > > Cheers, Sergey
>> > >
>> > >
>> > > On Thu, May 2, 2019 at 6:08 PM Sergey Beryozkin <sberyoz...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi Tim
>> > >>
>> > >> I can't assign
>> > >> https://issues.apache.org/jira/browse/TIKA-2862
>> > >>
>> > >> to myself. I used to be able to assign issues; I know I had some time away
>> > >> from Tika, but I'm keen to return with a few contributions :-)
>> > >> Please update my record so that I can assign issues again.
>> > >>
>> > >> Cheers, Sergey
>> > >>
>> > >> On Tue, Apr 30, 2019 at 6:22 PM Sergey Beryozkin <sberyoz...@gmail.com>
>> > >> wrote:
>> > >>
>> > >>> Hi Tim, All
>> > >>>
>> > >>> I've started working on integrating Tika with Quarkus [1]. The main idea
>> > >>> is to be able to use Tika in the native image mode.
>> > >>> It's quite likely I'll start creating the PRs soon, to get the native
>> > >>> image related issues resolved, these are related to some libraries
>> > >>> statically initializing FileDescriptors, etc.
>> > >>>
>> > >>> Thanks, Sergey
>> > >>>
>> > >>> [1] https://github.com/sberyozkin/quarkus/tree/tika_extension/extensions/tika
>> > >>> [2] https://github.com/sberyozkin/quarkus-quickstarts/tree/tika/getting-started-tika
>> > >>>
>> > >>>
>>
>
