If you call Tika yourself and you aren't handing it the document as a stream,
then that would be an obvious reason why your memory problems occur in that
environment.
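
To make the stream point concrete, here is a minimal sketch using only the JDK. It is not ManifoldCF or Tika code: DocumentConsumer, CountingConsumer, and process are hypothetical names standing in for whatever your connector passes the file to. The assumption is that the consumer (in your case the parser) can read from an InputStream, so the connector never needs to hold the whole file in a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamingHandOff {

    /** Hypothetical stand-in for whatever consumes the document (e.g. a parser). */
    interface DocumentConsumer {
        void consume(InputStream in) throws IOException;
    }

    /** Example consumer: counts bytes while holding only one 8 KB buffer in memory. */
    static class CountingConsumer implements DocumentConsumer {
        long bytesSeen = 0;
        int largestChunk = 0;

        @Override
        public void consume(InputStream in) throws IOException {
            byte[] buffer = new byte[8192];
            int n;
            // Read until end of stream; memory use is bounded by the buffer size.
            while ((n = in.read(buffer)) != -1) {
                bytesSeen += n;
                largestChunk = Math.max(largestChunk, n);
            }
        }
    }

    /** Hands the stream straight to the consumer and always closes it afterwards. */
    static void process(InputStream source, DocumentConsumer consumer) throws IOException {
        try (InputStream in = source) {
            consumer.consume(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a fetched document; a real connector would open the
        // stream from the repository instead of building an array first.
        byte[] fakeDocument = new byte[100_000];
        CountingConsumer consumer = new CountingConsumer();
        process(new ByteArrayInputStream(fakeDocument), consumer);
        System.out.println(consumer.bytesSeen + " bytes seen, largest chunk "
                + consumer.largestChunk);
    }
}
```

The thing to check in a custom connector is whether it does the opposite somewhere, for example reading the entire file into a byte array or a String before handing it on.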
Karl


On Fri, Oct 11, 2019 at 9:26 AM Donald Van den Driessche (Jira) <
j...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/CONNECTORS-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949443#comment-16949443
> ]
>
> Donald Van den Driessche commented on CONNECTORS-1625:
> ------------------------------------------------------
>
> After running the same process (with the same config) locally, we had no
> issues.
> So, it might be something with the streams.
>
>
>
> We've written a custom connector to fetch the files. It might use the
> wrong way to provide the file to the Tika parser.
>
> > When processing a specific PDF Manifold goes out of memory
> > ----------------------------------------------------------
> >
> >                 Key: CONNECTORS-1625
> >                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1625
> >             Project: ManifoldCF
> >          Issue Type: Bug
> >          Components: Tika extractor
> >    Affects Versions: ManifoldCF 2.12
> >            Reporter: Donald Van den Driessche
> >            Assignee: Karl Wright
> >            Priority: Major
> >         Attachments: abd-serotec-antibodies-uk.pdf
> >
> >
> > When processing the attached file with ManifoldCF 2.12, we keep getting an
> > out-of-memory error.
> > When just parsing it through Tika 1.18, no issues are found.
> > Can anyone look into it?
> > Thanks in advance!
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>
