[jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached

Luis Filipe Nassif (JIRA) Fri, 12 May 2017 15:40:20 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008843#comment-16008843
 ]


Luis Filipe Nassif commented on TIKA-2359:
------------------------------------------

Thank you, Chris! Reviewing TIKA-93, the original intent of my patch was to 
require the user to put a TesseractOCRConfig into the ParseContext in order to 
enable OCR. I think OCR ended up enabled by default when some user, instead of 
configuring OCR properly, changed the patch to insert a new TesseractOCRConfig 
whenever none was found in the ParseContext. It has been the default ever since.
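
To make the difference concrete, here is a minimal sketch of the two behaviours. 
It does not use Tika itself: OcrConfig and Context are hypothetical stand-ins 
for TesseractOCRConfig and ParseContext, and the two method names are made up 
for illustration. It only models the null check, not the actual parser code.

```java
import java.util.HashMap;
import java.util.Map;

public class OcrOptInSketch {

    /** Hypothetical stand-in for the OCR configuration object. */
    static final class OcrConfig {}

    /** Hypothetical stand-in for a parse context: a lookup table keyed by type. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast passes null through, so a missing entry yields null.
            return key.cast(entries.get(key));
        }
    }

    /** Opt-in: OCR runs only if the caller put a config into the context. */
    static boolean ocrEnabledOptIn(Context context) {
        return context.get(OcrConfig.class) != null;
    }

    /** Default-on: a missing config is silently replaced by a fresh default. */
    static boolean ocrEnabledDefaultOn(Context context) {
        OcrConfig config = context.get(OcrConfig.class);
        if (config == null) {
            config = new OcrConfig(); // this fallback is what turns OCR on for everyone
        }
        return config != null;
    }

    public static void main(String[] args) {
        Context empty = new Context();
        Context configured = new Context();
        configured.set(OcrConfig.class, new OcrConfig());

        System.out.println("opt-in,     empty context:      " + ocrEnabledOptIn(empty));
        System.out.println("opt-in,     configured context: " + ocrEnabledOptIn(configured));
        System.out.println("default-on, empty context:      " + ocrEnabledDefaultOn(empty));
        System.out.println("default-on, configured context: " + ocrEnabledDefaultOn(configured));
    }
}
```

With the opt-in variant, a user who never touches the context never pays for 
OCR. With the fallback, every caller gets it unless they take steps to turn it 
off.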

I think the "ocr is On/Off..." log warning is fine; it could have helped 
Eugen and others.

> Extreme slow parsing on the attachment attached
> -----------------------------------------------
>
>                 Key: TIKA-2359
>                 URL: https://issues.apache.org/jira/browse/TIKA-2359
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Eugen Mayer
>         Attachments: Sample-doc-file-2000kb.doc
>
>
> Parsing this document takes 93s with 1.14, in either server or CLI mode.
> Java:
> java version "1.8.0_121"
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
> Debian jessie, 8GB RAM in a Docker container, current 3GHz Xeon (limited to 
> 2 cores), so decent hardware



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
