[ 
https://issues.apache.org/jira/browse/TIKA-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687660#comment-15687660
 ] 

Ashish Basran commented on TIKA-2180:
-------------------------------------

I tested with Word and Excel documents. I observed this in 1.13 too. 

I passed 22 documents to the Tika server for processing: two 5 MB documents 
and the rest less than 1 MB each. The following are the processing times in 
seconds (totals at the end) when processing the documents one after another 
and in parallel. I am not sure if this behavior is by design, but the 
difference in processing time is huge. 

Sequential (s)  Parallel (s)
77.4790976      22.6876726
0.9335904       17.9678267
0.8854624       26.0525849
5.0577852       15.5999804
0.8060567       26.6077107
0.7831427       17.7433509
0.8196296       26.7486071
0.7667276       26.7675274
0.7648827       26.8234494
0.7632169       22.8773994
0.8247712       16.9681799
0.9260035       26.9742814
79.6387803      21.0023846
0.7795755       14.0186599
0.7646085       27.0261048
0.8339278       26.0542291
0.8345049       15.0697296
0.8402716       24.0850932
0.7785933       20.1221993
0.9135003       13.1501129
0.9229104       170.2784636
0.8859913       178.3212539

178.0030304     782.9468017
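
For anyone who wants to reproduce a measurement like the one above, here is a 
minimal sketch, not the script used for these numbers. It assumes the server 
accepts a PUT of the raw file on the /tika endpoint and that it is reachable at 
the URL from the issue description. The function names and the thread-pool 
approach are illustrative choices only.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: endpoint taken from the issue description; adjust host/port as needed.
TIKA_URL = "http://localhost:8080/tika"


def extract_text(path, url=TIKA_URL):
    """PUT one file to the Tika server and return the extracted plain text."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def timed(send, path):
    """Return the elapsed seconds for a single call to send(path)."""
    start = time.perf_counter()
    send(path)
    return time.perf_counter() - start


def run_sequential(paths, send=extract_text):
    """Send the files one after another; return per-file times in seconds."""
    return [timed(send, p) for p in paths]


def run_parallel(paths, send=extract_text, workers=None):
    """Send all files concurrently; return per-file times in input order."""
    workers = workers or max(1, len(paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: timed(send, p), paths))
```

Calling run_sequential(paths) and run_parallel(paths) on the same list of file 
paths gives two columns like the ones above. Note that in the parallel case the 
per-request times overlap, so their sum is not the wall-clock time of the whole 
batch; timing the whole run_parallel call as well may give a fairer comparison.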


> Multiple requests on Tika to extract text slows down
> ----------------------------------------------------
>
>                 Key: TIKA-2180
>                 URL: https://issues.apache.org/jira/browse/TIKA-2180
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.13, 1.14
>         Environment: Windows OS, Open JDK, 4 core 32 GB RAM
>            Reporter: Ashish Basran
>
> I observed that if I send multiple requests to Tika (e.g. 
> http://localhost:8080/tika) with files of around 5 MB, Tika is very slow to 
> complete them. I tried with ~20 random files: it took 170 seconds to 
> process all the files in sequence. When I passed all the files in parallel, 
> it took around 780 seconds to process the same set of files. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)