[ https://issues.apache.org/jira/browse/TIKA-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687660#comment-15687660 ]
Ashish Basran commented on TIKA-2180:
-------------------------------------

I tested with Word and Excel documents, and I observed this in 1.13 too. I passed 22 documents to the Tika server for processing: two 5 MB documents and the rest under 1 MB each. The processing times in seconds are listed below for the run that processed the documents one after another and for the run that processed them in parallel (totals in the last row). I am not sure whether this behavior is by design, but the difference in processing time is huge.

Sequence        Parallel
 77.4790976      22.6876726
  0.9335904      17.9678267
  0.8854624      26.0525849
  5.0577852      15.5999804
  0.8060567      26.6077107
  0.7831427      17.7433509
  0.8196296      26.7486071
  0.7667276      26.7675274
  0.7648827      26.8234494
  0.7632169      22.8773994
  0.8247712      16.9681799
  0.9260035      26.9742814
 79.6387803      21.0023846
  0.7795755      14.0186599
  0.7646085      27.0261048
  0.8339278      26.0542291
  0.8345049      15.0697296
  0.8402716      24.0850932
  0.7785933      20.1221993
  0.9135003      13.1501129
  0.9229104     170.2784636
  0.8859913     178.3212539
-----------     -----------
178.0030304     782.9468017

> Multiple requests on Tika to extract text slows down
> ----------------------------------------------------
>
>                 Key: TIKA-2180
>                 URL: https://issues.apache.org/jira/browse/TIKA-2180
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.13, 1.14
>         Environment: Windows OS, Open JDK, 4 core 32 GB RAM
>            Reporter: Ashish Basran
>
> I observed that if I send multiple requests to Tika (eg.
> http://localhost:8080/tika) with around 5MB files, Tika is very slow in
> completing the action. I tried with ~20 random files, it took 170 seconds to
> process all the files in sequence. If I pass all files in parallel, it took
> around 780 seconds to process same set of files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
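
For anyone who wants to reproduce the comparison reported in this comment, here is a minimal sketch of how the two runs could be timed. It is not the script used for the numbers in the comment. It assumes the server accepts a PUT of the raw file at the http://localhost:8080/tika URL quoted in the issue, and the helper names (extract_text, run_sequential, run_parallel, compare) are made up for illustration.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Endpoint quoted in the issue; adjust to your own server (assumption).
TIKA_URL = "http://localhost:8080/tika"


def extract_text(path, url=TIKA_URL):
    """PUT one file to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def timed(process, item):
    """Return how many seconds process(item) took."""
    start = time.perf_counter()
    process(item)
    return time.perf_counter() - start


def run_sequential(process, items):
    """Process items one after another; return per-item durations."""
    return [timed(process, item) for item in items]


def run_parallel(process, items):
    """Process all items at once, one thread per item; return per-item durations."""
    if not items:
        return []
    with ThreadPoolExecutor(max_workers=len(items)) as pool:
        return list(pool.map(partial(timed, process), items))


def compare(process, items):
    """Run both modes and also record wall-clock time for the parallel run."""
    sequential = run_sequential(process, items)
    start = time.perf_counter()
    parallel = run_parallel(process, items)
    parallel_wall = time.perf_counter() - start
    return {
        "sequential": sequential,
        "parallel": parallel,
        "sequential_total": sum(sequential),
        "parallel_total": sum(parallel),
        "parallel_wall": parallel_wall,
    }
```

Calling compare(extract_text, paths) with a list of file paths returns the per-request durations for both runs. The totals row in the table above is the sum of each column, so for the parallel run it adds up per-request durations that overlap in time. The sketch therefore also records the wall-clock time of the parallel run, which may be a fairer figure to set against the sequential total.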