[
https://issues.apache.org/jira/browse/TIKA-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355164#comment-17355164
]
Tim Allison commented on TIKA-3429:
-----------------------------------
That sounds frustrating. It feels to me like detection, which relies on loading
that 7k-line XML file of MIME types, is at the heart of what Tika does. Which
components of Tika are you using that don't require it?
If the problem is that Tika should error out earlier or if there's a common use
case for not needing detection, we should fix that. What's your sense of what
a fix would look like?
> Performance problems partially caused by tika eagerly loading configuration
> ---------------------------------------------------------------------------
>
> Key: TIKA-3429
> URL: https://issues.apache.org/jira/browse/TIKA-3429
> Project: Tika
> Issue Type: New Feature
> Reporter: Caleb Cushing
> Priority: Major
>
> referencing
> https://github.com/spring-projects/spring-boot/issues/26709#issuecomment-851953515
> {quote}
> the tika configuration (eagerly loading a 7K lines XML file)
> {quote}
> Here's the text of that issue:
> I'm not sure the problem is Spring Boot, but I'm having problems finding it.
> The JAR is currently taking 3 seconds (9 if I leave out tiered) to run on my
> system, just to error out due to missing options and do nothing.
> https://github.com/xenoterracide/brix/tree/8e3d86bcf773e564cc24b51572b0bbd8bb60b73f
> {code}
> time java -Xverify:none -XX:TieredStopAtLevel=1 -jar modules/app/build/libs/app-0.1.0.jar
> # brix -> ccushing/copy-5-1
> Missing required parameters: '<language>', '<moduleType>', '<project>'
> Usage: <main class> [--repo=<repo>] [--workdir=<workdir>] <language>
>                     <moduleType> <project> [COMMAND]
>       <language>            The programming language you're generating code
>                               for. Directory under --dir
>       <moduleType>          The type of code you're generating e.g controller,
>                               also the name of the config file without the
>                               extension.
>       <project>             The name of the project you're generating code
>                               for.
>                             The name of the module to be created within the
>                               project.
>       --repo=<repo>         Repository path from the current working
>                               directory.
>                             Templates and configs are looked up relative to
>                               here. If the config isn't found here, then we
>                               will search ~/.config/brix
>       --workdir=<workdir>   The working directory you want your destination
>                               paths to be relative to. Defaults to current
>                               working directory
>                               Default:
> Commands:
>   run
> java -Xverify:none -XX:TieredStopAtLevel=1 -jar  3.15s user 0.26s system 142% cpu 2.386 total
> {code}
> Since it's a CLI app, lazy init isn't helpful. This is worded like a question
> (one that really would not be suitable for Stack Overflow; I hate that SO is
> the support forum for things now. It's terrible because of the attitude that
> the objective is not to help people, and it's bad at getting answers for
> harder problems. Spring should get a Discourse or something again), but I
> also know I had a Tika CLI app in the past that loaded in less than 1s
> without Tiered, so I'm also concerned it's a Spring Boot bug. I'm going to
> connect a profiler later to see what I can find, but I'm not sure that will
> do it.
> {code}
> Fedora 33
> 5.11.16-200.fc33.x86_64
> 14:08:34 up 3 days, 2:04, 1 user, load average: 0.79, 1.10, 1.66
>        total   used   free   shared   buff/cache   available
> Mem:     15G    11G   1.0G     1.4G         3.0G        2.3G
> Swap:    12G   1.5G    10G
> {code}