[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config
[ https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446147#comment-17446147 ]

Hudson commented on TIKA-3575:
-----------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #363 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/363/])
TIKA-3575 -- let users configure ignoring load errors in TikaConfig (tallison: [https://github.com/apache/tika/commit/c91e96faddedceef6266609934f45d3d65e8fde4])
* (edit) tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
* (edit) CHANGES.txt


> Cannot use loadErrorHandler="ignore" in tika config
> ---------------------------------------------------
>
> Key: TIKA-3575
> URL: https://issues.apache.org/jira/browse/TIKA-3575
> Project: Tika
> Issue Type: Bug
> Components: config
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Andreas Hubold
> Priority: Major
> Labels: regression
> Fix For: 2.1.1
>
>
> Tika 2.0.0 changed the default error handler to throw exceptions, and no
> longer ignores errors when loading parsers, as Tika 1.x did.
> See
> [https://github.com/apache/tika/commit/e47c6cd62e587fdaae7e2e999f37122d09449754#diff-3955d56f4d95c6e600966c486c58f92483c900d32d553d18b3cf2940cbf2c768R470|https://github.com/apache/tika/commit/e47c6cd62e587fdaae7e2e999f37122d09449754#diff-3955d56f4d95c6e600966c486c58f92483c900d32d553d18b3cf2940cbf2c768R470]
> There's no configuration option to restore the previous behavior. It should
> be possible to set
> {code}
> <service-loader loadErrorHandler="ignore"/>
> {code}
> but the code in org.apache.tika.config.TikaConfig#serviceLoaderFromDomElement
> only considers "warn" and "throw" as possible values.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
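To make the reported gap concrete, here is a minimal, self-contained Java sketch of attribute parsing that accepts "ignore" alongside "warn" and "throw". It does not use Tika's classes: `LoadErrorPolicy` and `LoadErrorPolicyParser.parse` are hypothetical stand-ins for `org.apache.tika.config.LoadErrorHandler` and the logic in `TikaConfig#serviceLoaderFromDomElement`. Case-insensitive matching and rejecting unknown values are assumptions of the sketch, not confirmed Tika behavior.

```java
import java.util.Locale;

// Hypothetical stand-in for the IGNORE / WARN / THROW handlers on
// org.apache.tika.config.LoadErrorHandler; this is not Tika's real type.
enum LoadErrorPolicy { IGNORE, WARN, THROW }

final class LoadErrorPolicyParser {
    private LoadErrorPolicyParser() {}

    // Maps a loadErrorHandler attribute value to a policy. A missing or
    // blank value falls back to the supplied default; matching is
    // case-insensitive (an assumption), and unknown values are rejected.
    static LoadErrorPolicy parse(String value, LoadErrorPolicy defaultPolicy) {
        if (value == null || value.trim().isEmpty()) {
            return defaultPolicy;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "ignore":
                return LoadErrorPolicy.IGNORE; // the value 2.0.0 and 2.1.0 did not accept
            case "warn":
                return LoadErrorPolicy.WARN;
            case "throw":
                return LoadErrorPolicy.THROW;
            default:
                throw new IllegalArgumentException("Unknown loadErrorHandler value: " + value);
        }
    }
}
```

With this sketch, `LoadErrorPolicyParser.parse("ignore", LoadErrorPolicy.THROW)` yields `IGNORE`, while a missing attribute (`null`) falls back to whatever default the caller passes in.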
[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config
[ https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446101#comment-17446101 ]

Tim Allison commented on TIKA-3575:
----------------------------------

Got it. I'm sorry for my delay. I've added back IGNORE as an option. Please let me know if this doesn't fix the problem. Thank you, again.
[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config
[ https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429131#comment-17429131 ]

Andreas Hubold commented on TIKA-3575:
-------------------------------------

Thanks [~tallison], I'd suggest that we

* either change the default for loadErrorHandler in TikaConfig#serviceLoaderFromDomElement back to IGNORE (this would be my preferred choice, and a very simple change),
* or keep the default at THROW but extend #serviceLoaderFromDomElement to check for a value of "ignore" in the attribute and respect it.

And if the default is now THROW, it should also be the default when no service-loader element is specified; otherwise it feels inconsistent and could surprise users. If you search for org.apache.tika.config.LoadErrorHandler#IGNORE, you can see that it's still the default in some places.

{quote}The goal was to allow finer-grained module selection so that you'd never have load errors that you'd want to ignore.
{quote}

I really like the separation into modules in Tika 2.x. That's a great improvement!

Our use case for LoadErrorHandler#IGNORE: it can still be useful to include a module but exclude some of its parsers/dependencies. For example, we're using tika-parser-code-module but don't need the Matlab parser or SAS7BDATParser, so we want to exclude the parso and jmatio dependencies to reduce the number of dependencies. It's a nice feature that this disables those parsers without any additional configuration in tika config (and our downstream users can simply add the dependencies to enable the parsers without touching the configuration).

I think it's a good idea to bundle different parsers into logical modules, like the different code parsers in tika-parser-code-module. But sometimes that may not be fine-grained enough, and that's where LoadErrorHandler#IGNORE plays a nice role, IMHO.
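The use case above relies on a missing optional dependency quietly disabling a parser. The following Java sketch illustrates that pattern in isolation; it is not Tika's service-loading code, and `MissingParserPolicy` and `OptionalParserLoader` are hypothetical names. It uses plain `Class.forName` and assumes a load failure surfaces either as `ClassNotFoundException` or as a `LinkageError` such as `NoClassDefFoundError` (for example, when a parser class is present but a library it depends on has been excluded).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy type mirroring the IGNORE / WARN / THROW choice;
// not Tika's LoadErrorHandler.
enum MissingParserPolicy { IGNORE, WARN, THROW }

final class OptionalParserLoader {
    private OptionalParserLoader() {}

    // Tries to load each class name and returns the classes that loaded.
    // Failures (a missing class, or a class whose own dependencies are
    // missing) are handled according to the policy.
    static List<Class<?>> loadAvailable(List<String> classNames, MissingParserPolicy policy) {
        List<Class<?>> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name));
            } catch (ClassNotFoundException | LinkageError e) {
                switch (policy) {
                    case IGNORE:
                        break; // skip silently: the parser is simply unavailable
                    case WARN:
                        System.err.println("Skipping " + name + ": " + e);
                        break;
                    case THROW:
                        throw new IllegalStateException("Could not load " + name, e);
                }
            }
        }
        return loaded;
    }
}
```

Under `IGNORE`, removing a dependency from the classpath is enough to drop the affected parser, which matches the "no extra configuration" behavior described in the comment; under `THROW`, the same classpath change fails startup instead.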
[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config
[ https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428908#comment-17428908 ]

Tim Allison commented on TIKA-3575:
----------------------------------

{noformat}
I had this declaration to avoid warnings from the PDFParser in previous Tika versions, but that's not necessary anymore with Tika 2.x.
{noformat}

Yes, sorry, I think that's what I was thinking when I made that change at some point during the development of 2.x. The goal was to allow finer-grained module selection so that you'd never have load errors that you'd want to ignore.

So, what do we do now? Better documentation, or change something in the code?

Thank you for raising this, and I'm sorry for the surprise.
[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config
[ https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428718#comment-17428718 ]

Andreas Hubold commented on TIKA-3575:
-------------------------------------

After looking more into this, I saw that LoadErrorHandler.THROW is only used as the default if a `<service-loader>` element is specified. Otherwise, the default is still IGNORE. So maybe the default should just be changed back to IGNORE.

BTW, I ran into this with the following declaration:

{code:java}
{code}

But it seems that I can simply remove the whole service-loader element to avoid the problem. IIUC, the InitializableProblemHandler isn't called by any predefined class anymore anyway. I had this declaration to avoid warnings from the PDFParser in previous Tika versions, but that's not necessary anymore with Tika 2.x.
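The inconsistency described in this comment comes down to how the default is chosen. The Java sketch below models it under stated assumptions: `HandlerChoice` and `LoadErrorDefaults.effective` are hypothetical names, the two default parameters stand in for whatever TikaConfig actually does, and an explicit attribute value is assumed to override either default. It is not Tika's implementation.

```java
import java.util.Locale;

// Hypothetical enum mirroring the IGNORE / WARN / THROW handlers.
enum HandlerChoice { IGNORE, WARN, THROW }

final class LoadErrorDefaults {
    private LoadErrorDefaults() {}

    // Resolves the effective handler. An explicit attribute value wins;
    // otherwise the default depends on whether a <service-loader> element
    // exists. Unknown values make valueOf throw IllegalArgumentException.
    static HandlerChoice effective(boolean serviceLoaderPresent, String attributeValue,
                                   HandlerChoice defaultWhenPresent,
                                   HandlerChoice defaultWhenAbsent) {
        if (!serviceLoaderPresent) {
            return defaultWhenAbsent;
        }
        if (attributeValue == null || attributeValue.trim().isEmpty()) {
            return defaultWhenPresent;
        }
        return HandlerChoice.valueOf(attributeValue.trim().toUpperCase(Locale.ROOT));
    }
}
```

With `THROW` and `IGNORE` as the two defaults (the behavior reported for 2.0.0 and 2.1.0), merely adding an empty `<service-loader>` element flips the outcome from `IGNORE` to `THROW`. Passing the same value for both defaults removes that surprise, which is the consistency argued for in the thread.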