[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config

2021-11-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446147#comment-17446147
 ] 

Hudson commented on TIKA-3575:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #363 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/363/])
TIKA-3575 -- let users configure ignoring load errors in TikaConfig (tallison: 
[https://github.com/apache/tika/commit/c91e96faddedceef6266609934f45d3d65e8fde4])
* (edit) tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
* (edit) CHANGES.txt


> Cannot use loadErrorHandler="ignore" in tika config
> ---
>
> Key: TIKA-3575
> URL: https://issues.apache.org/jira/browse/TIKA-3575
> Project: Tika
> Issue Type: Bug
> Components: config
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Andreas Hubold
> Priority: Major
> Labels: regression
> Fix For: 2.1.1
>
>
> Tika 2.0.0 changed the default error handler to throw exceptions, so it no 
> longer ignores errors when loading parsers, as was the case with Tika 1.x.
> See  
> [https://github.com/apache/tika/commit/e47c6cd62e587fdaae7e2e999f37122d09449754#diff-3955d56f4d95c6e600966c486c58f92483c900d32d553d18b3cf2940cbf2c768R470|https://github.com/apache/tika/commit/e47c6cd62e587fdaae7e2e999f37122d09449754#diff-3955d56f4d95c6e600966c486c58f92483c900d32d553d18b3cf2940cbf2c768R470]
> There's no configuration option to restore the previous behavior. It should 
> be possible to set
> {code}
> <service-loader loadErrorHandler="ignore"/>
> {code}
> but the code in org.apache.tika.config.TikaConfig#serviceLoaderFromDomElement 
> only considers "warn" and "throw" as possible values.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config

2021-11-18 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446101#comment-17446101
 ] 

Tim Allison commented on TIKA-3575:
---

Got it.  I'm sorry for my delay.  I've added back IGNORE as an option.  Please 
let me know if this doesn't fix the problem.  Thank you, again.



[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config

2021-10-14 Thread Andreas Hubold (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429131#comment-17429131
 ] 

Andreas Hubold commented on TIKA-3575:
--

Thanks [~tallison], I'd suggest one of the following:
 * Either change the default for loadErrorHandler in 
TikaConfig#serviceLoaderFromDomElement back to IGNORE. (This would be my 
preferred choice, and it is a very simple change.)
 * Or keep the default at THROW but extend #serviceLoaderFromDomElement to 
check for a value of "ignore" in the attribute and respect that. If the 
default is THROW now, it should also be the default when no service-loader 
element is specified; otherwise it feels inconsistent and could surprise users. 
If you search for org.apache.tika.config.LoadErrorHandler#IGNORE, you can see 
that it's still the default in some places.
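
The second option could look roughly like the following. This is a minimal standalone sketch, not the Tika source: the `Handler` enum and the `fromAttribute` method are hypothetical stand-ins for `LoadErrorHandler` and for the attribute handling in `TikaConfig#serviceLoaderFromDomElement`, and rejecting unknown values is an assumption rather than documented behavior.

```java
import java.util.Locale;

// Hypothetical stand-in for the attribute handling discussed above; not Tika code.
final class LoadErrorHandlerAttributeSketch {

    // Mirrors the three policies named in this thread.
    enum Handler { IGNORE, WARN, THROW }

    // Maps the loadErrorHandler attribute value to a policy, case-insensitively.
    // A missing or blank attribute falls back to the supplied default.
    static Handler fromAttribute(String value, Handler defaultHandler) {
        if (value == null || value.trim().isEmpty()) {
            return defaultHandler;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "ignore":
                return Handler.IGNORE;
            case "warn":
                return Handler.WARN;
            case "throw":
                return Handler.THROW;
            default:
                // Assumption: fail fast on typos instead of silently using the default.
                throw new IllegalArgumentException(
                        "Unknown loadErrorHandler value: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromAttribute("ignore", Handler.THROW));
        System.out.println(fromAttribute(null, Handler.THROW));
    }
}
```

Passing the default in as a parameter keeps the choice between IGNORE and THROW in one place, which is the consistency concern raised in the second bullet.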

{quote}The goal was to allow finer-grained module selection so that you'd never 
have load errors that you'd want to ignore.
{quote}
I really like the separation into modules in Tika 2.x. That's a great 
improvement!

Our use case for LoadErrorHandler#IGNORE: It can still be useful to include a 
module but exclude some of its parsers/dependencies. For example, we're using 
tika-parser-code-module but don't need the Matlab parser or SAS7BDATParser, so 
we want to exclude the parso and jmatio dependencies to reduce the number of 
dependencies. It's a nice feature that this disables the parsers without 
requiring additional configuration in the Tika config (and our downstream users 
could simply add dependencies to enable parsers without touching configuration).

I think it's a good idea to bundle different parsers into logical modules, like 
the different code parsers in tika-parser-code-module. But sometimes that may 
not be fine-grained enough, and that's where LoadErrorHandler#IGNORE plays a 
nice role, IMHO.
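
To illustrate why an IGNORE policy makes excluded dependencies harmless, here is a small standalone sketch of loading classes by name under each policy. It does not use Tika; `Policy`, `loadAvailable`, and the class name `com.example.missing.MatlabParserStandIn` are hypothetical and only model the behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of loading optional parsers by class name; not Tika code.
final class OptionalParserLoadingSketch {

    enum Policy { IGNORE, WARN, THROW }

    // Tries to load every named class and applies the policy to each failure.
    static List<Class<?>> loadAvailable(List<String> classNames, Policy policy) {
        List<Class<?>> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name));
            } catch (ClassNotFoundException | LinkageError e) {
                switch (policy) {
                    case IGNORE:
                        // The class or one of its dependencies is absent: skip it quietly.
                        break;
                    case WARN:
                        System.err.println("Skipping " + name + ": " + e);
                        break;
                    default:
                        // THROW: a single missing class aborts the whole load.
                        throw new IllegalStateException("Unable to load " + name, e);
                }
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "java.util.ArrayList",                      // present on every JVM
                "com.example.missing.MatlabParserStandIn"); // deliberately absent
        System.out.println(loadAvailable(names, Policy.IGNORE));
    }
}
```

With IGNORE, adding the missing dependency back to the classpath is enough to enable the corresponding class, with no configuration change.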



[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config

2021-10-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428908#comment-17428908
 ] 

Tim Allison commented on TIKA-3575:
---

{noformat}
I had this declaration to avoid warnings from the PDFParser in previous Tika 
versions, but that's not necessary anymore with Tika 2.x.
{noformat}
Yes, sorry, I think that's what I had in mind when I made that change at some 
point during the development of 2.x.  The goal was to allow finer-grained 
module selection so that you'd never have load errors that you'd want to ignore.

So, what do we do now?  Better documentation, or change something in the code?  
Thank you for raising this, and I'm sorry for the surprise.



[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config

2021-10-14 Thread Andreas Hubold (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428718#comment-17428718
 ] 

Andreas Hubold commented on TIKA-3575:
--

After looking more into this, I saw that LoadErrorHandler.THROW is only used 
as the default if a `<service-loader>` element is specified. Otherwise, the 
default is still IGNORE. So maybe the default should just be changed back to 
IGNORE.

BTW, I ran into this with the following declaration:
{code:java}
 {code}
But it seems I can simply remove the whole service-loader element to avoid 
the problem. IIUC, the InitializableProblemHandler isn't called by any 
predefined class anymore anyway. I had this declaration to avoid warnings from 
the PDFParser in previous Tika versions, but that's no longer necessary with 
Tika 2.x.
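
The inconsistency described above can be modeled with a few lines of DOM code. This is a sketch under stated assumptions, not the Tika implementation: `effectiveHandler` and `Handler` are hypothetical, the `<properties>` root is assumed, and accepting "ignore" as an attribute value reflects what this issue asks for rather than what 2.0.0 and 2.1.0 do.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical model of how the effective default depends on the element's presence.
final class ServiceLoaderDefaultSketch {

    enum Handler { IGNORE, WARN, THROW }

    static Handler effectiveHandler(String configXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Config is not well-formed XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("service-loader");
        if (nodes.getLength() == 0) {
            // No service-loader element at all: the default is still IGNORE.
            return Handler.IGNORE;
        }
        Element serviceLoader = (Element) nodes.item(0);
        if (!serviceLoader.hasAttribute("loadErrorHandler")) {
            // Element present but attribute missing: the default becomes THROW.
            return Handler.THROW;
        }
        // Requested behavior: honor "ignore", "warn", and "throw" alike.
        return Handler.valueOf(
                serviceLoader.getAttribute("loadErrorHandler").toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(effectiveHandler("<properties/>"));
        System.out.println(effectiveHandler("<properties><service-loader/></properties>"));
    }
}
```

The two calls in `main` differ only in whether the element exists, yet they yield different policies, which is why removing the element works around the problem.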
