[jira] [Commented] (TIKA-2245) Standardise logging

2023-03-08 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698065#comment-17698065
 ] 

Konstantin Gribov commented on TIKA-2245:
-

[~lfcnassif], yes, {{commons-logging}} must be excluded when using 
{{jcl-over-slf4j}}. With the Log4j2 bridge 
{{org.apache.logging.log4j:log4j-jcl}} it's the opposite ({{commons-logging}} 
must be present on the classpath), but there is still no need for an explicit 
dependency: it is brought in transitively. 

{{jackcess}} should stay on {{commons-logging}} (at least until a release with 
breaking changes), but on our side we should add the exclusion and 
{{jcl-over-slf4j}}. 

Not sure if it should be done at the {{tika-parser-*-module}} level though. I'd 
prefer to have it in {{tika-parsers-standard-package}} only.

That way, advanced downstream users who choose one of the fine-grained 
{{tika-parser-*-module}}s add either {{jcl-over-slf4j}} or {{log4j-jcl}} to 
their classpath themselves. For the more mainstream usage, 
{{tika-parsers-standard-package}} brings a convenient bridge without much hassle.
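
For downstream users on Gradle, the two options above could look roughly like 
this. It is only a sketch: the module name and all versions are placeholders, 
and it assumes the chosen parser module pulls in {{commons-logging}} 
transitively (e.g. via {{jackcess}}).

{code:kotlin|title=build.gradle.kts}
plugins {
  `java-library`
}

repositories {
  mavenCentral()
}

dependencies {
  // A fine-grained parser module; name and version are placeholders.
  implementation("org.apache.tika:tika-parser-microsoft-module:2.6.0")

  // Option A: route commons-logging calls to SLF4J.
  // The version is a placeholder; align it with your slf4j-api version.
  runtimeOnly("org.slf4j:jcl-over-slf4j:1.7.36")

  // Option B (instead of A): route them to Log4j2 and drop the exclusion
  // below, because log4j-jcl needs commons-logging on the classpath.
  // runtimeOnly("org.apache.logging.log4j:log4j-jcl:2.19.0")
}

configurations.all {
  // Only for option A: the real commons-logging must not sit on the
  // classpath next to jcl-over-slf4j.
  exclude(group = "commons-logging", module = "commons-logging")
}
{code}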

I updated the [Logging wiki 
page|https://cwiki.apache.org/confluence/display/TIKA/Logging] after 2.6.0 to 
more or less reflect the current state of affairs. Maybe I should migrate it to 
{{src/site}} in the future. The Confluence editor is such a pain in the arse 
when adding or editing code blocks if you have more than one on a wiki page.

> Standardise logging
> ---
>
> Key: TIKA-2245
> URL: https://issues.apache.org/jira/browse/TIKA-2245
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.14, 1.15
>Reporter: Matthew Caruana Galizia
>Assignee: Konstantin Gribov
>Priority: Major
>  Labels: logging
> Fix For: 1.15
>
>
> Tika parsers sometimes use Log4j's Logger, sometimes the JUL 
> (java.util.logging) Logger and sometimes SLF4j.
> It would be better to standardise on a single facade, for the sake of not 
> having to configure multiple loggers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (TIKA-3934) Reogranize POMs parent chain to avoid leaking dependency management downstream

2022-11-19 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636242#comment-17636242
 ] 

Konstantin Gribov edited comment on TIKA-3934 at 11/19/22 10:31 PM:


It seems that it doesn't, as long as the dependency isn't used in the Tika 
artifact in any way (including test dependencies).

If I import {{org.apache.tika:tika-bom}} and add 
{{org.apache.tika:tika-core}} and {{io.netty:netty-buffer}} without versions, 
both the Maven and Gradle builds fail.

On the other hand, the {{log4j-core}} version (and the version constraint in 
the Gradle case) leaks from {{tika-parent}} via {{tika-bom}}. In the Maven case 
it does so inconsistently.

||Type||Use BOM||tika-core||log4j-core||Result||
|Maven|yes|-|-|log4j-api 2.19.0, log4j-core 2.19.0|
|Maven|yes|-|2.18.0|log4j-api 2.19.0, log4j-core 2.18.0|
|Maven|no|2.6.0|2.18.0|log4j-api 2.18.0, log4j-core 2.18.0|
|Gradle|yes|-|-|log4j-api 2.19.0, log4j-core 2.19.0|
|Gradle|yes|-|2.18.0|log4j-api 2.19.0, log4j-core 2.19.0|
|Gradle|no|2.6.0|2.18.0|log4j-api 2.18.0, log4j-core 2.18.0|

Test Maven project (run {{mvn package}} to see actual dependencies in the 
output):

{code:xml|title=pom.xml}

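<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>bom-test</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-bom</artifactId>
        <version>2.6.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
    </dependency>

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>3.3.0</version>
        <executions>
          <execution>
            <id>test</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/deps</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>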
{code}

Gradle test project (run {{gradle dependencyInsight --dependency log4j}} or 
{{gradle dependencies --configuration rC}}):

{code:groovy|title=settings.gradle.kts}
dependencyResolutionManagement {
  repositories.mavenCentral()
}
{code}

{code:groovy|title=build.gradle.kts}
plugins {
  id("java-library")
}

dependencies {
  api(platform("org.apache.tika:tika-bom:2.6.0"))
  api("org.apache.tika:tika-core")
  implementation("org.apache.logging.log4j:log4j-core:2.18.0")
}
{code}
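
The Gradle row where 2.18.0 is requested but 2.19.0 is resolved is presumably 
Gradle picking the highest of the requested version and the BOM constraint. A 
sketch of how a downstream build might pin the lower version anyway, using 
Gradle's rich version {{strictly}} declaration; this is an assumption about a 
possible workaround and was not run against the test project above:

{code:kotlin|title=build.gradle.kts}
plugins {
  `java-library`
}

dependencies {
  api(platform("org.apache.tika:tika-bom:2.6.0"))
  api("org.apache.tika:tika-core")

  // "strictly" on a directly declared dependency is allowed to downgrade
  // below what the imported platform recommends.
  implementation("org.apache.logging.log4j:log4j-core") {
    version {
      strictly("2.18.0")
    }
  }
}
{code}

Note that {{log4j-api}} would still follow the BOM constraint unless it is 
pinned the same way.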


was (Author: grossws):
It seems that it doesn't: if I import {{org.apache.tika:tika-bom}} and add 
{{org.apache.tika:tika-core}} and {{io.netty:netty-buffer}} without versions, 
both the Maven and Gradle builds fail.

On the other hand, the {{log4j-core}} version (and the version constraint in 
the Gradle case) leaks from {{tika-parent}} via {{tika-bom}}.

||Type||Use BOM||tika-core||log4j-core||Result||
|Maven|yes|-|-|log4j-api 2.19.0, log4j-core 2.19.0|
|Maven|yes|-|2.18.0|log4j-api 2.19.0, log4j-core 2.18.0|
|Maven|no|2.6.0|2.18.0|log4j-api 2.18.0, log4j-core 2.18.0|
|Gradle|yes|-|-|log4j-api 2.19.0, log4j-core 2.19.0|
|Gradle|yes|-|2.18.0|log4j-api 2.19.0, log4j-core 2.19.0|
|Gradle|no|2.6.0|2.18.0|log4j-api 2.18.0, log4j-core 2.18.0|

Test Maven project (run {{mvn package}} to see actual dependencies in the 
output):

{code:xml|title=pom.xml}

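<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>bom-test</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-bom</artifactId>
        <version>2.6.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
    </dependency>

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>3.3.0</version>
        <executions>
          <execution>
            <id>test</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/deps</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>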
{code}

Gradle test project (run {{gradle dependencyInsight --dependency log4j}} or 
{{gradle dependencies --configuration rC}}):

{code:kotlin|title=settings.gradle.kts}
dependencyResolutionManagement {
  repositories.mavenCentral()
}
{code}

{code:kotlin|title=build.gradle.kts}
plugins {
  `java-library`
}

dependencies {
  api(platform("org.apache.tika:tika-bom:2.6.0"))
  api("org.apache.tika:tika-core")
  implementation("org.apache.logging.log4j:log4j-core:2.18.0")
}
{code}

> Reogranize POMs parent chain to avoid leaking dependency management downstream
> --
>
> Key: TIKA-3934
> URL: 

[jira] [Commented] (TIKA-3934) Reogranize POMs parent chain to avoid leaking dependency management downstream

2022-11-19 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636242#comment-17636242
 ] 

Konstantin Gribov commented on TIKA-3934:
-

It seems that it doesn't: if I import {{org.apache.tika:tika-bom}} and add 
{{org.apache.tika:tika-core}} and {{io.netty:netty-buffer}} without versions, 
both the Maven and Gradle builds fail.

On the other hand, the {{log4j-core}} version (and the version constraint in 
the Gradle case) leaks from {{tika-parent}} via {{tika-bom}}.

||Type||Use BOM||tika-core||log4j-core||Result||
|Maven|yes|-|-|log4j-api 2.19.0, log4j-core 2.19.0|
|Maven|yes|-|2.18.0|log4j-api 2.19.0, log4j-core 2.18.0|
|Maven|no|2.6.0|2.18.0|log4j-api 2.18.0, log4j-core 2.18.0|
|Gradle|yes|-|-|log4j-api 2.19.0, log4j-core 2.19.0|
|Gradle|yes|-|2.18.0|log4j-api 2.19.0, log4j-core 2.19.0|
|Gradle|no|2.6.0|2.18.0|log4j-api 2.18.0, log4j-core 2.18.0|

Test Maven project (run {{mvn package}} to see actual dependencies in the 
output):

{code:xml|title=pom.xml}

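<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>bom-test</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-bom</artifactId>
        <version>2.6.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
    </dependency>

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>3.3.0</version>
        <executions>
          <execution>
            <id>test</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/deps</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>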
{code}

Gradle test project (run {{gradle dependencyInsight --dependency log4j}} or 
{{gradle dependencies --configuration rC}}):

{code:kotlin|title=settings.gradle.kts}
dependencyResolutionManagement {
  repositories.mavenCentral()
}
{code}

{code:kotlin|title=build.gradle.kts}
plugins {
  `java-library`
}

dependencies {
  api(platform("org.apache.tika:tika-bom:2.6.0"))
  api("org.apache.tika:tika-core")
  implementation("org.apache.logging.log4j:log4j-core:2.18.0")
}
{code}

> Reogranize POMs parent chain to avoid leaking dependency management downstream
> --
>
> Key: TIKA-3934
> URL: https://issues.apache.org/jira/browse/TIKA-3934
> Project: Tika
>  Issue Type: Improvement
>  Components: depedency
>Affects Versions: 2.6.0
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.6.1, 2.7.0
>
>
> Tika's BOM (Bill of Materials) artifact has {{tika-parent}} as a parent POM 
> and thus forces a lot of dependency versions on downstream users. 
> For example, if one uses only the PDF module there's no reason to force 
> Netty/Jetty/CXF/whatever versions.
> I propose the following:
> * make the {{tika}} reactor depend on {{tika-parent}} and all other {{tika-*}} 
> modules on the reactor
> * move all our dependency management and build-related configuration to the 
> reactor ({{tika}} root project)
> I started this work last week and will publish the first PR for review soon. 
> Moving parts from {{tika-parent}} to {{tika}} may take some time, so small 
> steps without build disruption are a must.





[jira] [Commented] (TIKA-3934) Reogranize POMs parent chain to avoid leaking dependency management downstream

2022-11-19 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636239#comment-17636239
 ] 

Konstantin Gribov commented on TIKA-3934:
-

I need to recheck whether Maven inherits the parent's dependencyManagement via 
an imported BOM. Maybe this issue is invalid.

> Reogranize POMs parent chain to avoid leaking dependency management downstream
> --
>
> Key: TIKA-3934
> URL: https://issues.apache.org/jira/browse/TIKA-3934
> Project: Tika
>  Issue Type: Improvement
>  Components: depedency
>Affects Versions: 2.6.0
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.6.1, 2.7.0
>
>
> Tika's BOM (Bill of Materials) artifact has {{tika-parent}} as a parent POM 
> and thus forces a lot of dependency versions on downstream users. 
> For example, if one uses only the PDF module there's no reason to force 
> Netty/Jetty/CXF/whatever versions.
> I propose the following:
> * make the {{tika}} reactor depend on {{tika-parent}} and all other {{tika-*}} 
> modules on the reactor
> * move all our dependency management and build-related configuration to the 
> reactor ({{tika}} root project)
> I started this work last week and will publish the first PR for review soon. 
> Moving parts from {{tika-parent}} to {{tika}} may take some time, so small 
> steps without build disruption are a must.





[jira] [Commented] (TIKA-3735) Require Java 11 for 2.x at some point

2022-11-19 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636238#comment-17636238
 ] 

Konstantin Gribov commented on TIKA-3735:
-

Another thing that comes to mind is that we could require different JDK 
versions for downstream consumers of Tika and for building Tika itself 
(including tests).

Maybe even for some modules that are for internal use only, if we can consider 
any module internal.
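
Tika builds with Maven, but the idea is easy to sketch in Gradle terms: build 
(and test) on a newer JDK while still emitting bytecode that older consumers 
can load. The versions 11 and 8 below are only illustrative and are not a 
proposal for the actual numbers.

{code:kotlin|title=build.gradle.kts}
plugins {
  `java-library`
}

java {
  toolchain {
    // JDK used to compile and run the build, including tests.
    languageVersion.set(JavaLanguageVersion.of(11))
  }
}

tasks.compileJava {
  // Main sources still target Java 8 consumers (javac --release 8).
  options.release.set(8)
}
{code}

Test sources are left on the toolchain's own level here, so tests may use newer 
language features than the published artifact.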

> Require Java 11 for 2.x at some point
> -
>
> Key: TIKA-3735
> URL: https://issues.apache.org/jira/browse/TIKA-3735
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> This follows on from discussion we had on the user/dev list for when we want 
> to require Java 11.  I think the consensus was: wait until we have to.
> The following libraries require > Java 8 at the moment.  I don't think 
> updating any of these is critical, but I do want to document where we're 
> stuck.
> We can modify/edit this list as necessary:
> * Apache OpenNLP 2.0.0 requires Java 11.
> * DL4J 1.0.0-M2.1 - datavec-data-image-1.0.0-M2.1.jar requires Java 11
> * Lucene 9.x -- used in tika-eval
> * icu4j -- we can't upgrade past 62.2 (April 2019) because that is the latest 
> version that is compatible with Lucene 8.11.1 
> (https://github.com/apache/tika/pull/587)
> * mime4j -- the last 2 (or three?) releases have been accidentally built with 
> Java 9 without the correct release=8. This should be fixed in the next 
> release.
> * Fakeload
> * 
> [checkstyle|https://mail.google.com/mail/u/0/#label/lists%2Ftika/WhctKKXXHvjnJRRdBSwLbKkDkXQtRnWGDhblVMQQZhjsDGrFpRMRQJJrZSdskrNCqcmTtjL]
> * errorprone requires Java 11 for the build (doesn't mean we can't target 8)





[jira] [Closed] (TIKA-3175) Upgrade version of TPS: commons-io

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-3175.
---

> Upgrade version of TPS: commons-io
> --
>
> Key: TIKA-3175
> URL: https://issues.apache.org/jira/browse/TIKA-3175
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.23, 1.24, 1.24.1
>Reporter: Shubhangi Raut
>Priority: Critical
>
> Latest tika-bundle jars use commons-io-1.26.jar in them.
> There is a vulnerability reported for commons-io-2.6.jar which is fixed in 
> version 2.7.
> Details can be found in the following link:
> Project: https://issues.apache.org/jira/browse/IO-559
>  
> Please upgrade the version for commons-io to 2.7 in next release.





[jira] [Resolved] (TIKA-3175) Upgrade version of TPS: commons-io

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3175.
-
Resolution: Duplicate

> Upgrade version of TPS: commons-io
> --
>
> Key: TIKA-3175
> URL: https://issues.apache.org/jira/browse/TIKA-3175
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.23, 1.24, 1.24.1
>Reporter: Shubhangi Raut
>Priority: Critical
>
> Latest tika-bundle jars use commons-io-1.26.jar in them.
> There is a vulnerability reported for commons-io-2.6.jar which is fixed in 
> version 2.7.
> Details can be found in the following link:
> Project: https://issues.apache.org/jira/browse/IO-559
>  
> Please upgrade the version for commons-io to 2.7 in next release.





[jira] [Resolved] (TIKA-3387) Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3387.
-
Resolution: Incomplete

Please feel free to reopen the issue if it can be reproduced with a more recent 
Tika version (2.6.0 at the moment) and you can provide a bit more info.

> Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser
> ---
>
> Key: TIKA-3387
> URL: https://issues.apache.org/jira/browse/TIKA-3387
> Project: Tika
>  Issue Type: Bug
>  Components: parser
> Environment: dev testing
>Reporter: Manojkumar M
>Priority: Critical
>
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser@7b6359a0
>   
>   
>  This is the only exception trace we are getting in the code. 
>  
> This is what is put in the pom.xml:
>  
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-core</artifactId>
> </dependency>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parsers</artifactId>
>   <exclusions>
>     <exclusion>
>       <groupId>com.fasterxml.jackson.core</groupId>
>       <artifactId>jackson-core</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>com.fasterxml.jackson.core</groupId>
>       <artifactId>jackson-annotations</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> {color:#FF}*Version*{color}
> tika-parsers: 1.24.1
> poi-ooxml: 4.1.2





[jira] [Resolved] (TIKA-3712) update jackson-databind to 2.13.2.1 or greater in tika jars

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3712.
-
Resolution: Fixed

> update jackson-databind to 2.13.2.1 or greater in tika jars
> ---
>
> Key: TIKA-3712
> URL: https://issues.apache.org/jira/browse/TIKA-3712
> Project: Tika
>  Issue Type: Bug
>  Components: tika-eval
>Affects Versions: 2.3.0
>Reporter: Dhoka Pramod
>Priority: Critical
> Fix For: 2.4.1
>
>
> [com.fasterxml.jackson.core_jackson-databind_2.13.1|https://austsbldci-res.lab.opentext.com/static-files/FKgXaaJSguhZ4lO6UfpswhoSmhYTiF2UyQU-rrbduGUxNjQ4NzM4OTIzNDgzOjg6aHNjaGVpYm46dmlldy9UZWFtU2l0ZS9qb2IvRG9ja2VySW1hZ2UtVFMyMi4yL2xhc3RTdWNjZXNzZnVsQnVpbGQvYXJ0aWZhY3Q=/twistlock-report.html#sha256:55f19c5712346e29554e65473ac7c1ef988a2ae2fe1ffa71035426183d4ad4e9_com.fasterxml.jackson.core_jackson-databind_2.13.1]
>  in tika eval app is of version 2.13.1 which has 
> [CVE-2020-36518|https://nvd.nist.gov/vuln/detail/CVE-2020-36518] 
> vulnerability.
> jackson databind jars needs to be updated to *2.13.2.1 or greater.*





[jira] [Updated] (TIKA-3712) update jackson-databind to 2.13.2.1 or greater in tika jars

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3712:

Fix Version/s: 2.4.1

> update jackson-databind to 2.13.2.1 or greater in tika jars
> ---
>
> Key: TIKA-3712
> URL: https://issues.apache.org/jira/browse/TIKA-3712
> Project: Tika
>  Issue Type: Bug
>  Components: tika-eval
>Affects Versions: 2.3.0
>Reporter: Dhoka Pramod
>Priority: Critical
> Fix For: 2.4.1
>
>
> [com.fasterxml.jackson.core_jackson-databind_2.13.1|https://austsbldci-res.lab.opentext.com/static-files/FKgXaaJSguhZ4lO6UfpswhoSmhYTiF2UyQU-rrbduGUxNjQ4NzM4OTIzNDgzOjg6aHNjaGVpYm46dmlldy9UZWFtU2l0ZS9qb2IvRG9ja2VySW1hZ2UtVFMyMi4yL2xhc3RTdWNjZXNzZnVsQnVpbGQvYXJ0aWZhY3Q=/twistlock-report.html#sha256:55f19c5712346e29554e65473ac7c1ef988a2ae2fe1ffa71035426183d4ad4e9_com.fasterxml.jackson.core_jackson-databind_2.13.1]
>  in tika eval app is of version 2.13.1 which has 
> [CVE-2020-36518|https://nvd.nist.gov/vuln/detail/CVE-2020-36518] 
> vulnerability.
> jackson databind jars needs to be updated to *2.13.2.1 or greater.*





[jira] [Created] (TIKA-3935) Remove log4j 1.2.x from dependencies

2022-11-19 Thread Konstantin Gribov (Jira)
Konstantin Gribov created TIKA-3935:
---

 Summary: Remove log4j 1.2.x from dependencies
 Key: TIKA-3935
 URL: https://issues.apache.org/jira/browse/TIKA-3935
 Project: Tika
  Issue Type: Task
  Components: depedency
Affects Versions: 2.6.0
Reporter: Konstantin Gribov
Assignee: Konstantin Gribov
 Fix For: 2.6.1








[jira] [Commented] (TIKA-3324) Add checkstyle checker

2022-11-19 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636228#comment-17636228
 ] 

Konstantin Gribov commented on TIKA-3324:
-

I've clearly lost the fight against the checkstyle plugin. When I just run 
{{mvn checkstyle:checkstyle}} it fails on {{tika-core}} with something like 
5.7k errors.

What do you think about using [spotless|https://github.com/diffplug/spotless]? 
It supports a 
[ratchet|https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet] 
mode, which avoids reformatting all files at once and enforces formatting only 
on changed files. I'm going to experiment with it in a separate branch, for POMs 
at first.
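
For reference, ratchet mode in the Gradle flavour of the plugin (the one the 
link above documents) looks roughly like the sketch below. The plugin version, 
the branch name {{origin/main}} and the choice of formatter are assumptions for 
illustration; a Maven setup would need the equivalent plugin configuration.

{code:kotlin|title=build.gradle.kts}
plugins {
  java
  id("com.diffplug.spotless") version "6.11.0"
}

repositories {
  mavenCentral()
}

spotless {
  // Only files changed relative to this git ref are checked and formatted.
  ratchetFrom("origin/main")

  java {
    googleJavaFormat()
  }
}
{code}

With this in place {{gradle spotlessCheck}} only complains about files touched 
since the given ref, and {{gradle spotlessApply}} only rewrites those.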

> Add checkstyle checker
> --
>
> Key: TIKA-3324
> URL: https://issues.apache.org/jira/browse/TIKA-3324
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
>
> I _think_ we can introduce this gently at first. And slowly fix files as time 
> allows.  Obv, we can hope a bulk fix will work, and it won’t be much 
> effort... WDYT?
>  
> H/T [~ndipiazza_gmail]  for the recommendation.





[jira] [Updated] (TIKA-3934) Reogranize POMs parent chain to avoid leaking dependency management downstream

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3934:

Fix Version/s: 2.6.1

> Reogranize POMs parent chain to avoid leaking dependency management downstream
> --
>
> Key: TIKA-3934
> URL: https://issues.apache.org/jira/browse/TIKA-3934
> Project: Tika
>  Issue Type: Improvement
>  Components: depedency
>Affects Versions: 2.6.0
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.6.1, 2.7.0
>
>
> Tika's BOM (Bill of Materials) artifact has {{tika-parent}} as a parent POM 
> and thus forces a lot of dependency versions on downstream users. 
> For example, if one uses only the PDF module there's no reason to force 
> Netty/Jetty/CXF/whatever versions.
> I propose the following:
> * make the {{tika}} reactor depend on {{tika-parent}} and all other {{tika-*}} 
> modules on the reactor
> * move all our dependency management and build-related configuration to the 
> reactor ({{tika}} root project)
> I started this work last week and will publish the first PR for review soon. 
> Moving parts from {{tika-parent}} to {{tika}} may take some time, so small 
> steps without build disruption are a must.





[jira] [Updated] (TIKA-3934) Reogranize POMs parent chain to avoid leaking dependency management downstream

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3934:

Description: 
Tika's BOM (Bill of Materials) artifact has {{tika-parent}} as a parent POM and 
thus forces a lot of dependency versions on downstream users. 

For example, if one uses only the PDF module there's no reason to force 
Netty/Jetty/CXF/whatever versions.

I propose the following:
* make the {{tika}} reactor depend on {{tika-parent}} and all other {{tika-*}} 
modules on the reactor
* move all our dependency management and build-related configuration to the 
reactor ({{tika}} root project)

I started this work last week and will publish the first PR for review soon. 
Moving parts from {{tika-parent}} to {{tika}} may take some time, so small 
steps without build disruption are a must.

  was:
Tika's BOM (Bill of Materials) artifact has {{tika-parent}} as a parent POM and 
thus forces a lot of dependency versions on downstream users. 

For example if one use only PDF module there's no reason to force 
Netty/Jetty/CXF/whatever versions.

I propose the following:
* move all our dependency management and build related configuration to the 
reactor ({{tika}} root project)
* make {{tika}} rector depend on {{tika-parent}} and all other {{tika-*}} 
modules on the reactor

I've started these work last week and will publish first PR for review soon. 
Moving parts from {{tika-parent}} to {{tika}} may take some time so little 
steps without build disruption is a must


> Reogranize POMs parent chain to avoid leaking dependency management downstream
> --
>
> Key: TIKA-3934
> URL: https://issues.apache.org/jira/browse/TIKA-3934
> Project: Tika
>  Issue Type: Improvement
>  Components: depedency
>Affects Versions: 2.6.0
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.7.0
>
>
> Tika's BOM (Bill of Materials) artifact has {{tika-parent}} as a parent POM 
> and thus forces a lot of dependency versions on downstream users. 
> For example, if one uses only the PDF module there's no reason to force 
> Netty/Jetty/CXF/whatever versions.
> I propose the following:
> * make the {{tika}} reactor depend on {{tika-parent}} and all other {{tika-*}} 
> modules on the reactor
> * move all our dependency management and build-related configuration to the 
> reactor ({{tika}} root project)
> I started this work last week and will publish the first PR for review soon. 
> Moving parts from {{tika-parent}} to {{tika}} may take some time, so small 
> steps without build disruption are a must.





[jira] [Created] (TIKA-3934) Reogranize POMs parent chain to avoid leaking dependency management downstream

2022-11-19 Thread Konstantin Gribov (Jira)
Konstantin Gribov created TIKA-3934:
---

 Summary: Reogranize POMs parent chain to avoid leaking dependency 
management downstream
 Key: TIKA-3934
 URL: https://issues.apache.org/jira/browse/TIKA-3934
 Project: Tika
  Issue Type: Improvement
  Components: depedency
Affects Versions: 2.6.0
Reporter: Konstantin Gribov
Assignee: Konstantin Gribov
 Fix For: 2.7.0


Tika's BOM (Bill of Materials) artifact has {{tika-parent}} as a parent POM and 
thus forces a lot of dependency versions on downstream users. 

For example, if one uses only the PDF module there's no reason to force 
Netty/Jetty/CXF/whatever versions.

I propose the following:
* move all our dependency management and build-related configuration to the 
reactor ({{tika}} root project)
* make the {{tika}} reactor depend on {{tika-parent}} and all other {{tika-*}} 
modules on the reactor

I started this work last week and will publish the first PR for review soon. 
Moving parts from {{tika-parent}} to {{tika}} may take some time, so small 
steps without build disruption are a must.





[jira] [Closed] (TIKA-3368) Add Bill of Materials (BOM) artifact (Tika 1.x)

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-3368.
---

> Add Bill of Materials (BOM) artifact (Tika 1.x)
> ---
>
> Key: TIKA-3368
> URL: https://issues.apache.org/jira/browse/TIKA-3368
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 1.27
>
>






[jira] [Resolved] (TIKA-3368) Add Bill of Materials (BOM) artifact (Tika 1.x)

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3368.
-
Resolution: Invalid

Tika 1.x has reached EOL and the PR was closed some time ago; this is just a 
JIRA cleanup.

> Add Bill of Materials (BOM) artifact (Tika 1.x)
> ---
>
> Key: TIKA-3368
> URL: https://issues.apache.org/jira/browse/TIKA-3368
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 1.27
>
>






[jira] [Updated] (TIKA-3368) Add Bill of Materials (BOM) artifact (Tika 1.x)

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3368:

Fix Version/s: 1.27
   (was: 2.0.0-BETA)

> Add Bill of Materials (BOM) artifact (Tika 1.x)
> ---
>
> Key: TIKA-3368
> URL: https://issues.apache.org/jira/browse/TIKA-3368
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 1.27
>
>






[jira] [Updated] (TIKA-3367) Add Bill of Materials (BOM) artifact

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3367:

Fix Version/s: 2.3.0
   (was: 2.1.0)

> Add Bill of Materials (BOM) artifact
> 
>
> Key: TIKA-3367
> URL: https://issues.apache.org/jira/browse/TIKA-3367
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.3.0
>
>






[jira] [Resolved] (TIKA-3367) Add Bill of Materials (BOM) artifact

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3367.
-
Resolution: Fixed

> Add Bill of Materials (BOM) artifact
> 
>
> Key: TIKA-3367
> URL: https://issues.apache.org/jira/browse/TIKA-3367
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.3.0
>
>






[jira] [Closed] (TIKA-3367) Add Bill of Materials (BOM) artifact

2022-11-19 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-3367.
---

> Add Bill of Materials (BOM) artifact
> 
>
> Key: TIKA-3367
> URL: https://issues.apache.org/jira/browse/TIKA-3367
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.3.0
>
>






[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2022-11-18 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635951#comment-17635951
 ] 

Konstantin Gribov commented on TIKA-3493:
-

Just hit the same thing with one of the tests failing. I looked through the RTF 
spec 1.9 and it effectively stores a local date/time there (just wall-clock 
time without a time zone). 

Right now it's interpreted as a date/time in the current JVM time zone. Both 
LibreOffice and Word (on Mac) interpret it the same way.

Maybe we should keep it without a time zone in the metadata string (in 
{{dcterms:created}} or another property) and only reinterpret it with a TZ in 
{{Metadata#getDate}}, but that would be a breaking change. Alternatively, we 
could keep the raw representation plus Tika's best guess at which instant was 
meant. That is likely to require breaking changes too. 
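
To illustrate the effect, here is a small standalone sketch using plain 
{{java.time}}, not Tika's parser code. The wall-clock value 2016-07-07 10:38 is 
an assumption back-computed from the three results quoted in the issue 
description, and {{toUtcString}} is a hypothetical helper.

{code:kotlin|title=RtfWallClock.kt}
import java.time.LocalDateTime
import java.time.ZoneId

// Interpret a zone-less wall-clock value in the given zone and
// render the resulting instant in UTC.
fun toUtcString(wallClock: LocalDateTime, zone: String): String =
    wallClock.atZone(ZoneId.of(zone)).toInstant().toString()

fun main() {
    // Assumed wall-clock value stored in the RTF file (no zone information).
    val created = LocalDateTime.of(2016, 7, 7, 10, 38)

    for (zone in listOf("Asia/Sakhalin", "Asia/Colombo", "Europe/Stockholm")) {
        println("$zone -> ${toUtcString(created, zone)}")
    }
    // Asia/Sakhalin -> 2016-07-06T23:38:00Z
    // Asia/Colombo -> 2016-07-07T05:08:00Z
    // Europe/Stockholm -> 2016-07-07T08:38:00Z
}
{code}

The same stored value maps to three different instants, which is why keeping 
the raw local value separate from any zone-dependent interpretation looks 
attractive.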

> dcterms:created date depends on the current TimeZone in RTF documents
> -
>
> Key: TIKA-3493
> URL: https://issues.apache.org/jira/browse/TIKA-3493
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.0.0
>Reporter: David Pilato
>Assignee: Tim Allison
>Priority: Minor
> Attachments: Test_case_to_demo_the_change_with_Tika_1_x1.patch
>
>
> I'm migrating an existing project to Tika 2.0.0.
> I'm seeing a strange behavior.
> TL;DR: the created date of the document changes depending on the timezone.
> Long story:
> I have a unit test which extracts content and metadata from a [RTF 
> document|https://github.com/dadoonet/fscrawler/raw/master/test-documents/src/main/resources/documents/test.rtf].
> When using Tika 1.27, whatever the timezone defined for my JVM, I'm always 
> getting the same value for "dcterms:created": "2016-07-07T13:38:00Z".
> When running the same test with Tika 2.0.0, the date changes depending on the 
> Timezone.
> For example:
>  * Asia/Sakhalin gives dcterms:created=2016-07-06T23:38:00Z
>  * Asia/Colombo gives dcterms:created=2016-07-07T05:08:00Z
>  * Europe/Stockholm gives dcterms:created=2016-07-07T08:38:00Z
>  
> I don't know if it's a bug or expected. Maybe the RTF format 
> does not specify the Timezone.
> I'm surprised that I don't see the same behavior for Office documents 
> actually.





[jira] [Commented] (TIKA-3906) Build a new version of the Tika docker image to fix CVEs

2022-10-27 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625204#comment-17625204
 ] 

Konstantin Gribov commented on TIKA-3906:
-

+1 on such a versioning scheme; it should be transparent enough for 
downstream users. 

> Build a new version of the Tika docker image to fix CVEs
> 
>
> Key: TIKA-3906
> URL: https://issues.apache.org/jira/browse/TIKA-3906
> Project: Tika
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 2.5.0
>Reporter: Felix Sperling
>Priority: Major
>
> Please rebuild and release a new version of the 2.5.0 docker image.
> The current one contains CVEs which have fixes already in the jammy repos.
> h2. zlib
> *_Note:_* _Versions mentioned in the description apply to the upstream 
> {{zlib}} package._ _See {{How to fix?}} for {{Ubuntu:22.04}} relevant 
> versions._
> zlib through 1.2.12 has a heap-based buffer over-read or buffer overflow in 
> inflate in inflate.c via a large gzip header extra field. NOTE: only 
> applications that call inflateGetHeader are affected. Some common 
> applications bundle the affected zlib source code but may be unable to call 
> inflateGetHeader (e.g., see the nodejs/node reference).
> h2. Remediation
> Upgrade {{Ubuntu:22.04}} {{zlib}} to version 1:1.2.11.dfsg-2ubuntu9.2 or 
> higher.
>  
> h2. perl
> *_Note:_* _Versions mentioned in the description apply to the upstream 
> {{perl}} package._ _See {{How to fix?}} for {{Ubuntu:22.04}} relevant 
> versions._
> CPAN 2.28 allows Signature Verification Bypass.
> h2. Remediation
> Upgrade {{Ubuntu:22.04}} {{perl}} to version 5.34.0-3ubuntu1.1 or higher.





[jira] [Commented] (TIKA-3666) Detect and indicate file encrypted with Rights Management Service RMS/IRM

2022-01-31 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17484950#comment-17484950
 ] 

Konstantin Gribov commented on TIKA-3666:
-

When I looked into MS AD RMS some time ago, it unfortunately wasn't supported 
in Apache POI. AFAIK POI 5.2.0 still doesn't support it.
I'm not sure whether support should be added there first or whether some 
support could be added to Tika. Either way, some test files are a must-have.

> Detect and indicate file encrypted with Rights Management Service RMS/IRM
> -
>
> Key: TIKA-3666
> URL: https://issues.apache.org/jira/browse/TIKA-3666
> Project: Tika
>  Issue Type: Improvement
>  Components: metadata
>Reporter: August Valera
>Priority: Major
>
> Rights Management Service (RMS), implemented in MS Office as Information 
> Rights Management (IRM), allows organizations to set file permissions that 
> are stored within the file. In most cases, this will result in the file 
> getting a new extension (with a prefix p, such as {{.txt}} becoming 
> {{.ptxt}}), but in the case of MS Office and PDF files, which support 
> this natively, the implementation results in the file contents being 
> encrypted without any extension change. 
> h4. Current behavior
> Running such files through Tika produces results as if it was an empty file 
> ran through {{DefaultParser}} and {{OfficeParser}}.
> h4. Expected behavior
> Extract more metadata about necessary permissions to view (if possible), and 
> throwing {{EncryptedDocumentException}} as is the case with Office files 
> encrypted in the more traditional manner.
> Reference: 
> [https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types#supported-file-types-for-classification-and-protection]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (TIKA-3631) Upgrade log4j 2 to version 2.17.0 in tika

2021-12-27 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3631.
-
Resolution: Fixed

> Upgrade log4j 2 to version 2.17.0 in tika
> -
>
> Key: TIKA-3631
> URL: https://issues.apache.org/jira/browse/TIKA-3631
> Project: Tika
>  Issue Type: Improvement
>  Components: tika-server
>Affects Versions: 2.2.0
>Reporter: Dhoka Pramod
>Priority: Critical
> Fix For: 2.2.1
>
>
> Tika 2.2.0 is still using log4j 2.15 which have few vulnerabilities. Hence we 
> need log4j in tika to be updated to 2.17.





[jira] [Commented] (TIKA-3632) Log4j appears to be running in a Servlet environment, but there's no log4j-web module available

2021-12-21 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463513#comment-17463513
 ] 

Konstantin Gribov commented on TIKA-3632:
-

I'll look into it. At first glance, it seems it should be added.

> Log4j appears to be running in a Servlet environment, but there's no 
> log4j-web module available
> ---
>
> Key: TIKA-3632
> URL: https://issues.apache.org/jira/browse/TIKA-3632
> Project: Tika
>  Issue Type: Bug
>  Components: server
>Affects Versions: 2.2.0
> Environment: Windows 10
>Reporter: Josh Burchard
>Assignee: Konstantin Gribov
>Priority: Minor
>
> I noticed the following issue when running the Tika server jar and trying to 
> troubleshoot log4j2 (with -Dlog4j2.debug set in the JVM): 
> {{INFO StatusLogger Log4j appears to be running in a Servlet environment, but 
> there's no log4j-web module available. If you want better web container 
> support, please add the log4j-web JAR to your web archive or server lib 
> directory.}}
> Is this something that needs to be added when the server jar is built?  It's 
> not _obviously_ impacting me right now but since it's a bit noisy (prints out 
> eight times) I attempted to quash the noise by downloading the 
> log4j-web-2.17.0.jar and add it to my classpath. Unfortunately that did 
> nothing.





[jira] [Assigned] (TIKA-3632) Log4j appears to be running in a Servlet environment, but there's no log4j-web module available

2021-12-21 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov reassigned TIKA-3632:
---

Assignee: Konstantin Gribov

> Log4j appears to be running in a Servlet environment, but there's no 
> log4j-web module available
> ---
>
> Key: TIKA-3632
> URL: https://issues.apache.org/jira/browse/TIKA-3632
> Project: Tika
>  Issue Type: Bug
>  Components: server
>Affects Versions: 2.2.0
> Environment: Windows 10
>Reporter: Josh Burchard
>Assignee: Konstantin Gribov
>Priority: Minor
>
> I noticed the following issue when running the Tika server jar and trying to 
> troubleshoot log4j2 (with -Dlog4j2.debug set in the JVM): 
> {{INFO StatusLogger Log4j appears to be running in a Servlet environment, but 
> there's no log4j-web module available. If you want better web container 
> support, please add the log4j-web JAR to your web archive or server lib 
> directory.}}
> Is this something that needs to be added when the server jar is built?  It's 
> not _obviously_ impacting me right now but since it's a bit noisy (prints out 
> eight times) I attempted to quash the noise by downloading the 
> log4j-web-2.17.0.jar and add it to my classpath. Unfortunately that did 
> nothing.





[jira] [Commented] (TIKA-3628) Is tika 2.2 available

2021-12-20 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462557#comment-17462557
 ] 

Konstantin Gribov commented on TIKA-3628:
-

Great! 

For Gradle-related help besides the docs, I highly recommend the 
#community-support channel in the [Gradle Community 
Slack|https://gradle-community.slack.com/].

> Is tika 2.2 available
> -
>
> Key: TIKA-3628
> URL: https://issues.apache.org/jira/browse/TIKA-3628
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Vamsi Molli
>Priority: Major
>
> As per  [https://tika.apache.org/] . Tika has released the 2.2 version.
> When trying to upgrade from 2.1.0 to 2.2 getting the following error.
> Could not resolve org.apache.tika:tika-core:2.2.0.
> [group: 'org.apache.tika', name: 'tika-core', version: '2.2.0'],
>             [group: 'org.apache.tika', name: 'tika-parsers-standard-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-microsoft-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-sqlite3-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-scientific-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-zip-commons', 
> version: '2.1.0'],
> I see only tika-core upgraded to 2.2.0 rest are seeing 2.1.0 only as per 
> (https://mvnrepository.com/artifact/org.apache.tika).





[jira] [Commented] (TIKA-3628) Is tika 2.2 available

2021-12-20 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462541#comment-17462541
 ] 

Konstantin Gribov commented on TIKA-3628:
-

{quote}No cached version of org.apache.tika:tika-core:2.2.0 available for 
offline mode{quote} shows what the problem is. You have offline mode on, which 
allows Gradle to use only dependencies already in the local Gradle cache. 
Remove {{--offline}} when running Gradle (or uncheck offline mode in your IDE 
if you see the issue there). 


> Is tika 2.2 available
> -
>
> Key: TIKA-3628
> URL: https://issues.apache.org/jira/browse/TIKA-3628
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Vamsi Molli
>Priority: Major
>
> As per  [https://tika.apache.org/] . Tika has released the 2.2 version.
> When trying to upgrade from 2.1.0 to 2.2 getting the following error.
> Could not resolve org.apache.tika:tika-core:2.2.0.
> [group: 'org.apache.tika', name: 'tika-core', version: '2.2.0'],
>             [group: 'org.apache.tika', name: 'tika-parsers-standard-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-microsoft-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-sqlite3-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-scientific-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-zip-commons', 
> version: '2.1.0'],
> I see only tika-core upgraded to 2.2.0 rest are seeing 2.1.0 only as per 
> (https://mvnrepository.com/artifact/org.apache.tika).





[jira] [Commented] (TIKA-3628) Is tika 2.2 available

2021-12-20 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462491#comment-17462491
 ] 

Konstantin Gribov commented on TIKA-3628:
-

Yeah, I get that you want to upgrade. 2.2.0 is available from Central (which 
is the primary Maven repo).

How you upgrade depends on your build system. Since you didn't specify which 
one you use, I can only give generic advice: change the relevant version 
numbers from 2.1.0 to 2.2.0 in your build definition (like pom.xml, 
build.gradle[.kts], *.project.clj or something else).

Excluding httpcomponents also depends on your build system, but most likely 
you would want to just select a different version. See your build system's 
documentation for how to do this.

> Is tika 2.2 available
> -
>
> Key: TIKA-3628
> URL: https://issues.apache.org/jira/browse/TIKA-3628
> Project: Tika
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 2.2.0
>Reporter: Vamsi Molli
>Priority: Major
> Fix For: 2.1.0
>
>
> As per  [https://tika.apache.org/] . Tika has released the 2.2 version.
> When trying to upgrade from 2.1.0 to 2.2 getting the following error.
> Could not resolve org.apache.tika:tika-core:2.2.0.
> [group: 'org.apache.tika', name: 'tika-core', version: '2.2.0'],
>             [group: 'org.apache.tika', name: 'tika-parsers-standard-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-microsoft-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-sqlite3-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-scientific-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-zip-commons', 
> version: '2.1.0'],
> I see only tika-core upgraded to 2.2.0 rest are seeing 2.1.0 only as per 
> (https://mvnrepository.com/artifact/org.apache.tika).





[jira] [Commented] (TIKA-3628) Is tika 2.2 available

2021-12-20 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462482#comment-17462482
 ] 

Konstantin Gribov commented on TIKA-3628:
-

Maven Central has Tika 2.2.0: 
https://search.maven.org/search?q=g:org.apache.tika.
I see that mvnrepository shows a mix of 2.2.0 and 2.1.0 as the latest version; 
I guess it's still syncing from Central.

Which repository do you use?

> Is tika 2.2 available
> -
>
> Key: TIKA-3628
> URL: https://issues.apache.org/jira/browse/TIKA-3628
> Project: Tika
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 2.2.0
>Reporter: Vamsi Molli
>Priority: Major
> Fix For: 2.1.0
>
>
> As per  [https://tika.apache.org/] . Tika has released the 2.2 version.
> When trying to upgrade from 2.1.0 to 2.2 getting the following error.
> Could not resolve org.apache.tika:tika-core:2.2.0.
> [group: 'org.apache.tika', name: 'tika-core', version: '2.2.0'],
>             [group: 'org.apache.tika', name: 'tika-parsers-standard-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-microsoft-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-sqlite3-package', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-scientific-module', 
> version: '2.1.0'],
>             [group: 'org.apache.tika', name: 'tika-parser-zip-commons', 
> version: '2.1.0'],
> I see only tika-core upgraded to 2.2.0 rest are seeing 2.1.0 only as per 
> (https://mvnrepository.com/artifact/org.apache.tika).





[jira] [Updated] (TIKA-3623) Upgrade log4j to 2.16.0

2021-12-17 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3623:

Priority: Blocker  (was: Major)

> Upgrade log4j to 2.16.0
> ---
>
> Key: TIKA-3623
> URL: https://issues.apache.org/jira/browse/TIKA-3623
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Blocker
> Fix For: 1.28, 2.2.1
>
>






[jira] [Updated] (TIKA-3623) Upgrade log4j to 2.16.0

2021-12-17 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3623:

Summary: Upgrade log4j to 2.16.0  (was: Upgrade log4j to 2.0.16)

> Upgrade log4j to 2.16.0
> ---
>
> Key: TIKA-3623
> URL: https://issues.apache.org/jira/browse/TIKA-3623
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 1.28, 2.2.1
>
>






[jira] [Updated] (TIKA-3616) Upgrade log4j2 to 2.15.0

2021-12-17 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3616:

Summary: Upgrade log4j2 to 2.15.0  (was: Upgrade log4j2 to 2.0.15)

> Upgrade log4j2 to 2.15.0
> 
>
> Key: TIKA-3616
> URL: https://issues.apache.org/jira/browse/TIKA-3616
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Blocker
> Fix For: 2.2.0
>
>
> RCE...might be difficult to trigger in Tika, but why ask for a PoC...
> This only affects 2.x.  We were still using the old log4j in 1.x





[jira] [Commented] (TIKA-3616) Upgrade log4j2

2021-12-15 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460168#comment-17460168
 ] 

Konstantin Gribov commented on TIKA-3616:
-

I looked a bit at how Tika and its upstream dependencies use 
{{MDC}}/{{ThreadContext}}, which are vulnerable in 2.15, and Tika and its deps 
use them quite sparsely (as far as IntelliJ IDEA sees usages). 

{{solrj}} puts the Solr client URL into MDC, Zookeeper puts the node id from 
its config file into MDC, and UIMA puts some ids into it which don't seem to be 
user-generated, at least in Tika. 

Also, {{testcontainers}} uses MDC, but only in {{test}} scope.

> Upgrade log4j2
> --
>
> Key: TIKA-3616
> URL: https://issues.apache.org/jira/browse/TIKA-3616
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.1.1
>
>
> RCE...might be difficult to trigger in Tika, but why ask for a PoC...
> This only affects 2.x.  We were still using the old log4j in 1.x





[jira] [Updated] (TIKA-3367) Add Bill of Materials (BOM) artifact

2021-07-26 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3367:

Fix Version/s: (was: 2.0.0-BETA)
   2.0.1

> Add Bill of Materials (BOM) artifact
> 
>
> Key: TIKA-3367
> URL: https://issues.apache.org/jira/browse/TIKA-3367
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-04-29 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3164:

Issue Type: Task  (was: Bug)

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3312) Support Log4j2 jar in Tika-app.jar

2021-04-24 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331234#comment-17331234
 ] 

Konstantin Gribov commented on TIKA-3312:
-

[~tallison], will do. I took a peek at it just now and found a couple of 
things that I'd like to change in the dependencies, but it would require a 
thoughtful and attentive approach not to break something ,)

> Support Log4j2 jar in Tika-app.jar
> --
>
> Key: TIKA-3312
> URL: https://issues.apache.org/jira/browse/TIKA-3312
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.22, 1.24.1
>Reporter: Charushila Nanekar
>Priority: Critical
>
> Latest version of Tika-app is using older version of Log4j jar which cause an 
> issue when Tika-app get integrated with other 3rd Party Application which 
> using latest log4j 2 jar.
> Additionally, Apache Log4j 2 is an upgrade to Log4j that provides significant 
> improvements over its predecessor.





[jira] [Closed] (TIKA-3149) Tikka 1.18 not working with tess4j 3.4.8 on linux

2021-04-23 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-3149.
---

> Tikka 1.18 not working with tess4j 3.4.8 on linux
> -
>
> Key: TIKA-3149
> URL: https://issues.apache.org/jira/browse/TIKA-3149
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.18
> Environment: linux and deployedo n weblogic
>Reporter: Vishakha 
>Assignee: Konstantin Gribov
>Priority: Blocker
>  Labels: starter
>
> I am using Tika version 1.18 to parse the document content. It works on its 
> own when deployed on Linux, but it does not work if Tesseract is used before 
> it. It gives the error below when calling parseToString. 
> Code: 
> Tika tika = new Tika();
> try (InputStream stream = new FileInputStream(
>         Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString())) {
>     String documentExt = tika.detect(
>             Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString());
>     String outputStr = tika.parseToString(stream);
>     String tempStr = outputStr.replace("\n", "");
>     _Logger.info("tempStr: " + tempStr);
> } catch (TikaException e) {
>     // TODO Auto-generated catch block
>     _Logger.error("Error :", e);
> }
> Error as :
> java.lang.StackOverflowError
>   at 
> org.slf4j.impl.JDK14LoggerAdapter.fillCallerData(JDK14LoggerAdapter.java:602)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:587)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
> ...
> > 
> kindly let us know the solution





[jira] [Updated] (TIKA-3149) Tikka 1.18 not working with tess4j 3.4.8 on linux

2021-04-23 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3149:

Description: 
I am using tikka 1.18 version to parse the docuemtn content. It is working 
independently when deployed on linux but it is not working. If tessract is used 
before it. It is giving below error while parseTostring 

code : 

Tika tika = new Tika();Tika tika = new Tika();

try(InputStream stream = new 
FileInputStream(Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString()))
 { String documentExt = 
tika.detect(Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString());

String outputStr = tika.parseToString(stream);

String tempStr = outputStr.replace("\n", ""); _Logger.info("tempStr: " 
+tempStr); }

catch (TikaException e) \{
 // TODO Auto-generated catch block _Logger.error("Error :",e); }


Error as :
java.lang.StackOverflowError
at 
org.slf4j.impl.JDK14LoggerAdapter.fillCallerData(JDK14LoggerAdapter.java:602)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:587)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 
org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
at 
org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
at java.util.logging.Logger.log(Logger.java:738)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 
org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
at 
org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
at java.util.logging.Logger.log(Logger.java:738)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 
org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
at 
org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
at java.util.logging.Logger.log(Logger.java:738)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 
org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
at 
org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
at java.util.logging.Logger.log(Logger.java:738)
...
> 


kindly let us know the solution

  was:
I am using tikka 1.18 version to parse the docuemtn content. It is working 
independently when deployed on linux but it is not working. If tessract is used 
before it. It is giving below error while parseTostring 

code : 

Tika tika = new Tika();Tika tika = new Tika();

try(InputStream stream = new 
FileInputStream(Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString()))
 { String documentExt = 
tika.detect(Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString());

String outputStr = tika.parseToString(stream);

String tempStr = outputStr.replace("\n", ""); _Logger.info("tempStr: " 
+tempStr); }

catch (TikaException e) \{
 // TODO Auto-generated catch block _Logger.error("Error :",e); }


Error as :
java.lang.StackOverflowError
at 
org.slf4j.impl.JDK14LoggerAdapter.fillCallerData(JDK14LoggerAdapter.java:602)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:587)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 
org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
at 
org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
at java.util.logging.Logger.log(Logger.java:738)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 
org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
at 
org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
at java.util.logging.Logger.log(Logger.java:738)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 
org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
at 
org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
at java.util.logging.Logger.log(Logger.java:738)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
at 

[jira] [Resolved] (TIKA-3149) Tikka 1.18 not working with tess4j 3.4.8 on linux

2021-04-23 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3149.
-
  Assignee: Konstantin Gribov
Resolution: Not A Bug

> Tikka 1.18 not working with tess4j 3.4.8 on linux
> -
>
> Key: TIKA-3149
> URL: https://issues.apache.org/jira/browse/TIKA-3149
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.18
> Environment: linux and deployedo n weblogic
>Reporter: Vishakha 
>Assignee: Konstantin Gribov
>Priority: Blocker
>  Labels: starter
>
> I am using tikka 1.18 version to parse the docuemtn content. It is working 
> independently when deployed on linux but it is not working. If tessract is 
> used before it. It is giving below error while parseTostring 
> code : 
> Tika tika = new Tika();Tika tika = new Tika();
> try(InputStream stream = new 
> FileInputStream(Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString()))
>  { String documentExt = 
> tika.detect(Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString());
> String outputStr = tika.parseToString(stream);
> String tempStr = outputStr.replace("\n", ""); _Logger.info("tempStr: " 
> +tempStr); }
> catch (TikaException e) \{
>  // TODO Auto-generated catch block _Logger.error("Error :",e); }
> Error as :
> java.lang.StackOverflowError
>   at 
> org.slf4j.impl.JDK14LoggerAdapter.fillCallerData(JDK14LoggerAdapter.java:602)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:587)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
> ...
> Kindly let us know the solution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TIKA-3149) Tikka 1.18 not working with tess4j 3.4.8 on linux

2021-04-23 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331122#comment-17331122
 ] 

Konstantin Gribov commented on TIKA-3149:
-

You have both slf4j-jdk14 (a logger implementation that uses java.util.logging) and 
jul-to-slf4j (a bridge that redirects java.util.logging to slf4j-api) on the 
classpath, so every log call bounces between the two until the stack overflows. 
I recommend dropping slf4j-jdk14 from the classpath and using any other logging 
implementation (logback-classic, log4j2).
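
A minimal sketch of how one might detect this combination at startup. The two class names are taken from the stack trace in the report; the LoggingLoopCheck class and its method names are hypothetical, and the check only covers what the current class loader can see.

```java
public class LoggingLoopCheck {
    // Class names as they appear in the reported stack trace.
    static final String JUL_BINDING = "org.slf4j.impl.JDK14LoggerAdapter";
    static final String JUL_BRIDGE = "org.slf4j.bridge.SLF4JBridgeHandler";

    /** Returns true if the class can be located, without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, LoggingLoopCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True when both the JUL binding and the JUL bridge are visible. */
    static boolean hasJulLoopRisk() {
        return isPresent(JUL_BINDING) && isPresent(JUL_BRIDGE);
    }

    public static void main(String[] args) {
        if (hasJulLoopRisk()) {
            System.err.println("Both slf4j-jdk14 and jul-to-slf4j seem to be on the classpath");
        } else {
            System.out.println("No slf4j-jdk14 + jul-to-slf4j combination detected");
        }
    }
}
```

Having both jars present is only a risk indicator: the loop starts once the bridge handler is actually installed on the java.util.logging root logger.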

> Tikka 1.18 not working with tess4j 3.4.8 on linux
> -
>
> Key: TIKA-3149
> URL: https://issues.apache.org/jira/browse/TIKA-3149
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.18
> Environment: Linux, deployed on WebLogic
>Reporter: Vishakha 
>Priority: Blocker
>  Labels: starter
>
> I am using Tika 1.18 to parse document content. It is working 
> independently when deployed on Linux, but it is not working if Tesseract is 
> used before it. It gives the error below while calling parseToString.
> Code:
> Tika tika = new Tika();
> try (InputStream stream = new FileInputStream(
>     Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString())) {
>   String documentExt = tika.detect(
>     Paths.get(documentPath.concat(documentName)).toAbsolutePath().toString());
>   String outputStr = tika.parseToString(stream);
>   String tempStr = outputStr.replace("\n", "");
>   _Logger.info("tempStr: " + tempStr);
> } catch (TikaException e) {
>   // TODO Auto-generated catch block
>   _Logger.error("Error :", e);
> }
> Error:
> java.lang.StackOverflowError
>   at 
> org.slf4j.impl.JDK14LoggerAdapter.fillCallerData(JDK14LoggerAdapter.java:602)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:587)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:303)
>   at java.util.logging.Logger.log(Logger.java:738)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:588)
>   at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:660)
>   at 
> org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:221)
> ...

[jira] [Updated] (TIKA-3369) Flaky Tesseract OCR confirmMultiPageTiffHandling test

2021-04-23 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3369:

Description: 
Current main@08793d360a838db04a3d23b902c34d9e6e7362e4 fails with

{noformat}
[ERROR]   
TesseractOCRParserTest.confirmMultiPageTiffHandling:108->TikaTest.assertContains:79
 Page 2 not found in:
http://www.w3.org/1999/xhtml;>






Multipage
TIFF
Example
Page 1
Multipage
TIFF
Example
Page?2


{noformat}

Note that Tesseract extracts {{Page?2}} instead of {{Page 2}}.

  was:
Current main@08793d360a838db04a3d23b902c34d9e6e7362e4 fails with

{noformat}
[ERROR]   
TesseractOCRParserTest.confirmMultiPageTiffHandling:108->TikaTest.assertContains:79
 Page 2 not found in:
http://www.w3.org/1999/xhtml;>






Multipage
TIFF
Example
Page 1
Multipage
TIFF
Example
Page?2


{noformat}



> Flaky Tesseract OCR confirmMultiPageTiffHandling test
> -
>
> Key: TIKA-3369
> URL: https://issues.apache.org/jira/browse/TIKA-3369
> Project: Tika
>  Issue Type: Test
>  Components: ocr
>Affects Versions: 2.0.0
> Environment: Arch Linux, kernel: 5.11.16-arch1-1 #1 SMP PREEMPT Wed, 
> 21 Apr 2021 17:22:13 + x86_64 GNU/Linux
> OpenJDK 15.0.2.u7-1
> Tesseract 4.1.1-5 with icu 69.1-1, cairo 1.17.4-5, pango 1:1.48.4-1, 
> tesseract-data-{eng,deu,fra,rus,ukr} 2:4.0.0-1 (other languages not installed)
>Reporter: Konstantin Gribov
>Priority: Minor
>
> Current main@08793d360a838db04a3d23b902c34d9e6e7362e4 fails with
> {noformat}
> [ERROR]   
> TesseractOCRParserTest.confirmMultiPageTiffHandling:108->TikaTest.assertContains:79
>  Page 2 not found in:
> http://www.w3.org/1999/xhtml;>
> 
> 
>  />
>  content="org.apache.tika.parser.ocr.TesseractOCRParser" />
> 
> 
> Multipage
> TIFF
> Example
> Page 1
> Multipage
> TIFF
> Example
> Page?2
> 
> 
> {noformat}
> Note that Tesseract extracts {{Page?2}} instead of {{Page 2}}.
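
A sketch of a whitespace-tolerant containment check that would make such an assertion less brittle. It rests on an assumption the report does not confirm: that the {{?}} in {{Page?2}} stands for a non-ASCII space such as U+00A0 emitted by Tesseract. The class and method names are hypothetical and this is not the TikaTest.assertContains implementation.

```java
public class WhitespaceTolerantContains {
    /** Collapses any run of whitespace or Unicode separators into one ASCII space. */
    static String normalize(String s) {
        return s.replaceAll("[\\s\\p{Z}]+", " ").trim();
    }

    /** Containment check applied after normalizing both strings. */
    static boolean containsNormalized(String needle, String haystack) {
        return normalize(haystack).contains(normalize(needle));
    }

    public static void main(String[] args) {
        // Assumed OCR output: a no-break space between "Page" and "2".
        String ocr = "Multipage\nTIFF\nExample\nPage\u00A02";
        System.out.println(ocr.contains("Page 2"));            // false
        System.out.println(containsNormalized("Page 2", ocr)); // true
    }
}
```

If the stray character turns out to be something other than a separator, this normalization would not help and the test data or OCR settings would need a closer look.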





[jira] [Created] (TIKA-3369) Flaky Tesseract OCR confirmMultiPageTiffHandling test

2021-04-23 Thread Konstantin Gribov (Jira)
Konstantin Gribov created TIKA-3369:
---

 Summary: Flaky Tesseract OCR confirmMultiPageTiffHandling test
 Key: TIKA-3369
 URL: https://issues.apache.org/jira/browse/TIKA-3369
 Project: Tika
  Issue Type: Test
  Components: ocr
Affects Versions: 2.0.0
 Environment: Arch Linux, kernel: 5.11.16-arch1-1 #1 SMP PREEMPT Wed, 
21 Apr 2021 17:22:13 + x86_64 GNU/Linux
OpenJDK 15.0.2.u7-1
Tesseract 4.1.1-5 with icu 69.1-1, cairo 1.17.4-5, pango 1:1.48.4-1, 
tesseract-data-{eng,deu,fra,rus,ukr} 2:4.0.0-1 (other languages not installed)

Reporter: Konstantin Gribov


Current main@08793d360a838db04a3d23b902c34d9e6e7362e4 fails with

{noformat}
[ERROR]   
TesseractOCRParserTest.confirmMultiPageTiffHandling:108->TikaTest.assertContains:79
 Page 2 not found in:
http://www.w3.org/1999/xhtml;>






Multipage
TIFF
Example
Page 1
Multipage
TIFF
Example
Page?2


{noformat}






[jira] [Created] (TIKA-3368) Add Bill of Materials (BOM) artifact (Tika 1.x)

2021-04-23 Thread Konstantin Gribov (Jira)
Konstantin Gribov created TIKA-3368:
---

 Summary: Add Bill of Materials (BOM) artifact (Tika 1.x)
 Key: TIKA-3368
 URL: https://issues.apache.org/jira/browse/TIKA-3368
 Project: Tika
  Issue Type: Improvement
  Components: packaging
Reporter: Konstantin Gribov
Assignee: Konstantin Gribov
 Fix For: 1.27








[jira] [Updated] (TIKA-3367) Add Bill of Materials (BOM) artifact

2021-04-23 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-3367:

Fix Version/s: (was: 1.27)

> Add Bill of Materials (BOM) artifact
> 
>
> Key: TIKA-3367
> URL: https://issues.apache.org/jira/browse/TIKA-3367
> Project: Tika
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Created] (TIKA-3367) Add Bill of Materials (BOM) artifact

2021-04-23 Thread Konstantin Gribov (Jira)
Konstantin Gribov created TIKA-3367:
---

 Summary: Add Bill of Materials (BOM) artifact
 Key: TIKA-3367
 URL: https://issues.apache.org/jira/browse/TIKA-3367
 Project: Tika
  Issue Type: Improvement
  Components: packaging
Reporter: Konstantin Gribov
Assignee: Konstantin Gribov
 Fix For: 2.0.0, 1.27








[jira] [Commented] (TIKA-3312) Support Log4j2 jar in Tika-app.jar

2021-03-09 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298345#comment-17298345
 ] 

Konstantin Gribov commented on TIKA-3312:
-

[~tallison], in that case I think it could safely go into tika-server-core 
since it's already an end-user runnable application. What do you think about 
extracting a module with just the runtime deps and configs shared by all CLI 
tools?

> Support Log4j2 jar in Tika-app.jar
> --
>
> Key: TIKA-3312
> URL: https://issues.apache.org/jira/browse/TIKA-3312
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.22, 1.24.1
>Reporter: Charushila Nanekar
>Priority: Critical
>
> The latest version of Tika-app uses an older version of the Log4j jar, which 
> causes an issue when Tika-app is integrated with a third-party application 
> that uses the latest Log4j 2 jar.
> Additionally, Apache Log4j 2 is an upgrade to Log4j that provides significant 
> improvements over its predecessor.





[jira] [Commented] (TIKA-3312) Support Log4j2 jar in Tika-app.jar

2021-03-09 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298099#comment-17298099
 ] 

Konstantin Gribov commented on TIKA-3312:
-

I agree that we should upgrade to log4j2 or logback-classic as the logging 
implementation.

But I would advise against using tika-app as a library. 
[~cnanekar], could you tell us why you depend on it instead of 
tika-parsers/tika-batch etc.? Then you could choose whichever logging impl you 
prefer, with configuration specific to your app.

> Support Log4j2 jar in Tika-app.jar
> --
>
> Key: TIKA-3312
> URL: https://issues.apache.org/jira/browse/TIKA-3312
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 1.22, 1.24.1
>Reporter: Charushila Nanekar
>Priority: Critical
>
> The latest version of Tika-app uses an older version of the Log4j jar, which 
> causes an issue when Tika-app is integrated with a third-party application 
> that uses the latest Log4j 2 jar.
> Additionally, Apache Log4j 2 is an upgrade to Log4j that provides significant 
> improvements over its predecessor.





[jira] [Commented] (TIKA-3120) Remove whitelist/blacklist terminology

2020-06-26 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146315#comment-17146315
 ] 

Konstantin Gribov commented on TIKA-3120:
-

[~tallison], I noticed messages from commits@tika.a.o about this and saw that 
you use the include/skip pair. Did you choose one such pair, or did you just go 
with context-dependent wording on a case-by-case basis? If the former, it might 
be a good idea to add the recommended words for include/exclude to the wiki for 
future contributors.

> Remove whitelist/blacklist terminology
> --
>
> Key: TIKA-3120
> URL: https://issues.apache.org/jira/browse/TIKA-3120
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 1.25
>
>
> Looks trivial...





[jira] [Commented] (TIKA-3121) Rename master branch

2020-06-26 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146290#comment-17146290
 ] 

Konstantin Gribov commented on TIKA-3121:
-

An alternative is to just use branches like main, branch_1x, branch_2x etc., 
archive & lock master, and set the new branch as the default HEAD. This way we 
will have a much smoother transition with much smaller potential impact.

> Rename master branch
> 
>
> Key: TIKA-3121
> URL: https://issues.apache.org/jira/browse/TIKA-3121
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> I started a discussion on the dev list for this here:
> http://mail-archives.us.apache.org/mod_mbox/tika-dev/202006.mbox/%3CCAC1dCwW9FuK%2BkSzokmweeYwLFiED9g0W-43J1TNhMwnv7rdp8g%40mail.gmail.com%3E
> One committer would prefer that we not make this change, but seems ok with it.
> Recommendations:
> * main
> * trunk
> * development
> * stable





[jira] [Commented] (TIKA-3121) Rename master branch

2020-06-26 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146288#comment-17146288
 ] 

Konstantin Gribov commented on TIKA-3121:
-

I didn't vote before and am a bit ambivalent about the change. Despite all Rich's 
pushing towards renaming, I'm a bit concerned about the real impact on a 
developer-biased community. To me it looks more like a populist decision, but I 
may be biased by previous hate storms that used D ideas against anyone who 
doesn't kneel and plead to spare them despite not being in some minority.

We will have to go through documentation, wiki, CI configuration etc. to 
ensure that the new branch name is used, but we can do this only for our own 
projects. 

All external developers who include Tika in their build systems or delivery 
pipelines, or who write articles/books and use the master branch, would have to 
do some additional (and sometimes unexpected) work. In an ideal world it would 
be done via the usual scripts/configuration maintenance, but a lot of things 
with low-priority support or without actual maintenance could break.

So, I'm basically -0.5, weakly against because of the potential impact on 
downstream users and fellow developers.

> Rename master branch
> 
>
> Key: TIKA-3121
> URL: https://issues.apache.org/jira/browse/TIKA-3121
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> I started a discussion on the dev list for this here:
> http://mail-archives.us.apache.org/mod_mbox/tika-dev/202006.mbox/%3CCAC1dCwW9FuK%2BkSzokmweeYwLFiED9g0W-43J1TNhMwnv7rdp8g%40mail.gmail.com%3E
> One committer would prefer that we not make this change, but seems ok with it.
> Recommendations:
> * main
> * trunk
> * development
> * stable





[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073020#comment-17073020
 ] 

Konstantin Gribov commented on TIKA-3082:
-

Also, we could later add client modules for a couple of popular libraries to 
give downstream users ready-to-fly libs with already declared deps.

> Consider adding an OpenAPI for tika-server
> --
>
> Key: TIKA-3082
> URL: https://issues.apache.org/jira/browse/TIKA-3082
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Lewis John McGibbney
>Priority: Major
>
> On TIKA-2253, [~lewismc] asked:
> bq. I was planning on putting together an OpenAPI specification for Tika. Is 
> anyone in favor of this?
> What do people think?  How much will it change the current tika-server?  What 
> are the benefits?





[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073017#comment-17073017
 ] 

Konstantin Gribov commented on TIKA-3082:
-

[~lewismc], my gratitude and a big +1 then)

In my experience some OpenAPI/Swagger tools are quite fragile (for example, 
swagger-codegen can break on a minor version update), but overall I'm very 
inclined to use it since it gives us better maintainability, documentation 
generation, and easier API versioning.

Also, I'd like to propose moving the current APIs to a versioned namespace 
{{/api/v1/*}} and redirecting existing methods (like {{/meta}}, {{/rmeta}} 
etc.) there with HTTP status 301.
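
A minimal sketch of the path mapping behind such a redirect, assuming the {{/api/v1}} prefix proposed above. The LegacyPathRedirect class, its method, and the choice of legacy roots are hypothetical; this is not tika-server or CXF code, only the decision a redirect filter would have to make before answering with status 301 and a Location header.

```java
import java.util.Set;

public class LegacyPathRedirect {
    // Proposed versioned namespace; an assumption taken from the comment above.
    static final String PREFIX = "/api/v1";
    // Legacy endpoint roots named in the comment; the real list would be longer.
    static final Set<String> LEGACY_ROOTS = Set.of("/meta", "/rmeta");

    /**
     * Returns the versioned path a legacy request should be redirected to,
     * or null when the path needs no redirect.
     */
    static String redirectTarget(String path) {
        if (path.equals(PREFIX) || path.startsWith(PREFIX + "/")) {
            return null; // already in the versioned namespace
        }
        for (String root : LEGACY_ROOTS) {
            if (path.equals(root) || path.startsWith(root + "/")) {
                return PREFIX + path;
            }
        }
        return null; // unknown path, leave it to normal handling
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("/rmeta/text"));  // /api/v1/rmeta/text
        System.out.println(redirectTarget("/api/v1/meta")); // null
    }
}
```

Matching on whole path segments keeps an unrelated path such as /metadata from being redirected by accident.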

BTW, JetBrains IDEA has a bundled OpenAPI plugin (at least 2020.1 RC does).

> Consider adding an OpenAPI for tika-server
> --
>
> Key: TIKA-3082
> URL: https://issues.apache.org/jira/browse/TIKA-3082
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Lewis John McGibbney
>Priority: Major
>
> On TIKA-2253, [~lewismc] asked:
> bq. I was planning on putting together an OpenAPI specification for Tika. Is 
> anyone in favor of this?
> What do people think?  How much will it change the current tika-server?  What 
> are the benefits?





[jira] [Commented] (TIKA-3082) Consider adding an OpenAPI for tika-server

2020-04-01 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072957#comment-17072957
 ] 

Konstantin Gribov commented on TIKA-3082:
-

[~lewismc], could you please clarify how you wish to use the OpenAPI spec? 

Such a spec could be used to generate client libraries and stubs for JAX-RS, 
or it could itself be generated from some additional annotations on, say, 
JAX-RS services. Both solutions are viable but certainly depend on your goals in 
introducing OpenAPI. Both have pros and cons, so I hope you'll have some time 
to expand on your original idea.

> Consider adding an OpenAPI for tika-server
> --
>
> Key: TIKA-3082
> URL: https://issues.apache.org/jira/browse/TIKA-3082
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> On TIKA-2253, [~lewismc] asked:
> bq. I was planning on putting together an OpenAPI specification for Tika. Is 
> anyone in favor of this?
> What do people think?  How much will it change the current tika-server?  What 
> are the benefits?





[jira] [Commented] (TIKA-3073) Add gzip in- and out- interceptors to tika-server

2020-03-19 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062662#comment-17062662
 ] 

Konstantin Gribov commented on TIKA-3073:
-

[~tallison], glad to help. I'm unfamiliar with CXF so here you go.

> Add gzip in- and out- interceptors to tika-server
> -
>
> Key: TIKA-3073
> URL: https://issues.apache.org/jira/browse/TIKA-3073
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.25
>
>
> On TIKA-3069, [~carina.antunes] requested compressing /rmeta output. This 
> makes sense as a start...we might also look into allowing more 
> configurability around which metadata fields and file types to send back over 
> the wire.  Few people need everything...





[jira] [Comment Edited] (TIKA-3073) Add compression option to /rmeta output

2020-03-18 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061672#comment-17061672
 ] 

Konstantin Gribov edited comment on TIKA-3073 at 3/18/20, 12:32 PM:


[~tallison], usually a web server should honor the HTTP {{Accept-Encoding: gzip, 
deflate}} header (you can set it with curl's {{\-\-compressed}}), but I don't 
know how this should be configured in CXF. It seems tika-server ignores the 
header and just uses {{chunked}}. So, IMHO, this is out of scope for JAX-RS and 
has more to do with CXF/Jetty. Jetty itself has gzip support 
([https://www.eclipse.org/jetty/documentation/current/gzip-filter.html]) which 
can be enabled for the whole server by adding it with 
{{org.eclipse.jetty.server.Server#insertHandler}}.

Some servers return {{Content-Encoding}} instead of {{Transfer-Encoding}}, 
and curl supports both. To test, just call {{curl \-\-compressed \-\-http1.1 -v 
https://code.jquery.com/jquery-3.3.1.slim.min.js}} with and without the 
{{\-\-compressed}} flag.


was (Author: grossws):
[~tallison], usually webserver should accept HTTP {{Accept-Encoding: gzip, 
deflate}} header (you could set it with curl's --compressed), but I don't know 
how this should be configured in CXF. But it seems tika-server ignores it and 
just use {{chunked}}. So, IMHO, it's out of scope for JAX-RS but more to do 
with CXF/Jetty. Jetty itself has 
[https://www.eclipse.org/jetty/documentation/current/gzip-filter.html] which 
can be enabled for whole server using by adding it with 
{{org.eclipse.jetty.server.Server#insertHandler}}.

Some servers would return {{Content-Encoding}} instead of {{Transfer-Encoding}} 
and curl supports both. To test just call {{curl --compressed --http1.1 -v 
[https://code.jquery.com/jquery-3.3.1.slim.min.js]-}} with and without 
{{-compressed}} flag.

> Add compression option to /rmeta output
> ---
>
> Key: TIKA-3073
> URL: https://issues.apache.org/jira/browse/TIKA-3073
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> On TIKA-3069, [~carina.antunes] requested compressing /rmeta output. This 
> makes sense as a start...we might also look into allowing more 
> configurability around which metadata fields and file types to send back over 
> the wire.  Few people need everything...





[jira] [Comment Edited] (TIKA-3073) Add compression option to /rmeta output

2020-03-18 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061672#comment-17061672
 ] 

Konstantin Gribov edited comment on TIKA-3073 at 3/18/20, 12:31 PM:


[~tallison], usually a web server should honor the HTTP {{Accept-Encoding: gzip, 
deflate}} header (you can set it with curl's --compressed), but I don't know 
how this should be configured in CXF. It seems tika-server ignores the header 
and just uses {{chunked}}. So, IMHO, this is out of scope for JAX-RS and has 
more to do with CXF/Jetty. Jetty itself has gzip support 
([https://www.eclipse.org/jetty/documentation/current/gzip-filter.html]) which 
can be enabled for the whole server by adding it with 
{{org.eclipse.jetty.server.Server#insertHandler}}.

Some servers return {{Content-Encoding}} instead of {{Transfer-Encoding}}, 
and curl supports both. To test, just call {{curl --compressed --http1.1 -v 
https://code.jquery.com/jquery-3.3.1.slim.min.js}} with and without the 
{{--compressed}} flag.


was (Author: grossws):
[~tallison], usually webserver should accept HTTP {{Accept-Encoding: gzip, 
deflate}} header (you could set it with curl's --compressed), but I don't know 
how this should be configured in CXF. But it seems tika-server ignores it and 
just use {{chinked}}. So, IMHO, it's out of scope for JAX-RS but more to do 
with CXF/Jetty. Jetty itself has 
https://www.eclipse.org/jetty/documentation/current/gzip-filter.html which can 
be enabled for whole server using by adding it with 
{{org.eclipse.jetty.server.Server#insertHandler}}.

Some servers would return {{Content-Encoding}} instead of {{Transfer-Encoding}} 
and curl supports both. To test just call {{curl --compressed --http1.1 -v 
https://code.jquery.com/jquery-3.3.1.slim.min.js}} with and without 
{{--compressed}} flag.

> Add compression option to /rmeta output
> ---
>
> Key: TIKA-3073
> URL: https://issues.apache.org/jira/browse/TIKA-3073
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> On TIKA-3069, [~carina.antunes] requested compressing /rmeta output. This 
> makes sense as a start...we might also look into allowing more 
> configurability around which metadata fields and file types to send back over 
> the wire.  Few people need everything...





[jira] [Commented] (TIKA-3073) Add compression option to /rmeta output

2020-03-18 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061672#comment-17061672
 ] 

Konstantin Gribov commented on TIKA-3073:
-

[~tallison], usually a web server should honor the HTTP {{Accept-Encoding: gzip, 
deflate}} header (you can set it with curl's --compressed), but I don't know 
how this should be configured in CXF. It seems tika-server ignores the header 
and just uses {{chunked}}. So, IMHO, this is out of scope for JAX-RS and has 
more to do with CXF/Jetty. Jetty itself has gzip support 
(https://www.eclipse.org/jetty/documentation/current/gzip-filter.html) which can 
be enabled for the whole server by adding it with 
{{org.eclipse.jetty.server.Server#insertHandler}}.

Some servers return {{Content-Encoding}} instead of {{Transfer-Encoding}}, 
and curl supports both. To test, just call {{curl --compressed --http1.1 -v 
https://code.jquery.com/jquery-3.3.1.slim.min.js}} with and without the 
{{--compressed}} flag.
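
A rough sketch of what gzip negotiation amounts to on the server side, using only the JDK. It is not Jetty's or CXF's implementation; the GzipNegotiation class and its methods are hypothetical, and the header parsing is deliberately simplified (it ignores {{*}} and treats a malformed q-value as a refusal).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipNegotiation {
    /** True if the Accept-Encoding value lists gzip with a non-zero q-value. */
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            String[] parts = token.trim().split(";");
            if (!parts[0].trim().equalsIgnoreCase("gzip")) {
                continue;
            }
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        return Double.parseDouble(param.substring(2)) > 0;
                    } catch (NumberFormatException e) {
                        return false; // malformed q-value, treat as refused
                    }
                }
            }
            return true; // gzip listed without a q-value
        }
        return false;
    }

    /** Compresses a response body as a server would for Content-Encoding: gzip. */
    static byte[] gzip(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses a gzip body, as curl does when --compressed is given. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A server-side handler would call acceptsGzip on the request header and, when it returns true, send the gzip output together with a {{Content-Encoding: gzip}} response header.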

> Add compression option to /rmeta output
> ---
>
> Key: TIKA-3073
> URL: https://issues.apache.org/jira/browse/TIKA-3073
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> On TIKA-3069, [~carina.antunes] requested compressing /rmeta output. This 
> makes sense as a start...we might also look into allowing more 
> configurability around which metadata fields and file types to send back over 
> the wire.  Few people need everything...





[jira] [Commented] (TIKA-3019) [9.8] [CVE-2019-17571] [tika-app] [1.23]

2020-01-15 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016096#comment-17016096
 ] 

Konstantin Gribov commented on TIKA-3019:
-

[~rgoers], yes, I mentioned {{log4j1.compatibility}} above. It may not be an 
ideal solution, but it could work in simple cases.

> [9.8] [CVE-2019-17571] [tika-app] [1.23]
> 
>
> Key: TIKA-3019
> URL: https://issues.apache.org/jira/browse/TIKA-3019
> Project: Tika
>  Issue Type: Bug
>  Components: tika-batch
>Affects Versions: 1.23
>Reporter: Aman Mishra
>Priority: Major
>
> *Description :*
> *Severity :* Sonatype CVSS 3: 9.8, CVE CVSS 2.0: 0.0
> *Weakness :* Sonatype CWE: 502
> *Source :* National Vulnerability Database
> *Categories :* Data
> *Description from CVE :* Included in Log4j 1.2 is a SocketServer class that 
> is vulnerable to deserialization of untrusted data which can be exploited to 
> remotely execute arbitrary code when combined with a deserialization gadget 
> when listening to untrusted network traffic for log data. This affects Log4j 
> versions up to 1.2 up to 1.2.17.
> *Explanation :* The log4j:log4j package is vulnerable to Remote Code 
> Execution [RCE] due to Deserialization of Untrusted Data. The 
> configureHierarchy and genericHierarchy methods in SocketServer.class do not 
> verify if the file at a given file path contains any untrusted objects prior 
> to deserializing them. A remote attacker can exploit this vulnerability by 
> providing a path to crafted files, which result in arbitrary code execution 
> when deserialized.
> NOTE: Starting with version[s] 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2.
> *Detection :* The application is vulnerable by using this component.
> *Recommendation :* Starting with version[s] 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2. Therefore, it is recommended to upgrade to 
> org.apache.logging.log4j:log4j-core version[s] 2.8.2 and above. For 
> log4j:log4j 1.x versions however, a fix does not exist.
> *Root Cause :* tika-app-1.23.jar <= org/apache/log4j/net/SocketServer.class : [,]
> *Advisories :* Project: [https://bugzilla.redhat.com/show_bug.cgi?id=1785616]
> *CVSS Details :* Sonatype CVSS 3: 9.8, CVSS Vector: 
> CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H





[jira] [Closed] (TIKA-3018) log4j 1.2 version used by Apache Tika 1.23 is vulnerable to CVE-2019-17571

2020-01-11 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-3018.
---

> log4j 1.2 version used by Apache Tika 1.23 is vulnerable to CVE-2019-17571
> --
>
> Key: TIKA-3018
> URL: https://issues.apache.org/jira/browse/TIKA-3018
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.23
>Reporter: Abhijit Rajwade
>Priority: Major
>
> The Sonatype Nexus auditor is reporting the following log4j-related security 
> issue on Apache Tika 1.23.
> The recommendation is to use org.apache.logging.log4j:log4j-core version(s) 
> 2.8.2 and above. Can you please check whether Apache Tika is vulnerable and, 
> if so, upgrade based on the recommendation?
> Description
> Description from CVE
> Included in Log4j 1.2 is a SocketServer class that is vulnerable to 
> deserialization of untrusted data which can be exploited to remotely execute 
> arbitrary code when combined with a deserialization gadget when listening to 
> untrusted network traffic for log data. This affects Log4j versions up to 1.2 
> up to 1.2.17. 
> Explanation
> The log4j:log4j package is vulnerable to Remote Code Execution (RCE) due 
> to Deserialization of Untrusted Data. The configureHierarchy and 
> genericHierarchy methods in SocketServer.class do not verify if the file at a 
> given file path contains any untrusted objects prior to deserializing them. A 
> remote attacker can exploit this vulnerability by providing a path to crafted 
> files, which result in arbitrary code execution when deserialized.
> NOTE: Starting with version(s) 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2.
> Detection
> The application is vulnerable by using this component.
> Recommendation
> Starting with version(s) 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2. Therefore, it is recommended to upgrade to 
> org.apache.logging.log4j:log4j-core version(s) 2.8.2 and above. For 
> log4j:log4j 1.x versions however, a fix does not exist.
> Root Cause
> tika-app-1.23.jar <= org/apache/log4j/net/SocketServer.class : (,) 
> Advisories
> Project: https://issues.apache.org/jira/browse/LOG4J2-1863
> Project: https://lists.apache.org/thread.html/84cc4266238e057b95eb95d…
> Third Party: https://bugzilla.redhat.com/show_bug.cgi?id=1785616 
> CVSS Details
> Sonatype CVSS 3: 9.8
> CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H 





[jira] [Resolved] (TIKA-3018) log4j 1.2 version used by Apache Tika 1.23 is vulnerable to CVE-2019-17571

2020-01-11 Thread Konstantin Gribov (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-3018.
-
Resolution: Duplicate

> log4j 1.2 version used by Apache Tika 1.23 is vulnerable to CVE-2019-17571
> --
>
> Key: TIKA-3018
> URL: https://issues.apache.org/jira/browse/TIKA-3018
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.23
>Reporter: Abhijit Rajwade
>Priority: Major
>
> Sonatype Nexus auditor is reporting the following log4j-related security 
> issue on Apache Tika 1.23.
> The recommendation is to use org.apache.logging.log4j:log4j-core version(s) 
> 2.8.2 and above. Can you please check whether Apache Tika is vulnerable and, 
> if so, upgrade based on the recommendation?
> Description
> Description from CVE
> Included in Log4j 1.2 is a SocketServer class that is vulnerable to 
> deserialization of untrusted data which can be exploited to remotely execute 
> arbitrary code when combined with a deserialization gadget when listening to 
> untrusted network traffic for log data. This affects Log4j versions up to 1.2 
> up to 1.2.17. 
> Explanation
> The log4j:log4j package is vulnerable to Remote Code Execution (RCE) due 
> to Deserialization of Untrusted Data. The configureHierarchy and 
> genericHierarchy methods in SocketServer.class do not verify if the file at a 
> given file path contains any untrusted objects prior to deserializing them. A 
> remote attacker can exploit this vulnerability by providing a path to crafted 
> files, which result in arbitrary code execution when deserialized.
> NOTE: Starting with version(s) 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2.
> Detection
> The application is vulnerable by using this component.
> Recommendation
> Starting with version(s) 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2. Therefore, it is recommended to upgrade to 
> org.apache.logging.log4j:log4j-core version(s) 2.8.2 and above. For 
> log4j:log4j 1.x versions however, a fix does not exist.
> Root Cause
> tika-app-1.23.jar <= org/apache/log4j/net/SocketServer.class : (,) 
> Advisories
> Project: https://issues.apache.org/jira/browse/LOG4J2-1863
> Project: https://lists.apache.org/thread.html/84cc4266238e057b95eb95d…
> Third Party: https://bugzilla.redhat.com/show_bug.cgi?id=1785616 
> CVSS Details
> Sonatype CVSS 3: 9.8
> CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H 





[jira] [Commented] (TIKA-3019) [9.8] [CVE-2019-17571] [tika-app] [1.23]

2020-01-10 Thread Konstantin Gribov (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013000#comment-17013000
 ] 

Konstantin Gribov commented on TIKA-3019:
-

[~tallison], there actually seems to be a twofold issue for downstream users 
who depend on tika-app/server/eval and use log4j 1.2.x: logging backend 
configuration and direct use of the log4j 1.x API (e.g. LogManager). As I 
don't use the log4j logging backend myself, I may be overlooking something.

It's unlikely that downstream folks depend on tika-app/server, so I'd say that 
if we encounter someone really using it that way, we advise them to update or 
to use the log4j-1.2-api module (the 1.2.x API bridge to 2.x). If they don't 
use the *internal* API, it should be OK. See 
https://logging.apache.org/log4j/2.x/manual/migration.html and 
https://logging.apache.org/log4j/2.x/manual/compatibility.html. Most likely we 
will break programmatic configuration in this case (e.g. someone using their 
own main class with -q/-v parameters).

As for the configuration side, a downstream user could set the 
{{log4j1.compatibility}} system property to keep using old configs, but there 
are some caveats (e.g. a custom appender that depends on some log4j 1.2 
implementation class). Again, recommending to update, or to downgrade to 1.2.x 
as [~kkrugler] said, with a clear warning about the CVE, is all we can do 
here, I guess.
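
A minimal sketch of the configuration-side workaround mentioned above, assuming Log4j 2.x with the log4j-1.2-api bridge on the classpath. Only the property name {{log4j1.compatibility}} comes from the comment; the value "true", the {{log4j.configuration}} property, the file path and the class name are assumptions for illustration. The properties have to be set before the first logger is created, or passed as -D flags on the command line.

```java
// Hypothetical launcher: switches on Log4j 1.x configuration compatibility
// before any logging class is loaded. Only the JDK is needed to compile this.
class Log4j1CompatLauncher {

    // Sets the system properties that the Log4j 2 bridge (log4j-1.2-api) is
    // assumed to read when it initializes.
    static void enableLog4j1Compatibility(String legacyConfigPath) {
        // Property named in the comment above; "true" is assumed to enable it.
        System.setProperty("log4j1.compatibility", "true");
        // Assumption: points the bridge at the old log4j 1.x config file.
        System.setProperty("log4j.configuration", legacyConfigPath);
    }

    public static void main(String[] args) {
        // Illustrative path only; use the location of the real 1.x config.
        enableLog4j1Compatibility("conf/log4j.properties");
        System.out.println("log4j1.compatibility="
                + System.getProperty("log4j1.compatibility"));
        // Start the application (e.g. a custom main class) after this point.
    }
}
```

The caveat from the comment still applies: a custom appender that depends on log4j 1.2 implementation classes may not load in this mode.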

Also, it seems this vulnerability in SocketServer only affects those who 
accept logging events over TCP from other services. I can't think of such a 
use for tika-app/server off the top of my head. Most likely we aren't affected 
by this CVE at all.

My vote is for migrating to 2.x and pointing users to the aforementioned 
migration/compatibility guides.

> [9.8] [CVE-2019-17571] [tika-app] [1.23]
> 
>
> Key: TIKA-3019
> URL: https://issues.apache.org/jira/browse/TIKA-3019
> Project: Tika
>  Issue Type: Bug
>  Components: tika-batch
>Affects Versions: 1.23
>Reporter: Aman Mishra
>Priority: Major
>
> *Description :*
> *Severity :* Sonatype CVSS 3: 9.8; CVE CVSS 2.0: 0.0
> *Weakness :* Sonatype CWE: 502
> *Source :* National Vulnerability Database
> *Categories :* Data
> *Description from CVE :* Included in Log4j 1.2 is a SocketServer class that 
> is vulnerable to deserialization of untrusted data which can be exploited to 
> remotely execute arbitrary code when combined with a deserialization gadget 
> when listening to untrusted network traffic for log data. This affects Log4j 
> versions up to 1.2 up to 1.2.17.
> *Explanation :* The log4j:log4j package is vulnerable to Remote Code 
> Execution [RCE] due to Deserialization of Untrusted Data. The 
> configureHierarchy and genericHierarchy methods in SocketServer.class do not 
> verify if the file at a given file path contains any untrusted objects prior 
> to deserializing them. A remote attacker can exploit this vulnerability by 
> providing a path to crafted files, which result in arbitrary code execution 
> when deserialized.
> NOTE: Starting with version[s] 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2.
> *Detection :* The application is vulnerable by using this component.
> *Recommendation :* Starting with version[s] 2.x, log4j:log4j was relocated to 
> org.apache.logging.log4j:log4j-core. A variation of this vulnerability exists 
> in org.apache.logging.log4j:log4j-core as CVE-2017-5645, in versions up to 
> but excluding 2.8.2. Therefore, it is recommended to upgrade to 
> org.apache.logging.log4j:log4j-core version[s] 2.8.2 and above. For 
> log4j:log4j 1.x versions however, a fix does not exist.
> *Root Cause :* tika-app-1.23.jar <= org/apache/log4j/net/SocketServer.class : [,]
> *Advisories :* Project: [https://bugzilla.redhat.com/show_bug.cgi?id=1785616]
> *CVSS Details :* Sonatype CVSS 3: 9.8; CVSS Vector: 
> CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H





[jira] [Closed] (TIKA-2601) Invalid XHTML output (overlapping a and formatting tags) for some WORD documents

2019-06-27 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2601.
---

> Invalid XHTML output (overlapping a and formatting tags) for some WORD 
> documents
> 
>
> Key: TIKA-2601
> URL: https://issues.apache.org/jira/browse/TIKA-2601
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.17
> Environment: Linked is a sample document with its corresponding 
> output.
>Reporter: Filip
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0, 1.21
>
> Attachments: Invalid-XML.doc, Test.doc, test.html
>
>
> In some WORD (.doc, .docx) documents the XHTML elements are not closed 
> properly. This usually happens when there are link elements () as well as 
> italic or bold elements ().
>  
> Fix should be done in 
> [https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (TIKA-879) Detection problem: message/rfc822 file is detected as text/plain.

2019-06-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-879.
--

> Detection problem: message/rfc822 file is detected as text/plain.
> -
>
> Key: TIKA-879
> URL: https://issues.apache.org/jira/browse/TIKA-879
> Project: Tika
>  Issue Type: Bug
>  Components: metadata, mime
>Affects Versions: 1.0, 1.1, 1.2
> Environment: linux 3.2.9
> oracle jdk7, openjdk7, sun jdk6
>Reporter: Konstantin Gribov
>Priority: Major
>  Labels: new-parser
> Fix For: 2.0, 1.18
>
> Attachments: TIKA-879-thunderbird.eml, mbox_email_section.txt, 
> mime_diffs_A_to_B.html
>
>
> When using {{DefaultDetector}}, the mime type detected for {{.eml}} files 
> differs from the expected {{message/rfc822}} (you can test it on 
> {{testRFC822}} and {{testRFC822_base64}} in 
> {{tika-parsers/src/test/resources/test-documents/}}).
> The main reason for this behavior is that only the magic detector really 
> works for such files, even if you set {{CONTENT_TYPE}} in the metadata or an 
> {{.eml}} file name in {{RESOURCE_NAME_KEY}}.
> As I found, {{MediaTypeRegistry.isSpecializationOf("message/rfc822", 
> "text/plain")}} returns {{false}}, so detection by {{MimeTypes.detect(...)}} 
> works only by magic.





[jira] [Closed] (TIKA-2209) Update PDFBox to 2.0.4

2019-06-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2209.
---

> Update PDFBox to 2.0.4
> --
>
> Key: TIKA-2209
> URL: https://issues.apache.org/jira/browse/TIKA-2209
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.14
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Trivial
> Fix For: 2.0, 1.15
>
>






[jira] [Closed] (TIKA-2681) Upgrade to PDFBox 2.0.11

2019-06-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2681.
---

> Upgrade to PDFBox 2.0.11
> 
>
> Key: TIKA-2681
> URL: https://issues.apache.org/jira/browse/TIKA-2681
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.18
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0, 1.19
>
>






[jira] [Closed] (TIKA-2622) Upgrade to PDFBox 2.0.10 when available

2019-06-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2622.
---

> Upgrade to PDFBox 2.0.10 when available
> ---
>
> Key: TIKA-2622
> URL: https://issues.apache.org/jira/browse/TIKA-2622
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Konstantin Gribov
>Priority: Major
>






[jira] [Resolved] (TIKA-2566) Move logging in tika-core to slf4j-api (with log4j in test scope) as we do in the rest of Tika

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2566.
-
Resolution: Fixed

> Move logging in tika-core to slf4j-api (with log4j in test scope) as we do in 
> the rest of Tika
> --
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Updated] (TIKA-2566) Move logging in tika-core to slf4j-api (with log4j in test scope) as we do in the rest of Tika

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-2566:

Summary: Move logging in tika-core to slf4j-api (with log4j in test scope) 
as we do in the rest of Tika  (was: Move logging in tika-core to log4j via 
slf4j as we do in the rest of Tika)

> Move logging in tika-core to slf4j-api (with log4j in test scope) as we do in 
> the rest of Tika
> --
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Resolved] (TIKA-2314) Migrate logging to slf4j in master (2.x) branch

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2314.
-
Resolution: Resolved

> Migrate logging to slf4j in master (2.x) branch
> ---
>
> Key: TIKA-2314
> URL: https://issues.apache.org/jira/browse/TIKA-2314
> Project: Tika
>  Issue Type: Improvement
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
>  Labels: logging
> Fix For: 2.0
>
>






[jira] [Closed] (TIKA-2315) Update logging page at wiki with actual info

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2315.
---

> Update logging page at wiki with actual info
> 
>
> Key: TIKA-2315
> URL: https://issues.apache.org/jira/browse/TIKA-2315
> Project: Tika
>  Issue Type: Task
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Minor
>  Labels: logging
>






[jira] [Resolved] (TIKA-2315) Update logging page at wiki with actual info

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2315.
-
Resolution: Fixed

> Update logging page at wiki with actual info
> 
>
> Key: TIKA-2315
> URL: https://issues.apache.org/jira/browse/TIKA-2315
> Project: Tika
>  Issue Type: Task
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Minor
>  Labels: logging
>






[jira] [Updated] (TIKA-2314) Migrate logging to slf4j in master (2.x) branch

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-2314:

Summary: Migrate logging to slf4j in master (2.x) branch  (was: Migrate 
logging to slf4j in 2.x branch)

> Migrate logging to slf4j in master (2.x) branch
> ---
>
> Key: TIKA-2314
> URL: https://issues.apache.org/jira/browse/TIKA-2314
> Project: Tika
>  Issue Type: Improvement
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Major
>  Labels: logging
> Fix For: 2.0
>
>






[jira] [Updated] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-2566:

Fix Version/s: (was: 1.20)

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Resolved] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2566.
-
   Resolution: Fixed
Fix Version/s: 1.20

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0.0, 1.20
>
>






[jira] [Reopened] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov reopened TIKA-2566:
-

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0.0, 1.20
>
>






[jira] [Resolved] (TIKA-2555) Text with [underline] + [another format] in word document generates overlapping html tags.

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2555.
-
   Resolution: Fixed
Fix Version/s: 1.21
   2.0

> Text with [underline] + [another format] in word document generates 
> overlapping html tags.
> --
>
> Key: TIKA-2555
> URL: https://issues.apache.org/jira/browse/TIKA-2555
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.17
>Reporter: Serban Alexe
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0, 1.21
>
> Attachments: Clipboard02.jpg
>
>
> I have a sample _.docx_ document which contains one single line of text.
> Making that text to be:
>  * +underlined+
>  ** AND at least one of the following two
>  * _italic_
>  * *bold*
> will cause the generated _.xhtml_ file to contain overlapping tags.
>  
> _+Example+_:
> *+The quick brown fox jumps over the lazy dog.+*
> will result in
> The quick brown fox jumps over the lazy dog. 
> which causes some browser (Firefox, Chrome) to give an error and not display 
> the content of the file...
>  





[jira] [Resolved] (TIKA-2601) Invalid XHTML output (overlapping a and formatting tags) for some WORD documents

2019-04-23 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2601.
-
   Resolution: Fixed
Fix Version/s: 1.21
   2.0

> Invalid XHTML output (overlapping a and formatting tags) for some WORD 
> documents
> 
>
> Key: TIKA-2601
> URL: https://issues.apache.org/jira/browse/TIKA-2601
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.17
> Environment: Linked is a sample document with its corresponding 
> output.
>Reporter: Filip
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0, 1.21
>
> Attachments: Invalid-XML.doc, Test.doc, test.html
>
>
> In some WORD (.doc, .docx) documents the XHTML elements are not closed 
> properly. This usually happens when there are link elements () as well as 
> italic or bold elements ().
>  
> Fix should be done in 
> [https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java]





[jira] [Assigned] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-04-22 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov reassigned TIKA-2566:
---

Assignee: Konstantin Gribov

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Reopened] (TIKA-2601) Invalid XHTML output (overlapping a and formatting tags) for some WORD documents

2019-04-18 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov reopened TIKA-2601:
-
  Assignee: Konstantin Gribov

> Invalid XHTML output (overlapping a and formatting tags) for some WORD 
> documents
> 
>
> Key: TIKA-2601
> URL: https://issues.apache.org/jira/browse/TIKA-2601
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.17
> Environment: Linked is a sample document with its corresponding 
> output.
>Reporter: Filip
>Assignee: Konstantin Gribov
>Priority: Major
> Attachments: Invalid-XML.doc, Test.doc, test.html
>
>
> In some WORD (.doc, .docx) documents the XHTML elements are not closed 
> properly. This usually happens when there are link elements () as well as 
> italic or bold elements ().
>  
> Fix should be done in 
> [https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java]





[jira] [Updated] (TIKA-2601) Invalid XHTML output (overlapping a and formatting tags) for some WORD documents

2019-04-18 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov updated TIKA-2601:

Summary: Invalid XHTML output (overlapping a and formatting tags) for some 
WORD documents  (was: Invalid XHTML output for some WORD documents)

> Invalid XHTML output (overlapping a and formatting tags) for some WORD 
> documents
> 
>
> Key: TIKA-2601
> URL: https://issues.apache.org/jira/browse/TIKA-2601
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.17
> Environment: Linked is a sample document with its corresponding 
> output.
>Reporter: Filip
>Priority: Major
> Attachments: Invalid-XML.doc, Test.doc, test.html
>
>
> In some WORD (.doc, .docx) documents the XHTML elements are not closed 
> properly. This usually happens when there are link elements () as well as 
> italic or bold elements ().
>  
> Fix should be done in 
> [https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java]





[jira] [Closed] (TIKA-2347) Underlined text is not decorated as such when extracting from word documents

2019-04-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2347.
---

> Underlined text is not decorated as such when extracting from word documents
> 
>
> Key: TIKA-2347
> URL: https://issues.apache.org/jira/browse/TIKA-2347
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.0, 1.14
>Reporter: Stuart Hendren
>Assignee: Dave Meikle
>Priority: Major
> Fix For: 1.17
>
>
> When extracting from doc and docx bold and italic text decoration is 
> extracted, however underlining is not.  Can be demonstrated in WordParserTest 
> or OOXMLParserTest (change to docx) with the following test case.
> {code:title=WordParserTest.java|borderStyle=solid}
> @Test
> public void testTextDecoration() throws Exception {
>   XMLResult result = getXML("testWORD_various.doc");
>   String xml = result.xml;
>   assertTrue(xml.contains("Bold"));
>   assertTrue(xml.contains("italic"));
>   assertTrue(xml.contains("underline"));
> }
> {code}





[jira] [Resolved] (TIKA-2601) Invalid XHTML output for some WORD documents

2019-04-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2601.
-
Resolution: Duplicate

I'm marking this as a duplicate of TIKA-2555, which I'm currently looking into.

> Invalid XHTML output for some WORD documents
> 
>
> Key: TIKA-2601
> URL: https://issues.apache.org/jira/browse/TIKA-2601
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.17
> Environment: Linked is a sample document with its corresponding 
> output.
>Reporter: Filip
>Priority: Major
> Attachments: Invalid-XML.doc, Test.doc, test.html
>
>
> In some WORD (.doc, .docx) documents the XHTML elements are not closed 
> properly. This usually happens when there are link elements () as well as 
> italic or bold elements ().
>  
> Fix should be done in 
> [https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java]





[jira] [Assigned] (TIKA-2555) Text with [underline] + [another format] in word document generates overlapping html tags.

2019-04-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov reassigned TIKA-2555:
---

Assignee: Konstantin Gribov

> Text with [underline] + [another format] in word document generates 
> overlapping html tags.
> --
>
> Key: TIKA-2555
> URL: https://issues.apache.org/jira/browse/TIKA-2555
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.17
>Reporter: Serban Alexe
>Assignee: Konstantin Gribov
>Priority: Minor
> Attachments: Clipboard02.jpg
>
>
> I have a sample _.docx_ document which contains one single line of text.
> Making that text to be:
>  * +underlined+
>  ** AND at least one of the following two
>  * _italic_
>  * *bold*
> will cause the generated _.xhtml_ file to contain overlapping tags.
>  
> _+Example+_:
> *+The quick brown fox jumps over the lazy dog.+*
> will result in
> The quick brown fox jumps over the lazy dog. 
> which causes some browser (Firefox, Chrome) to give an error and not display 
> the content of the file...
>  





[jira] [Commented] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-03-20 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797354#comment-16797354
 ] 

Konstantin Gribov commented on TIKA-2566:
-

Just to clarify which option you propose:
1. use slf4j-api in tika-core; slf4j-api with bridges in tika-parsers; and 
log4j:1.x or log4j-core:2.x as the implementation in tika-app etc.;
2. use log4j-api:2.x in tika-core; log4j-api:2.x with bridges (slf4j, jul & jcl 
to log4j2-api) in tika-parsers; and log4j-core:2.x as the implementation;
3. use log4j-api:2.x in tika-core/tika-parsers; force the user to configure 
logging deps correctly to use tika-parsers; and use log4j-core:2.x as the 
implementation in tika-app etc.?

Option 1 is what I suggested initially in TIKA-2245 and what is currently in 
master. Option 2 is similar but seems more complex, since we would still have 
slf4j-api plus bridges for commons-logging/jcl, JUL and slf4j.

Option 3 is less preferable since it requires the downstream user to add all 
bridges manually, which is error-prone and could be a bit fragile.

My preference in this case is option 1, since it's a logical improvement on 
the current status quo (JUL in tika-core and slf4j+jul-to-slf4j+jcl-over-slf4j 
in tika-parsers).
Then a downstream user can use:
- log4j 1.x: add log4j:1.x and slf4j-log4j12;
- logback-classic: just add logback-classic;
- log4j 2.x: add log4j-api, log4j-core, log4j-slf4j-impl (slf4j bridge), 
log4j-jcl (commons-logging/jcl bridge), log4j-jul (JUL bridge) and exclude 
jul-to-slf4j and jcl-over-slf4j.
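
Because the log4j 2.x variant above relies on excluding jul-to-slf4j and jcl-over-slf4j, a small classpath check can help spot a setup where both sides of a bridge pair ended up present. The following is only a sketch: the class name is hypothetical, and the marker class names are assumptions about what each artifact ships, so verify them against the versions actually in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical diagnostic: reports bridge pairs that should not be combined
// when the log4j 2.x bridges are used instead of the slf4j ones.
class LoggingBridgeCheck {

    // {slf4j-side bridge, its marker class, log4j2-side bridge, its marker class}
    // Marker class names are assumptions, not taken from the thread.
    static final String[][] PAIRS = {
        {"jul-to-slf4j", "org.slf4j.bridge.SLF4JBridgeHandler",
         "log4j-jul", "org.apache.logging.log4j.jul.LogManager"},
        {"jcl-over-slf4j", "org.apache.commons.logging.impl.SLF4JLogFactory",
         "log4j-jcl", "org.apache.logging.log4j.jcl.LogFactoryImpl"},
    };

    // True if the named class can be found without initializing it.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false,
                    LoggingBridgeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns "a + b" for every pair whose two marker classes are both present.
    static List<String> overlappingBridges(Predicate<String> classPresent) {
        List<String> overlaps = new ArrayList<>();
        for (String[] pair : PAIRS) {
            if (classPresent.test(pair[1]) && classPresent.test(pair[3])) {
                overlaps.add(pair[0] + " + " + pair[2]);
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        List<String> overlaps =
                overlappingBridges(LoggingBridgeCheck::onClasspath);
        System.out.println(overlaps.isEmpty()
                ? "no overlapping bridges found"
                : "exclude one side of: " + overlaps);
    }
}
```

Taking the presence check as a Predicate keeps the pairing logic testable without any logging library on the classpath.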

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Commented] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-03-20 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797288#comment-16797288
 ] 

Konstantin Gribov commented on TIKA-2566:
-

Since log4j2 has a bridge to slf4j-api, I don't see any major issues with 
using it. I prefer slf4j mostly because of its wide adoption, but log4j2 seems 
to be a good alternative today.

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Commented] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-03-19 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796251#comment-16796251
 ] 

Konstantin Gribov commented on TIKA-2566:
-

[~talli...@apache.org], why do you prefer log4j (which is first of all an 
implementation) over a thin facade (slf4j)? Log4j 1.2 and 2.x are fine as an 
implementation (as in tika-batch, tika-app, tika-server and tika-eval), but as 
a library dependency it seems much less preferable to me, even compared to 
commons-logging/jcl (which is both facade and impl in one package).

Or I misunderstood you and you actually suggest to use log4j2-api? I personally 
prefer slf4j-api for its stability and wide adoption. Only known major issue 
with it is JPMS support (because of static binding approach used in 1.7.x) but 
they are going to fix it in 1.8.x branch without breaking API.
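
To illustrate the difference between static binding and a JPMS-friendly lookup, 
here is a minimal sketch of provider discovery through 
{{java.util.ServiceLoader}}, which is the mechanism SLF4J adopted in 1.8.x. The 
names {{LogProvider}}, {{NopLogProvider}} and {{LogFacade}} are hypothetical and 
are not the real SLF4J types; the sketch only shows the general technique.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical facade SPI for illustration; not the real SLF4J interface.
interface LogProvider {
    String name();
    void log(String message);
}

// Fallback used when no provider is registered on the classpath.
final class NopLogProvider implements LogProvider {
    @Override public String name() { return "NOP"; }
    @Override public void log(String message) { /* discard the message */ }
}

final class LogFacade {
    private LogFacade() {}

    // Discover an implementation at run time via META-INF/services entries
    // (or a `provides` clause in module-info.java) instead of linking the
    // facade against a fixed binder class name.
    static LogProvider lookup() {
        Iterator<LogProvider> providers =
                ServiceLoader.load(LogProvider.class).iterator();
        return providers.hasNext() ? providers.next() : new NopLogProvider();
    }
}

final class LogFacadeDemo {
    public static void main(String[] args) {
        LogProvider provider = LogFacade.lookup();
        System.out.println("Using provider: " + provider.name());
        provider.log("hello");
    }
}
```

With no provider registered, {{lookup()}} falls back to the no-op 
implementation, which mirrors how a facade can stay usable when no backend is 
present.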

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Commented] (TIKA-2566) Move logging in tika-core to log4j via slf4j as we do in the rest of Tika

2019-03-19 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796262#comment-16796262
 ] 

Konstantin Gribov commented on TIKA-2566:
-

JFYI: https://www.slf4j.org/faq.html#changesInVersion18 states that "There are 
no client facing API changes in 1.8.x". The latest version in Central right now 
is 1.8.0-beta4, but I hope a final release comes soon.

> Move logging in tika-core to log4j via slf4j as we do in the rest of Tika
> -
>
> Key: TIKA-2566
> URL: https://issues.apache.org/jira/browse/TIKA-2566
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.0.0
>
>






[jira] [Commented] (TIKA-2245) Standardise logging

2019-03-19 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796242#comment-16796242
 ] 

Konstantin Gribov commented on TIKA-2245:
-

[~talli...@apache.org], slf4j-api is quite stable from an API perspective, so it 
should be compatible with other 1.7.x versions. But it's better to use the same 
version for slf4j-api and the implementation, as SPI compatibility is not 
guaranteed.

Sorry for the belated answer.

> Standardise logging
> ---
>
> Key: TIKA-2245
> URL: https://issues.apache.org/jira/browse/TIKA-2245
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.14, 1.15
>Reporter: Matthew Caruana Galizia
>Assignee: Konstantin Gribov
>Priority: Major
>  Labels: logging
> Fix For: 1.15
>
>
> Tika parsers sometimes use Log4j's Logger, sometimes the JUL 
> (java.util.logging) Logger and sometimes SLF4j.
> It would be better to standardise on a single facade, for the sake of not 
> having to configure multiple loggers.





[jira] [Comment Edited] (TIKA-2756) Switch to commons-lang 3

2018-11-20 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693459#comment-16693459
 ] 

Konstantin Gribov edited comment on TIKA-2756 at 11/20/18 4:36 PM:
---

FYI, the issue seems to be present only with old commons-lang (2.4) and absent 
with more recent versions like 2.6.

*UPD*: Your code to reproduce the issue seems to work with 2.6; I haven't tested 
the original issue.


was (Author: grossws):
FYI, issue seems to be present only with old commons-lang (2.4) and absent with 
more recent like 2.6

> Switch to commons-lang 3
> 
>
> Key: TIKA-2756
> URL: https://issues.apache.org/jira/browse/TIKA-2756
> Project: Tika
>  Issue Type: Improvement
>Reporter: Robert Munteanu
>Priority: Major
>
> Tika 1.9.1 is using the legacy commons-lang 2.x series. This series is not 
> going to receive updates anymore and is completely superseded by commons-lang 
> 3.x .
> Projects that use Tika are blocked from dropping commons-lang 2.x due to this 
> dependency.
> The link that I found was from tika-parsers to jackcess and then to 
> commons-lang 2.6
> {noformat}
> [INFO] +- com.healthmarketscience.jackcess:jackcess:jar:2.1.12:compile
> [INFO] |  \- commons-lang:commons-lang:jar:2.6:compile
> {noformat}
> If I understand correctly, this is the only commons-lang 2.x dependency from 
> the Tika runtime and it would be great to remove it.





[jira] [Commented] (TIKA-2756) Switch to commons-lang 3

2018-11-20 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693459#comment-16693459
 ] 

Konstantin Gribov commented on TIKA-2756:
-

FYI, the issue seems to be present only with old commons-lang (2.4) and absent 
with more recent versions like 2.6.

> Switch to commons-lang 3
> 
>
> Key: TIKA-2756
> URL: https://issues.apache.org/jira/browse/TIKA-2756
> Project: Tika
>  Issue Type: Improvement
>Reporter: Robert Munteanu
>Priority: Major
>
> Tika 1.9.1 is using the legacy commons-lang 2.x series. This series is not 
> going to receive updates anymore and is completely superseded by commons-lang 
> 3.x .
> Projects that use Tika are blocked from dropping commons-lang 2.x due to this 
> dependency.
> The link that I found was from tika-parsers to jackcess and then to 
> commons-lang 2.6
> {noformat}
> [INFO] +- com.healthmarketscience.jackcess:jackcess:jar:2.1.12:compile
> [INFO] |  \- commons-lang:commons-lang:jar:2.6:compile
> {noformat}
> If I understand correctly, this is the only commons-lang 2.x dependency from 
> the Tika runtime and it would be great to remove it.





[jira] [Closed] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers

2018-11-02 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2721.
---

> Exclude Spring (transitive dependency) from tika-parsers
> 
>
> Key: TIKA-2721
> URL: https://issues.apache.org/jira/browse/TIKA-2721
> Project: Tika
>  Issue Type: Bug
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0, 1.19
>
>
> {{uimafit-core}} brings {{spring-core}}, {{spring-beans}} and 
> {{spring-context}} with quite ancient version 3.2.x which is not required for 
> parsing and usually clash with actual Spring libs or just pollutes jar if 
> uberjar (shade plugin, onejar, assembly plugin with jar-with-dependencies 
> etc) is used.
> Its exclusion from deps seems more or less safe to me. But formally it can be 
> seen as breaking change if someone depends on that tika-parsers provides 
> spring libs transitively.





[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-09-17 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617718#comment-16617718
 ] 

Konstantin Gribov commented on TIKA-2552:
-

[~TigerC10], Tim rolled RC1 this weekend, so, hopefully this week.

> Upgrade to POI 4.0.0 when available
> ---
>
> Key: TIKA-2552
> URL: https://issues.apache.org/jira/browse/TIKA-2552
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 1.19, 2.0.0
>
> Attachments: TIKA-2552_--_first_draft.patch
>
>






[jira] [Commented] (TIKA-2716) Sonatype Nexus auditor is reporting that spring framework vesrion used by Tika 1.18 is vulnerable

2018-09-04 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603335#comment-16603335
 ] 

Konstantin Gribov commented on TIKA-2716:
-

Won't Fix because {{spring-*}} is now excluded from the dependency tree (see 
TIKA-2721).

> Sonatype Nexus auditor is reporting that spring framework vesrion used by 
> Tika 1.18 is vulnerable
> -
>
> Key: TIKA-2716
> URL: https://issues.apache.org/jira/browse/TIKA-2716
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.18
>Reporter: Abhijit Rajwade
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0, 1.19
>
>
> Sonatype Nexus auditor is reporting that spring framework version used by 
> Apache Tika 1.18 is vulnerable. Recommendation is to upgrade to a non 
> vulnerable version of Spring framework - 4.3.15/later or 5.0.5/later
>  
> Refer following details
>  
> Issue 
> [CVE-2018-1270|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-1270]
>  
> Source National Vulnerability Database
>  
> Severity
> CVE CVSS 3.0: 9.8
> CVE CVSS 2.0: 7.5
> Sonatype CVSS 3.0: 9.8
>  
> Weakness
> CVE CWE: [358|https://cwe.mitre.org/data/definitions/358.html]
>  
> Description from CVE
> Spring Framework, versions 5.0 prior to 5.0.5 and versions 4.3 prior to 
> 4.3.15 and older unsupported versions, allow applications to expose STOMP 
> over WebSocket endpoints with a simple, in-memory STOMP broker through the 
> spring-messaging module. A malicious user (or attacker) can craft a message 
> to the broker that can lead to a remote code execution attack.
> Explanation
> The Spring Framework {{spring-messaging}} module is vulnerable to Remote Code 
> Execution (RCE). The {{getMethods()}} method in the 
> {{ReflectiveMethodResolver}} class, the {{canWrite}} method in the 
> {{ReflectivePropertyAccessor}} class, and the {{filterSubscriptions()}} 
> method in the {{DefaultSubscriptionRegistry}} class do not properly restrict 
> SpEL expression evaluation. A remote attacker can exploit this vulnerability 
> by crafting a request to an exposed STOMP endpoint and injecting a malicious 
> payload into the {{selector}} header. The application would then execute the 
> payload via a call to {{expression.getValue()}} whenever a new message is 
> sent to the broker.
>  
> Detection
> The application is vulnerable by using this component.
>  
> Recommendation
> We recommend upgrading to a version of this component that is not vulnerable 
> to this specific issue.
> Categories
> Data
> Root Cause
> tika-app-1.18.jar *<=* ReflectivePropertyAccessor.class : [3.0.0.RELEASE , 
> 4.3.15.RELEASE)
> tika-app-1.18.jar *<=* ReflectiveMethodResolver.class : [3.0.0.RELEASE , 
> 4.3.15.RELEASE)
>  
> Advisories
> Attack: [http://www.polaris-lab.com/index.php/archives/501/]
> Attack: 
> [https://chybeta.github.io/2018/04/07/spring-messaging-Remote...|https://chybeta.github.io/2018/04/07/spring-messaging-Remote-Code-Execution-%E5%88%86%E6%9E%90-%E3%80%90CVE-2018-1270%E3%80%91/]
> Project: [https://jira.spring.io/browse/SPR-16588]
>  





[jira] [Closed] (TIKA-2716) Sonatype Nexus auditor is reporting that spring framework vesrion used by Tika 1.18 is vulnerable

2018-09-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2716.
---
   Resolution: Won't Fix
 Assignee: Konstantin Gribov
Fix Version/s: 1.19
   2.0

> Sonatype Nexus auditor is reporting that spring framework vesrion used by 
> Tika 1.18 is vulnerable
> -
>
> Key: TIKA-2716
> URL: https://issues.apache.org/jira/browse/TIKA-2716
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.18
>Reporter: Abhijit Rajwade
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0, 1.19
>
>
> Sonatype Nexus auditor is reporting that spring framework version used by 
> Apache Tika 1.18 is vulnerable. Recommendation is to upgrade to a non 
> vulnerable version of Spring framework - 4.3.15/later or 5.0.5/later
>  
> Refer following details
>  
> Issue 
> [CVE-2018-1270|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-1270]
>  
> Source National Vulnerability Database
>  
> Severity
> CVE CVSS 3.0: 9.8
> CVE CVSS 2.0: 7.5
> Sonatype CVSS 3.0: 9.8
>  
> Weakness
> CVE CWE: [358|https://cwe.mitre.org/data/definitions/358.html]
>  
> Description from CVE
> Spring Framework, versions 5.0 prior to 5.0.5 and versions 4.3 prior to 
> 4.3.15 and older unsupported versions, allow applications to expose STOMP 
> over WebSocket endpoints with a simple, in-memory STOMP broker through the 
> spring-messaging module. A malicious user (or attacker) can craft a message 
> to the broker that can lead to a remote code execution attack.
> Explanation
> The Spring Framework {{spring-messaging}} module is vulnerable to Remote Code 
> Execution (RCE). The {{getMethods()}} method in the 
> {{ReflectiveMethodResolver}} class, the {{canWrite}} method in the 
> {{ReflectivePropertyAccessor}} class, and the {{filterSubscriptions()}} 
> method in the {{DefaultSubscriptionRegistry}} class do not properly restrict 
> SpEL expression evaluation. A remote attacker can exploit this vulnerability 
> by crafting a request to an exposed STOMP endpoint and injecting a malicious 
> payload into the {{selector}} header. The application would then execute the 
> payload via a call to {{expression.getValue()}} whenever a new message is 
> sent to the broker.
>  
> Detection
> The application is vulnerable by using this component.
>  
> Recommendation
> We recommend upgrading to a version of this component that is not vulnerable 
> to this specific issue.
> Categories
> Data
> Root Cause
> tika-app-1.18.jar *<=* ReflectivePropertyAccessor.class : [3.0.0.RELEASE , 
> 4.3.15.RELEASE)
> tika-app-1.18.jar *<=* ReflectiveMethodResolver.class : [3.0.0.RELEASE , 
> 4.3.15.RELEASE)
>  
> Advisories
> Attack: [http://www.polaris-lab.com/index.php/archives/501/]
> Attack: 
> [https://chybeta.github.io/2018/04/07/spring-messaging-Remote...|https://chybeta.github.io/2018/04/07/spring-messaging-Remote-Code-Execution-%E5%88%86%E6%9E%90-%E3%80%90CVE-2018-1270%E3%80%91/]
> Project: [https://jira.spring.io/browse/SPR-16588]
>  





[jira] [Resolved] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers

2018-09-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov resolved TIKA-2721.
-
Resolution: Fixed

> Exclude Spring (transitive dependency) from tika-parsers
> 
>
> Key: TIKA-2721
> URL: https://issues.apache.org/jira/browse/TIKA-2721
> Project: Tika
>  Issue Type: Bug
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0, 1.19
>
>
> {{uimafit-core}} brings {{spring-core}}, {{spring-beans}} and 
> {{spring-context}} with quite ancient version 3.2.x which is not required for 
> parsing and usually clash with actual Spring libs or just pollutes jar if 
> uberjar (shade plugin, onejar, assembly plugin with jar-with-dependencies 
> etc) is used.
> Its exclusion from deps seems more or less safe to me. But formally it can be 
> seen as breaking change if someone depends on that tika-parsers provides 
> spring libs transitively.





[jira] [Commented] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers

2018-09-04 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603230#comment-16603230
 ] 

Konstantin Gribov commented on TIKA-2721:
-

All unit & integration tests passed after excluding {{spring-*}} from 
{{uimafit-core}}.

> Exclude Spring (transitive dependency) from tika-parsers
> 
>
> Key: TIKA-2721
> URL: https://issues.apache.org/jira/browse/TIKA-2721
> Project: Tika
>  Issue Type: Bug
>  Components: packaging
>Reporter: Konstantin Gribov
>Assignee: Konstantin Gribov
>Priority: Minor
> Fix For: 2.0, 1.19
>
>
> {{uimafit-core}} brings {{spring-core}}, {{spring-beans}} and 
> {{spring-context}} with quite ancient version 3.2.x which is not required for 
> parsing and usually clash with actual Spring libs or just pollutes jar if 
> uberjar (shade plugin, onejar, assembly plugin with jar-with-dependencies 
> etc) is used.
> Its exclusion from deps seems more or less safe to me. But formally it can be 
> seen as breaking change if someone depends on that tika-parsers provides 
> spring libs transitively.





[jira] [Created] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers

2018-09-04 Thread Konstantin Gribov (JIRA)
Konstantin Gribov created TIKA-2721:
---

 Summary: Exclude Spring (transitive dependency) from tika-parsers
 Key: TIKA-2721
 URL: https://issues.apache.org/jira/browse/TIKA-2721
 Project: Tika
  Issue Type: Bug
  Components: packaging
Reporter: Konstantin Gribov
Assignee: Konstantin Gribov
 Fix For: 2.0, 1.19


{{uimafit-core}} brings in {{spring-core}}, {{spring-beans}} and 
{{spring-context}} at the quite ancient version 3.2.x. They are not required for 
parsing and usually clash with the actual Spring libs, or just pollute the jar 
if an uberjar (shade plugin, onejar, assembly plugin with jar-with-dependencies 
etc) is used.

Excluding them from the deps seems more or less safe to me. But formally it can 
be seen as a breaking change if someone depends on tika-parsers providing the 
Spring libs transitively.
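
For downstream users who want to find out whether they still get Spring classes 
on their classpath after such an exclusion, a small runtime probe is one option. 
The sketch below is only an illustration: {{ClasspathProbe}} is a hypothetical 
helper, and {{org.springframework.core.SpringVersion}} is used merely as a 
well-known class from {{spring-core}}.

```java
// Hypothetical helper for illustration; not part of Tika.
final class ClasspathProbe {
    private ClasspathProbe() {}

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}

final class ClasspathProbeDemo {
    public static void main(String[] args) {
        // A class that ships in spring-core; absent if Spring is not on the classpath.
        String springMarker = "org.springframework.core.SpringVersion";
        System.out.println("spring-core present: " + ClasspathProbe.isPresent(springMarker));
    }
}
```

If the probe reports {{false}} and the application needs Spring, the dependency 
has to be declared explicitly rather than inherited from tika-parsers.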





[jira] [Commented] (TIKA-2680) Email attachments to an email are not extracted

2018-09-03 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602361#comment-16602361
 ] 

Konstantin Gribov commented on TIKA-2680:
-

Just my 2c: I stopped using Tika for RFC822 parsing somewhere in 2012-2013. 
I use mime4j directly for RFC822 and delegate attachment parsing to Tika. 
But in my case I know beforehand what I'll parse (normal files, plain emls, 
emls with external metadata from a DLP system, or MSE journaled emls), so I can 
parse them with a specific parser. Of course I have to track whether I'm parsing 
an attachment (set/reset a flag in the field handler when 
{{Content-Disposition}} is found with/without it, and reset the flag in 
{{startBodyPart}}) and the current depth in the multipart tree.
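
The bookkeeping described above can be sketched roughly as follows. This is a 
simplified stand-in and not the actual mime4j {{ContentHandler}} API: the method 
names follow the callbacks mentioned in the comment, but the signatures, the 
{{AttachmentTracker}} class and the {{events}} list are assumptions made for 
illustration. It also assumes that "attachment" is the disposition value that 
sets the flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for a mime4j-style event handler (hypothetical signatures).
final class AttachmentTracker {
    private boolean inAttachment;   // true while the current body part is an attachment
    private int depth;              // current nesting level in the multipart tree
    private final List<String> events = new ArrayList<>();

    void startMultipart() { depth++; }
    void endMultipart() { depth--; }

    // Every new body part starts with the flag cleared.
    void startBodyPart() { inAttachment = false; }

    // Set or reset the flag depending on the Content-Disposition value.
    void field(String name, String value) {
        if ("Content-Disposition".equalsIgnoreCase(name)) {
            inAttachment = value.trim().toLowerCase(Locale.ROOT).startsWith("attachment");
        }
    }

    // Record what kind of body was seen and at which depth.
    void body(String contentType) {
        events.add((inAttachment ? "attachment" : "inline") + "@" + depth + ":" + contentType);
    }

    List<String> events() { return events; }
}

final class AttachmentTrackerDemo {
    public static void main(String[] args) {
        AttachmentTracker tracker = new AttachmentTracker();
        tracker.startMultipart();
        tracker.startBodyPart();
        tracker.field("Content-Type", "text/plain");
        tracker.body("text/plain");
        tracker.startBodyPart();
        tracker.field("Content-Disposition", "attachment; filename=report.pdf");
        tracker.body("application/pdf");
        tracker.endMultipart();
        tracker.events().forEach(System.out::println);
    }
}
```

In a real handler the {{body}} callback would be the place to hand the 
attachment stream over to Tika, using the flag and depth to decide how to treat 
it.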

> Email attachments to an email are not extracted
> ---
>
> Key: TIKA-2680
> URL: https://issues.apache.org/jira/browse/TIKA-2680
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.18
>Reporter: Yury Kats
>Assignee: Tim Allison
>Priority: Major
> Attachments: main_email_in_outlook.jpg, nested.eml
>
>
> I have a number of email messages that contain other email messages as 
> attachments (with multiple levels of nesting).
> The email attachments are parts with "Content-Type: message/rfc822" but are 
> not being recognized as such.
> Attached is an example email, with the multiple levels of attachments:
>  * Subject: Test email within email
>  ** Subject: Email within email test
>  *** Subject: Stand-up today
>  
> I would like to see 3 separate emails parsed out (top level, 1st level 
> attached email, 2nd level attached email), but I only get 1 email and 1 
> unnamed text attachment:
> {noformat}
> $ java -jar tika-app-1.18.jar -m -J nested.eml | python -m json.tool
> [
> {
> "Author": "Smith Van der, H (Henry) ",
> "Content-Length": "16649",
> "Content-Type": "message/rfc822",
> "Creation-Date": "2018-04-25T12:46:41Z",
> "Message-From": "Smith Van der, H (Henry) ",
> "Message-To": [
> "fm.SAN Management Team ",
> "Smith Van der, H (Henry) "
> ],
> "Message:From-Email": "henry.van.der.sm...@bank.com",
> "Message:From-Name": "Smith Van der, H (Henry)",
> "Message:Raw-Header:Auto-Submitted": "auto-generated",
> "Message:Raw-Header:Content-Transfer-Encoding": "binary",
> "Message:Raw-Header:Keywords": "",
> "Message:Raw-Header:MIME-Version": "1.0",
> "Message:Raw-Header:Message-ID": 
> "",
> "Message:Raw-Header:Return-Path": "<>",
> "Message:Raw-Header:Sender": 
> "",
> "Message:Raw-Header:X-MS-Exchange-Generated-Message-Source": "Journal Agent",
> "Message:Raw-Header:X-MS-Exchange-Parent-Message-Id": 
> "<0fab98cd190c41f199a25c73f78a2...@bsts124002.eu.banknet.com>",
> "Message:Raw-Header:X-MS-Journal-Report": "",
> "Multipart-Boundary": "_728aa617-16cf-4d95-8bc2-9f1868397202_",
> "Multipart-Subtype": "mixed",
> "X-Parsed-By": [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.mail.RFC822Parser"
> ],
> "X-TIKA:parse_time_millis": "325",
> "creator": "Smith Van der, H (Henry) ",
> "dc:creator": "Smith Van der, H (Henry) ",
> "dc:title": "Test email within email",
> "dcterms:created": "2018-04-25T12:46:41Z",
> "meta:author": "Smith Van der, H (Henry) ",
> "meta:creation-date": "2018-04-25T12:46:41Z",
> "resourceName": "nested.eml",
> "subject": "Test email within email"
> },
> {
> "Content-Encoding": "US-ASCII",
> "Content-Type": "text/plain; charset=US-ASCII",
> "Multipart-Boundary": 
> "_004_8075737674787666767166806676697476787366657271727266777_",
> "Multipart-Subtype": "mixed",
> "X-Parsed-By": [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.txt.TXTParser"
> ],
> "X-TIKA:embedded_resource_path": "/embedded-1",
> "X-TIKA:parse_time_millis": "5",
> "embeddedResourceType": "ATTACHMENT"
> }
> ]
> {noformat}




