[jira] [Commented] (TIKA-2722) Don't call Date.toString (Possible issue with JDK 11)
[ https://issues.apache.org/jira/browse/TIKA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603979#comment-16603979 ]

Nick Burch commented on TIKA-2722:
----------------------------------

Currently, Tika stores all metadata internally as Strings. For typed properties, the getters and setters convert between the native types and the strings, so you can, e.g., get a {{Date}} back if you want one. (This also lets you get all metadata irrespective of type. Other storage approaches have been suggested, but none has won the argument to change just yet!)

For {{Date}} properties, there's a bunch of logic in Tika that takes care of the formatting, thread safety, etc. See {{org.apache.tika.utils.DateUtils.formatDate}} for the full details. That should all be going via {{String.format(Locale.ROOT, ...)}} to avoid any locale issues.

For PDFs specifically, for the well-known typed Date properties, we ought to be getting a {{Calendar}} back from PDFBox, then getting a {{Date}} object from that to set on the {{Metadata}} object, which then formats it internally, with no {{toString}} calls.

If you've found a case where that route isn't being followed, a small PDF and possibly a unit test to show it would be great, so we can fix that!

> Don't call Date.toString (Possible issue with JDK 11)
> -----------------------------------------------------
>
>                 Key: TIKA-2722
>                 URL: https://issues.apache.org/jira/browse/TIKA-2722
>             Project: Tika
>          Issue Type: Bug
>         Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".
>            Reporter: David Smiley
>            Priority: Major
>
> I'm troubleshooting [a test failure in Apache Lucene/Solr's|https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22799/]
> "extracting" contrib that occurs in JDK 11 with locale "ar-EG". JDK 8 & 9
> pass; I don't know about JDK 10. It has to do with extracting date metadata
> from a PDF, particularly the created date but perhaps others too.
>
> I stepped through the code into Tika and I think I've found where the
> troublesome code is. First note PDFParser line 271: {{addMetadata(metadata,
> "created", info.getCreationDate());}}. That addMetadata overload variant
> will call toString on a Date. IMO that's asking for trouble since the output
> of that is Locale-dependent. I think that's okay to show to a user but not
> for machine-to-machine information exchange. In the case of the test, it
> yielded this odd-looking date string:
>
> Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008
>
> I pasted that in and it looks consistent with what I see in IntelliJ and in
> Jenkins logs; hopefully it will post correctly to JIRA. The odd part is the
> hour & minutes relative to GMT. I won't be certain until after I click
> "Create".
>
> Perhaps this problem is also indicative of a JDK 11 bug? Nevertheless I
> think Tika should avoid calling Date.toString().

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
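To make the route described in the comment concrete, here is a minimal sketch of locale-independent date formatting via {{String.format(Locale.ROOT, ...)}}. It is not Tika's actual {{DateUtils.formatDate}} code; the class name {{RootLocaleDateFormat}}, the method {{formatUtc}} and the chosen output pattern are assumptions made for illustration, and the {{Calendar}} is built by hand rather than obtained from PDFBox.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Tika: formats a Date without Date.toString().
public class RootLocaleDateFormat {

    /** Formats the date as a UTC timestamp using only Locale.ROOT. */
    public static String formatUtc(Date date) {
        // A UTC calendar with the root locale, so no default-locale settings leak in.
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        cal.setTime(date);
        // Locale.ROOT makes %d emit ASCII digits regardless of the JVM default locale.
        return String.format(Locale.ROOT, "%04d-%02d-%02dT%02d:%02d:%02dZ",
                cal.get(Calendar.YEAR),
                cal.get(Calendar.MONTH) + 1, // Calendar months are zero-based
                cal.get(Calendar.DAY_OF_MONTH),
                cal.get(Calendar.HOUR_OF_DAY),
                cal.get(Calendar.MINUTE),
                cal.get(Calendar.SECOND));
    }

    public static void main(String[] args) {
        // Simulate the reporter's environment (assumption: default locale ar-EG).
        Locale.setDefault(Locale.forLanguageTag("ar-EG"));

        // Stand-in for the Calendar a PDF library would return for the creation date.
        Calendar created = new GregorianCalendar(TimeZone.getTimeZone("GMT+05:00"), Locale.ROOT);
        created.clear();
        created.set(2008, Calendar.NOVEMBER, 13, 18, 35, 51);

        System.out.println(formatUtc(created.getTime())); // prints 2008-11-13T13:35:51Z
    }
}
```

The point of the sketch is only that the string form never passes through {{Date.toString()}} or the default locale, so the stored metadata value should be the same on every JDK and locale.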
[jira] [Commented] (TIKA-2722) Don't call Date.toString (Possible issue with JDK 11)
[ https://issues.apache.org/jira/browse/TIKA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603861#comment-16603861 ] David Smiley commented on TIKA-2722: BTW I believe I found a JDK bug so I reported it, including a demonstration program. When I get the official/public bug ID, I will report back here with it.
[jira] [Created] (TIKA-2722) Don't call Date.toString (Possible issue with JDK 11)
David Smiley created TIKA-2722:
----------------------------------

             Summary: Don't call Date.toString (Possible issue with JDK 11)
                 Key: TIKA-2722
                 URL: https://issues.apache.org/jira/browse/TIKA-2722
             Project: Tika
          Issue Type: Bug
         Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".
            Reporter: David Smiley

I'm troubleshooting [a test failure in Apache Lucene/Solr's|https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22799/] "extracting" contrib that occurs in JDK 11 with locale "ar-EG". JDK 8 & 9 pass; I don't know about JDK 10. It has to do with extracting date metadata from a PDF, particularly the created date but perhaps others too.

I stepped through the code into Tika and I think I've found where the troublesome code is. First note PDFParser line 271: {{addMetadata(metadata, "created", info.getCreationDate());}}. That addMetadata overload variant will call toString on a Date. IMO that's asking for trouble since the output of that is Locale-dependent. I think that's okay to show to a user but not for machine-to-machine information exchange. In the case of the test, it yielded this odd-looking date string:

Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008

I pasted that in and it looks consistent with what I see in IntelliJ and in Jenkins logs; hopefully it will post correctly to JIRA. The odd part is the hour & minutes relative to GMT. I won't be certain until after I click "Create".

Perhaps this problem is also indicative of a JDK 11 bug? Nevertheless I think Tika should avoid calling Date.toString().
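As a rough illustration of the distinction drawn in the report (display text versus machine-to-machine exchange), the sketch below prints the same instant once via {{Date.toString()}} and once via the standard {{DateTimeFormatter.ISO_INSTANT}}. The class name {{MachineReadableDate}} and the helper {{toIsoInstant}} are made up for this example, and the "ar-EG" default locale is set only to mimic the reported environment; what {{Date.toString()}} prints there depends on the JDK, so no output is claimed for it.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical demo class, not taken from Tika or Solr.
public class MachineReadableDate {

    /** Converts a legacy Date to an ISO 8601 instant string for data exchange. */
    public static String toIsoInstant(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    public static void main(String[] args) {
        // Assumption: mimic the environment from the report.
        Locale.setDefault(Locale.forLanguageTag("ar-EG"));

        // Same instant as the example string in the report, expressed in UTC.
        Date created = Date.from(Instant.parse("2008-11-13T13:35:51Z"));

        // Human-oriented form; the exact text varies by JDK, time zone and locale.
        System.out.println("toString():  " + created);

        // Machine-oriented form; stable across locales.
        System.out.println("ISO instant: " + toIsoInstant(created)); // ISO instant: 2008-11-13T13:35:51Z
    }
}
```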
[jira] [Commented] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers
[ https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603420#comment-16603420 ]

Hudson commented on TIKA-2721:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #79 (See [https://builds.apache.org/job/tika-branch-1x/79/])
TIKA-2721: removed spring-* from tika-parsers deps (grossws: [https://github.com/apache/tika/commit/0dbf67dccf4717c12843c8de94cee50f2972be16])
* (edit) tika-parsers/pom.xml

> Exclude Spring (transitive dependency) from tika-parsers
> --------------------------------------------------------
>
>                 Key: TIKA-2721
>                 URL: https://issues.apache.org/jira/browse/TIKA-2721
>             Project: Tika
>          Issue Type: Bug
>          Components: packaging
>            Reporter: Konstantin Gribov
>            Assignee: Konstantin Gribov
>            Priority: Minor
>             Fix For: 2.0, 1.19
>
> {{uimafit-core}} brings in {{spring-core}}, {{spring-beans}} and
> {{spring-context}} at the quite ancient version 3.2.x. They are not required
> for parsing, and they usually clash with the actual Spring libs or just
> pollute the jar when an uberjar (shade plugin, onejar, assembly plugin with
> jar-with-dependencies, etc.) is built.
>
> Excluding them from the deps seems more or less safe to me, but formally it
> can be seen as a breaking change if someone depends on tika-parsers providing
> the Spring libs transitively.
[jira] [Commented] (TIKA-2716) Sonatype Nexus auditor is reporting that spring framework vesrion used by Tika 1.18 is vulnerable
[ https://issues.apache.org/jira/browse/TIKA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603335#comment-16603335 ]

Konstantin Gribov commented on TIKA-2716:
-----------------------------------------

Won't Fix because {{spring-*}} is excluded from the dependency tree now (see TIKA-2721).

> Sonatype Nexus auditor is reporting that spring framework vesrion used by Tika 1.18 is vulnerable
> -------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2716
>                 URL: https://issues.apache.org/jira/browse/TIKA-2716
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.18
>            Reporter: Abhijit Rajwade
>            Assignee: Konstantin Gribov
>            Priority: Major
>             Fix For: 2.0, 1.19
>
> Sonatype Nexus auditor is reporting that the Spring Framework version used by
> Apache Tika 1.18 is vulnerable. The recommendation is to upgrade to a
> non-vulnerable version of Spring Framework: 4.3.15 or later, or 5.0.5 or later.
>
> Refer to the following details.
>
> Issue
> [CVE-2018-1270|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-1270]
>
> Source
> National Vulnerability Database
>
> Severity
> CVE CVSS 3.0: 9.8
> CVE CVSS 2.0: 7.5
> Sonatype CVSS 3.0: 9.8
>
> Weakness
> CVE CWE: [358|https://cwe.mitre.org/data/definitions/358.html]
>
> Description from CVE
> Spring Framework, versions 5.0 prior to 5.0.5 and versions 4.3 prior to
> 4.3.15 and older unsupported versions, allow applications to expose STOMP
> over WebSocket endpoints with a simple, in-memory STOMP broker through the
> spring-messaging module. A malicious user (or attacker) can craft a message
> to the broker that can lead to a remote code execution attack.
>
> Explanation
> The Spring Framework {{spring-messaging}} module is vulnerable to Remote Code
> Execution (RCE). The {{getMethods()}} method in the
> {{ReflectiveMethodResolver}} class, the {{canWrite}} method in the
> {{ReflectivePropertyAccessor}} class, and the {{filterSubscriptions()}}
> method in the {{DefaultSubscriptionRegistry}} class do not properly restrict
> SpEL expression evaluation. A remote attacker can exploit this vulnerability
> by crafting a request to an exposed STOMP endpoint and injecting a malicious
> payload into the {{selector}} header. The application would then execute the
> payload via a call to {{expression.getValue()}} whenever a new message is
> sent to the broker.
>
> Detection
> The application is vulnerable by using this component.
>
> Recommendation
> We recommend upgrading to a version of this component that is not vulnerable
> to this specific issue.
>
> Categories
> Data
>
> Root Cause
> tika-app-1.18.jar *<=* ReflectivePropertyAccessor.class : [3.0.0.RELEASE , 4.3.15.RELEASE)
> tika-app-1.18.jar *<=* ReflectiveMethodResolver.class : [3.0.0.RELEASE , 4.3.15.RELEASE)
>
> Advisories
> Attack: [http://www.polaris-lab.com/index.php/archives/501/]
> Attack: [https://chybeta.github.io/2018/04/07/spring-messaging-Remote...|https://chybeta.github.io/2018/04/07/spring-messaging-Remote-Code-Execution-%E5%88%86%E6%9E%90-%E3%80%90CVE-2018-1270%E3%80%91/]
> Project: [https://jira.spring.io/browse/SPR-16588]
[jira] [Closed] (TIKA-2716) Sonatype Nexus auditor is reporting that spring framework vesrion used by Tika 1.18 is vulnerable
[ https://issues.apache.org/jira/browse/TIKA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov closed TIKA-2716. --- Resolution: Won't Fix Assignee: Konstantin Gribov Fix Version/s: 1.19 2.0
[jira] [Resolved] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers
[ https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov resolved TIKA-2721. - Resolution: Fixed
[jira] [Commented] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers
[ https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603326#comment-16603326 ] Hudson commented on TIKA-2721: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1552 (See [https://builds.apache.org/job/Tika-trunk/1552/]) TIKA-2721: removed spring-* from tika-parsers deps (grossws: [https://github.com/apache/tika/commit/b6c4f8e2617840075d546433f461b7df566e401a]) * (edit) tika-parsers/pom.xml
[jira] [Commented] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers
[ https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603300#comment-16603300 ] Hudson commented on TIKA-2721: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #305 (See [https://builds.apache.org/job/tika-2.x-windows/305/]) TIKA-2721: removed spring-* from tika-parsers deps (grossws: rev b6c4f8e2617840075d546433f461b7df566e401a) * (edit) tika-parsers/pom.xml
[jira] [Commented] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers
[ https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603280#comment-16603280 ] Tim Allison commented on TIKA-2721: --- +1
[jira] [Commented] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers
[ https://issues.apache.org/jira/browse/TIKA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603230#comment-16603230 ] Konstantin Gribov commented on TIKA-2721: - All unit & integration tests passed after excluding {{spring-*}} from {{uimafit-core}}.
[jira] [Created] (TIKA-2721) Exclude Spring (transitive dependency) from tika-parsers
Konstantin Gribov created TIKA-2721:
---------------------------------------

             Summary: Exclude Spring (transitive dependency) from tika-parsers
                 Key: TIKA-2721
                 URL: https://issues.apache.org/jira/browse/TIKA-2721
             Project: Tika
          Issue Type: Bug
          Components: packaging
            Reporter: Konstantin Gribov
            Assignee: Konstantin Gribov
             Fix For: 2.0, 1.19

{{uimafit-core}} brings in {{spring-core}}, {{spring-beans}} and {{spring-context}} at the quite ancient version 3.2.x. They are not required for parsing, and they usually clash with the actual Spring libs or just pollute the jar when an uberjar (shade plugin, onejar, assembly plugin with jar-with-dependencies, etc.) is built.

Excluding them from the deps seems more or less safe to me, but formally it can be seen as a breaking change if someone depends on tika-parsers providing the Spring libs transitively.
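For anyone unsure whether their application relied on tika-parsers pulling in Spring transitively, a quick runtime probe like the one below may help. This is only a sketch: the class name {{SpringClasspathCheck}} and the method {{isClassPresent}} are invented for illustration, and the three probed class names are simply well-known classes from {{spring-core}}, {{spring-beans}} and {{spring-context}}, not anything Tika itself references.

```java
// Hypothetical diagnostic, not part of Tika: reports whether Spring classes
// are still visible on the classpath after the exclusion.
public class SpringClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    public static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, SpringClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] probes = {
            "org.springframework.core.SpringVersion",         // spring-core
            "org.springframework.beans.BeanUtils",            // spring-beans
            "org.springframework.context.ApplicationContext"  // spring-context
        };
        for (String probe : probes) {
            System.out.println(probe + " present: " + isClassPresent(probe));
        }
    }
}
```

If any probe reports {{true}} in a build where Spring is expected to come from elsewhere, that dependency is being supplied by the application itself (or another library) rather than by tika-parsers.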