[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234714#comment-17234714 ]

Tim Allison commented on SOLR-14973:
------------------------------------

Backporting and confirming that I didn't break anything takes a day of intermittent work. If there are plans to do another 8.6.x release, I'll do it. Otherwise, onwards...

Thank you [~krisden]!

> Solr 8.6 is shipping libraries that are incompatible with each other
> --------------------------------------------------------------------
>
>                 Key: SOLR-14973
>                 URL: https://issues.apache.org/jira/browse/SOLR-14973
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 8.6
>            Reporter: Samir Huremovic
>            Priority: Major
>              Labels: tika-parsers
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}}
> (see https://issues.apache.org/jira/browse/TIKA-3047)
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following
> exception gets thrown by trying to post an {{.xls}} file in the techproducts
> example:
> {{java.lang.NoSuchMethodError:
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233697#comment-17233697 ]

Tim Allison commented on SOLR-14973:
------------------------------------

Y. I believe that it is fixed in 8.7.0, too. It looks like the Tika versions were upgraded with SOLR-14367, but none of its dependencies. My fault was in not reviewing the commits back then. Sorry. https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e4b3fae7 https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=70d084c Should we backport SOLR-14439 to 8.6.x?
[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228776#comment-17228776 ]

Tim Allison commented on SOLR-14973:
------------------------------------

Thank you [~krisden] for the ping.
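The mismatch reported in SOLR-14973 ({{tika-parsers}} 1.24 shipped next to {{poi}} 4.1.1, where 4.1.2 is needed) could in principle be caught before runtime by comparing jar file names. Below is a minimal sketch of that idea, not anything Solr or Tika actually ships. It assumes jars follow the usual `<artifact>-<version>.jar` naming; the `REQUIRED` table contains only the one pairing stated in the issue, and the function names and the directory path are hypothetical.

```python
import re
from pathlib import Path

# Pairing taken from the issue report: tika-parsers 1.24 needs poi 4.1.2.
# The table is maintained by hand here; it is not derived from published metadata.
REQUIRED = {("tika-parsers", "1.24"): {"poi": "4.1.2"}}

# Matches names such as "poi-4.1.1.jar" or "tika-parsers-1.24.jar".
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def jar_versions(names):
    """Map artifact name -> version for file names shaped like artifact-1.2.3.jar."""
    found = {}
    for name in names:
        match = JAR_NAME.match(name)
        if match:
            found[match.group("artifact")] = match.group("version")
    return found


def find_mismatches(names):
    """Return (dependency, shipped_version, required_version) for each violated pairing."""
    shipped = jar_versions(names)
    problems = []
    for (artifact, version), needs in REQUIRED.items():
        if shipped.get(artifact) != version:
            continue  # this rule does not apply to the jars we were given
        for dep, wanted in needs.items():
            if shipped.get(dep) != wanted:
                problems.append((dep, shipped.get(dep), wanted))
    return problems


if __name__ == "__main__":
    # Hypothetical location; point this at the directory holding the extraction jars.
    lib_dir = Path("contrib/extraction/lib")
    names = [p.name for p in lib_dir.glob("*.jar")] if lib_dir.is_dir() else []
    for dep, shipped, wanted in find_mismatches(names):
        print(f"{dep}: shipped {shipped}, required {wanted}")
```

A file-name check like this only flags pairings someone has already written down; it would not replace running the extraction tests against real {{.xls}} files.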
[jira] [Updated] (SOLR-14439) Upgrade to Tika 1.24.1
[ https://issues.apache.org/jira/browse/SOLR-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated SOLR-14439:
-------------------------------
    Fix Version/s: 8.7
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Upgrade to Tika 1.24.1
> ----------------------
>
>                 Key: SOLR-14439
>                 URL: https://issues.apache.org/jira/browse/SOLR-14439
>             Project: Solr
>          Issue Type: Task
>          Components: contrib - DataImportHandler
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 8.7
>
>         Attachments: SOLR-14339.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We recently released 1.24.1 with several fixes for DoS vulnerabilities we
> found via fuzzing: CVE-2020-9489 https://seclists.org/oss-sec/2020/q2/69
[jira] [Commented] (SOLR-14439) Upgrade to Tika 1.24.1
[ https://issues.apache.org/jira/browse/SOLR-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193732#comment-17193732 ]

Tim Allison commented on SOLR-14439:
------------------------------------

I'll merge the PR tomorrow (Friday ET) against {{branch_8x}} if there aren't any objections.
[jira] [Commented] (SOLR-14439) Upgrade to Tika 1.24.1
[ https://issues.apache.org/jira/browse/SOLR-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193151#comment-17193151 ]

Tim Allison commented on SOLR-14439:
------------------------------------

Thank you [~erickerickson]!
[jira] [Commented] (SOLR-14439) Upgrade to Tika 1.24.1
[ https://issues.apache.org/jira/browse/SOLR-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193095#comment-17193095 ]

Tim Allison commented on SOLR-14439:
------------------------------------

I'm currently working on this against the {{branch_8x}}. I'll open a PR once I get a clean local build and local regression tests are favorable. Should I backport to {{branch_8_6}} or is this too big of a change for that branch?
[jira] [Updated] (SOLR-14439) Upgrade to Tika 1.24.1
[ https://issues.apache.org/jira/browse/SOLR-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated SOLR-14439:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190716#comment-17190716 ]

Tim Allison commented on SOLR-13973:
------------------------------------

So that'd be SOLR-7632 as [~erickerickson] pointed out?

> Deprecate Tika
> --------------
>
>                 Key: SOLR-13973
>                 URL: https://issues.apache.org/jira/browse/SOLR-13973
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 8.7
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability.
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these,
> it should be possible to bring them into third-party packages and install
> them via the package manager.
> The plan is just to throw warnings in logs and add deprecation notes in the
> reference guide for now. Removal can be done in 9.0.
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190711#comment-17190711 ]

Tim Allison commented on SOLR-13973:
------------------------------------

[~mkalkbrenner] I've been thinking about adding an "indexer" endpoint to Tika. You'd configure your Solr/ES connection info and error handling choices via json at startup and then send the bytes to tika-server's /indexer endpoint. It would parse the file and forward the result to Solr. Would that simplify anything? I'm thoroughly on board with "don't break the user experience", but we've got to get Tika out of Solr's jvm.
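The "/indexer" endpoint above is a proposal, so the following is only a rough client-side sketch of the same two steps (parse in tika-server, then forward to Solr), written with the Python standard library. It assumes tika-server's `/tika` resource (PUT, with `Accept: text/plain`) and Solr's `/update/json/docs` handler; the base URLs, the collection name, the `content_txt` field, and all function names are assumptions made for illustration.

```python
import json
import urllib.request


def build_tika_request(tika_url, payload):
    """PUT raw file bytes to tika-server's /tika resource and ask for plain text back."""
    return urllib.request.Request(
        tika_url.rstrip("/") + "/tika",
        data=payload,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def build_solr_request(solr_url, collection, doc_id, text):
    """POST one JSON document to Solr's /update/json/docs handler."""
    # "content_txt" is an assumed field name; use whatever the target schema defines.
    body = json.dumps({"id": doc_id, "content_txt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{solr_url.rstrip('/')}/{collection}/update/json/docs?commit=true",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def forward(path, doc_id, tika_url, solr_url, collection):
    """Parse a file in tika-server, then index the extracted text in Solr."""
    with open(path, "rb") as handle:
        payload = handle.read()
    # Parsing happens in the tika-server process, outside Solr's JVM.
    with urllib.request.urlopen(build_tika_request(tika_url, payload)) as resp:
        text = resp.read().decode("utf-8")
    with urllib.request.urlopen(
        build_solr_request(solr_url, collection, doc_id, text)
    ) as resp:
        return resp.status
```

A call might look like `forward("report.xls", "report-1", "http://localhost:9998", "http://localhost:8983/solr", "techproducts")`, with both servers already running. Error handling, batching, and metadata fields are left out of the sketch.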
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158723#comment-17158723 ]

Tim Allison commented on SOLR-13973:
------------------------------------

For ease of use with SolrJ and several other use cases(?), we could add a tika-client in the Tika project?
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158719#comment-17158719 ]

Tim Allison commented on SOLR-13973:
------------------------------------

I cannot express the joy that will come to me, whether I'm the one to do it or not, to take out the kitchensink of dependencies that Tika has forced on Solr.
[jira] [Comment Edited] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158719#comment-17158719 ]

Tim Allison edited comment on SOLR-13973 at 7/15/20, 9:35 PM:
-------------------------------------------------------------

I cannot express the joy that will come to me, whether I'm the one to do it or not, to take out the kitchensink of dependencies that Tika has forced on Solr. If we do want a forwarding option within tika-server, please chime in on TIKA-3093. Otherwise, please let me know how I can help. I suspect [~epugh] has a better sense of how to get started, and I stand by to help him.

was (Author: talli...@mitre.org):
I cannot express the joy that will come to me, whether I'm the one to do it or not, to take out the kitchensink of dependencies that Tika has forced on Solr.
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158678#comment-17158678 ]

Tim Allison commented on SOLR-13973:
------------------------------------

I think [~epugh] just volunteered for this!
[jira] [Comment Edited] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158639#comment-17158639 ]

Tim Allison edited comment on SOLR-13973 at 7/15/20, 7:15 PM:
-------------------------------------------------------------

I've been toying with adding a forwarding capability to tika-server (TIKA-3093). So, if you curl a document to tika-server {{/tika2solr}}, we'd use our tika parsing stuff in tika-server and forward the extracted text to Solr. This would keep the dangerous part (tika parsing a document) out of the client code.

was (Author: talli...@mitre.org):
I've been toying with adding a forwarding capability to tika-server. So, if you curl a document to tika-server {{/tika2solr}}, we'd use our tika parsing stuff in tika-server and forward the extracted text to Solr. This would keep the dangerous part (tika parsing a document) out of the client code.
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158639#comment-17158639 ]

Tim Allison commented on SOLR-13973:
------------------------------------

I've been toying with adding a forwarding capability to tika-server. So, if you curl a document to tika-server {{/tika2solr}}, we'd use our tika parsing stuff in tika-server and forward the extracted text to Solr. This would keep the dangerous part (tika parsing a document) out of the client code.
[jira] [Comment Edited] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158629#comment-17158629 ]

Tim Allison edited comment on SOLR-13973 at 7/15/20, 7:06 PM:
-------------------------------------------------------------

[~ichattopadhyaya], thank you for the ping! Y, I might be able to find some time to work on this over the next few weeks. How do I start? Do I have the freedom to start from greenfields (use tika-server), or do we need seamless migration with the same capabilities?

was (Author: talli...@mitre.org):
Y, I might be able to find some time to work on this over the next few weeks. How do I start? Do I have the freedom to start from greenfields (use tika-server), or do we need seamless migration with the same capabilities?
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158629#comment-17158629 ]

Tim Allison commented on SOLR-13973:
------------------------------------

Y, I might be able to find some time to work on this over the next few weeks. How do I start? Do I have the freedom to start from greenfields (use tika-server), or do we need seamless migration with the same capabilities?
[jira] [Created] (SOLR-14439) Upgrade to Tika 1.24.1
Tim Allison created SOLR-14439:
-------------------------------

             Summary: Upgrade to Tika 1.24.1
                 Key: SOLR-14439
                 URL: https://issues.apache.org/jira/browse/SOLR-14439
             Project: Solr
          Issue Type: Task
      Security Level: Public (Default Security Level. Issues are Public)
          Components: contrib - DataImportHandler
            Reporter: Tim Allison
            Assignee: Tim Allison

We recently released 1.24.1 with several fixes for DoS vulnerabilities we found via fuzzing: CVE-2020-9489 https://seclists.org/oss-sec/2020/q2/69
[jira] [Commented] (SOLR-14367) Upgrade Tika to 1.24
[ https://issues.apache.org/jira/browse/SOLR-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068672#comment-17068672 ]

Tim Allison commented on SOLR-14367:
------------------------------------

Ha...ok, our posts passed in the ether. I'll stand down. I'm more than happy to take this, though. Let me know if you have luck.

> Upgrade Tika to 1.24
> --------------------
>
>                 Key: SOLR-14367
>                 URL: https://issues.apache.org/jira/browse/SOLR-14367
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: 8.5
>            Reporter: mibo
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Upgrade Apache Tika to the newly released 1.24 to handle
> [CVE-2020-1950|https://nvd.nist.gov/vuln/detail/CVE-2020-1950].
> Created [PR #1383|https://github.com/apache/lucene-solr/pull/1383] but
> afterwards I found https://issues.apache.org/jira/browse/SOLR-14054 and it
> looks like an update is much more complicated.
> If someone supports me, I will update my contribution.
[jira] [Commented] (SOLR-14367) Upgrade Tika to 1.24
[ https://issues.apache.org/jira/browse/SOLR-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068666#comment-17068666 ]

Tim Allison commented on SOLR-14367:
------------------------------------

Y, we upgraded a bunch of dependencies and had to do some awful forking for metadata-extractor. I'll take this. [~mirbo], I strongly, strongly encourage you and everyone to avoid using the Tika integration with Solr. https://cwiki.apache.org/confluence/display/TIKA/UpgradingTikaInSolr
[jira] [Commented] (SOLR-14367) Upgrade Tika to 1.24
[ https://issues.apache.org/jira/browse/SOLR-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068660#comment-17068660 ]

Tim Allison commented on SOLR-14367:
------------------------------------

I'll take a look. That was a one-off problem, but upgrading is always a nightmare, and I look forward to getting Tika out of Solr asap. Here's my idiot's guide: https://cwiki.apache.org/confluence/display/TIKA/UpgradingTikaInSolr
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057147#comment-17057147 ]

Tim Allison commented on SOLR-14054:
------------------------------------

Thank you! I realize it is trivial for you. Onward!

> Upgrade Tika to 1.23
> --------------------
>
>                 Key: SOLR-14054
>                 URL: https://issues.apache.org/jira/browse/SOLR-14054
>             Project: Solr
>          Issue Type: Task
>          Components: contrib - DataImportHandler
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 8.5
>
>         Attachments: test-documents.7z,
> tika-integration-example-9.0.0-SNAPSHOT.tgz
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We just released 1.23. Let's upgrade Tika.
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057143#comment-17057143 ]

Tim Allison commented on SOLR-14054:
------------------------------------

Y. I think that'd be best for Solr 8.x. The problem disappears in master with Java > 8. Would you be willing to take that, or should I give it a spin?
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056413#comment-17056413 ]

Tim Allison commented on SOLR-14054:
------------------------------------

Would something like this be acceptable? https://stackoverflow.com/a/24497206
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056403#comment-17056403 ]

Tim Allison edited comment on SOLR-14054 at 3/10/20, 8:54 PM:
-------------------------------------------------------------

We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x when building with Java > 8? Is there an ant/ivy version of maven's profiles, activated by Java > 8, e.g.: https://github.com/apache/pdfbox/blob/trunk/parent/pom.xml#L176 ?

was (Author: talli...@mitre.org):
We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x when building with Java > 8?
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056403#comment-17056403 ] Tim Allison edited comment on SOLR-14054 at 3/10/20, 8:45 PM: -- We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x when building with Java > 8? was (Author: talli...@mitre.org): We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x? > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056403#comment-17056403 ] Tim Allison commented on SOLR-14054: We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x? > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14318) Missing dependency on commons-lang in solr-cell 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056344#comment-17056344 ] Tim Allison commented on SOLR-14318: Yes. Confirmed: we removed commons-lang from Tika in 1.23, so this should be fixed in Solr 8.5, which upgrades Tika to 1.23. > Missing dependency on commons-lang in solr-cell 8.4.1 > - > > Key: SOLR-14318 > URL: https://issues.apache.org/jira/browse/SOLR-14318 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.4.1 >Reporter: Markus Günther >Priority: Minor > > During a migration from Solr 7.x to Solr 8.4.1 we noticed that the > commons-lang:commons-lang:2.6 dependency has been removed, and thus, no > longer is part of org.apache.solr:solr-cell. solr-cell however comes bundled > with Apache Tika Parsers (org.apache.tika:tika-parsers) in version 1.19.1 > which - although it is not an explicit dependency - does require > commons-lang:commons-lang:2.6. > This raises an issue when trying to extract the content from Microsoft Access > database files using Tika. See the stacktrace below.
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
> java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
>   at com.healthmarketscience.jackcess.util.SimpleColumnMatcher.equals(SimpleColumnMatcher.java:74)
>   at com.healthmarketscience.jackcess.util.SimpleColumnMatcher.matches(SimpleColumnMatcher.java:46)
>   at com.healthmarketscience.jackcess.util.CaseInsensitiveColumnMatcher.matches(CaseInsensitiveColumnMatcher.java:49)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.currentRowMatchesImpl(CursorImpl.java:571)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRowImpl(CursorImpl.java:627)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRow(CursorImpl.java:517)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.findFirstRow(CursorImpl.java:494)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl$FallbackTableFinder.findRow(DatabaseImpl.java:2376)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl$TableFinder.findObjectId(DatabaseImpl.java:2176)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl.readSystemCatalog(DatabaseImpl.java:879)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl.(DatabaseImpl.java:534)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl.open(DatabaseImpl.java:401)
>   at com.healthmarketscience.jackcess.DatabaseBuilder.open(DatabaseBuilder.java:252)
>   at org.apache.tika.parser.microsoft.JackcessParser.parse(JackcessParser.java:94)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
>   at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
>   at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350)
>   at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056259#comment-17056259 ] Tim Allison commented on SOLR-14054: Looking... > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001255#comment-17001255 ] Tim Allison commented on SOLR-14054: Y > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14066) Deprecate DIH
[ https://issues.apache.org/jira/browse/SOLR-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1778#comment-1778 ] Tim Allison commented on SOLR-14066: If I'm not tracking this when it happens, please ping me on Tika stuff. I'm happy to chip in and thrilled to get Tika out of Solr. > Deprecate DIH > - > > Key: SOLR-14066 > URL: https://issues.apache.org/jira/browse/SOLR-14066 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Major > Attachments: image-2019-12-14-19-58-39-314.png > > Time Spent: 40m > Remaining Estimate: 0h > > DataImportHandler has outlived its utility. DIH doesn't need to remain inside > Solr anymore. Let us deprecate DIH in 8.4 (and remove it from the Solr distro > in 9x or 10x). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1772#comment-1772 ] Tim Allison edited comment on SOLR-14054 at 12/19/19 1:39 PM: -- I think this is resolved now: [https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.x-Solaris/463/] and the other test failures look unrelated. Please re-open if there are any (more) surprises. Thank you, [~krisden], [~hossman] and [~dweiss]! was (Author: talli...@mitre.org): I think this is resolved now: [|https://jenkins.thetaphi.de/view/Lucene-Solr/] [https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.4-Linux/] and Windows. Please re-open if there are any (more) surprises. Thank you, [~krisden], [~hossman] and [~dweiss]! > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved SOLR-14054. Resolution: Fixed I think this is resolved now: [|https://jenkins.thetaphi.de/view/Lucene-Solr/] [https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.4-Linux/] and Windows. Please re-open if there are any (more) surprises. Thank you, [~krisden], [~hossman] and [~dweiss]! > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14113) Add more file types to DIH's unit tests to ensure dependency coverage
Tim Allison created SOLR-14113: -- Summary: Add more file types to DIH's unit tests to ensure dependency coverage Key: SOLR-14113 URL: https://issues.apache.org/jira/browse/SOLR-14113 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Reporter: Tim Allison As part of SOLR-14054, [~dweiss] noted that the unit tests pass without the commons-csv dependency, which is, in fact, required if a csv file is sent to DIH. Let's add several more file types to the unit tests to include dependency coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
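One cheap form of the dependency coverage described in SOLR-14113 is to check that a marker class for each file type can be loaded at all. The sketch below is a hypothetical helper, not existing Solr test code. The mapping from file type to marker class is an assumption made for illustration: org.apache.commons.csv.CSVParser stands in for the commons-csv dependency, and org.apache.commons.lang.ObjectUtils is the class named in the SOLR-14318 stack trace for Access files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr's test suite.
final class DependencyCoverageCheck {

    /** Returns the file types whose marker class cannot be loaded from the classpath. */
    static List<String> missing(Map<String, String> markerClassByFileType) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = DependencyCoverageCheck.class.getClassLoader();
        for (Map.Entry<String, String> entry : markerClassByFileType.entrySet()) {
            try {
                // initialize=false: only check that the class can be found and linked,
                // without running its static initializers.
                Class.forName(entry.getValue(), false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed mapping, for illustration only.
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("csv", "org.apache.commons.csv.CSVParser");
        markers.put("mdb", "org.apache.commons.lang.ObjectUtils");
        System.out.println("File types with missing dependencies: " + missing(markers));
    }
}
```

A check like this only proves that one class per dependency is present. Sending a real sample file of each type through the parser, as the issue proposes, also exercises the actual execution paths.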
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999295#comment-16999295 ] Tim Allison commented on SOLR-14054: Got it. Thank you, [~dweiss]. I see that the Lucene benchmarks module also relies on xerces. Should I add a dependency on xml-apis there, too? Or, given that its unit tests pass, should we hope for the best? > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999254#comment-16999254 ] Tim Allison commented on SOLR-14054: [~dweiss], will do on a separate issue if that's ok. You can tell Tika to avoid loading the TextAndCSVParser and use the TXTParser instead via tika-config.xml. If you'd prefer this behavior either offline or in Solr, I can show you how to do that. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
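A sketch of what such a tika-config.xml might contain follows, embedded in a small Java program that only verifies the XML is well formed and reads back the excluded class. It does not load Tika itself. The element names (parser, parser-exclude) and the fully qualified class names are assumptions based on Tika 1.x conventions and should be checked against the Tika version in use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration; the config content below is an assumption, not taken from Solr.
final class TikaConfigSketch {

    // Exclude TextAndCSVParser from the default parser set and register TXTParser explicitly.
    static final String CONFIG =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<properties>\n"
        + "  <parsers>\n"
        + "    <parser class=\"org.apache.tika.parser.DefaultParser\">\n"
        + "      <parser-exclude class=\"org.apache.tika.parser.csv.TextAndCSVParser\"/>\n"
        + "    </parser>\n"
        + "    <parser class=\"org.apache.tika.parser.txt.TXTParser\"/>\n"
        + "  </parsers>\n"
        + "</properties>\n";

    /** Parses CONFIG with the JDK's XML parser and returns the first excluded parser class. */
    static String excludedParser() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(CONFIG.getBytes(StandardCharsets.UTF_8)));
            NodeList excludes = doc.getElementsByTagName("parser-exclude");
            return ((Element) excludes.item(0)).getAttribute("class");
        } catch (Exception e) {
            throw new IllegalStateException("CONFIG is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Excluded parser: " + excludedParser());
    }
}
```

With a config along these lines, csv files would be handled as plain text, so commons-csv would no longer be needed on that path.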
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999247#comment-16999247 ] Tim Allison commented on SOLR-14054: [~hossman] I can replicate this now. I should have caught this before the commit. I clearly tested with Java 11 when I thought I was testing with Java 8. This is my fault. The problem is solved if we add the xml-apis dependency, which xerces requires. It looks like the earlier version of xerces didn't happen to require xml-apis on the execution paths the unit tests were exercising. I can't explain why this isn't a problem with Java 11. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
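One way to investigate a difference like this between Java 8 and Java 11 is to ask the JVM where a given class comes from: a jar on the classpath, the JDK itself, or nowhere. The helper below is a hypothetical diagnostic, not Solr code. Which class name to pass in (for example, whichever class the failing test reports as missing) is left to the caller.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Solr.
final class ClassOrigin {

    /** Describes where the named class is loaded from, or reports that it cannot be found. */
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes loaded by the bootstrap loader have no code source.
                return "JDK (bootstrap)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

Running this with the same class name under both JVMs shows whether the newer JDK supplies the class itself while Java 8 needs it from a jar such as xml-apis.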
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999151#comment-16999151 ] Tim Allison commented on SOLR-14054: [~dweiss], you are right. I only found this issue when I ran all of Tika's unit test docs against the upgraded Solr. I think users would be surprised to get a ClassNotFoundException when they send a csv file to DIH. I can add unit tests for more file format coverage (including csv) or we can configure Tika to use only the TXTParser in Solr. Let me know your preference. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999034#comment-16999034 ] Tim Allison commented on SOLR-14054: [~hossman]...ugh. Worked locally. Will take a look. Sorry and thank you. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999033#comment-16999033 ] Tim Allison commented on SOLR-14054: [~dweiss], commons-csv is used in Tika's TextAndCSVParser, which is new since 1.19.1. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998583#comment-16998583 ] Tim Allison commented on SOLR-14054: Let me know if I botched anything. I _think_ we're good to go. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved SOLR-14054. Fix Version/s: 8.5 Resolution: Fixed Please reopen if I've broken anything. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997435#comment-16997435 ] Tim Allison commented on SOLR-14054: Thank you, [~krisden]!!! I'll take a look. Unrelated to commons-compress, I may have found the source of the PDFontType1 problem I was seeing: PDFBOX-4715. IIUC, we need to fix this in PDFBox and Tika so that we can safely build both w JDK > 8. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995989#comment-16995989 ] Tim Allison commented on SOLR-14054: Seeing weird reproducibility issues...ugh. Will pick up again on Monday. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995951#comment-16995951 ] Tim Allison commented on SOLR-14054: Thank you, Robert! If we can get confirmation that I'm not doing something stupid -- that this really is a problem -- I'll open a new ticket. I need to do some more investigation. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995800#comment-16995800 ] Tim Allison edited comment on SOLR-14054 at 12/13/19 6:20 PM: --
# clone master or my personal issue branch: https://github.com/tballison/lucene-solr/tree/jira/SOLR-14054 (tukaani issue happens in both).
# cd solr ... ant package
# unzip the shiny new Solr
# put the attached collection conf where it belongs
# start solr
# {{curl 'http://localhost:8983/solr/tika-integration-example/update/extract?literal.id=doc1=true' -F "myfile=@test-documents.7z"}}
was (Author: talli...@mitre.org):
# checkout https://github.com/tballison/lucene-solr/tree/jira/SOLR-14054
# cd solr ... ant package
# unzip the shiny new Solr
# put the attached collection conf where it belongs
# start solr
# {{curl 'http://localhost:8983/solr/tika-integration-example/update/extract?literal.id=doc1=true' -F "myfile=@test-documents.7z"}}
> Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995800#comment-16995800 ] Tim Allison commented on SOLR-14054:
# checkout https://github.com/tballison/lucene-solr/tree/jira/SOLR-14054
# cd solr ... ant package
# unzip the shiny new Solr
# put the attached collection conf where it belongs
# start solr
# {{curl 'http://localhost:8983/solr/tika-integration-example/update/extract?literal.id=doc1=true' -F "myfile=@test-documents.7z"}}
> Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated SOLR-14054: --- Attachment: tika-integration-example-9.0.0-SNAPSHOT.tgz > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated SOLR-14054: --- Attachment: test-documents.7z > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Attachments: test-documents.7z > > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995749#comment-16995749 ] Tim Allison commented on SOLR-14054: [~tilman]...please ignore...PDFBox issues appear to be spurious/user error. [~krisden] will send reproduction steps shortly. Thank you! > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995724#comment-16995724 ] Tim Allison commented on SOLR-14054: > but /contrib/extraction/lib might not be in the core classloader? Makes sense...I'm not able to replicate this problem in unit tests. Do you know if classloading works differently in unit tests? > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995683#comment-16995683 ] Tim Allison commented on SOLR-14054: I can replicate this reliably on Ubuntu 19.10, but I'm not seeing this issue on Mojave 10.14.6.
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995675#comment-16995675 ] Tim Allison commented on SOLR-14054: [~tilman], I'm still getting this issue with PDFBox 2.0.17 when packaged in Solr. Is this more likely to be a Solr issue or a PDFBox issue?
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995159#comment-16995159 ] Tim Allison edited comment on SOLR-14054 at 12/13/19 2:36 PM: -- I'm seeing similar behavior in Solr at least back to 8.3.1 and with other classes, e.g.: {noformat} java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.font.PDType1Font at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62) ~[?:?] at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146) ~[?:?] at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[?:?] at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391) ~[?:?] at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) ~[?:?] at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117) ~[?:?] at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172) ~[?:?] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?] at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?] 
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) ~[?:?] at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) ~[?:?] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) ~[?:?] at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) ~[?:?] at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) ~[?:?] {noformat} was (Author: talli...@mitre.org): I'm seeing similar behavior in Solr at least back to 8.3.1 but with different classes, e.g.: {noformat} java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.font.PDType1Font at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62) ~[?:?] at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146) ~[?:?] at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[?:?] at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391) ~[?:?] at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) ~[?:?] at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117) ~[?:?] at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172) ~[?:?] 
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?] at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?] at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) ~[?:?] at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) ~[?:?] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) ~[?:?] at
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995645#comment-16995645 ] Tim Allison commented on SOLR-14054: https://github.com/curationexperts/epigaea/issues/748 :P
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995634#comment-16995634 ] Tim Allison edited comment on SOLR-14054 at 12/13/19 1:57 PM: -- Interesting. Thank you! I'm reliably getting the PDType1Font and FontMapperImpl$DefaultFontProvider class loading issue back to 8.0.0. How has this not been reported?! was (Author: talli...@mitre.org): Interesting. Thank you! I'm reliably getting the PDType1Font class loading issue back to 8.0.0. How has this not been reported?! Will try different versions of Java.
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995159#comment-16995159 ] Tim Allison commented on SOLR-14054: I'm seeing similar behavior in Solr at least back to 8.3.1 but with different classes, e.g.: {noformat} java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.font.PDType1Font at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62) ~[?:?] at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146) ~[?:?] at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477) ~[?:?] at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[?:?] at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391) ~[?:?] at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) ~[?:?] at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) ~[?:?] at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117) ~[?:?] at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172) ~[?:?] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?] at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?] 
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) ~[?:?] at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) ~[?:?] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) ~[?:?] at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) ~[?:?] at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) ~[?:?] {noformat} > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
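Editorial sketch, not part of the original thread: the message "NoClassDefFoundError: Could not initialize class ..." in the trace above is what the JVM reports when a class was found but its static initializer already failed on an earlier access. The first access throws ExceptionInInitializerError, which carries the real cause. Later accesses only produce the shorter NoClassDefFoundError. The Fragile class below is a made-up stand-in and says nothing about why PDType1Font failed in this report.

```java
public class InitFailureDemo {

    // Hypothetical stand-in for a class whose static initializer throws.
    static class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("simulated failure in static initializer");
        }
    }

    /** Touches Fragile and returns the simple name of the Throwable that results. */
    static String touch() {
        try {
            return String.valueOf(Fragile.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First access: initialization runs and fails, wrapping the real cause.
        System.out.println(touch());
        // Any later access: the class is marked unusable and the cause is gone.
        System.out.println(touch());
    }
}
```

When a log shows only the "Could not initialize class" form, the informative exception is usually earlier in the same log, at the first attempt to use the class.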
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994975#comment-16994975 ] Tim Allison commented on SOLR-14054: I can reproduce this in master without this patch.
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994927#comment-16994927 ] Tim Allison edited comment on SOLR-14054 at 12/12/19 6:20 PM: -- The tests all pass, and I can get a successful build locally, however when I try a full integration test (package, unzip, deploy), I'm getting a NoClassDefFoundError: {noformat} Exception in thread "Thread-15" java.lang.NoClassDefFoundError: org/tukaani/xz/FilterOptions at org.apache.commons.compress.archivers.sevenz.Coders.(Coders.java:47) at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecoderStack(SevenZFile.java:1153) at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:1106) at org.apache.commons.compress.archivers.sevenz.SevenZFile.getNextEntry(SevenZFile.java:405) at org.apache.tika.parser.pkg.PackageParser$SevenZWrapper.getNextEntry(PackageParser.java:424) at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:285) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424) at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.ClassNotFoundException: org.tukaani.xz.FilterOptions {noformat} I can tell from the logs that the jars in contrib/extraction/lib are loading: {noformat} Added 44 libs to classloader, from paths: [/home/tim/work/solr-9.0.0-SNAPSHOT/contrib/extraction/lib, /home/tim/work/solr-9.0.0-SNAPSHOT/dist] {noformat} The xz.jar is where it belongs and it is the right version, and when I unzip that jar, the class is there. Any idea what might be going on? Code here: https://github.com/tballison/lucene-solr/tree/jira/SOLR-14054 was (Author: talli...@mitre.org): The tests all pass, and I can get a successful build locally, however when I try a full integration test (package, unzip, deploy), I'm getting a NoClassDefFoundError: {noformat} Exception in thread "Thread-15" java.lang.NoClassDefFoundError: org/tukaani/xz/FilterOptions at org.apache.commons.compress.archivers.sevenz.Coders.(Coders.java:47) at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecoderStack(SevenZFile.java:1153) at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:1106) at org.apache.commons.compress.archivers.sevenz.SevenZFile.getNextEntry(SevenZFile.java:405) at org.apache.tika.parser.pkg.PackageParser$SevenZWrapper.getNextEntry(PackageParser.java:424) at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:285) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165) at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) at
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994927#comment-16994927 ] Tim Allison commented on SOLR-14054: The tests all pass, and I can get a successful build locally. However, when I try a full integration test, I'm getting a NoClassDefFoundError:
{noformat}
Exception in thread "Thread-15" java.lang.NoClassDefFoundError: org/tukaani/xz/FilterOptions
at org.apache.commons.compress.archivers.sevenz.Coders.(Coders.java:47)
at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecoderStack(SevenZFile.java:1153)
at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:1106)
at org.apache.commons.compress.archivers.sevenz.SevenZFile.getNextEntry(SevenZFile.java:405)
at org.apache.tika.parser.pkg.PackageParser$SevenZWrapper.getNextEntry(PackageParser.java:424)
at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:285)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: org.tukaani.xz.FilterOptions
{noformat}
I can tell from the logs that the jars in contrib/extraction/lib are loading:
{noformat}
Added 44 libs to classloader, from paths: [/home/tim/work/solr-9.0.0-SNAPSHOT/contrib/extraction/lib, /home/tim/work/solr-9.0.0-SNAPSHOT/dist]
{noformat}
Any idea what might be going on?
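Editorial sketch, not part of the original thread: when a jar is present on disk but the JVM still reports ClassNotFoundException, one way to narrow it down is to ask a specific classloader whether it can see the class and, if so, where the class came from. The helper below is hypothetical (WhereIsClass and locate are made-up names) and uses only standard JDK calls. Which loader to pass in a running Solr instance is left open here.

```java
import java.security.CodeSource;

public class WhereIsClass {

    /**
     * Reports whether the given loader can resolve className and, if so,
     * the location it was loaded from and the loader that defined it.
     */
    static String locate(String className, ClassLoader loader) {
        try {
            // 'false' avoids running static initializers while probing.
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null)
                    ? "the runtime image (no code source)"
                    : String.valueOf(src.getLocation());
            return "found in " + where + " via " + c.getClassLoader();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not visible: " + e;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        // A JDK class is always visible.
        System.out.println(locate("java.lang.String", app));
        // The class from the trace above; visible only if xz is on this loader's path.
        System.out.println(locate("org.tukaani.xz.FilterOptions", app));
    }
}
```

If the class that fails to link (here, one from commons-compress) was defined by a loader that sits above the loader holding xz, the lookup can fail even though both jars exist in the distribution. That is one possible explanation under the standard delegation model, not a confirmed diagnosis of this report.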
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994711#comment-16994711 ] Tim Allison commented on SOLR-14054: This shouldn't be a problem, but I noticed that we can't bump guava to 28.1-jre: {noformat} java.lang.NoClassDefFoundError com/google/common/util/concurrent/internal/InternalFutureFailureAccess [junit4]>at __randomizedtesting.SeedInfo.seed([EC9FF1FD80627747:E1D4DE448383E382]:0) [junit4]>at com.google.common.cache.LocalCache$LoadingValueReference.(LocalCache.java:3472) [junit4]>at com.google.common.cache.LocalCache$LoadingValueReference.(LocalCache.java:3476) [junit4]>at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2134) [junit4]>at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045) [junit4]>at com.google.common.cache.LocalCache.get(LocalCache.java:3953) [junit4]>at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4873) [junit4]>at org.apache.solr.schema.AbstractSpatialFieldType.getStrategy(AbstractSpatialFieldType.java:430) [junit4]>at org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:236) [junit4]>at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:65) [junit4]>at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:171) [junit4]>at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:109) [junit4]>at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:969) [junit4]>at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:339) [junit4]>at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:286) [junit4]>at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:233) [junit4]>at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76) [junit4]>at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55) [junit4]>at org.apache.solr.update.processor.NestedUpdateProcessorFactory$NestedUpdateProcessor.processAdd(NestedUpdateProcessorFactory.java:79) [junit4]>at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55) [junit4]>at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:259) [junit4]>at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:489) [junit4]>at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339) [junit4]>at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50) [junit4]>at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339) [junit4]>at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225) [junit4]>at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) [junit4]>at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110) [junit4]>at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332) [junit4]>at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281) [junit4]>at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338) [junit4]>at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) [junit4]>at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:236) [junit4]>at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303) [junit4]>at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) [junit4]>at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196) [junit4]>at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:127) [junit4]>at
[jira] [Commented] (SOLR-14066) Deprecate DIH
[ https://issues.apache.org/jira/browse/SOLR-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994706#comment-16994706 ] Tim Allison commented on SOLR-14066: {quote}Then be “DIH is no longer part of Apache but thrives in its new home at github...” {quote} Is the notion that we'd break different components of DIH into different personal repos on github, like: [https://github.com/dadoonet/fscrawler] I worry about moving critical code to personal repos, even though it can be forked/maintained by others. And, by "critical", I appreciate and completely agree with Jan's point about how it should be "demo only", but is in fact used across the land in production. :( I'm very much in favor of moving Tika, at least, out of Solr...but to where? Smaller, less pressing question: does this mean green fields (start fresh) for https://issues.apache.org/jira/browse/SOLR-7632? In short, rather than implementing SOLR-7632, we should start a side project that uses tika-server as the default? > Deprecate DIH > - > > Key: SOLR-14066 > URL: https://issues.apache.org/jira/browse/SOLR-14066 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Reporter: Ishan Chattopadhyaya >Priority: Major > Fix For: 8.4 > > > DataImportHandler has outlived its utility. DIH doesn't need to remain inside > Solr anymore. Let us deprecate DIH in 8.4 (and remove it from the Solr distro > in 9x or 10x). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server
[ https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993649#comment-16993649 ] Tim Allison commented on SOLR-7632: --- All, what's the current thinking on this? Is there a model for plugins I can follow? How do we test with external dependencies (e.g. a running tika-server)? How far do we want to extricate Tika from Solr? The farther the better, IMHO. :D > Change the ExtractingRequestHandler to use Tika-Server > -- > > Key: SOLR-7632 > URL: https://issues.apache.org/jira/browse/SOLR-7632 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Reporter: Chris A. Mattmann >Priority: Major > Labels: gsoc2017, memex > > It's a pain to upgrade Tika's jars all the times when we release, and if Tika > fails it messes up the ExtractingRequestHandler (e.g., the document type > caused Tika to fail, etc). A more reliable way and also separated, and easier > to deploy version of the ExtractingRequestHandler would make a network call > to the Tika JAXRS server, and then call Tika on the Solr server side, get the > results and then index the information that way. I have a patch in the works > from the DARPA Memex project and I hope to post it soon. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14054) Upgrade Tika to 1.23
Tim Allison created SOLR-14054: -- Summary: Upgrade Tika to 1.23 Key: SOLR-14054 URL: https://issues.apache.org/jira/browse/SOLR-14054 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Components: contrib - DataImportHandler Reporter: Tim Allison Assignee: Tim Allison We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org