[jira] [Comment Edited] (SOLR-14768) Error 500 on PDF extraction: java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts

2020-09-09 Thread Markus Kalkbrenner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192711#comment-17192711
 ] 

Markus Kalkbrenner edited comment on SOLR-14768 at 9/9/20, 7:54 AM:


I just want to mention that I created this issue because it has been caught by 
automated tests ;)

[https://github.com/solariumphp/solarium/actions] and 
[https://github.com/mkalkbrenner/search_api_solr/actions]

Beside PDF extraction we test things like the Collections API, Streaming 
Expressions, Facets, JSON Facets, ...

Therefore we run Solr docker containers. Maybe we can add a test that includes 
a build of a docker container using the latest Solr dev version?

BTW both test suites allways use the latest versions of Solr 7 and 8. At the 
moment we had to break this rule and force Solr 8.5 instead of the latest 
version to get the tests to pass again to not interrupt the development of 
these frameworks.


was (Author: mkalkbrenner):
I just want to mention that I created this issue because it has been caught by 
automated tests ;)

[https://github.com/solariumphp/solarium/actions] and 
[https://github.com/mkalkbrenner/search_api_solr/actions]

Beside PDF extraction we test things like the Collections API, Streaming 
Expressions, Facets, JSON Facets, ...

Therefore we run Solr docker containers. Maybe we can add a test that includes 
a build of a docker container using the latest Solr dev version?

> Error 500 on PDF extraction: java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/server/MultiParts
> 
>
> Key: SOLR-14768
> URL: https://issues.apache.org/jira/browse/SOLR-14768
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6, 8.6.1
>Reporter: Markus Kalkbrenner
>Assignee: David Smiley
>Priority: Major
> Attachments: Solr v8.6.x fails with multipart MIME in commands.eml, 
> Solr v8.6.x fails with multipart MIME in commands.eml, Solr v8.6.x fails with 
> multipart MIME in commands.eml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See [https://www.mail-archive.com/solr-user@lucene.apache.org/msg152182.html]
> The integration tests of the solarium PHP library and the integration tests 
> of the Search API Solr Drupal module both fail on PDF extraction if executed 
> on Solr 8.6.
> They still work on Solr 8.5.1 an earlier versions.
> {quote}2020-08-20 12:30:35.279 INFO (qtp855700733-19) [ x:5f3e6ce2810ef] 
> o.a.s.u.p.LogUpdateProcessorFactory [5f3e6ce2810ef] webapp=/solr 
> path=/update/extract 
> params=\{json.nl=flat=0=false=testpdf.pdf=extract-test=true=false=attr_=json}{add=[extract-test
>  (1675547519474466816)],commit=} 0 957
> solr8_1 | 2020-08-20 12:30:35.280 WARN (qtp855700733-19) [ ] 
> o.e.j.s.HttpChannel /solr/5f3e6ce2810ef/update/extract => 
> java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
> solr8_1 | java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:443)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>  ~[?:?]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>  ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545) 
> ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) 
> ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590) 
> ~[jetty-security-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]

[jira] [Comment Edited] (SOLR-14768) Error 500 on PDF extraction: java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts

2020-09-08 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192450#comment-17192450
 ] 

Ishan Chattopadhyaya edited comment on SOLR-14768 at 9/8/20, 8:59 PM:
--

bq. That's good news, but looking from a distance it is not yet convincing. 

Please have a closer look.

{quote}I have volunteered my Linux based search service (PHP based) for this, 
found on [https://netlab1.net/] as the Solr/Lucene Search Service. That takes 
less than an hour to build up completely. As its command line crawler does the 
work of interest here it easily blends into the needed final assembly and test 
operation. 
{quote}
If you rely on Solr for this functionality and also believe your tool would 
take an hour to build up, why don't you set it up for your own automated 
testing? Alternately, feel free to open a separate JIRA/submit a patch there to 
help us do so.

bq. What would help us is using external agents ...

Please feel free to chime in in this dev list discussion.
https://lists.apache.org/thread.html/r27d742357f9e1342cce1c9aaa215a10d94f64b43b2e681f19fa8d150%40%3Cdev.lucene.apache.org%3E
 


was (Author: ichattopadhyaya):
bq. That's good news, but looking from a distance it is not yet convincing. 

Please have a closer look.

{quote}I have volunteered my Linux based search service (PHP based) for this, 
found on [https://netlab1.net/] as the Solr/Lucene Search Service. That takes 
less than an hour to build up completely. As its command line crawler does the 
work of interest here it easily blends into the needed final assembly and test 
operation. 
{quote}
If you rely on Solr for this functionality and also believe your tool would 
take an hour to build up, why don't you set it up for your own automated 
testing? Alternately, feel free to submit a patch to help us do so.

 

> Error 500 on PDF extraction: java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/server/MultiParts
> 
>
> Key: SOLR-14768
> URL: https://issues.apache.org/jira/browse/SOLR-14768
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6, 8.6.1
>Reporter: Markus Kalkbrenner
>Assignee: David Smiley
>Priority: Major
> Attachments: Solr v8.6.x fails with multipart MIME in commands.eml, 
> Solr v8.6.x fails with multipart MIME in commands.eml, Solr v8.6.x fails with 
> multipart MIME in commands.eml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See [https://www.mail-archive.com/solr-user@lucene.apache.org/msg152182.html]
> The integration tests of the solarium PHP library and the integration tests 
> of the Search API Solr Drupal module both fail on PDF extraction if executed 
> on Solr 8.6.
> They still work on Solr 8.5.1 an earlier versions.
> {quote}2020-08-20 12:30:35.279 INFO (qtp855700733-19) [ x:5f3e6ce2810ef] 
> o.a.s.u.p.LogUpdateProcessorFactory [5f3e6ce2810ef] webapp=/solr 
> path=/update/extract 
> params=\{json.nl=flat=0=false=testpdf.pdf=extract-test=true=false=attr_=json}{add=[extract-test
>  (1675547519474466816)],commit=} 0 957
> solr8_1 | 2020-08-20 12:30:35.280 WARN (qtp855700733-19) [ ] 
> o.e.j.s.HttpChannel /solr/5f3e6ce2810ef/update/extract => 
> java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
> solr8_1 | java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:443)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>  ~[?:?]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>  ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545) 
> ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) 
> ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590) 
> ~[jetty-security-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> 

[jira] [Comment Edited] (SOLR-14768) Error 500 on PDF extraction: java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts

2020-08-26 Thread Joe Doupnik (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185070#comment-17185070
 ] 

Joe Doupnik edited comment on SOLR-14768 at 8/26/20, 10:09 AM:
---

After some review of the source code of Solr 8.6.1, file 
SolrRequestParsers.java, and then asking Google about MultiParts, one finds 
that the function cleanupMultipartFiles (which is the locus of the present 
error) is coupled to functions (MultiPartInputStream etc) which are depreciated 
and even parts actively blocked. Also such a cleanup process itself is 
vulnerable to side effects 
([https://github.com/eclipse/jetty.project/issues/4383). 
|https://github.com/eclipse/jetty.project/issues/4383).**]and 
([https://github.com/eclipse/jetty.project/issues/4350|https://github.com/eclipse/jetty])
 . These are from Nov 2019. A comment in the first reference is useful to note:

"A simple fix was added in 9.4.26 to avoid the NPE 
([#4479|https://github.com/eclipse/jetty.project/pull/4479]).
 A more substantial fix using synchronization has been merged to jetty-10.0.x 
([#4498|https://github.com/eclipse/jetty.project/pull/4498]) which also ensures 
the parts are always cleaned up properly when parsing the multipart form 
asynchronously."

The second reference, 4350, is has topic title of "Depreciated 
MultiPartInputStreamParser still used in jetty-server (MultiPartsUtilParser) 
but OSGi ExportPackage suppressed."

Thus the present functions need to be replaced, and perhaps they are with jetty 
10. In any case, the material is broken in Solr 8.6.x, and any fixes need 
adequate testing.

Thanks,

Joe D.


was (Author: jdoupnik):
After some review of the source code of Solr 8.6.1, file 
SolrRequestParsers.java, and then asking Google about MultiParts, one finds 
that the function cleanupMultipartFiles (which is the locus of the present 
error) is coupled to functions (MultiPartInputStream etc) which are depreciated 
and even parts actively blocked. Also such a cleanup process itself is 
vulnerable to side effects 
([https://github.com/eclipse/jetty.project/issues/4383).**] A comment in that 
reference is useful to note:

"A simple fix was added in 9.4.26 to avoid the NPE 
([#4479|https://github.com/eclipse/jetty.project/pull/4479]).
 A more substantial fix using synchronization has been merged to jetty-10.0.x 
([#4498|https://github.com/eclipse/jetty.project/pull/4498]) which also ensures 
the parts are always cleaned up properly when parsing the multipart form 
asynchronously."

Thus the present functions need to be replaced, and perhaps they are with jetty 
10. In any case, the material is broken in Solr 8.6.x, and any fixes need 
adequate testing.

Thanks,

Joe D.

> Error 500 on PDF extraction: java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/server/MultiParts
> 
>
> Key: SOLR-14768
> URL: https://issues.apache.org/jira/browse/SOLR-14768
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6, 8.6.1
>Reporter: Markus Kalkbrenner
>Priority: Major
> Attachments: Solr v8.6.x fails with multipart MIME in commands.eml
>
>
> See [https://www.mail-archive.com/solr-user@lucene.apache.org/msg152182.html]
> The integration tests of the solarium PHP library and the integration tests 
> of the Search API Solr Drupal module both fail on PDF extraction if executed 
> on Solr 8.6.
> They still work on Solr 8.5.1 an earlier versions.
> {quote}2020-08-20 12:30:35.279 INFO (qtp855700733-19) [ x:5f3e6ce2810ef] 
> o.a.s.u.p.LogUpdateProcessorFactory [5f3e6ce2810ef] webapp=/solr 
> path=/update/extract 
> params=\{json.nl=flat=0=false=testpdf.pdf=extract-test=true=false=attr_=json}{add=[extract-test
>  (1675547519474466816)],commit=} 0 957
> solr8_1 | 2020-08-20 12:30:35.280 WARN (qtp855700733-19) [ ] 
> o.e.j.s.HttpChannel /solr/5f3e6ce2810ef/update/extract => 
> java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
> solr8_1 | java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:443)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>  ~[?:?]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>  ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 

[jira] [Comment Edited] (SOLR-14768) Error 500 on PDF extraction: java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts

2020-08-25 Thread Joe Doupnik (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183515#comment-17183515
 ] 

Joe Doupnik edited comment on SOLR-14768 at 8/25/20, 9:00 AM:
--

My observations were described with pictures added, thus to see them please 
refer to the user's list, 23 August. This include how to reproduce code 
snippets.

I have added my report (from solr-users) as an email attachment to the main 
item.

Joe D.


was (Author: jdoupnik):
My observations were described with pictures added, thus to see them please 
refer to the user's list, 23 August. This include how to reproduce code 
snippets.

Joe D.

> Error 500 on PDF extraction: java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/server/MultiParts
> 
>
> Key: SOLR-14768
> URL: https://issues.apache.org/jira/browse/SOLR-14768
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6, 8.6.1
>Reporter: Markus Kalkbrenner
>Priority: Major
> Attachments: Solr v8.6.x fails with multipart MIME in commands.eml
>
>
> See [https://www.mail-archive.com/solr-user@lucene.apache.org/msg152182.html]
> The integration tests of the solarium PHP library and the integration tests 
> of the Search API Solr Drupal module both fail on PDF extraction if executed 
> on Solr 8.6.
> They still work on Solr 8.5.1 an earlier versions.
> {quote}2020-08-20 12:30:35.279 INFO (qtp855700733-19) [ x:5f3e6ce2810ef] 
> o.a.s.u.p.LogUpdateProcessorFactory [5f3e6ce2810ef] webapp=/solr 
> path=/update/extract 
> params=\{json.nl=flat=0=false=testpdf.pdf=extract-test=true=false=attr_=json}{add=[extract-test
>  (1675547519474466816)],commit=} 0 957
> solr8_1 | 2020-08-20 12:30:35.280 WARN (qtp855700733-19) [ ] 
> o.e.j.s.HttpChannel /solr/5f3e6ce2810ef/update/extract => 
> java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
> solr8_1 | java.lang.NoClassDefFoundError: org/eclipse/jetty/server/MultiParts
> solr8_1 | at 
> org.apache.solr.servlet.SolrRequestParsers.cleanupMultipartFiles(SolrRequestParsers.java:624)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:443)
>  ~[?:?]
> solr8_1 | at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>  ~[?:?]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>  ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545) 
> ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) 
> ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590) 
> ~[jetty-security-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485) 
> ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
>  ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
> solr8_1 | at 
>