[
https://issues.apache.org/jira/browse/CONNECTORS-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057282#comment-16057282
]
Karl Wright commented on CONNECTORS-1434:
-----------------------------------------
It appears that HttpClient does not escape the form name or the body content
in any way. The file name appears as the title of the body content in the
multipart area, and it also appears, surrounded by double quotes, in the
content type of the response. The file name that gets passed in would
therefore have to be legal in both of those contexts.
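
One way to guarantee that the name is legal in both contexts is to sanitize it
before it is handed to FormBodyPart and InputStreamBody. The following is only
a minimal sketch and not the fix committed to ManifoldCF: the class
FileNameSanitizer and its sanitize() method are hypothetical names, and the
conservative whitelist (ASCII letters, digits, '.', '_' and '-') is an
assumption about what is safe, not something taken from HttpClient or Solr.

```java
/**
 * Hypothetical helper (not part of ManifoldCF or HttpClient): maps a file
 * name onto characters that are safe inside a double-quoted header parameter
 * and as a multipart form name.
 */
public final class FileNameSanitizer {

    private FileNameSanitizer() {
        // Static utility class; no instances.
    }

    /**
     * Returns the name with every character outside the assumed whitelist
     * [A-Za-z0-9._-] replaced by an underscore. A null name becomes "".
     */
    public static String sanitize(String name) {
        if (name == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean safe = (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '.' || c == '_' || c == '-';
            // Quotes, spaces, control characters and non-ASCII all become '_'.
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }
}
```

With this sketch, both the form name and the file name argument would be set
to FileNameSanitizer.sanitize(content.getName()). The mapping is lossy (two
different names can collapse to the same sanitized name), so it would only be
suitable if the name is not used to identify the document on the Solr side.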
> Bad characters in file name can cause Solr 500 errors
> -----------------------------------------------------
>
> Key: CONNECTORS-1434
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1434
> Project: ManifoldCF
> Issue Type: Bug
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 2.7
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.8
>
>
> There are reports that quotes or spaces in a file name can blow up the Solr
> indexing of the document and cause it to throw a 500 error.
> The code in question (from ModifiedHttpSolrClient) is the following:
> {code}
> String name = content.getName();
> if (name == null) {
>   name = "";
> }
> parts.add(new FormBodyPart(name,
>   new InputStreamBody(
>     content.getStream(),
>     contentType,
>     content.getName())));
> {code}
> ... where content.getName() would be returning a name with illegal
> characters. The question is: what does HttpClient do with this name, and
> should it be escaping it in some way?