[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685084#comment-13685084
 ] 

Karl Wright commented on HTTPCLIENT-1372:
-----------------------------------------

HttpMultipart.java has the following logic for writing the headers of each multipart form part:

{code}
MinimalField cd = part.getHeader().getField(MIME.CONTENT_DISPOSITION);
writeField(cd, this.charset, out);
String filename = part.getBody().getFilename();
if (filename != null) {
    MinimalField ct = part.getHeader().getField(MIME.CONTENT_TYPE);
    writeField(ct, this.charset, out);
}
{code}

This means that a Content-Type header is written *only* for form parts that 
have a filename set.  Unfortunately, that means that although COMPATIBLE mode 
lets me get the filename itself decoded correctly, the remaining form parts 
carry no Content-Type (and so no charset), and I lose the ability to get the 
rest of the form decoded correctly.  I'm not convinced this is intentional 
behavior, either.
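
To make the effect concrete, here is a minimal, self-contained sketch that 
mirrors the conditional above.  It does not use HttpMime: the class 
PartHeaderSketch, the nested Part class, and the method compatibleHeaders are 
hypothetical stand-ins invented for illustration, and the header strings are 
simplified.

```java
import java.util.ArrayList;
import java.util.List;

public class PartHeaderSketch {

    /** Hypothetical stand-in for a form part; this is not an HttpMime class. */
    static final class Part {
        final String name;
        final String filename;    // null for ordinary (non-file) form fields
        final String contentType;

        Part(String name, String filename, String contentType) {
            this.name = name;
            this.filename = filename;
            this.contentType = contentType;
        }
    }

    /**
     * Mirrors the quoted logic: Content-Disposition is always written,
     * Content-Type only when the part has a filename.
     */
    static List<String> compatibleHeaders(Part part) {
        List<String> headers = new ArrayList<>();
        StringBuilder cd = new StringBuilder("Content-Disposition: form-data; name=\"")
                .append(part.name).append('"');
        if (part.filename != null) {
            cd.append("; filename=\"").append(part.filename).append('"');
        }
        headers.add(cd.toString());
        if (part.filename != null) {
            // Only file parts reach this branch, so only they carry a charset.
            headers.add("Content-Type: " + part.contentType);
        }
        return headers;
    }

    public static void main(String[] args) {
        Part field = new Part("title", null, "text/plain; charset=UTF-8");
        Part file = new Part("upload", "report.txt", "text/plain; charset=UTF-8");
        System.out.println(compatibleHeaders(field)); // plain field: no Content-Type line
        System.out.println(compatibleHeaders(file));  // file part: both header lines
    }
}
```

In this sketch the plain field ends up with no Content-Type line at all, even 
though a charset was supplied for it, which is the situation described above.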

Since this code is in HttpMultipart, I cannot see any way of changing this 
behavior in 4.2.x other than by overriding all the public methods of this class 
in a new ModifiedMultiPartEntity class that basically does everything for form 
processing.  Oleg, do you see any simpler way?

                
> Content-Disposition header in form data does not adhere to RFC6266
> ------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1372
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1372
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpMime
>    Affects Versions: 4.2.5
>            Reporter: Karl Wright
>             Fix For: 4.3 Beta3
>
>
> The Content-Disposition header, as it appears for an item of form data, does 
> not allow for UTF-8 encoding as specified in RFC6266:
> http://tools.ietf.org/html/rfc6266
> This is causing ManifoldCF severe problems working in Japan with Solr, since 
> Solr content extraction relies on accurate filenames in order to determine 
> the likely document encoding.
> A fix for the 4.2.x branch will be needed, I am afraid.

