Re: RFC6265, cookie parsing and UTF-8

Mark Thomas Wed, 27 Aug 2014 02:37:45 -0700

On 26/08/2014 23:09, Rémy Maucherat wrote:
> 2014-08-26 21:53 GMT+02:00 Mark Thomas <ma...@apache.org>:
> 
>> One of the aims of the proposed cookie changes [1] was to deal with the
>> HTML 5 changes that mean UTF-8 can appear in cookie headers.
>>
>> This has some potentially large implications for Tomcat.
>>
>> Currently, Tomcat handles cookies as MessageBytes, processing everything
>> in bytes and only converting to String when necessary. This is largely
>> possible because of the assumption that everything is ASCII.
>>
>> Introduce UTF-8 and processing everything in bytes gets a whole lot
>> harder. You essentially have to decode the UTF-8 to ensure that you have
>> valid data - at which point why not just use Strings anyway?
>>
>> I am currently leaning towards removing a lot of the current cookie
>> header caching/recycling and doing something along the following lines:
>> - Lazy parsing as currently (but unless cookie based session tracking is
>>   disabled this is going to run on every request)
>> - Convert headers to UTF-8 strings
>> - Parse them with a new parser along the lines of o.a.t.u.http.parser
>> - Have that parser return an array of javax.servlet.http.Cookie objects
>> - Pass those to the app if/when requested
>>
>> In terms of handling RFC6265 and RFC2109 my plan is to have two parsers,
>> share as much code as possible and switch between them based on the
>> cookie header with the expectation that 99.9% of cookies will be parsed
>> by the RFC6265 parser. We could add some options to this switching to
>> enable other parsers (e.g. a Netscape parser) to be used.
>>
>> I'd also like to keep the current cookie parsing implementation for now.
>> Until we are happy with the new parsing, the current implementation will
>> be the default. Once we are happy with the new parsing we can change the
>> default. We can add an option to switch between the current and the new
>> parsing.
>>
>> Thoughts?
>>
> 
> As far as I am concerned, this could turn out badly.


I agree. I remember the last time I made changes to the cookie parsing
to improve spec compliance as a result of some security issues. It broke
a lot of stuff and the fallout lasted for months. I don't want to
repeat that.

> String manipulation is
> consistently the slowest thing overall other than IO, and rather often
> webapps use a massive amount of cookies [to the point they get errors
> because the HTTP header size is too small by default].

I agree the new code is going to have to keep a careful eye on performance.

> So the current processing should probably be the default [as proposed],
> then remain an option until it can be demonstrated this is not slower
> [which IMO is not possible, so it would have to remain].

The problem is that the current approach simply can't work for UTF-8
cookie values. I intend to start with some performance tests so we can
see what the difference really is. I'm expecting that we will need to
trade a little performance to be able to handle UTF-8. Whether or not
that trade is a reasonable one will depend on the performance figures. I
suggest we hold off on that debate until we have some hard numbers to
work with.
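
As a rough illustration of the validation point (a sketch only; the class
and method names here are hypothetical and not existing Tomcat code), strict
decoding with the JDK's CharsetDecoder could look something like this. It
assumes the raw header bytes are already available as a byte[] and treats
any malformed sequence as a failure rather than silently replacing it:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {

    // Hypothetical helper: decode raw cookie header bytes as UTF-8.
    // Returns null when the bytes are not well-formed UTF-8.
    static String decodeOrNull(byte[] headerBytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD,
        // which is what lets us detect invalid input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(headerBytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] valid = "name=R\u00e9my".getBytes(StandardCharsets.UTF_8);
        // Ends with the first byte of a two-byte sequence, so it is malformed.
        byte[] invalid = {'n', '=', (byte) 0xC3};
        System.out.println(decodeOrNull(valid));
        System.out.println(decodeOrNull(invalid));
    }
}
```

A new decoder is created per call here purely for simplicity; whether that
cost matters is exactly the sort of thing the performance tests would need
to show.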

Mark

