> On Nov. 20, 2014, 9:11 p.m., Ben Mahler wrote:
> > 3rdparty/libprocess/include/process/http.hpp, lines 415-416
> > <https://reviews.apache.org/r/28253/diff/1/?file=770208#file770208line415>
> >
> >     Here's what we'll have if we added join:
> >     
> >     parse: takes a __decoded__ query string, construct a key,value map of 
> > the parameters
> >     
> >     join: takes a key,value map of parameters, creates an __encoded__ query 
> > string.
> >     
> >     That's where I find this to be non-intuitive:
> >     
> >     ```
> >     join(parse(query)) == encode(query)
> >     ```
> >     
> >     It seems like only request encoding should deal with constructing an 
> > encoded query string, is that not the case? Would like to avoid exposing 
> > this if possible, and if not possible, would like to make this more 
> > intuitive.
> 
> Cody Maloney wrote:
>     So that behavior of parse is going to produce incorrect results I 
> believe. Suppose one of my parameters in my query string is a small blob of 
> JSON (Contains '=' and '&' potentially). If we encode/decode on the outside 
> of query string parsing/joining, we can't represent this (The & will look 
> like the start of a new parameter).
>     
>     ```
>     std::map<> query = {{"a%20","b&c=d!@#foobar"}}
>     
>     EXPECT_EQ(query, parse(join(query)));
>     ```
>     
>     Based on my reading of rfc3986 2.1 (snippet inline):
>     ```
>     A percent-encoding mechanism is used to represent a data octet in a
>     component when that octet's corresponding character is outside the
>     allowed set or is being used as a delimiter of, or within, the
>     component.
>     ```
>     
>     This should work, and is the reason why there is percent encoding. From 
> what I can tell grepping throuhg the source code nothing currently does the 
> decoding before we parse the query string, so making the decode live inside 
> of the parse function should work. Currently callers just end up with percent 
> encoded chunks in their query string, which causes badness (And could explain 
> an issue which a customer was having with files/browse.json and things not 
> showing up)
> 
> Ben Mahler wrote:
>     Hm.. so I did the grep and as expected the callers parse decoded strings.
>     
>     The original use of parse is in Decoder:
>     
>     ```
>       static int on_message_complete(http_parser* p)
>       {
>         DataDecoder* decoder = (DataDecoder*) p->data;
>     
>         // Parse the query key/values.
>         Try<std::string> decoded = http::decode(decoder->query);
>         if (decoded.isError()) {
>           return 1;
>         }
>         decoder->request->query = http::query::parse(decoded.get());
>         ...
>     ```
>     
>     Note that here we parse a decoded string, as expected. The other use 
> appears to be within the http code in the master that parses request 
> __bodies__:
>     
>     ```
>     Future<Response> Master::Http::shutdown(const Request& request)
>     {
>       if (request.method != "POST") {
>         return BadRequest("Expecting POST");
>       }
>     
>       // Parse the query string in the request body (since this is a POST)
>       // in order to determine the framework ID to shutdown.
>       hashmap<string, string> values =
>         process::http::query::parse(request.body);
>     ```
>     
>     Bodies are not expected to be percent encoded, IIUC.
>     
>     Anything I missed...?
> 
> Cody Maloney wrote:
>     So we do, not sure how I missed that...
>     
>     So I can fix up the decoding to be inside the query string. Simple extra 
> patch, then add some round trip tests. Sound good?
>     
>     The body one is interesting in that it looks like we're doing it 
> incorrectly:
>     
> http://stackoverflow.com/questions/4007969/application-x-www-form-urlencoded-or-multipart-form-data/4073451#4073451
>     
>     We're assuming application/x-www-form-urlencoded at the moment which we 
> should decode. None of the requests happen to be from non-libprocess things 
> or contain illegal characters so it happens to work.
>     
>     *random side thoughts*: We aren't a very good HTTP/webserver / parsing 
> into request/response objects, processing Content-Type, doing simple clean 
> routing. The more I dig into this code the more I think it would be nice if 
> there was just an nginx, apache, proxygen / etc. which are good at doing the 
> HTTP parts and we just take their parsing + methods, getting us out of the 
> game of writing an HTTP server, then just have the bits to bridge / convert 
> their request objects to the appropriate libprocess dispatches.
> 
> Ben Mahler wrote:
>     > So I can fix up the decoding to be inside the query string.
>     
>     Do you mean decoding to be inside query::parse? Sounds bad because it 
> cannot assume the string is always encoded and decoding is not idempotent 
> IIUC.
>     
>     If you _really_ need this, let's just provide the map -> string 
> functionality, and leave the encoding responsibility to the Request encoder. 
> As much as possible, the Encoder / Decoder code should be the only code 
> responsible for encoding and decoding, but maybe I'm missing some context on 
> this.

I'm saying the string is always encoded according to the specification at all 
our callsites. Otherwise it isn't a HTTP query string. Decoding a query string 
which has no encoded parameters in it doesn't make any change to the query 
string.

The decoding has to happen after the query string is parsed apart to function 
correctly (The encoding exists so that otherwise illegal characters can be 
placed there). `parse(decode(query))` will not produce the right set of query 
parameters given query parameters/values which contain '&', '='. The encoding 
and parsing are fundamentally entertwined. It's part of transforming the wire 
format to native datastructures.

I can do this by either making parse() decode as it parses or making a function 
which you call that takes the output from a query::parse() and decodes the 
key/value pairs inside of it. Either way works fine for me. I think it is 
important to note here every current callsite of parse should be doing a 
decode(parse()) if the decode function needs to wrap the parse. Parse of a 
decoded query string is not a functional operation, unless you are accepting 
only a subset of what should be valid query strings.


- Cody


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28253/#review62422
-----------------------------------------------------------


On Nov. 20, 2014, 12:50 a.m., Cody Maloney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28253/
> -----------------------------------------------------------
> 
> (Updated Nov. 20, 2014, 12:50 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Bugs: MESOS-2132
>     https://issues.apache.org/jira/browse/MESOS-2132
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is needed to turn Request objects into http request which we send across 
> the wire. The existing HTTP request sending code expects the query string to 
> be already constructed.
> 
> Ideally people can either just forward the query string they get to send it 
> back across the wire or they can manipulate it using the native C++ datatype 
> (A hashmap), and then have it automatically re-encoded.
> 
> The parsing logic to go from a string to a map is already exists in this 
> header, this just adds the other direction.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/include/process/http.hpp 
> 9cf05acbb724ab9af8010d1788621d37a0e48e86 
>   3rdparty/libprocess/src/http.cpp b00f33339366f5c06b6f20e38c5ae0c23b8a9358 
>   3rdparty/libprocess/src/tests/http_tests.cpp 
> a90e65f77904da0a45e1cc0cc9889ae69354a1a5 
> 
> Diff: https://reviews.apache.org/r/28253/diff/
> 
> 
> Testing
> -------
> 
> make distcheck ubuntu 14.04
> 
> 
> Thanks,
> 
> Cody Maloney
> 
>

Reply via email to