[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus
Stephen Day <stevv...@gmail.com> added the comment:

While adding a `quote`/`quote_plus` function parameter to urlencode is likely the right solution, I want to make sure the key point is communicated clearly: encoding a space as '+' is pathological because, in the common case, a '+' that stands for an encoded space is indistinguishable from a literal '+'.

Take the literal string '+ '. Encoding it with the JavaScript encodeURI function in a browser console gives the following:

    encodeURI('+ ')
    +%20

This string will not decode symmetrically: we cannot tell whether it should decode to '  ' (two spaces) or to '+ '. Although use of encodeURI is discouraged, application developers still use it in places, introducing these kinds of errors.

By contrast, the behavior of encodeURIComponent is unambiguous:

    encodeURIComponent('+ ')
    %2B%20

While these functions are only analogues of quote and quote_plus (JavaScript has no analogue of urlencode), it is easy to see that an unambiguous encoding of urlencode's output would be desirable. PHP's library functions present a similar situation.

Furthermore, it is agreed that urlencode does follow the rules, but the rules, as they are, introduce an asymmetrical, pathological encoding. Most services accept '%20' as a space in lieu of '+' when data is encoded as 'application/x-www-form-urlencoded' anyway.

In conclusion, I know it may seem a little silly to spend time filing this bug and providing relevant cases, but I'd like to cite professional experience in this matter: I have seen pluses-for-spaces introduce errors time and time again.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13866>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
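The ambiguity described in this comment can also be reproduced from Python. What follows is a minimal sketch, assuming Python 3's urllib.parse (the thread itself uses the Python 2.7 urllib names); the variable names are illustrative only.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

raw = "+ "      # a literal plus followed by a space
mixed = "+%20"  # the encodeURI('+ ') output quoted in the comment

# The same encoded text decodes differently depending on the decoder chosen.
as_percent = unquote(mixed)    # '+' is kept as a literal plus
as_form = unquote_plus(mixed)  # '+' is read as an encoded space
print(repr(as_percent))        # '+ '
print(repr(as_form))           # '  '

# Percent-encoding both characters gives text that either decoder restores.
unambiguous = quote(raw)       # '%2B%20'
print(unquote(unambiguous) == raw, unquote_plus(unambiguous) == raw)

# quote_plus output only round-trips through the matching plus-aware decoder.
plus_style = quote_plus(raw)   # '%2B+'
print(repr(unquote_plus(plus_style)), repr(unquote(plus_style)))
```

The last line shows the asymmetry: the plus-aware decoder recovers the original string, while the plain percent decoder returns '++'.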
[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus
Stephen Day <stevv...@gmail.com> added the comment:

I apologize for reopening this bug, but I find your interpretation to be inaccurate. While technically valid, the combination of the documentation, the function name and the main use cases yields pathological invocations of urlencode. My bug report is meant to help mitigate these problems.

The main use case for url encoding of mapping types is not posting form data; the main use case is appending url parameters to a url:

    >>> from urllib import urlencode
    >>> from urlparse import urlunparse
    >>> urlunparse(('http', 'example.com', '/', None, urlencode({'a': 'some string'}), None))
    'http://example.com/?a=some+string'

Any sane person would naturally gravitate to a function called urlencode to url encode a mapping type. If the urllib.urlencode function is indeed intended for form-encoding, as I agree is hinted in the documentation, it should indicate that its result is 'application/x-www-form-urlencoded' or it should be called formencode.

The quote and quote_plus functions are not at all what I am looking for; I am quite familiar with these library functions. They are for encoding component strings and do not meet the use case described at all:

    >>> quote({'a': 1})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1248, in quote
        if not s.rstrip(safe):
    AttributeError: 'dict' object has no attribute 'rstrip'

In addition, Java's URLEncoder implementation is hardly a good example of standards-compliant URL manipulation. Python is not Java. The Python community needs to make its own independent, mature language decisions.

In general, the use of '+' to encode spaces in content, even if it is compliant with an arbitrary standard, is pathological, especially when used in urls.
Even though python's quote_plus function works symmetrically on its own, when pluses are used in a multi-language environment it can become impossible to tell whether a plus is a literal '+' or an encoded space. In addition, the use of '%20' for spaces will work in almost all cases. RFC3986, Section 2 [1] describes the use of percent-encoding as a solution to representing reserved characters. In practice, percent-encoding is used on the value component of 'key=value' productions and this works in nearly all cases. The referenced standard [2], while relevant to the implied use case, is not applicable to url assembly.

Given your interpretation, it seems that there is no function in the python standard library to meet the use case of correctly assembling url parameter values, leaving application developers to come up with something like this:

    >>> '&'.join(['='.join((quote(k), quote(v))) for k,v in {'a': '1', 'b': 'with spaces'}.iteritems()])
    'a=1&b=with%20spaces'

In most cases, people will just use urlencode, which uses pluses for spaces, yielding pathological, noncompliant urls.

In deference to this bug closure, there are a few options:

1. Close this issue and keep polluting the world's urls with pluses for spaces.
2. Make urlencode target path/query parameter encoding and then create a new function, formencode, for use in encoding form data, breaking backwards compatibility.
3. Simply add a keyword argument to urlencode to allow the caller to specify the encoding function and separator, retaining compatibility and satisfying all of the above use cases.

Naturally, 3 seems to be a very reasonable solution to this bug.
[1] http://tools.ietf.org/html/rfc3986#section-2 explicitly covers
[2] http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

--
resolution: invalid ->
status: closed -> open
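The hand-rolled workaround in the comment above can be sketched for Python 3 roughly as follows. The helper name `encode_query` is hypothetical and not part of the standard library, and passing `safe=""` is an assumption made here so that '/' is also percent-encoded (the original snippet used quote's default).

```python
from urllib.parse import quote, urlunparse


def encode_query(params):
    """Join key=value pairs with '&', percent-encoding spaces as %20."""
    return "&".join(
        "{}={}".format(quote(str(key), safe=""), quote(str(value), safe=""))
        for key, value in params.items()
    )


query = encode_query({"a": "1", "b": "with spaces"})
url = urlunparse(("http", "example.com", "/", "", query, ""))
print(query)  # a=1&b=with%20spaces
print(url)    # http://example.com/?a=1&b=with%20spaces
```

Unlike urlencode, this sketch does not handle sequences of values or bytes; it only illustrates the encoding difference under discussion.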
[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus
New submission from Stephen Day <stevv...@gmail.com>:

The current behavior of the urlencode function (2.7: urllib, 3.x: urllib.parse) encodes spaces as pluses:

    >>> from urllib import urlencode
    >>> urlencode({'a': 'some param'})
    'a=some+param'

However, in most instances, it would be desirable to encode spaces using plain percent-encoding:

    >>> urlencode({'a': 'some param'})
    'a=some%20param'

But there is no way to get this behavior in the standard library.

It would probably be best to change this so that it defaults to the regular quote function but allows callers who need the legacy quote_plus behavior to pass that in as a function parameter.

An acceptable fix would be to have the quote function taken as a keyword parameter, so that the legacy behavior remains:

    >>> urlencode({'a': 'some param'})
    'a=some+param'

Then the behavior could be adjusted where needed:

    >>> from urllib import quote
    >>> urlencode({'a': 'some param'}, quote=quote)
    'a=some%20param'

--
components: Library (Lib)
messages: 151980
nosy: Stephen.Day
priority: normal
severity: normal
status: open
title: {urllib,urllib.parse}.urlencode should not use quote_plus
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4
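For readers on a current interpreter: Python 3.5 and later accept a `quote_via` argument to urllib.parse.urlencode, which covers the keyword-parameter request made in this submission (the `quote=` spelling above is the reporter's proposal, not an existing parameter). A short sketch, assuming Python 3.5 or newer:

```python
from urllib.parse import quote, urlencode

params = {"a": "some param"}

# Default behavior is unchanged: quote_plus is used, so spaces become '+'.
legacy = urlencode(params)
print(legacy)   # a=some+param

# Passing quote as the quoting function yields percent-encoded spaces.
percent = urlencode(params, quote_via=quote)
print(percent)  # a=some%20param
```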
[issue5183] wsgiref.simple_server not working
New submission from Stephen Day <stephen.h@gmail.com>:

The attached application doesn't work. I think the value of self.headers (see line 114) has a blank line at the end that it did not have in Python 2.5.

Here is the error message that occurs when it gets a request (http://127.0.0.1:8080/):

Exception happened during processing of request from ('127.0.0.1', 60549)
Traceback (most recent call last):
  File "C:\Python30\lib\socketserver.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "C:\Python30\lib\socketserver.py", line 307, in process_request
    self.finish_request(request, client_address)
  File "C:\Python30\lib\socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "C:\Python30\lib\socketserver.py", line 614, in __init__
    self.handle()
  File "C:\Python30\lib\wsgiref\simple_server.py", line 136, in handle
    self.rfile, self.wfile, self.get_stderr(), self.get_environ()
  File "C:\Python30\lib\wsgiref\simple_server.py", line 115, in get_environ
    k,v = h.split(':',1)
ValueError: need more than 1 value to unpack

--
components: Library (Lib)
files: test_server.py
messages: 81366
nosy: StephenDay
severity: normal
status: open
title: wsgiref.simple_server not working
type: crash
versions: Python 3.0
Added file: http://bugs.python.org/file12976/test_server.py

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5183>
___
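The failure in the traceback is what happens when a header line without a colon (for example, a trailing blank line, as the reporter suspects) reaches `k,v = h.split(':',1)`. The sketch below reproduces that situation in isolation and shows one defensive way to skip such lines. It is an illustration only: the function name `parse_header_lines` and the sample data are hypothetical, and this is not the fix that was applied to wsgiref.

```python
def parse_header_lines(lines):
    """Split 'Name: value' lines into pairs, skipping lines without a colon."""
    parsed = []
    for line in lines:
        if ":" not in line:
            # A blank or malformed line would make the two-name unpacking
            # below raise ValueError, so it is skipped instead.
            continue
        key, value = line.split(":", 1)
        parsed.append((key.strip(), value.strip()))
    return parsed


# Hypothetical header list with the trailing blank line described in the report.
sample = ["Host: 127.0.0.1:8080", "Accept: */*", ""]
print(parse_header_lines(sample))
```

Splitting with a maximum of one split keeps colons inside the value, such as the port in the Host header, intact.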
[issue5183] wsgiref.simple_server not working
Stephen Day <stephen.h@gmail.com> added the comment:

This seems to be fixed already (see Issue4718). Next time I'll search more...