[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus

2012-02-23 Thread Stephen Day

Stephen Day stevv...@gmail.com added the comment:

While it's likely that adding a `quote`/`quote_plus` function parameter to 
urlencode is the right solution, I want to ensure that the key point is 
communicated clearly: encoding a space as a '+' is pathological, in that, in 
the common case, a '+' that encodes a space is indistinguishable from an 
unescaped literal '+'. Take the case of the literal string '+ '. If one uses 
the JavaScript encodeURI function to encode the string in a browser console, 
one gets the following:

 encodeURI('+ ')
+%20

Now, we have a string that will not decode symmetrically. In other words, we 
cannot tell if this string should decode to '  ' or '+ '. And while use of 
encodeURI is discouraged, application developers still use it in places, 
introducing these kinds of errors.

Conversely, we can see that the behavior of encodeURIComponent is unambiguous:

encodeURIComponent('+ ')
%2B%20

And while these are analogues to quote and quote_plus (there exists no 
JavaScript analogue to urlencode), it's easy to see that disambiguating the 
output of urlencode would be desirable.
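
The same ambiguity can be reproduced from Python. This is only a rough sketch using Python 3's urllib.parse; passing safe='+' to quote is an assumption made here to imitate how encodeURI leaves '+' unescaped, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote, unquote_plus

literal = '+ '

# Assumption: safe='+' imitates encodeURI, which leaves '+' unescaped.
uri_style = quote(literal, safe='+')

# safe='' escapes everything reserved, like encodeURIComponent does here.
component_style = quote(literal, safe='')

# The encodeURI-style output decodes differently depending on the decoder.
as_form_data = unquote_plus(uri_style)   # treats '+' as a space
as_percent_only = unquote(uri_style)     # treats '+' as a literal plus

# The fully percent-encoded form decodes the same way with either decoder.
component_round_trip = (unquote(component_style), unquote_plus(component_style))

print(repr(uri_style), repr(as_form_data), repr(as_percent_only))
print(repr(component_style), component_round_trip)
```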

There is a similar situation with PHP library functions.

Furthermore, it is agreed that urlencode does follow the rules, but the rules, 
as they are, introduce an asymmetrical, pathological encoding. Most services 
accept '%20' as space in lieu of '+' when data is encoded as 
'application/x-www-form-urlencoded' anyway.
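
As an illustration of that last point, Python's own form decoder accepts either spelling of a space. This sketch assumes Python 3's urllib.parse.parse_qs stands in for "most services"; other decoders may behave differently.

```python
from urllib.parse import parse_qs

# Both spellings of a space decode to the same value.
from_plus = parse_qs('a=some+param')
from_percent = parse_qs('a=some%20param')

# A literal plus only survives if it was percent-encoded by the sender.
escaped_plus = parse_qs('a=1%2B1')
bare_plus = parse_qs('a=1+1')

print(from_plus, from_percent)
print(escaped_plus, bare_plus)
```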

Concluding, I know it seems a little silly to spend time filing this bug and 
provide relevant cases, but I'd like to cite professional experience in this 
matter; I have seen pluses-for-spaces introduce errors time and time again.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13866
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus

2012-02-13 Thread Stephen Day

Stephen Day stevv...@gmail.com added the comment:

I apologize for reopening this bug, but I find your interpretation to be 
inaccurate. While technically valid, the combination of the documentation, the 
function name and the main use cases yields pathological invocations of 
urlencode. My bug report is to help mitigate these problems.

The main use case for url encoding of mapping types is not for posting form 
data; the main use case is appending url parameters to a url:

>>> from urllib import urlencode
>>> from urlparse import urlunparse
>>> urlunparse(('http', 'example.com', '/', None,
...             urlencode({'a': 'some string'}), None))
'http://example.com/?a=some+string'
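
For readers on Python 3, a roughly equivalent session would look like the sketch below. The imports move to urllib.parse, and empty strings are used instead of None for the unused URL components (a choice made here for safety, not something the report specifies).

```python
from urllib.parse import urlencode, urlunparse

# Components: scheme, netloc, path, params, query, fragment.
query = urlencode({'a': 'some string'})
url = urlunparse(('http', 'example.com', '/', '', query, ''))

print(url)
```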

Any sane person would naturally gravitate to a function called urlencode to 
url encode a mapping type. If the urllib.urlencode function is indeed intended 
for form-encoding, as I agree is hinted in the documentation, it should 
indicate that its result is 'application/x-www-form-urlencoded' or it should be 
called formencode.

Neither quote nor quote_plus is what I am looking for; I am quite familiar 
with these library functions. They are for encoding component strings; they 
don't meet the use case described at all:

>>> quote({'a': 1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1248, in quote
    if not s.rstrip(safe):
AttributeError: 'dict' object has no attribute 'rstrip'

In addition, Java's URLEncoder implementation is hardly a good example of 
standards-compliant URL manipulation. Python is not Java. The Python community 
needs to make its own, independent, mature language decisions. In general, the 
use of '+' to encode spaces in content, even if it complies with an arbitrary 
standard, is pathological, especially when used in urls. Even though Python's 
quote_plus function works symmetrically on its own, when pluses are used in a 
multi-language environment it can become impossible to tell whether a plus is 
a literal '+' or an encoded space. In addition, the usage of '%20' for spaces 
will work in almost all cases.

RFC3986, Section 2 [1] describes the use of percent-encoding as a solution to 
representing reserved characters. In practice, percent-encoding is used on the 
value component of 'key=value' productions and this works in nearly all cases. 
The referenced standard [2], while relevant to the implied use case, is not 
applicable to url assembly.

Given your interpretation, it seems that there is no function in the Python 
standard library to meet the use case of correctly assembling url parameter 
values, leaving application developers to come up with something like this:

>>> '&'.join(['='.join((quote(k), quote(v)))
...           for k, v in {'a': '1', 'b': 'with spaces'}.iteritems()])
'a=1&b=with%20spaces'

In most cases, people will just use urlencode, which uses pluses for spaces, 
yielding pathological, noncompliant urls.
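
A Python 3 rendering of that hand-rolled workaround might look like the sketch below. The helper name join_percent_encoded is hypothetical, and safe='' is an assumption made here so that '/' inside keys or values is also escaped.

```python
from urllib.parse import quote


def join_percent_encoded(params):
    """Hypothetical helper: build 'k=v&k=v' using percent-encoding only."""
    pairs = []
    for key, value in params.items():
        # safe='' is an assumption: escape '/' too, since these are components.
        pairs.append('='.join((quote(key, safe=''), quote(value, safe=''))))
    return '&'.join(pairs)


encoded = join_percent_encoded({'a': '1', 'b': 'with spaces'})
print(encoded)
```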

In deference to this bug closure, there are a few options:

1. Close this issue and keep polluting the world's urls with pluses for spaces.

2. Make urlencode target path/query parameter encoding and then create a new 
function, formencode, for use in encoding form data, breaking backwards 
compatibility.

3. Simply add a keyword argument to urlencode to allow the caller to specify 
the encoding function and separator, retaining compatibility and satisfying all 
of the above use cases.

Naturally, 3 seems to be a very reasonable solution to this bug.
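
To make option 3 concrete, here is a minimal sketch of what such an interface could look like. The function encode_params and its parameters quote_func and separator are hypothetical names invented for this illustration; this is not the standard library's implementation.

```python
from urllib.parse import quote, quote_plus


def encode_params(query, quote_func=quote_plus, separator='&'):
    """Hypothetical sketch of option 3: caller picks quoting and separator."""
    pairs = []
    for key, value in query.items():
        # Non-string values are converted with str() before quoting.
        pairs.append(quote_func(str(key)) + '=' + quote_func(str(value)))
    return separator.join(pairs)


legacy = encode_params({'a': 'some string'})                       # default keeps '+'
percent = encode_params({'a': 'some string'}, quote_func=quote)    # '%20' for spaces
semicolons = encode_params({'a': '1', 'b': '2'}, separator=';')

print(legacy, percent, semicolons)
```

Defaulting quote_func to quote_plus keeps existing callers unchanged, which is the compatibility argument made for option 3.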

[1] http://tools.ietf.org/html/rfc3986#section-2 explicitly covers 
[2] http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

--
resolution: invalid -> 
status: closed -> open




[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus

2012-01-25 Thread Stephen Day

New submission from Stephen Day stevv...@gmail.com:

The current behavior of the urlencode function (2.7: urllib, 3.x: urllib.parse) 
encodes spaces as pluses:

>>> from urllib import urlencode
>>> urlencode({'a': 'some param'})
'a=some+param'

However, in most instances, it would be desirable to merely encode spaces using 
percent encoding:

>>> urlencode({'a': 'some param'})
'a=some%20param'

But there is no way to get this behavior in the standard library. 

It would probably be best to change this so it defaults to the regular quote 
function, but allows callers who need the legacy quote_plus behavior to pass 
that in as a function parameter.

An acceptable fix would be to have the quote function taken as a keyword 
parameter, so legacy behavior remains:

>>> urlencode({'a': 'some param'})
'a=some+param'

Then the behavior could be adjusted where needed:

>>> from urllib import quote
>>> urlencode({'a': 'some param'}, quote=quote)
'a=some%20param'
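
For comparison, Python 3.5 and later expose this kind of switch on urllib.parse.urlencode through a keyword named quote_via rather than the quote spelling proposed above. A short sketch, assuming Python 3.5+:

```python
from urllib.parse import quote, urlencode

params = {'a': 'some param'}

# Default behaviour is unchanged: quote_plus is used, so spaces become '+'.
default_encoding = urlencode(params)

# Passing quote_via=quote switches spaces to percent-encoding.
percent_encoding = urlencode(params, quote_via=quote)

print(default_encoding, percent_encoding)
```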

--
components: Library (Lib)
messages: 151980
nosy: Stephen.Day
priority: normal
severity: normal
status: open
title: {urllib,urllib.parse}.urlencode should not use quote_plus
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4




[issue5183] wsgiref.simple_server not working

2009-02-07 Thread Stephen Day

New submission from Stephen Day stephen.h@gmail.com:

The attached application doesn't work. I think the value of self.headers 
(see line 114) has a blank line at the end that it did not have in Python 2.5.

Here is the error message that occurs when it gets a request 
(http://127.0.0.1:8080/):

Exception happened during processing of request from ('127.0.0.1', 60549)
Traceback (most recent call last):
  File "C:\Python30\lib\socketserver.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "C:\Python30\lib\socketserver.py", line 307, in process_request
    self.finish_request(request, client_address)
  File "C:\Python30\lib\socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "C:\Python30\lib\socketserver.py", line 614, in __init__
    self.handle()
  File "C:\Python30\lib\wsgiref\simple_server.py", line 136, in handle
    self.rfile, self.wfile, self.get_stderr(), self.get_environ()
  File "C:\Python30\lib\wsgiref\simple_server.py", line 115, in get_environ
    k,v = h.split(':',1)
ValueError: need more than 1 value to unpack
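
The suspected cause can be reproduced in isolation. The sketch below is not the wsgiref code and not the fix applied upstream; the sample header lines and the helper parse_header_lines are hypothetical, and it only shows why a trailing blank line breaks the two-value unpacking.

```python
# Hypothetical header lines, ending with the suspected trailing blank line.
raw_headers = ['Host: 127.0.0.1:8080\r\n', 'Accept: */*\r\n', '\r\n']

# Splitting a line without a colon yields one item, so "k, v = ..." fails.
try:
    k, v = raw_headers[-1].split(':', 1)
    unpack_failed = False
except ValueError:
    unpack_failed = True


def parse_header_lines(lines):
    """Hypothetical helper: skip lines that carry no 'name: value' pair."""
    parsed = []
    for line in lines:
        if ':' not in line:
            continue  # blank or malformed line, nothing to unpack
        name, value = line.split(':', 1)
        parsed.append((name.strip(), value.strip()))
    return parsed


headers = parse_header_lines(raw_headers)
print(unpack_failed, headers)
```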

--
components: Library (Lib)
files: test_server.py
messages: 81366
nosy: StephenDay
severity: normal
status: open
title: wsgiref.simple_server not working
type: crash
versions: Python 3.0
Added file: http://bugs.python.org/file12976/test_server.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5183
___



[issue5183] wsgiref.simple_server not working

2009-02-07 Thread Stephen Day

Stephen Day stephen.h@gmail.com added the comment:

This seems to be fixed already (see Issue4718). Next time I'll search 
more...
