Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Peter Otten
Guido van Rossum wrote:

 I wonder if it wouldn't make sense to change urlencode() to generate
 URLs that don't depend on the hash order, for all versions of Python
 that support PYTHONHASHSEED? It seems a one-line fix:
 
 query = query.items()
 
 with this:
 
 query = sorted(query.items())
 
 This would not prevent breakage of unit tests, but it would make a
 much simpler fix possible: simply sort the parameters in the URL.
 
 Thoughts?

There may be people who mix bytes and str or pass other non-str keys:

>>> query = {b"a":b"b", "c":"d", 5:6}
>>> urlencode(query)
'a=b&c=d&5=6'
>>> sorted(query.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < bytes()

Not pretty, but a bugfix should not break such constructs.
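A minimal sketch of both cases, assuming a current Python 3 `urllib.parse.urlencode` (the dict contents are only illustrative). With keys of a single type, the proposed `sorted(query.items())` yields the same query string whatever the dict's iteration order. A dict mixing bytes, str and int keys still encodes fine, but its items cannot be sorted.

```python
from urllib.parse import urlencode

# Keys of a single type: sorting makes the output independent of dict order.
plain = {"b": "2", "a": "1", "c": "3"}
stable = urlencode(sorted(plain.items()))
print(stable)  # a=1&b=2&c=3

# Mixed key types: urlencode() itself accepts them...
mixed = {b"a": b"b", "c": "d", 5: 6}
encoded = urlencode(mixed)

# ...but sorting the items raises TypeError in Python 3.
try:
    sorted(mixed.items())
    sort_failed = False
except TypeError as exc:
    sort_failed = True
    print("sort failed:", exc)
```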

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Antoine Pitrou
On Sat, 18 Aug 2012 14:23:13 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
 Joao S. O. Bueno writes:
 
   I don't think this behavior is only desirable for unit tests: having
   URLs formed in a predictable way is a good thing in any way one
   thinks about it.
 
 Especially if you're a hacker.  One more thing you may be able to use
 against careless sites that don't expect the unexpected to occur in
 URLs.

That's unsubstantiated. Give an example of how sorted URLs compromise
security.

Regards

Antoine.


-- 
Software development and contracting: http://pro.pitrou.net




Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Joao S. O. Bueno
On 18 August 2012 02:23, Stephen J. Turnbull step...@xemacs.org wrote:
 Joao S. O. Bueno writes:

   I don't think this behavior is only desirable for unit tests: having
   URLs formed in a predictable way is a good thing in any way one
   thinks about it.

 Especially if you're a hacker.  One more thing you may be able to use
 against careless sites that don't expect the unexpected to occur in
 URLs.

 I'm not saying this is a bad thing, but we should remember that the
 whole point of PYTHONHASHSEED is that regularities can be exploited
 for devious and malicious purposes, and reducing regularity makes many
 attacks more difficult.  *Any* way one thinks about it is far too
 strong a claim.

Agreed that "any way one thinks about it" is far too strong a claim,
but I still hold to the point. Maybe "most ways one thinks about it"
:-).



 Steve






Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Christian Heimes
On 17.08.2012 21:27, Guido van Rossum wrote:
 I wonder if it wouldn't make sense to change urlencode() to generate
 URLs that don't depend on the hash order, for all versions of Python
 that support PYTHONHASHSEED? It seems a one-line fix:
 
 query = query.items()
 
 with this:
 
 query = sorted(query.items())
 
 This would not prevent breakage of unit tests, but it would make a
 much simpler fix possible: simply sort the parameters in the URL.

I vote -0. The issue can also be addressed with a small and simple
helper function that wraps urlparse and compares the query parameters. Or
you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
application.
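Such a helper might look like the sketch below; the name `same_url` is hypothetical. It splits both URLs with `urllib.parse.urlsplit`, compares everything except the query exactly, and compares the query parameters from `parse_qsl` as an unordered collection, so repeated parameters still count but their relative order does not.

```python
from urllib.parse import parse_qsl, urlsplit


def same_url(a, b):
    """Hypothetical test helper: True if two URLs differ at most in
    the order of their query parameters."""
    sa, sb = urlsplit(a), urlsplit(b)
    # Scheme, netloc, path and fragment must match exactly.
    if sa._replace(query="") != sb._replace(query=""):
        return False
    # keep_blank_values=True so parameters with empty values are kept
    # instead of being silently dropped.
    qa = sorted(parse_qsl(sa.query, keep_blank_values=True))
    qb = sorted(parse_qsl(sb.query, keep_blank_values=True))
    return qa == qb


print(same_url("http://x/p?a=1&b=2", "http://x/p?b=2&a=1"))  # True
print(same_url("http://x/p?a=1&b=2", "http://x/p?a=1"))      # False
```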

The order of query string parameters is actually important for some
applications, for example Zope, colander+deform and other form
frameworks use the parameter order to group parameters.
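Applications that depend on parameter order can already pass a sequence of pairs instead of a dict (a dict could not even hold the repeated keys): `urlencode()` keeps the sequence order and `parse_qsl()` returns pairs in the order they appear. The field names below are made up for illustration and are not taken from Zope or colander+deform.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical form data where position matters: each "name" belongs to
# the "age" that follows it.
pairs = [("name", "ann"), ("age", "31"), ("name", "bob"), ("age", "27")]

qs = urlencode(pairs)
print(qs)  # name=ann&age=31&name=bob&age=27

# Decoding preserves the original order, so the grouping can be rebuilt.
decoded = parse_qsl(qs)
people = [dict(decoded[i:i + 2]) for i in range(0, len(decoded), 2)]
print(people)  # [{'name': 'ann', 'age': '31'}, {'name': 'bob', 'age': '27'}]
```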

Therefore I propose that the query string is only sorted when the query
is exactly a dict and not some subclass or class that has an items() method.

if type(query) is dict:
    query = sorted(query.items())
else:
    query = query.items()

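The proposed rule can be tried outside the stdlib with a small wrapper; `stable_urlencode` is a hypothetical name, not part of `urllib.parse`. Only an exact `dict` is sorted, while an `OrderedDict` (a dict subclass) or a list of pairs keeps the order the caller gave it.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def stable_urlencode(query, **kwargs):
    """Hypothetical wrapper applying the proposed rule: sort only an
    exact dict; subclasses and sequences of pairs keep their order."""
    if type(query) is dict:
        query = sorted(query.items())
    return urlencode(query, **kwargs)


print(stable_urlencode({"b": 2, "a": 1}))                   # a=1&b=2
print(stable_urlencode(OrderedDict([("b", 2), ("a", 1)])))  # b=2&a=1
print(stable_urlencode([("b", 2), ("a", 1)]))               # b=2&a=1
```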
Christian



Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Guido van Rossum
On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes li...@cheimes.de wrote:
 On 17.08.2012 21:27, Guido van Rossum wrote:
 I wonder if it wouldn't make sense to change urlencode() to generate
 URLs that don't depend on the hash order, for all versions of Python
 that support PYTHONHASHSEED? It seems a one-line fix:

 query = query.items()

 with this:

 query = sorted(query.items())

 This would not prevent breakage of unit tests, but it would make a
 much simpler fix possible: simply sort the parameters in the URL.

 I vote -0. The issue can also be addressed with a small and simple
 helper function that wraps urlparse and compares the query parameters. Or
 you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
 application.

Hm. That's actually a good point.

 The order of query string parameters is actually important for some
 applications, for example Zope, colander+deform and other form
 frameworks use the parameter order to group parameters.

 Therefore I propose that the query string is only sorted when the query
 is exactly a dict and not some subclass or class that has an items() method.

 if type(query) is dict:
     query = sorted(query.items())
 else:
     query = query.items()

That's already in the bug I filed. :-) I also added that the sort may
fail if the keys mix e.g. bytes and str (or int and str, for that
matter).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread MRAB

On 18/08/2012 18:34, Guido van Rossum wrote:

On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes li...@cheimes.de wrote:

On 17.08.2012 21:27, Guido van Rossum wrote:

I wonder if it wouldn't make sense to change urlencode() to generate
URLs that don't depend on the hash order, for all versions of Python
that support PYTHONHASHSEED? It seems a one-line fix:

query = query.items()

with this:

query = sorted(query.items())

This would not prevent breakage of unit tests, but it would make a
much simpler fix possible: simply sort the parameters in the URL.


I vote -0. The issue can also be addressed with a small and simple
helper function that wraps urlparse and compares the query parameters. Or
you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
application.


Hm. That's actually a good point.


The order of query string parameters is actually important for some
applications, for example Zope, colander+deform and other form
frameworks use the parameter order to group parameters.

Therefore I propose that the query string is only sorted when the query
is exactly a dict and not some subclass or class that has an items() method.

if type(query) is dict:
    query = sorted(query.items())
else:
    query = query.items()


That's already in the bug I filed. :-) I also added that the sort may
fail if the keys mix e.g. bytes and str (or int and str, for that
matter).


One possible way around that is to add the class names, perhaps only if
sorting raises an exception:

def make_key(pair):
    return type(pair[0]).__name__, type(pair[1]).__name__, pair

if type(query) is dict:
    try:
        query = sorted(query.items())
    except TypeError:
        query = sorted(query.items(), key=make_key)
else:
    query = query.items()

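To check the fallback on the mixed-type dict from earlier in the thread, the snippet can be packaged as a function; `sort_query` is a hypothetical name used only for this sketch. The key sorts by the type names of key and value first, so two pairs are only compared directly when those type names match.

```python
from urllib.parse import urlencode


def make_key(pair):
    # Type names first ("bytes" < "int" < "str"), then the pair itself.
    return type(pair[0]).__name__, type(pair[1]).__name__, pair


def sort_query(query):
    """Hypothetical packaging of the snippet above."""
    if type(query) is dict:
        try:
            return sorted(query.items())
        except TypeError:
            # Mixed key types: fall back to grouping by type name.
            return sorted(query.items(), key=make_key)
    # Non-dict mappings keep their own order, as in the snippet.
    return list(query.items())


pairs = sort_query({b"a": b"b", "c": "d", 5: 6})
print(pairs)             # [(b'a', b'b'), (5, 6), ('c', 'd')]
print(urlencode(pairs))  # a=b&5=6&c=d
```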


Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Guido van Rossum
On Saturday, August 18, 2012, MRAB wrote:

 On 18/08/2012 18:34, Guido van Rossum wrote:

 On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes li...@cheimes.de
 wrote:

 On 17.08.2012 21:27, Guido van Rossum wrote:

 I wonder if it wouldn't make sense to change urlencode() to generate
 URLs that don't depend on the hash order, for all versions of Python
 that support PYTHONHASHSEED? It seems a one-line fix:

 query = query.items()

 with this:

 query = sorted(query.items())

 This would not prevent breakage of unit tests, but it would make a
 much simpler fix possible: simply sort the parameters in the URL.


 I vote -0. The issue can also be addressed with a small and simple
 helper function that wraps urlparse and compares the query parameters. Or
 you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
 application.


 Hm. That's actually a good point.

  The order of query string parameters is actually important for some
 applications, for example Zope, colander+deform and other form
 frameworks use the parameter order to group parameters.

 Therefore I propose that the query string is only sorted when the query
 is exactly a dict and not some subclass or class that has an items()
 method.

 if type(query) is dict:
     query = sorted(query.items())
 else:
     query = query.items()


 That's already in the bug I filed. :-) I also added that the sort may
 fail if the keys mix e.g. bytes and str (or int and str, for that
 matter).

  One possible way around that is to add the class names, perhaps only if
 sorting raises an exception:

 def make_key(pair):
     return type(pair[0]).__name__, type(pair[1]).__name__, pair

 if type(query) is dict:
     try:
         query = sorted(query.items())
     except TypeError:
         query = sorted(query.items(), key=make_key)
 else:
     query = query.items()


Doesn't strike me as necessary.







Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?

2012-08-18 Thread Glenn Linderman

On 8/18/2012 11:47 AM, MRAB wrote:

I vote -0. The issue can also be addressed with a small and simple
helper function that wraps urlparse and compares the query parameters. Or
you can call urlencode() with `sorted(qs.items())` instead of `qs` in the
application.


Hm. That's actually a good point. 


Seems adequate to me. Most programs wouldn't care about the order, 
because most web frameworks grab whatever is there in whatever order, 
and present it to the web app in their own order.


Programs that care, or which talk to web apps that care, are unlikely to 
want the order from a non-randomized dict, and so have already taken 
care of ordering issues, so undoing the randomization seems like a 
solution in search of a problem (other than for poorly written test cases).


[Python-Dev] 3.3 str timings

2012-08-18 Thread Terry Reedy
The issue came up in python-list about string operations being slower in 
3.3. (The categorical claim is false as some things are actually 
faster.) Some things I understand, this one I do not.


Win7-64, 3.3.0b2 versus 3.2.3
print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
# .6 in 3.2, 1.2 in 3.3

Why is searching for a two-byte char in a two-bytes per char string so 
much faster in 3.2? Is this worth a tracker issue (I searched and could 
not find one) or is there a known and un-fixable cause?


print(timeit("a.encode()", "a = 'a'*1000"))
# 1.5 in 3.2, .26 in 3.3

print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
# 1.7 in 3.2, .51 in 3.3

This is one of the 3.3 improvements. But since the results are equal:
('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
and 3.3 should know that for an all-ASCII string, I do not see why 
adding the parameter should double the time. Another issue, or known 
and un-fixable?

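The encode comparison can be reproduced in-process with `timeit.timeit`, passing statement and setup as strings as the one-liners above do. This is a sketch: the loop count is reduced from timeit's default of 1000000 to keep it quick, and no particular ratio between the two timings should be expected on a given build.

```python
from timeit import timeit

a = 'a' * 1000

# Both spellings produce identical bytes for an all-ASCII string.
same = a.encode() == a.encode(encoding='utf-8')
print(same)  # True

# Fewer loops than timeit's default number=1000000; figures vary by machine.
n = 100000
t_default = timeit("a.encode()", "a = 'a' * 1000", number=n)
t_keyword = timeit("a.encode(encoding='utf-8')", "a = 'a' * 1000", number=n)
print(f"encode():                  {t_default:.4f} s for {n} loops")
print(f"encode(encoding='utf-8'):  {t_keyword:.4f} s for {n} loops")
```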

--
Terry Jan Reedy




Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread Antoine Pitrou
On Sat, 18 Aug 2012 17:17:14 -0400
Terry Reedy tjre...@udel.edu wrote:
 The issue came up in python-list about string operations being slower in 
 3.3. (The categorical claim is false as some things are actually 
 faster.) Some things I understand, this one I do not.
 
 Win7-64, 3.3.0b2 versus 3.2.3
 print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
 # .6 in 3.2, 1.2 in 3.3

I get opposite numbers:

$ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
100 loops, best of 3: 0.599 usec per loop
$ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
1000 loops, best of 3: 0.119 usec per loop

However, in both cases the operation is blindingly fast (less than
1µs), which should make it pretty much a non-issue.
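To put such totals on a per-operation scale, the sketch below uses `timeit.repeat`, keeps the best of three runs and divides by the loop count, which is roughly how the `python -m timeit` output above is computed. Unlike the command line, the loop count here is fixed rather than auto-scaled, so the "loops" figure will not match the runs above.

```python
from timeit import repeat

setup = "c = '\u2026'; a = 'a' * 1000 + c"  # '\u2026' is '…', ord 8230
number = 100000

# Three runs; keep the fastest, as the "best of 3" output does.
best = min(repeat("c in a", setup, repeat=3, number=number))
usec_per_loop = best / number * 1e6
print(f"{number} loops, best of 3: {usec_per_loop:.3f} usec per loop")
```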

 Why is searching for a two-byte char in a two-bytes per char string so 
 much faster in 3.2? Is this worth a tracker issue (I searched and could 
 not find one) or is there a known and un-fixable cause?

I don't think it's worth a tracker issue. First, because as said above
it's practically a non-issue. Second, given the nature and depth of
changes brought by the switch to the PEP 393 implementation, an
individual micro-benchmark like this is not very useful; you'd need to
make a more extensive analysis of string performance (as a hint, we
have the stringbench benchmark in the Tools directory).

 This is one of the 3.3 improvements. But since the results are equal:
 ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
 and 3.3 should know that for an all-ASCII string, I do not see why 
 adding the parameter should double the time. Another issue, or known 
 and un-fixable?

When observing performance differences, you should ask yourself whether
they matter at all or not.

Regards

Antoine.







Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread martin


Quoting Terry Reedy tjre...@udel.edu:

Is this worth a tracker issue (I searched and could not find one) or  
is there a known and un-fixable cause?


There is a third option: it's not known, but it's also unimportant.
I'd say posting it to python-dev is enough: either somebody with
sufficient time and interest researches it and provides you with an
explanation (or a fix), or, if nobody picks it up right away, it's
IMO fine to wait until somebody who has a real problem with this
change in runtime reports it.

Regards,
Martin




Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread R. David Murray
On Sat, 18 Aug 2012 17:17:14 -0400, Terry Reedy tjre...@udel.edu wrote:
 print(timeit("a.encode()", "a = 'a'*1000"))
 # 1.5 in 3.2, .26 in 3.3
 
 print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
 # 1.7 in 3.2, .51 in 3.3
 
 This is one of the 3.3 improvements. But since the results are equal:
 ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
 and 3.3 should know that for an all-ASCII string, I do not see why 
 adding the parameter should double the time. Another issue, or known 
 and un-fixable?

At one point there was an issue with certain spellings taking a fast path
(avoiding a codec lookup?) and other spellings not.  I thought we'd fixed
that, but perhaps we didn't?
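One way to see that different spellings of the encoding name end up at the same codec is `codecs.lookup`, which normalizes the name before searching. Whether `str.encode()` skips the lookup for some spellings is a CPython implementation detail that this sketch does not show; it only illustrates that the spellings are equivalent.

```python
import codecs

# Different spellings of the same encoding resolve to one codec.
for spelling in ("utf-8", "UTF8", "utf_8", "U8"):
    info = codecs.lookup(spelling)
    print(f"{spelling!r:9} -> {info.name}")
```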

--David


Re: [Python-Dev] 3.3 str timings

2012-08-18 Thread Terry Reedy

On 8/18/2012 5:27 PM, Antoine Pitrou wrote:

On Sat, 18 Aug 2012 17:17:14 -0400
Terry Reedy tjre...@udel.edu wrote:

The issue came up in python-list about string operations being slower in
3.3. (The categorical claim is false as some things are actually
faster.) Some things I understand, this one I do not.

Win7-64, 3.3.0b2 versus 3.2.3
print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
# .6 in 3.2, 1.2 in 3.3


I get opposite numbers:


Just curious, what system?


$ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
100 loops, best of 3: 0.599 usec per loop
$ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
1000 loops, best of 3: 0.119 usec per loop

However, in both cases the operation is blindingly fast (less than
1µs), which should make it pretty much a non-issue.


The current default 'number' of 100 is higher than I remember. Good 
to know.



Why is searching for a two-byte char in a two-bytes per char string so
much faster in 3.2? Is this worth a tracker issue (I searched and could
not find one) or is there a known and un-fixable cause?


I don't think it's worth a tracker issue. First, because as said above
it's practically a non-issue. Second, given the nature and depth of
changes brought by the switch to the PEP 393 implementation, an
individual micro-benchmark like this is not very useful; you'd need to
make a more extensive analysis of string performance (as a hint, we
have the stringbench benchmark in the Tools directory).


It is not in my 3.3.0b2 Windows install, but I have heard of it. Another 
good reminder. My main interest was in refuting '3.3 string ops are 
always slower'. Both points above are also good 'ammo'. I am sure this 
discussion will recur after the release.


--
Terry Jan Reedy

