Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
Guido van Rossum wrote:
> I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix: replace
>
>     query = query.items()
>
> with this:
>
>     query = sorted(query.items())
>
> This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
>
> Thoughts?

There may be people who mix bytes and str or pass other non-str keys:

    >>> query = {b"a": b"b", "c": "d", 5: 6}
    >>> urlencode(query)
    'a=b&c=d&5=6'
    >>> sorted(query.items())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: str() < bytes()

Not pretty, but a bugfix should not break such constructs.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
On Sat, 18 Aug 2012 14:23:13 +0900, Stephen J. Turnbull step...@xemacs.org wrote:
> Joao S. O. Bueno writes:
> > I don't think this behavior is only desirable to unit tests: having URLs formed in a predictable way is a good thing any way one thinks about it.
>
> Especially if you're a hacker. One more thing you may be able to use against careless sites that don't expect the unexpected to occur in URLs.

That's unsubstantiated. Give an example of how sorted URLs compromise security.

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
On 18 August 2012 02:23, Stephen J. Turnbull step...@xemacs.org wrote:
> Joao S. O. Bueno writes:
> > I don't think this behavior is only desirable to unit tests: having URLs formed in a predictable way is a good thing any way one thinks about it.
>
> Especially if you're a hacker. One more thing you may be able to use against careless sites that don't expect the unexpected to occur in URLs.
>
> I'm not saying this is a bad thing, but we should remember that the whole point of PYTHONHASHSEED is that regularities can be exploited for devious and malicious purposes, and reducing regularity makes many attacks more difficult. *Any* way one thinks about it is far too strong a claim.

Agreed that "any way one thinks about it" is far too strong a claim, but I still hold to the point. Maybe "most ways one thinks about it" :-)

> Steve
Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
On 17.08.2012 21:27, Guido van Rossum wrote:
> I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix: replace
>
>     query = query.items()
>
> with this:
>
>     query = sorted(query.items())
>
> This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.

I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameters. Or you can call urlencode() with `sorted(qs.items())` instead of `qs` in the application.

The order of query string parameters is actually important for some applications; for example, Zope, colander+deform and other form frameworks use the parameter order to group parameters. Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method.

    if type(query) is dict:
        query = sorted(query.items())
    else:
        query = query.items()

Christian
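The "small and simple helper function" mentioned above could look roughly like the sketch below. The name `same_url` and the choice of `urlsplit` plus `parse_qsl` are assumptions made for illustration; the thread does not show an actual implementation.

```python
from urllib.parse import parse_qsl, urlsplit


def same_url(url_a, url_b):
    """Hypothetical test helper: compare two URLs, ignoring query parameter order."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    # Everything except the query string has to match exactly.
    if (a.scheme, a.netloc, a.path, a.fragment) != (b.scheme, b.netloc, b.path, b.fragment):
        return False
    # parse_qsl returns (name, value) pairs and keeps duplicates;
    # sorting both lists makes the comparison independent of parameter order.
    pairs_a = sorted(parse_qsl(a.query, keep_blank_values=True))
    pairs_b = sorted(parse_qsl(b.query, keep_blank_values=True))
    return pairs_a == pairs_b
```

A unit test would then assert `same_url(expected, actual)` instead of comparing the strings. Note that this also ignores the relative order of repeated parameters, which is too lenient for the form frameworks that rely on parameter order.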
Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes li...@cheimes.de wrote:
> On 17.08.2012 21:27, Guido van Rossum wrote:
> > I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix: replace
> >
> >     query = query.items()
> >
> > with this:
> >
> >     query = sorted(query.items())
> >
> > This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
>
> I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameters. Or you can call urlencode() with `sorted(qs.items())` instead of `qs` in the application.

Hm. That's actually a good point.

> The order of query string parameters is actually important for some applications; for example, Zope, colander+deform and other form frameworks use the parameter order to group parameters. Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method.
>
>     if type(query) is dict:
>         query = sorted(query.items())
>     else:
>         query = query.items()

That's already in the bug I filed. :-) I also added that the sort may fail if the keys mix e.g. bytes and str (or int and str, for that matter).

--
--Guido van Rossum (python.org/~guido)
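The failure mode with mixed key types is easy to reproduce. The snippet below is only an illustration written for a current Python 3, not code from the bug report; the variable names are made up.

```python
from urllib.parse import urlencode

# A dict whose keys mix bytes, str and int.
query = {b"a": b"b", "c": "d", 5: 6}

# urlencode() itself accepts the mixed keys and values.
encoded = urlencode(query)

# Sorting the items first does not work: Python 3 refuses to order
# bytes, str and int against each other.
try:
    sorted(query.items())
    sort_failed = False
except TypeError:
    sort_failed = True

print(encoded, sort_failed)
```

So an unconditional `sorted(query.items())` inside urlencode() would turn a call that works today into a TypeError.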
Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
On 18/08/2012 18:34, Guido van Rossum wrote:
> On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes li...@cheimes.de wrote:
> > On 17.08.2012 21:27, Guido van Rossum wrote:
> > > I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix: replace
> > >
> > >     query = query.items()
> > >
> > > with this:
> > >
> > >     query = sorted(query.items())
> > >
> > > This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
> >
> > I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameters. Or you can call urlencode() with `sorted(qs.items())` instead of `qs` in the application.
>
> Hm. That's actually a good point.
>
> > The order of query string parameters is actually important for some applications; for example, Zope, colander+deform and other form frameworks use the parameter order to group parameters. Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method.
> >
> >     if type(query) is dict:
> >         query = sorted(query.items())
> >     else:
> >         query = query.items()
>
> That's already in the bug I filed. :-) I also added that the sort may fail if the keys mix e.g. bytes and str (or int and str, for that matter).

One possible way around that is to add the class names, perhaps only if sorting raises an exception:

    def make_key(pair):
        return type(pair[0]).__name__, type(pair[1]).__name__, pair

    if type(query) is dict:
        try:
            query = sorted(query.items())
        except TypeError:
            query = sorted(query.items(), key=make_key)
    else:
        query = query.items()
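For anyone who wants to try the fallback, here it is wrapped in a function so it can be run outside urlencode(). The wrapper name `ordered_items` is hypothetical; only `make_key` and the branching come from the message above.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def make_key(pair):
    # Compare type names first, so items of different types are
    # never compared with each other directly.
    return type(pair[0]).__name__, type(pair[1]).__name__, pair


def ordered_items(query):
    """Hypothetical wrapper: sort plain dicts, leave other mappings alone."""
    if type(query) is dict:
        try:
            return sorted(query.items())
        except TypeError:
            # Mixed key types: fall back to grouping by class name.
            return sorted(query.items(), key=make_key)
    # Subclasses such as OrderedDict keep their own order.
    return list(query.items())


mixed = urlencode(ordered_items({b"a": b"b", "c": "d", 5: 6}))
kept = ordered_items(OrderedDict([("b", "1"), ("a", "2")]))
print(mixed, kept)
```

The fallback groups items by class name, so the resulting order differs from a plain sort whenever it kicks in, and two unrelated classes that happen to share a `__name__` could still end up being compared.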
Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
On Saturday, August 18, 2012, MRAB wrote:
> On 18/08/2012 18:34, Guido van Rossum wrote:
> > On Sat, Aug 18, 2012 at 6:28 AM, Christian Heimes li...@cheimes.de wrote:
> > > On 17.08.2012 21:27, Guido van Rossum wrote:
> > > > I wonder if it wouldn't make sense to change urlencode() to generate URLs that don't depend on the hash order, for all versions of Python that support PYTHONHASHSEED? It seems a one-line fix: replace
> > > >
> > > >     query = query.items()
> > > >
> > > > with this:
> > > >
> > > >     query = sorted(query.items())
> > > >
> > > > This would not prevent breakage of unit tests, but it would make a much simpler fix possible: simply sort the parameters in the URL.
> > >
> > > I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameters. Or you can call urlencode() with `sorted(qs.items())` instead of `qs` in the application.
> >
> > Hm. That's actually a good point.
> >
> > > The order of query string parameters is actually important for some applications; for example, Zope, colander+deform and other form frameworks use the parameter order to group parameters. Therefore I propose that the query string is only sorted when the query is exactly a dict and not some subclass or class that has an items() method.
> > >
> > >     if type(query) is dict:
> > >         query = sorted(query.items())
> > >     else:
> > >         query = query.items()
> >
> > That's already in the bug I filed. :-) I also added that the sort may fail if the keys mix e.g. bytes and str (or int and str, for that matter).
>
> One possible way around that is to add the class names, perhaps only if sorting raises an exception:
>
>     def make_key(pair):
>         return type(pair[0]).__name__, type(pair[1]).__name__, pair
>
>     if type(query) is dict:
>         try:
>             query = sorted(query.items())
>         except TypeError:
>             query = sorted(query.items(), key=make_key)
>     else:
>         query = query.items()

Doesn't strike me as necessary.
Re: [Python-Dev] Should urlencode() sort the query parameters (if they come from a dict)?
On 8/18/2012 11:47 AM, MRAB wrote:
> > > I vote -0. The issue can also be addressed with a small and simple helper function that wraps urlparse and compares the query parameters. Or you can call urlencode() with `sorted(qs.items())` instead of `qs` in the application.
> >
> > Hm. That's actually a good point.

Seems adequate to me. Most programs wouldn't care about the order, because most web frameworks grab whatever is there in whatever order, and present it to the web app in their own order.

Programs that care, or which talk to web apps that care, are unlikely to want the order from a non-randomized dict, and so have already taken care of ordering issues, so undoing the randomization seems like a solution in search of a problem (other than for poorly written test cases).
[Python-Dev] 3.3 str timings
The issue came up in python-list about string operations being slower in 3.3. (The categorical claim is false as some things are actually faster.) Some things I understand, this one I do not.

Win7-64, 3.3.0b2 versus 3.2.3

    print(timeit("c in a", "c = '…'; a = 'a'*1000+c"))
    # ord(c) = 8230
    # .6 in 3.2, 1.2 in 3.3

Why is searching for a two-byte char in a two-bytes per char string so much faster in 3.2? Is this worth a tracker issue (I searched and could not find one) or is there a known and un-fixable cause?

    print(timeit("a.encode()", "a = 'a'*1000"))
    # 1.5 in 3.2, .26 in 3.3

    print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
    # 1.7 in 3.2, .51 in 3.3

This is one of the 3.3 improvements. But since the results are equal:

    ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')

and 3.3 should know that for an all-ascii string, I do not see why adding the parameter should double the time. Another issue or known and un-fixable?

--
Terry Jan Reedy
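To rerun these measurements on another build, a self-contained version might look like the sketch below. The `number=10000` argument is an assumption chosen to keep the run short (timeit() defaults to one million runs), so the absolute figures will not be comparable to the ones quoted above.

```python
from timeit import timeit

# '\u2026' is the horizontal ellipsis, ord() == 8230, which forces a
# two-bytes-per-character string under PEP 393.
search_setup = "c = '\u2026'; a = 'a'*1000 + c"
encode_setup = "a = 'a'*1000"

# Each call returns the total time in seconds for `number` runs.
search = timeit("c in a", search_setup, number=10000)
plain = timeit("a.encode()", encode_setup, number=10000)
named = timeit("a.encode(encoding='utf-8')", encode_setup, number=10000)

# The two encode spellings produce identical bytes; only the timing can differ.
same_result = ('a' * 1000).encode() == ('a' * 1000).encode(encoding='utf-8')

print(search, plain, named, same_result)
```

Running the same file under two interpreters gives a like-for-like comparison, though a single micro-benchmark says little about overall string performance.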
Re: [Python-Dev] 3.3 str timings
On Sat, 18 Aug 2012 17:17:14 -0400, Terry Reedy tjre...@udel.edu wrote:
> The issue came up in python-list about string operations being slower in 3.3. (The categorical claim is false as some things are actually faster.) Some things I understand, this one I do not.
>
> Win7-64, 3.3.0b2 versus 3.2.3
>
>     print(timeit("c in a", "c = '…'; a = 'a'*1000+c"))
>     # ord(c) = 8230
>     # .6 in 3.2, 1.2 in 3.3

I get opposite numbers:

    $ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
    100 loops, best of 3: 0.599 usec per loop
    $ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
    1000 loops, best of 3: 0.119 usec per loop

However, in both cases the operation is blindingly fast (less than 1µs), which should make it pretty much a non-issue.

> Why is searching for a two-byte char in a two-bytes per char string so much faster in 3.2? Is this worth a tracker issue (I searched and could not find one) or is there a known and un-fixable cause?

I don't think it's worth a tracker issue. First, because as said above it's practically a non-issue. Second, given the nature and depth of changes brought by the switch to the PEP 393 implementation, an individual micro-benchmark like this is not very useful; you'd need to make a more extensive analysis of string performance (as a hint, we have the stringbench benchmark in the Tools directory).

> This is one of the 3.3 improvements. But since the results are equal:
>
>     ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
>
> and 3.3 should know that for an all-ascii string, I do not see why adding the parameter should double the time. Another issue or known and un-fixable?

When observing performance differences, you should ask yourself whether they matter at all or not.

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] 3.3 str timings
Quoting Terry Reedy tjre...@udel.edu:
> Is this worth a tracker issue (I searched and could not find one) or is there a known and un-fixable cause?

There is a third option: it's not known, but it's also unimportant. I'd say posting it to python-dev is enough: either there is somebody with sufficient time and interest to research it and provide you with an explanation (or a fix), or nobody picks it up right away, in which case it's IMO fine to wait for somebody to report it who has a real problem with this change in runtime.

Regards,
Martin
Re: [Python-Dev] 3.3 str timings
On Sat, 18 Aug 2012 17:17:14 -0400, Terry Reedy tjre...@udel.edu wrote:
>     print(timeit("a.encode()", "a = 'a'*1000"))
>     # 1.5 in 3.2, .26 in 3.3
>
>     print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
>     # 1.7 in 3.2, .51 in 3.3
>
> This is one of the 3.3 improvements. But since the results are equal:
>
>     ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
>
> and 3.3 should know that for an all-ascii string, I do not see why adding the parameter should double the time. Another issue or known and un-fixable?

At one point there was an issue with certain spellings taking a fast path (avoiding a codec lookup?) and other spellings not. I thought we'd fixed that, but perhaps we didn't?

--David
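As background for the "spellings" remark: several different names resolve to the same UTF-8 codec, which is why a fast path that recognizes only some of them can lead to different timings for identical output. The snippet below only demonstrates the aliasing through `codecs.lookup()`; it does not show or measure the fast path itself, and the list of spellings is just a sample.

```python
import codecs

# A few of the spellings that all name the UTF-8 codec.
spellings = ["utf-8", "UTF-8", "utf8", "utf_8", "U8"]

# codecs.lookup() normalizes each spelling to the codec's canonical name.
names = {codecs.lookup(s).name for s in spellings}

# Every spelling also encodes to the same bytes.
encoded = {"a".encode(s) for s in spellings}

print(names, encoded)
```

Timing `a.encode(s)` for each spelling with timeit would be one way to check whether a particular build still treats them differently.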
Re: [Python-Dev] 3.3 str timings
On 8/18/2012 5:27 PM, Antoine Pitrou wrote:
> On Sat, 18 Aug 2012 17:17:14 -0400, Terry Reedy tjre...@udel.edu wrote:
> > The issue came up in python-list about string operations being slower in 3.3. (The categorical claim is false as some things are actually faster.) Some things I understand, this one I do not.
> >
> > Win7-64, 3.3.0b2 versus 3.2.3
> >
> >     print(timeit("c in a", "c = '…'; a = 'a'*1000+c"))
> >     # ord(c) = 8230
> >     # .6 in 3.2, 1.2 in 3.3
>
> I get opposite numbers:

Just curious, what system?

>     $ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
>     100 loops, best of 3: 0.599 usec per loop
>     $ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
>     1000 loops, best of 3: 0.119 usec per loop
>
> However, in both cases the operation is blindingly fast (less than 1µs), which should make it pretty much a non-issue.

The current default 'number' of 100 is higher than I remember. Good to know.

> > Why is searching for a two-byte char in a two-bytes per char string so much faster in 3.2? Is this worth a tracker issue (I searched and could not find one) or is there a known and un-fixable cause?
>
> I don't think it's worth a tracker issue. First, because as said above it's practically a non-issue. Second, given the nature and depth of changes brought by the switch to the PEP 393 implementation, an individual micro-benchmark like this is not very useful; you'd need to make a more extensive analysis of string performance (as a hint, we have the stringbench benchmark in the Tools directory).

It is not in my 3.3.0b2 Windows install, but I have heard of it. Another good reminder.

My main interest was in refuting '3.3 string ops are always slower'. Both points above are also good 'ammo'. I am sure this discussion will re-occur after the release.

--
Terry Jan Reedy