Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On 13.01.2012 02:24, Victor Stinner wrote: My patch doesn't fix the DoS, it just makes the attack more complex. The attacker cannot pregenerate data for an attack: (s)he first has to compute the hash secret, and then compute hash collisions using the secret. The hash secret is at least 64 bits long (128 bits on a 64-bit system). So I hope that computing collisions requires a lot of CPU time (is slow), making the attack ineffective with today's computers. Unfortunately it requires only a few seconds to compute enough 32-bit collisions on one core, with no precomputed data. I'm sure it's possible to make this less than a second. In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) ^ suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) is possible. So the question is: how difficult is it to guess the seed? Frank
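For context, a rough Python model of the pre-randomization CPython string hash (a DJBX33X-style multiply-and-XOR scheme, shown here in its 32-bit form; a sketch, not code from this thread), which shows why a common suffix preserves collisions between equal-length strings:

    # Rough model of the pre-randomization CPython string hash (32-bit form).
    def model_hash(s, mask=0xFFFFFFFF):
        if not s:
            return 0
        h = (ord(s[0]) << 7) & mask
        for c in s:
            h = ((1000003 * h) ^ ord(c)) & mask
        return h ^ len(s)

    # For equal-length X and Y with model_hash(X) == model_hash(Y), appending
    # the same suffix keeps them colliding: the remaining rounds depend only
    # on the current state and the suffix characters.
    def still_collides(x, y, suffix):
        return model_hash(x + suffix) == model_hash(y + suffix)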
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Unfortunately it requires only a few seconds to compute enough 32-bit collisions on one core, with no precomputed data. Are you running the hash function backward to generate strings with the same value, or are you trying something more like brute forcing? And how do you get the hash secret? You need it to run an attack. In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) ^ suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) is possible. My change also adds a prefix (a prefix and a suffix). I don't know if it changes anything for generating collisions. So the question is: how difficult is it to guess the seed? I wrote some remarks about that in the issue. For example: (hash("\0")^1) ^ (hash("\0\0")^2) gives ((prefix * 103) & HASH_MASK) ^ ((prefix * 103**2) & HASH_MASK). I suppose that in practice you don't directly have the full output of hash(str), but only hash(str) & DICT_MASK, where DICT_MASK is the size of the internal dict array minus 1. For example, for a dictionary of 65,536 items, the mask is 0x1ffff and so cannot give you more than 17 bits of hash(str) output. I still don't know how difficult it is to retrieve hash(str) bits from repr(dict). Victor
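A small sketch of the DICT_MASK point (illustration only, not Victor's code): the first probe slot a dict uses depends only on the low bits of the hash, so at most log2(table size) bits of hash(str) are observable this way.

    # Sketch: the first probe slot uses only the low bits of the hash.
    def initial_slot(h, table_size):
        DICT_MASK = table_size - 1   # table_size is a power of two
        return h & DICT_MASK
    # e.g. a dict holding 65,536 items has a 2**17-slot table, so the mask is
    # 0x1ffff and at most 17 bits of hash(str) are visible through slot order.
    # CPython's real probing also mixes in higher bits via a perturb value on
    # collisions, so even this is optimistic for the attacker.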
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 02:24, Victor Stinner victor.stin...@haypocalc.com wrote: - Glenn Linderman proposes to fix the vulnerability by adding a new safe dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the hash of a key. This is my preferred solution. The vulnerability is basically only in the dictionary where you keep the form data you get from a request. This solves it easily and nicely. It can also be a separate module, installable for Python 2, which many web frameworks still use, so it is practically implementable now, and not in a couple of years. Then again, nothing prevents us from having both this, *and* one of the other solutions. :-) //Lennart
[Python-Dev] PEP 380 (yield from) is now Final
I marked PEP 380 as Final this evening, after pushing the tested and documented implementation to hg.python.org: http://hg.python.org/cpython/rev/d64ac9ab4cd0 As the list of names in the NEWS and What's New entries suggests, it was quite a collaborative effort to get this one over the line, and that's without even listing all the people that offered helpful suggestions and comments along the way :) print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
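For readers who haven't followed the PEP, a minimal example of the delegation semantics it adds (values from the subgenerator go straight to the caller, and the subgenerator's return value becomes the value of the yield from expression):

    def sub():
        yield 1
        yield 2
        return "done"              # carried on StopIteration.value

    def outer():
        result = yield from sub()  # 1 and 2 are passed straight through
        yield result

    print(list(outer()))           # [1, 2, 'done']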
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Unfortunately it requires only a few seconds to compute enough 32-bit collisions on one core, with no precomputed data. Are you running the hash function backward to generate strings with the same value, or are you trying something more like brute forcing? If you try brute force to hit a specific target, you'll find only one good string every 4 billion tries. That's why you first blow up your target: you start backward from an arbitrary target value. You brute force for 3 characters, for example; this will give you 16 million intermediate values which you know will end up in your target value. Those 16 million values are a huge target for brute-forcing forward: every 256 tries you'll hit one of these values. And how do you get the hash secret? You need it to run an attack. I don't know. This was meant as an answer to the quoted text "So I hope that computing collisions requires a lot of CPU time (is slow) to make the attack ineffective with today's computers." What I wanted to say is: the security relies on the fact that the attacker can't guess the prefix, not that he can't precompute the values or that it takes hours or days to compute the collisions. If the prefix leaks out of the application, then the rest is trivial and done in a few seconds. The suffix is not important for collision prevention, but it will probably make it much harder to guess the prefix. I don't know an effective way to get the prefix either (if the application doesn't leak full hash(X) values). Frank
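The "backward" part works because each per-character round of the old 32-bit string hash is invertible: the multiplier is odd, so it has a multiplicative inverse modulo 2**32. A sketch of one round and its inverse (an illustration of the idea, not Frank's tooling; pow(M, -1, 2**32) needs Python 3.8+):

    M = 1000003                      # multiplier used by the old string hash
    M_INV = pow(M, -1, 2**32)        # its multiplicative inverse mod 2**32

    def step_forward(h, c):
        return ((M * h) ^ ord(c)) & 0xFFFFFFFF

    def step_backward(h, c):
        # Undo step_forward: strip the XOR, then undo the multiplication.
        return (M_INV * (h ^ ord(c))) & 0xFFFFFFFF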
Re: [Python-Dev] PEP 380 (yield from) is now Final
Great work Nick, I've been looking forward to this one. Thanks all for putting the effort in. On Fri, Jan 13, 2012 at 11:14 PM, Nick Coghlan ncogh...@gmail.com wrote: I marked PEP 380 as Final this evening, after pushing the tested and documented implementation to hg.python.org: http://hg.python.org/cpython/rev/d64ac9ab4cd0 As the list of names in the NEWS and What's New entries suggests, it was quite a collaborative effort to get this one over the line, and that's without even listing all the people that offered helpful suggestions and comments along the way :) print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On 2012-01-13 11:20, Lennart Regebro wrote: The vulnerability is basically only in the dictionary where you keep the form data you get from a request. I'd have to disagree with this statement. The vulnerability is anywhere that creates a dictionary (or set) from attacker-provided keys. That would include HTTP headers, RFC822-family subheaders and parameters, the environ, input taken from JSON or XML, and so on - and indeed hash collision attacks are not at all web-specific. The problem with having two dict implementations is that a caller would have to tell libraries that use dictionaries which implementation to use. So for example an argument would have to be passed to json.load[s] to specify whether the input was known-sane or potentially hostile. Any library that could ever use dictionaries to process untrusted input *or any library that used another library that did* would have to pass such a flag through, which would quickly get very unwieldy indeed... or else they'd have to just always use safedict, in which case we're in pretty much the same position as we are with changing dict anyway. -- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com/ gtalk:chat?jid=bobi...@gmail.com
Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682)
Caution, long review ahead. On 01/13/2012 12:43 PM, nick.coghlan wrote: http://hg.python.org/cpython/rev/d64ac9ab4cd0 changeset: 74356:d64ac9ab4cd0 user:Nick Coghlan ncogh...@gmail.com date:Fri Jan 13 21:43:40 2012 +1000 summary: Implement PEP 380 - 'yield from' (closes #11682) diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst --- a/Doc/reference/expressions.rst +++ b/Doc/reference/expressions.rst @@ -318,7 +318,7 @@ There should probably be a versionadded somewhere on this page. .. productionlist:: yield_atom: ( `yield_expression` ) - yield_expression: yield [`expression_list`] + yield_expression: yield [`expression_list` | from `expression`] The :keyword:`yield` expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a @@ -336,7 +336,10 @@ the generator's methods, the function can proceed exactly as if the :keyword:`yield` expression was just another external call. The value of the :keyword:`yield` expression after resuming depends on the method which resumed -the execution. +the execution. If :meth:`__next__` is used (typically via either a +:keyword:`for` or the :func:`next` builtin) then the result is :const:`None`, +otherwise, if :meth:`send` is used, then the result will be the value passed +in to that method. .. index:: single: coroutine @@ -346,12 +349,29 @@ where should the execution continue after it yields; the control is always transferred to the generator's caller. -The :keyword:`yield` statement is allowed in the :keyword:`try` clause of a +:keyword:`yield` expressions are allowed in the :keyword:`try` clause of a :keyword:`try` ... :keyword:`finally` construct. If the generator is not resumed before it is finalized (by reaching a zero reference count or by being garbage collected), the generator-iterator's :meth:`close` method will be called, allowing any pending :keyword:`finally` clauses to execute. +When ``yield from expression`` is used, it treats the supplied expression as +a subiterator. All values produced by that subiterator are passed directly +to the caller of the current generator's methods. Any values passed in with +:meth:`send` and any exceptions passed in with :meth:`throw` are passed to +the underlying iterator if it has the appropriate methods. If this is not the +case, then :meth:`send` will raise :exc:`AttributeError` or :exc:`TypeError`, +while :meth:`throw` will just raise the passed in exception immediately. + +When the underlying iterator is complete, the :attr:`~StopIteration.value` +attribute of the raised :exc:`StopIteration` instance becomes the value of +the yield expression. It can be either set explicitly when raising +:exc:`StopIteration`, or automatically when the sub-iterator is a generator +(by returning a value from the sub-generator). + +The parentheses can be omitted when the :keyword:`yield` expression is the +sole expression on the right hand side of an assignment statement. + .. index:: object: generator The following generator's methods can be used to control the execution of a @@ -444,6 +464,10 @@ The proposal to enhance the API and syntax of generators, making them usable as simple coroutines. + :pep:`0380` - Syntax for Delegating to a Subgenerator + The proposal to introduce the :token:`yield_from` syntax, making delegation + to sub-generators easy. + .. _primaries: PEP 3155: Qualified name for classes and functions == @@ -208,7 +224,6 @@ how they might be accessible from the global scope. Example with (non-bound) methods:: - class C: ... 
def meth(self): ... pass This looks like a spurious (and syntax-breaking) change. diff --git a/Grammar/Grammar b/Grammar/Grammar --- a/Grammar/Grammar +++ b/Grammar/Grammar @@ -121,7 +121,7 @@ |'**' test) # The reason that keywords are test nodes instead of NAME is that using NAME # results in an ambiguity. ast.c makes sure it's a NAME. -argument: test [comp_for] | test '=' test # Really [keyword '='] test +argument: (test) [comp_for] | test '=' test # Really [keyword '='] test This looks like a change without effect? diff --git a/Include/genobject.h b/Include/genobject.h --- a/Include/genobject.h +++ b/Include/genobject.h @@ -11,20 +11,20 @@ struct _frame; /* Avoid including frameobject.h */ typedef struct { - PyObject_HEAD - /* The gi_ prefix is intended to remind of generator-iterator. */ +PyObject_HEAD +/* The gi_ prefix is intended to remind of generator-iterator. */ - /* Note: gi_frame can be NULL if the generator is finished */ - struct _frame *gi_frame; +/* Note: gi_frame can be NULL if the generator is finished */ +struct _frame
Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
I think this may be because in Python 2, there is a coupling between stdin and stdout (in the C stdlib code) that flushes stdout when you read stdin. This doesn't seem to be required by the C std, but most implementations seem to do it. http://stackoverflow.com/questions/2123528/does-reading-from-stdin-flush-stdout I think it was a nice feature but I can see problems with it; apps that want this behavior ought to bite the bullet and flush stdout. On Fri, Jan 13, 2012 at 7:34 AM, anatoly techtonik techto...@gmail.com wrote: Posting to python-dev as it no longer relates to the idea of improving print(). sys.stdout.write() in Python 3 causes backwards incompatible behavior that breaks the recipe for unbuffered character reading from stdin on Linux - http://code.activestate.com/recipes/134892/ At first I thought that the problem is in the new print() function, but it appeared that the culprit is sys.stdout.write(). Attached is a test script which is a stripped down version of the recipe above. If executed with Python 2, you can see the prompt to press a key (even though output on Linux is buffered in Python 2). With Python 3, there is no prompt until you press a key. Is it a bug or intended behavior? What is the cause of this break? -- anatoly t. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 380 (yield from) is now Final
AWESOME!!! On Fri, Jan 13, 2012 at 4:14 AM, Nick Coghlan ncogh...@gmail.com wrote: I marked PEP 380 as Final this evening, after pushing the tested and documented implementation to hg.python.org: http://hg.python.org/cpython/rev/d64ac9ab4cd0 As the list of names in the NEWS and What's New entries suggests, it was quite a collaborative effort to get this one over the line, and that's without even listing all the people that offered helpful suggestions and comments along the way :) print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
On 2012-01-13, at 16:34 , anatoly techtonik wrote: Posting to python-dev as it no longer relates to the idea of improving print(). sys.stdout.write() in Python 3 causes backwards incompatible behavior that breaks the recipe for unbuffered character reading from stdin on Linux - http://code.activestate.com/recipes/134892/ At first I thought that the problem is in the new print() function, but it appeared that the culprit is sys.stdout.write(). Attached is a test script which is a stripped down version of the recipe above. If executed with Python 2, you can see the prompt to press a key (even though output on Linux is buffered in Python 2). With Python 3, there is no prompt until you press a key. Is it a bug or intended behavior? What is the cause of this break? FWIW this is not restricted to Linux (the same behavior change can be observed in OSX), and the script is overly complex; you can expose the change with 3 lines: import sys; sys.stdout.write('prompt'); sys.stdin.read(1) Python 2 displays the prompt and terminates execution on [Return]; Python 3 does not display anything until [Return] is pressed. Interestingly, the `-u` option is not sufficient to make the prompt appear in Python 3, the stream has to be flushed explicitly unless the input is ~16k characters (I guess that's an internal buffer size of some sort)
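For completeness, the minimal Python 3 fix is to flush the text layer explicitly before blocking on stdin, as suggested above:

    import sys

    sys.stdout.write('prompt')
    sys.stdout.flush()      # make the prompt visible before we block on input
    sys.stdin.read(1)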
Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
On Fri, 13 Jan 2012 17:00:57 +0100 Xavier Morel python-...@masklinn.net wrote: FWIW this is not restricted to Linux (the same behavior change can be observed in OSX), and the script is overly complex; you can expose the change with 3 lines: import sys; sys.stdout.write('prompt'); sys.stdin.read(1) Python 2 displays the prompt and terminates execution on [Return]; Python 3 does not display anything until [Return] is pressed. Interestingly, the `-u` option is not sufficient to make the prompt appear in Python 3, the stream has to be flushed explicitly unless the input is ~16k characters (I guess that's an internal buffer size of some sort) -u forces line-buffering mode for stdout/stderr, which is already the default if they are wired to an interactive device (isatty() returning True). But this was already rehashed on python-ideas and the bug tracker, and apparently Anatoly thought it would be a good idea to post on a third medium. Sigh. Regards Antoine.
Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
On 2012-01-13, at 17:19 , Antoine Pitrou wrote: -u forces line-buffering mode for stdout/stderr, which is already the default if they are wired to an interactive device (isatty() returning True). Oh, I had not noticed the documentation had changed in Python 3 (in Python 2 it stated that `-u` made IO unbuffered; in Python 3 it now states that only binary IO is unbuffered and text IO remains line-buffered). Sorry about that.
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2012-01-06 - 2012-01-13) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open3210 (+30) closed 22352 (+30) total 25562 (+60) Open issues with patches: 1384 Issues opened (42) == #6774: socket.shutdown documentation: on some platforms, closing one http://bugs.python.org/issue6774 reopened by neologix #13721: ssl.wrap_socket on a connected but failed connection succeeds http://bugs.python.org/issue13721 opened by kiilerix #13722: distributions can disable the encodings package http://bugs.python.org/issue13722 opened by pitrou #13723: Regular expressions: (?:X|\s+)*$ takes a long time http://bugs.python.org/issue13723 opened by ericp #13725: regrtest does not recognize -d flag http://bugs.python.org/issue13725 opened by etukia #13726: regrtest ambiguous -S flag http://bugs.python.org/issue13726 opened by etukia #13727: Accessor macros for PyDateTime_Delta members http://bugs.python.org/issue13727 opened by amaury.forgeotdarc #13728: Description of -m and -c cli options wrong? http://bugs.python.org/issue13728 opened by sandro.tosi #13730: Grammar mistake in Decimal documentation http://bugs.python.org/issue13730 opened by zacherates #13733: Change required to sysconfig.py for Python 2.7.2 on OS/2 http://bugs.python.org/issue13733 opened by Paul.Smedley #13734: Add a generic directory walker method to avoid symlink attacks http://bugs.python.org/issue13734 opened by hynek #13736: urllib.request.urlopen leaks exceptions from socket and httpli http://bugs.python.org/issue13736 opened by jmoy #13737: bugs.python.org/review's Django settings file DEBUG=True http://bugs.python.org/issue13737 opened by Bithin.A #13740: winsound.SND_NOWAIT ignored on modern Windows platforms http://bugs.python.org/issue13740 opened by bughunter2 #13742: Add a key parameter (like sorted) to heapq.merge http://bugs.python.org/issue13742 opened by ssapin #13743: xml.dom.minidom.Document class is not documented http://bugs.python.org/issue13743 opened by sandro.tosi #13744: raw byte strings are described in a confusing way http://bugs.python.org/issue13744 opened by barry #13745: configuring --with-dbmliborder=bdb doesn't build the gdbm exte http://bugs.python.org/issue13745 opened by doko #13746: ast.Tuple's have an inconsistent col_offset value http://bugs.python.org/issue13746 opened by bronikkk #13747: ssl_version documentation error http://bugs.python.org/issue13747 opened by Ben.Darnell #13749: socketserver can't stop http://bugs.python.org/issue13749 opened by teamnoir #13751: multiprocessing.pool hangs if any worker raises an Exception w http://bugs.python.org/issue13751 opened by fmitha #13752: add a str.casefold() method http://bugs.python.org/issue13752 opened by benjamin.peterson #13756: Python3.2.2 make fail on cygwin http://bugs.python.org/issue13756 opened by holgerd00d #13758: compile() should not encode 'filename' (at least on Windows) http://bugs.python.org/issue13758 opened by terry.reedy #13759: Python 3.2.2 Mac installer version doesn't accept multibyte ch http://bugs.python.org/issue13759 opened by ats #13760: ConfigParser exceptions are not pickleable http://bugs.python.org/issue13760 opened by fmitha #13761: Add flush keyword to print() http://bugs.python.org/issue13761 opened by georg.brandl #13763: rm obsolete reference in devguide http://bugs.python.org/issue13763 opened by tshepang #13764: Misc/build.sh is outdated... 
talks about svn http://bugs.python.org/issue13764 opened by tshepang #13766: explain the relationship between Lib/lib2to3/Grammar.txt and G http://bugs.python.org/issue13766 opened by tshepang #13768: Doc/tools/dailybuild.py available only on 2.7 branch http://bugs.python.org/issue13768 opened by tshepang #13769: json.dump(ensure_ascii=False) return str instead of unicode http://bugs.python.org/issue13769 opened by mmarkk #13770: python3 json: add ensure_ascii documentation http://bugs.python.org/issue13770 opened by mmarkk #13771: HTTPSConnection __init__ super implementation causes recursion http://bugs.python.org/issue13771 opened by michael.mulich #13772: listdir() doesn't work with non-trivial symlinks http://bugs.python.org/issue13772 opened by pitrou #13773: Support sqlite3 uri filenames http://bugs.python.org/issue13773 opened by poq #13774: json.loads raises a SystemError for invalid encoding on 2.7.2 http://bugs.python.org/issue13774 opened by Julian #13775: Access Denied message on symlink creation misleading for an ex http://bugs.python.org/issue13775 opened by santa4nt #13777: socket: communicating with Mac OS X KEXT controls http://bugs.python.org/issue13777 opened by goderbauer #13779: os.walk: bottom-up http://bugs.python.org/issue13779 opened by patrick.vrijlandt #13780: make YieldFrom its own node
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum gu...@python.org wrote: How pathological the data needs to be before the collision counter triggers? I'd expect *very* pathological. How pathological do you consider the set {1 << n for n in range(2000)} to be? What about the set: ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} ? The 2000 elements of the latter set have only 61 distinct hash values on a 64-bit machine, so there will be over 2000 total collisions involved in creating this set (though admittedly only around 30 collisions per hash value). -- Mark
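A hedged aside on why those particular sets land on so few hash values: on a 64-bit CPython build (3.2 and later expose this as sys.hash_info.modulus), numeric hashing reduces values modulo the Mersenne prime 2**61 - 1, so the hashes of powers of two repeat with period 61. A quick sketch:

    import sys

    assert sys.hash_info.modulus == 2**61 - 1      # 64-bit build assumption

    # 2**n mod (2**61 - 1) == 2**(n % 61), so powers of two give only 61
    # distinct hash values no matter how many of them you generate:
    print(len({hash(2**n) for n in range(2000)}))  # 61
    print(hash(1 << 61), hash(1 << 122))           # both 1: same bucket as hash(1)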
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 9:08 AM, Mark Dickinson dicki...@gmail.com wrote: On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum gu...@python.org wrote: How pathological the data needs to be before the collision counter triggers? I'd expect *very* pathological. How pathological do you consider the set {1 << n for n in range(2000)} to be? What about the set: ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} ? The 2000 elements of the latter set have only 61 distinct hash values on a 64-bit machine, so there will be over 2000 total collisions involved in creating this set (though admittedly only around 30 collisions per hash value). Hm... So how does the collision counting work for this case? -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 380 (yield from) is now Final
On Fri, 13 Jan 2012 22:14:43 +1000 Nick Coghlan ncogh...@gmail.com wrote: I marked PEP 380 as Final this evening, after pushing the tested and documented implementation to hg.python.org: http://hg.python.org/cpython/rev/d64ac9ab4cd0 I don't know if this is supposed to work, but the exception looks wrong: >>> def g(): yield from () ... >>> f = list(g()) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in g SystemError: error return without exception set Also, the checkin lacked a bytecode magic number bump. It is not really a problem since I've just bumped it anyway. Regards Antoine.
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum gu...@python.org wrote: How pathological do you consider the set {1 << n for n in range(2000)} to be? What about the set: ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} ? The 2000 elements of the latter set have only 61 distinct hash values on a 64-bit machine, so there will be over 2000 total collisions involved in creating this set (though admittedly only around 30 collisions per hash value). Hm... So how does the collision counting work for this case? Ah, my bad. It looks like the ieee754_powers_of_two is safe---IIUC, it's the number of collisions involved in a single key-set operation that's limited. So a dictionary with keys {1 << n for n in range(2000)} is fine, but a dictionary with keys {1 << (61*n) for n in range(2000)} is not: >>> {1 << (n*61): True for n in range(2000)} Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <dictcomp> KeyError: 'too many hash collisions' [67961 refs] I'd still not consider this particularly pathological, though. -- Mark
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson dicki...@gmail.com wrote: On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum gu...@python.org wrote: How pathological do you consider the set {1 << n for n in range(2000)} to be? What about the set: ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} ? The 2000 elements of the latter set have only 61 distinct hash values on a 64-bit machine, so there will be over 2000 total collisions involved in creating this set (though admittedly only around 30 collisions per hash value). Hm... So how does the collision counting work for this case? Ah, my bad. It looks like the ieee754_powers_of_two is safe---IIUC, it's the number of collisions involved in a single key-set operation that's limited. So a dictionary with keys {1 << n for n in range(2000)} is fine, but a dictionary with keys {1 << (61*n) for n in range(2000)} is not: >>> {1 << (n*61): True for n in range(2000)} Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <dictcomp> KeyError: 'too many hash collisions' [67961 refs] I'd still not consider this particularly pathological, though. Really? Even though you came up with it specifically to prove me wrong? -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 380 (yield from) is now Final
On 1/13/2012 7:14 AM, Nick Coghlan wrote: print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) I pulled, rebuilt, and it indeed works (on Win 7). I just remembered that Tim Peters somewhere (generator.c?) left a large comment with examples of recursive generators, such as knight's tours. Could these be rewritten with (and benefit from) 'yield from'? (It occurs to me his stuff might be worth exposing in an iterator/generator how-to.) -- Terry Jan Reedy
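As a toy illustration of the kind of rewrite being asked about (a hypothetical example, not Tim's actual code), a recursive traversal that re-yields values from the recursive call collapses into a single 'yield from' per child:

    class Node:
        def __init__(self, value, *children):
            self.value = value
            self.children = children

    def walk(node):
        # Pre-PEP 380 style: loop and re-yield each recursively produced value.
        yield node.value
        for child in node.children:
            for value in walk(child):
                yield value

    def walk_delegating(node):
        # With 'yield from', delegation is a single statement.
        yield node.value
        for child in node.children:
            yield from walk_delegating(child)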
Re: [Python-Dev] Python as a Metro-style App
Dino wrote: Martin wrote: See the start of the thread: I tried to create a WinRT Component DLL, and that failed, as VS would refuse to compile any C file in such a project. Not sure whether this is triggered by defining WINAPI_FAMILY=2, or any other compiler setting. I'd really love to use WINAPI_FAMILY=2, as compiler errors are much easier to fix than verifier errors. ... I'm going to ping some people on the windows team and see if the app container bit is or will be necessary for DLLs. I heard back from the Windows team and they are going to require the app container bit to be set on all PE files (although they don't currently enforce it). I was able to compile a simple .c file and pass /link /appcontainer and that worked, so I'm going to try and figure out if there's some way to get the .vcxproj to build a working command line that includes that. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
2012/1/13 Guido van Rossum gu...@python.org: Really? Even though you came up with it specifically to prove me wrong? Coming up with a counterexample now invalidates it? -- Regards, Benjamin
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing. Breaking due to variable hashing is deterministic: you notice it as soon as you upgrade (and then you use PYTHONHASHSEED to disable variable hashing). That seems better than unpredictable breaking when some legitimate collision chain happens. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
- Glenn Linderman proposes to fix the vulnerability by adding a new safe dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the hash of a key. We could mix Marc's collision counter with the SafeDict idea (being able to use a different secret for each dict): use hash(key, secret) (simple example: hash(secret+key)) instead of hash(key) in dict (and set), and change the secret if we have more than N collisions. But it would slow down all dict lookups (dict creation, get, set, del, ...). And getting new random data can also be slow. SafeDict and hash(secret+key) lose the benefit of the cached hash result. Because the hash result depends on an argument, we cannot cache the result anymore, and we have to recompute the hash for each lookup (even if you look up the same key twice or more). Victor
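For illustration, a Python 3 sketch of the per-dict-secret idea (hypothetical code, not Glenn's SafeDict.py); the comment in __hash__ marks exactly where the cached-hash benefit is lost:

    import os

    class _SaltedKey:
        __slots__ = ('secret', 'key')
        def __init__(self, secret, key):
            self.secret, self.key = secret, key
        def __hash__(self):
            # Recomputed on every lookup; nothing like str's cached hash here.
            return hash(self.secret + self.key)
        def __eq__(self, other):
            return self.key == other.key

    class SafeDict(dict):
        """Toy dict accepting only str keys, salted with a per-dict secret."""
        def __init__(self):
            super().__init__()
            self._secret = os.urandom(8).decode('latin-1')
        def __setitem__(self, key, value):
            super().__setitem__(_SaltedKey(self._secret, key), value)
        def __getitem__(self, key):
            return super().__getitem__(_SaltedKey(self._secret, key))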
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.net wrote: On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing. Breaking due to variable hashing is deterministic: you notice it as soon as you upgrade (and then you use PYTHONHASHSEED to disable variable hashing). That seems better than unpredictable breaking when some legitimate collision chain happens. Fair enough. But I'm now uncomfortable with turning this on for bugfix releases. I'm fine with making this the default in 3.3, just not in 3.2, 3.1 or 2.x -- it will break too much code and organizations will have to roll back the release or do extensive testing before installing a bugfix release -- exactly what we *don't* want for those. FWIW, I don't believe in the SafeDict solution -- you never know which dicts you have to change. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum gu...@python.org wrote: On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.net wrote: On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing. Breaking due to variable hashing is deterministic: you notice it as soon as you upgrade (and then you use PYTHONHASHSEED to disable variable hashing). That seems better than unpredictable breaking when some legitimate collision chain happens. Fair enough. But I'm now uncomfortable with turning this on for bugfix releases. I'm fine with making this the default in 3.3, just not in 3.2, 3.1 or 2.x -- it will break too much code and organizations will have to roll back the release or do extensive testing before installing a bugfix release -- exactly what we *don't* want for those. FWIW, I don't believe in the SafeDict solution -- you never know which dicts you have to change. Agreed. Of the three options Victor listed only one is good. I don't like *SafeDict*. *-1*. It puts the onus on the coder to always get everything right with regards to data that came from outside the process never ending up hashed in a non-safe dict or set *anywhere*. Safe needs to be the default option for all hash tables. I don't like the *too many hash collisions* exception. *-1*. It provides non-deterministic application behavior for data driven applications with no way for them to predict when it'll happen or where and prepare for it. It may work in practice for many applications but is simply odd behavior. I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be back ported to any Python version. It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We *will* provide a flag and/or environment variable that can be set to turn the feature off at their own peril which they can use in their test harnesses that are stupid enough to use doctests with order dependencies. This approach worked fine for Perl 9 years ago. https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 -gps
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On 1/13/2012 5:35 PM, Victor Stinner wrote: - Glenn Linderman proposes to fix the vulnerability by adding a new safe dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the hash of a key. We could mix Marc's collision counter with the SafeDict idea (being able to use a different secret for each dict): use hash(key, secret) (simple example: hash(secret+key)) instead of hash(key) in dict (and set), and change the secret if we have more than N collisions. But it would slow down all dict lookups (dict creation, get, set, del, ...). And getting new random data can also be slow. SafeDict and hash(secret+key) lose the benefit of the cached hash result. Because the hash result depends on an argument, we cannot cache the result anymore, and we have to recompute the hash for each lookup (even if you look up the same key twice or more). Victor So integrating SafeDict into dict so it could be automatically converted would mean changing the data structures underneath dict. Given that, a technique for hash caching could be created, that isn't quite as good as the one in place, but may be less expensive than not caching the hashes. It would also take more space, a second dict, internally, as well as the secret. So once the collision counter reaches some threshold (since there would be a functional fallback, it could be much lower than 1000), the secret is obtained, and the keys are rehashed using hash(secret+key). Now when lookups occur, the object id of the key and the hash of the key are used as the index and hash(secret+key) is stored as a cached value. This would only benefit lookups by the same object; other objects with the same key value would be recalculated (at least the first time). Some limit on the number of cached values would probably be appropriate. This would add complexity, of course, in trying to save time. An alternate solution would be to convert a dict to a tree once the number of collisions produces poor performance. Converting to a tree would result in O(log N) instead of O(1) lookup performance, but that is better than the degenerate case of O(N) which is produced by the excessive number of collisions resulting from an attack. This would require new tree code to be included in the core, of course, probably a red-black tree, which stays balanced. In either of these cases, the conversion is expensive, because a collision threshold must first be reached to determine the need for conversion, so the dict could already contain lots of data. If it were too expensive, the attack could still be effective. Another solution would be to change the collision code, so that colliding keys don't produce O(N) behavior, but some other behavior. Each colliding entry could convert that entry to a tree of entries, perhaps. This would require no conversion of bad dicts, and an attack could at worst convert O(1) performance to O(log N). Clearly these ideas are more complex than adding randomization, but adding randomization doesn't seem to produce immunity from attack, when data about the randomness is leaked.
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Clearly these ideas are more complex than adding randomization, but adding randomization doesn't seem to produce immunity from attack, when data about the randomness is leaked. Which will not normally happen. I'm firmly in the camp that believes the random seed can be probed and determined by creatively injecting values and measuring timing of things. But doing that is difficult and time and bandwidth intensive, so the per-process random hash seed is good enough. There's another elephant in the room here: if you want to avoid this attack, use a 64-bit Python build, as it uses 64-bit hash values that are significantly more difficult to force a collision on. -gps
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
btw, Tim's commit message on this one is amusingly relevant. :) http://hg.python.org/cpython/diff/8d2f2cb9/Objects/dictobject.c On Fri, Jan 13, 2012 at 6:25 PM, Gregory P. Smith g...@krypto.org wrote: Clearly these ideas are more complex than adding randomization, but adding randomization doesn't seem to produce immunity from attack, when data about the randomness is leaked. Which will not normally happen. I'm firmly in the camp that believes the random seed can be probed and determined by creatively injecting values and measuring timing of things. But doing that is difficult and time and bandwidth intensive, so the per-process random hash seed is good enough. There's another elephant in the room here: if you want to avoid this attack, use a 64-bit Python build, as it uses 64-bit hash values that are significantly more difficult to force a collision on. -gps
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On 14/01/12 12:58, Gregory P. Smith wrote: I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be back ported to any Python version. It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. For the record: steve@runes:~$ python -c "print(hash('spam ham'))" -376510515 steve@runes:~$ jython -c "print(hash('spam ham'))" 2054637885 So it is already the case that Python code that assumes stable hashing is broken. For what it's worth, I'm not convinced that we should be overly concerned by "poor saps" (Guido's words) who rely on accidents of implementation regarding hash. We shouldn't break their code unless we have a good reason, but this strikes me as a good reason. The documentation for hash certainly makes no promise about stability, and relying on it strikes me as about as sensible as relying on the stability of error messages. I'm also not convinced that the option to raise an exception after 1000 collisions actually solves the problem. That relies on the application being re-written to catch the exception and recover from it (how?). Otherwise, all it does is change the attack vector from "cause an indefinite number of hash collisions" to "cause 999 hash collisions followed by crashing the application with an exception", which doesn't strike me as much of an improvement. +1 on random seeding. Default to on in 3.3+ and default to off in older versions, which allows people to avoid breaking their code until they're ready for it to be broken. -- Steven
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith g...@krypto.org wrote: On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum gu...@python.org wrote: On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.net wrote: On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing. Breaking due to variable hashing is deterministic: you notice it as soon as you upgrade (and then you use PYTHONHASHSEED to disable variable hashing). That seems better than unpredictable breaking when some legitimate collision chain happens. Fair enough. But I'm now uncomfortable with turning this on for bugfix releases. I'm fine with making this the default in 3.3, just not in 3.2, 3.1 or 2.x -- it will break too much code and organizations will have to roll back the release or do extensive testing before installing a bugfix release -- exactly what we *don't* want for those. FWIW, I don't believe in the SafeDict solution -- you never know which dicts you have to change. Agreed. Of the three options Victor listed only one is good. I don't like *SafeDict*. *-1*. It puts the onus on the coder to always get everything right with regards to data that came from outside the process never ending up hashed in a non-safe dict or set *anywhere*. Safe needs to be the default option for all hash tables. I don't like the *too many hash collisions* exception. *-1*. It provides non-deterministic application behavior for data driven applications with no way for them to predict when it'll happen or where and prepare for it. It may work in practice for many applications but is simply odd behavior. I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be back ported to any Python version. It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We *will* provide a flag and/or environment variable that can be set to turn the feature off at their own peril which they can use in their test harnesses that are stupid enough to use doctests with order dependencies. What an implementation looks like: http://pastebin.com/9ydETTag some stuff to be filled in, but this is all that is really required. add logic to allow a particular seed to be specified or forced to 0 from the command line or environment. add the logic to grab random bytes. add the autoconf glue to disable it. done. -gps This approach worked fine for Perl 9 years ago. https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 -gps
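In case the pastebin snippet is no longer reachable, here is a rough sketch of what randomly seeding the hash means (an illustration of the idea only, not Gregory's actual code): mix a process-wide random secret into the existing multiply-and-XOR string hash.

    import os

    _HASH_SEED = int.from_bytes(os.urandom(8), 'little')   # chosen once per process

    def seeded_hash(s, mask=0xFFFFFFFFFFFFFFFF):
        # Same multiply-and-XOR core as the existing string hash, but the
        # random seed is mixed in at both ends, so collisions precomputed
        # against one process are useless against another.
        h = _HASH_SEED & mask
        for c in s:
            h = ((1000003 * h) ^ ord(c)) & mask
        return h ^ len(s) ^ _HASH_SEED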
Re: [Python-Dev] Sphinx version for Python 2.x docs
Hi Sandro, Thanks for getting the ball rolling on this. One style for markup, one Sphinx version to code our extensions against and one location for the documenting guidelines will make our work a bit easier. During the build process, there are some warnings that I can understand: I assume you mean “can’t”, as you later ask how to fix them. As a general rule, they’re only warnings, so they don’t break the build, only some links or stylings, so I think it’s okay to ignore them *right now*. Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal That’s a mistake I did in cefe4f38fa0e. This sentence should be removed. Doc/library/stdtypes.rst:2372: WARNING: more than one target found for cross-reference u'next': Need to use :meth:`.next` to let Sphinx find the right target (more info on request :) Doc/library/sys.rst:651: WARNING: unknown keyword: None Should use ``None``. Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not I don’t know if these should work (i.e. create a link to the appropriate language reference section) or abuse the markup (there are “not” and “in” keywords, but no “not in” keyword → use ``not in``). I’d say ignore them. Cheers ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
What an implementation looks like: http://pastebin.com/9ydETTag some stuff to be filled in, but this is all that is really required. I think this statement (and the patch) is wrong. You also need to change the byte string hashing, at least for 2.x. This I consider the biggest flaw in that approach - other people may have written string-like objects which continue to compare equal to a string but now hash different. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith g...@krypto.org wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We *will* provide a flag and/or environment variable that can be set to turn the feature off at their own peril which they can use in their test harnesses that are stupid enough to use doctests with order dependencies. No, that is not how we usually take compatibility between bugfix releases. "Your code is already broken" is not an argument to break forcefully what worked (even if by happenstance) before. The difference between CPython and Jython (or between different CPython feature releases) also isn't relevant -- historically we have often bent over backwards to avoid changing behavior that was technically undefined, if we believed it would affect a significant fraction of users. I don't think anyone doubts that this will break lots of code (at least, the arguments I've heard have been "their code is broken", not "nobody does that"). This approach worked fine for Perl 9 years ago. https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 I don't know what the Perl attitude about breaking undefined behavior between micro versions was at the time. But ours is pretty clear -- don't do it. -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] [Python-checkins] cpython: add test, which was missing from d64ac9ab4cd0
On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson python-check...@python.org wrote: http://hg.python.org/cpython/rev/be85914b611c changeset: 74363:be85914b611c parent: 74361:609482c6710e user: Benjamin Peterson benja...@python.org date: Fri Jan 13 14:39:38 2012 -0500 summary: add test, which was missing from d64ac9ab4cd0 Ah, that's where that came from, thanks. I still haven't fully trained myself to use hg import instead of patch, which would avoid precisely this kind of error :P Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On 1/13/2012 8:58 PM, Gregory P. Smith wrote: It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. Given that the doc says "Return the hash value of the object", I do not think we should be so hard-nosed. The above clearly implies that there is such a thing as *the* Python hash value for an object. And indeed, that has been true across many versions. If we had written "Return a hash value for the object, which can vary from run to run", the case would be different. -- Terry Jan Reedy
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum gu...@python.org wrote: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having 1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing. Python's dicts are designed to avoid hash conflicts by resizing and keeping the available slots bountiful. 1000 conflicts sounds like a number that couldn't be hit accidentally unless you had a single dict using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're good). The hashes also look to exploit cache locality but that is very unlikely to get one thousand conflicts by chance. If you get that many there is an attack. This is depending on how the counting is done (I didn't look at MAL's patch), and assuming that increasing the hash table size will generally reduce collisions if items collide but their hashes are different. The patch counts conflicts on an individual insert and not lifetime conflicts. Looks sane to me. That said, even with collision counting I'd like a way to disable it without changing the code, e.g. a flag or environment variable. Agreed. Paranoid people can turn the behavior off and if it ever were to become a problem in practice we could point people to a solution. -Jack ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
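For illustration, a toy model of the per-insert counting described above (a sketch of the idea only, not MAL's actual patch; a real dict also resizes and uses perturbed probing):

    class LimitedTable:
        """Toy open-addressing table with a per-insert collision limit."""
        MAX_COLLISIONS = 1000

        def __init__(self, size=8):
            self.slots = [None] * size          # size kept a power of two

        def insert(self, key, value):
            mask = len(self.slots) - 1
            i = hash(key) & mask
            collisions = 0
            while self.slots[i] is not None and self.slots[i][0] != key:
                collisions += 1                 # counted per insert, not per table lifetime
                if collisions > self.MAX_COLLISIONS:
                    raise KeyError('too many hash collisions')
                i = (i + 1) & mask              # simplified linear probing
            self.slots[i] = (key, value)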
Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682)
On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl g.bra...@gmx.net wrote: On 01/13/2012 12:43 PM, nick.coghlan wrote: diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst There should probably be a versionadded somewhere on this page. Good catch, I added versionchanged notes to this page, simple_stmts and the StopIteration entry in the library reference. PEP 3155: Qualified name for classes and functions == This looks like a spurious (and syntax-breaking) change. Yeah, it was an error I introduced last time I merged from default. Fixed. diff --git a/Grammar/Grammar b/Grammar/Grammar -argument: test [comp_for] | test '=' test # Really [keyword '='] test +argument: (test) [comp_for] | test '=' test # Really [keyword '='] test This looks like a change without effect? Fixed. It was a lingering after-effect of Greg's original patch (which also modified the function call syntax to allow yield from expressions with extra parens). I reverted the change to the function call syntax, but forgot to ditch the added parens while doing so. diff --git a/Include/genobject.h b/Include/genobject.h - /* List of weak reference. */ - PyObject *gi_weakreflist; + /* List of weak reference. */ + PyObject *gi_weakreflist; } PyGenObject; While these change tabs into spaces, it should be 4 spaces, not 8. Fixed. +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **); Does this API need to be public? If yes, it needs to be documented. Hmm, good point - that one needs a bit of thought, so I've put it on the tracker: http://bugs.python.org/issue13783 (that issue also covers your comments regarding the docstring for this function and whether or not we even need the StopIteration instance creation API) -#define CALL_FUNCTION 131 /* #args + (#kwargs8) */ -#define MAKE_FUNCTION 132 /* #defaults + #kwdefaults8 + #annotations16 */ -#define BUILD_SLICE 133 /* Number of items */ +#define CALL_FUNCTION 131 /* #args + (#kwargs8) */ +#define MAKE_FUNCTION 132 /* #defaults + #kwdefaults8 + #annotations16 */ +#define BUILD_SLICE 133 /* Number of items */ Not sure putting these and all the other cosmetic changes into an already big patch is such a good idea... I agree, but it's one of the challenges of a long-lived branch like the PEP 380 one (I believe some of these cosmetic changes started life in Greg's original patch and separating them out would have been quite a pain). Anyone that wants to see the gory details of the branch history can take a look at my bitbucket repo: https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29 diff --git a/Objects/abstract.c b/Objects/abstract.c --- a/Objects/abstract.c +++ b/Objects/abstract.c @@ -2267,7 +2267,6 @@ func = PyObject_GetAttrString(o, name); if (func == NULL) { - PyErr_SetString(PyExc_AttributeError, name); return 0; } @@ -2311,7 +2310,6 @@ func = PyObject_GetAttrString(o, name); if (func == NULL) { - PyErr_SetString(PyExc_AttributeError, name); return 0; } va_start(va, format); These two changes also look suspiciously unrelated? IIRC, I removed those lines while working on the patch because the message they produce (just the attribute name) is worse than the one produced by the call to PyObject_GetAttrString (which also includes the type of the object being accessed). Leaving the original exceptions alone helped me track down some failures I was getting at the time. 
I've now made the various CallMethod helper APIs in abstract.c (1 public, 3
private) consistently leave the GetAttr exception alone and added an explicit
C API note to NEWS. (Vaguely related tangent: the new code added by the patch
probably has a few parts that could benefit from the new GetAttrId private
API)

>> diff --git a/Objects/genobject.c b/Objects/genobject.c
>> +    } else {
>> +        PyObject *e = PyStopIteration_Create(result);
>> +        if (e != NULL) {
>> +            PyErr_SetObject(PyExc_StopIteration, e);
>> +            Py_DECREF(e);
>> +        }
>
> Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here anyway?

I think you're right - so noted in the tracker issue about the C API
additions.

Thanks for the thorough review, a fresh set of eyes is very helpful :)

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
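For readers following the C-level review, the StopIteration plumbing being
discussed is what carries a generator's return value back to a "yield from"
expression. The snippet below is standard PEP 380 behaviour on Python 3.3+,
included only to illustrate what the code in genobject.c implements; it is
not part of the patch itself.

    def inner():
        yield 1
        yield 2
        return "done"                 # becomes StopIteration.value when inner finishes

    def outer():
        result = yield from inner()   # yields 1 and 2, then receives "done"
        print("inner returned:", result)

    for item in outer():
        print("yielded:", item)
    # yielded: 1
    # yielded: 2
    # inner returned: done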
Re: [Python-Dev] Status of the fix for the hash collision vulnerability
On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich jackd...@gmail.com wrote:
>> This depends on how the counting is done (I didn't look at MAL's patch),
>> and assuming that increasing the hash table size will generally reduce
>> collisions if items collide but their hashes are different.
>
> The patch counts conflicts on an individual insert and not lifetime
> conflicts. Looks sane to me.

Having a hard limit on the worst-case behaviour certainly sounds like an
attractive prospect. And there's nothing to worry about in terms of secrecy
or sufficient randomness - by default, attackers cannot generate more than
1000 hash collisions in one lookup, period.

>> That said, even with collision counting I'd like a way to disable it
>> without changing the code, e.g. a flag or environment variable.
>
> Agreed. Paranoid people can turn the behavior off and if it ever were to
> become a problem in practice we could point people to a solution.

Does MAL's patch allow the limit to be set on a per-dict basis (including
setting it to None to disable collision limiting completely)? If people have
data sets that need to tolerate that kind of collision level (and haven't
already decided to move to a data structure other than the builtin dict),
then it may make sense to allow them to remove the limit when using trusted
input.

For maintenance versions though, it would definitely need to be possible to
switch it off without touching the code.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
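No such per-dict knob exists in CPython, so the following is a purely
hypothetical sketch of the interface being asked about: a process-wide
default limit that an individual dictionary could override or disable (None)
for trusted input. It only illustrates the shape of the API; it does not and
cannot enforce anything, since the counting would live in the C dict
implementation.

    DEFAULT_COLLISION_LIMIT = 1000       # imagined process-wide default

    class LimitedDict(dict):
        """Hypothetical dict variant exposing a per-instance collision limit."""

        def __init__(self, *args, collision_limit=DEFAULT_COLLISION_LIMIT, **kwargs):
            # None would mean "no limit", for data sets known to be trusted
            self.collision_limit = collision_limit
            super().__init__(*args, **kwargs)

    form_data = LimitedDict()                        # untrusted input, default cap
    trusted_index = LimitedDict(collision_limit=None)  # opts out for known-good data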
Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682)
On 01/14/2012 07:53 AM, Nick Coghlan wrote:
>>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);
>>
>> Does this API need to be public? If yes, it needs to be documented.
>
> Hmm, good point - that one needs a bit of thought, so I've put it on the
> tracker: http://bugs.python.org/issue13783
> (that issue also covers your comments regarding the docstring for this
> function and whether or not we even need the StopIteration instance
> creation API)

Great.

>>> -#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
>>> -#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> -#define BUILD_SLICE     133     /* Number of items */
>>> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
>>> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> +#define BUILD_SLICE     133     /* Number of items */
>>
>> Not sure putting these and all the other cosmetic changes into an already
>> big patch is such a good idea...
>
> I agree, but it's one of the challenges of a long-lived branch like the PEP
> 380 one (I believe some of these cosmetic changes started life in Greg's
> original patch and separating them out would have been quite a pain).
> Anyone that wants to see the gory details of the branch history can take a
> look at my bitbucket repo:
> https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29

I see. I hadn't followed the development of PEP 380 closely before. In any
case, it is probably a good idea to mention this branch URL in the commit
message in case it is meant to be kept permanently (it would also be possible
to put only that branch of your sandbox into another clone at hg.python.org).

>>> diff --git a/Objects/abstract.c b/Objects/abstract.c
>>> --- a/Objects/abstract.c
>>> +++ b/Objects/abstract.c
>>> @@ -2267,7 +2267,6 @@
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>> @@ -2311,7 +2310,6 @@
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>>      va_start(va, format);
>>
>> These two changes also look suspiciously unrelated?
>
> IIRC, I removed those lines while working on the patch because the message
> they produce (just the attribute name) is worse than the one produced by
> the call to PyObject_GetAttrString (which also includes the type of the
> object being accessed). Leaving the original exceptions alone helped me
> track down some failures I was getting at the time.

I agree that it's useful.

> I've now made the various CallMethod helper APIs in abstract.c (1 public, 3
> private) consistently leave the GetAttr exception alone and added an
> explicit C API note to NEWS. (Vaguely related tangent: the new code added
> by the patch probably has a few parts that could benefit from the new
> GetAttrId private API)

Maybe another candidate for an issue, so that we don't forget?

cheers,
Georg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
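The error-message point made above can also be seen from pure Python: a
failed attribute lookup already reports the type involved, which is more
informative than re-raising with only the attribute name. The class and
method names below are made up for illustration, and the exact wording is
CPython's and may vary slightly between versions.

    class Widget:
        pass

    try:
        Widget().render()            # no such method
    except AttributeError as exc:
        print(exc)                   # e.g. 'Widget' object has no attribute 'render'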