Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Frank Sievertsen

On 13.01.2012 02:24, Victor Stinner wrote:

My patch doesn't fix the DoS, it just makes the attack more complex.
The attacker cannot pregenerate data for an attack: (s)he first has to
compute the hash secret, and then compute hash collisions using the
secret. The hash secret is at least 64 bits long (128 bits on a 64-bit
system). So I hope that computing collisions requires a lot of CPU
time (is slow) to make the attack ineffective with today's computers.
Unfortunately it requires only a few seconds to compute enough 32-bit 
collisions on one core, with no precomputed data.  I'm sure it's possible 
to get this below one second.


In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) 
^ suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) 
is possible.


So the question is: How difficult is it to guess the seed?

Frank


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Victor Stinner
 Unfortunately it requires only a few seconds to compute enough 32bit
 collisions on one core with no precomputed data.

Are you running the hash function backward to generate strings with
the same value, or are you trying something more like brute force?

And how do you get the hash secret? You need it to run an attack.

 In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) ^
 suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) is
 possible.

My change also adds a prefix (both a prefix and a suffix). I don't know
if it changes anything for generating collisions.

 So the question is: How difficult is it to guess the seed?

I wrote some remarks about that in the issue. For example:

(hash("\0")^1) ^ (hash("\0\0")^2) gives ((prefix * 103) & HASH_MASK)
^ ((prefix * 103**2) & HASH_MASK)
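
The identity can be checked with a toy 32-bit model of such a hash (the
multiplier 103 and the prefix/suffix placement here are illustrative
stand-ins, not CPython's exact scheme):

    M, MASK = 103, 0xFFFFFFFF

    def toy_hash(s, prefix, suffix):
        x = prefix
        for c in s:
            x = ((x * M) ^ ord(c)) & MASK
        return x ^ len(s) ^ suffix

    p, sfx = 0x1BADB002, 0x5EED5EED
    lhs = (toy_hash("\0", p, sfx) ^ 1) ^ (toy_hash("\0\0", p, sfx) ^ 2)
    rhs = ((p * M) & MASK) ^ ((p * M**2) & MASK)
    # XORing with the lengths (1 and 2) cancels the length terms, and
    # the two suffix terms cancel each other, leaving prefix-only bits.
    assert lhs == rhs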

I suppose that in practice you don't directly get the full output of
hash(str), but hash(str) & DICT_MASK, where DICT_MASK is the size of the
internal dict array minus 1. For example, for a dictionary of 65,536
items, the mask is 0x1ffff and so cannot give you more than 17 bits of
the hash(str) output. I still don't know how difficult it is to
retrieve hash(str) bits from repr(dict).

Victor


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Lennart Regebro
On Fri, Jan 13, 2012 at 02:24, Victor Stinner
victor.stin...@haypocalc.com wrote:
 - Glenn Linderman proposes to fix the vulnerability by adding a new
 safe dict type (only accepting string keys). His proof-of-concept
 (SafeDict.py) uses a secret of 64 random bits and uses it to compute
 the hash of a key.

This is my preferred solution. The vulnerability is basically only in
the dictionary where you keep the form data you get from a request. This
solves it easily and nicely. It can also be a separate module
installable for Python 2, which many web frameworks still use, so it
is practically implementable now, and not in a couple of years.

Then again, nothing prevents us from having both this, *and* one of
the other solutions.  :-)

//Lennart


[Python-Dev] PEP 380 (yield from) is now Final

2012-01-13 Thread Nick Coghlan
I marked PEP 380 as Final this evening, after pushing the tested and
documented implementation to hg.python.org:
http://hg.python.org/cpython/rev/d64ac9ab4cd0

As the list of names in the NEWS and What's New entries suggests, it
was quite a collaborative effort to get this one over the line, and
that's without even listing all the people that offered helpful
suggestions and comments along the way :)

print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Frank Sievertsen



Unfortunately it requires only a few seconds to compute enough 32bit
collisions on one core with no precomputed data.

Are you running the hash function backward to generate strings with
the same value, or you are more trying something like brute forcing?


If you brute-force toward one specific target value, you'll find only
one good string every 4 billion tries. That's why you first blow up
your target:


You start backward from an arbitrary target value. You brute-force back
through 3 characters, for example; this gives you 16 million
intermediate values which you know will end up at your target value.


Those 16 million values are now a huge target for brute-forcing forward:
every 256 tries you'll hit one of these values.
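
A rough sketch of that meet-in-the-middle idea, using a simplified
invertible 32-bit multiply-XOR step instead of CPython's exact string
hash (all names and constants are illustrative; the modular inverse
needs Python 3.8+):

    M, MASK = 1000003, 0xFFFFFFFF
    MINV = pow(M, -1, 2**32)        # M is odd, hence invertible mod 2**32

    def step(x, c):                 # absorb one byte, going forward
        return ((x * M) ^ c) & MASK

    def unstep(x, c):               # exact inverse of step()
        return ((x ^ c) * MINV) & MASK

    target = 0xDEADBEEF

    # 1) Walk backward from the target through every 2-byte tail: 65,536
    #    intermediate states.  (The 3-character version described above
    #    gives ~16 million states; a real attack would store them in a
    #    compact bitset rather than a Python dict.)
    middles = {}
    for c1 in range(256):
        x1 = unstep(target, c1)
        for c2 in range(256):
            middles[unstep(x1, c2)] = bytes((c2, c1))

    # 2) Brute-force 3-byte heads forward; the middles cover 2**16 of
    #    the 2**32 states, so about one head in 65,536 lands in the set.
    found = []
    for h in range(2**24):
        head = bytes((h & 0xFF, (h >> 8) & 0xFF, (h >> 16) & 0xFF))
        x = 0
        for b in head:
            x = step(x, b)
        if x in middles:
            found.append(head + middles[x])  # 5 bytes reaching `target`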



And how do you get the hash secret? You need it to run an attack.


I don't know. This was meant as an answer to the quoted text "So I hope 
that computing collisions requires a lot of CPU time (is slow) to make 
the attack ineffective with today's computers".


What I wanted to say is: the security relies on the fact that the 
attacker can't guess the prefix, not on the values being impossible to 
precompute or on the collisions taking hours or days to compute. If the 
prefix leaks out of the application, then the rest is trivial and done 
in a few seconds. The suffix is not important for collision prevention, 
but it will probably make it much harder to guess the prefix.


I don't know an effective way to get the prefix either (if the 
application doesn't leak full hash(X) values).


Frank


Re: [Python-Dev] PEP 380 (yield from) is now Final

2012-01-13 Thread Matt Joiner
Great work Nick, I've been looking forward to this one. Thanks all for
putting the effort in.

On Fri, Jan 13, 2012 at 11:14 PM, Nick Coghlan ncogh...@gmail.com wrote:
 I marked PEP 380 as Final this evening, after pushing the tested and
 documented implementation to hg.python.org:
 http://hg.python.org/cpython/rev/d64ac9ab4cd0

 As the list of names in the NEWS and What's New entries suggests, it
 was quite a collaborative effort to get this one over the line, and
 that's without even listing all the people that offered helpful
 suggestions and comments along the way :)

 print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))

 --
 Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread And Clover

On 2012-01-13 11:20, Lennart Regebro wrote:

The vulnerability is basically only in the dictionary where you keep
the form data you get from a request.


I'd have to disagree with this statement. The vulnerability is anywhere 
that creates a dictionary (or set) from attacker-provided keys. That 
would include HTTP headers, RFC822-family subheaders and parameters, the 
environ, input taken from JSON or XML, and so on - and indeed hash 
collision attacks are not at all web-specific.


The problem with having two dict implementations is that a caller would 
have to tell libraries that use dictionaries which implementation to 
use. So for example an argument would have to be passed to json.load[s] 
to specify whether the input was known-sane or potentially hostile.


Any library that could ever use dictionaries to process untrusted input, 
*or any library that uses another library that does*, would have to pass 
such a flag through, which would quickly get very unwieldy indeed... or 
else they'd have to just always use safedict, in which case we're in 
pretty much the same position as we are with changing dict anyway.
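
json's real object_pairs_hook parameter already shows what that plumbing
would look like at a single call site (SafeDict below is a stand-in for
whatever safe type would be adopted; the hook itself is a genuine json
parameter):

    import json

    class SafeDict(dict):
        """Stand-in for the proposed safe dict type."""

    # Every call site that parses untrusted input would need something like:
    data = json.loads('{"a": 1, "b": 2}', object_pairs_hook=SafeDict)
    assert isinstance(data, SafeDict)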


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/
gtalk:chat?jid=bobi...@gmail.com


Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682)

2012-01-13 Thread Georg Brandl
Caution, long review ahead.

On 01/13/2012 12:43 PM, nick.coghlan wrote:
 http://hg.python.org/cpython/rev/d64ac9ab4cd0
 changeset:   74356:d64ac9ab4cd0
 user:Nick Coghlan ncogh...@gmail.com
 date:Fri Jan 13 21:43:40 2012 +1000
 summary:
   Implement PEP 380 - 'yield from' (closes #11682)
 diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
 --- a/Doc/reference/expressions.rst
 +++ b/Doc/reference/expressions.rst
 @@ -318,7 +318,7 @@

There should probably be a versionadded somewhere on this page.


  .. productionlist::
yield_atom: "(" `yield_expression` ")"
 -   yield_expression: "yield" [`expression_list`]
 +   yield_expression: "yield" [`expression_list` | "from" `expression`]
  
  The :keyword:`yield` expression is only used when defining a generator 
 function,
  and can only be used in the body of a function definition.  Using a
 @@ -336,7 +336,10 @@
  the generator's methods, the function can proceed exactly as if the
  :keyword:`yield` expression was just another external call.  The value of the
  :keyword:`yield` expression after resuming depends on the method which 
 resumed
 -the execution.
 +the execution. If :meth:`__next__` is used (typically via either a
 +:keyword:`for` or the :func:`next` builtin) then the result is :const:`None`,
 +otherwise, if :meth:`send` is used, then the result will be the value passed
 +in to that method.
  
  .. index:: single: coroutine
  
 @@ -346,12 +349,29 @@
  where should the execution continue after it yields; the control is always
  transferred to the generator's caller.
  
 -The :keyword:`yield` statement is allowed in the :keyword:`try` clause of a
 +:keyword:`yield` expressions are allowed in the :keyword:`try` clause of a
  :keyword:`try` ...  :keyword:`finally` construct.  If the generator is not
  resumed before it is finalized (by reaching a zero reference count or by 
 being
  garbage collected), the generator-iterator's :meth:`close` method will be
  called, allowing any pending :keyword:`finally` clauses to execute.
  
 +When ``yield from expression`` is used, it treats the supplied expression as
 +a subiterator. All values produced by that subiterator are passed directly
 +to the caller of the current generator's methods. Any values passed in with
 +:meth:`send` and any exceptions passed in with :meth:`throw` are passed to
 +the underlying iterator if it has the appropriate methods. If this is not the
 +case, then :meth:`send` will raise :exc:`AttributeError` or :exc:`TypeError`,
 +while :meth:`throw` will just raise the passed in exception immediately.
 +
 +When the underlying iterator is complete, the :attr:`~StopIteration.value`
 +attribute of the raised :exc:`StopIteration` instance becomes the value of
 +the yield expression. It can be either set explicitly when raising
 +:exc:`StopIteration`, or automatically when the sub-iterator is a generator
 +(by returning a value from the sub-generator).
 +
 +The parentheses can be omitted when the :keyword:`yield` expression is the
 +sole expression on the right hand side of an assignment statement.
 +
  .. index:: object: generator
  
  The following generator's methods can be used to control the execution of a
 @@ -444,6 +464,10 @@
The proposal to enhance the API and syntax of generators, making them
usable as simple coroutines.
  
 +   :pep:`0380` - Syntax for Delegating to a Subgenerator
 +  The proposal to introduce the :token:`yield_from` syntax, making 
 delegation
 +  to sub-generators easy.
 +
  
  .. _primaries:
  
  PEP 3155: Qualified name for classes and functions
  ==
  
 @@ -208,7 +224,6 @@
  how they might be accessible from the global scope.
  
  Example with (non-bound) methods::
 -
  class C:
 ... def meth(self):
 ... pass

This looks like a spurious (and syntax-breaking) change.

 diff --git a/Grammar/Grammar b/Grammar/Grammar
 --- a/Grammar/Grammar
 +++ b/Grammar/Grammar
 @@ -121,7 +121,7 @@
   |'**' test)
  # The reason that keywords are test nodes instead of NAME is that using NAME
  # results in an ambiguity. ast.c makes sure it's a NAME.
 -argument: test [comp_for] | test '=' test  # Really [keyword '='] test
 +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test

This looks like a change without effect?

 diff --git a/Include/genobject.h b/Include/genobject.h
 --- a/Include/genobject.h
 +++ b/Include/genobject.h
 @@ -11,20 +11,20 @@
  struct _frame; /* Avoid including frameobject.h */
  
  typedef struct {
 - PyObject_HEAD
 - /* The gi_ prefix is intended to remind of generator-iterator. */
 +PyObject_HEAD
 +/* The gi_ prefix is intended to remind of generator-iterator. */
  
 - /* Note: gi_frame can be NULL if the generator is finished */
 - struct _frame *gi_frame;
 +/* Note: gi_frame can be NULL if the generator is finished */
 +struct _frame 

Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())

2012-01-13 Thread Guido van Rossum
I think this may be because in Python 2, there is a coupling between stdin
and stdout (in the C stdlib code) that flushes stdout when you read stdin.
This doesn't seem to be required by the C standard, but most implementations
seem to do it.
http://stackoverflow.com/questions/2123528/does-reading-from-stdin-flush-stdout

I think it was a nice feature but I can see problems with it; apps that
want this behavior ought to bite the bullet and flush stdout.
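
In the recipe's terms, biting the bullet is a one-line addition:

    import sys

    sys.stdout.write("Press a key: ")
    sys.stdout.flush()   # explicit flush: Python 3 no longer flushes
                         # stdout as a side effect of reading stdin
    ch = sys.stdin.read(1)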

On Fri, Jan 13, 2012 at 7:34 AM, anatoly techtonik techto...@gmail.comwrote:

 Posting to python-dev as it no longer relates to the idea of improving
 print().


 sys.stdout.write() in Python 3 causes backwards-incompatible behavior that
 breaks the recipe for unbuffered character reading from stdin on Linux -
 http://code.activestate.com/recipes/134892/  At first I thought that the
 problem was in the new print() function, but it appeared that the culprit is
 sys.stdout.write()

 Attached is a test script which is a stripped-down version of the recipe
 above.

 If executed with Python 2, you can see the prompt to press a key (even
 though output on Linux is buffered in Python 2).
 With Python 3, there is no prompt until you press a key.

 Is it a bug or intended behavior? What is the cause of this break?
 --
 anatoly t.






-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 380 (yield from) is now Final

2012-01-13 Thread Guido van Rossum
AWESOME!!!

On Fri, Jan 13, 2012 at 4:14 AM, Nick Coghlan ncogh...@gmail.com wrote:

 I marked PEP 380 as Final this evening, after pushing the tested and
 documented implementation to hg.python.org:
 http://hg.python.org/cpython/rev/d64ac9ab4cd0

 As the list of names in the NEWS and What's New entries suggests, it
 was quite a collaborative effort to get this one over the line, and
 that's without even listing all the people that offered helpful
 suggestions and comments along the way :)

 print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))


-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())

2012-01-13 Thread Xavier Morel
On 2012-01-13, at 16:34 , anatoly techtonik wrote:
 Posting to python-dev as it no longer relates to the idea of improving
 print().
 
 
 sys.stdout.write() in Python 3 causes backwards-incompatible behavior that
 breaks the recipe for unbuffered character reading from stdin on Linux -
 http://code.activestate.com/recipes/134892/  At first I thought that the
 problem was in the new print() function, but it appeared that the culprit is
 sys.stdout.write()
 
 Attached is a test script which is a stripped-down version of the recipe
 above.
 
 If executed with Python 2, you can see the prompt to press a key (even
 though output on Linux is buffered in Python 2).
 With Python 3, there is no prompt until you press a key.
 
 Is it a bug or intended behavior? What is the cause of this break?
FWIW this is not restricted to Linux (the same behavior change can
be observed on OS X), and the script is overly complex; you can expose
the change with 3 lines:

import sys
sys.stdout.write('prompt')
sys.stdin.read(1)

Python 2 displays the prompt and terminates execution on [Return];
Python 3 does not display anything until [Return] is pressed.

Interestingly, the `-u` option is not sufficient to make the
prompt appear in Python 3; the stream has to be flushed
explicitly unless the input is ~16k characters (I guess that's
an internal buffer size of some sort)


Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())

2012-01-13 Thread Antoine Pitrou
On Fri, 13 Jan 2012 17:00:57 +0100
Xavier Morel python-...@masklinn.net wrote:
 FWIW this is not restricted to Linux (the same behavior change can
 be observed on OS X), and the script is overly complex; you can expose
 the change with 3 lines:
 
 import sys
 sys.stdout.write('prompt')
 sys.stdin.read(1)
 
 Python 2 displays the prompt and terminates execution on [Return];
 Python 3 does not display anything until [Return] is pressed.
 
 Interestingly, the `-u` option is not sufficient to make the
 prompt appear in Python 3; the stream has to be flushed
 explicitly unless the input is ~16k characters (I guess that's
 an internal buffer size of some sort)

-u forces line-buffering mode for stdout/stderr, which is already the
default if they are wired to an interactive device (isatty() returning
True).
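
A quick way to observe this from inside the interpreter
(line_buffering is a real attribute of io.TextIOWrapper):

    import sys
    # On an interactive terminal: True True; redirected to a file: False False.
    print(sys.stdout.isatty(), sys.stdout.line_buffering)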

But this was already rehashed on python-ideas and the bug tracker, and
apparently Anatoly thought it would be a good idea to post on a third
medium. Sigh.

Regards

Antoine.




Re: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())

2012-01-13 Thread Xavier Morel
On 2012-01-13, at 17:19 , Antoine Pitrou wrote:
 
 -u forces line-buffering mode for stdout/stderr, which is already the
 default if they are wired to an interactive device (isattr() returning
 True).
Oh, I had not noticed the documentation had changed in Python 3 (in
Python 2 it stated that `-u` made IO unbuffered; in Python 3 it now
states that only binary IO is unbuffered and text IO remains
line-buffered). Sorry about that.



[Python-Dev] Summary of Python tracker Issues

2012-01-13 Thread Python tracker

ACTIVITY SUMMARY (2012-01-06 - 2012-01-13)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3210 (+30)
  closed 22352 (+30)
  total  25562 (+60)

Open issues with patches: 1384 


Issues opened (42)
==

#6774: socket.shutdown documentation: on some platforms, closing one 
http://bugs.python.org/issue6774  reopened by neologix

#13721: ssl.wrap_socket on a connected but failed connection succeeds 
http://bugs.python.org/issue13721  opened by kiilerix

#13722: distributions can disable the encodings package
http://bugs.python.org/issue13722  opened by pitrou

#13723: Regular expressions: (?:X|\s+)*$ takes a long time
http://bugs.python.org/issue13723  opened by ericp

#13725: regrtest does not recognize -d flag
http://bugs.python.org/issue13725  opened by etukia

#13726: regrtest ambiguous -S flag
http://bugs.python.org/issue13726  opened by etukia

#13727: Accessor macros for PyDateTime_Delta members
http://bugs.python.org/issue13727  opened by amaury.forgeotdarc

#13728: Description of -m and -c cli options wrong?
http://bugs.python.org/issue13728  opened by sandro.tosi

#13730: Grammar mistake in Decimal documentation
http://bugs.python.org/issue13730  opened by zacherates

#13733: Change required to sysconfig.py for Python 2.7.2 on OS/2
http://bugs.python.org/issue13733  opened by Paul.Smedley

#13734: Add a generic directory walker method to avoid symlink attacks
http://bugs.python.org/issue13734  opened by hynek

#13736: urllib.request.urlopen leaks exceptions from socket and httpli
http://bugs.python.org/issue13736  opened by jmoy

#13737: bugs.python.org/review's Django settings file DEBUG=True
http://bugs.python.org/issue13737  opened by Bithin.A

#13740: winsound.SND_NOWAIT ignored on modern Windows platforms
http://bugs.python.org/issue13740  opened by bughunter2

#13742: Add a key parameter (like sorted) to heapq.merge
http://bugs.python.org/issue13742  opened by ssapin

#13743: xml.dom.minidom.Document class is not documented
http://bugs.python.org/issue13743  opened by sandro.tosi

#13744: raw byte strings are described in a confusing way
http://bugs.python.org/issue13744  opened by barry

#13745: configuring --with-dbmliborder=bdb doesn't build the gdbm exte
http://bugs.python.org/issue13745  opened by doko

#13746: ast.Tuple's have an inconsistent col_offset value
http://bugs.python.org/issue13746  opened by bronikkk

#13747: ssl_version documentation error
http://bugs.python.org/issue13747  opened by Ben.Darnell

#13749: socketserver can't stop
http://bugs.python.org/issue13749  opened by teamnoir

#13751: multiprocessing.pool hangs if any worker raises an Exception w
http://bugs.python.org/issue13751  opened by fmitha

#13752: add a str.casefold() method
http://bugs.python.org/issue13752  opened by benjamin.peterson

#13756: Python3.2.2 make fail on cygwin
http://bugs.python.org/issue13756  opened by holgerd00d

#13758: compile() should not encode 'filename' (at least on Windows)
http://bugs.python.org/issue13758  opened by terry.reedy

#13759: Python 3.2.2 Mac installer version doesn't accept multibyte ch
http://bugs.python.org/issue13759  opened by ats

#13760: ConfigParser exceptions are not pickleable
http://bugs.python.org/issue13760  opened by fmitha

#13761: Add flush keyword to print()
http://bugs.python.org/issue13761  opened by georg.brandl

#13763: rm obsolete reference in devguide
http://bugs.python.org/issue13763  opened by tshepang

#13764: Misc/build.sh is outdated... talks about svn
http://bugs.python.org/issue13764  opened by tshepang

#13766: explain the relationship between Lib/lib2to3/Grammar.txt and G
http://bugs.python.org/issue13766  opened by tshepang

#13768: Doc/tools/dailybuild.py available only on 2.7 branch
http://bugs.python.org/issue13768  opened by tshepang

#13769: json.dump(ensure_ascii=False) return str instead of unicode
http://bugs.python.org/issue13769  opened by mmarkk

#13770: python3  json: add ensure_ascii documentation
http://bugs.python.org/issue13770  opened by mmarkk

#13771: HTTPSConnection __init__ super implementation causes recursion
http://bugs.python.org/issue13771  opened by michael.mulich

#13772: listdir() doesn't work with non-trivial symlinks
http://bugs.python.org/issue13772  opened by pitrou

#13773: Support sqlite3 uri filenames
http://bugs.python.org/issue13773  opened by poq

#13774: json.loads raises a SystemError for invalid encoding on 2.7.2
http://bugs.python.org/issue13774  opened by Julian

#13775: Access Denied message on symlink creation misleading for an ex
http://bugs.python.org/issue13775  opened by santa4nt

#13777: socket: communicating with Mac OS X KEXT controls
http://bugs.python.org/issue13777  opened by goderbauer

#13779: os.walk: bottom-up
http://bugs.python.org/issue13779  opened by patrick.vrijlandt

#13780: make YieldFrom its own node

Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Mark Dickinson
On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum gu...@python.org wrote:
 How
 pathological the data needs to be before the collision counter triggers? I'd
 expect *very* pathological.

How pathological do you consider the set

   {1 << n for n in range(2000)}

to be?  What about the set:

   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}

?  The ~2000 elements of the latter set have only 61 distinct hash
values on a 64-bit machine, so there will be over 2000 total collisions
involved in creating this set (though admittedly only around 30
collisions per hash value).
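
The 61 comes from the numeric hash reduction: on a 64-bit build, CPython
reduces numeric hashes modulo the Mersenne prime 2**61 - 1, so the
hashes of powers of two cycle with period 61.  A quick check (assuming a
64-bit CPython):

>>> len({hash(2.0**n) for n in range(-1074, 1024)})
61
>>> hash(2**61) == hash(1)    # 2**61 % (2**61 - 1) == 1
True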

-- 
Mark


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 9:08 AM, Mark Dickinson dicki...@gmail.com wrote:

 On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum gu...@python.org
 wrote:
  How
  pathological the data needs to be before the collision counter triggers?
 I'd
  expect *very* pathological.

 How pathological do you consider the set

   {1 << n for n in range(2000)}

 to be?  What about the set:

   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}

 ?  The ~2000 elements of the latter set have only 61 distinct hash
 values on a 64-bit machine, so there will be over 2000 total collisions
 involved in creating this set (though admittedly only around 30
 collisions per hash value).


Hm... So how does the collision counting work for this case?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 380 (yield from) is now Final

2012-01-13 Thread Antoine Pitrou
On Fri, 13 Jan 2012 22:14:43 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 I marked PEP 380 as Final this evening, after pushing the tested and
 documented implementation to hg.python.org:
 http://hg.python.org/cpython/rev/d64ac9ab4cd0

I don't know if this is supposed to work, but the exception looks wrong:

>>> def g(): yield from ()
... 
>>> f = list(g())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in g
SystemError: error return without exception set


Also, the checkin lacked a bytecode magic number bump. It is not really
a problem since I've just bumped it anyway.

Regards

Antoine.




Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Mark Dickinson
On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum gu...@python.org wrote:
 How pathological do you consider the set

   {1 << n for n in range(2000)}

 to be?  What about the set:

   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}

 ?  The ~2000 elements of the latter set have only 61 distinct hash
 values on a 64-bit machine, so there will be over 2000 total collisions
 involved in creating this set (though admittedly only around 30
 collisions per hash value).

 Hm... So how does the collision counting work for this case?

Ah, my bad.  It looks like the ieee754_powers_of_two set is safe---IIUC,
it's the number of collisions involved in a single key-set operation
that's limited.  So a dictionary with keys {1<<n for n in range(2000)}
is fine, but a dictionary with keys {1<<(61*n) for n in range(2000)}
is not:

>>> {1<<(n*61): True for n in range(2000)}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
KeyError: 'too many hash collisions'
[67961 refs]
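
(The stride of 61 makes every key reduce to the same value under the
64-bit hash modulus 2**61 - 1, so all 2000 keys land in one collision
chain:)

>>> {hash(1 << (61*n)) for n in range(2000)}
{1}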

I'd still not consider this particularly pathological, though.

-- 
Mark


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson dicki...@gmail.com wrote:

 On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum gu...@python.org
 wrote:
  How pathological do you consider the set
 
    {1 << n for n in range(2000)}
 
  to be?  What about the set:
 
    ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}
 
  ?  The ~2000 elements of the latter set have only 61 distinct hash
  values on a 64-bit machine, so there will be over 2000 total collisions
  involved in creating this set (though admittedly only around 30
  collisions per hash value).
 
  Hm... So how does the collision counting work for this case?

 Ah, my bad.  It looks like the ieee754_powers_of_two set is safe---IIUC,
 it's the number of collisions involved in a single key-set operation
 that's limited.  So a dictionary with keys {1<<n for n in range(2000)}
 is fine, but a dictionary with keys {1<<(61*n) for n in range(2000)}
 is not:

 >>> {1<<(n*61): True for n in range(2000)}
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "<stdin>", line 1, in <dictcomp>
 KeyError: 'too many hash collisions'
 [67961 refs]

 I'd still not consider this particularly pathological, though.


Really? Even though you came up with it specifically to prove me wrong?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 380 (yield from) is now Final

2012-01-13 Thread Terry Reedy

On 1/13/2012 7:14 AM, Nick Coghlan wrote:

print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))


I pulled, rebuilt, and it indeed works (on Win 7).

I just remembered that Tim Peters somewhere (generator.c?) left a large 
comment with examples of recursive generators, such as knight's tours. 
Could these be rewritten with (and benefit from) 'yield from'? (It 
occurs to me his stuff might be worth exposing in an iterator/generator 
how-to.)


--
Terry Jan Reedy



Re: [Python-Dev] Python as a Metro-style App

2012-01-13 Thread Dino Viehland
Dino wrote:
 Martin wrote:
  See the start of the thread: I tried to create a WinRT Component
  DLL, and that failed, as VS would refuse to compile any C file in
  such a project. Not sure whether this is triggered by defining
  WINAPI_FAMILY=2, or any other compiler setting.
 
  I'd really love to use WINAPI_FAMILY=2, as compiler errors are much
  easier to fix than verifier errors.
 
 ...

 I'm going to ping some people on the windows team and see if the app
 container bit is or will be necessary for DLLs.
 

I heard back from the Windows team and they are going to require the app
container bit to be set on all PE files (although they don't currently
enforce it).  I was able to compile a simple .c file and pass /link
/appcontainer and that worked, so I'm going to try and figure out if
there's some way to get the .vcxproj to build a working command line
that includes that.





Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Benjamin Peterson
2012/1/13 Guido van Rossum gu...@python.org:
 Really? Even though you came up with it specifically to prove me wrong?

Coming up with a counterexample now invalidates it?



-- 
Regards,
Benjamin


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Antoine Pitrou
On Thu, 12 Jan 2012 18:57:42 -0800
Guido van Rossum gu...@python.org wrote:
 Hm... I started out as a big fan of the randomized hash, but thinking more
 about it, I actually believe that the chances of some legitimate app having
 1000 collisions are way smaller than the chances that somebody's code will
 break due to the variable hashing.

Breaking due to variable hashing is deterministic: you notice it as
soon as you upgrade (and then you use PYTHONHASHSEED to disable
variable hashing). That seems better than unpredictable breaking when
some legitimate collision chain happens.

Regards

Antoine.




Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Victor Stinner
 - Glenn Linderman proposes to fix the vulnerability by adding a new
 safe dict type (only accepting string keys). His proof-of-concept
 (SafeDict.py) uses a secret of 64 random bits and uses it to compute
 the hash of a key.

We could mix Marc's collision counter with SafeDict idea (being able
to use a different secret for each dict): use hash(key, secret)
(simple example: hash(secret+key)) instead of hash(key) in dict (and
set), and change the secret if we have more than N collisions. But it
would slow down all dict lookup (dict creation, get, set, del, ...).
And getting new random data can also be slow.

SafeDict and hash(secret+key) lose the benefit of the cached hash
result. Because the hash result depends on an argument, we cannot cache
the result anymore, and we have to recompute the hash for each lookup
(even if you look up the same key twice or more).
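
A toy sketch of the per-dict secret idea (a hypothetical SaltedDict for
str keys only, not the actual SafeDict.py) makes the cost visible: every
access builds a fresh salted string, so the hash cached on the original
key object never helps:

    import os

    class SaltedDict(dict):
        """Toy per-instance-salted mapping for str keys (illustration only)."""

        def __init__(self):
            super().__init__()
            self._salt = os.urandom(16).hex()   # per-dict secret

        def __setitem__(self, key, value):
            super().__setitem__(self._salt + key, value)

        def __getitem__(self, key):
            # A new string is built and hashed on *every* lookup, even
            # for the same key object; the cached hash of `key` is useless.
            return super().__getitem__(self._salt + key)

        def __contains__(self, key):
            return super().__contains__(self._salt + key)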

Victor


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.net wrote:

 On Thu, 12 Jan 2012 18:57:42 -0800
 Guido van Rossum gu...@python.org wrote:
  Hm... I started out as a big fan of the randomized hash, but thinking
 more
  about it, I actually believe that the chances of some legitimate app
 having
  1000 collisions are way smaller than the chances that somebody's code
 will
  break due to the variable hashing.

 Breaking due to variable hashing is deterministic: you notice it as
 soon as you upgrade (and then you use PYTHONHASHSEED to disable
 variable hashing). That seems better than unpredictable breaking when
 some legitimate collision chain happens.


Fair enough. But I'm now uncomfortable with turning this on for bugfix
releases. I'm fine with making this the default in 3.3, just not in 3.2,
3.1 or 2.x -- it will break too much code and organizations will have to
roll back the release or do extensive testing before installing a bugfix
release -- exactly what we *don't* want for those.

FWIW, I don't believe in the SafeDict solution -- you never know which
dicts you have to change.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith
On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum gu...@python.org wrote:

 On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.netwrote:

 On Thu, 12 Jan 2012 18:57:42 -0800
 Guido van Rossum gu...@python.org wrote:
  Hm... I started out as a big fan of the randomized hash, but thinking
 more
  about it, I actually believe that the chances of some legitimate app
 having
  1000 collisions are way smaller than the chances that somebody's code
 will
  break due to the variable hashing.

 Breaking due to variable hashing is deterministic: you notice it as
 soon as you upgrade (and then you use PYTHONHASHSEED to disable
 variable hashing). That seems better than unpredictable breaking when
 some legitimate collision chain happens.


 Fair enough. But I'm now uncomfortable with turning this on for bugfix
 releases. I'm fine with making this the default in 3.3, just not in 3.2,
 3.1 or 2.x -- it will break too much code and organizations will have to
 roll back the release or do extensive testing before installing a bugfix
 release -- exactly what we *don't* want for those.

 FWIW, I don't believe in the SafeDict solution -- you never know which
 dicts you have to change.


Agreed.

Of the three options Victor listed only one is good.

I don't like *SafeDict*.  *-1*.  It puts the onus on the coder to always
get everything right with regards to data that came from outside the
process never ending up hashed in a non-safe dict or set *anywhere*.
Safe needs to be the default option for all hash tables.

I don't like the *too many hash collisions* exception. *-1*. It provides
non-deterministic application behavior for data driven applications with no
way for them to predict when it'll happen or where and prepare for it. It
may work in practice for many applications but is simply odd behavior.

I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
back ported to any Python version.

It is perfectly okay to break existing users who had anything depending
on the ordering of internal hash tables. Their code was already broken.
We *will* provide a flag and/or environment variable that can be set to
turn the feature off at their own peril, which they can use in their
test harnesses that are stupid enough to use doctests with order
dependencies.

This approach worked fine for Perl 9 years ago.
https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371

-gps


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Glenn Linderman

On 1/13/2012 5:35 PM, Victor Stinner wrote:

- Glenn Linderman proposes to fix the vulnerability by adding a new
safe dict type (only accepting string keys). His proof-of-concept
(SafeDict.py) uses a secret of 64 random bits and uses it to compute
the hash of a key.

We could mix Marc's collision counter with SafeDict idea (being able
to use a different secret for each dict): use hash(key, secret)
(simple example: hash(secret+key)) instead of hash(key) in dict (and
set), and change the secret if we have more than N collisions. But it
would slow down all dict lookup (dict creation, get, set, del, ...).
And getting new random data can also be slow.

SafeDict and hash(secret+key) lose the benefit of the cached hash
result. Because the hash result depends on an argument, we cannot cache
the result anymore, and we have to recompute the hash for each lookup
(even if you look up the same key twice or more).

Victor


So integrating SafeDict into dict, so it could be automatically 
converted, would mean changing the data structures underneath dict.  
Given that, a technique for hash caching could be created that isn't 
quite as good as the one in place, but may be less expensive than not 
caching the hashes at all.  It would also take more space: a second 
dict, internally, as well as the secret.


So once the collision counter reaches some threshold (since there would 
be a functional fallback, it could be much lower than 1000), the secret 
is obtained, and the keys are rehashed using hash(secret+key).  Now when 
lookups occur, the object id of the key and the hash of the key are used 
as the index, and hash(secret+key) is stored as a cached value.  This 
would only benefit lookups by the same object; other objects with the 
same key value would be recalculated (at least the first time).  Some 
limit on the number of cached values would probably be appropriate.  
This would add complexity, of course, in trying to save time.


An alternate solution would be to convert a dict to a tree once the 
number of collisions produces poor performance.  Converting to a tree 
would result in O(log N) instead of O(1) lookup performance, but that is 
better than the degenerate case of O(N) which is produced by the 
excessive number of collisions resulting from an attack.  This would 
require new tree code to be included in the core, of course, probably a 
red-black tree, which stays balanced.


In either of these cases, the conversion is expensive, because a 
collision threshold must first be reached to determine the need for 
conversion, so the hash could already contain lots of data.  If it were 
too expensive, the attack could still be effective.


Another solution would be to change the collision code, so that 
colliding keys don't produce O(N) behavior but something better.  
Each colliding entry could convert that entry to a tree of entries, 
perhaps, as sketched below.  This would require no conversion of bad 
dicts, and an attack could at worst degrade O(1) performance to O(log N).
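
A toy version of that last idea (a chained table for clarity, although
CPython's dict is actually open-addressed; bisect over a sorted list
stands in for a real balanced tree, so searches are O(log n) while
inserts into a converted bucket stay O(n)):

    import bisect

    TREEIFY_AT = 8   # hypothetical per-bucket collision threshold

    class TreeifyingTable:
        """Toy hash table whose overfull buckets switch to sorted order."""

        def __init__(self, nbuckets=64):
            self._buckets = [[] for _ in range(nbuckets)]  # (key, value) pairs
            self._sorted = [False] * nbuckets

        def __setitem__(self, key, value):
            i = hash(key) % len(self._buckets)
            b = self._buckets[i]
            if self._sorted[i]:
                j = bisect.bisect_left(b, (key,))
                if j < len(b) and b[j][0] == key:
                    b[j] = (key, value)
                else:
                    b.insert(j, (key, value))
                return
            for j, (k, _) in enumerate(b):     # small bucket: linear scan
                if k == key:
                    b[j] = (key, value)
                    return
            b.append((key, value))
            if len(b) > TREEIFY_AT:            # too many collisions: convert
                b.sort()                       # needs orderable keys -- one
                self._sorted[i] = True         # practical obstacle to the idea

        def __getitem__(self, key):
            i = hash(key) % len(self._buckets)
            b = self._buckets[i]
            if self._sorted[i]:
                j = bisect.bisect_left(b, (key,))
                if j < len(b) and b[j][0] == key:
                    return b[j][1]
            else:
                for k, v in b:
                    if k == key:
                        return v
            raise KeyError(key)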


Clearly these ideas are more complex than adding randomization, but 
adding randomization doesn't seem to produce immunity from attack 
when data about the randomness is leaked.


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith


 Clearly these ideas are more complex than adding randomization, but adding
 randomization doesn't seem to produce immunity from attack, when data
 about the randomness is leaked.


Which will not normally happen.

I'm firmly in the camp that believes the random seed can be probed and
determined by creatively injecting values and measuring the timing of
things.  But doing that is difficult and time- and bandwidth-intensive,
so the per-process random hash seed is good enough.

There's another elephant in the room here, if you want to avoid this attack
use a 64-bit Python build as it uses 64-bit hash values that are
significantly more difficult to force a collision on.

-gps


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith
btw, Tim's commit message on this one is amusingly relevant. :)

 http://hg.python.org/cpython/diff/8d2f2cb9/Objects/dictobject.c


On Fri, Jan 13, 2012 at 6:25 PM, Gregory P. Smith g...@krypto.org wrote:


 Clearly these ideas are more complex than adding randomization, but
 adding randomization doesn't seem to produce immunity from attack, when
 data about the randomness is leaked.


 Which will not normally happen.

 I'm firmly in the camp that believes the random seed can be probed and
 determined by creatively injecting values and measuring the timing of things.
 But doing that is difficult and time- and bandwidth-intensive, so the
 per-process random hash seed is good enough.

 There's another elephant in the room here, if you want to avoid this
 attack use a 64-bit Python build as it uses 64-bit hash values that are
 significantly more difficult to force a collision on.

 -gps



Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Steven D'Aprano

On 14/01/12 12:58, Gregory P. Smith wrote:


I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
back ported to any Python version.

It is perfectly okay to break existing users who had anything depending on
ordering of internal hash tables. Their code was already broken.


For the record:

steve@runes:~$ python -c "print(hash('spam ham'))"
-376510515
steve@runes:~$ jython -c "print(hash('spam ham'))"
2054637885

So it is already the case that Python code that assumes stable hashing is 
broken.

For what it's worth, I'm not convinced that we should be overly-concerned by 
poor saps (Guido's words) who rely on accidents of implementation regarding 
hash. We shouldn't break their code unless we have a good reason, but this 
strikes me as a good reason. The documentation for hash certainly makes no 
promise about stability, and relying on it strikes me as about as sensible as 
relying on the stability of error messages.


I'm also not convinced that the option to raise an exception after 1000 
collisions actually solves the problem. That relies on the application being 
re-written to catch the exception and recover from it (how?). Otherwise, all 
it does is change the attack vector from "cause an indefinite number of hash 
collisions" to "cause 999 hash collisions followed by crashing the application 
with an exception", which doesn't strike me as much of an improvement.


+1 on random seeding. Default to on in 3.3+ and default to off in older 
versions, which allows people to avoid breaking their code until they're ready 
for it to be broken.




--
Steven


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Gregory P. Smith
On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith g...@krypto.org wrote:


 On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum gu...@python.orgwrote:

 On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou solip...@pitrou.netwrote:

 On Thu, 12 Jan 2012 18:57:42 -0800
 Guido van Rossum gu...@python.org wrote:
  Hm... I started out as a big fan of the randomized hash, but thinking
 more
  about it, I actually believe that the chances of some legitimate app
 having
  1000 collisions are way smaller than the chances that somebody's code
 will
  break due to the variable hashing.

 Breaking due to variable hashing is deterministic: you notice it as
 soon as you upgrade (and then you use PYTHONHASHSEED to disable
 variable hashing). That seems better than unpredictable breaking when
 some legitimate collision chain happens.


 Fair enough. But I'm now uncomfortable with turning this on for bugfix
 releases. I'm fine with making this the default in 3.3, just not in 3.2,
 3.1 or 2.x -- it will break too much code and organizations will have to
 roll back the release or do extensive testing before installing a bugfix
 release -- exactly what we *don't* want for those.

 FWIW, I don't believe in the SafeDict solution -- you never know which
 dicts you have to change.


 Agreed.

 Of the three options Victor listed only one is good.

 I don't like *SafeDict*.  *-1*.  It puts the onus on the coder to
 always get everything right with regards to data that came from outside the
 process never ending up hashed in a non-safe dict or set *anywhere*.
 Safe needs to be the default option for all hash tables.

 I don't like the *too many hash collisions* exception. *-1*. It
 provides non-deterministic application behavior for data driven
 applications with no way for them to predict when it'll happen or where and
 prepare for it. It may work in practice for many applications but is simply
 odd behavior.

 I do like *randomly seeding the hash*. *+1*. This is easy. It can easily
 be back ported to any Python version.

 It is perfectly okay to break existing users who had anything depending on
 the ordering of internal hash tables. Their code was already broken. We
 *will* provide a flag and/or environment variable that can be set to turn the
 feature off at their own peril which they can use in their test harnesses
 that are stupid enough to use doctests with order dependencies.


What an implementation looks like:

 http://pastebin.com/9ydETTag

some stuff to be filled in, but this is all that is really required: add
logic to allow a particular seed to be specified or forced to 0 from the
command line or environment, add the logic to grab random bytes, and add
the autoconf glue to disable it.  Done.
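
In Python pseudocode (the real logic would live in C at interpreter
startup; the names here are illustrative), the seed selection amounts to:

    import os

    def choose_hash_seed():
        env = os.environ.get("PYTHONHASHSEED")
        if env is not None:
            return int(env)        # 0 means: randomization disabled
        return int.from_bytes(os.urandom(8), "little")  # fresh random seed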

-gps


 This approach worked fine for Perl 9 years ago.
 https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371

 -gps



Re: [Python-Dev] Sphinx version for Python 2.x docs

2012-01-13 Thread Éric Araujo

Hi Sandro,

Thanks for getting the ball rolling on this.  One style for markup, one
Sphinx version to code our extensions against and one location for the
documenting guidelines will make our work a bit easier.

During the build process, there are some warnings that I can 
understand:

I assume you mean “can’t”, as you later ask how to fix them.  As a
general rule, they’re only warnings, so they don’t break the build, only
some links or stylings, so I think it’s okay to ignore them *right now*.



Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal

That’s a mistake I made in cefe4f38fa0e.  This sentence should be removed.


Doc/library/stdtypes.rst:2372: WARNING: more than one target found for
cross-reference u'next':

Need to use :meth:`.next` to let Sphinx find the right target (more info
on request :)


Doc/library/sys.rst:651: WARNING: unknown keyword: None

Should use ``None``.


Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in
Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not

I don’t know if these should work (i.e. create a link to the appropriate
language reference section) or abuse the markup (there are “not” and
“in” keywords, but no “not in” keyword → use ``not in``).  I’d say
ignore them.

Cheers


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread martin

What an implementation looks like:

 http://pastebin.com/9ydETTag

some stuff to be filled in, but this is all that is really required.


I think this statement (and the patch) is wrong. You also need to change
the byte string hashing, at least for 2.x. This I consider the biggest
flaw in that approach - other people may have written string-like objects
which continue to compare equal to a string but would now hash differently.

Regards,
Martin




Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Guido van Rossum
On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith g...@krypto.org wrote:

 It is perfectly okay to break existing users who had anything depending on
 the ordering of internal hash tables. Their code was already broken. We
 *will* provide a flag and/or environment variable that can be set to turn the
 feature off at their own peril which they can use in their test harnesses
 that are stupid enough to use doctests with order dependencies.


No, that is not how we usually treat compatibility between bugfix releases.
"Your code is already broken" is not an argument to break forcefully what
worked (even if by happenstance) before. The difference between CPython and
Jython (or between different CPython feature releases) also isn't relevant
-- historically we have often bent over backwards to avoid changing
behavior that was technically undefined, if we believed it would affect a
significant fraction of users.

I don't think anyone doubts that this will break lots of code (at least,
the arguments I've heard have been "their code is broken", not "nobody
does that").

This approach worked fine for Perl 9 years ago.
 https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371


I don't know what the Perl attitude about breaking undefined behavior
between micro versions was at the time. But ours is pretty clear -- don't
do it.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] [Python-checkins] cpython: add test, which was missing from d64ac9ab4cd0

2012-01-13 Thread Nick Coghlan
On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson
python-check...@python.org wrote:
 http://hg.python.org/cpython/rev/be85914b611c
 changeset:   74363:be85914b611c
 parent:      74361:609482c6710e
 user:        Benjamin Peterson benja...@python.org
 date:        Fri Jan 13 14:39:38 2012 -0500
 summary:
  add test, which was missing from d64ac9ab4cd0

Ah, that's where that came from, thanks.

I still haven't fully trained myself to use hg import instead of
patch, which would avoid precisely this kind of error :P

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Terry Reedy

On 1/13/2012 8:58 PM, Gregory P. Smith wrote:


It is perfectly okay to break existing users who had anything depending
on ordering of internal hash tables. Their code was already broken.


Given that the doc says "Return the hash value of the object", I do not 
think we should be so hard-nosed. The above clearly implies that there 
is such a thing as *the* Python hash value for an object. And indeed, 
that has been true across many versions. If we had written "Return a 
hash value for the object, which can vary from run to run", the case 
would be different.


--
Terry Jan Reedy



Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Jack Diederich
On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum gu...@python.org wrote:
 Hm... I started out as a big fan of the randomized hash, but thinking more
 about it, I actually believe that the chances of some legitimate app having
 1000 collisions are way smaller than the chances that somebody's code will
 break due to the variable hashing.

Python's dicts are designed to avoid hash conflicts by resizing and
keeping the available slots bountiful.  1000 conflicts sounds like a
number that couldn't be hit accidentally unless you had a single dict
using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're
good).  The probing scheme also exploits cache locality, but even so it
is very unlikely to produce one thousand conflicts by chance.  If you
get that many, there is an attack.
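
To put a number like 1000 in context, here is a toy sketch (mine, not
from any patch) that forces every key into the same probe chain and
shows the quadratic insert cost an attacker is after:

    import time

    class Colliding(object):
        """Every instance hashes alike -- the worst case for a dict."""
        def __init__(self, n):
            self.n = n
        def __hash__(self):
            return 42  # constant hash: all keys share one probe chain
        def __eq__(self, other):
            return self.n == other.n

    def build(keys):
        start = time.time()
        d = {}
        for k in keys:
            d[k] = None
        return time.time() - start

    print(build([Colliding(i) for i in range(2000)]))  # quadratic
    print(build(list(range(2000))))                    # near-linear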

 This depends on how the counting is done (I didn't look at MAL's
 patch), and assuming that increasing the hash table size will generally
 reduce collisions if items collide but their hashes are different.

The patch counts conflicts on an individual insert and not lifetime
conflicts.  Looks sane to me.
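
For anyone who hasn't read it, the gist (a simplified Python sketch of
the idea only -- the real patch counts probes inside the C dict
implementation, and the limit and probe details differ) is:

    MAX_COLLISIONS = 1000  # illustrative limit

    def insert(table, key, value):
        """Open-addressing insert that refuses degenerate probe chains."""
        mask = len(table) - 1          # table length is a power of two
        i = hash(key) & mask
        collisions = 0
        while table[i] is not None and table[i][0] != key:
            collisions += 1
            if collisions > MAX_COLLISIONS:
                raise RuntimeError("too many hash collisions on one insert")
            i = (5 * i + 1) & mask     # simplified probe sequence
        table[i] = (key, value)

    table = [None] * 8
    insert(table, "spam", 1)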

 That said, even with collision counting I'd like a way to disable it without
 changing the code, e.g. a flag or environment variable.

Agreed.  Paranoid people can turn the behavior off and if it ever were
to become a problem in practice we could point people to a solution.
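
A seed environment variable would do it; something like this (using the
PYTHONHASHSEED name as an assumption -- it is what later CPython
releases ended up shipping) keeps hashes reproducible for test runs:

    import os
    import subprocess
    import sys

    # With the seed pinned, string hashes are stable across runs, which
    # is the escape hatch an order-dependent test suite needs.
    env = dict(os.environ, PYTHONHASHSEED="12345")
    cmd = [sys.executable, "-c", "print(hash('abc'))"]
    runs = set(subprocess.check_output(cmd, env=env) for _ in range(3))
    print(len(runs) == 1)  # True: same seed, same hash, every run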

-Jack


Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682)

2012-01-13 Thread Nick Coghlan
On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl g.bra...@gmx.net wrote:
 On 01/13/2012 12:43 PM, nick.coghlan wrote:
 diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst

 There should probably be a versionadded somewhere on this page.

Good catch, I added versionchanged notes to this page, simple_stmts
and the StopIteration entry in the library reference.

  PEP 3155: Qualified name for classes and functions
  ==================================================

 This looks like a spurious (and syntax-breaking) change.

Yeah, it was an error I introduced last time I merged from default. Fixed.

 diff --git a/Grammar/Grammar b/Grammar/Grammar
 -argument: test [comp_for] | test '=' test  # Really [keyword '='] test
 +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test

 This looks like a change without effect?

Fixed.

It was a lingering after-effect of Greg's original patch (which also
modified the function call syntax to allow yield from expressions
with extra parens). I reverted the change to the function call syntax,
but forgot to ditch the added parens while doing so.

 diff --git a/Include/genobject.h b/Include/genobject.h

 -     /* List of weak reference. */
 -     PyObject *gi_weakreflist;
 +        /* List of weak reference. */
 +        PyObject *gi_weakreflist;
  } PyGenObject;

 While these change tabs into spaces, it should be 4 spaces, not 8.

Fixed.

 +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);

 Does this API need to be public? If yes, it needs to be documented.

Hmm, good point - that one needs a bit of thought, so I've put it on
the tracker: http://bugs.python.org/issue13783

(that issue also covers your comments regarding the docstring for this
function and whether or not we even need the StopIteration instance
creation API)
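
As a reminder, the Python-level behaviour these helpers support is the
subgenerator's return value becoming the value of the yield from
expression:

    def subgen():
        yield 1
        return "done"      # becomes StopIteration("done") under PEP 380

    def delegator():
        result = yield from subgen()
        print("subgenerator returned:", result)

    for item in delegator():
        print("yielded:", item)
    # yielded: 1
    # subgenerator returned: done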

 -#define CALL_FUNCTION        131     /* #args + (#kwargs<<8) */
 -#define MAKE_FUNCTION        132     /* #defaults + #kwdefaults<<8 + 
 #annotations<<16 */
 -#define BUILD_SLICE  133     /* Number of items */
 +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
 +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + 
 #annotations<<16 */
 +#define BUILD_SLICE     133     /* Number of items */

 Not sure putting these and all the other cosmetic changes into an already
 big patch is such a good idea...

I agree, but it's one of the challenges of a long-lived branch like
the PEP 380 one (I believe some of these cosmetic changes started life
in Greg's original patch and separating them out would have been quite
a pain). Anyone that wants to see the gory details of the branch
history can take a look at my bitbucket repo:

https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29

 diff --git a/Objects/abstract.c b/Objects/abstract.c
 --- a/Objects/abstract.c
 +++ b/Objects/abstract.c
 @@ -2267,7 +2267,6 @@

      func = PyObject_GetAttrString(o, name);
      if (func == NULL) {
 -        PyErr_SetString(PyExc_AttributeError, name);
          return 0;
      }

 @@ -2311,7 +2310,6 @@

      func = PyObject_GetAttrString(o, name);
      if (func == NULL) {
 -        PyErr_SetString(PyExc_AttributeError, name);
          return 0;
      }
      va_start(va, format);

 These two changes also look suspiciously unrelated?

IIRC, I removed those lines while working on the patch because the
message they produce (just the attribute name) is worse than the one
produced by the call to PyObject_GetAttrString (which also includes
the type of the object being accessed). Leaving the original
exceptions alone helped me track down some failures I was getting at
the time.

I've now made the various CallMethod helper APIs in abstract.c (1
public, 3 private) consistently leave the GetAttr exception alone and
added an explicit C API note to NEWS.

(Vaguely related tangent: the new code added by the patch probably has
a few parts that could benefit from the new GetAttrId private API)

 diff --git a/Objects/genobject.c b/Objects/genobject.c
 +        } else {
 +            PyObject *e = PyStopIteration_Create(result);
 +            if (e != NULL) {
 +                PyErr_SetObject(PyExc_StopIteration, e);
 +                Py_DECREF(e);
 +            }

 Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here
 anyway?

I think you're right - so noted in the tracker issue about the C API additions.

Thanks for the thorough review, a fresh set of eyes is very helpful :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread Nick Coghlan
On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich jackd...@gmail.com wrote:
 This depends on how the counting is done (I didn't look at MAL's
 patch), and assuming that increasing the hash table size will generally
 reduce collisions if items collide but their hashes are different.

 The patch counts conflicts on an individual insert and not lifetime
 conflicts.  Looks sane to me.

Having a hard limit on the worst-case behaviour certainly sounds like
an attractive prospect. And there's nothing to worry about in terms of
secrecy or sufficient randomness - by default, attackers cannot
generate more than 1000 hash collisions in one lookup, period.

 That said, even with collision counting I'd like a way to disable it without
 changing the code, e.g. a flag or environment variable.

 Agreed.  Paranoid people can turn the behavior off and if it ever were
 to become a problem in practice we could point people to a solution.

Does MAL's patch allow the limit to be set on a per-dict basis
(including setting it to None to disable collision limiting
completely)? If people have data sets that need to tolerate that kind
of collision level (and haven't already decided to move to a data
structure other than the builtin dict), then it may make sense to
allow them to remove the limit when using trusted input.
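
Something like this hypothetical interface would cover it (names
invented purely for illustration; nothing like it exists in the patch
or in CPython):

    class LimitedDict(dict):
        """Hypothetical per-dict collision-limit knob (interface only).

        The real counting would happen in the C implementation; this
        subclass just models the API shape under discussion, including
        limit=None to disable the cap for trusted input.
        """
        def __init__(self, *args, limit=1000, **kwargs):
            self.collision_limit = limit
            super().__init__(*args, **kwargs)

    trusted = LimitedDict(limit=None)   # tolerate unlimited collisions
    default = LimitedDict()             # capped at 1000 per insert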

For maintenance versions though, it would definitely need to be
possible to switch it off without touching the code.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682)

2012-01-13 Thread Georg Brandl
On 01/14/2012 07:53 AM, Nick Coghlan wrote:

 +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);

 Does this API need to be public? If yes, it needs to be documented.
 
 Hmm, good point - that one needs a bit of thought, so I've put it on
 the tracker: http://bugs.python.org/issue13783
 
 (that issue also covers your comments regarding the docstring for this
 function and whether or not we even need the StopIteration instance
 creation API)

Great.

 -#define CALL_FUNCTION        131 /* #args + (#kwargs<<8) */
 -#define MAKE_FUNCTION        132 /* #defaults + #kwdefaults<<8 + 
 #annotations<<16 */
 -#define BUILD_SLICE  133 /* Number of items */
 +#define CALL_FUNCTION   131 /* #args + (#kwargs<<8) */
 +#define MAKE_FUNCTION   132 /* #defaults + #kwdefaults<<8 + 
 #annotations<<16 */
 +#define BUILD_SLICE 133 /* Number of items */

 Not sure putting these and all the other cosmetic changes into an already
 big patch is such a good idea...
 
 I agree, but it's one of the challenges of a long-lived branch like
 the PEP 380 one (I believe some of these cosmetic changes started life
 in Greg's original patch and separating them out would have been quite
 a pain). Anyone that wants to see the gory details of the branch
 history can take a look at my bitbucket repo:
 
 https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29

I see.  I hadn't followed the development of PEP 380 closely before.

In any case, it is probably a good idea to mention this branch URL in the
commit message in case it is meant to be kept permanently (it would also be
possible to put only that branch of your sandbox into another clone at
hg.python.org).

 diff --git a/Objects/abstract.c b/Objects/abstract.c
 --- a/Objects/abstract.c
 +++ b/Objects/abstract.c
 @@ -2267,7 +2267,6 @@

  func = PyObject_GetAttrString(o, name);
  if (func == NULL) {
 -PyErr_SetString(PyExc_AttributeError, name);
  return 0;
  }

 @@ -2311,7 +2310,6 @@

  func = PyObject_GetAttrString(o, name);
  if (func == NULL) {
 -PyErr_SetString(PyExc_AttributeError, name);
  return 0;
  }
  va_start(va, format);

 These two changes also look suspiciously unrelated?
 
 IIRC, I removed those lines while working on the patch because the
 message they produce (just the attribute name) is worse than the one
 produced by the call to PyObject_GetAttrString (which also includes
 the type of the object being accessed). Leaving the original
 exceptions alone helped me track down some failures I was getting at
 the time.

I agree that it's useful.

 I've now made the various CallMethod helper APIs in abstract.c (1
 public, 3 private) consistently leave the GetAttr exception alone and
 added an explicit C API note to NEWS.
 
 (Vaguely related tangent: the new code added by the patch probably has
 a few parts that could benefit from the new GetAttrId private API)

Maybe another candidate for an issue, so that we don't forget?

cheers,
Georg
