[issue13609] Add "os.get_terminal_size()" function

2012-01-05 Thread Martin v. Löwis

Martin v. Löwis  added the comment:

Quoting Antoine Pitrou:

> Antoine Pitrou  added the comment:
>
> The point of named tuples here is that you can use both
> get_terminal_size().columns
> or
> columns, rows = get_terminal_size()
> depending on the situation.

And my point is that we should choose whichever
of these is more obvious, and drop the other.

> Also, the better repr() makes debugging easier.

A class could still have a nice repr.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13642] urllib incorrectly quotes username and password in https basic auth

2012-01-05 Thread Joonas Kuorilehto

Joonas Kuorilehto  added the comment:

Updated patch for 2.7 hg tip attached. Please review, test and if ok, port to 
3.x.

I guess the URL needs to be quoted so commented out the assertion for the URL 
being equal. I added unquote in the base64 encoding of the password, which 
makes the test pass. Seems to work for me and no urllib tests were broken. Did 
not run others.

http://test.webdav.org/ has some basic auth test accounts configured if you 
want to try it out. You can use wireshark to grab the base64 from the 
unencrypted http. Fancy opener works for me with this now, too.

--
Added file: 
http://bugs.python.org/file24148/tests-and-fakehttp-request-storing-2.diff




[issue13703] Hash collision security issue

2012-01-05 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

Those who use or advocate a simple randomized starting hash (Perl, Ruby, 
perhaps MS, and the CCC presenters) are presuming that the randomized hash 
values are kept private. Indeed, they should be (and the docs could note this) 
unless an attacker has direct access to the interpreter. An attacker who does, 
as in a Python programming class, can much more easily freeze the interpreter 
by 'accidentally' writing code equivalent to "while True: pass".

I do not think we, as Python developers, should be concerned about esoteric 
timing attacks. They strike me as a site issue rather than a language issue. As 
I understand them, they require *large* numbers of probes coupled with 
responses based on the same hash function. So a site being so probed already 
has a bit of a problem. And if hashing were randomized per process, and probes 
were randomly distributed among processes, and processes were periodically 
killed and restarted with new seeds, could such an attack get anywhere (besides 
the DOS effect of the probing)? The point of the CCC talk was that with one 
constant known hash, one could lock up a server for a long while with just one 
upload.

So I think we should copy Perl and Ruby, do the easy thing, and add a random 
seed to 3.3 hashing, subject to keeping equality for equal numbers. Let 
whatever thereby fails, fail, and be updated. For prior versions, add an option 
for strings and perhaps numbers, and document that some tests will fail if 
enabled.

We could also consider, for 3.3, making the output of hash() be different from 
the internal values used for dicts, perhaps by switching random seeds in 
hash(). So even if someone does return hash(x) values to potential attackers, 
they are not the values used in dicts. (This would require a slight change in 
the doc.)

--




[issue13703] Hash collision security issue

2012-01-05 Thread Paul McMillan

Paul McMillan  added the comment:

As Alex said, Java has refused to fix the issue.

I believe that Ruby 1.9 (at least the master branch code that I looked
at) is using murmurhash2 with a random seed.

In either case, yes, these functions are vulnerable to a number of
attacks. We're solving the problem more completely than they did.

--




[issue13701] Remove Decimal Python 2.3 Compatibility

2012-01-05 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

Mark, do you think the C version of decimal is going to happen for 3.3?

If so, it makes little sense to make any changes at all to the current version of 
the pure python code.

Another advantage to leaving the pure python code alone is that it will make 
maintenance easier if we have to make updates (remember, if the spec gets 
updated, that will be considered a bug fix and backported).

--
priority: normal -> low




[issue13642] urllib incorrectly quotes username and password in https basic auth

2012-01-05 Thread Joonas Kuorilehto

Joonas Kuorilehto  added the comment:

> Regarding unittests instead, there is already a method called
> test_userpass_inurl which could be extended with some tests on a
> password containing spaces ( Lib/test/test_urllib.py:263). But what
> I haven't yet understood is: does a user:pass really exist at
> python.org?

Note Lib/test/test_urllib.py:261 ; there is a fake HTTP wrapper in the test. So 
the request is not really sent.

I modified FakeHTTPConnection to store the sent HTTP request. I also copied the 
test you mentioned from python3 to 2.7. The second test I added in the patch 
fails. The test should pass with python2.5 from OS X (did not run the test but 
checked headers against netcat).

Please take a look at the tests I added. I'm not sure if geturl() should return 
the quoted version or not. But certainly the quoted version must not be used in 
the base64. If you think geturl() should return the quoted version, I'm fine 
with that - in principle characters like \n in the password could be bad in an 
URL unless quoted.
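
The point about not using the quoted form in the base64 can be sketched as follows. This is only an illustration written against Python 3's urllib.parse, not the attached patch; the helper name basic_auth_header is hypothetical.

```python
import base64
from urllib.parse import unquote


def basic_auth_header(userinfo):
    """Build a Basic auth header value from a 'user:password' URL component.

    Hypothetical helper: the percent-quoted form is only valid inside the
    URL, so both parts are unquoted before being base64-encoded.
    """
    user, _, password = userinfo.partition(":")
    credentials = "%s:%s" % (unquote(user), unquote(password))
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


# A password containing a space appears as %20 in the URL,
# but the server must receive the real space.
header = basic_auth_header("joe:pass%20word")
print(header)
```

With this shape, geturl() can keep returning the quoted URL while the Authorization header carries the original credentials.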

Maybe the tests could ALSO be added to some other places, but I think this full 
path makes sense to check like this.

--
keywords: +patch
Added file: 
http://bugs.python.org/file24147/tests-and-fakehttp-request-storing.diff




[issue8184] multiprocessing.managers will not fail if listening socket already in use

2012-01-05 Thread Phill

Phill  added the comment:

Rather than listening on a socket, listening on a named pipe

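e.g.:
from multiprocessing.connection import Listener
address = (r'\\.\pipe\Test', 'AF_PIPE')  # (pipe name, address family)
listener = Listener(*address)
conn = listener.accept()  # blocks until a client connects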

It doesn't raise an exception when I run the script again a second time.

Like I said, I don't know much about named pipes and I'm not even sure that's how 
they are intended to work in this context, i.e. if one process is listening, can 
another listen on that named pipe as well?

Phill

--




[issue13703] Hash collision security issue

2012-01-05 Thread Alex Gaynor

Alex Gaynor  added the comment:

Perl is so paranoid they obscure their variable names!  In all seriousness, 
both Perl and Ruby are vulnerable to the timing attacks, and as far as I know 
the JVM is not patching this themselves, but telling applications to fix it 
themselves (I know JRuby switched to Murmurhash).

--




[issue13703] Hash collision security issue

2012-01-05 Thread Christian Heimes

Christian Heimes  added the comment:

Either we are really paranoid (I know that I am *g*) or Perl's and Ruby's 
randomized hashing function suffer from the issues we are worried about. They 
don't compensate for hash(''), hash(n * '\0') or hash(shortstring).

Perl 5.12.4 hv.h:

#define PERL_HASH(hash,str,len) \
     STMT_START { \
        register const char * const s_PeRlHaSh_tmp = str; \
        register const unsigned char *s_PeRlHaSh = (const unsigned char *)s_PeRlHaSh_tmp; \
        register I32 i_PeRlHaSh = len; \
        register U32 hash_PeRlHaSh = PERL_HASH_SEED; \
        while (i_PeRlHaSh--) { \
            hash_PeRlHaSh += *s_PeRlHaSh++; \
            hash_PeRlHaSh += (hash_PeRlHaSh << 10); \
            hash_PeRlHaSh ^= (hash_PeRlHaSh >> 6); \
        } \
        hash_PeRlHaSh += (hash_PeRlHaSh << 3); \
        hash_PeRlHaSh ^= (hash_PeRlHaSh >> 11); \
        (hash) = (hash_PeRlHaSh + (hash_PeRlHaSh << 15)); \
    } STMT_END
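
To make the hash('') and hash(n * '\0') concern concrete, here is a rough Python transcription of the macro above. It is a sketch, assuming 32-bit unsigned wrap-around for U32; the name perl_hash is mine. For empty or all-NUL input nothing from the data is mixed in, so the result is a fixed function of the seed and the length only.

```python
MASK32 = 0xFFFFFFFF  # emulate U32 wrap-around


def perl_hash(data, seed):
    """Transcription of the PERL_HASH macro for a bytes input (sketch)."""
    h = seed & MASK32
    for byte in data:
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    return (h + (h << 15)) & MASK32


# The empty string never enters the loop: its hash depends only on the seed.
print(perl_hash(b"", 1))
# NUL bytes add zero, so only the seed and the length matter.
print(perl_hash(b"\0" * 4, 1))
```

Anyone who can observe these values learns something about the seed itself, which is the situation described above.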

Ruby 1.8.7-p357 st.c:strhash()

#define CHAR_BIT 8
hash_seed = rb_genrand_int32() # Mersenne Twister

register unsigned long val = hash_seed;

while ((c = *string++) != '\0') {
val = val*997 + c;
val = (val << 13) | (val >> (sizeof(st_data_t) * CHAR_BIT - 13));
}

return val + (val>>5);

I wasn't able to find Java's fix quickly. Anybody else?

--




[issue13703] Hash collision security issue

2012-01-05 Thread STINNER Victor

STINNER Victor  added the comment:

"Given that a user has an application with an oracle function that returns the 
hash of a unicode string, an attacker can probe tens of thousands of one- and 
two-character unicode strings. That should give him/her enough data to calculate 
both seeds. hash("") already gives away lots of information about the seeds, 
too."

Sorry, but I don't see how you compute the secret using these data.

You are right, hash("\0") gives some information about the secret. With my 
patch, hash("\0")^1 gives: ((prefix * 103) & HASH_MASK) ^ suffix.

(hash("\0")^1) ^ (hash("\0\0")^2) gives ((prefix * 103) & HASH_MASK) ^ 
((prefix * 103**2)  & HASH_MASK).
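
The two identities above can be checked with a small model. This is a reconstruction from the formulas in this message only, not the patch itself: the function name toy_hash, the 64-bit HASH_MASK and the exact loop shape are assumptions.

```python
HASH_MASK = (1 << 64) - 1  # assumed 64-bit hash width
MULT = 103                 # multiplier as written in the formulas above


def toy_hash(data, prefix, suffix):
    """Assumed shape: prefix-seeded multiply/xor loop, then length and suffix.

    Only defined for non-empty input in this sketch.
    """
    x = (prefix ^ (data[0] << 7)) & HASH_MASK
    for byte in data:
        x = ((MULT * x) & HASH_MASK) ^ byte
    x ^= len(data)
    return x ^ suffix


prefix, suffix = 0x123456789ABCDEF1, 0x0FEDCBA987654321
one = toy_hash(b"\0", prefix, suffix) ^ 1
two = toy_hash(b"\0\0", prefix, suffix) ^ 2

# The suffix cancels out when the two values are xored together.
print(one == ((prefix * MULT) & HASH_MASK) ^ suffix)
print(one ^ two == ((prefix * MULT) & HASH_MASK) ^ ((prefix * MULT ** 2) & HASH_MASK))
```

So two probes remove the suffix from the equation; recovering the prefix from the remaining expression is the open question.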

--




[issue8184] multiprocessing.managers will not fail if listening socket already in use

2012-01-05 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> I'm not sure if that's how named pipes are supposed to behave though

I'm not sure what you mean. Are you creating named pipes yourself?

--




[issue8184] multiprocessing.managers will not fail if listening socket already in use

2012-01-05 Thread Phill

Phill  added the comment:

I have commented out the line:
self._socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

In lib/multiprocessing/connection.py

as a test and it works fine; the problem still persists for named pipes (I'm not 
sure if that's how named pipes are supposed to behave though)
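
For anyone who wants to reproduce the socket half of this without multiprocessing, a minimal sketch follows. The helper name second_bind_fails is made up for the example. Without SO_REUSEADDR the second bind should be refused everywhere; with it the outcome depends on the platform, so that case is only printed.

```python
import socket


def second_bind_fails(reuse_addr):
    """Return True if binding a second socket to a busy port raises OSError."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind(("127.0.0.1", 0))   # let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            return True
        return False
    finally:
        first.close()
        second.close()


print("without SO_REUSEADDR:", second_bind_fails(False))
print("with SO_REUSEADDR:   ", second_bind_fails(True))  # platform dependent
```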

--




[issue13609] Add "os.get_terminal_size()" function

2012-01-05 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> I don't think that it's possible that stdin, stdout and/or stderr have
> their own terminal. I suppose that the 3 streams are always attached to
> the same terminal. So I don't see why the function would take an
> argument. Tell me if I am wrong.

I think it can be useful in case the program creates its own
session/terminal using openpty?

> Instead of using sys.__stdout__.fileno(), you can directly use 1
> because Python always creates sys.__stdout__ from the file descriptor
> 1.

From pythonrun.c:

/* Set sys.stdin */
fd = fileno(stdin);
[...]
/* Set sys.stdout */
fd = fileno(stdout);
[...]
/* Set sys.stderr, replaces the preliminary stderr */
fd = fileno(stderr);

--




[issue13703] Hash collision security issue

2012-01-05 Thread STINNER Victor

STINNER Victor  added the comment:

Note for myself, random-2.patch: _PyRandom_Init() must generate a prefix and a 
suffix different than zero (call PyOS_URandom in a loop, and fail after 100 
tries).
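
The retry loop in that note could look roughly like this, expressed in Python rather than C. It is a sketch only: generate_secret is a hypothetical name, os.urandom stands in for PyOS_URandom, and the 8-byte size is an assumption.

```python
import os


def generate_secret(size=8, max_tries=100, urandom=os.urandom):
    """Return a non-zero random integer, giving up after max_tries draws."""
    for _ in range(max_tries):
        value = int.from_bytes(urandom(size), "little")
        if value != 0:
            return value
    raise RuntimeError("could not obtain a non-zero random value")


# Draw the two secrets independently; both are guaranteed to be non-zero.
prefix = generate_secret()
suffix = generate_secret()
print(prefix != 0 and suffix != 0)
```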

--




[issue8184] multiprocessing.managers will not fail if listening socket already in use

2012-01-05 Thread Phill

Phill  added the comment:

Normally I would be happy to but my combined python experience is about 30 
minutes so I am probably not the man

--




[issue13609] Add "os.get_terminal_size()" function

2012-01-05 Thread STINNER Victor

STINNER Victor  added the comment:

Some comments about termsize.diff.3.

I don't see why there are two functions, one should be enough: 
get_terminal_size() should be dropped, and query_terminal_size() renamed to 
get_terminal_size(). As said before, I don't think that reading ROWS and 
COLUMNS environment variables is useful. If a program chooses to rely on these 
variables, it can reimplement its own "try env var or fallback on 
get_terminal_size()" function.

get_terminal_size() should not have a fallback value: it should raise an error, 
and the caller is responsible for deciding what to do in this case (e.g. catch the 
exception and use its own default value). Most functions in the posix module 
work like this, and it avoids the difficult choice of the right default value. 
(fallback=None is a hack to avoid an exception; it's not Pythonic.)
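
A caller-side wrapper of the kind described above might look like this. It is a sketch assuming the os.get_terminal_size() that later shipped in Python 3.3, which raises OSError when there is no terminal; the wrapper name, the 80x24 default and the COLUMNS/LINES variable names are assumptions of the example.

```python
import os


def terminal_size_or_default(default=(80, 24), environ=None,
                             query=os.get_terminal_size):
    """Try environment variables, then the OS, then a caller-chosen default."""
    environ = os.environ if environ is None else environ
    try:
        return int(environ["COLUMNS"]), int(environ["LINES"])
    except (KeyError, ValueError):
        pass  # variables missing or not numeric: ask the OS instead
    try:
        size = query()
        return size.columns, size.lines
    except OSError:
        # No terminal attached: the caller's default applies.
        return default


print(terminal_size_or_default())
```

The low-level function stays strict, and each program picks the fallback policy that suits it.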

I don't think that it's possible that stdin, stdout and/or stderr have their own 
terminal. I suppose that the 3 streams are always attached to the same 
terminal. So I don't see why the function would take an argument. Tell me if I 
am wrong.

Instead of using sys.__stdout__.fileno(), you can directly use 1 because Python 
always creates sys.__stdout__ from the file descriptor 1.

I think that a tuple (columns, rows) would be just fine. A namedtuple helps 
when you have a variable number of fields, or more than 3 fields, but here you 
just have 2 fields, so it's not too difficult to remember which one contains 
the columns.

I would prefer an optional function, instead of implementing a function raising 
a NotImplementedError. All other posix functions are defined like this.

ioctl() is already exposed in the fcntl module, I don't see a specific test for 
 header. It looks like the module is always compiled on Unix, I 
don't see how fcntl and ioctl are tested in setup.py.

I don't think that you need  to get GetConsoleScreenBufferInfo(), 
 should be enough. So just check for "#ifdef MS_WINDOWS".

Your function is helpful, and it is surprising that nobody proposed to 
implement it in Python. Some libraries did already implement their own function 
(like the "py" library).

--
nosy: +haypo




[issue13703] Hash collision security issue

2012-01-05 Thread STINNER Victor

STINNER Victor  added the comment:

> What I propose is to make the amount of information necessary
> to analyze and generate collisions impractically large.

Not only that: the attacker has to compute the collisions for the new seed. I don't 
know how long that takes; the code to generate collisions is not public yet. I 
suppose that generating collisions takes longer if we change the hash function to 
add more instructions (I don't know how much).

If generating the collisions requires a farm of computers / GPUs / something 
else and 7 days, it doesn't matter if it's easy to retrieve the secret.

If the attacker wants to precompute collisions for all possible seeds, (s)he will 
also have to store them. With 64 bits of entropy, if an attack is 1 byte long, 
you have to store 2^64 bytes (16,777,216 TB).
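
The storage figure can be checked quickly. Note that it uses binary terabytes (2^40 bytes); in decimal terabytes the number is a little over 18 million.

```python
total_bytes = 2 ** 64              # one byte per possible 64-bit seed

binary_tb = total_bytes // 2 ** 40   # terabytes of 2**40 bytes
decimal_tb = total_bytes / 10 ** 12  # terabytes of 10**12 bytes

print(binary_tb)          # 16777216
print(round(decimal_tb))  # 18446744
```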

It is a problem if it takes less than a day with a desktop PC to generate data 
for an attack. In this case, it should be difficult to compute the secret.

--




[issue9993] shutil.move fails on symlink source

2012-01-05 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Thanks! I think we also need a doc update for the change in behaviour (with a 
"versionchanged" tag).

--




[issue12415] Missing: How to checkout the Doc sources

2012-01-05 Thread Sandro Tosi

Sandro Tosi  added the comment:

Hi Éric, did you reconsider the text of the patch, or did this issue just pass 
under your radar?

--




[issue9993] shutil.move fails on symlink source

2012-01-05 Thread Hynek Schlawack

Hynek Schlawack  added the comment:

I took the liberty to fix the tests.

Basically I've adapted them to the new mock based cross file system approach 
(that doesn't depend on luck anymore :)).

I also had to add one more `os.path.realpath` because on some OS (like OS X) 
the tmp directory path already consists of symlinks.

I didn't touch the actual code.

Tested on OS X and Ubuntu Linux (Oneiric).

--
Added file: http://bugs.python.org/file24146/shutil_move_symlinks.patch




[issue13609] Add "os.get_terminal_size()" function

2012-01-05 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

The point of named tuples here is that you can use both
get_terminal_size().columns
or
columns, rows = get_terminal_size()
depending on the situation.
Also, the better repr() makes debugging easier.
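
A small illustration of the two access styles and the repr, using a stand-in function rather than a real terminal query. The type name TerminalSize and the fixed 80x24 value are assumptions for the example.

```python
from collections import namedtuple

# Hypothetical result type; name and field order are assumptions.
TerminalSize = namedtuple("TerminalSize", ["columns", "rows"])


def fake_get_terminal_size():
    """Stand-in that returns a fixed size instead of querying a terminal."""
    return TerminalSize(columns=80, rows=24)


size = fake_get_terminal_size()
print(size.columns)      # attribute access
columns, rows = size     # tuple unpacking
print(columns, rows)
print(repr(size))        # TerminalSize(columns=80, rows=24)
```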

--




[issue13703] Hash collision security issue

2012-01-05 Thread Paul McMillan

Paul McMillan  added the comment:

Marc-Andre: Victor already pasted the relevant part of my code:
http://bugs.python.org/issue13703#msg150568
The link to the fuller version, with revision history and a copy of the code 
before I modified it is here:
https://gist.github.com/0a91e52efa74f61858b5

>Why? The attack doesn't work with short strings? What do you call a "short 
>string"?

Well, the demonstrated collision is for 16 character ascii strings. Worst case 
UTF-8, we're looking at 3 manipulable bytes per character, but they may be 
harder to collide since some of those bytes are fixed.

> only be making it harder for script kiddies, but as soon as someone
> cryptanalyzes the used hash algorithm, you're lost again.

Not true. What I propose is to make the amount of information necessary to 
analyze and generate collisions impractically large. My proposed hash function 
is certainly broken if you brute force the lookup table. There are undoubtedly 
other problems with it too. The point is that it's hard enough. We aren't going 
for perfect security - we're going for enough to make this attack impractical.

What are the downsides to counting collisions? For one thing, it's something 
that needs to be kept track of on a per-dict basis, and can't be cached the way 
the hash results are. How do you choose a good value for the limit? If you set 
it to something conservative, you still pay the collision price every time a 
dict  is created to discover that the keys collide. This means that it's 
possible to feed it bad data up to exactly the limit, and suddenly the python 
app is inexplicably slow. If you set the limit too aggressively, then sometimes 
valid data gets caught, and python randomly dies in hard to debug ways with an 
error the programmer has never seen in testing and cannot reproduce.

It adds a new way to kill most python applications, and so programs are going 
to have to be re-written to cope with it. It also introduces a new place to 
cause errors - if the WSGI server dies, it's hard for my application to catch 
that and recover gracefully.

>... not in Python itself, but if you consider all the types in Python
> extensions and classes implementing __hash__ in user code, the number
> of hash functions to fix quickly becomes unmanageable.

When we looked at the Django project, we wouldn't have anything to fix since 
ours end up relying on the python internal values eventually. I suspect a lot 
of other code is similar.

Mark said:
>What is the mechanism by which the attacker can determine the seeds?

The biggest information leak is probably the ordering in which dict entries are 
returned. This can be used to deduce the underlying hash values. This is much 
easier than trying to do it via timing.

> But that's not the issue we are supposed to be dealing with.
> A single (genuinely random) seed will deal with the attack described in 
> the talk and it is (almost) as fast as using 0 as a seed.

This is not true. A single random seed shifts the hash table, but does not 
actually prevent an attacker from generating collisions. Please see my other 
posts on the topic here and on the mailing list.

--




[issue13609] Add "os.get_terminal_size()" function

2012-01-05 Thread Zbyszek Szmek

Zbyszek Szmek  added the comment:

> I haven't read much of this issue, but I strongly dislike the use of 
> named tuples.
I don't really have a very strong opinion, but (cols, rows) does seem a lot 
like a tuple -- it really is just a pair of values without other function or 
state. Still I would much prefer to say
  get_terminal_size().columns
than
  get_terminal_size()[0]
So a bare tuple seems like the worst choice.

--




[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2012-01-05 Thread Benjamin Peterson

Benjamin Peterson  added the comment:

Closing now.

--
nosy: +benjamin.peterson
resolution:  -> out of date
status: open -> closed




[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build

2012-01-05 Thread Benjamin Peterson

Benjamin Peterson  added the comment:

I'm just going to close this and say "use 3.3".

--
nosy: +benjamin.peterson
resolution:  -> out of date
status: open -> closed




[issue13701] Remove Decimal Python 2.3 Compatibility

2012-01-05 Thread Martin v. Löwis

Martin v. Löwis  added the comment:

I suggest removing all mentions of 2.3 compatibility from the file (I 
actually could find only a single one), for 3.3. Changing 3.2 is out of scope, 
as it isn't a bug fix (except that the one place referring to 2.3 claims that 
there is a comment at the top of the file which actually doesn't exist anymore).

--
nosy: +loewis




[issue13609] Add "os.get_terminal_size()" function

2012-01-05 Thread Martin v. Löwis

Martin v. Löwis  added the comment:

I haven't read much of this issue, but I strongly dislike the use of named 
tuples. Either we want people to use named fields, then we should use a regular 
class (possibly with slots), or we want to define the result as two values, 
then there should be a plain tuple result.

named tuples should only be used as a compatibility mechanism, when the first 
design used tuples, and it was later found that additional values need to be 
returned which would change the number of values (or the original design was 
considered bad because it returned too many positional values to properly keep 
track).
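
For comparison, the "regular class (possibly with slots)" option could be sketched like this. The class name and fields are hypothetical; the point is that named access and a readable repr are kept, while positional unpacking is deliberately not available.

```python
class TerminalSize:
    """Hypothetical plain result class: named fields only, no tuple behaviour."""
    __slots__ = ("columns", "rows")

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def __repr__(self):
        return "TerminalSize(columns=%r, rows=%r)" % (self.columns, self.rows)


size = TerminalSize(80, 24)
print(size.columns)   # named access works
print(repr(size))     # TerminalSize(columns=80, rows=24)

try:
    columns, rows = size   # not iterable, so unpacking is rejected
except TypeError as exc:
    print("unpacking refused:", exc)
```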

--




[issue13697] python RLock implementation unsafe with signals

2012-01-05 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> The first and most simple thing we could do would be to nuke the
> Python version (and also the logging hack at the same time): does that
> sound reasonable ?

To me, yes, but I think it's better to ask on python-dev as for nuking
the Python version.

--




[issue13511] Specifying multiple lib and include directories on linux

2012-01-05 Thread Martin v. Löwis

Martin v. Löwis  added the comment:

See http://www.gnu.org/software/autoconf/manual/autoconf.html
for a description of the includedir and libdir options:

— Variable: includedir
The directory for installing C header files.
— Variable: libdir
The directory for installing object code libraries.

So it just doesn't make sense to have multiple directories in these options, 
and you shouldn't be passing directories that you want to be searched.

Closing this report as invalid.

--
resolution:  -> invalid
status: open -> closed




[issue13717] print fails on unicode '\udce5' surrogates not allowed

2012-01-05 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

The file tree contains a file which has an undecodable character in it. It ends 
up mangled as specified in PEP 383.
Printing such filenames is not directly supported (since they have invalid 
characters in them), but you can work around it in several ways, for example 
escaping all non-ASCII chars: `print(ascii(f))`.

(note that opening the file will still work fine; only outputting the filename 
without special care will fail)
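
Two ways of printing such a name safely are sketched below. The byte string is taken from the report; building the name with an explicit decode is just a way to get the same kind of string without needing the file on disk.

```python
# Recreate the mangled name from the reported bytes (no file needed).
raw = b"gr\xe5kallen.jpg.html"
name = raw.decode("utf-8", "surrogateescape")

# Option 1: ascii() escapes everything outside ASCII, including the surrogate.
escaped = ascii(name)
print(escaped)

# Option 2: encode with backslashreplace, which turns the lone surrogate
# into a visible escape sequence instead of raising.
visible = name.encode("utf-8", "backslashreplace").decode("utf-8")
print(visible)
```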

Python 2 is different since it doesn't attempt to decode filenames at all, it 
just treats them as opaque bytes.

--
nosy: +pitrou




[issue13717] print fails on unicode '\udce5' surrogates not allowed

2012-01-05 Thread Ezio Melotti

Ezio Melotti  added the comment:

On Python 3, os.walk() uses the surrogateescape error handler.  If the filename 
is in e.g. iso-8859-* and the filesystem encoding is UTF-8, decoding '\xe5' 
will then result in '\udce5', and '\udce5' can't then be printed because it's a 
lone surrogate.
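
The mechanism can be reproduced without touching the filesystem. In this sketch the bytes from the report are decoded by hand with the same error handler; the variable names are mine.

```python
raw = b"gr\xe5kallen.jpg.html"   # Latin-1 style name as stored on disk

# Decoding as UTF-8 with surrogateescape maps the invalid byte 0xE5
# to the lone surrogate U+DCE5 instead of failing.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name[2]))

# A strict encode, which is what print() needs, refuses the lone surrogate.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

# Encoding with the same handler restores the original bytes, which is why
# opening the file still works.
print(name.encode("utf-8", "surrogateescape") == raw)
```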

See also 
http://docs.python.org/dev/library/os.html#file-names-command-line-arguments-and-environment-variables

--
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed




[issue13703] Hash collision security issue

2012-01-05 Thread Glenn Linderman

Changes by Glenn Linderman :


--
nosy: +v+python




[issue13717] print fails on unicode '\udce5' surrogates not allowed

2012-01-05 Thread Atle Pedersen

New submission from Atle Pedersen :

I've made a short program to traverse file tree and print file names.

for root, dirs, files in os.walk(path):
    for f in files:
        hex = ' '.join(["%02X"%ord(x) for x in f])
        print('file is',hex,f)

This fails with the following file:

file is 67 72 DCE5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C Traceback (most recent call last):
  File "/home/atle/bin/findpictures.py", line 16, in 
    print('file is',hexa,f)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce5' in position 2: surrogates not allowed

I don't really understand the issue, but this works with Python 2, and fails 
using 3.1.4 (gentoo: dev-lang/python-3.1.4-r3)

Same code using Python 2.7.2 gives:
('file is', '67 72 E5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C', 
'gr\xe5kallen.jpg.html')

--
components: Unicode
messages: 150684
nosy: Atle.Pedersen, ezio.melotti
priority: normal
severity: normal
status: open
title: print fails on unicode '\udce5' surrogates not allowed
type: behavior
versions: Python 3.1




[issue13716] distutils doc contains lots of XXX

2012-01-05 Thread Florent Xicluna

Changes by Florent Xicluna :


--
nosy: +eric.araujo, tarek
versions: +Python 3.2, Python 3.3




[issue13716] distutils doc contains lots of XXX

2012-01-05 Thread Florent Xicluna

New submission from Florent Xicluna :

http://docs.python.org/distutils/apiref.html?highlight=XXX#module-distutils.ccompiler

We find lots of "XXX" and "XXX see also." which give no information.

--
assignee: docs@python
components: Documentation
messages: 150683
nosy: docs@python, flox
priority: normal
severity: normal
stage: needs patch
status: open
title: distutils doc contains lots of XXX
type: behavior
versions: Python 2.7




[issue12042] What's New multiprocessing example error

2012-01-05 Thread Sandro Tosi

Sandro Tosi  added the comment:

Thanks Davi for the report and Jordan for the patch! Jordan, a tip for your 
(hopefully) future contributions: for doc patches, please don't re-indent the 
whole paragraph, since it makes it harder to identify the actual changes; just 
change what is needed, and the committer will take care (if needed) to 
re-indent the paragraph.

I've re-edited a bit and committed the fix in the active branches.

--
nosy: +sandro.tosi
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed
versions:  -Python 3.1




[issue12042] What's New multiprocessing example error

2012-01-05 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset 3353f9747a39 by Sandro Tosi in branch '2.7':
Issue #12042: a queue is only used to retrive results; preliminary patch by 
Jordan Stadler
http://hg.python.org/cpython/rev/3353f9747a39

New changeset 0d4bb1356f39 by Sandro Tosi in branch '3.2':
Issue #12042: a queue is only used to retrive results; preliminary patch by 
Jordan Stadler
http://hg.python.org/cpython/rev/0d4bb1356f39

New changeset e379617b4c4c by Sandro Tosi in branch 'default':
Issue #12042: merge with 3.2
http://hg.python.org/cpython/rev/e379617b4c4c

--
nosy: +python-dev




[issue11984] Wrong "See also" in symbol and token module docs

2012-01-05 Thread Sandro Tosi

Sandro Tosi  added the comment:

Hi Davi, thanks for your report! I've removed the reference from symbol to 
parser, given that the parser docs no longer show how to use symbol and 
instead encourage using the ast module.

--
nosy: +sandro.tosi
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed
versions:  -Python 3.1




[issue11984] Wrong "See also" in symbol and token module docs

2012-01-05 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset d8102ccc5bf7 by Sandro Tosi in branch '2.7':
Issue #11984: remove reference to parser, it's not showing symbol usage anymore
http://hg.python.org/cpython/rev/d8102ccc5bf7

New changeset b326d90ce9c9 by Sandro Tosi in branch '3.2':
Issue #11984: remove reference to parser, it's not showing symbol usage anymore
http://hg.python.org/cpython/rev/b326d90ce9c9

New changeset f375a1be031c by Sandro Tosi in branch 'default':
Issue #11984: merge with 3.2
http://hg.python.org/cpython/rev/f375a1be031c

--
nosy: +python-dev




[issue12926] tarfile tarinfo.extract*() broken with symlinks

2012-01-05 Thread Lars Gustäbel

Lars Gustäbel  added the comment:

This should be fixed now, thanks.

--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed
versions: +Python 3.3




[issue12926] tarfile tarinfo.extract*() broken with symlinks

2012-01-05 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset 573fc99873bd by Lars Gustäbel in branch '3.2':
Issue #12926: Fix a bug in tarfile's link extraction.
http://hg.python.org/cpython/rev/573fc99873bd

New changeset 5936c2005ab7 by Lars Gustäbel in branch 'default':
Merge from 3.2: Issue #12926: Fix a bug in tarfile's link extraction.
http://hg.python.org/cpython/rev/5936c2005ab7

--
nosy: +python-dev




[issue13697] python RLock implementation unsafe with signals

2012-01-05 Thread Charles-François Natali

Charles-François Natali  added the comment:

> Hmm, but that would break single-threaded programs which expect their
> select() (or other) to return EINTR when a signal is received (which is
> a perfectly valid expectation in that case).

Yes, that's why I said "that's another story" ;-)
EINTR is really a pain, and relying on it to return from a syscall
upon signal reception is a bad idea (certain OSes restart syscalls by
default - not select() though - and if the signal is received before
you call the syscall, you'll deadlock). This would IMHO be the best
way to go, but I know we can't reasonably change this now.

> I don't know if that's still useful to build Python without threads. I
> would expect most platforms to have a compatible threads implementation
> (and Python probably can't run on very small embedded platforms).
> Perhaps you can ask on python-dev.

There are other problems; for example, it's very well known that
signals and threads don't mix well (so for example you'd have to block
all signals before spawning the new thread, and reenable them
afterwards).
I'm not sure it's worth the extra complication. I can still try to
write a quick patch to see if it gets somewhere (and doesn't break
test_threading and test_signals).
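
The "block signals around thread creation" step mentioned above can be illustrated at the Python level. This is only a sketch, assuming a POSIX platform and a Python version that provides signal.pthread_sigmask() (3.3+); the helper name start_thread_signals_blocked is hypothetical and is not taken from any patch on this issue. A new thread inherits the signal mask of the thread that creates it, so the mask is widened before start() and restored afterwards.

```python
import signal
import threading


def start_thread_signals_blocked(target, blocked=(signal.SIGINT, signal.SIGTERM)):
    """Start `target` in a thread that has the `blocked` signals masked."""
    # Block the signals in the calling thread; the previous mask is returned.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, blocked)
    try:
        thread = threading.Thread(target=target)
        thread.start()  # the new thread inherits the current (widened) mask
    finally:
        # Restore the caller's original mask whether or not start() succeeded.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return thread


seen = {}


def worker():
    # Calling SIG_BLOCK with an empty set changes nothing and returns
    # the mask currently in effect for this thread.
    seen["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])


t = start_thread_signals_blocked(worker)
t.join()
```

Here only SIGINT and SIGTERM are masked rather than every signal, to keep the example harmless to run.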

The first and simplest thing we could do would be to nuke the
Python version (and also the logging hack at the same time): does that
sound reasonable?

--




[issue8184] multiprocessing.managers will not fail if listening socket already in use

2012-01-05 Thread Charles-François Natali

Charles-François Natali  added the comment:

> If the above gets solved on windows my problem will just go away, thanks

Would you like to propose a patch with test?

--




[issue12760] Add create mode to open()

2012-01-05 Thread Charles-François Natali

Charles-François Natali  added the comment:

I've done a small review.

--




[issue13715] typo in unicodedata documentation

2012-01-05 Thread Ezio Melotti

Changes by Ezio Melotti :


--
assignee: docs@python -> ezio.melotti
nosy: +ezio.melotti
stage:  -> patch review
type:  -> enhancement
versions:  -Python 2.6, Python 3.1, Python 3.4




[issue13702] relative symlinks in tarfile.extract broken (windows)

2012-01-05 Thread Lars Gustäbel

Lars Gustäbel  added the comment:

The dereference option is only used for archive creation: with it, the contents 
of the file a symbolic link points to are added instead of the symbolic link 
itself.
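
A minimal sketch of that behavior, assuming a platform where os.symlink() works for the current user; the file names and the helper archive_member_types are made up for illustration:

```python
import os
import tarfile
import tempfile


def archive_member_types(dereference):
    """Archive a symlink and report (is_regular_file, is_symlink) for the member."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink("data.txt", link)  # relative link to the file next to it

        archive = os.path.join(tmp, "out.tar")
        # dereference only matters here, while the archive is being written.
        with tarfile.open(archive, "w", dereference=dereference) as tar:
            tar.add(link, arcname="link.txt")

        with tarfile.open(archive) as tar:
            member = tar.getmember("link.txt")
            return member.isreg(), member.issym()


with_deref = archive_member_types(dereference=True)
without_deref = archive_member_types(dereference=False)
```

With dereference=True the member is stored as a regular file holding the target's contents; without it the member is stored as a symbolic link.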

--




[issue13702] relative symlinks in tarfile.extract broken (windows)

2012-01-05 Thread Patrick von Reth

Patrick von Reth  added the comment:

To work around the bug I also tried dereference=True, but it looks like Python 3 
ignores it for extraction.
Is this the normal behavior or just another bug?

--




[issue13702] relative symlinks in tarfile.extract broken (windows)

2012-01-05 Thread Lars Gustäbel

Lars Gustäbel  added the comment:

You actually hit two bugs at the same time here: The target of the created 
symlink was not translated from Unix to Windows path delimiters and is 
therefore broken. The second bug is issue12926, which leads to the error in 
TarFile.makefile(). 

Brian, AFAIK all file-specific functions on Windows accept forward slashes in 
pathnames, right? Has this been discussed in the course of the Windows 
implementation of os.symlink()? I could certainly fix the slash translation in 
tarfile.py, but maybe it's os.symlink() that should be fixed.
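
The slash translation in question can be illustrated without a Windows machine, since ntpath is importable on every platform. This is only a sketch of the idea, not the fix that was eventually applied, and the helper name to_windows_linkname is hypothetical:

```python
import ntpath


def to_windows_linkname(linkname):
    """Translate a tar member's link target to Windows path separators."""
    # Tar archives store link targets with forward slashes.
    return linkname.replace("/", ntpath.sep)


converted = to_windows_linkname("../lib/libfoo.so")
```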

--
dependencies: +tarfile tarinfo.extract*() broken with symlinks




[issue13715] typo in unicodedata documentation

2012-01-05 Thread Eli Collins

New submission from Eli Collins :

I noticed a minor typo in the unicodedata.normalize() documentation...

The line reading 'U+0327 (COMBINING CEDILLA) U+0043 (LATIN CAPITAL LETTER C)' 
is not proper Unicode; it should be in the reverse order: 'U+0043 (LATIN 
CAPITAL LETTER C) U+0327 (COMBINING CEDILLA)'.

Attached is a small patch to fix this.
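
For reference, a short check of the ordering with unicodedata.normalize(); the variable names are only for illustration:

```python
import unicodedata

decomposed = "\u0043\u0327"  # LATIN CAPITAL LETTER C, then COMBINING CEDILLA
composed = "\u00c7"          # LATIN CAPITAL LETTER C WITH CEDILLA
reversed_order = "\u0327\u0043"  # combining mark first, as in the current docs

nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
# A combining mark that precedes the base letter has nothing to attach to,
# so NFC leaves this sequence as two separate code points.
nfc_reversed = unicodedata.normalize("NFC", reversed_order)
```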

--
assignee: docs@python
components: Documentation
files: docfix.patch
keywords: patch
messages: 150670
nosy: docs@python, eli.collins
priority: normal
severity: normal
status: open
title: typo in unicodedata documentation
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file24145/docfix.patch




[issue13128] httplib debuglevel on CONNECT doesn't print response headers

2012-01-05 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +orsenthil
versions: +Python 3.3 -Python 2.7




[issue13712] pysetup create should not convert package_data to extra_files

2012-01-05 Thread Erik Bray

Erik Bray  added the comment:

FWIW, I'm for the first option for specifying package_data:

[files]
package_data =
    spam = first second third

I'm pretty sure this is how I ended up implementing it in d2to1, since I needed 
this functionality.

Theoretically spaces could be supported with an escape sequence, but I don't 
think that's worth complicating things for if package_data is deprecated 
anyways.  I'm all for making it difficult for anyone trying to include 
filenames with spaces in their source code.

--




[issue13703] Hash collision security issue

2012-01-05 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> I concur with Marc. The change is too intrusive and may cause too much
> trouble for the issue.

Do you know if mod_wsgi et al. are tackling the issue on their side?

> Also it seems to be unnecessary for platforms with 64bit hash.

We still support Python on 32-bit platforms, so this can't be a serious
argument.
If you think that no-one runs a server on a 32-bit kernel nowadays, I
would point out that "no-one" apparently doesn't include ourselves ;-)

--




[issue13699] test_gdb has recently started failing

2012-01-05 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset a3d4cde1c357 by Vinay Sajip in branch '3.2':
Closes #13699. Skipped two tests if Python is optimised.
http://hg.python.org/cpython/rev/a3d4cde1c357

New changeset 7d87ebbbd718 by Vinay Sajip in branch 'default':
Closes #13699: merged fix from 3.2.
http://hg.python.org/cpython/rev/7d87ebbbd718

--
resolution:  -> fixed
status: open -> closed




[issue13714] Methods of ftplib never ends if the ip address changes

2012-01-05 Thread Giampaolo Rodola'

Giampaolo Rodola'  added the comment:

Python can't do that. It's a socket implementation detail. Python just exposes 
the underlying socket implementation as-is.
I'm closing this out as rejected.

--
assignee:  -> giampaolo.rodola
resolution:  -> rejected
status: open -> closed




[issue13703] Hash collision security issue

2012-01-05 Thread Mark Shannon

Mark Shannon  added the comment:

But that's not the issue we are supposed to be dealing with.
A single (genuinely random) seed will deal with the attack described in 
the talk and it is (almost) as fast as using 0 as a seed.
Why make things complicated dealing with a hypothetical problem?

>> Why should hash("") always return 0?
>> I can't find it in the docs anywhere.
> 
> hash("") should return something constant that doesn't reveal information 
> about the random seeds. 0 is an arbitrary choice that is as good as anything 
> else. hash("") already returns 0, hence my suggestion for 0.

Is special casing arbitrary values really any more secure?
If we special case "", the attacker will just start using "\0" and so on...


--




[issue13714] Methods of ftplib never ends if the ip address changes

2012-01-05 Thread Sworddragon

Sworddragon  added the comment:

If I set the timeout argument, an exception is thrown if the IP address is 
changed. At least that's a workaround, but we should consider whether Python 
should try to detect changes of the IP address.

It would be nicer to continue the file transfer like it does if the connection 
gets lost without a change of the ip address instead of sending the complete 
data again.

--




[issue13714] Methods of ftplib never ends if the ip address changes

2012-01-05 Thread Giampaolo Rodola'

Giampaolo Rodola'  added the comment:

Since you say the connection hangs I think you can set a timeout:

>>> ftp = ftplib.FTP(..., timeout=30)

That is applied to both control and data connection (and hence storbinary). 
This way you should get a socket.timeout exception after 30 seconds.
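
A self-contained sketch of the timeout behavior, using a local listening socket that never answers as a stand-in for a stalled server. The helper name ftp_welcome_or_timeout and the 0.5 second value are only for illustration:

```python
import ftplib
import socket


def ftp_welcome_or_timeout(host, port, timeout):
    """Return the server's welcome line, or None if the socket timed out."""
    ftp = ftplib.FTP(timeout=timeout)  # no host given, so nothing connects yet
    try:
        ftp.connect(host, port)  # also waits for the welcome message
        return ftp.getwelcome()
    except socket.timeout:
        return None
    finally:
        ftp.close()


# A listening socket that accepts the TCP handshake (via its backlog)
# but never sends the FTP welcome line.
silent_server = socket.socket()
silent_server.bind(("127.0.0.1", 0))
silent_server.listen(1)
silent_port = silent_server.getsockname()[1]

result = ftp_welcome_or_timeout("127.0.0.1", silent_port, timeout=0.5)
silent_server.close()
```

In a real backup script the except branch would be the place to reconnect and retry the transfer.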

--




[issue13703] Hash collision security issue

2012-01-05 Thread Christian Heimes

Christian Heimes  added the comment:

It's quite possible that a user has created a function (by mistake or 
deliberately) that gives away the hash of an arbitrary string. We haven't 
taught developers that they shouldn't disclose the hash of a string.

> Why should hash("") always return 0?
> I can't find it in the docs anywhere.

hash("") should return something constant that doesn't reveal information about 
the random seeds. 0 is an arbitrary choice that is as good as anything else. 
hash("") already returns 0, hence my suggestion for 0.
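
As a toy illustration only (this is neither CPython's string hash nor any patch proposed on this issue), a seeded multiplicative hash with a constant value for the empty string could look like the following. The function name seeded_hash, the 64-bit width, and the way the seed is mixed in are all assumptions made for the sketch:

```python
import os

MASK = (1 << 64) - 1  # emulate 64-bit unsigned arithmetic

# One seed per process, drawn from the OS entropy source.
PROCESS_SEED = int.from_bytes(os.urandom(8), "little")


def seeded_hash(text, seed=PROCESS_SEED):
    """Toy multiplicative string hash whose starting state depends on `seed`."""
    if not text:
        # Constant for the empty string, so it reveals nothing about the seed.
        return 0
    x = (seed ^ (ord(text[0]) << 7)) & MASK
    for ch in text:
        x = ((1000003 * x) ^ ord(ch)) & MASK
    return x ^ len(text)
```

Without the special case, the empty string would hash to a value derived directly from the seed, which is the kind of leak described above.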

--




[issue13714] Methods of ftplib never ends if the ip address changes

2012-01-05 Thread Sworddragon

Sworddragon  added the comment:

The problem is that here in Germany, for example, it is very common for the 
provider to disconnect the client every 24 hours and give it a new IP address 
when the router reconnects. This makes it very difficult to send big files with 
ftplib.

For example for daily backups I have written an automatic backup script which 
uses ftplib. The transfer needs some hours and very often it fails (it silently 
never ends) because I got a new ip address.

--




[issue13714] Methods of ftplib never ends if the ip address changes

2012-01-05 Thread Giampaolo Rodola'

Giampaolo Rodola'  added the comment:

What storbinary does is just use a socket to send data.
There's no way for storbinary to ask the socket whether an unexpected event 
such as an IP change occurred, nor should there be.

As a user, you just shouldn't change the IP address while a network app is 
running on that network interface and expect it to keep working or raise an 
exception.  
The consequences are unpredictable and are probably subject to change depending 
on what platform you're on.

In summary, this is not a problem which should be dealt with by base ftplib or 
any other network lib in the stdlib.

--




[issue13703] Hash collision security issue

2012-01-05 Thread Mark Shannon

Mark Shannon  added the comment:

What is the mechanism by which the attacker can determine the seeds?
The actual hash value is not directly observable externally.
The attacker can only determine the timing effects of multiple 
insertions into a dict, or have I missed something?

> - hash("") should always return 0

Why should hash("") always return 0?
I can't find it in the docs anywhere.

--




[issue13714] Methods of ftplib never ends if the ip address changes

2012-01-05 Thread Sworddragon

Sworddragon  added the comment:

If the connection gets lost and reconnected again but the ip address doesn't 
change storbinary() continues the data transfer. But if the ip address was 
changed due to the reconnect storbinary() hangs in a loop.

I expect either that storbinary() detects the change of the ip address and 
continues the data transfer like it does if the ip address has never changed or 
it should throw an exception.

--




[issue13714] Methods of ftplib never ends if the ip address changes

2012-01-05 Thread Giampaolo Rodola'

Giampaolo Rodola'  added the comment:

It seems expected behavior to me, and the same issue should apply to all other 
network libs as well. What would you expect ftplib to do in such case?

--
nosy: +giampaolo.rodola

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13703] Hash collision security issue

2012-01-05 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Paul McMillan wrote:
> 
> This is not something that can be fixed by limiting the size of POST/GET. 
> 
> Parsing documents (even offline) can generate these problems. I can create 
> books that calibre (a Python-based ebook format shifting tool) can't convert, 
> but are otherwise perfectly valid for non-python devices. If I'm allowed to 
> insert usernames into a database and you ever retrieve those in a dict, 
> you're vulnerable. If I can post things one at a time that eventually get 
> parsed into a dict (like the tag example), you're vulnerable. I can generate 
> web traffic that creates log files that are unparsable (even offline) in 
> Python if dicts are used anywhere. Any application that accepts data from 
> users needs to be considered.
> 
> Even if the web framework has a dictionary implementation that randomizes the 
> hashes so it's not vulnerable, the entire python standard library uses dicts 
> all over the place. If this is a problem which must be fixed by the 
> framework, they must reinvent every standard library function they hope to 
> use.
> 
> Any non-trivial python application which parses data needs the fix. The 
> entire standard library needs the fix if it is to be relied upon by applications 
> which accept data. It makes sense to fix Python.

Agreed: Limiting the size of POST requests only applies to *web* applications.
Other applications will need other fixes.

Trying to fix the problem in general by tweaking the hash function to
(apparently) make it hard for an attacker to guess a good set of
colliding strings/integers/etc. is not really a good solution. You'd
only be making it harder for script kiddies, but as soon as someone
cryptanalyzes the hash algorithm in use, you're lost again.

You'd need to use crypto hash functions or universal hash functions
if you want to achieve good security, but that's not an option for
Python objects, since the hash functions need to be as fast as possible
(which rules out crypto hash functions) and cannot easily drop the invariant
"a=b => hash(a)=hash(b)" (which rules out universal hash functions, AFAICT).

IMO, the strategy to simply cap the number of allowed collisions is
a better way to achieve protection against this particular resource
attack. The probability of having valid data reach such a limit is
low and, if configurable, can be made 0.
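
To make the idea concrete, here is a toy sketch of a collision cap. It uses a small chained hash table rather than CPython's open-addressing dict, and the names CappedTable and TooManyCollisions as well as the limits are invented for the example:

```python
class TooManyCollisions(Exception):
    """Raised when one bucket would exceed the configured collision limit."""


class CappedTable:
    """Toy chained hash table that refuses pathological collision chains."""

    def __init__(self, buckets=8, max_collisions=4):
        self.buckets = [[] for _ in range(buckets)]
        self.max_collisions = max_collisions

    def insert(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)  # overwrite, no new collision
                return
        if len(chain) >= self.max_collisions:
            raise TooManyCollisions("bucket already holds %d keys" % len(chain))
        chain.append((key, value))


table = CappedTable(buckets=8, max_collisions=4)
for n in (0, 8, 16, 24):  # small ints hash to themselves: all land in bucket 0
    table.insert(n, str(n))

try:
    table.insert(32, "32")  # a fifth key for the same bucket
    rejected = False
except TooManyCollisions:
    rejected = True
```

Ordinary data spreads across buckets and never gets near the cap, while a crafted key set fails fast instead of degrading every insertion.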

> Of course we must fix all the basic hashing functions in python, not just the 
> string hash. There aren't that many. 

... not in Python itself, but if you consider all the types in Python
extensions and classes implementing __hash__ in user code, the number
of hash functions to fix quickly becomes unmanageable.

> Marc-Andre:
> If you look at my proposed code, you'll notice that we do more than simply 
> shift the period of the hash. It's not trivial for an attacker to create 
> colliding hash functions without knowing the key.

Could you post it on the ticket ?

BTW: I wonder how long it's going to take before someone figures out
that our merge sort based list.sort() is vulnerable as well... its
worst-case performance is O(n log n), making attacks somewhat harder.
The popular quicksort which Python used for a long time has O(n²),
making it much easier to attack, but fortunately, we replaced it
with merge sort in Python 2.3, before anyone noticed ;-)

--
