[issue29535] datetime hash is deterministic in some cases

2019-08-24 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
type: security -> behavior
versions:  -Python 3.5, Python 3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29535] datetime hash is deterministic in some cases

2019-08-24 Thread miss-islington


miss-islington  added the comment:


New changeset 076d0b9f5def35aeb0f8e8aadf658dc35aace81d by Miss Islington (bot) 
in branch '3.8':
bpo-29535: Remove promize about hash randomization of datetime objects. 
(GH-15269)
https://github.com/python/cpython/commit/076d0b9f5def35aeb0f8e8aadf658dc35aace81d


--
nosy: +miss-islington


[issue29535] datetime hash is deterministic in some cases

2019-08-24 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset e9c90aa43144b0be1e4e393e8cb549573437a5da by Serhiy Storchaka in 
branch 'master':
bpo-29535: Remove promize about hash randomization of datetime objects. 
(GH-15269)
https://github.com/python/cpython/commit/e9c90aa43144b0be1e4e393e8cb549573437a5da


--


[issue29535] datetime hash is deterministic in some cases

2019-08-24 Thread miss-islington


Change by miss-islington :


--
pull_requests: +15148
pull_request: https://github.com/python/cpython/pull/15454


[issue29535] datetime hash is deterministic in some cases

2019-08-23 Thread Ashwin Ramaswami


Ashwin Ramaswami  added the comment:

Oh, that change is already in PR 15269, great!

--


[issue29535] datetime hash is deterministic in some cases

2019-08-23 Thread Ashwin Ramaswami


Ashwin Ramaswami  added the comment:

Makes sense, thanks for the explanation. The risk is that if there is code 
that, say, converts a POST dictionary to a dictionary with numeric keys, that 
code could be exploited. Creating a non-deterministic hash doesn't necessarily 
preclude hash(x) = x for a small enough x either. 

Given that other libraries (NumPy, etc.) rely on the numeric hash staying the 
way it is, it makes sense to keep it as it is. When did something that at first 
glance seems to be an implementation detail become, in effect, an API that 
can't change without breaking backwards compatibility, though? (For example, the 
implementation of the numeric hash was changed without any backwards-compatibility 
issues in https://bugs.python.org/issue14621). Might there be a better way to 
clarify this distinction for other features in Python?

I think the way forward for this patch is to keep the datetime hash as it is, 
and remove "datetime" in the parts of documentation that enumerate which data 
types have non-deterministic hashes.

--


[issue29535] datetime hash is deterministic in some cases

2019-08-21 Thread STINNER Victor


Change by STINNER Victor :


--
nosy: +vstinner


[issue29535] datetime hash is deterministic in some cases

2019-08-16 Thread Tim Peters


Tim Peters  added the comment:

I'm with Mark:  leave numeric hashes alone.  There's no reason to change them, 
and in addition to what Mark wrote it's a positively Good Thing that `hash(i) 
== i` for all sufficiently small ints.  Not only is that efficient to compute, 
it guarantees there are no collisions _at all_ in the common case of a dict 
indexed by a contiguous range of integers.
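A minimal sketch of the "no collisions at all" point, assuming a simplified model in which a table whose size is a power of two places a key in slot `hash(key) & (size - 1)` on the first probe. CPython's dict masks the low bits of the hash this way, but real dicts also resize and probe further, so this is only an illustration. The helper `initial_slots` is hypothetical.

```python
def initial_slots(keys, table_size):
    """Hypothetical helper: first-probe slot for each key in a power-of-two table."""
    mask = table_size - 1  # table_size is assumed to be a power of two
    return [hash(k) & mask for k in keys]

# Small ints hash to themselves, so a contiguous range fills distinct slots
slots = initial_slots(range(8), 8)
print(slots)  # [0, 1, 2, 3, 4, 5, 6, 7]
assert len(set(slots)) == len(slots)  # no collisions
```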

The _purpose_ of `hash()` in Python isn't to create an illusion of randomness, 
it's to support efficient dicts and sets.  Mucking with string hashes was a 
pragmatic hack to alleviate concerns about DOS attacks specific to 
string-indexed dicts.  A better technical solution to that would have been to 
introduce a different flavor of dict with guaranteed good worst-case behaviors, 
but, pragmatically, a great many users would never realize that's what they 
really wanted, and it wouldn't have helped pre-existing code at all.

But there's no reason to spread that hack beyond the use cases that needed it, 
and much reason not to.

--
nosy: +tim.peters


[issue29535] datetime hash is deterministic in some cases

2019-08-16 Thread Mark Dickinson


Mark Dickinson  added the comment:

> Why so?

Python's hash needs to obey the invariant that `hash(x) == hash(y)` for any two 
hashable objects with `x == y`. That makes life particularly hard for numeric 
types, because there are a number of *different* numeric types where equality 
can hold across those types. This includes not just built-in types, but third 
party types as well (think of NumPy, gmpy2, SymPy, and other libraries 
providing numbers that need to compare equal to Python numbers with the same 
value).

So for example, `hash(1.5)`,  `hash(Decimal("1.5"))`, `hash(Fraction(3, 2))`, 
`hash(1.5 + 0j)`, `hash(numpy.float32(1.5))`, `hash(bigfloat.BigFloat(1.5, 
precision=200))` must _all_ be equal to one another within a single running 
Python process.
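The cross-type invariant can be checked with standard-library types alone; NumPy, gmpy2 and bigfloat are left out here so that the sketch runs without third-party packages.

```python
from decimal import Decimal
from fractions import Fraction

# Four representations of the same number, from four different types
values = [1.5, Decimal("1.5"), Fraction(3, 2), complex(1.5, 0)]

# They all compare equal to one another...
assert all(v == values[0] for v in values)

# ...so hash() must agree on all of them within one process
print({hash(v) for v in values})  # a set with a single element
```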

Moreover, hash computation needs to be efficient for common types like floats 
and integers, while also not being impossibly slow for other types. (Early 
versions of Decimal's hash did a check to see whether the Decimal instance was 
an exact integer, and if so, converted that Decimal instance to an integer 
before taking its hash. But doing that with `Decimal(1e99)` doesn't go 
well.)
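A sketch of why a modular formulation keeps numeric hashing cheap, assuming CPython's documented numeric hash (reduction modulo `sys.hash_info.modulus`): the hash of an integer-valued Decimal with a huge exponent can be obtained with `pow(10, e, P)` without ever building the integer. The exponent 999999 is an arbitrary illustration, not a value from the thread.

```python
import sys
from decimal import Decimal

P = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit builds

# An integer-valued Decimal hashes like the int it equals ("value mod P"),
# which modular exponentiation computes without materializing 10**999999
d = Decimal("1e999999")
assert hash(d) == pow(10, 999999, P)

# At a size where building the int is cheap, confirm agreement with int's hash
assert hash(Decimal("1e100")) == hash(10**100)
```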

It would definitely be *possible* to:

- compute a hash in a cross-type-compatible way
- do some sort of uniform post-processing of that hash, incorporating 
information from a per-process random salt

The operations described by Melissa O'Neill in her PCG paper give ideas for 
some ways to do such post-processing: regard the hash and the salt as together 
forming a 128-bit integer, and then collapse that 128-bit integer down to a 64-bit
integer using one of the PCG post-processing methods. Note that as far as I 
know there's been no validation of those methods from a cryptographic (rather 
than a statistical) perspective.
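A rough sketch of the salt-plus-post-processing idea, assuming PCG's XSL-RR output function (xor the high half of a 128-bit value into the low half, then rotate right by an amount taken from the top 6 bits). `salted_hash` is hypothetical and is not how CPython hashes anything; as noted above, functions like this have not been validated cryptographically.

```python
import random
from fractions import Fraction

MASK64 = (1 << 64) - 1

def rotr64(x: int, r: int) -> int:
    """Rotate a 64-bit value right by r bits (0 <= r < 64)."""
    r &= 63
    return ((x >> r) | (x << (64 - r))) & MASK64

def salted_hash(value, salt: int) -> int:
    """Hypothetical salted hash: NOT what CPython does.

    Treats hash(value) (high half) and a 64-bit salt (low half) as one
    128-bit integer, then collapses it with PCG's XSL-RR output function.
    """
    h = hash(value) & MASK64                  # cross-type-compatible hash, as unsigned 64-bit
    state = (h << 64) | (salt & MASK64)       # 128-bit combination of hash and salt
    xored = (state ^ (state >> 64)) & MASK64  # XSL: xor high half into low half
    rotation = state >> 122                   # RR: top 6 bits pick the rotation amount
    return rotr64(xored, rotation)

# One salt per process, chosen at random
SALT = random.getrandbits(64)

# Equal values still share a salted hash, so the hash/eq invariant survives
assert salted_hash(1.5, SALT) == salted_hash(Fraction(3, 2), SALT)
```

Because the output depends only on `hash(value)` and the salt, values that compare equal still agree within one process. The same property means that keys whose underlying hashes are exactly equal collide for every salt; the salt only changes which buckets distinct hashes land in.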

However, it would be significant work, be disruptive not just to CPython, but 
to 3rd party packages and to other Python implementations, would slow down 
common hashing operations, and would increase the amount and the complexity of 
code that has to be maintained into the future.

So there's no shortage of reasons *not* to change the numeric hash. What I 
think we're lacking is a single reason *to* change it. Can you give a plausible 
example of a situation where the predictability of the numeric hash can lead to 
possible security issues?

See also the recent issue #37807.

> but *not* if my keys, say, are tuples of strings

Bad example. :-) The hash of a tuple is based on the hash of its contents. So 
if those contents are strings, the tuple benefits from the string hash 
randomization.

mirzakhani:~ mdickinson$ python -c "print(hash(('abc', 'def')))"
-824966383135019564
mirzakhani:~ mdickinson$ python -c "print(hash(('abc', 'def')))"
-5971473524922642515
mirzakhani:~ mdickinson$ python -c "print(hash(('abc', 'def')))"
5384650403450490974
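The session above relies on fresh random seeds. A sketch that pins `PYTHONHASHSEED` in child interpreters makes the comparison repeatable: a tuple of strings changes with the seed, while a tuple of small ints does not. The helper name is hypothetical.

```python
import os
import subprocess
import sys

def hash_in_fresh_process(expr: str, seed: int) -> int:
    """Hypothetical helper: evaluate hash(expr) in a new interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    proc = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(proc.stdout)

for expr in ["('abc', 'def')", "(1, 2)"]:
    print(expr, [hash_in_fresh_process(expr, seed) for seed in (1, 2, 3)])
```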

--


[issue29535] datetime hash is deterministic in some cases

2019-08-15 Thread Ashwin Ramaswami


Ashwin Ramaswami  added the comment:

> Making the numeric hash non-predictable while maintaining its current 
> properties would be difficult.

Why so?

> In fact, I think it's reasonable to assume that there are no websites 
> vulnerable to a DOS via *numeric* hash collisions until we see evidence 
> otherwise. I'd expect that there are *way* more places where a dict is being 
> constructed with string keys in this way than with numeric keys.

That's true, but why do we restrict ourselves to websites? This is how I see 
it: As a Python developer, it seems like my program is immune to hash collision 
DoS if I use strings/bytes as dictionary keys, but *not* if my keys, say, are 
tuples of strings. Why not make the hash non-predictable for all builtin types 
by default?

--


[issue29535] datetime hash is deterministic in some cases

2019-08-15 Thread Mark Dickinson


Mark Dickinson  added the comment:

> shouldn't numerics, datetime objects, and tuples be non-deterministically 
> hashed as well? [...]

Making the numeric hash non-predictable while maintaining its current 
properties would be difficult.

But fortunately, I don't think it's necessary. IIUC, the original DOS attack 
involved carefully-crafted collections of keywords and values being passed to a 
website backend, with that backend then putting those keywords and values into 
a Python dictionary. I'd expect that there are *way* more places where a dict 
is being constructed with string keys in this way than with numeric keys. In 
fact, I think it's reasonable to assume that there are no websites vulnerable 
to a DOS via *numeric* hash collisions until we see evidence otherwise.

FWIW, I'd expect the same to be true for datetime objects; I'm not sure why 
they were originally included. IANASE, but it seems to me that covering Unicode 
strings and bytestrings should be enough in practice.

--
nosy: +mark.dickinson


[issue29535] datetime hash is deterministic in some cases

2019-08-14 Thread Ashwin Ramaswami


Ashwin Ramaswami  added the comment:

Randomizing the hash of datetime objects was first proposed in 
https://bugs.python.org/issue13703#msg151796.

For the same reasons that str and bytes are non-deterministically hashed in 
PEP 456, shouldn't numerics, datetime objects, and tuples be 
non-deterministically hashed as well? They can all be used as dictionary keys 
(additionally, hash(n) begins to repeat when n is a large enough number), so it 
seems they are also susceptible to hash collision DoS attacks.

--
nosy: +dmalcolm


[issue29535] datetime hash is deterministic in some cases

2019-08-14 Thread Christian Heimes


Christian Heimes  added the comment:

PEP 456 explains why hash of str and bytes must be randomized.

I don't know any reason why hash of datetime objects must be randomized. They 
can be deterministic like floats and ints.

--


[issue29535] datetime hash is deterministic in some cases

2019-08-14 Thread Ashwin Ramaswami


Ashwin Ramaswami  added the comment:

Why is it OK for certain hashes (such as those of tuples) to be deterministic, 
while other hashes (such as those of datetime objects) need to be non-deterministic?

--


[issue29535] datetime hash is deterministic in some cases

2019-08-14 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
nosy: +benjamin.peterson


[issue29535] datetime hash is deterministic in some cases

2019-08-14 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +14991
pull_request: https://github.com/python/cpython/pull/15269


[issue29535] datetime hash is deterministic in some cases

2019-08-14 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Another option is to remove the note about the datetime hash. It is an 
implementation detail.

There are other objects with nondeterministic hashes, for example tuples 
containing strings, but we should not document this explicitly or make the hash 
of all tuples nondeterministic.

--


[issue29535] datetime hash is deterministic in some cases

2019-08-13 Thread Ashwin Ramaswami


Ashwin Ramaswami  added the comment:

I've added a PR which should fix this. Do you think the documentation should 
also be updated to change "By default, the :meth:`__hash__` values of str, 
bytes and datetime objects are "salted" with an unpredictable random value." to 
"By default, the :meth:`__hash__` values of str, bytes, datetime.date, 
datetime.time and datetime.datetime objects are "salted" with an unpredictable 
random value."?

Technically, there are other objects in datetime, such as datetime.timedelta, 
whose hash values are deterministic.

--


[issue29535] datetime hash is deterministic in some cases

2019-08-13 Thread Ashwin Ramaswami


Change by Ashwin Ramaswami :


--
keywords: +patch
pull_requests: +14983
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/15264


[issue29535] datetime hash is deterministic in some cases

2019-08-13 Thread Armin Rigo


Change by Armin Rigo :


--
nosy:  -arigo


[issue29535] datetime hash is deterministic in some cases

2019-08-13 Thread Ashwin Ramaswami


Change by Ashwin Ramaswami :


--
nosy: +epicfaace
versions: +Python 3.8, Python 3.9


[issue29535] datetime hash is deterministic in some cases

2017-06-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It looks to me like this difference is not intentional. It is just a 
consequence of the fact that the __hash__ functions for some of these objects 
use the hash of the pickle state, which is a bytes object.
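A hypothetical illustration of the mechanism described here: if `__hash__` simply returns the hash of a packed bytes state, the object inherits whatever salting bytes hashing has. `PackedDate` and its 4-byte layout are made up for this sketch and are not the datetime module's actual code.

```python
import struct

class PackedDate:
    """Hypothetical date-like value whose hash is the hash of its packed bytes state."""

    def __init__(self, year: int, month: int, day: int):
        # Pack into 4 bytes: big-endian 16-bit year, then one byte each for month and day
        self._state = struct.pack(">HBB", year, month, day)

    def __eq__(self, other):
        if not isinstance(other, PackedDate):
            return NotImplemented
        return self._state == other._state

    def __hash__(self):
        # Same value as hash() of the bytes, which is salted per process by default
        return hash(self._state)

d1 = PackedDate(2016, 10, 10)
d2 = PackedDate(2016, 10, 10)
assert d1 == d2 and hash(d1) == hash(d2)
```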

--
nosy: +serhiy.storchaka


[issue29535] datetime hash is deterministic in some cases

2017-06-04 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Rather than changing the documentation, I would prefer to have this fixed for 
date/time/datetime unless MAL has a reason not to make the change.

For timezone, I don't think we really care.  

For timedelta, it is reasonable to always be the same hash.  It isn't 
conceptually different from the float given by s.total_seconds() or a tuple of 
(days, seconds, microseconds).
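The comparison with a tuple can be seen in how timedelta normalizes its arguments: durations built in different ways reduce to the same `(days, seconds, microseconds)` triple, compare equal, and hash equal, which is why treating the timedelta hash like a hash of plain numbers seems natural.

```python
from datetime import timedelta

# Three constructions of the same duration
a = timedelta(days=1)
b = timedelta(hours=24)
c = timedelta(seconds=86400)

# timedelta stores a normalized (days, seconds, microseconds) triple
print((a.days, a.seconds, a.microseconds))  # (1, 0, 0)
print((b.days, b.seconds, b.microseconds))  # (1, 0, 0)

assert a == b == c
assert hash(a) == hash(b) == hash(c)
assert a.total_seconds() == 86400.0
```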

--
nosy: +rhettinger


[issue29535] datetime hash is deterministic in some cases

2017-06-04 Thread Armin Rigo

Changes by Armin Rigo :


--
pull_requests: +2016


[issue29535] datetime hash is deterministic in some cases

2017-02-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
nosy: +belopolsky
versions: +Python 3.6


[issue29535] datetime hash is deterministic in some cases

2017-02-11 Thread Christian Heimes

Christian Heimes added the comment:

I only checked Python 2.7. For Python 3.x it's a bit more complicated:

timedelta: PyObject_Hash(), always the same hash value
date: _Py_HashBytes(), always a randomized hash value
time: _Py_HashBytes() for offset is None, PyObject_Hash() for offset is not None
datetime: _Py_HashBytes() for offset is None, PyObject_Hash() for offset is not None
timezone: PyObject_Hash() (inherited from object)

I don't know why the datetime module doesn't use hash randomization for 
datetime and time objects with an offset. MAL is the master of (date)time. He 
might know.
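One plausible reason for the split (an inference, not something stated in the thread): aware datetime and time objects that denote the same instant compare equal even when their stored fields differ, so their hash has to come from a normalized value, such as the UTC-adjusted time, rather than from the raw field bytes. A short check of that equality constraint:

```python
from datetime import datetime, time, timedelta, timezone

plus10 = timezone(timedelta(hours=10))

# The same instant expressed in +10:00 and in UTC
local_dt = datetime(2016, 10, 10, 10, 0, tzinfo=plus10)
utc_dt = datetime(2016, 10, 10, 0, 0, tzinfo=timezone.utc)

# The stored fields differ...
assert (local_dt.hour, utc_dt.hour) == (10, 0)
# ...but the objects compare equal, so their hashes must match too
assert local_dt == utc_dt
assert hash(local_dt) == hash(utc_dt)

# The same constraint applies to aware time objects
assert time(10, 0, tzinfo=plus10) == time(0, 0, tzinfo=timezone.utc)
assert hash(time(10, 0, tzinfo=plus10)) == hash(time(0, 0, tzinfo=timezone.utc))
```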

--
nosy: +lemburg


[issue29535] datetime hash is deterministic in some cases

2017-02-11 Thread Armin Rigo

Armin Rigo added the comment:

That's not what the docs say.  E.g.: 
https://docs.python.org/3/reference/datamodel.html#object.__hash__ says

By default, the __hash__() values of str, bytes and datetime objects are 
“salted” with an unpredictable random value. Although they remain constant 
within an individual Python process, they are not predictable between repeated 
invocations of Python.

Moreover, this command really prints changing numbers:

~/svn/python/3.7-debug/python -c "import datetime;print(hash(datetime.datetime(2016,10,10,0,0,0,0)))"

--


[issue29535] datetime hash is deterministic in some cases

2017-02-11 Thread Christian Heimes

Christian Heimes added the comment:

Only the hashes of str and bytes are randomized. The types date, datetime and 
time are not subject to hash randomization. Same for int, float, bool and None.

--
nosy: +christian.heimes


[issue29535] datetime hash is deterministic in some cases

2017-02-11 Thread Armin Rigo

New submission from Armin Rigo:

The documentation on hash randomization says that date, time and datetime 
have a hash based on strings, which is therefore nondeterministic across runs 
of Python. Either I am missing a caveat, or the actual implementation does not 
keep its promise when a timezone is attached to the datetime or time object:

~/svn/python/3.7-debug/python -c "import datetime;print(hash(datetime.datetime(2016,10,10,0,0,0,0,datetime.timezone(datetime.timedelta(0,36000)))))"
(this gives -6021186165085109055 all the time)

~/svn/python/3.7-debug/python -c "import datetime;print(hash(datetime.time(0,0,0,0,datetime.timezone(datetime.timedelta(0,36000)))))"
(this gives -3850122659820237607 all the time)
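The report above can be reproduced without relying on fresh random seeds by running each expression in child interpreters with different `PYTHONHASHSEED` values. A sketch using only the standard library; the case names and the helper are hypothetical, and the exact hash values differ between builds.

```python
import os
import subprocess
import sys

SETUP = "import datetime as d; tz = d.timezone(d.timedelta(0, 36000)); "

CASES = {
    "naive datetime": "d.datetime(2016, 10, 10)",
    "aware datetime": "d.datetime(2016, 10, 10, tzinfo=tz)",
    "aware time": "d.time(0, 0, tzinfo=tz)",
}

def hashes_across_seeds(expr, seeds=(1, 2, 3)):
    """Hypothetical helper: hash(expr) computed in fresh interpreters, one per seed."""
    results = []
    for seed in seeds:
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        proc = subprocess.run(
            [sys.executable, "-c", SETUP + f"print(hash({expr}))"],
            env=env, capture_output=True, text=True, check=True,
        )
        results.append(int(proc.stdout))
    return results

for name, expr in CASES.items():
    print(name, hashes_across_seeds(expr))
```

On the CPython versions discussed in this thread, the two aware cases print the same value for every seed, matching the report, while the naive datetime is expected to vary with the seed.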

--
messages: 287601
nosy: arigo
priority: normal
severity: normal
status: open
title: datetime hash is deterministic in some cases
type: security
versions: Python 3.5, Python 3.7
