Jeroen Demeyer added the comment:
Correction: the FNV variant of SeaHash only fails the new testsuite, not the
old one. The DJB variant of SeaHash fails both.
--
___
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
SeaHash seems to be designed for 64 bits. I'm guessing that replacing the
shifts by
x ^= ((x >> 16) >> (x >> 29))
would be what you'd do for a 32-bit hash. Alternatively, we could always
compute the hash with 64 bits (using uint64_t)
Jeroen Demeyer added the comment:
This weekend I realized something important which I didn't realize before: some
hash functions which I assumed to be good (i.e. small chance of collisions
between any given two tuples) turned out to often fail the tests. This is
because you don't just want
Jeroen Demeyer added the comment:
> the author wants this transformation to be easily invertible, so a prime is
> necessary
A multiplication by any odd number modulo 2**64 is invertible. As I argued
before, the concept of primes is meaningless (except for the prime 2) when
computing
Jeroen Demeyer added the comment:
I spent about 2 days doing an extensive study of the FNV and DJB algorithms. I
share my conclusions below.
To be very clear what I mean, I am talking about the following algorithms (t is
a tuple and m is the multiplier which is always assumed to be odd
Jeroen Demeyer added the comment:
> Replacing DJBX33A's multiplier of 33 is also a different algorithm. So is
> working with inputs other than unsigned bytes.
I would argue that this is just extending the parameters of the algorithm. If
the algorithm is general enough, then that sho
Jeroen Demeyer added the comment:
> The two-liner above with the xor in the second line is exactly Bernstein 33A,
> followed by a permutation of 33A's _output_ space.
Not output space, but internal state (I assume that you do that operation
inside the loop). It's replacing D
Jeroen Demeyer added the comment:
> please restore the original tuple hash test.
Sure. I wasn't sure what to do and was I afraid that having 2 tests for tuple
hashes would be too much. If that's OK for you, then surely I will restore the
t
Jeroen Demeyer added the comment:
> Do you believe any other multiplier would work better toward that end?
Absolutely. Ideally, the multiplier should just be a random 64-bit number.
--
___
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
> There are _two_ hash functions at play in that collision: the tuple hash
> function, and the integer hash function. Regardless of whether the _tuple_
> hash function does [anything involving just `t`], that only directly affects
>
Jeroen Demeyer added the comment:
Regarding t ^= t << 7: I tested PR 9471 with all shift values from 1 to 20. The
new testsuite passed for all shifts from 1 to 13, except for 6. It failed for 6
and for 14 to 20. This indicates that smaller shift values are better (even
when looking
Jeroen Demeyer added the comment:
> I want to leave low-order hash bits alone. That's deliberate.
Since I didn't understand the rationale for this and since shifting << 1 also
seems to work well, I edited PR 9471 to use DJBX33A with t ^= t << 1.
Since you insisted
Jeroen Demeyer added the comment:
> And one more:
x = (x * mult) ^ t;
also appears to work equally well.
The order of operations does not really matter: you can write the loop as
x *= mult # Appears only in FNV-1
x ^= t[0]
x *= mult
x ^= t[1]
x *= mult
x ^= t[2]
x *= mult
x ^
Jeroen Demeyer added the comment:
> j is even implies (j ^ -3) == -(j ^ 3)
This follows from what I posted before: if j is even, then j ^ 3 is odd, so we
can apply the rule x ^ -2 = -x to x = j ^ 3:
(j ^ 3) ^ -2 = -(j ^ 3)
which implies
j ^ (3 ^ -2) = -(j ^ 3)
or equivalently
j ^
Jeroen Demeyer added the comment:
> When testing what, specifically? And the standard 32-bit FNV multiplier, or
> the standard 64-bit FNV multiplier?
FNV-1a with the t ^= 2 * t mangling running my new testsuite on either PR 9471
or PR 9534 using the 64-bit FNV multiplier to produce
Jeroen Demeyer added the comment:
> The low bits are already un-improvable in the most important cases.
Maybe you misunderstood me. Suppose that there is a hash collision, say
hash((3, 3)) == hash((-3, -3)) and you change the hashing algorithm to fix this
collision. If that change d
Jeroen Demeyer added the comment:
> It would be good if PyType_Ready() will check that base class of static type
> is static.
What's the rationale for this change? It's not explained in this bug report nor
in the code.
--
nosy: +jd
Change by Jeroen Demeyer :
--
pull_requests: +8942
___
Python tracker
<https://bugs.python.org/issue32797>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
I created a new PR based on Tim's t ^= t << 7 idea, except that I'm using << 1
instead, to have more mixing in the lower bits.
With the standard FNV multiplier on 64 bits, I did get collisions while
testing. I haven't figured out exactly why th
Change by Jeroen Demeyer :
--
pull_requests: +8937
___
Python tracker
<https://bugs.python.org/issue34751>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
> In the absence of a real analysis, the intuition is simply that "t ^= t << 7"
> will clear masses of leading sign bits when hashing "small" negative integers.
That's a clever solution. If you want to go that route, I would r
Jeroen Demeyer added the comment:
> For example, FNV-1a has far better "avalanche" behavior than Bernstein
We don't care about that though. We want to have no collisions, not a random
output.
--
___
Python tracker
<https:/
Jeroen Demeyer added the comment:
> stuff like "t += t >> 16" is a many-to-one function, not a permutation
Yes, I am aware of that. However, the number of collisions here is really quite
small. It's very unlikely to hit one by accident.
I also chose >> over <<
Jeroen Demeyer added the comment:
While writing up the analysis above, it occurred to me that collisions already
happen for 2-tuples:
>>> hash((3, -2)) == hash((-3, 0))
True
These kind of 2-tuples of small integers don't look contrived at all. I can
easily see them
Jeroen Demeyer added the comment:
> Has anyone figured out the real source of the degeneration when mixing in
> negative integers?
The underlying reason for the collisions is the following mathematical relation:
x ^ -x = -2 << i
where i is the index of the smallest
Jeroen Demeyer added the comment:
> We shouldn't feel shoved into altering something that we don't agree is broken
It *is* broken. You are just denying the reality.
That's also the reason that I'm insisting so much: I don't want to push my
personal fix (despite that it may seem so), I w
Jeroen Demeyer added the comment:
> the made-up hacks Python used to worm around a class of gross flaws its prior
> DJBX33X approach suffered, taking DJBX33X out of its original context and
> applying it in an area it wasn't designed for.
But we know why the DJBX33A hash didn't wor
Jeroen Demeyer added the comment:
Thanks for the reference, I never heard of the FNV hash. Unfortunately, I
haven't seen a reference for the rationale of how they pick their multiplier.
It's not clear what you are suggesting though. Keep using the FNV-ish hash
somehow? Or using a DJBX33A
Jeroen Demeyer added the comment:
> I'm not aware of any research papers about picking multipliers in this
> context, but would love to see one.
The only real condition that I can think of is that the order should be large:
we do not want MULTIPLIER**n = 1 (mod 2**N) for a small nu
Jeroen Demeyer added the comment:
> Which was your original suggestion. Which you appear to be opposed to now?
> I'm unclear about why, if so.
I'm not strictly opposed to that. It's just that I have less confidence in the
current ad-hoc hash compared to something following the D
Jeroen Demeyer added the comment:
> I don't know that primes are important here, but neither do I know that
> they're _not_ important here.
Hashes are effectively computed modulo 2**N. "Primes" are meaningless in that
setting (except for the prime 2 which does have a meani
Jeroen Demeyer added the comment:
I pushed a minor update to the PR, changing the MULTIPLIER and explaining the
chosen constants.
--
___
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
Concerning the MULTIPLIER:
> Why do you claim the original was "too small"? Too small for what purpose?
If the multiplier is too small, then the resulting hash values are small too.
This causes collisions to appear for smaller numbers:
Jeroen Demeyer added the comment:
> FWIW, the current algorithm also has some advantages that we don't want to
> lose. In particular, the combination of the int hash and tuple hash are good
> at producing distinct values for non-negative numbers:
>>> from iterto
Jeroen Demeyer added the comment:
> Further, that example specifically exploits that "hash(-1) == hash(-2)"
No, it does not. There is no hashing of -1 involved in the example
hash((1,0,0)) == hash((1,-2,-2)).
--
___
Python tr
Jeroen Demeyer added the comment:
I should also make it clear that the collision hash((1,0,0)) == hash((1,-2,-2))
that I reported is due to the algorithm, it's not due to some bad luck that 2
numbers happen to be the same. There are also many more similar hash collisions
for tuples (all
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +8884
stage: resolved -> patch review
___
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
> So, jdemeyer, if it's possible to show (or describe) to us an example of a
> problem you had, such that we could repeat it, that would be helpful (and
> necessary) to evaluate any proposed changes. What were the inputs to hash()
> that caus
Jeroen Demeyer added the comment:
For the record: my collision is not contrived. It came up in actual code where
I was hashing some custom class using tuples and found an unexpected high
number of collisions, which I eventually traced back to the collision I
reported here.
By the way, I
Jeroen Demeyer added the comment:
To come back to the topic of hashing, with "Bernstein hash" for tuples I did
indeed mean the simple (in Python pseudocode):
# t is a tuple
h = INITIAL_VALUE
for x in t:
z = hash(x)
h = (h * MULTIPLIER) + z
return h
I actually start
Jeroen Demeyer added the comment:
> Since I don't believe we've had other reports of real-life pathologies since
You just did in this bug report. Why doesn't that count?
--
___
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
> ISTM, you're being somewhat aggressive and condescending.
Interestingly, I felt the same with your reply, which explains my rapid
reversion of your closing of the ticket. The way I see it: I pointed out a bug
in tuple hashing and you just clo
Jeroen Demeyer added the comment:
> Nor is it hard to come up with collisions for most simple hash functions.
The Bernstein hash algorithm is a simple algorithm for which it can be proven
mathematically that collisions cannot be generated if the multiplier is
unknown. That is an object
New submission from Jeroen Demeyer :
It's not hard to come up with a hash collision for tuples:
>>> hash( (1,0,0) )
2528505496374819208
>>> hash( (1,-2,-2) )
2528505496374819208
The underlying reason is that the hashing code mixes ^ and *. This can create
collisions beca
Change by Jeroen Demeyer :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.or
Change by Jeroen Demeyer :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.or
Change by Jeroen Demeyer :
--
pull_requests: +8542
___
Python tracker
<https://bugs.python.org/issue25750>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
> Or, define a new "get_sourcemap()" method that could return additional
> metadata, e.g. a line number mapping.
What's your use case for that? I am asking because it might make more sense for
get_source() (or whatever its replaceme
Jeroen Demeyer added the comment:
> I would love, for example, to be able to get stack traces within extension
> modules integrated into Python tracebacks if they are compiled with some
> appropriate debug flags.
Awesome idea, I want to work on that :-)
This should be possi
Jeroen Demeyer added the comment:
I think I could make linecache use the ResourceReader API to find the source
file instead of looking in the sys.path entries. However, that's orthogonal to
PR 6653: we still need PR 6653 to continue looking for the source file in the
first place
Jeroen Demeyer added the comment:
I just realized the existence of
https://docs.python.org/3/library/importlib.html#importlib.abc.ResourceReader
Isn't this *exactly* what we need here?
--
___
Python tracker
<https://bugs.python.org/issue32
Jeroen Demeyer added the comment:
> If you're OK with that, I don't see the problem.
I recall that we already agreed several months ago, when I submitted
https://github.com/python/cpython/pull/6653
--
___
Python tracker
<https://bugs.pyth
Jeroen Demeyer added the comment:
> One other thing I will say is that the PEP 302 APIs were seemingly designed
> for working with Python source and were not necessarily oriented towards
> alternative code representations that were executed
That's also my impression. I'm perfe
Jeroen Demeyer added the comment:
> Now, though, we need a way for SageMath to get it working on releases up to
> and including 3.7, *without* any help from Cython or CPython
That's not really what I need. We already have a work-around in SageMath:
try:
from importlib.machinery
Jeroen Demeyer added the comment:
> So, where is the filename coming from?
Python 3.7.0 (default, Jul 23 2018, 10:07:21)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sage.all
Jeroen Demeyer added the comment:
> The one possible fly in the ointment is if there are use cases that we
need to support where a single .so/.pyd file is built from *multiple*
source files, as get_source doesn't allow for that.
Yes, we must support that. A cython module may have multi
Jeroen Demeyer added the comment:
> Keeping two identical package structures in different places, one for .py
> files and generated .so files, and one for Cython source files (.pyx, .pxd,
> .pxi), is not desirable from a user perspective, and would require namespace
&
Jeroen Demeyer added the comment:
Nick: do you agree that this "source-only metapath" should by default contain
an entry to look in sys.path for source files?
--
___
Python tracker
<https://bugs.python.o
Jeroen Demeyer added the comment:
> In a subdirectory similar to __pycache__.
I thought about that, but I don't like the idea because then the directory
structure of the actual sources would be different from the directory structure
of the installed sources. So for example, how should &
Jeroen Demeyer added the comment:
To everybody who mentioned that Cython sources don't belong on sys.path: where
*do* they belong then?
One issue which hasn't been mentioned before is packaging. By installing Cython
sources along with the generated .so files, we can re-use all the existing
Jeroen Demeyer added the comment:
> In my view (and that of the documentation for sys.path), sys.path is where
> you put things that the Python interpreter can load - .so files, .pyc files
> and .py files.
It's quite typical for packages to install stuff in site-packa
Jeroen Demeyer added the comment:
> Then instead of adding the source directory to sys.path
What's wrong with that? Installing the .pyx sources together with the .so
compiled modules makes a lot of sense to me: it is very analogous to installing
the .py sources together with the .pyc b
Jeroen Demeyer added the comment:
I started a discussion at
https://mail.python.org/mm3/archives/list/distutils-...@python.org/thread/4FHEGHZYWCDRWVPGYLAU5VUS5BAX73MO/
--
___
Python tracker
<https://bugs.python.org/issue32
Change by Jeroen Demeyer :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<https://bugs.python.org/issue34126>
___
___
Python-bugs-list
Jeroen Demeyer added the comment:
I would prefer to wait with this issue until PEP 580 has been decided. If it's
accepted, it will change function calling code a lot. As I wrote in PEP 580, I
was planning to have a PEP on profiling anyway
Jeroen Demeyer added the comment:
Linker issues are always tricky...
I understand that there is no problem within libpython, so the questions below
refer to extension modules. I am asking them from the point of view of somebody
writing Python extension modules who is clueless about Cygwin
Jeroen Demeyer added the comment:
For those who are not very aware of Cygwin issues: what is wrong with
PyVarObject_HEAD_INIT(_Type, 0);
--
nosy: +jdemeyer
___
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
I always assumed that enabling profiling only from the Python bytecode
interpreter was a deliberate choice.
--
nosy: +jdemeyer
___
Python tracker
<https://bugs.python.org/issue29
Jeroen Demeyer added the comment:
> For compatibility with extensions built with older Pythons you should define
> new type flag and read tp_fastcall only if the flag is set.
Can you comment on https://github.com/python/cpython/pull/4944 why you think
that such compatibility
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +8091
stage: -> patch review
___
Python tracker
<https://bugs.python.org/issue34287>
___
___
Py
New submission from Jeroen Demeyer :
A METH_NOARGS function has a second unused argument which is always NULL (this
is guaranteed by the documentation).
However, some functions in Modules/_io/bufferedio.c actually that second NULL
argument. This is technically not a bug, but it looks more
Jeroen Demeyer added the comment:
OK, I closed this without applying the change. It means one extra special case
in PEP 580, but it's not a big deal.
--
___
Python tracker
<https://bugs.python.org/issue34
Change by Jeroen Demeyer :
--
stage: patch review -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.org/issue34280>
___
___
Pyth
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +8077
stage: -> patch review
___
Python tracker
<https://bugs.python.org/issue34280>
___
___
Py
New submission from Jeroen Demeyer :
A C function with signature METH_NOARGS takes two arguments where the first is
"self" and the second is guaranteed to be NULL. Given that this second argument
really should never be used, I would like to drop the guarantee that the
argume
Jeroen Demeyer added the comment:
Also, some tools may want to edit the library after installation. Rebasing on
Cygwin is an example of that.
--
___
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
> Isn't it useful to avoid accidental change while open files with editor for
> just reading?
Why would that argument apply to a binary file (and only to binary files)?
> Is there any real world problem about read-only library?
It makes it slight
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +8012
stage: -> patch review
___
Python tracker
<https://bugs.python.org/issue34245>
___
___
Py
New submission from Jeroen Demeyer :
In Makefile.pre.in, there is this:
# Shared libraries must be installed with executable mode on some systems;
# rather than figuring out exactly which, we always give them executable mode.
# Also, making them read-only seems to be a good idea
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +7942
stage: -> patch review
___
Python tracker
<https://bugs.python.org/issue34125>
___
___
Py
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +7939
stage: -> patch review
___
Python tracker
<https://bugs.python.org/issue34190>
___
___
Py
Jeroen Demeyer added the comment:
Just to avoid duplicate work: I can have a look at this.
--
___
Python tracker
<https://bugs.python.org/issue34190>
___
___
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +7834
stage: -> patch review
___
Python tracker
<https://bugs.python.org/issue34126>
___
___
Py
Jeroen Demeyer added the comment:
Also:
>>> import sys; sys.setprofile(lambda *args: None)
>>> dict.get([], "foo")
Segmentation fault
--
title: Profiling dict.get() crashes Python -> Profiling cert
Jeroen Demeyer added the comment:
Simple patch coming up...
--
___
Python tracker
<https://bugs.python.org/issue34126>
___
___
Python-bugs-list mailin
New submission from Jeroen Demeyer :
>>> import sys; sys.setprofile(lambda *args:None)
>>> dict.get()
Segmentation fault
--
components: Interpreter Core
messages: 321747
nosy: jdemeyer
priority: normal
severity: normal
status: open
title: Profiling dict.get() crash
New submission from Jeroen Demeyer :
Enable profiling for C functions:
>>> def prof(frame, typ, arg):
... if typ.startswith("c_"):
... print(arg, typ)
>>> import sys; sys.setprofile(prof)
and notice how profiling depends on **kwargs:
Jeroen Demeyer added the comment:
Real world example where this actually came up:
https://github.com/jupyter-widgets/ipywidgets/issues/430#issuecomment-247016263
--
nosy: +jdemeyer
___
Python tracker
<https://bugs.python.org/issue19
Jeroen Demeyer added the comment:
> While this is a obvious bug, f.__module__ = f seems very unnatural.
Sure.
--
___
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
By the way, there is a precedent for OverflowError on infinities:
>>> import math
>>> int(math.inf)
Traceback (most recent call last):
File "", line 1, in
OverflowError: cannot conve
Jeroen Demeyer added the comment:
I'm certainly in favor of adding a way for infinite iterators to state that
they are infinite. I just feel like TypeError is way overused in Python. I also
disagree that it's a category error: it's an iterable, so it does make sense to
ask for its length
Jeroen Demeyer added the comment:
Why TypeError? Wouldn't OverflowError be more suitable?
--
nosy: +jdemeyer
___
Python tracker
<https://bugs.python.org/issue33
Jeroen Demeyer added the comment:
I just posted to python-dev about this issue:
https://mail.python.org/pipermail/python-dev/2018-June/153959.html
However, I think that comparing using "==" is the right thing to do. So I think
that
>>> [].append == [].append
False
s
Jeroen Demeyer added the comment:
> This is a duplicate of issue1617161.
Well, it's really the opposite. That issue seems to be arguing that __self__
should be compared using "is" while I think it should be compared using "==".
--
___
New submission from Jeroen Demeyer :
Methods of Python functions compare equal if the functions are equal and if
__self__ is equal:
>>> class X:
... def __eq__(self, other): return True
... def meth(self): pass
>>> X().meth == X().meth
True
This is because X() ==
Change by Jeroen Demeyer :
--
pull_requests: +7099
___
Python tracker
<https://bugs.python.org/issue5755>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
> 2.7 doesn't have CFLAGS_NODIST
I meant backporting this patch as-is to 2.7 without adding -Wstrict-prototypes
to CFLAGS_NODIST
--
___
Python tracker
<https://bugs.python.org/iss
Jeroen Demeyer added the comment:
> Can we backport this to 3.7 and 3.6? I think it's safe and helpful.
And 2.7 for the same reasons.
--
___
Python tracker
<https://bugs.python.org/iss
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
> Or just remove it
I updated the PR to do that.
I didn't want to propose that initially because that patch was proposed here
almost 2 years ago but not accepted.
--
___
Python tra
Change by Jeroen Demeyer <j.deme...@ugent.be>:
--
pull_requests: +6476
stage: -> patch review
___
Python tracker <rep...@bugs.python.org>
<https://bugs.py
401 - 500 of 586 matches
Mail list logo