> > here's the crash now:
> >
> > Traceback (most recent call last):
> >   File "/nix/store/f2krmq3iv5nibcvn4rw7nrnrciqprdkh-python3-3.12.9/lib/python3.12/pdb.py", line 1960, in main
> >     pdb._run(target)
> >   File "/nix/store/f2krmq3iv5nibcvn4rw7nrnrciqprdkh-python3-3.12.9/lib/python3.12/pdb.py", line 1754, in _run
> >     self.run(target.code)
> >   File "/nix/store/f2krmq3iv5nibcvn4rw7nrnrciqprdkh-python3-3.12.9/lib/python3.12/bdb.py", line 627, in run
> >     exec(cmd, globals, locals)
> >   File "/home/karl3/projects/rep/rep/dict.py", line 394, in <module>
> >     doc.update([[val,val]])
> >   File "/home/karl3/projects/rep/rep/dict.py", line 349, in update
> >     super().update(keyhashitems())
> >   File "/home/karl3/projects/rep/rep/dict.py", line 164, in update
> >     assert int.from_bytes(keyhash[:hashbytes], 'big') >> hashshift == newidx
> >            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > AssertionError
> >
> >
> >
> > for keyhash, item in keyhashitems:
> >     assert item != self._sentinel
> >     byteidx = int.from_bytes(keyhash[:hashbytes], 'big')
> >     newidx = byteidx >> hashshift
> >
> >     if self._capacity > 0:
> >         # this block checks for collision with previous stored values
> >         if capacity > self._capacity:
> >             superidx = int.from_bytes(keyhash[:self._hashbytes], 'big') >> self._hashshift
> >         else:
> >             superidx = newidx
> >         place = self.array[superidx]
> >         if place != self._sentinel:
> >             collision = self._key(place)
> >             if collision != keyhash:
> >                 assert superidx == int.from_bytes(collision[:self._hashbytes], 'big') >> self._hashshift
> >                 updates[newidx] = [collision, place, False]
> >
> >     # this separated approach to checking for collisions allows for accepting
> >     # batched data that ends up containing hash collisions solely within itself
> >     placing = updates.get(newidx)
> >     if placing is not None:
> >         collision, place, is_new = placing
> >         while newidx == int.from_bytes(collision[:hashbytes], 'big') >> hashshift:
> >             capacity <<= 1
> >             expansion <<= 1
> >             #spread += 1
> >             #hashbits = self._hashbits + spread
> >             hashbits += 1
> >             hashbytes = (hashbits+7) >> 3
> >             hashshift = (hashbytes << 3) - hashbits
> >             byteidx = int.from_bytes(keyhash[:hashbytes], 'big')
> >             newidx = byteidx >> hashshift
> >         assert capacity == (1 << hashbits)
> >         new_updates = {}
> >         for keyhash, item, is_new in updates.values():
> >             if is_new:
> >                 newnewidx = int.from_bytes(keyhash[:hashbytes], 'big') >> hashshift
> >                 assert newnewidx not in new_updates
> >                 new_updates[newnewidx] = [keyhash, item, True]
> >         updates = new_updates
> >     assert newidx not in updates
> >     assert int.from_bytes(keyhash[:hashbytes], 'big') >> hashshift == newidx
> >     updates[newidx] = [keyhash, item, True]
> >
> >
> > it looks like the problem is that keyhash is shadowed in the loop at the
> > end :D
shadowing is when a variable declared in an inner scope has the same
name as one in an outer scope. many languages scope every block
strictly, but python is not one of them: a for loop does not open a new
scope, so its loop variable rebinds the existing name, which allows
occasional mistakes like this. keyhash is the outer loop's variable. i
then reuse it as the inner loop's variable, which overwrites its value
for the rest of the outer iteration.
usually a shadowed variable causes the opposite bug -- where in the
inner loop, the inner value is used as if it were the outer. here,
since it's python, the crash comes from the inner loop's last value
being used as if it were the outer one after the inner loop has
finished. [...
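
a minimal sketch of the rebinding described above, with made-up names
(leaky, fixed, key, pending are illustrations, not code from
rep/dict.py). it only assumes standard python scoping, where a for loop
variable lives in the enclosing function's scope:

```python
# hypothetical example; none of these names come from rep/dict.py

def leaky(pairs, pending):
    """Reuses `key` in the inner loop, clobbering the outer loop's value."""
    seen = []
    for key, item in pairs:
        # the inner loop rebinds `key` (and leaves it rebound afterwards)
        for key, other in pending:
            pass
        # `key` is now the last key from `pending`, not the one from `pairs`
        seen.append(key)
    return seen


def fixed(pairs, pending):
    """Same loops, but the inner loop uses its own variable name."""
    seen = []
    for key, item in pairs:
        for pending_key, other in pending:
            pass
        # `key` still holds the value bound by the outer loop
        seen.append(key)
    return seen
```

with pairs = [("a", 1), ("b", 2)] and pending = [("x", 0), ("y", 0)],
leaky records the inner loop's last key on every iteration while fixed
records the outer keys. renaming the inner loop variables is probably
the smallest change that avoids the crash.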
>
> now i've got this:
>   File "/home/karl3/projects/rep/rep/dict.py", line 220, in content_generator
>     assert superidx * expansion + subidx == int.from_bytes(dbg_keyhash[:hashbytes], 'big') >> hashshift
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> AssertionError
> Uncaught exception. Entering post mortem debugging
> Running 'cont' or 'step' will restart the program
> > /home/karl3/projects/rep/rep/dict.py(220)content_generator()
> -> assert superidx * expansion + subidx == int.from_bytes(dbg_keyhash[:hashbytes], 'big') >> hashshift
> (Pdb) p item
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>
> but that's just because item is the sentinel, which isn't expected to
> hash to its index. the assert can check for that and skip it
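
a rough sketch of what that check could look like. hash_index and
check_slot are hypothetical helpers, and the 16-zero-byte SENTINEL is
only an assumption taken from the pdb output above. the index
arithmetic mirrors the hashbits / hashbytes / hashshift lines from the
earlier snippet:

```python
# hypothetical helpers; the names and the sentinel value are assumptions
SENTINEL = bytes(16)  # 16 zero bytes, matching the `p item` output


def hash_index(keyhash: bytes, hashbits: int) -> int:
    """Return the top `hashbits` bits of `keyhash` as a slot index."""
    hashbytes = (hashbits + 7) >> 3          # bytes needed to hold hashbits
    hashshift = (hashbytes << 3) - hashbits  # unused low bits to drop
    return int.from_bytes(keyhash[:hashbytes], 'big') >> hashshift


def check_slot(idx: int, item: bytes, keyhash: bytes, hashbits: int) -> None:
    """Assert that a stored item sits at the index its key hash maps to."""
    if item == SENTINEL:
        # empty slot: there is no key, so nothing to verify
        return
    assert hash_index(keyhash, hashbits) == idx
```

for a key hash starting with byte 0xb0, hash_index gives 5 at 3 bits and
11 at 4 bits, i.e. 5 * 2 + 1, which is the superidx * expansion + subidx
relation the failing assert is testing.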