Re: A question about hasing in idutils
On Tue, Mar 26, 2024 at 7:25 PM Dmitry Goncharov wrote: > Thank you for a quick response, Greg. > > On Tue, Mar 26, 2024 at 4:56 PM Greg McGary wrote: > > The code makes no effort to be endian-independent because it is written > > for in-memory hashing on a uniprocessor or homogeneous multiprocessor. > > Do you know of a specific difficulty of making that hashing > endian-independent? > No real difficulty--just some trivial extra code & overhead to swap bytes. > > Why do the hash keys and table layout need to be consistent across > > heterogeneous architectures? > > In the gnu make there were occurrences when a change was introduced, > along with a unit test, and tested on little endian. > Later, someone would run the test on big endian and find out that the > test fails, because the test expects make to do something (e.g. remove > temporary files or print targets) in a certain order, and on big > endian the order turns out to be different due to hashing. > We'd then fix the test some way or another, until another occurrence. > So, in the end hashing works well in gnu make. This difference just > introduces unnecessary maintenance work. > This all makes sense. The hash values are exposed indirectly, and the values matter for unit tests. Thanks for explaining! G
Re: A question about hasing in idutils
> It should be easy to fix: swap the bytes in sum_get_unaligned_32. Thanks for the pointer, Andreas. Will try it out. regards, Dmitry
Re: A question about hasing in idutils
Thank you for a quick response, Greg. On Tue, Mar 26, 2024 at 4:56 PM Greg McGary wrote: > The code makes no effort to be endian-independent because it is written > for in-memory hashing on a uniprocessor or homogeneous multiprocessor. Do you know of a specific difficulty of making that hashing endian-independent? > Why do the hash keys and table layout need to be consistent across > heterogeneous architectures? In the gnu make there were occurrences when a change was introduced, along with a unit test, and tested on little endian. Later, someone would run the test on big endian and find out that the test fails, because the test expects make to do something (e.g. remove temporary files or print targets) in a certain order, and on big endian the order turns out to be different due to hashing. We'd then fix the test some way or another, until another occurrence. So, in the end hashing works well in gnu make. This difference just introduces unnecessary maintenance work. regards, Dmitry
Re: A question about hasing in idutils
On Mon, Mar 25, 2024 at 6:17 PM Dmitry Goncharov wrote: > Good morning. > > The hash table from id utils from imported to gnu make. > This hash table hashes the same string to different brackets on little > and big endian. > Can you please shed some light why there needs to be this difference > in hashing on little and big endian? Hi Dmitry, The code makes no effort to be endian-independent because it is written for in-memory hashing on a uniprocessor or homogeneous multiprocessor. Why do the hash keys and table layout need to be consistent across heterogeneous architectures? Are any of the caches now persistent or shared? It's been decades since I wrote that hashing code. It is hacky and ad-hoc, though effective. If you get better performance from another hashing library, feel free to replace it. G
Re: A question about hasing in idutils
On Mär 25 2024, Dmitry Goncharov wrote: > The hash table from id utils from imported to gnu make. > This hash table hashes the same string to different brackets on little > and big endian. > Can you please shed some light why there needs to be this difference > in hashing on little and big endian? It should be easy to fix: swap the bytes in sum_get_unaligned_32. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."