Re: A question about hasing in idutils

2024-03-27 Thread Greg McGary
On Tue, Mar 26, 2024 at 7:25 PM Dmitry Goncharov 
wrote:

> Thank you for a quick response, Greg.
>
> On Tue, Mar 26, 2024 at 4:56 PM Greg McGary  wrote:
> > The code makes no effort to be endian-independent because it is written
> > for in-memory hashing on a uniprocessor or homogeneous multiprocessor.
>
> Do you know of a specific difficulty of making that hashing
> endian-independent?
>

No real difficulty--just some trivial extra code & overhead to swap bytes.


> > Why do the hash keys and table layout need to be consistent across
> > heterogeneous architectures?
>
> In the gnu make there were occurrences when a change was introduced,
> along with a unit test, and tested on little endian.
> Later, someone would run the test on big endian and find out that the
> test fails, because the test expects make to do something (e.g. remove
> temporary files or print targets) in a certain order, and on big
> endian the order turns out to be different due to hashing.
> We'd then fix the test some way or another, until another occurrence.
> So, in the end hashing works well in gnu make. This difference just
> introduces unnecessary maintenance work.
>

This all makes sense. The hash values are exposed indirectly, and the values
matter for unit tests. Thanks for explaining!

G


Re: A question about hasing in idutils

2024-03-26 Thread Dmitry Goncharov
> It should be easy to fix: swap the bytes in sum_get_unaligned_32.

Thanks for the pointer, Andreas.
Will try it out.

regards, Dmitry



Re: A question about hasing in idutils

2024-03-26 Thread Dmitry Goncharov
Thank you for a quick response, Greg.

On Tue, Mar 26, 2024 at 4:56 PM Greg McGary  wrote:
> The code makes no effort to be endian-independent because it is written
> for in-memory hashing on a uniprocessor or homogeneous multiprocessor.

Do you know of a specific difficulty of making that hashing endian-independent?

> Why do the hash keys and table layout need to be consistent across
> heterogeneous architectures?

In the gnu make there were occurrences when a change was introduced,
along with a unit test, and tested on little endian.
Later, someone would run the test on big endian and find out that the
test fails, because the test expects make to do something (e.g. remove
temporary files or print targets) in a certain order, and on big
endian the order turns out to be different due to hashing.
We'd then fix the test some way or another, until another occurrence.
So, in the end hashing works well in gnu make. This difference just
introduces unnecessary maintenance work.

regards, Dmitry



Re: A question about hasing in idutils

2024-03-26 Thread Greg McGary
On Mon, Mar 25, 2024 at 6:17 PM Dmitry Goncharov 
wrote:

> Good morning.
>
> The hash table from id utils from imported to gnu make.
> This hash table hashes the same string to different brackets on little
> and big endian.
> Can you please shed some light why there needs to be this difference
> in hashing on little and big endian?


Hi Dmitry,

The code makes no effort to be endian-independent because it is written
for in-memory hashing on a uniprocessor or homogeneous multiprocessor.
Why do the hash keys and table layout need to be consistent across
heterogeneous architectures? Are any of the caches now persistent or shared?

It's been decades since I wrote that hashing code. It is hacky and ad-hoc,
though effective. If you get better performance from another hashing
library,
feel free to replace it.

G


Re: A question about hasing in idutils

2024-03-26 Thread Andreas Schwab
On Mär 25 2024, Dmitry Goncharov wrote:

> The hash table from id utils from imported to gnu make.
> This hash table hashes the same string to different brackets on little
> and big endian.
> Can you please shed some light why there needs to be this difference
> in hashing on little and big endian?

It should be easy to fix: swap the bytes in sum_get_unaligned_32.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."