On 06/10/2025 20:29, Pádraig Brady wrote:
On 05/10/2025 01:32, Collin Funk wrote:
I was looking at changing announce-gen to use SHA-256 and SHA3-256
instead of SHA-1 and SHA-256. That led me to discovering the following:

      $ cksum -a sha3 --length=256 --base64 --untagged \
          Makefile > Makefile.sum
      $ cksum -a sha3 --check Makefile.sum
      cksum: Makefile.sum: no properly formatted checksum lines found

The same issue exists for --algorithm=sha2. This patch fixes it:

      $ ./src/cksum -a sha3 --check Makefile.sum
      Makefile: OK
      $ sed 's|[[:graph:]]  Makefile$|  Makefile|g' \
          Makefile.sum > truncated
      $ ./src/cksum -a sha3 --check truncated
      cksum: truncated: no properly formatted checksum lines found

I left the behavior the same for blake2b since 'b2sum' does not support
--base64. I'm not sure if 'cksum -a blake2b' and 'b2sum' should differ
in this case...
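
To illustrate the parsing distinction involved here, a minimal Python sketch
(hypothetical helper names, not the actual coreutils logic): a digest can be
recognised as base64 when it contains a character outside the hex alphabet or
'=' padding, but a base64 string that happens to be all hex digits cannot be
distinguished from a hex digest:

```python
import re

# Hypothetical classifier (not the coreutils implementation).
HEX_RE = re.compile(r'^[0-9a-fA-F]+$')
B64_RE = re.compile(r'^[A-Za-z0-9+/]+={0,2}$')

def classify(digest):
    is_hex = bool(HEX_RE.match(digest))
    is_b64 = bool(B64_RE.match(digest))
    if is_hex and is_b64:
        return 'ambiguous'   # all-hex base64 looks identical to hex
    if is_hex:
        return 'hex'
    if is_b64:
        return 'base64'
    return 'invalid'

print(classify('3q2+7w=='))   # '+' and '=' mark this as base64
print(classify('deadbeef'))   # valid as either encoding
```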

I just thought of another ambiguous edge case with this,
where we have untagged base64 encoded input that happens to be all hex digits.

We can't always use the length to distinguish base64 vs hex as:

    $ cksum -a sha2 -l 384 --base64 --untagged /dev/null | wc -c
    76
    $ cksum -a sha2 -l 256 --untagged /dev/null | wc -c
    76

It's not likely of course, but annoying nonetheless.
The same applies to sha3.
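
The equal lengths above follow directly from the encodings: hex prints 2
characters per byte, while base64 prints 4 characters per 3-byte group.
A quick Python check (the 76 bytes counted above are these 64 digest chars
plus the two separator spaces, the file name, and a newline):

```python
import base64

# A 256-bit digest in hex and a 384-bit digest in base64
# both print as 64 characters, so length alone can't
# distinguish the two encodings.
hex_256 = bytes(32).hex()                 # 32 bytes -> 64 hex chars
b64_384 = base64.b64encode(bytes(48))     # 48 bytes -> 64 base64 chars

print(len(hex_256), len(b64_384))
```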

For the sha2/sha3 cases, the ambiguity only arises between the 256 (hex)
and 384 (base64) lengths, i.e. with 64 characters in the digest. Since 22
of the 64 base64 alphabet characters are also hex digits, we'd hit the
ambiguity (only in the non-default untagged format) once in every this
many files:

  $ numfmt --to=si $(printf '%.f' $(python -c 'print(1/((22/64)**64))'))
  480R

So let's just document that in the code :)


The same ambiguity applies to '--check -a blake2b',
except it could hit in more cases, as smaller -l values are allowed.

The worst case for blake2b is with -l 24 (no padding),
in which case you could hit this issue once in every this many files:

  $ numfmt --to=si $(printf '%.f' $(python -c 'print(1/((22/64)**4))'))
  72

Now you'd only be using -l 24 for content-aware hashing etc.,
not for verification. With more standard sizes like -l 128, 256, 512
you have padding chars, so you don't hit the issue. So the practical
worst case would be -l 120, which is 1 in every 1.9G files,
while the -l 384 case is 1 in every 480 ronna files as calculated above.
So we might just document this case also.
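
The figures above follow from one counting argument; a minimal Python sketch
tying the numbers together (matching the numfmt one-liners):

```python
# Of the 64 characters in the base64 alphabet, 22 are also valid hex
# digits (0-9, a-f, A-F).  An n-character unpadded base64 digest is
# therefore all-hex with probability (22/64)**n, i.e. roughly one
# file in (64/22)**n hits the ambiguity.
def one_in(n_chars):
    return (64 / 22) ** n_chars

print(round(one_in(4)))    # blake2b -l 24:  4 base64 chars -> ~1 in 72
print(one_in(20))          # blake2b -l 120: 20 chars -> ~1 in 1.9e9
print(one_in(64))          # sha2/3 -l 384:  64 chars -> ~4.8e29 (~480 ronna)
```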

cheers,
Padraig
