Pádraig Brady writes:

> On 05/10/2025 01:32, Collin Funk wrote:
>> I was looking at changing announce-gen to use SHA-256 and SHA3-256
>> instead of SHA-1 and SHA-256. That led me to discover the following:
>>      $ cksum -a sha3 --length=256 --base64 --untagged \
>>          Makefile > Makefile.sum
>>      $ cksum -a sha3 --check Makefile.sum
>>      cksum: Makefile.sum: no properly formatted checksum lines found
>> The same issue exists for --algorithm=sha2. This patch fixes it:
>>      $ ./src/cksum -a sha3 --check Makefile.sum
>>      Makefile: OK
>>      $ sed 's|[[:graph:]]  Makefile$|  Makefile|g' \
>>          Makefile.sum > truncated
>>      $ ./src/cksum -a sha3 --check truncated
>>      cksum: truncated: no properly formatted checksum lines found
>> I left the behavior the same for blake2b since 'b2sum' does not support
>> --base64. I'm not sure if 'cksum -a blake2b' and 'b2sum' should differ
>> in this case...
>
> Nice one.
>
> Re blake2b we probably should auto determine digest_hex_bytes
> in the base64 case too, so that all length adjustable algorithms
> are supported in untagged format. (like b2sum, sha256sum also
> does not support --base64).

Sure.

Since blake2b accepts any digest length that is divisible by 8, the
base64 case is a bit trickier. Two different digest lengths can have the
same base64 length, but different amounts of padding. See the following
example:

    $ cksum -a blake2b --length 224 --base64 --untagged Makefile
    NkhSwK3KuNkA1VSxNKC+gA/LG5GBln6E2AFkpw==  Makefile
    $ cksum -a blake2b --length 232 --base64 --untagged Makefile
    PaS2GwuymJ1Wy5zadJJDUY/uo647qvgm1jvFTiU=  Makefile

> Since this isn't a regression, but rather an oversight since --base64 was
> added, if the blake2b case is addressed, then the NEWS could be generalized
> to say:
>
>   'cksum --check' now supports base64 encoded input in untagged format,
>   for all length adjustable algorithms (blake2b, sha2, sha3).
>   [bug introduced in coreutils-9.2]

That works. I thought it was our oversight when adding sha3 and
deprecating sha256, etc. But I guess it always existed for blake2b
before that. The 9.8 release just made the situation a bit worse.

> I'd use `tr -d '='` rather than `sed 's|[[:graph:]]...` in the test
> as it's more obvious and also I'm not sure how portable the :graph: is.

This doesn't work since not all digest lengths produce padding, e.g. 384 bits:

    $ cksum -a sha2 --l 384 --base64 --untagged Makefile
    yTpCxowsARWbP1w+lE6IEsZFx1dJcV9IyvlI+WS6aAj7ahIovM2JWafEQWdGfDi9  Makefile

I assume that the '[[:graph:]]' character class is portable. But
's|[a-zA-Z0-9+/=]...' works fine and avoids character classes.

> Otherwise it looks good.

Thanks, v2 attached. I'll push in a bit.

> p.s. I notice another edge case with checking untagged base64 format.
> Theoretically a base64 checksum could start with SHA256 etc.
> which would currently cause cksum to treat it as a misformatted line.

I wonder how long it will take someone to run into that situation.

Collin

From b59081b2c6be1885358800a3a261722b6ae469b7 Mon Sep 17 00:00:00 2001
Message-ID: <b59081b2c6be1885358800a3a261722b6ae469b7.1759695523.git.collin.fu...@gmail.com>
From: Collin Funk <[email protected]>
Date: Sat, 4 Oct 2025 17:18:01 -0700
Subject: [PATCH v2] cksum: allow -a {blake2b,sha2,sha3} --check to work on
 base64

* NEWS: Mention the bug.
* src/digest.c (split_3): Check that the base64 digest matches the
length supported by the algorithm.
(digest_check): Check that the read digest matches the base64 length of
the algorithm's digest. The previous condition would not work for
'cksum -a blake2b -l 8 ...'.
* tests/cksum/cksum-base64-untagged.sh: New file.
* tests/local.mk (all_tests): Add the new test.
---
 NEWS                                 |  5 +++
 src/digest.c                         | 44 +++++++++++++++++++++---
 tests/cksum/cksum-base64-untagged.sh | 51 ++++++++++++++++++++++++++++
 tests/local.mk                       |  1 +
 4 files changed, 97 insertions(+), 4 deletions(-)
 create mode 100755 tests/cksum/cksum-base64-untagged.sh

diff --git a/NEWS b/NEWS
index b8c4ed4ef..8cfa5d361 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,11 @@ GNU coreutils NEWS                                    -*- outline -*-
   that use the GNU extension /NUM or +NUM formats.
   [bug introduced in coreutils-8.28]
 
+  'cksum --check' now supports base64 encoded input in untagged format,
+  for all length adjustable algorithms (blake2b, sha2, sha3).
+  [bug introduced in coreutils-9.2]
+
+
 ** Improvements
 
   wc -l now operates 10% faster on hosts that support AVX512 instructions.
diff --git a/src/digest.c b/src/digest.c
index ce0e222e1..13b166795 100644
--- a/src/digest.c
+++ b/src/digest.c
@@ -930,15 +930,51 @@ split_3 (char *s, size_t s_len,
 # endif
   unsigned char const *hp = *digest;
   digest_hex_bytes = 0;
-  while (c_isxdigit (*hp++))
-    digest_hex_bytes++;
+  for (; c_isxdigit (*hp); ++hp, ++digest_hex_bytes)
+    ;
 # if HASH_ALGO_CKSUM
+  /* Check the number of base64 characters.  This works because the hexadecimal
+     character set is a subset of the base64 character set.  */
+  size_t digest_base64_bytes = digest_hex_bytes;
+  size_t trailing_equals = 0;
+  for (; isubase64 (*hp); ++hp, ++digest_base64_bytes)
+    ;
+  for (; *hp == '='; ++hp, ++trailing_equals)
+    ;
   if ((cksum_algorithm == sha2 || cksum_algorithm == sha3)
       && digest_hex_bytes / 2 != SHA224_DIGEST_SIZE
       && digest_hex_bytes / 2 != SHA256_DIGEST_SIZE
       && digest_hex_bytes / 2 != SHA384_DIGEST_SIZE
       && digest_hex_bytes / 2 != SHA512_DIGEST_SIZE)
-    return false;
+    {
+      if (digest_base64_bytes + trailing_equals
+          == BASE64_LENGTH (SHA224_DIGEST_SIZE))
+        digest_hex_bytes = SHA224_DIGEST_SIZE * 2;
+      else if (digest_base64_bytes + trailing_equals
+               == BASE64_LENGTH (SHA256_DIGEST_SIZE))
+        digest_hex_bytes = SHA256_DIGEST_SIZE * 2;
+      else if (digest_base64_bytes + trailing_equals
+               == BASE64_LENGTH (SHA384_DIGEST_SIZE))
+        digest_hex_bytes = SHA384_DIGEST_SIZE * 2;
+      else if (digest_base64_bytes + trailing_equals
+               == BASE64_LENGTH (SHA512_DIGEST_SIZE))
+        digest_hex_bytes = SHA512_DIGEST_SIZE * 2;
+      else
+        return false;
+    }
+  else if (cksum_algorithm == blake2b
+           && digest_hex_bytes < digest_base64_bytes)
+    {
+      for (int j = 8; j <= DIGEST_MAX_LEN * 8; j += 8)
+        {
+          if (BASE64_LENGTH (j / 8) == digest_base64_bytes + trailing_equals
+              && j % 3 == trailing_equals)
+            {
+              digest_hex_bytes = j / 4;
+              break;
+            }
+        }
+    }
 # endif
   if (digest_hex_bytes < 2 || digest_hex_bytes % 2
       || DIGEST_MAX_LEN * 2 < digest_hex_bytes)
@@ -1332,7 +1368,7 @@ digest_check (char const *checkfile_name)
             {
               bool match = false;
 #if HASH_ALGO_CKSUM
-              if (d_len < digest_hex_bytes)
+              if (d_len == BASE64_LENGTH (digest_length / 8))
                 match = b64_equal (digest, bin_buffer);
               else
 #endif
diff --git a/tests/cksum/cksum-base64-untagged.sh b/tests/cksum/cksum-base64-untagged.sh
new file mode 100755
index 000000000..6bb1eb05f
--- /dev/null
+++ b/tests/cksum/cksum-base64-untagged.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+# Test that cksum can guess the digest length from base64 checksums.
+
+# Copyright (C) 2025 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ cksum
+getlimits_
+
+echo 'test input' > inp
+echo 'inp: OK' > expout
+echo 'cksum: truncated: no properly formatted checksum lines found' > experr
+
+for algorithm in sha2 sha3; do
+  for length in 224 256 384 512; do
+    # Create files with base64 checksums in the untagged format.
+    cksum -a $algorithm --length $length --base64 --untagged inp \
+      > check || fail=1
+    # Check that the length can be determined from the base64 checksum.
+    cksum -a $algorithm --check check > out || fail=1
+    compare expout out || fail=1
+    # Check that only valid lengths are supported.
+    sed 's|[a-zA-Z0-9+/=]  inp$|  inp|g' check > truncated || fail=1
+    returns_ 1 cksum -a $algorithm --check truncated 2> err || fail=1
+    compare experr err || fail=1
+  done
+done
+
+for length in 8 216 224 232 248 256 264 376 384 392 504 512; do
+  # Create files with base64 checksums in the untagged format.
+  cksum -a blake2b --length $length --base64 --untagged inp \
+    > check || fail=1
+  # Check that the length can be determined from the base64 checksum.
+  cksum -a blake2b --check check > out || fail=1
+  compare expout out || fail=1
+done
+
+Exit $fail
diff --git a/tests/local.mk b/tests/local.mk
index 19bc194fb..52184b7ac 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -305,6 +305,7 @@ all_tests =					\
   tests/cksum/cksum-a.sh			\
   tests/cksum/cksum-c.sh			\
   tests/cksum/cksum-base64.pl			\
+  tests/cksum/cksum-base64-untagged.sh		\
   tests/cksum/cksum-raw.sh			\
   tests/misc/comm.pl				\
   tests/csplit/csplit.sh			\
-- 
2.51.0
