On 12. 1. 26 18:31, Timofei Zhakov wrote:
On Mon, Jan 12, 2026 at 3:29 PM Branko Čibej <[email protected]> wrote:

    On 12. 1. 26 14:30, Timofei Zhakov wrote:
    On Mon, Jan 12, 2026 at 1:22 PM Branko Čibej <[email protected]>
    wrote:

        On 12. 1. 26 12:52, [email protected] wrote:
        Author: brane
        Date: Mon Jan 12 11:52:07 2026
        New Revision: 1931262

        Log:
        Fix warnings in the new checksum code.

        * subversion/libsvn_subr/checksum.c
           (svn_checksum): Remove unused variable.
        * subversion/libsvn_subr/checksum_apr.c: Include <limits.h>.
           (svn_checksum__sha1, svn_checksum__sha1_ctx_update): Do not blindly 
cast
            or implicitly narrow the data length. Check the limits first.

        Modified:
            subversion/trunk/subversion/libsvn_subr/checksum.c
            subversion/trunk/subversion/libsvn_subr/checksum_apr.c

        Modified: subversion/trunk/subversion/libsvn_subr/checksum.c
        
==============================================================================
        --- subversion/trunk/subversion/libsvn_subr/checksum.c  Mon Jan 12 
11:51:21 2026        (r1931261)
        +++ subversion/trunk/subversion/libsvn_subr/checksum.c  Mon Jan 12 
11:52:07 2026        (r1931262)
        @@ -475,8 +475,6 @@ svn_checksum(svn_checksum_t **checksum,
                       apr_size_t len,
                       apr_pool_t *pool)
          {
        -  apr_sha1_ctx_t sha1_ctx;
        -
            SVN_ERR(validate_kind(kind));
            *checksum = svn_checksum_create(kind, pool);
        Modified: subversion/trunk/subversion/libsvn_subr/checksum_apr.c
        
==============================================================================
        --- subversion/trunk/subversion/libsvn_subr/checksum_apr.c      Mon Jan 
12 11:51:21 2026        (r1931261)
        +++ subversion/trunk/subversion/libsvn_subr/checksum_apr.c      Mon Jan 
12 11:52:07 2026        (r1931262)
        @@ -21,6 +21,8 @@
           * 
====================================================================
           */
+#include <limits.h>
        +
          #include "svn_private_config.h"
          #ifdef SVN_CHECKSUM_BACKEND_APR
@@ -89,6 +91,10 @@ svn_checksum__sha1(unsigned char *digest
                             apr_size_t len)
          {
            apr_sha1_ctx_t sha1_ctx;
        +
        +  /* Do not blindly truncate the data length. */
        +  SVN_ERR_ASSERT(len < UINT_MAX);
        +
            apr_sha1_init(&sha1_ctx);
            apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
            apr_sha1_final(digest, &sha1_ctx);
        @@ -121,7 +127,9 @@ svn_checksum__sha1_ctx_update(svn_checks
                                        const void *data,
                                        apr_size_t len)
          {
        -  apr_sha1_update(&ctx->apr_ctx, data, len);
        +  /* Do not blindly truncate the data length. */
        +  SVN_ERR_ASSERT(len < UINT_MAX);
        +  apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
            return SVN_NO_ERROR;
          }

        Just a quick note, it's generally not a good idea to use
        unchecked typecasts anywhere. In this case there was a
        borderline chance of Subversion quietly computing invalid
        hashes. The assert at least removes the "quiet" part,
        although I suspect the better solution is to call
        apr_sha1_update in UINT_MAX-sized chunks until all data are
        consumed.


    I agree. It would be great if APR was accepting size as size_t.
    But it's not a big deal. Anyways, I'm pretty sure the issue
    existed there before.

    Maybe this function could feed it blocks of UINT_MAX if it
    exceeds the limit. But I don't think we should bother to support
    such edge cases, because if it happens, the issue is on the
    caller side -- it should not allocate 4GB buffers in memory.


    Is that a fact? Sorry but these days 4gigs is laughably small in
    certain applications. Our API promises "apr_size_t" chunks which
    on 64-bit platforms means larger than 4 GiB. We can change our API
    or we can fix the bug. It doesn't matter if the bug was there
    before, we know it's there now.

    -- Brane


Okay, it makes sense. I made a patch to support that behaviour. What do you think?

[[[
Index: subversion/libsvn_subr/checksum_apr.c
===================================================================
--- subversion/libsvn_subr/checksum_apr.c (revision 1931267)
+++ subversion/libsvn_subr/checksum_apr.c (working copy)
@@ -91,12 +91,19 @@
                    apr_size_t len)
 {
   apr_sha1_ctx_t sha1_ctx;
+  const char *ptr = data;

-  /* Do not blindly truncate the data length. */
-  SVN_ERR_ASSERT(len < UINT_MAX);
+  apr_sha1_init(&sha1_ctx);

-  apr_sha1_init(&sha1_ctx);
-  apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
+  while (len > 0)
+    {
+      unsigned int block = (len < UINT_MAX) ? (unsigned int)len
+                                            : UINT_MAX;
+      apr_sha1_update(&sha1_ctx, ptr, block);
+      len -= block;
+      ptr += block;
+    }
+
   apr_sha1_final(digest, &sha1_ctx);
   return SVN_NO_ERROR;
 }
@@ -127,9 +134,17 @@
                               const void *data,
                               apr_size_t len)
 {
-  /* Do not blindly truncate the data length. */
-  SVN_ERR_ASSERT(len < UINT_MAX);
-  apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
+  const char *ptr = data;
+
+  while (len > 0)
+    {
+      unsigned int block = (len < UINT_MAX) ? (unsigned int)len
+                                            : UINT_MAX;
+      apr_sha1_update(&ctx->apr_ctx, ptr, block);
+      len -= block;
+      ptr += block;
+    }
+
   return SVN_NO_ERROR;
 }
]]]

I don't really like casting 'void *' to 'char *' but it seems like the only option.

That cast isn't a problem, it already happens implicitly when apr_sha1_update is called; this is from apr_sha1.h:

APU_DECLARE(void) apr_sha1_update(apr_sha1_ctx_t *context, const char *input,
                                unsigned int inputLen);


It would be nice, or at least it would satisfy my sense of aesthetics, if the loop wasn't copy-pasted twice ... I'd make a helper function that will almost certainly be inlined. As a bonus, that helper function can take a 'const char*' argument, then you don't need the cast nor a second variable. Something like this?

static void update_sha1(apr_sha1_ctx_t *ctx, const char* data, apr_size_t len)
{
  while (len > UINT_MAX)
    {
      apr_sha1_update(ctx, data, UINT_MAX);
      len -= UINT_MAX;
      data += UINT_MAX;
    }
  if (len > 0)
    apr_sha1_update(ctx, data, (unsigned int)len);  /* This cast is safe. */
}


then just call update_sha1() with no casts instead of apr_sha1_update(). Nice.


I'm planning to do a similar thing to the bcrypt backend. A tiny note: long is usually 32 bit even on 64 bit platforms (on Windows).

+1. Yes, I remembered later that ULONG == DWORD == not something I'd use for object sizes on a 64-bit platform. Same problem as using unsigned int for the length in APR.

-- Brane

Reply via email to