On Sat, Jan 17, 2026 at 5:55 PM Timofei Zhakov <[email protected]> wrote:
> On Tue, Jan 13, 2026 at 11:43 AM Branko Čibej <[email protected]> wrote:
>
>> On 13. 1. 26 09:57, Timofei Zhakov wrote:
>>
>> On Mon, Jan 12, 2026 at 9:17 PM Branko Čibej <[email protected]> wrote:
>>
>>> On 12. 1. 26 18:31, Timofei Zhakov wrote:
>>>
>>> On Mon, Jan 12, 2026 at 3:29 PM Branko Čibej <[email protected]> wrote:
>>>
>>>> On 12. 1. 26 14:30, Timofei Zhakov wrote:
>>>>
>>>> On Mon, Jan 12, 2026 at 1:22 PM Branko Čibej <[email protected]> wrote:
>>>>
>>>>> On 12. 1. 26 12:52, [email protected] wrote:
>>>>>
>>>>> Author: brane
>>>>> Date: Mon Jan 12 11:52:07 2026
>>>>> New Revision: 1931262
>>>>>
>>>>> Log:
>>>>> Fix warnings in the new checksum code.
>>>>>
>>>>> * subversion/libsvn_subr/checksum.c
>>>>>   (svn_checksum): Remove unused variable.
>>>>> * subversion/libsvn_subr/checksum_apr.c: Include <limits.h>.
>>>>>   (svn_checksum__sha1, svn_checksum__sha1_ctx_update): Do not blindly cast
>>>>>   or implicitly narrow the data length. Check the limits first.
>>>>>
>>>>> Modified:
>>>>>   subversion/trunk/subversion/libsvn_subr/checksum.c
>>>>>   subversion/trunk/subversion/libsvn_subr/checksum_apr.c
>>>>>
>>>>> Modified: subversion/trunk/subversion/libsvn_subr/checksum.c
>>>>> ==============================================================================
>>>>> --- subversion/trunk/subversion/libsvn_subr/checksum.c  Mon Jan 12 11:51:21 2026  (r1931261)
>>>>> +++ subversion/trunk/subversion/libsvn_subr/checksum.c  Mon Jan 12 11:52:07 2026  (r1931262)
>>>>> @@ -475,8 +475,6 @@ svn_checksum(svn_checksum_t **checksum,
>>>>>                apr_size_t len,
>>>>>                apr_pool_t *pool)
>>>>>  {
>>>>> -  apr_sha1_ctx_t sha1_ctx;
>>>>> -
>>>>>    SVN_ERR(validate_kind(kind));
>>>>>    *checksum = svn_checksum_create(kind, pool);
>>>>>
>>>>> Modified: subversion/trunk/subversion/libsvn_subr/checksum_apr.c
>>>>> ==============================================================================
>>>>> --- subversion/trunk/subversion/libsvn_subr/checksum_apr.c  Mon Jan 12 11:51:21 2026  (r1931261)
>>>>> +++ subversion/trunk/subversion/libsvn_subr/checksum_apr.c  Mon Jan 12 11:52:07 2026  (r1931262)
>>>>> @@ -21,6 +21,8 @@
>>>>>   * ====================================================================
>>>>>   */
>>>>>
>>>>> +#include <limits.h>
>>>>> +
>>>>>  #include "svn_private_config.h"
>>>>>  #ifdef SVN_CHECKSUM_BACKEND_APR
>>>>>
>>>>> @@ -89,6 +91,10 @@ svn_checksum__sha1(unsigned char *digest
>>>>>                     apr_size_t len)
>>>>>  {
>>>>>    apr_sha1_ctx_t sha1_ctx;
>>>>> +
>>>>> +  /* Do not blindly truncate the data length. */
>>>>> +  SVN_ERR_ASSERT(len < UINT_MAX);
>>>>> +
>>>>>    apr_sha1_init(&sha1_ctx);
>>>>>    apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
>>>>>    apr_sha1_final(digest, &sha1_ctx);
>>>>> @@ -121,7 +127,9 @@ svn_checksum__sha1_ctx_update(svn_checks
>>>>>                                const void *data,
>>>>>                                apr_size_t len)
>>>>>  {
>>>>> -  apr_sha1_update(&ctx->apr_ctx, data, len);
>>>>> +  /* Do not blindly truncate the data length. */
>>>>> +  SVN_ERR_ASSERT(len < UINT_MAX);
>>>>> +  apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
>>>>>    return SVN_NO_ERROR;
>>>>>  }
>>>>>
>>>>>
>>>>> Just a quick note, it's generally not a good idea to use unchecked
>>>>> typecasts anywhere. In this case there was a borderline chance of
>>>>> Subversion quietly computing invalid hashes. The assert at least removes
>>>>> the "quiet" part, although I suspect the better solution is to call
>>>>> apr_sha1_update in UINT_MAX-sized chunks until all data are consumed.
>>>>
>>>> I agree. It would be great if APR was accepting size as size_t.
>>>> But it's not a big deal. Anyways, I'm pretty sure the issue existed there
>>>> before.
>>>>
>>>> Maybe this function could feed it blocks of UINT_MAX if it exceeds the
>>>> limit. But I don't think we should bother to support such edge cases,
>>>> because if it happens, the issue is on the caller side -- it should not
>>>> allocate 4GB buffers in memory.
>>>>
>>>>
>>>> Is that a fact? Sorry but these days 4 gigs is laughably small in
>>>> certain applications. Our API promises "apr_size_t" chunks which on 64-bit
>>>> platforms means larger than 4 GiB. We can change our API or we can fix the
>>>> bug. It doesn't matter if the bug was there before, we know it's there now.
>>>>
>>>> -- Brane
>>>>
>>>
>>> Okay, it makes sense. I made a patch to support that behaviour. What do
>>> you think?
>>>
>>> [[[
>>> Index: subversion/libsvn_subr/checksum_apr.c
>>> ===================================================================
>>> --- subversion/libsvn_subr/checksum_apr.c      (revision 1931267)
>>> +++ subversion/libsvn_subr/checksum_apr.c      (working copy)
>>> @@ -91,12 +91,19 @@
>>>                     apr_size_t len)
>>>  {
>>>    apr_sha1_ctx_t sha1_ctx;
>>> +  const char *ptr = data;
>>>
>>> -  /* Do not blindly truncate the data length. */
>>> -  SVN_ERR_ASSERT(len < UINT_MAX);
>>> +  apr_sha1_init(&sha1_ctx);
>>>
>>> -  apr_sha1_init(&sha1_ctx);
>>> -  apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
>>> +  while (len > 0)
>>> +    {
>>> +      unsigned int block = (len < UINT_MAX) ? (unsigned int)len
>>> +                                            : UINT_MAX;
>>> +      apr_sha1_update(&sha1_ctx, ptr, block);
>>> +      len -= block;
>>> +      ptr += block;
>>> +    }
>>> +
>>>    apr_sha1_final(digest, &sha1_ctx);
>>>    return SVN_NO_ERROR;
>>>  }
>>> @@ -127,9 +134,17 @@
>>>                                const void *data,
>>>                                apr_size_t len)
>>>  {
>>> -  /* Do not blindly truncate the data length. */
>>> -  SVN_ERR_ASSERT(len < UINT_MAX);
>>> -  apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
>>> +  const char *ptr = data;
>>> +
>>> +  while (len > 0)
>>> +    {
>>> +      unsigned int block = (len < UINT_MAX) ? (unsigned int)len
>>> +                                            : UINT_MAX;
>>> +      apr_sha1_update(&ctx->apr_ctx, ptr, block);
>>> +      len -= block;
>>> +      ptr += block;
>>> +    }
>>> +
>>>    return SVN_NO_ERROR;
>>>  }
>>> ]]]
>>>
>>> I don't really like casting 'void *' to 'char *' but it seems like the
>>> only option.
>>>
>>>
>>> That cast isn't a problem, it already happens implicitly when
>>> apr_sha1_update is called; this is from apr_sha1.h:
>>>
>>> APU_DECLARE(void) apr_sha1_update(apr_sha1_ctx_t *context, const char *input,
>>>                                   unsigned int inputLen);
>>>
>>> Oh, good to know.
>>
>>
>> [...]
>>
>>
>>>
>>> It would be nice, or at least it would satisfy my sense of aesthetics,
>>> if the loop wasn't copy-pasted twice ... I'd make a helper function that
>>> will almost certainly be inlined. As a bonus, that helper function can take
>>> a 'const char*' argument, then you don't need the cast nor a second
>>> variable. Something like this?
>>>
>>> static void update_sha1(apr_sha1_ctx_t *ctx, const char *data, apr_size_t len)
>>> {
>>>   while (len > UINT_MAX)
>>>     {
>>>       apr_sha1_update(ctx, data, UINT_MAX);
>>>       len -= UINT_MAX;
>>>       data += UINT_MAX;
>>>     }
>>>   if (len > 0)
>>>     apr_sha1_update(ctx, data, (unsigned int)len);  /* This cast is safe. */
>>> }
>>>
>>>
>>> then just call update_sha1() with no casts instead of apr_sha1_update().
>>> Nice.
>>>
>>>
>> I thought about making a function. It might be better for readability to
>> have everything inlined rather than adding extra abstractions. But as I
>> think now it is probably worth it.
>> This is like a wrapper around APR which allows for bigger buffers. And
>> again, it removes the risk of copy-paste mistakes.
>>
>> It's also a good argument for relying on the implicit cast to char pointer
>> at the call site.
>>
>> Could you please elaborate on why you are suggesting rewriting the loop in
>> this way? I think it was much simpler before. I would prefer to have a
>> single invocation of the function, because it ensures that, no matter what,
>> we get the same behaviour. Imagine mistyping the parameters in the first
>> call (the one inside the loop), which would almost never be executed, and
>> then trying to figure out where that bug comes from (and all of this would
>> most likely happen while performing some unbounded operation, which is
>> itself a bug).
>>
>>
>> It gets rid of this
>>
>> unsigned int block = (len < UINT_MAX) ? (unsigned int)len : UINT_MAX;
>>
>> happening on every operation, something the compiler can't just ignore or
>> optimise out. It also avoids
>>
>> len -= UINT_MAX;
>> data += UINT_MAX;
>>
>> when it isn't necessary; something that probably would be optimised out.
>>
>
>> *shrug* Compared to the actual SHA-1 calculation, that's trivial. I just
>> find it cleaner to not have to recalculate the block length every time.
>> It's really a matter of taste.
>>
>
>
> I think that the way to address this problem is a more general question.
> It's good to know what the "optimal" version of this while loop would be.
>
> Also, perhaps it was suboptimal to use the ternary here, because there is a
> chance of implicit conversions.
>
> But basically, yeah, it's just a matter of preference. I would prefer the
> one that just takes fewer lines of code / logic branches / assembly after
> optimisation.
>
> By the way, I tested both of those snippets on Compiler Explorer; I'm
> attaching the code I used below. I used this website: [1]. I didn't
> conclude anything special, though.
>
> #include <limits.h>
> #include <stdio.h>
> #include <stdint.h>
>
> /* noinline stand-in for the real APR function, so the loops survive
>    optimisation and can be inspected in the generated assembly. */
> void __attribute__ ((noinline)) apr_sha1_update(const void *data, unsigned int len)
> {
>   printf("%p %u\n", (void *)data, len);
> }
>
> void qq1(const char *data, size_t len)
> {
>   while (len > 0)
>     {
>       unsigned int block = (len <= UINT_MAX) ? (unsigned int)len : UINT_MAX;
>       apr_sha1_update(data, block);
>       len -= block;
>       data += block;
>     }
> }
>
> void qq2(const char *data, size_t len)
> {
>   while (len > UINT_MAX)
>     {
>       apr_sha1_update(data, UINT_MAX);
>       len -= UINT_MAX;
>       data += UINT_MAX;
>     }
>   if (len > 0)
>     apr_sha1_update(data, (unsigned int)len);  /* This cast is safe. */
> }
>
>
>>
>> I see that it removes the ternary operator, but it was good at explaining
>> what this code does: feed it UINT_MAX bytes or whatever amount we have. On
>> the other hand, after your rewrite, two completely separate branches would
>> be executed depending on whether it iterates over UINT_MAX-sized blocks or
>> feeds the remainder.
>>
>>
>> As I said, it's a matter of taste. I find it more obvious what's
>> happening when the while() condition tests against an explicit, constant
>> upper bound.
>>
>>
>>
>>> I'm planning to do a similar thing for the bcrypt backend. A tiny note:
>>> long is usually 32 bits even on 64-bit platforms (on Windows).
>>>
>>>
>>> +1. Yes, I remembered later that ULONG == DWORD == not something I'd use
>>> for object sizes on a 64-bit platform. Same problem as using unsigned int
>>> for the length in APR.
>>>
>>
> I'm going to go ahead and commit it to trunk soon. Thanks for reviewing! I
> really appreciate that!
>
> [1] https://godbolt.org/
>
>
Committed in r1931389. Thanks!

--
Timofei Zhakov
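
For readers who want to try the chunked-update idea from this thread in
isolation, below is a minimal, self-contained sketch. It is not the committed
Subversion code: the helper name update_sha1_chunked, the file name demo.c and
the tiny CHUNK_LIMIT constant are illustrative stand-ins (the real code splits
on UINT_MAX inside svn_checksum__sha1 and svn_checksum__sha1_ctx_update), and
the only real API it assumes is apr-util's <apr_sha1.h>, whose apr_sha1_update
signature is quoted earlier in the thread. The sketch checks that feeding the
data in blocks produces the same digest as a single update call.

[[[
#include <string.h>
#include <stdio.h>
#include <apr_sha1.h>

/* Stand-in for UINT_MAX so the chunking loop is exercised with a small
   test buffer; the real helper would use UINT_MAX and apr_size_t. */
#define CHUNK_LIMIT 7

static void
update_sha1_chunked(apr_sha1_ctx_t *ctx, const char *data, size_t len)
{
  /* Feed full CHUNK_LIMIT-sized blocks first ... */
  while (len > CHUNK_LIMIT)
    {
      apr_sha1_update(ctx, data, CHUNK_LIMIT);
      len -= CHUNK_LIMIT;
      data += CHUNK_LIMIT;
    }
  /* ... then the remainder, which now fits in an unsigned int. */
  if (len > 0)
    apr_sha1_update(ctx, data, (unsigned int)len);
}

int main(void)
{
  static const char input[] = "some data longer than the chunk limit";
  unsigned char one_shot[APR_SHA1_DIGESTSIZE];
  unsigned char chunked[APR_SHA1_DIGESTSIZE];
  apr_sha1_ctx_t ctx;

  /* Hash the whole buffer in a single update call. */
  apr_sha1_init(&ctx);
  apr_sha1_update(&ctx, input, (unsigned int)strlen(input));
  apr_sha1_final(one_shot, &ctx);

  /* Hash it again through the chunked helper. */
  apr_sha1_init(&ctx);
  update_sha1_chunked(&ctx, input, strlen(input));
  apr_sha1_final(chunked, &ctx);

  /* Both digests must match, regardless of how the data was split. */
  printf("digests %s\n",
         memcmp(one_shot, chunked, APR_SHA1_DIGESTSIZE) == 0
           ? "match" : "differ");
  return 0;
}
]]]

Assuming the apr-util development headers are installed, something like
"cc demo.c $(pkg-config --cflags --libs apr-util-1)" should build it; the
exact flags depend on the local APR installation.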

