On Tue, Jan 13, 2026 at 11:43 AM Branko Čibej <[email protected]> wrote:
> On 13. 1. 26 09:57, Timofei Zhakov wrote: > > On Mon, Jan 12, 2026 at 9:17 PM Branko Čibej <[email protected]> wrote: > >> On 12. 1. 26 18:31, Timofei Zhakov wrote: >> >> On Mon, Jan 12, 2026 at 3:29 PM Branko Čibej <[email protected]> wrote: >> >>> On 12. 1. 26 14:30, Timofei Zhakov wrote: >>> >>> On Mon, Jan 12, 2026 at 1:22 PM Branko Čibej <[email protected]> wrote: >>> >>>> On 12. 1. 26 12:52, [email protected] wrote: >>>> >>>> Author: brane >>>> Date: Mon Jan 12 11:52:07 2026 >>>> New Revision: 1931262 >>>> >>>> Log: >>>> Fix warnings in the new checksum code. >>>> >>>> * subversion/libsvn_subr/checksum.c >>>> (svn_checksum): Remove unused variable. >>>> * subversion/libsvn_subr/checksum_apr.c: Include <limits.h>. >>>> (svn_checksum__sha1, svn_checksum__sha1_ctx_update): Do not blindly cast >>>> or implicitly narrow the data length. Check the limits first. >>>> >>>> Modified: >>>> subversion/trunk/subversion/libsvn_subr/checksum.c >>>> subversion/trunk/subversion/libsvn_subr/checksum_apr.c >>>> >>>> Modified: subversion/trunk/subversion/libsvn_subr/checksum.c >>>> ============================================================================== >>>> --- subversion/trunk/subversion/libsvn_subr/checksum.c Mon Jan 12 >>>> 11:51:21 2026 (r1931261) >>>> +++ subversion/trunk/subversion/libsvn_subr/checksum.c Mon Jan 12 >>>> 11:52:07 2026 (r1931262) >>>> @@ -475,8 +475,6 @@ svn_checksum(svn_checksum_t **checksum, >>>> apr_size_t len, >>>> apr_pool_t *pool) >>>> { >>>> - apr_sha1_ctx_t sha1_ctx; >>>> - >>>> SVN_ERR(validate_kind(kind)); >>>> *checksum = svn_checksum_create(kind, pool); >>>> >>>> >>>> Modified: subversion/trunk/subversion/libsvn_subr/checksum_apr.c >>>> ============================================================================== >>>> --- subversion/trunk/subversion/libsvn_subr/checksum_apr.c Mon Jan 12 >>>> 11:51:21 2026 (r1931261) >>>> +++ subversion/trunk/subversion/libsvn_subr/checksum_apr.c Mon Jan 12 >>>> 11:52:07 2026 (r1931262) >>>> @@ -21,6 +21,8 @@ >>>> * ==================================================================== >>>> */ >>>> >>>> +#include <limits.h> >>>> + >>>> #include "svn_private_config.h" >>>> #ifdef SVN_CHECKSUM_BACKEND_APR >>>> >>>> @@ -89,6 +91,10 @@ svn_checksum__sha1(unsigned char *digest >>>> apr_size_t len) >>>> { >>>> apr_sha1_ctx_t sha1_ctx; >>>> + >>>> + /* Do not blindly truncate the data length. */ >>>> + SVN_ERR_ASSERT(len < UINT_MAX); >>>> + >>>> apr_sha1_init(&sha1_ctx); >>>> apr_sha1_update(&sha1_ctx, data, (unsigned int)len); >>>> apr_sha1_final(digest, &sha1_ctx); >>>> @@ -121,7 +127,9 @@ svn_checksum__sha1_ctx_update(svn_checks >>>> const void *data, >>>> apr_size_t len) >>>> { >>>> - apr_sha1_update(&ctx->apr_ctx, data, len); >>>> + /* Do not blindly truncate the data length. */ >>>> + SVN_ERR_ASSERT(len < UINT_MAX); >>>> + apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len); >>>> return SVN_NO_ERROR; >>>> } >>>> >>>> >>>> Just a quick note, it's generally not a good idea to use unchecked >>>> typecasts anywhere. In this case there was a borderline chance of >>>> Subversion quietly computing invalid hashes. The assert at least removes >>>> the "quiet" part, although I suspect the better solution is to call >>>> apr_sha1_update in UINT_MAX-sized chunks until all data are consumed. >>>> >>> >>> I agree. It would be great if APR was accepting size as size_t. But it's >>> not a big deal. Anyways, I'm pretty sure the issue existed there before. >>> >>> Maybe this function could feed it blocks of UINT_MAX if it exceeds the >>> limit. But I don't think we should bother to support such edge cases, >>> because if it happens, the issue is on the caller side -- it should not >>> allocate 4GB buffers in memory. >>> >>> >>> >>> Is that a fact? Sorry but these days 4gigs is laughably small in certain >>> applications. Our API promises "apr_size_t" chunks which on 64-bit >>> platforms means larger than 4 GiB. We can change our API or we can fix the >>> bug. It doesn't matter if the bug was there before, we know it's there now. >>> >>> -- Brane >>> >>> >> Okay, it makes sense. I made a patch to support that behaviour. What do >> you think? >> >> [[[ >> Index: subversion/libsvn_subr/checksum_apr.c >> =================================================================== >> --- subversion/libsvn_subr/checksum_apr.c (revision 1931267) >> +++ subversion/libsvn_subr/checksum_apr.c (working copy) >> @@ -91,12 +91,19 @@ >> apr_size_t len) >> { >> apr_sha1_ctx_t sha1_ctx; >> + const char *ptr = data; >> >> - /* Do not blindly truncate the data length. */ >> - SVN_ERR_ASSERT(len < UINT_MAX); >> + apr_sha1_init(&sha1_ctx); >> >> - apr_sha1_init(&sha1_ctx); >> - apr_sha1_update(&sha1_ctx, data, (unsigned int)len); >> + while (len > 0) >> + { >> + unsigned int block = (len < UINT_MAX) ? (unsigned int)len >> + : UINT_MAX; >> + apr_sha1_update(&sha1_ctx, ptr, block); >> + len -= block; >> + ptr += block; >> + } >> + >> apr_sha1_final(digest, &sha1_ctx); >> return SVN_NO_ERROR; >> } >> @@ -127,9 +134,17 @@ >> const void *data, >> apr_size_t len) >> { >> - /* Do not blindly truncate the data length. */ >> - SVN_ERR_ASSERT(len < UINT_MAX); >> - apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len); >> + const char *ptr = data; >> + >> + while (len > 0) >> + { >> + unsigned int block = (len < UINT_MAX) ? (unsigned int)len >> + : UINT_MAX; >> + apr_sha1_update(&ctx->apr_ctx, ptr, block); >> + len -= block; >> + ptr += block; >> + } >> + >> return SVN_NO_ERROR; >> } >> ]]] >> >> I don't really like casting 'void *' to 'char *' but it seems like the >> only option. >> >> >> That cast isn't a problem, it already happens implicitly when >> apr_sha1_update is called; this is from apr_sha1.h: >> >> APU_DECLARE(void) apr_sha1_update(apr_sha1_ctx_t *context, const char *input, >> unsigned int inputLen); >> >> Oh, good to know. > > > [...] > > > >> >> It would be nice, or at least it would satisfy my sense of aesthetics, if >> the loop wasn't copy-pasted twice ... I'd make a helper function that will >> almost certainly be inlined. As a bonus, that helper function can take a >> 'const char*' argument, then you don't need the cast nor a second variable. >> Something like this? >> >> static void update_sha1(apr_sha1_ctx_t *ctx, const char* data, apr_size_t >> len) >> { >> while (len > UINT_MAX) >> { >> apr_sha1_update(ctx, data, UINT_MAX); >> len -= UINT_MAX; >> data += UINT_MAX; >> } >> if (len > 0) >> apr_sha1_update(ctx, data, (unsigned int)len); /* This cast is safe. */ >> } >> >> >> then just call update_sha1() with no casts instead of apr_sha1_update(). >> Nice. >> >> > I thought about making a function. It might be better for readability to > have everything inlined rather than adding extra abstractions. But as I > think now it is probably worth it. This is like a wrapper around APR which > allows for bigger buffers. And again it removes the risk of copy-paste > mistakes. > > It's also a good argument to utilise implicit cast to char-ptr during > function invocation. > > Could you please elaborate why you are suggesting re-writing the loop in > this way? I think it was much simpler before. I would prefer to have a > single invocation of the function, because it assures that no matter what > we'd get the same behaviour. Imagine mistyping parameters in the first one > (inside of the loop) which would never be actually executed and then trying > to figure out where that bug comes from (and this all will most likely > happen while performing some unbounded operation which is also a bug). > > > It gets rid of this > > unsigned int block = (len < UINT_MAX) ? (unsigned int)len : UINT_MAX; > > > happening on every operation, something the compiler can't just ignore or > optimise out. It also avoids > > len -= UINT_MAX; > data += UINT_MAX; > > > when it isn't necessary; something that probably would be optimised out. > > *shrug* Compared to the actual SHA-1 calculation, that's trivial. I just > find it cleaner to not have to recalculate the block length every time. > It's really a matter of taste. > I think that the way to address this problem is more general. Good to know the "optimal" version of this "while loop" would be. Also perhaps it was suboptimal to use the ternary thing here, because there is a chance for implicit conversions. But basically, yeah, it's just a matter of preference. I would prefer the one that just takes less lines of code/logic branches/assembly after optimisation. By the way, I tested both of those snippets on the compiler explorer. Attaching the code I used below. Also I used this website: [1]. I didn't conclude anything special though. #include <limits.h> #include <stdio.h> #include <stdint.h> void __attribute__ ((noinline)) apr_sha1_update(const void *data, unsigned int len) { printf("qq", data, len); } void qq1(const char *data, size_t len) { while (len > 0) { unsigned int block = (len <= UINT_MAX) ? (unsigned int)len : UINT_MAX; apr_sha1_update(data, len); len -= block; data += block; } } void qq2(const char *data, size_t len) { while (len > UINT_MAX) { apr_sha1_update(data, UINT_MAX); len -= UINT_MAX; data += UINT_MAX; } if (len > 0) apr_sha1_update(data, (unsigned int)len); /* This cast is safe. */ } > > I see that it removes the ternary operator, but it was good in explaining > what this code does; feed it UINT_MAX bytes or whatever amount we have. On > the other hand, after your rewrite, two completely separate branches would > be executed depending on whether it iterates UINT_MAX blocks or feeds the > remaining. > > > As I said, it's a matter of taste. I find it more obvious what's happening > when the while() condition tests against an explicit, constant upper bound. > > > > >> I'm planning to do a similar thing to the bcrypt backend. A tiny note: >> long is usually 32 bit even on 64 bit platforms (on Windows). >> >> >> +1. Yes, I remembered later that ULONG == DWORD == not something I'd use >> for object sizes on a 64-bit platform. Same problem as using unsigned int >> for the length in APR. >> > I'm going to go ahead and commit it to trunk soon. Thanks for reviewing! I really appreciate that! [1] https://godbolt.org/ -- Timofei Zhakov

