Jeff King <p...@peff.net> writes:

> The problem isn't actually a sha1 mismatch, though that's what
> parse_object() will report. The issue is actually that the file is
> truncated. So zlib does not say "this is corrupt", but rather "I need
> more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both
> for "I need more bytes" (in which we know we are truncated, because we
> fed the whole mmap'd file in the first place) as well as "I need more
> output buffer space" (which just means we should keep looping!).
>
> So we need to distinguish those cases. I think this is the simplest fix:
>
> diff --git a/sha1-file.c b/sha1-file.c
> index dd0b6aa873..a7ff5fe25d 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream,
>        * see the comment in unpack_sha1_rest for details.
>        */
>       while (total_read <= size &&
> +            stream->avail_in > 0 &&
>              (status == Z_OK || status == Z_BUF_ERROR)) {
>               stream->next_out = buf;
>               stream->avail_out = sizeof(buf);

Hmph.  If the last round consumed the final input byte and needed
output space of N bytes, but only M (< N) bytes of the output space
was available, then it would have reduced both avail_in and
avail_out down to zero and yielded Z_BUF_ERROR, no?  Or would zlib
refrain from consuming that final byte (leaving avail_in to at least
one) and give us Z_BUF_ERROR in such a case?

> This works just by checking that we are making forward progress in the
> output buffer. I think that would _probably_ be OK for this case, since
> we know we have all of the input available. But in a case where we're
> feeding the input in a stream, it would not be. It's possible there that
> we would not create any output in one round, but would do so after
> feeding more input bytes.

Yes, exactly.

> I think the patch I showed above addresses the root cause more directly.
> I'll wrap that up in a real commit, but I think there may be some
> related work:
>
>   - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't
>     strictly correct, but is probably good enough). However, "git
>     cat-file blob 19f9c827" exits non-zero without printing anything. It
>     probably should complain more loudly.
>
>   - the offending loop comes from f6371f9210. But that commit was mostly
>     cargo-culting other parts of sha1-file.c. I'm worried that this bug
>     exists elsewhere, too. I'll dig around to see if I can find other
>     instances.

Thanks.

Reply via email to