Re: [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects

2014-08-14 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

 As a recovery tool, unpack-objects should go on unpacking as many
 objects as it can.

 Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
 ---
  builtin/unpack-objects.c | 42 +-
  t/t1050-large.sh |  7 +++
  2 files changed, 48 insertions(+), 1 deletion(-)

 diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
 index 99cde45..8b5c67e 100644
 --- a/builtin/unpack-objects.c
 +++ b/builtin/unpack-objects.c
 @@ -88,10 +88,50 @@ static void use(int bytes)
   consumed_bytes += bytes;
  }
  
 +static void inflate_and_throw_away(unsigned long size)
 +{
 + git_zstream stream;
 + char buf[8192];
 +
 + memset(stream, 0, sizeof(stream));
 + stream.next_out = (unsigned char *)buf;
 + stream.avail_out = sizeof(buf);
 + stream.next_in = fill(1);
 + stream.avail_in = len;
 + git_inflate_init(stream);
 +
 + for (;;) {
 + int ret = git_inflate(stream, 0);
 + use(len - stream.avail_in);
 + if (stream.total_out == size  ret == Z_STREAM_END)
 + break;
 + if (ret != Z_OK) {
 + error(inflate returned %d, ret);
 + if (!recover)
 + exit(1);
 + has_errors = 1;
 + break;
 + }
 + stream.next_out = (unsigned char *)buf;
 + stream.avail_out = sizeof(buf);
 + stream.next_in = fill(1);
 + stream.avail_in = len;
 + }
 + git_inflate_end(stream);
 +}

This looks wrong in a few ways.

You already know that we saw an error when you get to this function,
whether you will see an out-of-sync stream in this loop or not, so
there is no reason to copy the assignment to has_errors from the
other function.  You also know 'recover' is true---otherwise you
wouldn't be here.

But more importantly, the basic structure of this loop is the same
as the loop we already have in the only caller of this new function,
not just the regular zlib produced this much that is not yet the
expected size, go on reading more and we are at the end of the
stream with Z_STREAM_END, and we are done, but even to the stream
is corrupt, we need to exit the loop, they are identical.  Is a
copy-and-paste like this the best we can do to add this skip to the
end of the current stream?  We would really want to keep the number
of copies of this loop down; we saw a same bug introduced on the
termination condition multiple times to different copies X-.

  static void *get_data(unsigned long size)
  {
   git_zstream stream;
 - void *buf = xmalloc(size);
 + void *buf = xmalloc_gentle(size);
 +
 + if (!buf) {
 + if (!recover)
 + exit(1);
 + has_errors = 1;
 + inflate_and_throw_away(size);
 + return NULL;
 + }
  
   memset(stream, 0, sizeof(stream));
  
 diff --git a/t/t1050-large.sh b/t/t1050-large.sh
 index 5642f84..eec2cca 100755
 --- a/t/t1050-large.sh
 +++ b/t/t1050-large.sh
 @@ -169,4 +169,11 @@ test_expect_success 'fsck' '
   test $n -gt 1
  '
  
 +test_expect_success 'unpack-objects' '
 + P=`ls .git/objects/pack/*.pack` 
 + git unpack-objects -n -r $P 2err
 + test $? = 1 
 + grep error: attempting to allocate .* over limit err
 +'
 +
  test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects

2014-08-14 Thread Duy Nguyen
On Thu, Aug 14, 2014 at 11:58 PM, Junio C Hamano gits...@pobox.com wrote:
 +static void inflate_and_throw_away(unsigned long size)
 +{

 But more importantly, the basic structure of this loop is the same
 as the loop we already have in the only caller of this new function,
 not just the regular zlib produced this much that is not yet the
 expected size, go on reading more and we are at the end of the
 stream with Z_STREAM_END, and we are done, but even to the stream
 is corrupt, we need to exit the loop, they are identical.  Is a
 copy-and-paste like this the best we can do to add this skip to the
 end of the current stream?  We would really want to keep the number
 of copies of this loop down; we saw a same bug introduced on the
 termination condition multiple times to different copies X-.

I know. I'm the author of one of those bugs. I considered updating the
inflate loop in get_data() to support this throw-away mode but was
afraid I may make the same mistake like in index-pack again. I'll
probably drop this patch and think if I can unify inflate loop (for
other places too, not just unpack-objects).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html