Re: No progress from push when using bitmaps

2014-03-14 Thread Michael Haggerty
On 03/13/2014 11:07 PM, Jeff King wrote:
 On Thu, Mar 13, 2014 at 03:01:09PM -0700, Shawn Pearce wrote:
 
 It would definitely be good to have throughput measurements while
 writing out the pack. However, I'm not sure we have anything useful to
 count. We know the total number of objects we're reusing, but we're not
 actually parsing the data; we're just blitting it out as a stream. I
 think the progress code will need some refactoring to handle a
 throughput-only case.

 Yes. I think JGit suffers from this same bug, and again we never
 noticed it because usually only the servers are bitmapped, not the
 clients.

 pack-objects writes a throughput meter when it's writing objects.
 Really just the bytes out/second would be enough to let the user know
 the client is working. Unfortunately I think that is still tied to the
 overall progress system having some other counter?
 
 Yes, I'm looking at it right now. The throughput meter is actually
 connected to the sha1fd output. So really we just need to call
 display_progress periodically as we loop through the data. It's a
 one-liner fix.
 
 _But_ it still looks ugly, because, as you mention, it's tied to the
 progress meter, which is counting up to N objects. So we basically sit
 there at 0, pumping data, and then after the pack is done, we can say
 we sent N. :)
 
 There are a few ways around this:
 
   1. Add a new phase "Writing packs" which counts from 0 to 1. Even
  though it's more accurate, moving from 0 to 1 really isn't that
  useful (the throughput is, but the 0/1 just looks like noise).
 
   2. Add a new phase "Writing reused objects" that counts from 0 bytes
  up to N bytes. This looks stupid, though, because we are repeating
  the current byte count both here and in the throughput.
 
   3. Use the regular "Writing objects" progress, but fake the object
  count. We know we are writing M bytes with N objects. Bump the
  counter by 1 for every M/N bytes we write.

Would it be practical to change it to a percentage of bytes written?
Then we'd have progress info that is both convenient *and* truthful.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: No progress from push when using bitmaps

2014-03-14 Thread Duy Nguyen
On Fri, Mar 14, 2014 at 4:43 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 Would it be practical to change it to a percentage of bytes written?
 Then we'd have progress info that is both convenient *and* truthful.

I agreed for a second, then remembered that we don't know the final
pack size until we finish writing it. Not sure if we could estimate it
(cheaply) with good accuracy, though.

If an object is reused, we already know its compressed size. If it's
not reused and is a loose object, we could use the on-disk size. It's a
lot harder to estimate a non-reused, deltified object. All we have is
the uncompressed size and the size of each delta in the delta chain;
neither gives a good hint of what the compressed size would be.
-- 
Duy


Re: No progress from push when using bitmaps

2014-03-14 Thread Jeff King
On Fri, Mar 14, 2014 at 05:21:59PM +0700, Duy Nguyen wrote:

 On Fri, Mar 14, 2014 at 4:43 PM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
  Would it be practical to change it to a percentage of bytes written?
  Then we'd have progress info that is both convenient *and* truthful.
 
 I agreed for a second, then remembered that we don't know the final
 pack size until we finish writing it. Not sure if we could estimate it
 (cheaply) with good accuracy, though.

Right. I'm not sure what Michael meant by it. We can send a percentage
of bytes written for the reused pack (my option 2), but we do not know
the total bytes for the rest of the objects. So we'd end up with two
progress meters (one for the reused pack, and one for everything else),
both counting up to different endpoints. And it would require quite a
few changes to the progress code.

 If an object is reused, we already know its compressed size. If it's
 not reused and is a loose object, we could use the on-disk size. It's a
 lot harder to estimate a non-reused, deltified object. All we have is
 the uncompressed size and the size of each delta in the delta chain;
 neither gives a good hint of what the compressed size would be.

Hmm. I think we do have the compressed delta size after having run the
compression phase (because that is ultimately what we compare to find
the best delta). Loose objects are probably the hardest here, as we
actually recompress them (IIRC, because packfiles encode the type/size
info outside of the compressed bit, whereas it is inside for loose
objects; the experimental loose format harmonized this, but it never
caught on).

Without doing that recompression, any value you came up with would be an
estimate, though it would be pretty close (not off by more than a few
bytes per object). However, you can't just run through the packing list
and add up the object sizes; you'd need to do a real dry-run through
the writing phase. There are probably more I'm missing, but you need at
least to figure out:

  1. The actual compressed size of a full loose object, as described
 above.

  2. The variable-length headers for each object based on its type and
 size.

  3. The final form that the object will take based on what has come
 before. For example, if there is a max pack size, we may split an
 object from its delta base, in which case we have to throw away the
 delta. We don't know where those breaks will be until we walk
 through the whole list.

  4. If an object we attempt to reuse turns out to be corrupted, we
 fall back to the non-reuse code path, which will have a different
 size. So you'd need to actually check the reused object CRCs during
 the dry-run (and for local repacks, not transfers, we actually
 inflate and check the zlib, too, for safety).

So I think it's _possible_. But it's definitely not trivial. For now, I
think it makes sense to go with something like the patch I posted
earlier (which I'll re-roll in a few minutes). That fixes what is IMHO a
regression in the bitmaps case. And it does not make it any harder for
somebody to later convert us to a true byte-counter (i.e., it is the
easy half already).

-Peff


Re: No progress from push when using bitmaps

2014-03-14 Thread Duy Nguyen
On Fri, Mar 14, 2014 at 10:29 PM, Jeff King p...@peff.net wrote:
 If an object is reused, we already know its compressed size. If it's
 not reused and is a loose object, we could use the on-disk size. It's a
 lot harder to estimate a non-reused, deltified object. All we have is
 the uncompressed size and the size of each delta in the delta chain;
 neither gives a good hint of what the compressed size would be.

 Hmm. I think we do have the compressed delta size after having run the
 compression phase (because that is ultimately what we compare to find
 the best delta).

There are cases where we try not to find deltas (large blobs, files
too small, or the -delta attribute). The large-blob case is especially
interesting because the progress bar crawls while we write these
objects.

 Loose objects are probably the hardest here, as we
 actually recompress them (IIRC, because packfiles encode the type/size
 info outside of the compressed bit, whereas it is inside for loose
 objects; the experimental loose format harmonized this, but it never
 caught on).

 Without doing that recompression, any value you came up with would be an
 estimate, though it would be pretty close (not off by more than a few
 bytes per object).

That's my hope. Although if the user tweaks the compression level, the
estimate could be off (gzip -9 and gzip -1 produce a big difference in
size).

 However, you can't just run through the packing list
 and add up the object sizes; you'd need to do a real dry-run through
 the writing phase. There are probably more I'm missing, but you need at
 least to figure out:

   1. The actual compressed size of a full loose object, as described
  above.

   2. The variable-length headers for each object based on its type and
  size.

Could we run through a typical repo, calculate the average header
length, and then use it for all objects?


   3. The final form that the object will take based on what has come
  before. For example, if there is a max pack size, we may split an
  object from its delta base, in which case we have to throw away the
  delta. We don't know where those breaks will be until we walk
  through the whole list.

Ah, this could probably be avoided. Max pack size does not apply to
streaming pack-objects, which is where the progress bar is most often
shown. Falling back to the object count in that case does not sound
too bad.


   4. If an object we attempt to reuse turns out to be corrupted, we
  fall back to the non-reuse code path, which will have a different
  size. So you'd need to actually check the reused object CRCs during
  the dry-run (and for local repacks, not transfers, we actually
  inflate and check the zlib, too, for safety).

Ugh..


 So I think it's _possible_. But it's definitely not trivial. For now, I
 think it makes sense to go with something like the patch I posted
 earlier (which I'll re-roll in a few minutes). That fixes what is IMHO a
 regression in the bitmaps case. And it does not make it any harder for
 somebody to later convert us to a true byte-counter (i.e., it is the
 easy half already).

Agreed.
-- 
Duy


Re: No progress from push when using bitmaps

2014-03-13 Thread Jeff King
On Wed, Mar 12, 2014 at 05:21:21PM -0700, Shawn Pearce wrote:

 Today I tried pushing a copy of linux.git from a client that had
 bitmaps into a JGit server. The client stalled for a long time with no
 progress, because it reused the existing pack. No progress appeared
 while it was sending the existing file on the wire:
 
   $ git push git://localhost/linux.git master
   Reusing existing pack: 2938117, done.
   Total 2938117 (delta 0), reused 0 (delta 0)
   remote: Resolving deltas:  66% (1637269/2455727)
 
 This is not the best user experience. :-(

Yeah, I agree that sucks. I hadn't noticed it, as I don't typically have
my client repos bitmapped (and on fetch, the interesting progress is
coming from the local index-pack).

It would definitely be good to have throughput measurements while
writing out the pack. However, I'm not sure we have anything useful to
count. We know the total number of objects we're reusing, but we're not
actually parsing the data; we're just blitting it out as a stream. I
think the progress code will need some refactoring to handle a
throughput-only case.

-Peff


Re: No progress from push when using bitmaps

2014-03-13 Thread Shawn Pearce
On Thu, Mar 13, 2014 at 2:26 PM, Jeff King p...@peff.net wrote:
 On Wed, Mar 12, 2014 at 05:21:21PM -0700, Shawn Pearce wrote:

 Today I tried pushing a copy of linux.git from a client that had
 bitmaps into a JGit server. The client stalled for a long time with no
 progress, because it reused the existing pack. No progress appeared
 while it was sending the existing file on the wire:

   $ git push git://localhost/linux.git master
   Reusing existing pack: 2938117, done.
   Total 2938117 (delta 0), reused 0 (delta 0)
   remote: Resolving deltas:  66% (1637269/2455727)

 This is not the best user experience. :-(

 Yeah, I agree that sucks. I hadn't noticed it, as I don't typically have
 my client repos bitmapped (and on fetch, the interesting progress is
 coming from the local index-pack).

 It would definitely be good to have throughput measurements while
 writing out the pack. However, I'm not sure we have anything useful to
 count. We know the total number of objects we're reusing, but we're not
 actually parsing the data; we're just blitting it out as a stream. I
 think the progress code will need some refactoring to handle a
 throughput-only case.

Yes. I think JGit suffers from this same bug, and again we never
noticed it because usually only the servers are bitmapped, not the
clients.

pack-objects writes a throughput meter when it's writing objects.
Really just the bytes out/second would be enough to let the user know
the client is working. Unfortunately I think that is still tied to the
overall progress system having some other counter?


Re: No progress from push when using bitmaps

2014-03-13 Thread Jeff King
On Thu, Mar 13, 2014 at 03:01:09PM -0700, Shawn Pearce wrote:

  It would definitely be good to have throughput measurements while
  writing out the pack. However, I'm not sure we have anything useful to
  count. We know the total number of objects we're reusing, but we're not
  actually parsing the data; we're just blitting it out as a stream. I
  think the progress code will need some refactoring to handle a
  throughput-only case.
 
 Yes. I think JGit suffers from this same bug, and again we never
 noticed it because usually only the servers are bitmapped, not the
 clients.
 
 pack-objects writes a throughput meter when it's writing objects.
 Really just the bytes out/second would be enough to let the user know
 the client is working. Unfortunately I think that is still tied to the
 overall progress system having some other counter?

Yes, I'm looking at it right now. The throughput meter is actually
connected to the sha1fd output. So really we just need to call
display_progress periodically as we loop through the data. It's a
one-liner fix.

_But_ it still looks ugly, because, as you mention, it's tied to the
progress meter, which is counting up to N objects. So we basically sit
there at 0, pumping data, and then after the pack is done, we can say
we sent N. :)

There are a few ways around this:

  1. Add a new phase "Writing packs" which counts from 0 to 1. Even
 though it's more accurate, moving from 0 to 1 really isn't that
 useful (the throughput is, but the 0/1 just looks like noise).

  2. Add a new phase "Writing reused objects" that counts from 0 bytes
 up to N bytes. This looks stupid, though, because we are repeating
 the current byte count both here and in the throughput.

  3. Use the regular "Writing objects" progress, but fake the object
 count. We know we are writing M bytes with N objects. Bump the
 counter by 1 for every M/N bytes we write.

The first two require some non-trivial surgery to the progress code. I
am leaning towards the third. Not just because it's easy, but because I
think it actually shows the most intuitive display. Yes, it's fudging
the object numbers, but those are largely meaningless anyway (in fact,
it makes them _better_ because now they're even, instead of getting 95%
done and then hitting some blob that is as big as the rest of the repo
combined).

-Peff


Re: No progress from push when using bitmaps

2014-03-13 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 There are a few ways around this:

   1. Add a new phase "Writing packs" which counts from 0 to 1. Even
  though it's more accurate, moving from 0 to 1 really isn't that
  useful (the throughput is, but the 0/1 just looks like noise).

   2. Add a new phase "Writing reused objects" that counts from 0 bytes
  up to N bytes. This looks stupid, though, because we are repeating
  the current byte count both here and in the throughput.

   3. Use the regular "Writing objects" progress, but fake the object
  count. We know we are writing M bytes with N objects. Bump the
  counter by 1 for every M/N bytes we write.

 The first two require some non-trivial surgery to the progress code. I
 am leaning towards the third. Not just because it's easy, but because I
 think it actually shows the most intuitive display. Yes, it's fudging
 the object numbers, but those are largely meaningless anyway (in fact,
 it makes them _better_ because now they're even, instead of getting 95%
 done and then hitting some blob that is as big as the rest of the repo
 combined).

I think the above argument, especially the "fudging but largely
meaningless anyway" part, makes perfect sense.

Thanks for looking into this.



Re: No progress from push when using bitmaps

2014-03-13 Thread Jeff King
On Thu, Mar 13, 2014 at 06:07:54PM -0400, Jeff King wrote:

   3. Use the regular "Writing objects" progress, but fake the object
  count. We know we are writing M bytes with N objects. Bump the
  counter by 1 for every M/N bytes we write.

Here is that strategy. I think it looks pretty nice, and it seamlessly
handles the case where you have extra objects to send on top of the
reused pack (we just keep the same progress meter counting up).

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 831dd05..f187859 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -709,7 +709,7 @@ static struct object_entry **compute_write_order(void)
 static off_t write_reused_pack(struct sha1file *f)
 {
unsigned char buffer[8192];
-   off_t to_write;
+   off_t to_write, total;
int fd;
 
if (!is_pack_valid(reuse_packfile))
@@ -726,7 +726,7 @@ static off_t write_reused_pack(struct sha1file *f)
	if (reuse_packfile_offset <= 0)
		reuse_packfile_offset = reuse_packfile->pack_size - 20;
 
-   to_write = reuse_packfile_offset - sizeof(struct pack_header);
+   total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
 
while (to_write) {
int read_pack = xread(fd, buffer, sizeof(buffer));
@@ -739,10 +739,23 @@ static off_t write_reused_pack(struct sha1file *f)
 
sha1write(f, buffer, read_pack);
to_write -= read_pack;
+
+   /*
+* We don't know the actual number of objects written,
+* only how many bytes written, how many bytes total, and
+* how many objects total. So we can fake it by pretending all
+* objects we are writing are the same size. This gives us a
+* smooth progress meter, and at the end it matches the true
+* answer.
+*/
+   written = reuse_packfile_objects *
+   (((double)(total - to_write)) / total);
+   display_progress(progress_state, written);
}
 
close(fd);
-   written += reuse_packfile_objects;
+   written = reuse_packfile_objects;
+   display_progress(progress_state, written);
return reuse_packfile_offset - sizeof(struct pack_header);
 }
 


No progress from push when using bitmaps

2014-03-12 Thread Shawn Pearce
Today I tried pushing a copy of linux.git from a client that had
bitmaps into a JGit server. The client stalled for a long time with no
progress, because it reused the existing pack. No progress appeared
while it was sending the existing file on the wire:

  $ git push git://localhost/linux.git master
  Reusing existing pack: 2938117, done.
  Total 2938117 (delta 0), reused 0 (delta 0)
  remote: Resolving deltas:  66% (1637269/2455727)

This is not the best user experience. :-(