Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Jeff King
On Fri, Jun 24, 2016 at 03:41:47PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The difference in time between the two is measurable on my system, but
> > it's only a few milliseconds (for 4096 bytes). So maybe it's not worth
> > worrying about (though as a general technique, it does make me worry
> > that it's easy to get wrong in a way that will fail racily).
> 
> Yeah, GNU dd has iflag=fullblock, but if we assume GNU, we can
> safely assume "head -c", so I cannot think of a way to do this
> portably enough.

Thinking on it more, "head -c" is _not_ what one would want in all
cases. It would work here, but not in t9300, for instance, where the
code is trying to read an exact number of bytes from a fifo. I don't
think "head" makes any promises about buffering and may read extra
bytes.

So I dunno. "dd" generally does make such promises, or perhaps the perl
sysread() solution in t9300 is not so bad.

-Peff


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Junio C Hamano
Jeff King  writes:

> The difference in time between the two is measurable on my system, but
> it's only a few milliseconds (for 4096 bytes). So maybe it's not worth
> worrying about (though as a general technique, it does make me worry
> that it's easy to get wrong in a way that will fail racily).

Yeah, GNU dd has iflag=fullblock, but if we assume GNU, we can
safely assume "head -c", so I cannot think of a way to do this
portably enough.



Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Jeff King
On Fri, Jun 24, 2016 at 04:58:58PM -0400, Eric Sunshine wrote:

> On Fri, Jun 24, 2016 at 3:07 PM, Jeff King  wrote:
> > On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:
> >> Jeff King  writes:
> >> > +tar_info () {
> >> > +   "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> >> > +}
> >
> >> Seeing an awk piped into cut always makes me want to suggest a
> >> single sed/awk/perl invocation.
> >
> > I want the auto-splitting of awk, but then to auto-split the result
> > using a different delimiter. Is there a not-painful way to do that in
> > awk?
> 
> The awk split() function is POSIX and accepts an optional separator argument.

Thanks. I'm not that familiar with awk functions, simply because I came
of age after perl existed, and using perl tends to be more portable and
powerful (if you can assume it's available). But this is simple enough
that it should be OK.

Replacing it with:

"$TAR" tvf "$1" |
awk '{
split($4, date, "-")
print $3 " " date[1]
}'

seems to work.
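
Against the reference tarball it produces the size/year pair that the
TAR_HUGE prereq expects (assuming a tar whose "tv" listing puts the
size and date in columns 3 and 4):

	tar_info "$TEST_DIRECTORY"/t5000/huge-and-future.tar
	# prints: 68719476737 4147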

-Peff


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Jeff King
On Fri, Jun 24, 2016 at 03:07:44PM -0400, Jeff King wrote:

> > "dd bs=1 count=4096" is hopefully more portable.
> 
> Hmm. I always wonder whether dd is actually very portable, but we do use
> it already, at least.
> 
> Perhaps the perl monstrosity in t9300 could be replaced with that, too.

Hrm. So I wrote a patch for t9300 for this. But I wanted to flip the
order to:

  dd bs=4096 count=1

because otherwise, dd will call read() 4096 times, for 1 byte each.

But it's not safe to do that on a pipe. For example:

  {
	echo 1
	sleep 1
	echo 2
  } | dd bs=4 count=1

will copy only 2 bytes. So it's racily wrong, depending on how the
writer feeds the data to write().

The 1-byte reads do work (assuming blocking descriptors and that dd
restarts a read after a signal, which mine seems to). But yuck.
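
(As an aside, GNU dd can be told to keep read()ing until the block is
actually full, which avoids the race:

  {
	echo 1
	sleep 1
	echo 2
  } | dd bs=4 count=1 iflag=fullblock

copies all 4 bytes. But iflag=fullblock is a GNU-ism, so it buys us
nothing over "head -c" as far as portability goes.)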

The difference in time between the two is measurable on my system, but
it's only a few milliseconds (for 4096 bytes). So maybe it's not worth
worrying about (though as a general technique, it does make me worry
that it's easy to get wrong in a way that will fail racily).

-Peff


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Eric Sunshine
On Fri, Jun 24, 2016 at 3:07 PM, Jeff King  wrote:
> On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:
>> Jeff King  writes:
>> > +tar_info () {
>> > +   "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
>> > +}
>
>> Seeing an awk piped into cut always makes me want to suggest a
>> single sed/awk/perl invocation.
>
> I want the auto-splitting of awk, but then to auto-split the result
> using a different delimiter. Is there a not-painful way to do that in
> awk?

The awk split() function is POSIX and accepts an optional separator argument.


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Junio C Hamano
Jeff King  writes:

>> > +# When parsing, we'll pull out only the year from the date; that
>> > +# avoids any question of timezones impacting the result. 
>> 
>> ... as long as the month-day part is not close to the year boundary.
>> So this explanation is insufficient to convince the reader that
>> "that avoids any question" is correct, without saying that it is in
>> August of year 4147.
>
> I thought that part didn't need to be said, but I can say it
> (technically we can include the month, too, but I don't think that level
> of accuracy is really important for these tests).

Oh, I wasn't suggesting to include the month in the comparison.  But
to understand why it is safe from TZ jitters to test only year, the
reader needs to know (or do the math herself) that the timestamp is
away from the year boundary, so mentioning August in the justifying
comment is needed.

>> Seeing an awk piped into cut always makes me want to suggest a
>> single sed/awk/perl invocation.
>
> I want the auto-splitting of awk, but then to auto-split the result
> using a different delimiter. Is there a not-painful way to do that in
> awk?
>
> I could certainly come up with a regex to do it in sed, but I wanted to
> keep the parsing as liberal and generic as possible.
>
> Certainly I could do it in perl, but I had the general impression that
> we prefer to keep the dependency on perl to a minimum. Maybe it doesn't
> matter.

Heh.  It was merely "makes me want to suggest", not "I suggest".  If
I were doing this myself, I would have done a single sed but it does
not matter.

> I think we would want something more like:
>
>   test_signal_match 13 $(cat exit-code)

I like that.


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Jeff King
On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The ustar format only has room for 11 (or 12, depending on
> > some implementations) octal digits for the size and mtime of
> > each file. After this, we have to add pax extended headers
> > to specify the real data, and git does not yet know how to
> > do so.
> 
> I am not a native speaker but "After" above made me hiccup.  I think
> I am correct to understand that it means "after passing this limit",
> aka "to represent files bigger or newer than these", but still it
> felt somewhat strange.

Yeah, I agree that it reads badly. I'm not sure what I was thinking.
I'll tweak it in the re-roll.

> > +# See if our system tar can handle a tar file with huge sizes and dates far in
> > +# the future, and that we can actually parse its output.
> > +#
> > +# The reference file was generated by GNU tar, and the magic time and size are
> > +# both octal 01000000000001, which overflows normal ustar fields.
> > +#
> > +# When parsing, we'll pull out only the year from the date; that
> > +# avoids any question of timezones impacting the result. 
> 
> ... as long as the month-day part is not close to the year boundary.
> So this explanation is insufficient to convince the reader that
> "that avoids any question" is correct, without saying that it is in
> August of year 4147.

I thought that part didn't need to be said, but I can say it
(technically we can include the month, too, but I don't think that level
of accuracy is really important for these tests).

> > +tar_info () {
> > +   "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> > +}
> 
> A blank after the shell function to make it easier to see the
> boundary.

I was intentionally trying to couple it with prereq below, as the
comment describes both of them.

> Seeing an awk piped into cut always makes me want to suggest a
> single sed/awk/perl invocation.

I want the auto-splitting of awk, but then to auto-split the result
using a different delimiter. Is there a not-painful way to do that in
awk?

I could certainly come up with a regex to do it in sed, but I wanted to
keep the parsing as liberal and generic as possible.

Certainly I could do it in perl, but I had the general impression that
we prefer to keep the dependency on perl to a minimum. Maybe it doesn't
matter.

> > +# We expect git to die with SIGPIPE here (otherwise we
> > +# would generate the whole 64GB).
> > +test_expect_failure BUNZIP 'generate tar with huge size' '
> > +   {
> > +   git archive HEAD
> > +   echo $? >exit-code
> > +   } | head -c 4096 >huge.tar &&
> > +   echo 141 >expect &&
> > +   test_cmp expect exit-code
> > +'
> 
> "head -c" is GNU-ism, isn't it?

You're right; for some reason I thought it was in POSIX.

We do have a couple instances of it, but they are all in the valgrind
setup code (which I guess most people don't ever run).

> "dd bs=1 count=4096" is hopefully more portable.

Hmm. I always wonder whether dd is actually very portable, but we do use
it already, at least.

Perhaps the perl monstrosity in t9300 could be replaced with that, too.
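
I.e., something along the lines of

  dd bs=1 count=$n 2>/dev/null

in place of the sysread loop; it never reads past the requested count,
at the cost of one read() call per byte.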

> ksh signal death you already know about.  I wonder if we want to
> expose something like list_contains as a friend of test_cmp.
> 
>   list_contains 141,269 $(cat exit-code)

I think we would want something more like:

  test_signal_match 13 $(cat exit-code)

Each call site should not have to know about every signal convention
(and in your example, the magic "3" of Windows is left out).
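
Roughly something like this (just a sketch, not an existing test-lib
helper; which conventions to accept is exactly the question):

	# Succeed if exit code $2 looks like death by signal $1 under any
	# convention mentioned in this thread: 128+signo (POSIX shells),
	# 256+signo (ksh), or the magic 3 seen on Windows.
	test_signal_match () {
		sig=$1 code=$2
		test "$code" = $((128 + sig)) ||
		test "$code" = $((256 + sig)) ||
		test "$code" = 3
	}

That way the call site does not care which shell happened to run git.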

-Peff


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Junio C Hamano
Jeff King  writes:

> The ustar format only has room for 11 (or 12, depending on
> some implementations) octal digits for the size and mtime of
> each file. After this, we have to add pax extended headers
> to specify the real data, and git does not yet know how to
> do so.

I am not a native speaker but "After" above made me hiccup.  I think
I am correct to understand that it means "after passing this limit",
aka "to represent files bigger or newer than these", but still it
felt somewhat strange.

> So as a prerequisite, we can feed the system tar a reference
> tarball to make sure it can handle these features. The
> reference tar here was created with:
>
>   dd if=/dev/zero seek=64G bs=1 count=1 of=huge
>   touch -d @68719476737 huge
>   tar cf - --format=pax huge |
>   head -c 2048
>
> using GNU tar. Note that this is not a complete tarfile, but
> it's enough to contain the headers we want to examine.

Cute.  I didn't remember they had @ format,
even though I must have seen what they do while working on 2c733fb2
(parse_date(): '@' prefix forces git-timestamp, 2012-02-02).

> +# See if our system tar can handle a tar file with huge sizes and dates far in
> +# the future, and that we can actually parse its output.
> +#
> +# The reference file was generated by GNU tar, and the magic time and size are
> +# both octal 01000000000001, which overflows normal ustar fields.
> +#
> +# When parsing, we'll pull out only the year from the date; that
> +# avoids any question of timezones impacting the result. 

... as long as the month-day part is not close to the year boundary.
So this explanation is insufficient to convince the reader that
"that avoids any question" is correct, without saying that it is in
August of year 4147.

> +tar_info () {
> + "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> +}

A blank after the shell function to make it easier to see the
boundary.

Seeing an awk piped into cut always makes me want to suggest a
single sed/awk/perl invocation.

> +# We expect git to die with SIGPIPE here (otherwise we
> +# would generate the whole 64GB).
> +test_expect_failure BUNZIP 'generate tar with huge size' '
> + {
> + git archive HEAD
> + echo $? >exit-code
> + } | head -c 4096 >huge.tar &&
> + echo 141 >expect &&
> + test_cmp expect exit-code
> +'

"head -c" is GNU-ism, isn't it?

"dd bs=1 count=4096" is hopefully more portable.

ksh signal death you already know about.  I wonder if we want to
expose something like list_contains as a friend of test_cmp.

list_contains 141,269 $(cat exit-code)
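
Such a helper could be as simple as this rough sketch:

	# Succeed if $2 is one of the comma-separated values in $1,
	# e.g. "list_contains 141,269 269" succeeds.
	list_contains () {
		case ",$1," in
		*,"$2",*)
			return 0
			;;
		esac
		return 1
	}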

Thanks.


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Johannes Sixt

Am 24.06.2016 um 18:46 schrieb Jeff King:
> On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote:
> 
> > It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of
> > SIGPIPE - I haven't tested this, yet).
> 
> Thanks, I meant to ask about that. We do a workaround in t0005, but we
> _don't_ do it in the new sigpipe handling for test_must_fail. Is the
> latter just broken, too?

That's well possible. It is not prepared to see ksh's exit codes for
signals.


-- Hannes



Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Jeff King
On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote:

> Am 24.06.2016 um 01:20 schrieb Jeff King:
> > +# We expect git to die with SIGPIPE here (otherwise we
> > +# would generate the whole 64GB).
> > +test_expect_failure BUNZIP 'generate tar with huge size' '
> > +   {
> > +   git archive HEAD
> > +   echo $? >exit-code
> > +   } | head -c 4096 >huge.tar &&
> > +   echo 141 >expect &&
> > +   test_cmp expect exit-code
> 
> It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of
> SIGPIPE - I haven't tested this, yet).

Thanks, I meant to ask about that. We do a workaround in t0005, but we
_don't_ do it in the new sigpipe handling for test_must_fail. Is the
latter just broken, too?

-Peff


Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-24 Thread Johannes Sixt

Am 24.06.2016 um 01:20 schrieb Jeff King:
> +# We expect git to die with SIGPIPE here (otherwise we
> +# would generate the whole 64GB).
> +test_expect_failure BUNZIP 'generate tar with huge size' '
> +   {
> +   git archive HEAD
> +   echo $? >exit-code
> +   } | head -c 4096 >huge.tar &&
> +   echo 141 >expect &&
> +   test_cmp expect exit-code

It's going to be 269 with ksh, and who-knows-what on Windows (due to
lack of SIGPIPE - I haven't tested this, yet).


-- Hannes



Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-23 Thread Jeff King
On Thu, Jun 23, 2016 at 07:20:44PM -0400, Jeff King wrote:

> I'm still not excited about the 64MB write, just because it's awfully
> heavyweight for such a trivial test. It runs pretty fast on my RAM disk,
> but maybe not on other people's systems.
> 
> I considered but didn't explore two other options:
> 
>   1. I couldn't convince zlib to write a smaller file (this is done with
>  core.compression=9). But I'm not sure if that's inherent to the
>  on-disk format, or simply the maximum size of a deflate block.
> 
>  So it's possible that one could hand-roll zlib data that says "I'm
>  64GB" but is only a few bytes long.
> 
>   2. We don't ever want to see the whole 64GB, of course; we want to
>  stream it out and only care about the header (as an aside, this
>  makes a wonderful test that we are hitting the streaming code path,
>  as it's unlikely to work without it :) ).
> 
>  So another option would be to include a truncated file that claims
>  to be 64GB, and has only the first 256kb or something worth of data
>  (which should deflate down to almost nothing).
> 
>  git-fsck wouldn't work, of course, but we don't need to run it.
>  Other bits of git might complain, but our plan is for git to get
>  SIGPIPE before hitting that point anyway.
> 
>  So that seems pretty easy, but it is potentially flaky.

Writing that convinced me that (2) is actually quite a sane way to go.
The patch is below, which seems to work.

I arbitrarily picked the first 2048 bytes of the loose object. That's
1/32768 of the original. If we assume the compression ratio is stable
through the file (and it should be; the file is all zeroes), that should
generate 2MB of data should we need it (way more than we feed to our
"head -c" invocation).

This patch is on top of the whole series just to illustrate it. Doing it
for real will involve squashing it into the first patch (and adjusting
the commit message), and then handling the minor rebase conflicts. I'll
hold off on a re-roll until I get any comments on v3.

-Peff

---
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 07e0bdc..e542938 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -339,19 +339,12 @@ test_lazy_prereq TAR_HUGE '
test_cmp expect actual
 '
 
-# Likewise, we need bunzip for the 64GB git object.
-test_lazy_prereq BUNZIP '
-   bunzip2 --version
-'
-
-test_expect_success BUNZIP 'set up repository with huge blob' '
+test_expect_success 'set up repository with huge blob' '
obj_d=19 &&
obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a &&
obj=${obj_d}${obj_f} &&
mkdir -p .git/objects/$obj_d &&
-   bunzip2 -c \
-   <"$TEST_DIRECTORY"/t5000/$obj.bz2 \
-   >.git/objects/$obj_d/$obj_f &&
+   cp "$TEST_DIRECTORY"/t5000/$obj .git/objects/$obj_d/$obj_f &&
rm -f .git/index &&
git update-index --add --cacheinfo 100644,$obj,huge &&
git commit -m huge
@@ -359,7 +352,7 @@ test_expect_success BUNZIP 'set up repository with huge blob' '
 
 # We expect git to die with SIGPIPE here (otherwise we
 # would generate the whole 64GB).
-test_expect_success BUNZIP 'generate tar with huge size' '
+test_expect_success 'generate tar with huge size' '
{
git archive HEAD
echo $? >exit-code
@@ -368,7 +361,7 @@ test_expect_success BUNZIP 'generate tar with huge size' '
test_cmp expect exit-code
 '
 
-test_expect_success BUNZIP,TAR_HUGE 'system tar can read our huge size' '
+test_expect_success TAR_HUGE 'system tar can read our huge size' '
echo 68719476737 >expect &&
tar_info huge.tar | cut -d" " -f1 >actual &&
test_cmp expect actual
diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a
new file mode 100644
index 0000000000000000000000000000000000000000..5cbe9ec312bfd7b7e0398ca281e9d42848743704
GIT binary patch
literal 2048
zcmb=p_2!@^nMd9)O$Wu%SgnPira`Ve7_Hm1V`0kgE@zR6?%0tYir(3dz7ebDNy(^
zL0KkKS?}>-rd{r~jXFbzl$fdD`NUjRG-;qPzyn5xj3Xw4Mw$Zz)lx%14F-mTL6atf)Gz?ip`!@NpwXs)%#?y?q!Xf$
zP8vZs=>+{8*(#7u93>!3K5|fsT$F-(VMr#rK{}}f`lJ%u7(h0VZPg+4e

[PATCH v3 1/4] t5000: test tar files that overflow ustar headers

2016-06-23 Thread Jeff King
The ustar format only has room for 11 (or 12, depending on
some implementations) octal digits for the size and mtime of
each file. After this, we have to add pax extended headers
to specify the real data, and git does not yet know how to
do so.

Before fixing that, let's start off with some test
infrastructure, as designing portable and efficient tests
for this is non-trivial.

We want to use the system tar to check our output (because
what we really care about is interoperability), but we can't
rely on it:

  1. being able to read pax headers

  2. being able to handle huge sizes or mtimes

  3. supporting a "t" format we can parse

So as a prerequisite, we can feed the system tar a reference
tarball to make sure it can handle these features. The
reference tar here was created with:

  dd if=/dev/zero seek=64G bs=1 count=1 of=huge
  touch -d @68719476737 huge
  tar cf - --format=pax huge |
  head -c 2048

using GNU tar. Note that this is not a complete tarfile, but
it's enough to contain the headers we want to examine.

Likewise, we need to convince git that it has a 64GB blob to
output. Running "git add" on that 64GB file takes many
minutes of CPU, and even compressed, the result is 64MB. So
again, I pre-generated that loose object, and then used
bzip2 on the result, which shrinks it to a few hundred
bytes.  Unfortunately, we do still inflate it to 64MB on
disk briefly while the test is running.

The tests are split so that we test as much as we can even
with an uncooperative system tar. This actually catches the
current breakage (which is that we die("BUG") trying to
write the ustar header) on every system, and then on systems
where we can, we go farther and actually verify the result.

Helped-by: Robin H. Johnson 
Signed-off-by: Jeff King 
---
I'm still not excited about the 64MB write, just because it's awfully
heavyweight for such a trivial test. It runs pretty fast on my RAM disk,
but maybe not on other people's systems.

I considered but didn't explore two other options:

  1. I couldn't convince zlib to write a smaller file (this is done with
 core.compression=9). But I'm not sure if that's inherent to the
 on-disk format, or simply the maximum size of a deflate block.

 So it's possible that one could hand-roll zlib data that says "I'm
 64GB" but is only a few bytes long.

  2. We don't ever want to see the whole 64GB, of course; we want to
 stream it out and only care about the header (as an aside, this
 makes a wonderful test that we are hitting the streaming code path,
 as it's unlikely to work without it :) ).

 So another option would be to include a truncated file that claims
 to be 64GB, and has only the first 256kb or something worth of data
 (which should deflate down to almost nothing).

 git-fsck wouldn't work, of course, but we don't need to run it.
 Other bits of git might complain, but our plan is for git to get
 SIGPIPE before hitting that point anyway.

 So that seems pretty easy, but it is potentially flaky.

 t/t5000-tar-tree.sh|  73 +
 .../19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2   | Bin 0 -> 578 bytes
 t/t5000/huge-and-future.tar| Bin 0 -> 2048 bytes
 3 files changed, 73 insertions(+)
 create mode 100644 t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2
 create mode 100644 t/t5000/huge-and-future.tar

diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 4b68bba..e7c9271 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -319,4 +319,77 @@ test_expect_success 'catch non-matching pathspec' '
test_must_fail git archive -v HEAD -- "*.abc" >/dev/null
 '
 
+# See if our system tar can handle a tar file with huge sizes and dates far in
+# the future, and that we can actually parse its output.
+#
+# The reference file was generated by GNU tar, and the magic time and size are
+# both octal 01000000000001, which overflows normal ustar fields.
+#
+# When parsing, we'll pull out only the year from the date; that
+# avoids any question of timezones impacting the result. The output
+# of tar_info is expected to be "<size> <year>", both in decimal. It ignores
+# the return value of tar. We have to do this, because our reference file is
+# only a partial (the whole thing would be 64GB!).
+tar_info () {
+   "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
+}
+test_lazy_prereq TAR_HUGE '
+   echo "68719476737 4147" >expect &&
+   tar_info "$TEST_DIRECTORY"/t5000/huge-and-future.tar >actual &&
+   test_cmp expect actual
+'
+
+# Likewise, we need bunzip for the 64GB git object.
+test_lazy_prereq BUNZIP '
+   bunzip2 --version
+'
+
+test_expect_success BUNZIP 'set up repository with huge blob' '
+   obj_d=19 &&
+   obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a &&
+   obj=${obj_d}${obj_f} &&
+   mkdir -p .git/objects/$obj_d &&
+   bunzip2 -c \
+