Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
On Wed, Jun 22, 2016 at 07:46:13AM +0200, René Scharfe wrote: > Yes, it's only useful as a debug flag, but the fact that it breaks > highlights a (admittedly mostly theoretical) shortcoming: The code doesn't > handle extended headers that are over the size limit as nicely as it could. > So the test was already worth it, even if won't land in master. :) Kind of. It was impossible to trigger in the original (and we still don't actually handle it in the revised version; we just die in xsnprintf). But still, I'll go with the simpler thing we've discussed here. The symmetry with ustar_mtime isn't worth it, and doubly so if we just write a single pax header. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Am 21.06.2016 um 23:04 schrieb Jeff King: > On Tue, Jun 21, 2016 at 10:57:43PM +0200, René Scharfe wrote: > >> Am 21.06.2016 um 22:42 schrieb René Scharfe: >>> The value 120 is magic; we need it to pass the tests. That's >>> because prepare_header() is used for building extended header >>> records as well and we don't create extended headers for extended >>> headers (not sure if that would work anyway), so they simply >>> vanish when they're over the limit as their size field is set to >>> zero. >> >> So how about something like this to make sure extended headers are >> only written for regular files and not for other extended headers? > > This is quite similar to what I wrote originally, but I moved to the > ustar_size() format to better match the mtime code (which needs > something like that, because we pass around args->time). > > I think you could drop ustar_size() completely here and just put the > "if" into write_tar_entry(). Which would look like this: --- archive-tar.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index cb99df2..274bdfa 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, strbuf_addch(sb, '\n'); } +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + static unsigned int ustar_header_chksum(const struct ustar_header *header) { const unsigned char *p = (const unsigned char *)header; @@ -208,7 +222,7 @@ static int write_tar_entry(struct archiver_args *args, struct ustar_header header; struct strbuf ext_header = STRBUF_INIT; unsigned int old_mode = mode; - unsigned long size; + unsigned long size, size_in_header; void *buffer; int err = 0; @@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } - prepare_header(args, , mode, size); + size_in_header = size; + if (S_ISREG(mode) && size > 0777UL) { + size_in_header = 0; + strbuf_append_ext_header_uint(_header, "size", size); + } + + prepare_header(args, , mode, size_in_header); if (ext_header.len > 0) { err = write_extended_header(args, sha1, ext_header.buf, -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Am 21.06.2016 um 23:02 schrieb Jeff King: On Tue, Jun 21, 2016 at 10:42:52PM +0200, René Scharfe wrote: If we could set the limit to a lower value than 8GB for testing then we could at least check if the extended header is written, e.g. if ustar_size() could be convinced to return 0 every time using a hidden command line parameter or an environment variable or something better. Yes, we could do that, though I think it loses most of the value of the test. We can check that if we hit an arbitrary value we generate the pax header, but I think what we _really_ care about is: did we generate an output that somebody else's tar implementation can handle. I agree with the last point, but don't see how that diminishes the value of such a test. If we provide file sizes only through extended headers (the normal header field being set to 0) and we can extract files with correct sizes then tar must have interpreted those header as intended, right? The diminished value is: 1. This is a situation that doesn't actually happen in real life. 2. Now we're carrying extra code inside git only for the sake of testing (which can have its own bugs, etc). Still, it may be better than nothing. -- >8 -- Subject: archive-tar: test creation of pax extended size headers --- The value 120 is magic; we need it to pass the tests. That's because prepare_header() is used for building extended header records as well and we don't create extended headers for extended headers (not sure if that would work anyway), so they simply vanish when they're over the limit as their size field is set to zero. Right, so this is sort of what I meant in (2). Now we have a tar.ustarsizemax setting shipped in git that is totally broken if you set it to "1". I can live with it as a tradeoff, but it is definitely a negative IMHO. Yes, it's only useful as a debug flag, but the fact that it breaks highlights a (admittedly mostly theoretical) shortcoming: The code doesn't handle extended headers that are over the size limit as nicely as it could. So the test was already worth it, even if won't land in master. :) René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
On Tue, Jun 21, 2016 at 10:57:43PM +0200, René Scharfe wrote: > Am 21.06.2016 um 22:42 schrieb René Scharfe: > > The value 120 is magic; we need it to pass the tests. That's > > because prepare_header() is used for building extended header > > records as well and we don't create extended headers for extended > > headers (not sure if that would work anyway), so they simply > > vanish when they're over the limit as their size field is set to > > zero. > > So how about something like this to make sure extended headers are > only written for regular files and not for other extended headers? This is quite similar to what I wrote originally, but I moved to the ustar_size() format to better match the mtime code (which needs something like that, because we pass around args->time). I think you could drop ustar_size() completely here and just put the "if" into write_tar_entry(). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
On Tue, Jun 21, 2016 at 10:42:52PM +0200, René Scharfe wrote: > >> If we could set the limit to a lower value than 8GB for testing then we > >> could at least check if the extended header is written, e.g. if > >> ustar_size() > >> could be convinced to return 0 every time using a hidden command line > >> parameter or an environment variable or something better. > > > > Yes, we could do that, though I think it loses most of the value of the > > test. We can check that if we hit an arbitrary value we generate the pax > > header, but I think what we _really_ care about is: did we generate an > > output that somebody else's tar implementation can handle. > > I agree with the last point, but don't see how that diminishes the > value of such a test. If we provide file sizes only through extended > headers (the normal header field being set to 0) and we can extract > files with correct sizes then tar must have interpreted those header > as intended, right? The diminished value is: 1. This is a situation that doesn't actually happen in real life. 2. Now we're carrying extra code inside git only for the sake of testing (which can have its own bugs, etc). Still, it may be better than nothing. > -- >8 -- > Subject: archive-tar: test creation of pax extended size headers > > --- > The value 120 is magic; we need it to pass the tests. That's > because prepare_header() is used for building extended header > records as well and we don't create extended headers for extended > headers (not sure if that would work anyway), so they simply > vanish when they're over the limit as their size field is set to > zero. Right, so this is sort of what I meant in (2). Now we have a tar.ustarsizemax setting shipped in git that is totally broken if you set it to "1". I can live with it as a tradeoff, but it is definitely a negative IMHO. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Am 21.06.2016 um 17:59 schrieb Jeff King: > On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote: > >>> Unfortunately, it's quite an expensive test to run. For one >>> thing, unless your filesystem supports files with holes, it >>> takes 64GB of disk space (you might think piping straight to >>> `hash-object --stdin` would be better, but it's not; that >>> tries to buffer all 64GB in RAM!). Furthermore, hashing and >>> compressing the object takes several minutes of CPU time. >>> >>> We could ship just the resulting compressed object data as a >>> loose object, but even that takes 64MB. So sadly, this code >>> path remains untested in the test suite. >> >> If we could set the limit to a lower value than 8GB for testing then we >> could at least check if the extended header is written, e.g. if ustar_size() >> could be convinced to return 0 every time using a hidden command line >> parameter or an environment variable or something better. > > Yes, we could do that, though I think it loses most of the value of the > test. We can check that if we hit an arbitrary value we generate the pax > header, but I think what we _really_ care about is: did we generate an > output that somebody else's tar implementation can handle. I agree with the last point, but don't see how that diminishes the value of such a test. If we provide file sizes only through extended headers (the normal header field being set to 0) and we can extract files with correct sizes then tar must have interpreted those header as intended, right? (Or it just guessed the sizes by searching for the next header magic, but such a fallback won't be accurate for files ending with NUL characters due to NUL-padding, so we just have to add such a file.) René -- >8 -- Subject: archive-tar: test creation of pax extended size headers --- The value 120 is magic; we need it to pass the tests. That's because prepare_header() is used for building extended header records as well and we don't create extended headers for extended headers (not sure if that would work anyway), so they simply vanish when they're over the limit as their size field is set to zero. archive-tar.c | 7 ++- t/t5000-tar-tree.sh | 7 +++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index f53e61c..fbbc4cc 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -14,6 +14,7 @@ static char block[BLOCKSIZE]; static unsigned long offset; static int tar_umask = 002; +static unsigned long ustar_size_max = 0777UL; static int write_tar_filter_archive(const struct archiver *ar, struct archiver_args *args); @@ -179,7 +180,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) static inline unsigned long ustar_size(uintmax_t size) { - if (size <= 0777UL) + if (size <= ustar_size_max) return size; else return 0; @@ -412,6 +413,10 @@ static int git_tar_config(const char *var, const char *value, void *cb) } return 0; } + if (!strcmp(var, "tar.ustarsizemax")) { + ustar_size_max = git_config_ulong(var, value); + return 0; + } return tar_filter_config(var, value, cb); } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 4b68bba..03bb4c7 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -102,6 +102,7 @@ test_expect_success \ echo long filename >a/four$hundred && mkdir a/bin && test-genrandom "frotz" 50 >a/bin/sh && + printf "\0\0\0" >>a/bin/sh && printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 && printf "A not substituted O" >a/substfile2 && if test_have_prereq SYMLINKS; then @@ -157,6 +158,12 @@ test_expect_success 'git-archive --prefix=olde-' ' check_tar with_olde-prefix olde- +test_expect_success !TAR_NEEDS_PAX_FALLBACK 'pax extended size headers' ' + git -c tar.ustarsizemax=120 archive HEAD >extended_size_header.tar +' + +check_tar extended_size_header + test_expect_success 'git archive on large files' ' test_config core.bigfilethreshold 1 && git archive HEAD >b3.tar && -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
On Tue, Jun 21, 2016 at 07:44:55PM +, Robin H. Johnson wrote: > On Thu, Jun 16, 2016 at 12:37:33AM -0400, Jeff King wrote: > > We could ship just the resulting compressed object data as a > > loose object, but even that takes 64MB. So sadly, this code > > path remains untested in the test suite. > Additionally compress the object data, and insert it for the purpose of > testing? It's still an expensive test time-wise, but repeated > gzip compression on zeros does still help; to that end, here's the > pieces to add a testcase while only being <9KiB. Interesting idea. With bzip2 it actually drops to about 600 bytes (I suspect we could get it even smaller by writing a custom program to generate it, but it's diminishing returns). I think there are some other portability issues, though (like does the receiving tar actually support 64GB sizes). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Am 21.06.2016 um 22:42 schrieb René Scharfe: > The value 120 is magic; we need it to pass the tests. That's > because prepare_header() is used for building extended header > records as well and we don't create extended headers for extended > headers (not sure if that would work anyway), so they simply > vanish when they're over the limit as their size field is set to > zero. So how about something like this to make sure extended headers are only written for regular files and not for other extended headers? --- archive-tar.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index ed562d4..f53e61c 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -199,7 +199,7 @@ static void prepare_header(struct archiver_args *args, { xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 0); xsnprintf(header->size, sizeof(header->size), "%011lo", - S_ISREG(mode) ? ustar_size(size) : 0); + S_ISREG(mode) ? size : 0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", ustar_mtime(args->time)); @@ -240,7 +240,7 @@ static int write_tar_entry(struct archiver_args *args, struct ustar_header header; struct strbuf ext_header = STRBUF_INIT; unsigned int old_mode = mode; - unsigned long size; + unsigned long size, size_in_header; void *buffer; int err = 0; @@ -299,12 +299,14 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } - if (S_ISREG(mode) && ustar_size(size) != size) + size_in_header = S_ISREG(mode) ? ustar_size(size) : size; + if (size_in_header != size) strbuf_append_ext_header_uint(_header, "size", size); + if (ustar_mtime(args->time) != args->time) strbuf_append_ext_header_uint(_header, "mtime", args->time); - prepare_header(args, , mode, size); + prepare_header(args, , mode, size_in_header); if (ext_header.len > 0) { err = write_extended_header(args, sha1, ext_header.buf, -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Am 21.06.2016 um 17:59 schrieb Jeff King: > On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote: > >>> Unfortunately, it's quite an expensive test to run. For one >>> thing, unless your filesystem supports files with holes, it >>> takes 64GB of disk space (you might think piping straight to >>> `hash-object --stdin` would be better, but it's not; that >>> tries to buffer all 64GB in RAM!). Furthermore, hashing and >>> compressing the object takes several minutes of CPU time. >>> >>> We could ship just the resulting compressed object data as a >>> loose object, but even that takes 64MB. So sadly, this code >>> path remains untested in the test suite. >> >> If we could set the limit to a lower value than 8GB for testing then we >> could at least check if the extended header is written, e.g. if ustar_size() >> could be convinced to return 0 every time using a hidden command line >> parameter or an environment variable or something better. > > Yes, we could do that, though I think it loses most of the value of the > test. We can check that if we hit an arbitrary value we generate the pax > header, but I think what we _really_ care about is: did we generate an > output that somebody else's tar implementation can handle. I agree with the last point, but don't see how that diminishes the value of such a test. If we provide file sizes only through extended headers (the normal header field being set to 0) and we can extract files with correct sizes then tar must have interpreted those header as intended, right? (Or it just guessed the sizes by searching for the next header magic, but such a fallback won't be accurate for files ending with NUL characters due to NUL-padding, so we just have to add such a file.) René -- >8 -- Subject: archive-tar: test creation of pax extended size headers --- The value 120 is magic; we need it to pass the tests. That's because prepare_header() is used for building extended header records as well and we don't create extended headers for extended headers (not sure if that would work anyway), so they simply vanish when they're over the limit as their size field is set to zero. archive-tar.c | 7 ++- t/t5000-tar-tree.sh | 7 +++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index f53e61c..fbbc4cc 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -14,6 +14,7 @@ static char block[BLOCKSIZE]; static unsigned long offset; static int tar_umask = 002; +static unsigned long ustar_size_max = 0777UL; static int write_tar_filter_archive(const struct archiver *ar, struct archiver_args *args); @@ -179,7 +180,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) static inline unsigned long ustar_size(uintmax_t size) { - if (size <= 0777UL) + if (size <= ustar_size_max) return size; else return 0; @@ -412,6 +413,10 @@ static int git_tar_config(const char *var, const char *value, void *cb) } return 0; } + if (!strcmp(var, "tar.ustarsizemax")) { + ustar_size_max = git_config_ulong(var, value); + return 0; + } return tar_filter_config(var, value, cb); } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 4b68bba..03bb4c7 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -102,6 +102,7 @@ test_expect_success \ echo long filename >a/four$hundred && mkdir a/bin && test-genrandom "frotz" 50 >a/bin/sh && + printf "\0\0\0" >>a/bin/sh && printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 && printf "A not substituted O" >a/substfile2 && if test_have_prereq SYMLINKS; then @@ -157,6 +158,12 @@ test_expect_success 'git-archive --prefix=olde-' ' check_tar with_olde-prefix olde- +test_expect_success !TAR_NEEDS_PAX_FALLBACK 'pax extended size headers' ' + git -c tar.ustarsizemax=120 archive HEAD >extended_size_header.tar +' + +check_tar extended_size_header + test_expect_success 'git archive on large files' ' test_config core.bigfilethreshold 1 && git archive HEAD >b3.tar && -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
On Thu, Jun 16, 2016 at 12:37:33AM -0400, Jeff King wrote: > We could ship just the resulting compressed object data as a > loose object, but even that takes 64MB. So sadly, this code > path remains untested in the test suite. Additionally compress the object data, and insert it for the purpose of testing? It's still an expensive test time-wise, but repeated gzip compression on zeros does still help; to that end, here's the pieces to add a testcase while only being <9KiB. t5005-archive-hugefile.sh: ... mkdir t zcat t5005-barerepo-64gb-obj.tar.gz.gz | tar xzf - -C t GIT_DIR=t git archive HEAD | head -c 1 | tar tvf - 2>/dev/null ... Test repo attached, it's only 8KiB. Test repo can be recreated with: truncate -s 64g bigfile git add bigfile # takes 10 mins git commit -q -m foo # takes another 10 minutes tar cf - -C .git . |gzip -9 --no-name |gzip -9f --no-name >barerepo.tar.gz.gz -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote: > > Unfortunately, it's quite an expensive test to run. For one > > thing, unless your filesystem supports files with holes, it > > takes 64GB of disk space (you might think piping straight to > > `hash-object --stdin` would be better, but it's not; that > > tries to buffer all 64GB in RAM!). Furthermore, hashing and > > compressing the object takes several minutes of CPU time. > > > > We could ship just the resulting compressed object data as a > > loose object, but even that takes 64MB. So sadly, this code > > path remains untested in the test suite. > > If we could set the limit to a lower value than 8GB for testing then we > could at least check if the extended header is written, e.g. if ustar_size() > could be convinced to return 0 every time using a hidden command line > parameter or an environment variable or something better. Yes, we could do that, though I think it loses most of the value of the test. We can check that if we hit an arbitrary value we generate the pax header, but I think what we _really_ care about is: did we generate an output that somebody else's tar implementation can handle. And for the smaller-than-64GB case, GNU tar happily handles our existing output (though I suspect other tars might fail at "only" 8GB). > > +static inline unsigned long ustar_size(uintmax_t size) > > +{ > > + if (size < 0777UL) > > Shouldn't that be less-or-equal? Yeah, you're right (and for the one in the next patch, too). > > + if (ustar_size(size) != size) > > + strbuf_append_ext_header_uint(_header, "size", size); > > It needs "S_ISREG(mode) && " as well, no? In practice it probably doesn't > matter (until someone stores a 8GB long symlink target), but the size field > should only be set for regular files. Thanks for noticing that. I remembered wondering that when I was early in debugging/diagnosing, but forgot to follow up on it. I agree it's unlikely in practice, but we should have consistent checks (I think it would actually make sense to move the ISREG check inside ustar_size, and then we can apply it consistently here and when generating the header; my goal with ustar_size() was to avoid having the same logic in multiple places). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
On Tue, Jun 21, 2016 at 11:59:20AM -0400, Jeff King wrote: > > > + if (ustar_size(size) != size) > > > + strbuf_append_ext_header_uint(_header, "size", size); > > > > It needs "S_ISREG(mode) && " as well, no? In practice it probably doesn't > > matter (until someone stores a 8GB long symlink target), but the size field > > should only be set for regular files. > > Thanks for noticing that. I remembered wondering that when I was early > in debugging/diagnosing, but forgot to follow up on it. I agree it's > unlikely in practice, but we should have consistent checks (I think it > would actually make sense to move the ISREG check inside ustar_size, and > then we can apply it consistently here and when generating the header; > my goal with ustar_size() was to avoid having the same logic in multiple > places). Actually, scratch that. It makes things awkward because it would be hard to tell when ustar_size() returned zero because it's a file with a big size, and thus needs a pax header, versus that it was not a file, and therefore must _not_ have a pax header. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
Am 16.06.2016 um 06:37 schrieb Jeff King: The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator. These means that the largest size we can represent is 0777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax). This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here. Prior to f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote. If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that. But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile. This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true: 1. You have a file between 8GB and 64GB. 2. Your tar implementation _doesn't_ know about pax extended headers. 3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter. It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case. There's no test included here. I did confirm that this works (using GNU tar) with: $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge $ git add huge $ git commit -q -m foo $ git archive HEAD | head -c 1 | tar tvf - 2>/dev/null -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge Pre-f2f0267, this would yield a bogus size of 8GB, and post-f2f0267, git-archive simply dies. Unfortunately, it's quite an expensive test to run. For one thing, unless your filesystem supports files with holes, it takes 64GB of disk space (you might think piping straight to `hash-object --stdin` would be better, but it's not; that tries to buffer all 64GB in RAM!). Furthermore, hashing and compressing the object takes several minutes of CPU time. We could ship just the resulting compressed object data as a loose object, but even that takes 64MB. So sadly, this code path remains untested in the test suite. If we could set the limit to a lower value than 8GB for testing then we could at least check if the extended header is written, e.g. if ustar_size() could be convinced to return 0 every time using a hidden command line parameter or an environment variable or something better. Signed-off-by: Jeff King--- archive-tar.c | 28 +++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index cb99df2..7340b64 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, strbuf_addch(sb, '\n'); } +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + static unsigned int ustar_header_chksum(const struct ustar_header *header) { const unsigned char *p = (const unsigned char *)header; @@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) return i; } +static inline unsigned long ustar_size(uintmax_t size) +{ + if (size < 0777UL) Shouldn't that be less-or-equal? + return size; + else + return 0; +} + static void prepare_header(struct archiver_args *args, struct ustar_header *header,