Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime

2016-06-23 Thread Jeff King
On Thu, Jun 23, 2016 at 11:38:19PM +0200, René Scharfe wrote:

> > Yeah, I had a similar thought while writing it, but wasn't quite sure
> > how that was supposed to be formatted. I modeled my output after what
> > GNU tar writes, but of course they are expecting a different mtime for
> > each file.
> > 
> > I'll look into how global pax headers should look.
> 
> Moving the strbuf_append_ext_header() call into
> write_global_extended_header() should be enough.  The changes to the test
> script are a bit more interesting, though.  Perhaps I can come up with
> something over the weekend.

Yeah, I have the code itself working now. I'm just slogging through
writing tests that are efficient and portable. I hope to have something
out soon.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime

2016-06-23 Thread René Scharfe

Am 23.06.2016 um 21:22 schrieb Jeff King:

On Wed, Jun 22, 2016 at 07:46:25AM +0200, René Scharfe wrote:


Am 21.06.2016 um 00:54 schrieb René Scharfe:

Am 16.06.2016 um 06:37 schrieb Jeff King:

2. System tars that cannot handle pax headers.


In t5000 there is a simple interpreter for path headers for systems
whose tar doesn't handle them.  Adding one for mtime headers may be
feasible.

It's just a bit complicated to link a pax header file to the file it
applies to when it doesn't also contain a path header.  But we know that
the mtime of all entries in tar files created by git archive are is the
same, so we could simply read one header and then adjust the mtime of
all files accordingly.


This brings me to the idea of using a single global pax header for mtime
instead of one per entry.  It reduces the size of the archive and allows for
slightly easier testing -- it just fits better since we know that all our
mtimes are the same.


Yeah, I had a similar thought while writing it, but wasn't quite sure
how that was supposed to be formatted. I modeled my output after what
GNU tar writes, but of course they are expecting a different mtime for
each file.

I'll look into how global pax headers should look.


Moving the strbuf_append_ext_header() call into 
write_global_extended_header() should be enough.  The changes to the 
test script are a bit more interesting, though.  Perhaps I can come up 
with something over the weekend.


René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime

2016-06-23 Thread Jeff King
On Wed, Jun 22, 2016 at 07:46:25AM +0200, René Scharfe wrote:

> Am 21.06.2016 um 00:54 schrieb René Scharfe:
> > Am 16.06.2016 um 06:37 schrieb Jeff King:
> > >2. System tars that cannot handle pax headers.
> > 
> > In t5000 there is a simple interpreter for path headers for systems
> > whose tar doesn't handle them.  Adding one for mtime headers may be
> > feasible.
> > 
> > It's just a bit complicated to link a pax header file to the file it
> > applies to when it doesn't also contain a path header.  But we know that
> > the mtime of all entries in tar files created by git archive are is the
> > same, so we could simply read one header and then adjust the mtime of
> > all files accordingly.
> 
> This brings me to the idea of using a single global pax header for mtime
> instead of one per entry.  It reduces the size of the archive and allows for
> slightly easier testing -- it just fits better since we know that all our
> mtimes are the same.

Yeah, I had a similar thought while writing it, but wasn't quite sure
how that was supposed to be formatted. I modeled my output after what
GNU tar writes, but of course they are expecting a different mtime for
each file.

I'll look into how global pax headers should look.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime

2016-06-21 Thread René Scharfe

Am 21.06.2016 um 00:54 schrieb René Scharfe:

Am 16.06.2016 um 06:37 schrieb Jeff King:

   2. System tars that cannot handle pax headers.


In t5000 there is a simple interpreter for path headers for systems
whose tar doesn't handle them.  Adding one for mtime headers may be
feasible.

It's just a bit complicated to link a pax header file to the file it
applies to when it doesn't also contain a path header.  But we know that
the mtime of all entries in tar files created by git archive are is the
same, so we could simply read one header and then adjust the mtime of
all files accordingly.


This brings me to the idea of using a single global pax header for mtime 
instead of one per entry.  It reduces the size of the archive and allows 
for slightly easier testing -- it just fits better since we know that 
all our mtimes are the same.


René

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime

2016-06-20 Thread René Scharfe

Am 16.06.2016 um 06:37 schrieb Jeff King:

The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits.  To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.

This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't worth
dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to
fix this.

Given that this is just around the corner (geologically
speaking), and because the fix for "size" makes doing this
on top trivial, let's just make it work.

Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.

Unlike the last patch for "size", this one is easy to test
efficiently (the magic date is octal 010001):

   $ echo content >file
   $ git add file
   $ GIT_COMMITTER_DATE='@68719476737 +' \
 git commit -q -m 'tempori parendum'
   $ git archive HEAD | tar tvf -
   -rw-rw-r-- root/root 8 4147-08-20 03:32 file

With the current tip of master, this dies in xsnprintf (and
prior to f2f0267, it overflowed into the checksum field, and
the resulting tarfile claimed to be from the year 2242).

However, I decided not to include this in the test suite,
because I suspect it will create portability headaches,
including:

   1. The exact format of the system tar's "t" output.


Probably not worth it.


   2. System tars that cannot handle pax headers.


In t5000 there is a simple interpreter for path headers for systems 
whose tar doesn't handle them.  Adding one for mtime headers may be 
feasible.


It's just a bit complicated to link a pax header file to the file it 
applies to when it doesn't also contain a path header.  But we know that 
the mtime of all entries in tar files created by git archive are is the 
same, so we could simply read one header and then adjust the mtime of 
all files accordingly.



   3. System tars tha cannot handle dates that far in the
  future.

   4. If we replace "tar t" with "tar x", then filesystems
  that cannot handle dates that far in the future.


We can test that by supplying a test archive and set a prerequisite if 
tar (possibly with our header interpreter) and the filesystem can cope 
with that.



In other words, we do not really have a reliable tar reader
for these corner cases, so the best we could do is a
byte-for-byte comparison of the output.

Signed-off-by: Jeff King 
---
  archive-tar.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index 7340b64..749722f 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -185,6 +185,14 @@ static inline unsigned long ustar_size(uintmax_t size)
return 0;
  }

+static inline unsigned long ustar_mtime(time_t mtime)
+{
+   if (mtime < 0777)


That should be less-or-equal, right?


+   return mtime;
+   else
+   return 0;
+}
+
  static void prepare_header(struct archiver_args *args,
   struct ustar_header *header,
   unsigned int mode, unsigned long size)
@@ -192,7 +200,8 @@ static void prepare_header(struct archiver_args *args,
xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 0);
xsnprintf(header->size, sizeof(header->size), "%011lo",
  S_ISREG(mode) ? ustar_size(size) : 0);
-   xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) 
args->time);
+   xsnprintf(header->mtime, sizeof(header->mtime), "%011lo",
+ ustar_mtime(args->time));

xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
@@ -292,6 +301,8 @@ static int write_tar_entry(struct archiver_args *args,

if (ustar_size(size) != size)
strbuf_append_ext_header_uint(_header, "size", size);
+   if (ustar_mtime(args->time) != args->time)
+   strbuf_append_ext_header_uint(_header, "mtime", args->time);

prepare_header(args, , mode, size);



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html