Re: [PATCH] archive-zip: load userdiff config

2017-01-03 Thread Jeff King
On Tue, Jan 03, 2017 at 06:24:39PM +0100, René Scharfe wrote:

> Am 02.01.2017 um 23:25 schrieb Jeff King:
> > Since 4aff646d17 (archive-zip: mark text files in archives,
> > 2015-03-05), the zip archiver will look at the userdiff
> > driver to decide whether a file is text or binary. This
> > usually doesn't need to look any further than the attributes
> > themselves (e.g., "-diff", etc). But if the user defines a
> > custom driver like "diff=foo", we need to look at
> > "diff.foo.binary" in the config. Prior to this patch, we
> > didn't actually load it.
> 
> Ah, didn't think of that, obviously.
> 
> Would it make sense for userdiff_find_by_path() to die if userdiff_config()
> hasn't been called, yet?

Yeah, perhaps. That makes it impossible for a program to intentionally
ignore the config. But it looks like even plumbing diff commands load
userdiff (which makes sense; they control its behavior through things
like ALLOW_TEXTCONV). So it's probably fine to have it everywhere.

Other options include:

  1. Just loading it always as part of git_default_config.

  2. Lazy-loading it on the first call. This seems elegant, though it
 does open up hidden cache-invalidation issues. E.g., somebody asks
 for userdiff_find_by_path(), we load the values, then they
 setup_git_repository(), and we would need to reload. That's
 far-fetched for userdiff, but it makes lazy-loading as a general
 pattern a bit of a potential maintenance trap.

 We could also introduce some infrastructure to deal with that
 (e.g., if callers could ask the config machinery "have you been
 invalidated"). That would help here and in other places (e.g., I
 considered this when dealing with get_shared_repository()).

> > I also happened to notice that zipfiles are created using the local
> > timezone (because they have no notion of the timezone, so we have to
> > pick _something_). That's probably the least-terrible option, but it was
> > certainly surprising to me when I tried to bit-for-bit reproduce a
> > zipfile from GitHub on my local machine.
> 
> That reminds me of an old request to allow users better control over the
> meta-data written into archives.  Being able to specify a time zone offset
> could be a start.

I did it with:

  TZ=PST8PDT git archive ...

which let me get a bit-for-bit match with what GitHub generates. The
real problem was just knowing that I needed to do that. OTOH, we're
considering having GitHub generate all archives in UTC for sanity's
sake, and it would be nice to do that by setting zip.timezone instead of
hacking $TZ for each invocation.

> > +static int archive_zip_config(const char *var, const char *value, void 
> > *data)
> > +{
> > +   return userdiff_config(var, value);
> > +}
> > +
> >  static int write_zip_archive(const struct archiver *ar,
> >  struct archiver_args *args)
> >  {
> > int err;
> > 
> > +   git_config(archive_zip_config, NULL);
> > +
> 
> I briefly thought about moving this call to archive.c with the rest of the
> config-related stuff, but I agree it's better kept here.

That was my first thought, but there are already two config calls:
write_archive() loads default config, but then archive-tar loads
tar-specific config. Since only zip cares about userdiff, I patterned it
after the latter. But arguably everybody _could_ end up calling into
userdiff. If we take that philosophy, though, I'd be more inclined to
push it into git_default_config(). That covers archive writers _and_ any
other programs which might happen to call into the diff code.

> Looks good, thanks!

Thanks for reviewing.

-Peff


Re: [PATCH] archive-zip: load userdiff config

2017-01-03 Thread René Scharfe

Am 02.01.2017 um 23:25 schrieb Jeff King:

Since 4aff646d17 (archive-zip: mark text files in archives,
2015-03-05), the zip archiver will look at the userdiff
driver to decide whether a file is text or binary. This
usually doesn't need to look any further than the attributes
themselves (e.g., "-diff", etc). But if the user defines a
custom driver like "diff=foo", we need to look at
"diff.foo.binary" in the config. Prior to this patch, we
didn't actually load it.


Ah, didn't think of that, obviously.

Would it make sense for userdiff_find_by_path() to die if 
userdiff_config() hasn't been called, yet?



I also happened to notice that zipfiles are created using the local
timezone (because they have no notion of the timezone, so we have to
pick _something_). That's probably the least-terrible option, but it was
certainly surprising to me when I tried to bit-for-bit reproduce a
zipfile from GitHub on my local machine.


That reminds me of an old request to allow users better control over the 
meta-data written into archives.  Being able to specify a time zone 
offset could be a start.



 archive-zip.c  |  7 +++
 t/t5003-archive-zip.sh | 22 ++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 9db47357b0..b429a8d974 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -554,11 +554,18 @@ static void dos_time(time_t *time, int *dos_date, int 
*dos_time)
*dos_time = t->tm_sec / 2 + t->tm_min * 32 + t->tm_hour * 2048;
 }

+static int archive_zip_config(const char *var, const char *value, void *data)
+{
+   return userdiff_config(var, value);
+}
+
 static int write_zip_archive(const struct archiver *ar,
 struct archiver_args *args)
 {
int err;

+   git_config(archive_zip_config, NULL);
+


I briefly thought about moving this call to archive.c with the rest of 
the config-related stuff, but I agree it's better kept here.



dos_time(>time, _date, _time);

zip_dir = xmalloc(ZIP_DIRECTORY_MIN_SIZE);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 14744b2a4b..55c7870997 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -64,6 +64,12 @@ check_zip() {
test_cmp_bin $original/nodiff.crlf $extracted/nodiff.crlf &&
test_cmp_bin $original/nodiff.lf   $extracted/nodiff.lf
"
+
+   test_expect_success UNZIP " validate that custom diff is unchanged " "
+   test_cmp_bin $original/custom.cr   $extracted/custom.cr &&
+   test_cmp_bin $original/custom.crlf $extracted/custom.crlf &&
+   test_cmp_bin $original/custom.lf   $extracted/custom.lf
+   "
 }

 test_expect_success \
@@ -78,6 +84,9 @@ test_expect_success \
  printf "text\r" >a/nodiff.cr &&
  printf "text\r\n"   >a/nodiff.crlf &&
  printf "text\n" >a/nodiff.lf &&
+ printf "text\r" >a/custom.cr &&
+ printf "text\r\n"   >a/custom.crlf &&
+ printf "text\n" >a/custom.lf &&
  printf "\0\r"   >a/binary.cr &&
  printf "\0\r\n" >a/binary.crlf &&
  printf "\0\n"   >a/binary.lf &&
@@ -112,15 +121,20 @@ test_expect_success 'add files to repository' '
 test_expect_success 'setup export-subst and diff attributes' '
echo "a/nodiff.* -diff" >>.git/info/attributes &&
echo "a/diff.* diff" >>.git/info/attributes &&
+   echo "a/custom.* diff=custom" >>.git/info/attributes &&
+   git config diff.custom.binary true &&
echo "substfile?" export-subst >>.git/info/attributes &&
git log --max-count=1 "--pretty=format:A${SUBSTFORMAT}O" HEAD \
>a/substfile1
 '

-test_expect_success \
-'create bare clone' \
-'git clone --bare . bare.git &&
- cp .git/info/attributes bare.git/info/attributes'
+test_expect_success 'create bare clone' '
+   git clone --bare . bare.git &&
+   cp .git/info/attributes bare.git/info/attributes &&
+   # Recreate our changes to .git/config rather than just copying it, as
+   # we do not want to clobber core.bare or other settings.
+   git -C bare.git config diff.custom.binary true
+'

 test_expect_success \
 'remove ignored file' \



Looks good, thanks!

René


[PATCH] archive-zip: load userdiff config

2017-01-02 Thread Jeff King
Since 4aff646d17 (archive-zip: mark text files in archives,
2015-03-05), the zip archiver will look at the userdiff
driver to decide whether a file is text or binary. This
usually doesn't need to look any further than the attributes
themselves (e.g., "-diff", etc). But if the user defines a
custom driver like "diff=foo", we need to look at
"diff.foo.binary" in the config. Prior to this patch, we
didn't actually load it.

Signed-off-by: Jeff King 
---
I'd be surprised if anybody actually triggered this in practice. I don't
think any of the custom-driver fields except "binary" matter, and using
direct attributes is almost always easier than setting up a custom
driver. Though you could also do trickery with:

  git -c diff.default.binary=true archive ...

if you wanted to be really clever.

I ran across this while investigating a case where somebody's zipfile
was all marked as binary (it turned out not to be related; the issue was
just that their Git was pre-4aff646d17).

I also happened to notice that zipfiles are created using the local
timezone (because they have no notion of the timezone, so we have to
pick _something_). That's probably the least-terrible option, but it was
certainly surprising to me when I tried to bit-for-bit reproduce a
zipfile from GitHub on my local machine.

 archive-zip.c  |  7 +++
 t/t5003-archive-zip.sh | 22 ++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 9db47357b0..b429a8d974 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -554,11 +554,18 @@ static void dos_time(time_t *time, int *dos_date, int 
*dos_time)
*dos_time = t->tm_sec / 2 + t->tm_min * 32 + t->tm_hour * 2048;
 }
 
+static int archive_zip_config(const char *var, const char *value, void *data)
+{
+   return userdiff_config(var, value);
+}
+
 static int write_zip_archive(const struct archiver *ar,
 struct archiver_args *args)
 {
int err;
 
+   git_config(archive_zip_config, NULL);
+
dos_time(>time, _date, _time);
 
zip_dir = xmalloc(ZIP_DIRECTORY_MIN_SIZE);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 14744b2a4b..55c7870997 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -64,6 +64,12 @@ check_zip() {
test_cmp_bin $original/nodiff.crlf $extracted/nodiff.crlf &&
test_cmp_bin $original/nodiff.lf   $extracted/nodiff.lf
"
+
+   test_expect_success UNZIP " validate that custom diff is unchanged " "
+   test_cmp_bin $original/custom.cr   $extracted/custom.cr &&
+   test_cmp_bin $original/custom.crlf $extracted/custom.crlf &&
+   test_cmp_bin $original/custom.lf   $extracted/custom.lf
+   "
 }
 
 test_expect_success \
@@ -78,6 +84,9 @@ test_expect_success \
  printf "text\r"   >a/nodiff.cr &&
  printf "text\r\n" >a/nodiff.crlf &&
  printf "text\n"   >a/nodiff.lf &&
+ printf "text\r"   >a/custom.cr &&
+ printf "text\r\n" >a/custom.crlf &&
+ printf "text\n"   >a/custom.lf &&
  printf "\0\r" >a/binary.cr &&
  printf "\0\r\n"   >a/binary.crlf &&
  printf "\0\n" >a/binary.lf &&
@@ -112,15 +121,20 @@ test_expect_success 'add files to repository' '
 test_expect_success 'setup export-subst and diff attributes' '
echo "a/nodiff.* -diff" >>.git/info/attributes &&
echo "a/diff.* diff" >>.git/info/attributes &&
+   echo "a/custom.* diff=custom" >>.git/info/attributes &&
+   git config diff.custom.binary true &&
echo "substfile?" export-subst >>.git/info/attributes &&
git log --max-count=1 "--pretty=format:A${SUBSTFORMAT}O" HEAD \
>a/substfile1
 '
 
-test_expect_success \
-'create bare clone' \
-'git clone --bare . bare.git &&
- cp .git/info/attributes bare.git/info/attributes'
+test_expect_success 'create bare clone' '
+   git clone --bare . bare.git &&
+   cp .git/info/attributes bare.git/info/attributes &&
+   # Recreate our changes to .git/config rather than just copying it, as
+   # we do not want to clobber core.bare or other settings.
+   git -C bare.git config diff.custom.binary true
+'
 
 test_expect_success \
 'remove ignored file' \
-- 
2.11.0.519.g31435224cf