Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Brandon Williams
On 04/05, Johannes Schindelin wrote:
> Hi Brandon,
> 
> On Tue, 4 Apr 2017, Brandon Williams wrote:
> 
> > I'm all for seeing a patch like this applied.  I agree that we can't
> > expect the world to be running the most up-to-date version of curl but
> > we should be able to select some "oldest" version we will support which
> > can be bumped up every couple of years.  
> > 
> > I mean, ensuring that you are running with an up-to-date version of curl
> > is really important when it comes to all of the security fixes that have
> > been made in each revision.
> 
> I am not in the business of dictating to others what software they have to
> run. I am in the business of maintaining Git for Windows. And part of that
> job is to drag along code that is maybe not the most elegant, but works.
> 
> The patch in question resolves such a wart. Sure, it would be a cleanup.
> Is it a huge maintenance burden to keep those few #ifdef's, though?
> Absolutely not.

haha, very true.  Of course I'm stating my opinion based on my limited
experience and work environment.

-- 
Brandon Williams


Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Stefan Beller
On Tue, Apr 4, 2017 at 3:46 PM, Johannes Schindelin
 wrote:
> Hi Brandon,
>
> On Tue, 4 Apr 2017, Brandon Williams wrote:
>
>> I'm all for seeing a patch like this applied.  I agree that we can't
>> expect the world to be running the most up-to-date version of curl but
>> we should be able to select some "oldest" version we will support which
>> can be bumped up every couple of years.
>>
>> I mean, ensuring that you are running with an up-to-date version of curl
>> is really important when it comes to all of the security fixes that have
>> been made in each revision.
>
> I am not in the business of dictating to others what software they have to
> run. I am in the business of maintaining Git for Windows. And part of that
> job is to drag along code that is maybe not the most elegant, but works.
>
> The patch in question resolves such a wart. Sure, it would be a cleanup.
> Is it a huge maintenance burden to keep those few #ifdef's, though?
> Absolutely not.

Keeping them around is the easy part; the hard part is promising users
that the software you maintain is as good as your reputation, when e.g.
we find out that certain #ifdefs don't even compile.
(See Frank Gevaerts' answer.)

So from my point of view it ought to be easier to maintain software that
is fully compiled and tested by a lot of people, and not have a long tail
of niche features that are not well tested.

Initially I thought I had a similar stance as you ("A well written line of code
is cheap"), but I kept quiet, as I do not have a lot of experience dealing
with "old" software.

Maybe the git community would want to take a look at how the kernel
community (or other similar communities) solves the "long term stable"
problem of computer science.

And there are different things to be considered:
(1) the kernel community has "stable" maintainer(s) that are not the same
as the maintainer of the bleeding edge branch. So I would expect that these
maintainers have expertise in "dealing with old stuff, particularly from
$DATE_RANGE".
(2) one of the kernel's rules is "don't break user space". However, sometimes
they do remove code[1]. And then their policy seems to be: wait until someone
shows up and we can revert the removal.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1212.1/01152.html

So I would propose to take this patch, but be prepared to revert it in case
some user yells.

Thanks,
Stefan


Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Johannes Schindelin
Hi Brandon,

On Tue, 4 Apr 2017, Brandon Williams wrote:

> I'm all for seeing a patch like this applied.  I agree that we can't
> expect the world to be running the most up-to-date version of curl but
> we should be able to select some "oldest" version we will support which
> can be bumped up every couple of years.  
> 
> I mean, ensuring that you are running with an up-to-date version of curl
> is really important when it comes to all of the security fixes that have
> been made in each revision.

I am not in the business of dictating to others what software they have to
run. I am in the business of maintaining Git for Windows. And part of that
job is to drag along code that is maybe not the most elegant, but works.

The patch in question resolves such a wart. Sure, it would be a cleanup.
Is it a huge maintenance burden to keep those few #ifdef's, though?
Absolutely not.

Ciao,
Johannes


Re: What's cooking in git.git (Mar 2017, #12; Wed, 29)

2017-04-04 Thread Stefan Beller
> * sb/submodule-short-status (2017-03-27) 7 commits
>  - submodule.c: correctly handle nested submodules in is_submodule_modified
>  - short status: improve reporting for submodule changes
>  - submodule.c: stricter checking for submodules in is_submodule_modified
>  - submodule.c: port is_submodule_modified to use porcelain 2
>  - submodule.c: convert is_submodule_modified to use strbuf_getwholeline
>  - submodule.c: factor out early loop termination in is_submodule_modified
>  - submodule.c: use argv_array in is_submodule_modified
>
>  The output from "git status --short" has been extended to show
>  various kinds of dirtiness in submodules differently; instead of
>  "M" for modified, 'm' and '?' can be shown to signal changes only
>  to the working tree of the submodule but not the commit that is
>  checked out.
>
>  Waiting for further comments.
>  The endgame looked mostly OK.

The last comments were by Jonathan and you agreeing this is a superb series. ;)
http://public-inbox.org/git/20170329231308.gz31...@aiede.mtv.corp.google.com/


Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-04 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 4, 2017 at 12:39 AM, Eric Wong  wrote:
> Ævar Arnfjörð Bjarmason  wrote:
>> On Mon, Apr 3, 2017 at 11:34 PM, Eric Wong  wrote:
>> > Ævar Arnfjörð Bjarmason  wrote:
>> >>  - Should we be covering good practices for your repo going forward to
>> >>maintain good performance? E.g. don't have some huge tree all in
>> >>one directory (use subdirs), don't add binary (rather
>> >>un-delta-able) content if you can help it etc.
>> >
>> > Yes, I think so.
>>
>> I'll try to write something up.
>>
>> > I think avoiding ever growing ChangeLog-type files should also
>> > be added to things to avoid.
>>
>> How were those bad specifically? They should delta quite well, it's
>> expensive to commit large files but no more because they're
>> ever-growing.
>
> It might be blame/annotate specifically, I was remembering this
> thread from a decade ago:
>
>   
> https://public-inbox.org/git/4aca3dc20712110933i636342fbifb15171d3e3ca...@mail.gmail.com/T/

I did some basic testing on this, and I think advice about
ChangeLog-style files isn't worth including. On gcc.git, blame on
ChangeLog still takes a few hundred MB of RAM, but finishes in about
2s on my machine. The gcc/fold-const.c file takes ~10s for me, though
that thread seems to have resulted in some patches to git-blame.

Running this:

parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} | tr "\n" "\t" && echo {}' ::: $(git ls-files) | tee /tmp/git-blame-times.txt

On git.git, this shows that the slowest blames are just files with
either lots of commits, or lots of lines, or some combination of the
two. The gcc.git repo has some more pathological cases; top 10 on that
repo:

$ parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} | tr "\n" "\t" && echo {}' ::: $(git ls-files|grep -e ^gcc/ -e ChangeLog|grep -v '/.*/') | tee /tmp/gcc-blame-times.txt
$ sort -nr /tmp/gcc-blame-times.txt |head -n 10
0:18.12 151314517 gcc/tree.c gcc/tree.c
0:17.35 66336   7435 gcc/ChangeLog  gcc/ChangeLog
0:16.87 163430455 gcc/dwarf2out.c   gcc/dwarf2out.c
0:16.76 11607937 gcc/varasm.c   gcc/varasm.c
0:16.36 16925491 gcc/tree.h gcc/tree.h
0:15.34 94  493 gcc/xcoffout.c  gcc/xcoffout.c
0:15.22 54  194 gcc/xcoffout.h  gcc/xcoffout.h
0:15.12 964 9224 gcc/reload1.c  gcc/reload1.c
0:14.90 15932202 gcc/toplev.c   gcc/toplev.c
0:14.66 11  43 gcc/typeclass.h  gcc/typeclass.h

This makes it pretty clear that blame is slow where you'd expect, not
with files that are prepended or appended to.


>> One issue with e.g. storing logs (I keep my IRC logs in git) is that
>> if you're constantly committing large (text) files without repacking,
>> your .git grows by a *lot* in a very short amount of time until a very
>> expensive repack, so now I split my IRC logs by month.
>
> Yep, that too; as auto GC is triggered by the number of loose
> objects, not the size/packability of them.
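Eric's point can be seen directly in the gc.auto knob: automatic gc fires when the count of loose objects crosses a threshold, regardless of how large or how compressible those objects are. A minimal sketch of tuning it per-repository (the repository path and the chosen values are made up for illustration):

```shell
# Auto gc triggers on the number of loose objects (gc.auto, default 6700),
# not on their total size, so frequently committing large text files can
# balloon .git long before a repack ever runs.  One mitigation is lowering
# the threshold in the affected repository (path here is hypothetical):
git init -q /tmp/irclogs-demo
cd /tmp/irclogs-demo
git config gc.auto 256         # repack after ~256 loose objects, not 6700
git config gc.autoPackLimit 4  # also consolidate once 4 packs pile up
git config --get gc.auto
```

Splitting the logs by month, as Ævar does, attacks the same problem from the other side by keeping individual blobs small.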


[PATCH v4 3/4] test-strcmp-offset: created test for strcmp_offset

2017-04-04 Thread git
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 Makefile  |  1 +
 t/helper/.gitignore   |  1 +
 t/helper/test-strcmp-offset.c | 64 +++
 t/t0065-strcmp-offset.sh  | 11 
 4 files changed, 77 insertions(+)
 create mode 100644 t/helper/test-strcmp-offset.c
 create mode 100755 t/t0065-strcmp-offset.sh

diff --git a/Makefile b/Makefile
index 9ec6065..4c4c246 100644
--- a/Makefile
+++ b/Makefile
@@ -631,6 +631,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sha1-array
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-strcmp-offset
 TEST_PROGRAMS_NEED_X += test-string-list
 TEST_PROGRAMS_NEED_X += test-submodule-config
 TEST_PROGRAMS_NEED_X += test-subprocess
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index d6e8b36..0a89531 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -25,6 +25,7 @@
 /test-sha1
 /test-sha1-array
 /test-sigchain
+/test-strcmp-offset
 /test-string-list
 /test-submodule-config
 /test-subprocess
diff --git a/t/helper/test-strcmp-offset.c b/t/helper/test-strcmp-offset.c
new file mode 100644
index 000..fe01318
--- /dev/null
+++ b/t/helper/test-strcmp-offset.c
@@ -0,0 +1,64 @@
+#include "cache.h"
+
+struct test_data {
+   const char *s1;
+   const char *s2;
+   int first_change;
+};
+
+static struct test_data data[] = {
+   { "abc", "abc", 0 },
+   { "abc", "def", 0 },
+
+   { "abc", "abz", 2 },
+
+   { "abc", "abcdef", 3 },
+
+   { "abc\xF0zzz", "abc\xFFzzz", 3 },
+
+   { NULL, NULL, 0 }
+};
+
+int try_pair(const char *sa, const char *sb, int first_change)
+{
+   int failed = 0;
+   int offset, r_exp, r_tst;
+   int r_exp_sign, r_tst_sign;
+
+   /*
+* Because different CRTs behave differently, only rely on signs
+* of the result values.
+*/
+   r_exp = strcmp(sa, sb);
+   r_exp_sign = ((r_exp < 0) ? -1 : ((r_exp == 0) ? 0 : 1));
+
+   r_tst = strcmp_offset(sa, sb, &offset);
+   r_tst_sign = ((r_tst < 0) ? -1 : ((r_tst == 0) ? 0 : 1));
+
+   if (r_tst_sign != r_exp_sign) {
+   error("FAIL: '%s' vs '%s', result expect %d, observed %d\n",
+ sa, sb, r_exp_sign, r_tst_sign);
+   failed = 1;
+   }
+
+   if (offset != first_change) {
+   error("FAIL: '%s' vs '%s', offset expect %d, observed %d\n",
+ sa, sb, first_change, offset);
+   failed = 1;
+   }
+
+   return failed;
+}
+
+int cmd_main(int argc, const char **argv)
+{
+   int failed = 0;
+   int k;
+
+   for (k=0; data[k].s1; k++) {
+   failed += try_pair(data[k].s1, data[k].s2, data[k].first_change);
+   failed += try_pair(data[k].s2, data[k].s1, data[k].first_change);
+   }
+
+   return failed;
+}
diff --git a/t/t0065-strcmp-offset.sh b/t/t0065-strcmp-offset.sh
new file mode 100755
index 000..0176c8c
--- /dev/null
+++ b/t/t0065-strcmp-offset.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+test_description='Test strcmp_offset functionality'
+
+. ./test-lib.sh
+
+test_expect_success run_helper '
+   test-strcmp-offset
+'
+
+test_done
-- 
2.9.3



[PATCH v4 0/4] read-cache: speed up add_index_entry

2017-04-04 Thread git
From: Jeff Hostetler 

Teach add_index_entry_with_check() and has_dir_name()
to avoid index lookups if the given path sorts after
the last entry in the index.

This saves at least 2 binary searches per entry.

This improves performance during checkout and read-tree because
merge_working_tree() and unpack_trees() process a list of already
sorted entries.

This helps performance on very large repositories.


Before and after numbers on an index with 1M files.

Before:
$ ./p0004-read-tree.sh
0004.2: read-tree (1003037)  3.24(2.46+0.72)
0004.3: switch branches (3038 1003037)   7.53(5.66+1.56)

After:
$ ./p0004-read-tree.sh
0004.2: read-tree (1003040)  2.45(1.79+0.61)
0004.3: switch branches (3041 1003040)   6.65(4.22+1.60)


Before and after numbers on an index with 100K files.

Before:
$ ./p0004-read-tree.sh
0004.2: read-tree (103037)  0.30(0.20+0.08)
0004.3: switch branches (3038 103037)   0.65(0.47+0.16)

After:
$ ./p0004-read-tree.sh
0004.2: read-tree (103040)  0.25(0.16+0.07)
0004.3: switch branches (3041 103040)   0.58(0.44+0.13)



Jeff Hostetler (4):
  p0004-read-tree: perf test to time read-tree
  read-cache: add strcmp_offset function
  test-strcmp-offset: created test for strcmp_offset
  read-cache: speed up add_index_entry during checkout

 Makefile  |  1 +
 cache.h   |  1 +
 read-cache.c  | 73 -
 t/helper/.gitignore   |  1 +
 t/helper/test-strcmp-offset.c | 64 +
 t/perf/p0004-read-tree.sh | 84 +++
 t/t0065-strcmp-offset.sh  | 11 ++
 7 files changed, 234 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-strcmp-offset.c
 create mode 100755 t/perf/p0004-read-tree.sh
 create mode 100755 t/t0065-strcmp-offset.sh

-- 
2.9.3



[PATCH v4 4/4] read-cache: speed up add_index_entry during checkout

2017-04-04 Thread git
From: Jeff Hostetler 

Teach add_index_entry_with_check() and has_dir_name()
to see if the path of the new item is greater than the
last path in the index array before attempting to search
for it.

During checkout, merge_working_tree() populates the new
index in sorted order, so this change saves at least 2
lookups per file.

Signed-off-by: Jeff Hostetler 
---
 read-cache.c | 44 +++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index b3fc77d..bf9fc53 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -927,6 +927,21 @@ static int has_dir_name(struct index_state *istate,
int stage = ce_stage(ce);
const char *name = ce->name;
const char *slash = name + ce_namelen(ce);
+   int len_eq_last;
+   int cmp_last = 0;
+
+   if (istate->cache_nr > 0) {
+   /*
+* Compare the entry's full path with the last path in the index.
+* If it sorts AFTER the last entry in the index and they have no
+* common prefix, then there cannot be any F/D name conflicts.
+*/
+   cmp_last = strcmp_offset(name,
+   istate->cache[istate->cache_nr-1]->name,
+   &len_eq_last);
+   if (cmp_last > 0 && len_eq_last == 0)
+   return retval;
+   }
 
for (;;) {
int len;
@@ -939,6 +954,24 @@ static int has_dir_name(struct index_state *istate,
}
len = slash - name;
 
+   if (cmp_last > 0) {
+   /*
+* If this part of the directory prefix (including the trailing
+* slash) already appears in the path of the last entry in the
+* index, then we cannot also have a file with this prefix (or
+* any parent directory prefix).
+*/
+   if (len+1 <= len_eq_last)
+   return retval;
+   /*
+* If this part of the directory prefix (excluding the trailing
+* slash) is longer than the known equal portions, then this part
+* of the prefix cannot collide with a file.  Go on to the parent.
+*/
+   if (len > len_eq_last)
+   continue;
+   }
+
pos = index_name_stage_pos(istate, name, len, stage);
if (pos >= 0) {
/*
@@ -1030,7 +1063,16 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_entry *ce, int option)
 
if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
cache_tree_invalidate_path(istate, ce->name);
-   pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+
+   /*
+* If this entry's path sorts after the last entry in the index,
+* we can avoid searching for it.
+*/
+   if (istate->cache_nr > 0 &&
+   strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
+   pos = -istate->cache_nr - 1;
+   else
+   pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
 
/* existing match? Just replace it. */
if (pos >= 0) {
-- 
2.9.3



[PATCH v4 2/4] read-cache: add strcmp_offset function

2017-04-04 Thread git
From: Jeff Hostetler 

Add a strcmp_offset() function that behaves like strcmp() but also
returns the offset of the first change.

Signed-off-by: Jeff Hostetler 
---
 cache.h  |  1 +
 read-cache.c | 29 +
 2 files changed, 30 insertions(+)

diff --git a/cache.h b/cache.h
index 80b6372..4d82490 100644
--- a/cache.h
+++ b/cache.h
@@ -574,6 +574,7 @@ extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
+extern int strcmp_offset(const char *s1_in, const char *s2_in, int *first_change);
 extern int index_dir_exists(struct index_state *istate, const char *name, int namelen);
 extern void adjust_dirname_case(struct index_state *istate, char *name);
 extern struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
diff --git a/read-cache.c b/read-cache.c
index 9054369..b3fc77d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -887,6 +887,35 @@ static int has_file_name(struct index_state *istate,
return retval;
 }
 
+
+/*
+ * Like strcmp(), but also return the offset of the first change.
+ */
+int strcmp_offset(const char *s1_in, const char *s2_in, int *first_change)
+{
+   const unsigned char *s1 = (const unsigned char *)s1_in;
+   const unsigned char *s2 = (const unsigned char *)s2_in;
+   int diff = 0;
+   int k;
+
+   *first_change = 0;
+   for (k=0; s1[k]; k++)
+   if ((diff = (s1[k] - s2[k])))
+   goto found_it;
+   if (!s2[k])
+   return 0;
+   diff = -1;
+
+found_it:
+   *first_change = k;
+   if (diff > 0)
+   return 1;
+   else if (diff < 0)
+   return -1;
+   else
+   return 0;
+}
+
 /*
  * Do we have another file with a pathname that is a proper
  * subset of the name we're trying to add?
-- 
2.9.3



[PATCH v4 1/4] p0004-read-tree: perf test to time read-tree

2017-04-04 Thread git
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 t/perf/p0004-read-tree.sh | 84 +++
 1 file changed, 84 insertions(+)
 create mode 100755 t/perf/p0004-read-tree.sh

diff --git a/t/perf/p0004-read-tree.sh b/t/perf/p0004-read-tree.sh
new file mode 100755
index 000..ffbbf8e
--- /dev/null
+++ b/t/perf/p0004-read-tree.sh
@@ -0,0 +1,84 @@
+#!/bin/sh
+
+test_description="Tests performance of read-tree"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+## usage: dir depth width files
+make_paths () {
+   for f in $(seq $4)
+   do
+   echo $1/file$f
+   done;
+   if test $2 -gt 0;
+   then
+   for w in $(seq $3)
+   do
+   make_paths $1/dir$w $(($2 - 1)) $3 $4
+   done
+   fi
+   return 0
+}
+
+fill_index () {
+   make_paths $1 $2 $3 $4 |
+   sed "s/^/100644 $EMPTY_BLOB /" |
+   git update-index --index-info
+   return 0
+}
+
+br_base=xxx_base_xxx
+br_work=xxx_work_xxx
+
+new_dir=xxx_dir_xxx
+
+## (5, 10, 9) will create 999,999 files.
+## (4, 10, 9) will create  99,999 files.
+depth=5
+width=10
+files=9
+
+export br_base
+export br_work
+export new_dir
+export depth
+export width
+export files
+
+## The number of files in the xxx_base_xxx branch.
+nr_base=$(git ls-files | wc -l)
+export nr_base
+
+## Inflate the index with thousands of empty files and commit it.
+## Turn on sparse-checkout so that we don't have to populate them
+## later when we start switching branches.
+test_expect_success 'inflate the index' '
+   git reset --hard &&
+   git branch $br_base &&
+   git branch $br_work &&
+   git checkout $br_work &&
+   fill_index $new_dir $depth $width $files &&
+   git commit -m $br_work &&
+   echo $new_dir/file1 >.git/info/sparse-checkout &&
+   git config --local core.sparsecheckout 1 &&
+   git reset --hard
+'
+
+## The number of files in the xxx_work_xxx branch.
+nr_work=$(git ls-files | wc -l)
+export nr_work
+
+test_perf "read-tree ($nr_work)" '
+   git read-tree -m $br_base $br_work -n
+'
+
+test_perf "switch branches ($nr_base $nr_work)" '
+   git checkout $br_base &&
+   git checkout $br_work
+'
+
+
+test_done
-- 
2.9.3



Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-04 Thread Jeff King
On Tue, Apr 04, 2017 at 06:42:23PM +, David Turner wrote:

> > What does it look like when it fails? What does GIT_TRACE_CURL look like (or
> > GIT_CURL_VERBOSE if your client is older, but remember to sanitize any auth
> > lines)?
> 
> Unfortunately, we've already worked around the problem by pushing over SSH, 
> so I no longer have a failing case to examine. Last time I tried, I actually 
> did some 
> hackery to create a push smaller than 2GB, but it still failed (this time, 
> with 
> "502 Bad Gateway").  So, something is clearly weird in GitLab land.
> 
> I did see "Transfer-Encoding: chunked" in one of the responses from the 
> server,
>  but not in the request (not sure if that's normal). The smaller push had: 
> Content-Length: 1048908476

The 502 makes me think it's a problem in the GitLab reverse-proxy layer
(which also matches my experience debugging Git-over-HTTP weirdness on
GitHub's reverse-proxy layer, which had a number of pitfalls ;) ).

You should be able to do a synthetic test like:

  git init
  dd if=/dev/urandom of=foo.rand bs=1k count=1024
  git add .
  git commit -m 'random megabyte'
  GIT_TRACE_CURL=/tmp/foo.out \
git -c http.postbuffer=0 push https://...

You should see two POSTs to /git-receive-pack, like this:

  Send header: POST /peff/test.git/git-receive-pack HTTP/1.1
  Send header: Host: github.com
  Send header: Authorization: Basic 
  Send header: User-Agent: git/2.12.2.952.g759391acc
  Send header: Content-Type: application/x-git-receive-pack-request
  Send header: Accept: application/x-git-receive-pack-result
  Send header: Content-Length: 4

  Send header: POST /peff/test.git/git-receive-pack HTTP/1.1
  Send header: Host: github.com
  Send header: Authorization: Basic 
  Send header: User-Agent: git/2.12.2.952.g759391acc
  Send header: Accept-Encoding: gzip
  Send header: Content-Type: application/x-git-receive-pack-request
  Send header: Accept: application/x-git-receive-pack-result
  Send header: Transfer-Encoding: chunked

The first is a probe to make sure we can hit the endpoint without
sending the whole payload. And the second should pass up the 1MB
packfile in chunks.

That would at least tell you if the problem is the chunked encoding, or
if it's related to the size.

> (For me to publish longer log traces requires a level of security review 
> which is 
> probably too much of a hassle unless you think it will be really useful).

Nah, I doubt there's much to see except "did a small chunked transfer
work", and anything relevant you can pick out of the server response
(but probably "502" is the extent of it).

> > IMHO, complaining about the negative number to the user would be an
> > improvement.
> 
> That seems reasonable.

You can do that with:

   if (http_post_buffer < 0)
   	die("negative http.postBuffer not allowed");

but I was trying to suggest that using git_parse_unsigned() should
detect that error for you. It doesn't seem to, though! The strtoumax()
function happily converts negatives into their twos-complement
wraparounds. We could detect it by looking for a leading "-" ourselves,
though I wonder if anybody is relying on the "-1" behavior.

-Peff


Re: [PATCH v4 2/4] fsck: force core.checksumindex=1

2017-04-04 Thread Jeff King
On Tue, Apr 04, 2017 at 09:13:33AM -0400, Jeff Hostetler wrote:

> > But yeah, another way to resolve this is to get rid of the config
> > option altogether, or document in git-config.txt that
> > core.checksumIndex is obeyed by everything except git-fsck.
> 
> If there's no objections then, I'll just remove the config
> setting and keep the force_ global for fsck.

That makes the most sense to me. It doesn't fill the "remain careful by
default, and we can flip the config later" criterion that Junio set out.
But I really just don't think "careful" here is buying us anything.

-Peff


Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Jeff King
On Tue, Apr 04, 2017 at 04:06:46PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > But a couple of #ifdef's? C'mon, man, we can carry this *without sweat*
> > indefinitely ;-)
> 
> I don't really care about applying this patch, but I wouldn't mind
> seeing it applied.
> 
> I just wanted to clarify the counteractive point that it's not unusual
> for some (particularly corporate) environments to be compiling fresh
> upstream releases of some software against really ancient versions of
> other upstream libraries.
> 
> But as Frank Gevaerts's reply (thanks!), which came after yours, points
> out, this code has already been broken since v2.12.0. It's rarely used
> enough that nobody has reported being unable to compile git 2.12.0 on
> e.g. CentOS 5 in the >2 months since the release.

Yeah, this is exactly the kind of thing I was wondering about (but was
too lazy to actually build an ancient version of curl -- thank you,
Frank).

In this case it was a compile error, which was obvious. I'm much more
worried about subtle interactions, or the fact that some of the ifdefs
are around security features that get ignored. In some cases we at least
issue a warning, but not always. E.g., we silently ignore
http.sslcapath.  Depending what you're using it for that could fail
closed (if you were trying to add a CA) or open (if you were trying to
restrict to a specific CA).

-Peff


RE: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-04 Thread David Turner
> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, April 3, 2017 10:02 PM
> To: David Turner 
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values
> 
> On Mon, Apr 03, 2017 at 05:03:49PM +, David Turner wrote:
> 
> > > > Unfortunately, in order to push some large repos, the http
> > > > postbuffer must sometimes exceed two gigabytes.  On a 64-bit system,
> this is OK:
> > > > we just malloc a larger buffer.
> > >
> > > I'm still not sure why a 2GB post-buffer is necessary. It sounds
> > > like something is broken in your setup. Large pushes should be sent
> chunked.
> > >
> > > I know broken setups are a fact of life, but this feels like a
> > > really hacky work- around.
> >
> > I'm not sure what other workaround I should use.  I guess I could do
> > multiple pushes, but only if individual objects are under the size
> > limit, and I'm not sure all of mine are (maybe I'll get lucky tho).  I
> > know that this is a configuration issue with gitlab:
> > https://gitlab.com/gitlab-org/gitlab-ce/issues/30315 but I don't know
> > when that will get fixed.  I could manually copy the repo to the
> > server and do a local push, but I don't know that I have the necessary
> > permissions to do that. Or I could do this, which would hopefully
> > actually solve the problem.
> 
> I didn't think we had gotten details on what was actually broken. Is it really
> that GitLab does not support chunked transfers? I know that's what the issue
> above says, but it sounds like it is just assuming that is the problem based 
> on
> the recent messages to the list.

I agree that we don't know for sure what's actually broken.  The GitLab
bug tracker is probably the better place to figure that out, unless GitLab
thinks they're hitting a bug in git.

> If that's really the issue, then OK. That's lame, but something the client 
> has to
> work around. It seems like a pretty big gap, though (and one I'd think people
> would have hit before; the default post-buffer is only 1MB. Surely people
> routinely push more than that to GitLab servers?
> So I'm really wondering if there is something else going on here.
> 
> What does it look like when it fails? What does GIT_TRACE_CURL look like (or
> GIT_CURL_VERBOSE if your client is older, but remember to sanitize any auth
> lines)?

Unfortunately, we've already worked around the problem by pushing over SSH, 
so I no longer have a failing case to examine. Last time I tried, I actually 
did some 
hackery to create a push smaller than 2GB, but it still failed (this time, with 
"502 Bad Gateway").  So, something is clearly weird in GitLab land.

I did see "Transfer-Encoding: chunked" in one of the responses from the server,
 but not in the request (not sure if that's normal). The smaller push had: 
Content-Length: 1048908476

(For me to publish longer log traces requires a level of security review which 
is 
probably too much of a hassle unless you think it will be really useful).

> > > The ultimate fate of this number, though, is to be handed to:
> > >
> > >   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
> > >
> > > where the final argument is interpreted as a long. So I suspect that
> > > on 64-bit Windows, setting http.postbuffer to "3G" would cause some
> > > kind of weird error (either a truncated post or some internal curl
> > > error due to the negative size, depending on how curl handles it).
> >
> > Ah, so we would need to use CURLOPT_POSTFIELDSIZE_LARGE.  Will re-roll.
> 
> Ah, neat. I didn't even know about CURLOPT_POSTFIELDSIZE_LARGE, and
> thought we'd have to just limit 32-bit platforms. That's a much better
> solution.
> 
> > > I saw the earlier iteration used a size_t, but you switched it after
> > > the compiler
> > > (rightfully) complained about the signedness. But I'm not sure why
> > > we would want ssize_t here instead of just using git_parse_unsigned().
> >
> > It was originally signed.  I'm not sure why that was, but I figured it
> > would be simpler to save the extra bit just in case.
> 
> I think it was simply because git_config_int() is the generic "number"
> parser, and nobody ever thought about it.
> 
> In fact, passing a negative value to curl would be disastrous, as it would use
> strlen(). I don't think a negative value could ever get that far, though. It 
> looks
> like the config code would silently turn a negative value into
> LARGE_PACKET_MAX.

I would still prefer to preserve the bit just in case we ever decide that a
negative value should have some special meaning later.  Of course, that special
meaning wouldn't be "pass directly to curl".  (As I think about git's http
protocol, and how hard it is to change it, I always want to leave as many
extra bits as possible free for general future usage.)

> IMHO, complaining about the negative number to the user would be an
> improvement.

That seems reasonable.

Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-04 Thread Jeff Hostetler

On 4/4/2017 11:18 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Apr 4, 2017 at 5:07 PM, Jeff Hostetler  wrote:


On 4/3/2017 5:16 PM, Ævar Arnfjörð Bjarmason wrote:


Add a new manpage that gives an overview of how to tweak git's
performance.

There's currently no good single resource for things a git site
administrator might want to look into to improve performance for his
site & his users. This unfinished documentation aims to be the first
thing someone might want to look at when investigating ways to improve
git performance.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

I've been wanting to get something like this started for a while. It's
obviously woefully incomplete. Pointers about what to include would be
great & whether including something like this makes sense.

Things I have on my TODO list:

 - Add a section discussing how refs impact performance, suggest
   e.g. archiving old tags if possible, or at least run "git remote
   prune origin" regularly on clients.

 - Discuss split index a bit, although I'm not very confident in
   describing what its pros & cons are.

 - Should we be covering good practices for your repo going forward to
   maintain good performance? E.g. don't have some huge tree all in
   one directory (use subdirs), don't add binary (rather
   un-delta-able) content if you can help it etc.

- The new core.checksumIndex option being discussed on-list. Which
  actually drove me to finally write this up (hrm, this sounds useful,
  but unless I was watching the list I'd probably never see it...).



You might also consider core.preloadIndex.


It's been enabled by default since 2.1.0 (299e29870b), or do you mean
talk about disabling it, or "this is a perf option we have on by
default"?

I don't know the pros of disabling that, haven't used it myself & it's
not clear from the docs.


Sorry, no, don't disable it.  Maybe an ack that
it should be on.





For people with very large trees, talk about sparse-checkout.


*nod*


And (on Windows) core.fscache.  Or leave a place for
an addendum for Windows that we can fill in later.


I have no core.fscache in my git.git, did you mean something else?


This is only in the Git for Windows tree.  It hasn't
made it upstream yet.

https://github.com/git-for-windows/git/commits/master/compat/win32/fscache.c

Ignore this for now if you want and we can fill in the details
later for you.









 Documentation/Makefile   |   1 +
 Documentation/gitperformance.txt | 107 +++
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/gitperformance.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..528aa22354 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -23,6 +23,7 @@ MAN5_TXT += gitrepository-layout.txt
 MAN5_TXT += gitweb.conf.txt

 MAN7_TXT += gitcli.txt
+MAN7_TXT += gitperformance.txt
 MAN7_TXT += gitcore-tutorial.txt
 MAN7_TXT += gitcredentials.txt
 MAN7_TXT += gitcvs-migration.txt
diff --git a/Documentation/gitperformance.txt
b/Documentation/gitperformance.txt
new file mode 100644
index 00..0548d1e721
--- /dev/null
+++ b/Documentation/gitperformance.txt
@@ -0,0 +1,107 @@
+giteveryday(7)
+==
+
+NAME
+
+gitperformance - How to improve Git's performance
+
+SYNOPSIS
+
+
+A guide to improving Git's performance beyond the defaults.
+
+DESCRIPTION
+---
+
+Git is mostly performant by default, but ships with various
+configuration options, command-line options, etc. that might improve
+performance, but for various reasons aren't on by default.
+
+This document provides a brief overview of these features.
+
+The reader should not assume that turning on all of these features
+will increase performance; depending on the repository, workload &
+use-case, turning some of them on might severely harm performance.
+
+This document serves as a starting point for things to look into when
+it comes to improving performance, not as a checklist for things to
+enable or disable.
+
+Performance by topic
+
+
+It can be hard to divide the performance features into topics, but
+most of them fall into various well-defined buckets. E.g. there are
+features that help with the performance of "git status", and couldn't
+possibly impact repositories without working copies, and then some
+that only impact the performance of cloning from a server, or help the
+server itself etc.
+
+git status
+~~
+
+Running "git status" requires traversing the working tree & comparing
+it with the index. Several configuration options can help with its
+performance, with some trade-offs.
+
+- config: "core.untrackedCache=true" (see linkgit:git-config[1]) can
+  save on `stat(2)` calls by caching the mtime of filesystem
+  directories, and if they didn't change avoid recursing into that
+  directory to `stat(2)` every file in them.
++
+pros: Can drastically speed up "git status".

Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Brandon Williams
On 04/04, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Apr 4, 2017 at 1:54 PM, Johannes Schindelin
>  wrote:
> > Hi,
> >
> > On Tue, 4 Apr 2017, Ævar Arnfjörð Bjarmason wrote:
> >
> >> I think it's completely fine to include your patch as-is. At some
> >> point we need to pass the burden of dealing with these old software
> >> versions, saying that you should use a <10 year old library isn't
> >> unreasonable. Anyone packaging new git on RHEL5 or derivatives can
> >> just package a newer libcurl as well.
> >
> > But how much maintenance burden is it, really? Is the continued use of
> > those #ifdef's really worth this much discussion, let alone applying a
> > patch that may break users who have so far been happy?
> >
> > It would be a different thing if we had to have hacks to support old cURL
> > versions, where we need to ship entire >10kB source files that tap into
> > internal data structures that may, or may not have changed. Such a hack, I
> > would be happy to discuss when we could possibly remove it.
> >
> > But a couple of #ifdef's? C'mon, man, we can carry this *without sweat*
> > indefinitely ;-)
> 
> I don't really care about applying this patch, but I wouldn't mind
> seeing it applied.
> 
> I just wanted to clarify the counteractive point that it's not unusual
> for some (particularly corporate) environments to be compiling fresh
> upstream releases of some software against really ancient versions of
> other upstream libraries.
> 
> But as Frank Gevaerts's reply (thanks!) which came after your reply
> points out, this code has already been broken since v2.12.0, so it's
> rarely used enough that nobody's reported being unable to compile git
> 2.12.0 on e.g. CentOS 5 >2 months since release.
> 
> I think this is a stronger argument for removing stuff like this. At
> some point we're shipping code nobody's tested in combination with the
> rest of our code. This can easily become a source of bugs as someone
> e.g. compiling a new git on co5 becomes literally the first person to
> ever test some new combination of codepaths we've added around mostly
> unused ifdefs.

I'm all for seeing a patch like this applied.  I agree that we can't
expect the world to be running the most up-to-date version of curl but
we should be able to select some "oldest" version we will support which
can be bumped up every couple of years.  

I mean, ensuring that you are running with an up-to-date version of curl
is really important when it comes to all of the security fixes that have
been made in each revision.

-- 
Brandon Williams


Re: [PATCH v2] pathspec: honor `PATHSPEC_PREFIX_ORIGIN` with empty prefix

2017-04-04 Thread Brandon Williams
On 04/04, Patrick Steinhardt wrote:
> Previous to commit 5d8f084a5 (pathspec: simpler logic to prefix original
> pathspec elements, 2017-01-04), we were always using the computed
> `match` variable to perform pathspec matching whenever
> `PATHSPEC_PREFIX_ORIGIN` is set. This is for example useful when passing
> the parsed pathspecs to other commands, as the computed `match` may
> contain a pathspec relative to the repository root. The commit changed
> this logic to only do so when we do have an actual prefix and when
> literal pathspecs are deactivated.
> 
> But this change may actually break some commands which expect passed
> pathspecs to be relative to the repository root. One such case is `git
> add --patch`, which now fails when using relative paths from a
> subdirectory. For example if executing "git add -p ../foo.c" in a
> subdirectory, the `git-add--interactive` command will directly pass
> "../foo.c" to `git-ls-files`. As ls-files is executed at the
> repository's root, the command will notice that "../foo.c" is outside
> the repository and fail.
> 
> Fix the issue by again using the computed `match` variable when
> `PATHSPEC_PREFIX_ORIGIN` is set and global literal pathspecs are
> deactivated. Note that in contrast to previous behavior, we will now
> always call `prefix_magic` regardless of whether a prefix is actually
> set. But this is the right thing to do: when the `match` variable has
> been resolved to the repository's root, it will be set to an empty
> string. When passing the empty string directly to other commands, it
> will result in a warning regarding deprecated empty pathspecs. By always
> adding the prefix magic, we will end up with at least the string
> ":(prefix:0)" and thus avoid the warning.
> 
> Signed-off-by: Patrick Steinhardt 
> ---
> 
> This is the second version of [1]. It fixes a bug caught by
> Brandon when the pathspec is resolved to the empty string and
> improves the test a bit to actually catch this issue.

This version looks good to me.  Thanks for fixing that small issue!

-- 
Brandon Williams
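A quick way to see the relative-pathspec resolution the patch is about. This
is a hedged sketch using a throwaway repository; it exercises plain `git add`
rather than the interactive `add -p` path, but the same prefix handling is
involved:

```shell
# A pathspec that climbs out of the current subdirectory must be
# resolved against the repository root before other commands see it.
git init --quiet prefix-demo
cd prefix-demo
echo 'int x;' > foo.c
mkdir sub
cd sub
git add ../foo.c   # relative pathspec from a subdirectory
cd ..
git ls-files       # prints: foo.c
```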


Re: [PATCH v2 1/2] test-online-cpus: helper to return cpu count

2017-04-04 Thread Ramsay Jones


On 03/04/17 16:16, g...@jeffhostetler.com wrote:
> From: Jeff Hostetler 
> 
> Created helper executable to print the value of online_cpus()
> allowing multi-threaded tests to be skipped when appropriate.
> 
> Signed-off-by: Jeff Hostetler 
> ---
>  Makefile| 1 +
>  t/helper/.gitignore | 1 +
>  t/helper/test-online-cpus.c | 8 
>  3 files changed, 10 insertions(+)
>  create mode 100644 t/helper/test-online-cpus.c
> 
> diff --git a/Makefile b/Makefile
> index 9b36068..3bb31e9 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -626,6 +626,7 @@ TEST_PROGRAMS_NEED_X += test-line-buffer
>  TEST_PROGRAMS_NEED_X += test-match-trees
>  TEST_PROGRAMS_NEED_X += test-mergesort
>  TEST_PROGRAMS_NEED_X += test-mktemp
> +TEST_PROGRAMS_NEED_X += test-online-cpus
>  TEST_PROGRAMS_NEED_X += test-parse-options
>  TEST_PROGRAMS_NEED_X += test-path-utils
>  TEST_PROGRAMS_NEED_X += test-prio-queue
> diff --git a/t/helper/.gitignore b/t/helper/.gitignore
> index 758ed2e..b05d67c 100644
> --- a/t/helper/.gitignore
> +++ b/t/helper/.gitignore
> @@ -16,6 +16,7 @@
>  /test-match-trees
>  /test-mergesort
>  /test-mktemp
> +/test-online-cpus
>  /test-parse-options
>  /test-path-utils
>  /test-prio-queue
> diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> new file mode 100644
> index 000..c881073
> --- /dev/null
> +++ b/t/helper/test-online-cpus.c
> @@ -0,0 +1,8 @@
> +#include "stdio.h"
> +#include "thread-utils.h"
> +
> +int cmd_main(int argc, const char **argv)
> +{
> + printf("%d\n", online_cpus());
> + return 0;
> +}
> 

In order to suppress a warning (for lack of extern declaration of
cmd_main), we need to include git-compat-util.h ( or cache.h etc,.),
like so:

  $ git diff
  diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
  index c88107360..b5277fb50 100644
  --- a/t/helper/test-online-cpus.c
  +++ b/t/helper/test-online-cpus.c
  @@ -1,3 +1,4 @@
  +#include "git-compat-util.h"
   #include "stdio.h"
   #include "thread-utils.h"
 
  $ 

Otherwise, this series fixes the test for me.

Thanks!

ATB,
Ramsay Jones




Re: [PATCH v5] http.postbuffer: allow full range of ssize_t values

2017-04-04 Thread Ramsay Jones


On 04/04/17 00:53, David Turner wrote:
> Unfortunately, in order to push some large repos, the http postbuffer
> must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> we just malloc a larger buffer.
> 
> This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
> buffer size.
> 
> Signed-off-by: David Turner 
> ---
> 
> V5 addresses comments from Torsten Boegershausen and Ramsay Jones.  Since
> I don't have a 32-bit machine handy, it's difficult for me to check
> for compiler warnings on 32-bit machines.  Hopefully my guess as
> to the solution to Ramsay's issue will be correct.

Yep, this compiles without complaint.

Thanks!

ATB,
Ramsay Jones


Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-04 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 4, 2017 at 5:07 PM, Jeff Hostetler  wrote:
>
> On 4/3/2017 5:16 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> Add a new manpage that gives an overview of how to tweak git's
>> performance.
>>
>> There's currently no good single resource for things a git site
>> administrator might want to look into to improve performance for his
>> site & his users. This unfinished documentation aims to be the first
>> thing someone might want to look at when investigating ways to improve
>> git performance.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>
>> I've been wanting to get something like this started for a while. It's
>> obviously woefully incomplete. Pointers about what to include would be
>> great & whether including something like this makes sense.
>>
>> Things I have on my TODO list:
>>
>>  - Add a section discussing how refs impact performance, suggest
>>e.g. archiving old tags if possible, or at least run "git remote
>>prune origin" regularly on clients.
>>
>>  - Discuss split index a bit, although I'm not very confident in
>>describing what its pros & cons are.
>>
>>  - Should we be covering good practices for your repo going forward to
>>maintain good performance? E.g. don't have some huge tree all in
>>one directory (use subdirs), don't add binary (rather
>>un-delta-able) content if you can help it etc.
>>
>> - The new core.checksumIndex option being discussed on-list. Which
>>   actually drove me to finally write this up (hrm, this sounds useful,
>>   but unless I was watching the list I'd probably never see it...).
>
>
> You might also consider core.preloadIndex.

It's been enabled by default since 2.1.0 (299e29870b), or do you mean
talk about disabling it, or "this is a perf option we have on by
default"?

I don't know the pros of disabling that, haven't used it myself & it's
not clear from the docs.

> For people with very large trees, talk about sparse-checkout.

*nod*

> And (on Windows) core.fscache.  Or leave a place for
> an addendum for Windows that we can fill in later.

I have no core.fscache in my git.git, did you mean something else?

>
>
>>
>>
>>  Documentation/Makefile   |   1 +
>>  Documentation/gitperformance.txt | 107 +++
>>  2 files changed, 108 insertions(+)
>>  create mode 100644 Documentation/gitperformance.txt
>>
>> diff --git a/Documentation/Makefile b/Documentation/Makefile
>> index b5be2e2d3f..528aa22354 100644
>> --- a/Documentation/Makefile
>> +++ b/Documentation/Makefile
>> @@ -23,6 +23,7 @@ MAN5_TXT += gitrepository-layout.txt
>>  MAN5_TXT += gitweb.conf.txt
>>
>>  MAN7_TXT += gitcli.txt
>> +MAN7_TXT += gitperformance.txt
>>  MAN7_TXT += gitcore-tutorial.txt
>>  MAN7_TXT += gitcredentials.txt
>>  MAN7_TXT += gitcvs-migration.txt
>> diff --git a/Documentation/gitperformance.txt
>> b/Documentation/gitperformance.txt
>> new file mode 100644
>> index 00..0548d1e721
>> --- /dev/null
>> +++ b/Documentation/gitperformance.txt
>> @@ -0,0 +1,107 @@
>> +giteveryday(7)
>> +==
>> +
>> +NAME
>> +
>> +gitperformance - How to improve Git's performance
>> +
>> +SYNOPSIS
>> +
>> +
>> +A guide to improving Git's performance beyond the defaults.
>> +
>> +DESCRIPTION
>> +---
>> +
>> +Git is mostly performant by default, but ships with various
>> +configuration options, command-line options, etc. that might improve
>> +performance, but for various reasons aren't on by default.
>> +
>> +This document provides a brief overview of these features.
>> +
>> +The reader should not assume that turning on all of these features
>> will increase performance; depending on the repository, workload &
>> use-case, turning some of them on might severely harm performance.
>> +
>> +This document serves as a starting point for things to look into when
>> +it comes to improving performance, not as a checklist for things to
>> +enable or disable.
>> +
>> +Performance by topic
>> +
>> +
>> +It can be hard to divide the performance features into topics, but
>> +most of them fall into various well-defined buckets. E.g. there are
>> +features that help with the performance of "git status", and couldn't
>> +possibly impact repositories without working copies, and then some
>> +that only impact the performance of cloning from a server, or help the
>> +server itself etc.
>> +
>> +git status
>> +~~
>> +
>> +Running "git status" requires traversing the working tree & comparing
>> +it with the index. Several configuration options can help with its
>> +performance, with some trade-offs.
>> +
>> +- config: "core.untrackedCache=true" (see linkgit:git-config[1]) can
>> +  save on `stat(2)` calls by caching the mtime of filesystem
>> +  directories, and if they didn't change avoid recursing into that
>> +  directory to `stat(2)` every file in them.
>> ++
>> +pros: Can drastically speed up "git status".
>> ++
>> +cons: There's a speed 

Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-04 Thread Jeff Hostetler



On 4/3/2017 5:16 PM, Ævar Arnfjörð Bjarmason wrote:

Add a new manpage that gives an overview of how to tweak git's
performance.

There's currently no good single resource for things a git site
administrator might want to look into to improve performance for his
site & his users. This unfinished documentation aims to be the first
thing someone might want to look at when investigating ways to improve
git performance.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

I've been wanting to get something like this started for a while. It's
obviously woefully incomplete. Pointers about what to include would be
great & whether including something like this makes sense.

Things I have on my TODO list:

 - Add a section discussing how refs impact performance, suggest
   e.g. archiving old tags if possible, or at least run "git remote
   prune origin" regularly on clients.

 - Discuss split index a bit, although I'm not very confident in
   describing what its pros & cons are.

 - Should we be covering good practices for your repo going forward to
   maintain good performance? E.g. don't have some huge tree all in
   one directory (use subdirs), don't add binary (rather
   un-delta-able) content if you can help it etc.

- The new core.checksumIndex option being discussed on-list. Which
  actually drove me to finally write this up (hrm, this sounds useful,
  but unless I was watching the list I'd probably never see it...).


You might also consider core.preloadIndex.

For people with very large trees, talk about sparse-checkout.
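For reference, a 2017-era sparse-checkout setup looks roughly like this. This
is a hedged sketch on a throwaway repository; the listed paths are
placeholders, and this predates the later `git sparse-checkout` command:

```shell
# Limit the working tree to the listed paths.
git init --quiet sparse-demo
cd sparse-demo
git config core.sparseCheckout true
printf '%s\n' 'Documentation/' 'src/' > .git/info/sparse-checkout
git config core.sparseCheckout   # prints: true
```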

And (on Windows) core.fscache.  Or leave a place for
an addendum for Windows that we can fill in later.




 Documentation/Makefile   |   1 +
 Documentation/gitperformance.txt | 107 +++
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/gitperformance.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..528aa22354 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -23,6 +23,7 @@ MAN5_TXT += gitrepository-layout.txt
 MAN5_TXT += gitweb.conf.txt

 MAN7_TXT += gitcli.txt
+MAN7_TXT += gitperformance.txt
 MAN7_TXT += gitcore-tutorial.txt
 MAN7_TXT += gitcredentials.txt
 MAN7_TXT += gitcvs-migration.txt
diff --git a/Documentation/gitperformance.txt b/Documentation/gitperformance.txt
new file mode 100644
index 00..0548d1e721
--- /dev/null
+++ b/Documentation/gitperformance.txt
@@ -0,0 +1,107 @@
+giteveryday(7)
+==
+
+NAME
+
+gitperformance - How to improve Git's performance
+
+SYNOPSIS
+
+
+A guide to improving Git's performance beyond the defaults.
+
+DESCRIPTION
+---
+
+Git is mostly performant by default, but ships with various
+configuration options, command-line options, etc. that might improve
+performance, but for various reasons aren't on by default.
+
+This document provides a brief overview of these features.
+
+The reader should not assume that turning on all of these features
+will increase performance; depending on the repository, workload &
+use-case, turning some of them on might severely harm performance.
+
+This document serves as a starting point for things to look into when
+it comes to improving performance, not as a checklist for things to
+enable or disable.
+
+Performance by topic
+
+
+It can be hard to divide the performance features into topics, but
+most of them fall into various well-defined buckets. E.g. there are
+features that help with the performance of "git status", and couldn't
+possibly impact repositories without working copies, and then some
+that only impact the performance of cloning from a server, or help the
+server itself etc.
+
+git status
+~~
+
+Running "git status" requires traversing the working tree & comparing
+it with the index. Several configuration options can help with its
+performance, with some trade-offs.
+
+- config: "core.untrackedCache=true" (see linkgit:git-config[1]) can
+  save on `stat(2)` calls by caching the mtime of filesystem
+  directories, and if they didn't change avoid recursing into that
+  directory to `stat(2)` every file in them.
++
+pros: Can drastically speed up "git status".
++
+cons: There's a speed hit for initially populating & maintaining the
+cache. Doesn't work on all filesystems (see `--test-untracked-cache`
+in linkgit:git-update-index[1]).
+
+- config: "status.showUntrackedFiles=no" (see
+  linkgit:git-config[1]). Skips looking for files in the working tree
+  git doesn't already know about.
++
+pros: Speeds up "git status" by making it do a lot less work.
++
+cons: If there's any new & untracked files anywhere in the working
+tree they won't be noticed by git. Makes it easy to accidentally miss
+files to "git add" before committing, or files which might impact the
+code in the working tree, but which git won't know exist.
+
+git grep
+
+
+- config: "grep.patternType=perl" (see linkgit:git-config[1]) will use
+  
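The core.untrackedCache option described in the patch above can be tried out
like this (a hedged sketch on a throwaway repository; assumes git 2.8 or
later, where the option was added):

```shell
# Enable the untracked cache; git status can then skip re-scanning
# directories whose mtimes have not changed.
git init --quiet untracked-cache-demo
cd untracked-cache-demo
git config core.untrackedCache true
git config core.untrackedCache   # prints: true
# To verify filesystem support first (slow; it runs timing tests):
#   git update-index --test-untracked-cache
```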

Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 4, 2017 at 1:54 PM, Johannes Schindelin
 wrote:
> Hi,
>
> On Tue, 4 Apr 2017, Ævar Arnfjörð Bjarmason wrote:
>
>> I think it's completely fine to include your patch as-is. At some
>> point we need to pass the burden of dealing with these old software
>> versions, saying that you should use a <10 year old library isn't
>> unreasonable. Anyone packaging new git on RHEL5 or derivatives can
>> just package a newer libcurl as well.
>
> But how much maintenance burden is it, really? Is the continued use of
> those #ifdef's really worth this much discussion, let alone applying a
> patch that may break users who have so far been happy?
>
> It would be a different thing if we had to have hacks to support old cURL
> versions, where we need to ship entire >10kB source files that tap into
> internal data structures that may, or may not have changed. Such a hack, I
> would be happy to discuss when we could possibly remove it.
>
> But a couple of #ifdef's? C'mon, man, we can carry this *without sweat*
> indefinitely ;-)

I don't really care about applying this patch, but I wouldn't mind
seeing it applied.

I just wanted to clarify the counteractive point that it's not unusual
for some (particularly corporate) environments to be compiling fresh
upstream releases of some software against really ancient versions of
other upstream libraries.

But as Frank Gevaerts's reply (thanks!) which came after your reply
points out, this code has already been broken since v2.12.0, so it's
rarely used enough that nobody's reported being unable to compile git
2.12.0 on e.g. CentOS 5 >2 months since release.

I think this is a stronger argument for removing stuff like this. At
some point we're shipping code nobody's tested in combination with the
rest of our code. This can easily become a source of bugs as someone
e.g. compiling a new git on co5 becomes literally the first person to
ever test some new combination of codepaths we've added around mostly
unused ifdefs.


Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Frank Gevaerts
On Mon, Apr 03, 2017 at 10:54:38PM -0400, Jeff King wrote:
> A nearby thread raised the question of whether we can rely on a version
> of libcurl that contains a particular feature. The version in question
> is curl 7.11.1, which came out in March 2004.

I had a quick look at the 7.11.1 support, and I found that current git
actually doesn't build with anything older than curl 7.19.4. The issue
is aeae4db174, which moved some stuff to a helper function but didn't
copy the corresponding #ifdefs.

With that issue fixed, I could build with versions back to 7.12.2, which
is the oldest curl version I could get to build on my modern system.
Note that while I could build with those versions, I didn't actually
check if the result worked.

I'd say it's going to be increasingly likely with time that similar
issues will crop up for such old versions of dependencies.

Frank

diff --git a/http.c b/http.c
index 96d84bbed..8c782a086 100644
--- a/http.c
+++ b/http.c
@@ -674,6 +674,7 @@ void setup_curl_trace(CURL *handle)
curl_easy_setopt(handle, CURLOPT_DEBUGDATA, NULL);
 }
 
+#if LIBCURL_VERSION_NUM >= 0x071304
 static long get_curl_allowed_protocols(int from_user)
 {
long allowed_protocols = 0;
@@ -689,6 +690,7 @@ static long get_curl_allowed_protocols(int from_user)
 
return allowed_protocols;
 }
+#endif
 
 static CURL *get_curl_handle(void)
 {

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


Re: [PATCH v4 2/4] fsck: force core.checksumindex=1

2017-04-04 Thread Jeff Hostetler



On 4/4/2017 4:23 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Apr 4, 2017 at 4:29 AM, Jeff King  wrote:

On Mon, Apr 03, 2017 at 10:31:03PM +0200, Ævar Arnfjörð Bjarmason wrote:


On Mon, Apr 3, 2017 at 8:53 PM,   wrote:

Teach fsck to override core.checksumindex and always verify
the index checksum when reading the index.


Sorry to only chime in about this at v4.

I think this patch & the documentation you added for
core.checksumindex in 1/4 would be much less confusing if you made
this a on-by-default command-line option like e.g. "full".

With this patch nothing amends the documentation to indicate that the
core.checksumindex is magically overridden when fsck runs, I think
something like this (needs amending to integrate) on top would make
this clearer:


I think that is the wrong direction to reduce confusion. We don't need
more options to twiddle this flag, we need fewer. For instance, in your
proposed documentation:


@@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs`
namespace, and all reflogs
object pools.  This is now default; you can turn it off
with --no-full.

+--[no-]checksum-index:
+   Validate the checksum at the end of the index file, on by
+   default, locally overrides any "core.checksumIndex" setting
+   unless negated. See linkgit:git-config[1].


That tells us _what_ it does, but I'm left scratching my head with
"why".

I don't think there is any reason you would want fsck not to compute
that checksum (just like there is no option to ask fsck not to check
pack sha1 trailers).

I would go so far as to say that the config option itself is unnecessary
in this iteration of the series. I only asked for it so that we could
test the verification code paths (both for correctness and performance).
But if fsck can exercise the code path, we can check correctness that
way. And for performance, it's probably enough to test two separate
builds (which Jeff has already done).

Junio also asked for the usual "add a config, and then later we'll flip
the default" procedure. But IMHO that isn't necessary here. Nobody
should ever care about this flag. It was largely useless to check it on
every read in the first place. And if you suspect there's corruption in
your repository, you should run "git fsck".


The part that confused me & that I found unintuitive is that there's a new
core.WHATEVER config that'll get silently overridden by a specific
command, git-fsck.

Nothing else I can think of in core.* works like this, i.e. it's a
namespace for "applies to all of git", core.editor, core.ignoreCase
etc.


My "force_core_checksum_index" global override was a bit of a hack.
I looked at having fsck explicitly set the "core.checksumindex" config
value (which would write to the disk in the code paths I followed)
before it loaded the index, but if we ever find an invalid checksum,
read_cache() would call die() and fsck would not have a chance to set
it back in the user's config.  So I introduced the global override.



Having git-fsck have a command-line option that's on by default as I
suggested is one way to get out of that confusion. It makes it a
special case of a CLI option overriding some config.


I looked at doing that, but thought it would be overkill since
no one is likely to care about turning it off -- or rather, fsck
should always do it whenever it reads the index.



But yeah, another way to resolve this is to get rid of the config
option altogether, or document in git-config.txt that
core.checksumIndex is obeyed by everything except git-fsck.



If there's no objections then, I'll just remove the config
setting and keep the force_ global for fsck.

Jeff



Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Johannes Schindelin
Hi,

On Tue, 4 Apr 2017, Ævar Arnfjörð Bjarmason wrote:

> I think it's completely fine to include your patch as-is. At some
> point we need to pass the burden of dealing with these old software
> versions, saying that you should use a <10 year old library isn't
> unreasonable. Anyone packaging new git on RHEL5 or derivatives can
> just package a newer libcurl as well.

But how much maintenance burden is it, really? Is the continued use of
those #ifdef's really worth this much discussion, let alone applying a
patch that may break users who have so far been happy?

It would be a different thing if we had to have hacks to support old cURL
versions, where we need to ship entire >10kB source files that tap into
internal data structures that may, or may not have changed. Such a hack, I
would be happy to discuss when we could possibly remove it.

But a couple of #ifdef's? C'mon, man, we can carry this *without sweat*
indefinitely ;-)

Ciao,
Dscho



Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 4, 2017 at 10:33 AM, Jeff King  wrote:
> On Tue, Apr 04, 2017 at 10:17:51AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Tue, Apr 4, 2017 at 4:54 AM, Jeff King  wrote:
>> > My feeling is that this is old enough to stop caring about. Which means
>> > we can drop some of the #ifdefs that clutter the HTTP code (and there's
>> > a patch at the end of this mail dropping support for everything older
>> > than 7.11.1). But that made wonder: how old is too old? I think it's
>> > nice that we don't force people to upgrade to the latest version of
>> > curl. But at some point, if you are running a 13-year-old version of
>> > libcurl, how likely are you to run a brand new version of Git? :)
>> >
>> > If we declared 7.16.0 as a cutoff, we could unconditionally define
>> > USE_CURL_MULTI, which gets rid of quite a few messy ifdefs.
>>
>> I don't object to this patch, but just as a general comment, in
>> enterprise-y environments using an old OS (e.g. CentOS 5/6) & then
>> compiling some selected packages like git based on OS libraries is
>> quite common.
>>
>> E.g. at work we're running git 2.12.0 compiled against CentOS 6
>> libraries, which has curl 7.20.1, released on
>> Apr 14 2010. Not so long ago we were still running CentOS 5 which
>> comes with 7.15.5 released in Aug 7 2006 which would break with your
>> patch.
>>
>> Whether we support that is another question, I think it's reasonable
>> to say that if you're compiling git on such an old system you also
>> need to compile a libcurl instead of using the OS version. I just
>> wanted to point out that this *does* happen, someone is going to be
>> compiling new git releases on CentOS 5 & will be hit by this.
>
> Thanks, that's a very useful bit of data. According to:
>
>   https://access.redhat.com/support/policy/updates/errata
>
> RHEL5 (which, as I understand it, is basically the same as CentOS 5 in terms
> of packages).

Yeah CentOS is just s/RHEL/CentOS/g across the RHEL-provided source
packages. It's a way to use RHEL without paying for (or getting) RHEL
support.

> ended its 10-year support cycle just a few days ago. That
> would perhaps make 7.20.1 a reasonable cutoff.

FWIW these cut-offs don't have a lot to do with how long some
installations actually run things like RHEL5 or CentOS 5. RHEL5's EOP3
phase ended days ago, but the End of Extended Life-cycle Support goes
until late 2020, and for anyone running the CentOS flavors these dates
matter very little, since they're about how long Red Hat is supporting
it, and if you don't have Red Hat support in the first place...

> I dunno. We could also just punt on the whole thing. It was nice to see
> a bunch of #ifdefs go away, but the code they're protecting is actually
> not that complicated. Most of them do not have an #else at all, and we
> just silently skip features on old versions.
>
> I think it might be nice to declare a "too old" version, though, just so
> we can stop adding _new_ ifdefs. Maybe 7.11.1 is that version now, and
> in another few years we can bump to 7.16.0. :)

I think it's completely fine to include your patch as-is. At some
point we need to pass on the burden of dealing with these old software
versions, and saying that you should use a library less than 10 years
old isn't unreasonable. Anyone packaging new git on RHEL5 or
derivatives can just package a newer libcurl as well.


[PATCH v4 5/5] refs: kill set_worktree_head_symref()

2017-04-04 Thread Nguyễn Thái Ngọc Duy
70999e9cec (branch -m: update all per-worktree HEADs - 2016-03-27)
added this function in order to update HEADs of all relevant
worktrees, when a branch is renamed.

It, as a public ref api, kind of breaks abstraction when it uses
internal functions of files backend. With the introduction of
refs_create_symref(), we can move back pretty close to the code before
70999e9cec, where create_symref() was used for updating HEAD.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 branch.c                      | 12 ++--
 refs.h                        | 10 --
 refs/files-backend.c          | 44 ---
 t/t1407-worktree-ref-store.sh | 12 
 4 files changed, 18 insertions(+), 60 deletions(-)

diff --git a/branch.c b/branch.c
index 0b949b7fb2..69d5eea84b 100644
--- a/branch.c
+++ b/branch.c
@@ -353,18 +353,18 @@ int replace_each_worktree_head_symref(const char *oldref, const char *newref,
int i;
 
for (i = 0; worktrees[i]; i++) {
+   struct ref_store *refs;
+
if (worktrees[i]->is_detached)
continue;
if (worktrees[i]->head_ref &&
strcmp(oldref, worktrees[i]->head_ref))
continue;
 
-   if (set_worktree_head_symref(get_worktree_git_dir(worktrees[i]),
-newref, logmsg)) {
-   ret = -1;
-   error(_("HEAD of working tree %s is not updated"),
- worktrees[i]->path);
-   }
+   refs = get_worktree_ref_store(worktrees[i]);
+   if (refs_create_symref(refs, "HEAD", newref, logmsg))
+   ret = error(_("HEAD of working tree %s is not updated"),
+   worktrees[i]->path);
}
 
free_worktrees(worktrees);
diff --git a/refs.h b/refs.h
index 6df69a2adb..447381d378 100644
--- a/refs.h
+++ b/refs.h
@@ -402,16 +402,6 @@ int refs_create_symref(struct ref_store *refs, const char *refname,
   const char *target, const char *logmsg);
 int create_symref(const char *refname, const char *target, const char *logmsg);
 
-/*
- * Update HEAD of the specified gitdir.
- * Similar to create_symref("relative-git-dir/HEAD", target, NULL), but
- * this can update the main working tree's HEAD regardless of where
- * $GIT_DIR points to.
- * Return 0 if successful, non-zero otherwise.
- * */
-int set_worktree_head_symref(const char *gitdir, const char *target,
-const char *logmsg);
-
 enum action_on_err {
UPDATE_REFS_MSG_ON_ERR,
UPDATE_REFS_DIE_ON_ERR,
diff --git a/refs/files-backend.c b/refs/files-backend.c
index a5b405436f..e428d1abff 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3161,50 +3161,6 @@ static int files_create_symref(struct ref_store *ref_store,
return ret;
 }
 
-int set_worktree_head_symref(const char *gitdir, const char *target, const char *logmsg)
-{
-   /*
-* FIXME: this obviously will not work well for future refs
-* backends. This function needs to die.
-*/
-   struct files_ref_store *refs =
-   files_downcast(get_main_ref_store(),
-  REF_STORE_WRITE,
-  "set_head_symref");
-
-   static struct lock_file head_lock;
-   struct ref_lock *lock;
-   struct strbuf head_path = STRBUF_INIT;
-   const char *head_rel;
-   int ret;
-
-   strbuf_addf(&head_path, "%s/HEAD", absolute_path(gitdir));
-   if (hold_lock_file_for_update(&head_lock, head_path.buf,
- LOCK_NO_DEREF) < 0) {
-   struct strbuf err = STRBUF_INIT;
-   unable_to_lock_message(head_path.buf, errno, &err);
-   error("%s", err.buf);
-   strbuf_release(&err);
-   strbuf_release(&head_path);
-   return -1;
-   }
-
-   /* head_rel will be "HEAD" for the main tree, "worktrees/wt/HEAD" for
-  linked trees */
-   head_rel = remove_leading_path(head_path.buf,
-  absolute_path(get_git_common_dir()));
-   /* to make use of create_symref_locked(), initialize ref_lock */
-   lock = xcalloc(1, sizeof(struct ref_lock));
-   lock->lk = &head_lock;
-   lock->ref_name = xstrdup(head_rel);
-
-   ret = create_symref_locked(refs, lock, head_rel, target, logmsg);
-
-   unlock_ref(lock); /* will free lock */
-   strbuf_release(&head_path);
-   return ret;
-}
-
 static int files_reflog_exists(struct ref_store *ref_store,
   const char *refname)
 {
diff --git a/t/t1407-worktree-ref-store.sh b/t/t1407-worktree-ref-store.sh
index 04d1e9d177..5df06f3556 100755
--- a/t/t1407-worktree-ref-store.sh
+++ b/t/t1407-worktree-ref-store.sh
@@ -37,4 +37,16 @@ test_expect_success 'resolve_ref()' '

[PATCH v4 0/5] Kill manual ref parsing code in worktree.c

2017-04-04 Thread Nguyễn Thái Ngọc Duy
v4 adds a new patch, 2/5, which makes the hashmap-related functions in
refs.c generic, so I could add a new map for worktree ref stores and
not abuse the submodule hashmap.

I mentioned moving these hashmaps back to submodule.c and
worktree.c, where they can map more than just ref stores (in the
worktree case, we can cache 'struct worktree', for example). But I'm
not doing it now, to keep things small.

The commit message in 1/5 is rephrased a bit, hopefully making it clearer.

Nguyễn Thái Ngọc Duy (5):
  environment.c: fix potential segfault by get_git_common_dir()
  refs.c: make submodule ref store hashmap generic
  refs: introduce get_worktree_ref_store()
  worktree.c: kill parse_ref() in favor of refs_resolve_ref_unsafe()
  refs: kill set_worktree_head_symref()

 branch.c                               |  15 ++---
 environment.c                          |   2 +
 refs.c                                 |  95 +++---
 refs.h                                 |  12 +---
 refs/files-backend.c                   |  44 --
 t/helper/test-ref-store.c              |  18 ++
 t/t1407-worktree-ref-store.sh (new +x) |  52 +
 worktree.c                             | 102 +
 worktree.h                             |   2 +-
 9 files changed, 173 insertions(+), 169 deletions(-)
 create mode 100755 t/t1407-worktree-ref-store.sh

-- 
2.11.0.157.gd943d85



[PATCH v4 4/5] worktree.c: kill parse_ref() in favor of refs_resolve_ref_unsafe()

2017-04-04 Thread Nguyễn Thái Ngọc Duy
The manual parsing code is replaced with a call to refs_resolve_ref_unsafe().
The manual parsing code must die because only refs/files-backend.c
should do that.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 branch.c                               |   3 +-
 t/helper/test-ref-store.c              |  18 ++
 t/t1407-worktree-ref-store.sh (new +x) |  40 +
 worktree.c                             | 102 +
 worktree.h                             |   2 +-
 5 files changed, 88 insertions(+), 77 deletions(-)
 create mode 100755 t/t1407-worktree-ref-store.sh

diff --git a/branch.c b/branch.c
index 5c12036b02..0b949b7fb2 100644
--- a/branch.c
+++ b/branch.c
@@ -355,7 +355,8 @@ int replace_each_worktree_head_symref(const char *oldref, const char *newref,
for (i = 0; worktrees[i]; i++) {
if (worktrees[i]->is_detached)
continue;
-   if (strcmp(oldref, worktrees[i]->head_ref))
+   if (worktrees[i]->head_ref &&
+   strcmp(oldref, worktrees[i]->head_ref))
continue;
 
if (set_worktree_head_symref(get_worktree_git_dir(worktrees[i]),
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 2d84c45ffe..4a487c014e 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "refs.h"
+#include "worktree.h"
 
 static const char *notnull(const char *arg, const char *name)
 {
@@ -32,6 +33,23 @@ static const char **get_store(const char **argv, struct ref_store **refs)
strbuf_release(&sb);
 
*refs = get_submodule_ref_store(gitdir);
+   } else if (skip_prefix(argv[0], "worktree:", &gitdir)) {
+   struct worktree **p, **worktrees = get_worktrees(0);
+
+   for (p = worktrees; *p; p++) {
+   struct worktree *wt = *p;
+
+   if (!wt->id) {
+   /* special case for main worktree */
+   if (!strcmp(gitdir, "main"))
+   break;
+   } else if (!strcmp(gitdir, wt->id))
+   break;
+   }
+   if (!*p)
+   die("no such worktree: %s", gitdir);
+
+   *refs = get_worktree_ref_store(*p);
} else
die("unknown backend %s", argv[0]);
 
diff --git a/t/t1407-worktree-ref-store.sh b/t/t1407-worktree-ref-store.sh
new file mode 100755
index 00..04d1e9d177
--- /dev/null
+++ b/t/t1407-worktree-ref-store.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+test_description='test worktree ref store api'
+
+. ./test-lib.sh
+
+RWT="test-ref-store worktree:wt"
+RMAIN="test-ref-store worktree:main"
+
+test_expect_success 'setup' '
+   test_commit first &&
+   git worktree add -b wt-master wt &&
+   (
+   cd wt &&
+   test_commit second
+   )
+'
+
+test_expect_success 'resolve_ref()' '
+   SHA1=`git rev-parse master` &&
+   echo "$SHA1 refs/heads/master 0x0" >expected &&
+   $RWT resolve-ref refs/heads/master 0 >actual &&
+   test_cmp expected actual &&
+   $RMAIN resolve-ref refs/heads/master 0 >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'resolve_ref()' '
+   SHA1=`git -C wt rev-parse HEAD` &&
+   echo "$SHA1 refs/heads/wt-master 0x1" >expected &&
+   $RWT resolve-ref HEAD 0 >actual &&
+   test_cmp expected actual &&
+
+   SHA1=`git rev-parse HEAD` &&
+   echo "$SHA1 refs/heads/master 0x1" >expected &&
+   $RMAIN resolve-ref HEAD 0 >actual &&
+   test_cmp expected actual
+'
+
+test_done
diff --git a/worktree.c b/worktree.c
index fa7bc67a50..c1ec334b06 100644
--- a/worktree.c
+++ b/worktree.c
@@ -19,54 +19,25 @@ void free_worktrees(struct worktree **worktrees)
free (worktrees);
 }
 
-/*
- * read 'path_to_ref' into 'ref'.  Also if is_detached is not NULL,
- * set is_detached to 1 (0) if the ref is detached (is not detached).
- *
- * $GIT_COMMON_DIR/$symref (e.g. HEAD) is practically outside $GIT_DIR so
- * for linked worktrees, `resolve_ref_unsafe()` won't work (it uses
- * git_path). Parse the ref ourselves.
- *
- * return -1 if the ref is not a proper ref, 0 otherwise (success)
- */
-static int parse_ref(char *path_to_ref, struct strbuf *ref, int *is_detached)
-{
-   if (is_detached)
-   *is_detached = 0;
-   if (!strbuf_readlink(ref, path_to_ref, 0)) {
-   /* HEAD is symbolic link */
-   if (!starts_with(ref->buf, "refs/") ||
-   check_refname_format(ref->buf, 0))
-   return -1;
-   } else if (strbuf_read_file(ref, path_to_ref, 0) >= 0) {
-   /* textual symref or detached */
-   if (!starts_with(ref->buf, "ref:")) {
-   if (is_detached)
- 

[PATCH v4 1/5] environment.c: fix potential segfault by get_git_common_dir()

2017-04-04 Thread Nguyễn Thái Ngọc Duy
setup_git_env() must be called before this function to initialize
git_common_dir so that it returns a non-NULL string. And it must return
a non-NULL string, or a segfault can happen, because all callers expect
it to.

It does not do so explicitly, though, and depends on get_git_dir() being
called first (which will guarantee setup_git_env()). Avoid this
dependency and have the function call setup_git_env() itself.

test-ref-store.c will hit this problem because it's very lightweight
(just enough initialization to exercise refs code), and get_git_dir()
will never be called until get_worktrees() is, which uses
get_git_common_dir() and hits a segfault.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 environment.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/environment.c b/environment.c
index 42dc3106d2..2986ee7200 100644
--- a/environment.c
+++ b/environment.c
@@ -214,6 +214,8 @@ const char *get_git_dir(void)
 
 const char *get_git_common_dir(void)
 {
+   if (!git_dir)
+   setup_git_env();
return git_common_dir;
 }
 
-- 
2.11.0.157.gd943d85



[PATCH v4 3/5] refs: introduce get_worktree_ref_store()

2017-04-04 Thread Nguyễn Thái Ngọc Duy
files-backend at this point is still aware of the per-repo/worktree
separation in refs, so it can handle a linked worktree.

Some refs operations are known not to work when the current
files-backend is used in a linked worktree (e.g. reflog). Tests will be
written once refs_* functions start to be called with the worktree
backend, to verify that they work as expected.

Note: accessing a worktree of a submodule remains unaddressed. Perhaps
after get_worktrees() can access submodules (or rather a new function,
get_submodule_worktrees(), that lists worktrees of a submodule, is
added), we can update this function to work with submodules as well.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 refs.c | 31 +++
 refs.h |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/refs.c b/refs.c
index 875e30a0b8..a4083caf6a 100644
--- a/refs.c
+++ b/refs.c
@@ -10,6 +10,7 @@
 #include "object.h"
 #include "tag.h"
 #include "submodule.h"
+#include "worktree.h"
 
 /*
  * List of all available backends
@@ -1486,6 +1487,9 @@ static struct ref_store *main_ref_store;
 /* A hashmap of ref_stores, stored by submodule name: */
 static struct hashmap submodule_ref_stores;
 
+/* A hashmap of ref_stores, stored by worktree id: */
+static struct hashmap worktree_ref_stores;
+
 /*
  * Look up a ref store by name. If that ref_store hasn't been
  * registered yet, return NULL.
@@ -1590,6 +1594,33 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
return refs;
 }
 
+struct ref_store *get_worktree_ref_store(const struct worktree *wt)
+{
+   struct ref_store *refs;
+   unsigned int refs_all_capabilities =
+   REF_STORE_READ | REF_STORE_WRITE |
+   REF_STORE_ODB | REF_STORE_MAIN;
+
+   if (wt->is_current)
+   return get_main_ref_store();
+
+   refs = lookup_ref_store_map(&worktree_ref_stores, wt->id);
+   if (refs)
+   return refs;
+
+   if (wt->id)
+   refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
+ refs_all_capabilities);
+   else
+   refs = ref_store_init(get_git_common_dir(),
+ refs_all_capabilities);
+
+   if (refs)
+   register_ref_store_map(&worktree_ref_stores, "worktree",
+  refs, wt->id);
+   return refs;
+}
+
 void base_ref_store_init(struct ref_store *refs,
 const struct ref_storage_be *be)
 {
diff --git a/refs.h b/refs.h
index 49e97d7d5f..6df69a2adb 100644
--- a/refs.h
+++ b/refs.h
@@ -5,6 +5,7 @@ struct object_id;
 struct ref_store;
 struct strbuf;
 struct string_list;
+struct worktree;
 
 /*
  * Resolve a reference, recursively following symbolic refererences.
@@ -655,5 +656,6 @@ struct ref_store *get_main_ref_store(void);
  * submodule==NULL.
  */
 struct ref_store *get_submodule_ref_store(const char *submodule);
+struct ref_store *get_worktree_ref_store(const struct worktree *wt);
 
 #endif /* REFS_H */
-- 
2.11.0.157.gd943d85



[PATCH v4 2/5] refs.c: make submodule ref store hashmap generic

2017-04-04 Thread Nguyễn Thái Ngọc Duy
This removes the "submodule" from submodule_hash_entry and other
function names. The goal is to reuse the same code and data structure
for other ref store types. The first one is worktree ref stores.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 refs.c | 64 
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/refs.c b/refs.c
index ec1f563824..875e30a0b8 100644
--- a/refs.c
+++ b/refs.c
@@ -1450,32 +1450,32 @@ int resolve_gitlink_ref(const char *submodule, const char *refname,
return 0;
 }
 
-struct submodule_hash_entry
+struct ref_store_hash_entry
 {
struct hashmap_entry ent; /* must be the first member! */
 
struct ref_store *refs;
 
-   /* NUL-terminated name of submodule: */
-   char submodule[FLEX_ARRAY];
+   /* NUL-terminated identifier of the ref store: */
+   char name[FLEX_ARRAY];
 };
 
-static int submodule_hash_cmp(const void *entry, const void *entry_or_key,
+static int ref_store_hash_cmp(const void *entry, const void *entry_or_key,
  const void *keydata)
 {
-   const struct submodule_hash_entry *e1 = entry, *e2 = entry_or_key;
-   const char *submodule = keydata ? keydata : e2->submodule;
+   const struct ref_store_hash_entry *e1 = entry, *e2 = entry_or_key;
+   const char *name = keydata ? keydata : e2->name;
 
-   return strcmp(e1->submodule, submodule);
+   return strcmp(e1->name, name);
 }
 
-static struct submodule_hash_entry *alloc_submodule_hash_entry(
-   const char *submodule, struct ref_store *refs)
+static struct ref_store_hash_entry *alloc_ref_store_hash_entry(
+   const char *name, struct ref_store *refs)
 {
-   struct submodule_hash_entry *entry;
+   struct ref_store_hash_entry *entry;
 
-   FLEX_ALLOC_STR(entry, submodule, submodule);
-   hashmap_entry_init(entry, strhash(submodule));
+   FLEX_ALLOC_STR(entry, name, name);
+   hashmap_entry_init(entry, strhash(name));
entry->refs = refs;
return entry;
 }
@@ -1487,19 +1487,19 @@ static struct ref_store *main_ref_store;
 static struct hashmap submodule_ref_stores;
 
 /*
- * Return the ref_store instance for the specified submodule. If that
- * ref_store hasn't been initialized yet, return NULL.
+ * Look up a ref store by name. If that ref_store hasn't been
+ * registered yet, return NULL.
  */
-static struct ref_store *lookup_submodule_ref_store(const char *submodule)
+static struct ref_store *lookup_ref_store_map(struct hashmap *map,
+ const char *name)
 {
-   struct submodule_hash_entry *entry;
+   struct ref_store_hash_entry *entry;
 
-   if (!submodule_ref_stores.tablesize)
+   if (!map->tablesize)
/* It's initialized on demand in register_ref_store(). */
return NULL;
 
-   entry = hashmap_get_from_hash(&submodule_ref_stores,
- strhash(submodule), submodule);
+   entry = hashmap_get_from_hash(map, strhash(name), name);
return entry ? entry->refs : NULL;
 }
 
@@ -1535,20 +1535,19 @@ struct ref_store *get_main_ref_store(void)
 }
 
 /*
- * Register the specified ref_store to be the one that should be used
- * for submodule. It is a fatal error to call this function twice for
- * the same submodule.
+ * Associate a ref store with a name. It is a fatal error to call this
+ * function twice for the same name.
  */
-static void register_submodule_ref_store(struct ref_store *refs,
-const char *submodule)
+static void register_ref_store_map(struct hashmap *map,
+  const char *type,
+  struct ref_store *refs,
+  const char *name)
 {
-   if (!submodule_ref_stores.tablesize)
-   hashmap_init(&submodule_ref_stores, submodule_hash_cmp, 0);
+   if (!map->tablesize)
+   hashmap_init(map, ref_store_hash_cmp, 0);
 
-   if (hashmap_put(&submodule_ref_stores,
-   alloc_submodule_hash_entry(submodule, refs)))
-   die("BUG: ref_store for submodule '%s' initialized twice",
-   submodule);
+   if (hashmap_put(map, alloc_ref_store_hash_entry(name, refs)))
+   die("BUG: %s ref_store '%s' initialized twice", type, name);
 }
 
 struct ref_store *get_submodule_ref_store(const char *submodule)
@@ -1565,7 +1564,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
return get_main_ref_store();
}
 
-   refs = lookup_submodule_ref_store(submodule);
+   refs = lookup_ref_store_map(&submodule_ref_stores, submodule);
if (refs)
return refs;
 
@@ -1584,7 +1583,8 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
/* pretend that add_submodule_odb() has been called */
refs = 

Re: [PATCH 2/2] config: resolve symlinks in conditional include's patterns

2017-04-04 Thread Duy Nguyen
On Fri, Mar 31, 2017 at 1:38 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> $GIT_DIR returned by get_git_dir() is normalized, with all symlinks
>> resolved (see setup_work_tree function). In order to match paths (or
>> patterns) against $GIT_DIR char-by-char, they have to be normalized
>> too. There is a note in config.txt about this, that the user need to
>> resolve symlinks by themselves if needed.
>>
>> The problem is, we allow certain path expansion, '~/' and './', for
>> convenience and can't ask the user to resolve symlinks in these
>> expansions. Make sure the expanded paths have all symlinks resolved.
>
> That sounds sensible but I fail to see why 1/2 is the right approach
> to do this, and I must be missing something.  Wouldn't you get the
> same result if you run realpath() yourself on expanded, after
> receiving the return value of expand_user_path() in it?

Because at that point I don't know which part is $HOME (i.e. a valid
path that real_path can walk through) and which part is random
wildcards from the pattern. Note that in this case we pass a wildmatch
pattern to expand_user_path(), like ~/[ab]foo/*bar*/**. After expansion
it becomes /home/pclouds/[ab]foo/*bar*/**. It does not feel right to
let real_path() walk the "[ab]foo..." part. In the tests, I hit
die("Invalid path") in strbuf_realpath(). Even if I unset die_on_error
to avoid that, strbuf_realpath() will still not return the resolved path

> Can you add a test to demonstrate the issue (which would need to be
> protected with SYMLINKS prereq)?

Will do. It may look a bit ugly though because I have to force
setup_git_directory() to call real_path() because it doesn't always do
that.
-- 
Duy


Re: How do you script linux GIT client to pass kerberos credential to apache enabled GIT server?

2017-04-04 Thread Mantas Mikulėnas
On 2017-04-03 19:04, ken edward wrote:
> Hello,
> 
> I have my git repositories behind an apache server configured with
> kerberos. Works fine if the user is logged in on their workstation.
> Apache gets the kerberos credential, and validates, and  then sends
> the GIT repo being requested.
> 
> BUT, I want to write a script on linux that will also pass the
> kerberos credential to the apache GIT server without having any
> manual intervention. Seems I would create a kerberos keytab for the
> principal and then use that to authenticate. kinit supports
> authenticating from a keytab using the -k -t  options,

kinit works, but I think kstart [1] is commonly used for this as well;
it takes care of automatic ticket renewal.

ktutil should be able to create a keytab based on your password, but
I've had mixed luck with that. Though still probably easier than
creating a separate instance just for batch tasks...

[1]: https://www.eyrie.org/~eagle/software/kstart/

-- 
Mantas Mikulėnas 


Re: [PATCH] pathspec: always honor `PATHSPEC_PREFIX_ORIGIN` flag

2017-04-04 Thread Patrick Steinhardt
On Mon, Apr 03, 2017 at 09:26:48AM -0700, Brandon Williams wrote:
> On 04/03, Patrick Steinhardt wrote:
> > Previous to commit 5d8f084a5 (pathspec: simpler logic to prefix original
> > pathspec elements, 2017-01-04), we were always using the computed
> > `match` variable to perform pathspec matching whenever
> > `PATHSPEC_PREFIX_ORIGIN` is set. This is for example useful when passing
> > the parsed pathspecs to other commands, as the computed `match` may
> > contain a pathspec relative to the repository root. The commit changed
> > this logic to only do so when we do have an actual prefix and when
> > literal pathspecs are deactivated.
> > 
> > But this change may actually break some commands which expect passed
> > pathspecs to be relative to the repository root. One such case is `git
> > add --patch`, which now fails when using relative paths from a
> > subdirectory. For example if executing "git add -p ../foo.c" in a
> > subdirectory, the `git-add--interactive` command will directly pass
> > "../foo.c" to `git-ls-files`. As ls-files is executed at the
> > repository's root, the command will notice that "../foo.c" is outside
> > the repository and fail.
> > 
> > Fix the issue by again using the computed `match` variable whenever
> > `PATHSPEC_PREFIX_ORIGIN` is set. This restores behavior previous to
> > 5d8f084a5 and fixes interactive add.
> > 
> > Signed-off-by: Patrick Steinhardt 
> > ---
> >  pathspec.c |  6 +++---
> >  t/t3701-add-interactive.sh | 20 
> >  2 files changed, 23 insertions(+), 3 deletions(-)
> > 
> > diff --git a/pathspec.c b/pathspec.c
> > index 303efda83..3193e45a6 100644
> > --- a/pathspec.c
> > +++ b/pathspec.c
> > @@ -504,12 +504,12 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
> >  * Prefix the pathspec (keep all magic) and assign to
> >  * original. Useful for passing to another command.
> >  */
> > -   if ((flags & PATHSPEC_PREFIX_ORIGIN) &&
> > -   prefixlen && !get_literal_global()) {
> > +   if (flags & PATHSPEC_PREFIX_ORIGIN) {
> > struct strbuf sb = STRBUF_INIT;
> >  
> > /* Preserve the actual prefix length of each pattern */
> > -   prefix_magic(&sb, prefixlen, element_magic);
> > +   if (prefixlen && !get_literal_global())
> > +   prefix_magic(&sb, prefixlen, element_magic);
> >  
> > strbuf_addstr(&sb, match);
> > item->original = strbuf_detach(&sb, NULL);
> 
> Would it just make sense to drop the requirement that prefixlen be
> non-zero?  My problem with this change currently is the ability to get
> an original string which is empty (i.e. "\0"), which would cause git to
> throw some warnings about not allowing empty strings as pathspecs if
> they were then passed on to other processes.

You're right. My patch results in:

$ git init repo
$ cd repo
$ touch file
$ git add file
$ git commit -mfile
$ echo foo>file
$ mkdir subdir
$ cd subdir
$ git add -p ..
warning: empty strings as pathspecs will be made invalid...
warning: empty strings as pathspecs will be made invalid...

Dropping only the `prefixlen` condition and then unconditionally
calling `prefix_magic` suffices to fix the actual bug. I've
improved the test to use a pathspec which resolves to "" and
check stderr.

Thanks for catching this!

Regards
Patrick


signature.asc
Description: PGP signature


[PATCH v2] pathspec: honor `PATHSPEC_PREFIX_ORIGIN` with empty prefix

2017-04-04 Thread Patrick Steinhardt
Previous to commit 5d8f084a5 (pathspec: simpler logic to prefix original
pathspec elements, 2017-01-04), we were always using the computed
`match` variable to perform pathspec matching whenever
`PATHSPEC_PREFIX_ORIGIN` is set. This is for example useful when passing
the parsed pathspecs to other commands, as the computed `match` may
contain a pathspec relative to the repository root. The commit changed
this logic to only do so when we do have an actual prefix and when
literal pathspecs are deactivated.

But this change may actually break some commands which expect passed
pathspecs to be relative to the repository root. One such case is `git
add --patch`, which now fails when using relative paths from a
subdirectory. For example if executing "git add -p ../foo.c" in a
subdirectory, the `git-add--interactive` command will directly pass
"../foo.c" to `git-ls-files`. As ls-files is executed at the
repository's root, the command will notice that "../foo.c" is outside
the repository and fail.

Fix the issue by again using the computed `match` variable when
`PATHSPEC_PREFIX_ORIGIN` is set and global literal pathspecs are
deactivated. Note that in contrast to previous behavior, we will now
always call `prefix_magic` regardless of whether a prefix is actually
set. But this is the right thing to do: when the `match` variable has
been resolved to the repository's root, it will be set to an empty
string. When passing the empty string directly to other commands, it
will result in a warning regarding deprecated empty pathspecs. By always
adding the prefix magic, we will end up with at least the string
":(prefix:0)" and thus avoid the warning.

Signed-off-by: Patrick Steinhardt 
---

This is the second version of [1]. It fixes a bug caught by
Brandon where the pathspec resolves to the empty string, and
improves the test a bit to actually catch this issue.

[1]: 
http://public-inbox.org/git/1556910880cfce391bdca2d8f0cbcb8c71371691.1491206540.git...@pks.im/T/#u


 pathspec.c |  2 +-
 t/t3701-add-interactive.sh | 22 ++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index 303efda83..3079a817a 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -505,7 +505,7 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
 * original. Useful for passing to another command.
 */
if ((flags & PATHSPEC_PREFIX_ORIGIN) &&
-   prefixlen && !get_literal_global()) {
+   !get_literal_global()) {
struct strbuf sb = STRBUF_INIT;
 
/* Preserve the actual prefix length of each pattern */
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index f9528fa00..2ecb43a61 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -436,6 +436,28 @@ test_expect_success 'add -p handles globs' '
test_cmp expect actual
 '
 
+test_expect_success 'add -p handles relative paths' '
+   git reset --hard &&
+
+   echo base >relpath.c &&
+   git add "*.c" &&
+   git commit -m relpath &&
+
+   echo change >relpath.c &&
+   mkdir -p subdir &&
+   git -C subdir add -p .. 2>error <<-\EOF &&
+   y
+   EOF
+
+   test_must_be_empty error &&
+
+   cat >expect <<-\EOF &&
+   relpath.c
+   EOF
+   git diff --cached --name-only >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'add -p does not expand argument lists' '
git reset --hard &&
 
-- 
2.12.2



Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Jeff King
On Tue, Apr 04, 2017 at 10:17:51AM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Apr 4, 2017 at 4:54 AM, Jeff King  wrote:
> > My feeling is that this is old enough to stop caring about. Which means
> > we can drop some of the #ifdefs that clutter the HTTP code (and there's
> > a patch at the end of this mail dropping support for everything older
> > than 7.11.1). But that made me wonder: how old is too old? I think it's
> > nice that we don't force people to upgrade to the latest version of
> > curl. But at some point, if you are running a 13-year-old version of
> > libcurl, how likely are you to run a brand new version of Git? :)
> >
> > If we declared 7.16.0 as a cutoff, we could unconditionally define
> > USE_CURL_MULTI, which gets rid of quite a few messy ifdefs.
> 
> I don't object to this patch, but just as a general comment, in
> enterprise-y environments using an old OS (e.g. CentOS 5/6) & then
> compiling some selected packages like git based on OS libraries is
> quite common.
> 
> E.g. at work we're running git 2.12.0 compiled against CentOS 6
> libraries, which has curl 7.20.1, released on
> Apr 14 2010. Not so long ago we were still running CentOS 5 which
> comes with 7.15.5 released in Aug 7 2006 which would break with your
> patch.
> 
> Whether we support that is another question, I think it's reasonable
> to say that if you're compiling git on such an old system you also
> need to compile a libcurl instead of using the OS version. I just
> wanted to point out that this *does* happen, someone is going to be
> compiling new git releases on CentOS 5 & will be hit by this.

Thanks, that's a very useful bit of data. According to:

  https://access.redhat.com/support/policy/updates/errata

RHEL5 (which, as I understand it, is basically the same as CentOS 5 in terms
of packages) ended its 10-year support cycle just a few days ago. That
would perhaps make 7.20.1 a reasonable cutoff.

I dunno. We could also just punt on the whole thing. It was nice to see
a bunch of #ifdefs go away, but the code they're protecting is actually
not that complicated. Most of them do not have an #else at all, and we
just silently skip features on old versions.

I think it might be nice to declare a "too old" version, though, just so
we can stop adding _new_ ifdefs. Maybe 7.11.1 is that version now, and
in another few years we can bump to 7.16.0. :)

-Peff


Re: [PATCH v4 2/4] fsck: force core.checksumindex=1

2017-04-04 Thread Jeff King
On Tue, Apr 04, 2017 at 10:23:52AM +0200, Ævar Arnfjörð Bjarmason wrote:

> The part that confused me & I found unintuitive is that there's a new
> core.WHATEVER config that'll get silently overridden by a specific
> command, git-fsck.
> 
> Nothing else I can think of in core.* works like this, i.e. it's a
> namespace for "applies to all of git", core.editor, core.ignoreCase
> etc.
> 
> Having git-fsck have a command-line option that's on by default as I
> suggested is one way to get out of that confusion. It makes it a
> special case of a CLI option overriding some config.

Yeah, I do agree your suggestion makes it slightly less confusing.

I mostly just think we can avoid the situation altogether, which IMHO is
preferable.

-Peff


Re: [PATCH v4 2/4] fsck: force core.checksumindex=1

2017-04-04 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 4, 2017 at 4:29 AM, Jeff King  wrote:
> On Mon, Apr 03, 2017 at 10:31:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Apr 3, 2017 at 8:53 PM,   wrote:
>> > Teach fsck to override core.checksumindex and always verify
>> > the index checksum when reading the index.
>>
>> Sorry to only chime in about this at v4.
>>
>> I think this patch & the documentation you added for
>> core.checksumindex in 1/4 would be much less confusing if you made
>> this a on-by-default command-line option like e.g. "full".
>>
>> With this patch nothing amends the documentation to indicate that the
>> core.checksumindex is magically overridden when fsck runs, I think
>> something like this (needs amending to integrate) on top would make
>> this clearer:
>
> I think that is the wrong direction to reduce confusion. We don't need
> more options to twiddle this flag, we need fewer. For instance, in your
> proposed documentation:
>
>> @@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs`
>> namespace, and all reflogs
>> object pools.  This is now default; you can turn it off
>> with --no-full.
>>
>> +--[no-]checksum-index:
>> +   Validate the checksum at the end of the index file, on by
>> +   default, locally overrides any "core.checksumIndex" setting
>> +   unless negated. See linkgit:git-config[1].
>
> That tells us _what_ it does, but I'm left scratching my head with
> "why".
>
> I don't think there is any reason you would want fsck not to compute
> that checksum (just like there is no option to ask fsck not to check
> pack sha1 trailers).
>
> I would go so far as to say that the config option itself is unnecessary
> in this iteration of the series. I only asked for it so that we could
> test the verification code paths (both for correctness and performance).
> But if fsck can exercise the code path, we can check correctness that
> way. And for performance, it's probably enough to test two separate
> builds (which Jeff has already done).
>
> Junio also asked for the usual "add a config, and then later we'll flip
> the default" procedure. But IMHO that isn't necessary here. Nobody
> should ever care about this flag. It was largely useless to check it on
> every read in the first place. And if you suspect there's corruption in
> your repository, you should run "git fsck".

The part that confused me & I found unintuitive is that there's a new
core.WHATEVER config that'll get silently overridden by a specific
command, git-fsck.

Nothing else I can think of in core.* works like this, i.e. it's a
namespace for "applies to all of git", core.editor, core.ignoreCase
etc.

Having git-fsck have a command-line option that's on by default as I
suggested is one way to get out of that confusion. It makes it a
special case of a CLI option overriding some config.

But yeah, another way to resolve this is to get rid of the config
option altogether, or document in git-config.txt that
core.checksumIndex is obeyed by everything except git-fsck.


Re: [RFC] dropping support for ancient versions of curl

2017-04-04 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 4, 2017 at 4:54 AM, Jeff King  wrote:
> My feeling is that this is old enough to stop caring about. Which means
> we can drop some of the #ifdefs that clutter the HTTP code (and there's
> a patch at the end of this mail dropping support for everything older
> than 7.11.1). But that made me wonder: how old is too old? I think it's
> nice that we don't force people to upgrade to the latest version of
> curl. But at some point, if you are running a 13-year-old version of
> libcurl, how likely are you to run a brand new version of Git? :)
>
> If we declared 7.16.0 as a cutoff, we could unconditionally define
> USE_CURL_MULTI, which gets rid of quite a few messy ifdefs.

I don't object to this patch, but just as a general comment, in
enterprise-y environments using an old OS (e.g. CentOS 5/6) & then
compiling some selected packages like git based on OS libraries is
quite common.

E.g. at work we're running git 2.12.0 compiled against CentOS 6
libraries, which has curl 7.20.1, released on
Apr 14 2010. Not so long ago we were still running CentOS 5 which
comes with 7.15.5, released on Aug 7 2006, which would break with your
patch.

Whether we support that is another question, I think it's reasonable
to say that if you're compiling git on such an old system you also
need to compile a libcurl instead of using the OS version. I just
wanted to point out that this *does* happen, someone is going to be
compiling new git releases on CentOS 5 & will be hit by this.


Re: Bug in "git am" when the body starts with spaces

2017-04-04 Thread Jeff King
On Mon, Apr 03, 2017 at 10:42:46AM -0700, Jonathan Tan wrote:

> On 04/01/2017 09:18 PM, Jeff King wrote:
> > On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote:
> > > The logic is fairly simple: if we encounter an empty line, and we have
> > > pending in-body headers, we flush the pending headers, and mark us as
> > > no longer in header mode.
> > 
> > Hmm. I think this may work. At first I thought it was too strict in
> > always checking inbody_header_accum.len, because we want this to kick in
> > always, whether there's whitespace continuation or not. But that
> > accumulator has to collect preemptively, before it knows if there's
> > continuation. So it will always be non-empty if we've seen _any_ header,
> > and it will remain non-empty as long as we keep parsing (because any
> > time we flush, we do so in order to handle another line).
> > 
> > IOW, I think this implements the state-machine thing I wrote in my
> > earlier email, because the state "are we inside in-body header parsing"
> > is always reflected by having a non-empty accumulator. It is a bit
> > non-obvious though.
> 
> About obviousness, I think of a non-empty accumulator as merely representing
> that the next line could potentially be a continuation line. And it is
> coincidence that this implies "are we inside in-body header parsing"; if not
> all in-body header lines could be "continued", there would be no such
> implication.
> 
> mi->inbody_header_accum.len is already used in check_inbody_header to mean
> "could the next line potentially be a continuation line" and to trigger a
> check for a negative criterion (in this case, a scissors line). I think it's
> fine to do the same thing, the negative criterion here being a blank line.

FWIW, I looked into making a single "state" variable, but it actually
got ugly pretty quickly. I think not because it's not a cleaner method
in the long run, but just because the existing code is so dependent on
individual state variables that needed changing. So I'm OK with Linus's
fix; it certainly follows the existing code patterns.

-Peff


Re: [PATCH] Fix 'git am' in-body header continuations

2017-04-04 Thread Jeff King
On Mon, Apr 03, 2017 at 11:00:09AM -0700, Jonathan Tan wrote:

> > diff --git a/t/t4150-am.sh b/t/t4150-am.sh
> > index 89a5bacac..44807e218 100755
> > --- a/t/t4150-am.sh
> > +++ b/t/t4150-am.sh
> > @@ -983,7 +983,9 @@ test_expect_success 'am works with multi-line in-body headers' '
> > rm -fr .git/rebase-apply &&
> > git checkout -f first &&
> > echo one >> file &&
> > -   git commit -am "$LONG" --author="$LONG " &&
> > +   git commit -am "$LONG
> > +
> > +Body test" --author="$LONG " &&
> 
> Instead of "Body test", I would write something more descriptive like "Not a
> continuation line because of blank line above", but I'm fine with either.

Yeah. I also wonder if we can make the indentation more obvious. I
thought at first that the patch was whitespace mangled. :-/

Maybe:

  SP=" " &&
  cat >msg <<-EOF &&
  $LONG

  $SP This line is indented but not a header continuation.
  EOF
  git commit -F msg ...

or something.

It might also be easier to understand what's going on if this gets its
own test. This is really just testing mailinfo. I wonder if it would
make more sense in t5100, where we would not have to deal with all the
commit/format-patch cruft.

-Peff


Re: [PATCH v7 4/5] dir_iterator: refactor state machine model

2017-04-04 Thread Michael Haggerty
Signed-off-by: Michael Haggerty 
---
Daniel,

> Although it does make tests harder to understand, if we were to
> specify how to iterate with human-readable flags we'd add the getopt()
> + flag configuration overhead to this helper program to be able to
> handle all cases properly. Additionally, new flags added to
> dir_iterator would have to edit the test program as well, generating
> extra work.

I think you're exaggerating the amount of extra work. I think all you
need to do is squash the attached patch into your patch 4/5, for the
gain of only 14 lines of simple code, four of which could be deleted
if you don't care about supporting "--". Supporting hypothetical
future new options would cost exactly two more lines for each option.
Plus, this also fixes the handling of more than two args.

It might be even shorter if you use `parse_options()`, but that seems
like overkill here.

On the plus side, anybody can now change the `DIR_ITERATOR_*`
constants without breaking things, or grep for them to find all of the
places that they are used. Plus the test code is more readable.

In my opinion that is a win.

Michael

 t/helper/test-dir-iterator.c | 28 +---
 t/t0065-dir-iterator.sh  | 10 +-
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index a1b8c78434..c2eb2ca1f9 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,18 +4,32 @@
 #include "dir-iterator.h"
 
 int cmd_main(int argc, const char **argv) {
+   const char **myargv = argv;
+   int myargc = argc;
struct strbuf path = STRBUF_INIT;
struct dir_iterator *diter;
-   unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
-
-   if (argc < 2) {
-   return 1;
+   unsigned flag = 0;
+
+   while (--myargc && starts_with(*++myargv, "--")) {
+   if (!strcmp(*myargv, "--pre-order")) {
+   flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
+   } else if (!strcmp(*myargv, "--post-order")) {
+   flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
+   } else if (!strcmp(*myargv, "--list-root-dir")) {
+   flag |= DIR_ITERATOR_LIST_ROOT_DIR;
+   } else if (!strcmp(*myargv, "--")) {
+   myargc--;
+   myargv++;
+   break;
+   } else {
+   die("Unrecognized option: %s", *myargv);
+   }
}
 
-   strbuf_add(&path, argv[1], strlen(argv[1]));
+   if (myargc != 1)
+   die("expected exactly one non-option argument");
 
-   if (argc == 3)
-   flag = atoi(argv[2]);
+   strbuf_addstr(&path, *myargv);
 
diter = dir_iterator_begin(path.buf, flag);
 
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
index ade3ee0e7e..4819e6181d 100755
--- a/t/t0065-dir-iterator.sh
+++ b/t/t0065-dir-iterator.sh
@@ -28,7 +28,7 @@ test_expect_success 'dir-iterator should iterate through all files' '
>dir/a/e &&
>dir/d/e/d/a &&
 
-   test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output &&
+   test-dir-iterator --pre-order ./dir | sort >./actual-pre-order-sorted-output &&
rm -rf dir &&
 
test_cmp expect-sorted-output actual-pre-order-sorted-output
@@ -44,7 +44,7 @@ test_expect_success 'dir-iterator should iterate through all files on post-order
>dir/a/e &&
>dir/d/e/d/a &&
 
-   test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output &&
+   test-dir-iterator --post-order ./dir | sort >actual-post-order-sorted-output &&
rm -rf dir &&
 
test_cmp expect-sorted-output actual-post-order-sorted-output
@@ -61,7 +61,7 @@ test_expect_success 'dir-iterator should list files properly on pre-order mode'
mkdir -p dir/a/b/c/ &&
>dir/a/b/c/d &&
 
-   test-dir-iterator ./dir 1 >actual-pre-order-output &&
+   test-dir-iterator --pre-order ./dir >actual-pre-order-output &&
rm -rf dir &&
 
test_cmp expect-pre-order-output actual-pre-order-output
@@ -78,7 +78,7 @@ test_expect_success 'dir-iterator should list files properly on post-order mode'
mkdir -p dir/a/b/c/ &&
>dir/a/b/c/d &&
 
-   test-dir-iterator ./dir 2 >actual-post-order-output &&
+   test-dir-iterator --post-order ./dir >actual-post-order-output &&
rm -rf dir &&
 
test_cmp expect-post-order-output actual-post-order-output
@@ -100,7 +100,7 @@ test_expect_success 'dir-iterator should list files properly on pre-order + post
mkdir -p dir/a/b/c/ &&
>dir/a/b/c/d &&
 
-   test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output &&
+   test-dir-iterator --pre-order --post-order --list-root-dir ./dir >actual-pre-order-post-order-root-dir-output &&
rm -rf dir &&
 
test_cmp 

Antw: Re: Terrible bad performance for it blame --date=iso -C

2017-04-04 Thread Ulrich Windl
>>> Jakub Narebski  wrote on 03.04.2017 at 17:16 in message
<0ccc5cab-26b7-4b02-b964-452b61e92...@gmail.com>:
> W dniu 03.04.2017 o 12:56, SZEDER Gábor pisze:
>> Ulrich Windl wrote:
> 
>>> In the other case (for the user bored of waiting seeking for some
>>> entertainment ;-)) a "-v (verbose) option could be useful.  Or at the
>>> very least: If git is expecting that some operation will take (or
>>> already did take) a lot of time, give some message explaining why it
>>> is taking a lot of time, and maybe how to avoid that.
>> 
>> It already does so by default since v2.8.0, see aba37f495 (blame: add
>> support for --[no-]progress option, 2015-12-12).
>> 
>>   $ time git blame sha1_file.c |wc -l
>>   4026
>>   
>>   real0m1.744s
>>   user0m1.672s
>>   sys 0m0.068s
>>   $ time git blame -C -C sha1_file.c |wc -l
>>   Blaming lines: 100% (4026/4026), done.
>>   4026
>>   
>>   real0m3.832s
>>   user0m3.716s
>>   sys 0m0.112s
>> 
>> However, after a short peek at that commit, it only displays progress
>> by default when stderr is a terminal, which might not be the case when
>> invoked from emacs.
> 
> Emacs (magit?) should use `git blame --porcelain`, and do its own
> progress report, just like 'git gui blame' and incremental blame mode
> of gitweb.

I was thinking something similar: plain vc-annotate obviously should work
without those "-C" options, and with a prefix argument (C-u in Emacs) it
could start looking for copied stuff. IMHO...

Worse than no progress reporting is the inability to kill the process (if you
run out of patience) with C-g (that stops most commands in Emacs).

> 
> Actually... there already is git-blamed - Minor mode for incremental
> blame for Git, and mo-git-blame - An interactive, iterative 'git blame'
> mode for Emacs, both available on ELPA (Emacs Lisp Package Archive).

I confess that I still use RCS from time to time, and I prefer the
higher-level Emacs commands ;-)

Regards,
Ulrich